pre-9.5 PostgreSQL upsert
With the recent PostgreSQL 9.5 release, one of the exciting new features is “upsert” support, which is where you want to either update an existing row or insert a new one if it doesn’t already exist. If you’re still on an old version of PostgreSQL, however, you here’s how I’ve been getting that accomplished. This example uses the Perl DBI mechanism, but I think it’s readable enough that it’d be easy to convert to anything else with parametrized query preparation.
I have a manager_map table which maps a person’s assigned alias to a manager. If the person is new, I want to create a new entry. If the person is not new, though, I just want to update their manager to match the new value. Note that This query takes two parameters (manager’s alias is $1, subordinate’s alias is $2). Using the pg_placeholder_dollaronly parameter in the prepare allows me to put the parameters wherever in the query without worrying about the order they appear in the query string, and allows them to be duplicated in the query without having to specify them 2-3 times in the execute line later. This isn’t present in all DBI modules or even remotely portable, but you don’t need to do this on all RDBMS types, anyway. :)
$q_manager_set = $secdbh->prepare(q{
WITH upsert AS (
UPDATE public.manager_map
SET manager = $1
WHERE alias = $2
RETURNING *
)
INSERT INTO public.manager_map
(manager, alias)
SELECT $1, $2
WHERE NOT EXISTS (
SELECT 1
FROM upsert
WHERE alias = $2
)
}, {pg_placeholder_dollaronly => 1} );
The update query can be set to return the rows which were affected by the update. If no rows were updated, no rows will be returned. So, we take advantage of that by saving the results using a with clause (it could be done in the insert directly, but this is easier to read, IMHO). Then I construct a new temporary table consisting of the values I want to insert – which is just one set (one row) in this case. I subtract all of the rows where the primary key (alias, in this example) is in the list of rows affected by the update. Anything left gets inserted.
This works really nicely, because the update doesn’t complain if no rows are affected, and the insert doesn’t complain if no rows need to be inserted. It also works out well because it tries the update first, then the insert. The only race condition here is in the situation where rows might be deleted between update and insert. In my situation, this database is never deleted from. If yours is, though, all you have to do is lock the relevant table around this query, which is pretty straightforward.
To put this into use, I have a data structure defined earlier on which is an associative array where the key is the supervisor and the value is an array of direct reports. I also have autocommit turned off, as I want to make all of my updates in one transaction (for reasons beyond the scope of this post). So, the rest of the code (aside from populating the data structure from the input source, defining variables, etc) looks like this:
foreach $supervisor ( keys(%supervisors) ){
foreach $alias ( @{ $supervisors{$supervisor} } ){
$q_manager_set->execute( $supervisor, $alias )
or print qq{manager_map: Unable to add mapping '$alias:$supervisor': }
. $q_manager_set->errstr . qq{\n};
}
}
$secdbh->commit unless( $secdbh->{AutoCommit} );