With the recent PostgreSQL 9.5 release, one of the exciting new features is “upsert” support, which is where you want to either update an existing row or insert a new one if it doesn’t already exist. If you’re still on an old version of PostgreSQL, however, you here’s how I’ve been getting that accomplished. This example uses the Perl DBI mechanism, but I think it’s readable enough that it’d be easy to convert to anything else with parametrized query preparation.
Read the rest of this entry »
Archive for the ‘programming’ Category
Recently, I needed to write a function which would find the next available value in a sequence of user IDs. My data is in a PostgreSQL table called agtab within a scheme called accounts. There is a column named uid and a column named gid for each account. The table was populated using the aggregate of several “legacy” systems, and the existing IDs are not contiguous. I need to have a mechanism which can quickly find the smallest unused ID. The brute force method would be to start at the lowest possible value and just start incrementing a value until I found one that’s unused. But that’s inelegant and won’t scale real well. I also need to skip over a chunk in the middle of the range which is allocated for another use. I ended up with the idea of calculating the difference of the list of all IDs and the same list with all IDs incremented by one; the lowest value in “ids plus one” which isn’t also in “ids” is the lowest free ID. Enter the “EXCEPT” clause.
Today I had a reason to use namerefs in a recursive function in a ksh93 script. Specifically, I wanted to build a list of email addresses for the management chain above an arbitrary number of employees. I used an associative array so duplicates would automatically be removed. But I wanted the function adding elements to the array to potentially work with different arrays.
Originally, I wrote the function roughly like this:
function append_managers { typeset -l user=${1:?} typeset -n arr=${2:?} typeset manager lookup "$user" manager="${manager[$user]}" lookup "$manager" arr[${email[$manager]}]=1 is_vp "$manager" && return append_manager "$manager" "${!arr}" }
I think this is pretty self-explanatory, but the way this should work is that the function takes a user name and an array name as mandatory arguments. It uses a function (lookup()) to populate a few global arrays, including “manager”, “email”, and one with the title (so is_vp() can later determine if the person is a VP or below). The manager’s email address is inserted into the associative array which was passed in, and then if the manager is not a VP or above, the function calls itself to continue climbing up the management chain. The things in here which might be unfamilar are
- typeset -l, which forces the variable’s value to lowercase (which works better with my directory)
- typeset -n, which makes the variable a “nameref”, so using the variable affects a variable with the name of the nameref’s value. It’s kind of like a pointer. In this case, referencing $arr will actually reference the variable with the same name as what was passed in as the second argument. If you do “a=’hello’;typeset -n b=’a’;print $b”; you should get “hello” in ksh93.
- ${!var}, which gets the name of the variable pointed to by a nameref. In the case above of ${!arr} and a function call like “append_manager ‘danny’ ‘cc'”, that would expand to ‘cc’.
This last bullet point is where things get hung up. The variable scoping in ksh93 with functions defined like “function f{}” (as opposed to “f(){}”) is lexical. In general, lexical scoping means that the scope is like the code would read; if you fully expand the code by copying the whole functions into the places where they’re called, copy includes in place, etc so that you have one long procedure, then variables’ scope would be where they appear. In this situation, that essentially translates to “variables defined in a function using typset are local to that function and to any functions called by that function.” So, recursive calls to “append_manager” with the same array name should all be able to see the same original array.
Unfortunately, in the version of ksh93 on the machine where I’m developing this – a RHEL6 system with “AT&T ksh 93u+” – this does not work as expected. On the second iteration through the recursion, I get an “arithmatic syntax error” when I try to use the nameref. I tried a number of variations on the approach; I tried hard-coding the array name, I tried putting the array name into a variable and then using that variable to hold the nameref name (instead of $2), and I tried making argument $2 optional so recursive calls just provided a new name while using the nameref scoped within the first call to the function. In all cases, I got either the same error or an empty array at the end. I suspect that the recursion messes up the internal variable tracking. Regardless of the cause, wWhat I finally ended up with was replacing the recursive call with this:
append_manager "$manager" "arr"
So, each call gets a new locally-scoped nameref variable pointing to the nameref array in its parent. It makes me feel slightly ill to have a call where I’m basically doing “typeset -n a=a”, but since the new “arr” is in a different, lower scope than the first, they’re technically two separate variables. It actually works, though, and even with various combinations of things I threw at it, it works properly. So, that’s what’s in my final script – along with some comments hopefully saving a future programmer the time.
One final note: I mentioned that I used an associative array to avoid duplicates. But that means I need to pull the indices (AKA “keys”) from the array in order to generate my final list of email addresses. Considering an array named “cc”, the code to generate an Outlook-friendly (separated by semicolons – ugh) list of email address looks like this:
   OLDIFS=$IFS; IFS=";"    print "cc: " "${!cc[*]}"    IFS=$OLDIFS
Or like this:
   list=$( printf '; %s' "${!cc[@]}" )    print "cc: ${list#??}"
The first option uses IFS to separate the arguments, which is limited to a single character. The second option (the one I actually used), prints each element preceded by a semicolon and a space, then prunes the first two characters off of that list. One could also use “${list:2}”, but I use the “prune” modifier more often than “substring” because I still have some older POSIX shell habits where the substring option wasn’t an option. :)
I keep my CFEngine policy (and some other similar things) in a Subversion repository.  The progression from unit test to integration test to production is handled by using tags.  Basically, the integration test policy is the trunk, unit tests are done by branching the trunk, and promotion to production is done by tagging a revision of the trunk with a release name (monthly_YYYY_MM.POINT). But this discussion doesn’t need to be just about that approach; my solution should work for pretty much anyone who needs a directory to match a portion of a subversion structure.
Recently, I’ve been trying to speed up my Subversion post-commit hooks.  I have several things which are run from the hook, and the number of separate commands leads to a bunch of fork() calls (or clone(), whatever).  Several of the scripts are already Python, so I figured I’d just write the hooks themselves in Python, making it so that the python interpreter would only need to start once and allowing the separate methods to be pre-compiled python.  This should decrease overall execution time, making the end-user experience slightly better overall by decreasing the time they have to wait on the server.  We’re talking about fractions of a second, but I have some operations which bulk-create directories in SVN or otherwise cause tens or hundreds of new revisions to be created at one time (which is necessary for the way some of my integration processes work), so it actually adds up.
This is also an excuse for me to learn Python, so bear with me if the code below is horrible.  Actually, don’t bear with me – leave a comment letting me know how it should have been done. :)
I’m responsible for a pretty large CFEngine installation.  CFEngine is designed to be pretty self-sufficient even when the network is unavailable, so it basically works by keeping its configuration local on each machine, and running from that local copy.  This is mostly implemented using a file-based configuration structure.  There’s a main configuration file (promises.cf) which includes several additional configuration files.  In pretty much every situation, one of the promises (the name for an individual policy item) or bundles of promises will ensure that the local config files are in sync with the configuration files on the central master.
While it’s possible to use LDAP or define some variables on the central master, the main way configuration is done is by putting the policy into some files on the master and then allowing individual systems to copy those files down; the central master is basically just a fairly efficient file server.