Databasics: 1 – Primary Keys

There is an ongoing debate among database experts regarding the design of a Primary Key, a debate that in my opinion should have been done and dusted a long time ago.

Note: A Primary Key is a piece of data contained in a database Column that uniquely identifies the database Row. This is the same as how a National Insurance Number uniquely identifies us to the authorities in the UK, or how a soldier’s Service Number uniquely identifies them within the Military. If you need to View, Update or Delete an existing database record then it is essential that you can uniquely identify it.

Two Main Schools of Thought

The first says that the Primary Key should be a valid piece of information in its own right, not just an identifier: a name, for example. In the West we use a Surname, which identifies us when amongst other people, most of whom will hopefully have a different surname. In situations where that is not true, for example family gatherings, the first name can be used as well to narrow things down. Even so, it can be difficult to build a unique identifier out of meaningful information alone.

The second school of thought acknowledges the problems of the above approach and solves them by using a non-meaningful Unique Identifier whose sole purpose is to identify one item uniquely among any number of similar items. This is basically what we have with Military Service Numbers and National Insurance Numbers.

My Preference

My preference is with the second school of thought, and in fact you can easily adopt this strategy with most Database Engines using the Auto Increment option on the Column. This lets the Database Engine itself take care of generating a Unique, Non-Reusable Identifier.

I always use the first Column of my Database Table as my Primary Key and name it:

pk
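As a rough illustration of the Auto Increment approach, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only my choice for the example, and the person table and surname column are made up; other Database Engines spell the option differently.

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " pk INTEGER PRIMARY KEY AUTOINCREMENT,"  # first column, named pk
    " surname TEXT NOT NULL)"
)

# Two people share a surname, yet each row gets its own pk.
first_pk = conn.execute(
    "INSERT INTO person (surname) VALUES (?)", ("Smith",)
).lastrowid
second_pk = conn.execute(
    "INSERT INTO person (surname) VALUES (?)", ("Smith",)
).lastrowid

# Delete the newest row, then insert again: with AUTOINCREMENT,
# SQLite does not hand the deleted value out a second time.
conn.execute("DELETE FROM person WHERE pk = ?", (second_pk,))
third_pk = conn.execute(
    "INSERT INTO person (surname) VALUES (?)", ("Jones",)
).lastrowid

print(first_pk, second_pk, third_pk)  # prints: 1 2 3
```

The surname cannot tell the two Smiths apart, but the pk can, and the engine does all the work of generating it.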

Many of my earlier databases used an incrementing number as the primary key, a number that was unique within the table. One particular system used a number that was unique everywhere within the whole database, the theory being that it would make it easy to see the order of inserts across multiple tables. I never found this to be needed, however, and I never used it again.

One of the downsides of using an incrementing numeric value as a primary key shows up if you have to export and reimport data following a database issue. If the key is used as a foreign key elsewhere within the database and the reimport generates fresh numbers, those references can end up pointing at the wrong rows.

You also get issues when you are working with data on a remote database in an offline state, which then needs to be reconciled and synchronized back to the main database, since both databases may have issued the same numbers to different records.

My current way of thinking is that I would use a GUID instead of an incrementing numeric value.
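To show why a GUID helps with the offline case, here is a small sketch using Python's uuid and sqlite3 modules. The customer table, the new_guid helper and the two in-memory databases are all hypothetical stand-ins for a main database and an offline copy. Random GUIDs are not guaranteed never to collide, but a collision is so unlikely that it is generally treated as negligible.

```python
import sqlite3
import uuid


def new_guid() -> str:
    """Return a random (version 4) GUID as text; hypothetical helper."""
    return str(uuid.uuid4())


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a GUID primary key column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (pk TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


main_db = make_db()
offline_db = make_db()

# Each database creates a record on its own, with no coordination.
main_db.execute(
    "INSERT INTO customer (pk, name) VALUES (?, ?)", (new_guid(), "Alice")
)
offline_db.execute(
    "INSERT INTO customer (pk, name) VALUES (?, ?)", (new_guid(), "Bob")
)

# Synchronise: copy the offline rows into the main database unchanged.
rows = offline_db.execute("SELECT pk, name FROM customer").fetchall()
main_db.executemany("INSERT INTO customer (pk, name) VALUES (?, ?)", rows)

merged = main_db.execute(
    "SELECT name FROM customer ORDER BY name"
).fetchall()
print(merged)
```

With an incrementing number, both databases would most likely have given their first record the key 1, and the copy step would have failed or needed the keys rewriting.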

Consistency and Structure

All my Database designs use the same structure in order to build consistency, something which is not fully appreciated until you have to work with legacy databases which haven’t been built with consistency, structure or maintainability in mind.

Another example of consistency and structure: the second column of every Database Table I design is always updguid.

This column contains another identifier; however, this one changes with every edit or update of the database record. I use it to find out whether the data I am viewing on my screen has since been updated elsewhere by someone else.

A comparison between the value of the updguid I have in memory and the value of the one stored in the database is all that is needed to determine the validity of the information I am viewing. If the information is stale I have several options I can pursue. This is all part of my Record Locking strategy, which will be covered in another Databasics post soon 🙂
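One possible way to make that comparison is sketched below with Python's sqlite3 and uuid modules. The note table, the body column and the save function are invented for the example, and putting the check into the WHERE clause of the UPDATE is my assumption for the sketch rather than a description of the full Record Locking strategy.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE note ("
    " pk TEXT PRIMARY KEY,"
    " updguid TEXT NOT NULL,"  # changes on every update
    " body TEXT)"
)


def save(conn, pk, seen_updguid, body):
    """Update the row only if its updguid is still the one we read.

    Returns the new updguid on success, or None if the row was stale.
    """
    new_updguid = str(uuid.uuid4())
    cur = conn.execute(
        "UPDATE note SET body = ?, updguid = ? "
        "WHERE pk = ? AND updguid = ?",
        (body, new_updguid, pk, seen_updguid),
    )
    # rowcount is 1 when the stored updguid matched, 0 otherwise.
    return new_updguid if cur.rowcount == 1 else None


pk, original = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO note VALUES (?, ?, ?)", (pk, original, "draft"))

# Two users have the same version of the record on screen.
first = save(conn, pk, original, "edit by user A")   # matches, is saved
second = save(conn, pk, original, "edit by user B")  # stale, is rejected

print(first is not None, second is None)  # prints: True True
```

When save returns None the caller knows the data on screen is out of date and can pick whichever option suits, such as reloading the record.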