PostgreSQL Autoincrementing Columns

Autoincrementing columns are implemented in many databases, and PostgreSQL provides two methods for creating them.

The PostgreSQL documentation can be found here. The PostgreSQL-specific method uses the SERIAL column type. It is possible to use SMALLSERIAL, SERIAL and BIGSERIAL, which are implemented using sequences and result in column definitions of SMALLINT, INTEGER and BIGINT respectively.

CREATE TABLE tablename (
    colname SERIAL
);

This is the same as writing:

CREATE SEQUENCE tablename_colname_seq AS integer;

CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);

ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
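
As a quick sanity check, here is a minimal sketch (assuming the tablename/colname definitions above) showing that normal inserts draw their values from the sequence:

INSERT INTO tablename DEFAULT VALUES;      -- colname receives 1 from the sequence
INSERT INTO tablename DEFAULT VALUES;      -- colname receives 2
SELECT currval('tablename_colname_seq');   -- returns 2 in the same session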

The CREATE TABLE syntax also allows the use of the standard SQL feature for creating autoincrementing identity columns:

column_name type GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
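
For example, a minimal sketch of an identity column (the table, column and payload names here are placeholders, not from any real schema):

CREATE TABLE tablename (
    colname integer GENERATED ALWAYS AS IDENTITY,
    payload text
);

INSERT INTO tablename (payload) VALUES ('first row');  -- colname is generated from the backing sequence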

The reason I am writing this post is not to regurgitate what the PostgreSQL documentation already does very well, but to warn about an issue that can occur during migration from SQL Server (and most likely other databases).

I was involved in migrating a SQL Server database from an AWS IaaS instance to PostgreSQL RDS on AWS. The standard tools used were AWS SCT (Schema Conversion Tool) and DMS (Database Migration Service). The SCT generated the PostgreSQL DDL using the SQL standard notation, which was fine.

The problems arose when DMS migrated the data, as it used the OVERRIDING SYSTEM VALUE clause to insert the data into the new tables from the source. The result of this is that the underlying sequence is not incremented.

INSERT INTO tablename (colname)
OVERRIDING SYSTEM VALUE
VALUES (1);
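
To illustrate the consequence, here is a sketch assuming colname is the primary key and its backing sequence is still at its initial value after migration:

-- migrated rows keep their original identity values, e.g. 1 through 100,
-- but the backing sequence was never advanced
INSERT INTO tablename DEFAULT VALUES;
-- the identity generates 1, which collides with a migrated row and raises
-- a duplicate key error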

My solution to this issue, as the DMS migration was handled by a different department, was to write a little bit of code to generate SQL that updates the sequences to a value larger than the maximum value currently used in the table. I based my code on information from this Stack Overflow post. Another reason to use this method is to preserve the original identity values, since changing them would break references between tables.

-- Build fully qualified names for every relation, then split out sequences and tables
WITH fq_objects AS (SELECT c.oid, n.nspname || '.' || c.relname AS fqname,
                           c.relkind, c.relname AS relation
                    FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace),
     sequences  AS (SELECT oid, fqname FROM fq_objects WHERE relkind = 'S'),
     tables     AS (SELECT oid, fqname FROM fq_objects WHERE relkind = 'r')
-- For every sequence owned by a table column, emit a SELECT that will in turn
-- produce the ALTER TABLE ... RESTART WITH statement for that column
SELECT CONCAT('select CONCAT(''ALTER TABLE ', t.fqname, ' ALTER COLUMN ', a.attname,
              ' RESTART WITH '', max(', a.attname, ')+1, '';'') from ', t.fqname, ' union ')
FROM pg_depend d JOIN sequences s ON s.oid = d.objid
                 JOIN tables    t ON t.oid = d.refobjid
                 JOIN pg_attribute a ON a.attrelid = d.refobjid AND a.attnum = d.refobjsubid;

It is not perfect and could definitely be streamlined, but it did the job for this one-off task. The query generates SQL which you then run against the database to produce the actual ALTER statements. Please remember to remove the last "union" and any surrounding double quotes if you are copying the output from pgAdmin.
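
For illustration, the two stages look roughly like this (the schema, table and column names below are invented for the example):

-- first stage: output of the query above, one line per identity column
select CONCAT('ALTER TABLE public.orders ALTER COLUMN order_id RESTART WITH ', max(order_id)+1, ';') from public.orders union
select CONCAT('ALTER TABLE public.customers ALTER COLUMN customer_id RESTART WITH ', max(customer_id)+1, ';') from public.customers;

-- second stage: running that output yields the statements you actually execute
ALTER TABLE public.orders ALTER COLUMN order_id RESTART WITH 1001;
ALTER TABLE public.customers ALTER COLUMN customer_id RESTART WITH 501;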

This issue would not arise if the inserts into the identity column were not overridden, but as I mentioned earlier, it would cause problems if the values were not the same as in the source database.

I hope this information helps someone, and I also hope that the Stack Overflow contributors do not mind me referencing their work.