Include Databases in CI/CD pipeline using Liquibase

For most of the Organizations, it is almost unthinkable to work in a software project without some kind of version control. The benefits of tracking and retaining an incremental history of the code are well understood and documented. However most of the time this has been limited to the application code and not the accompanying databases. This has remain a concern due to the many complexities and cultural issues associated with how the databases operates and their various versions. However, it is not just limited to keep database schema and data changes as part of the version control. It is also equally necessary to include database changes as part of the version control and make them part of the build and release pipelines to realize full benefits.

Some of these benefits include and not just limited to: Better sharing of code changes, gain better visibility of full change for release, rollbacks and/or retrieval of previous database version, better auditing and compliance, better enforcement of Organization’s practices, and the synchronization of the application and database changes. Not only this, this lays foundation of greater goal of database automation where you can keep in sync hundreds of your databases deployment and easily reproduce full databases as and when required, that too with almost no effort involved. None of these benefits is small to be realized.

Introduction to Liquibase

Liquibase is an open source database change management and deployment tool released in 2006. It is one of the most popular tools used for database version control. It is also extensible using various plugins and APIs. Not only this, it supports variety of popular databases like IBM DB2, Oracle DB, Microsoft SQL, etc. and if your database is not on the list, you can always extend it to support your use case. You can also write database changes using popular formats like YAML, XML, JSON and also use native SQL related to your database. Since it has been written in Java, it also has plugin support for Maven, Ant and Gradle based projects.

There is also a commercial version available by Datical, which builds on the capabilities of Liquibase and provides extra useful features. However we are not going to include features related to Datical in most of the our posts on the Liquibase.

How Liquibase Works

Liquibase manages database changes using what is known as changesets or changelog (in liquibase terminology) files. A changeset or changelog is a plain text file which includes one or more changesets or changelogs in a hierarchical fashion. You can write these files using XML, JSON, YAML or database native SQL language as well.

A changeset is very basic unit of change for a database. It can be a schema change like adding tables, modifying column properties, creating stored procedures etc. It can also be data related changes like inserting rows, deleting rows, modifying certain rows, etc.

Since the changeset can be written in platform agnostic format, the same set of change can be applied to multiple database platforms without any further efforts. This becomes partially possible because the changeset is written using the Declarative syntax. So you only specify the change that needs to be made but not how. The how part is generated by the Liquibase itself and then passed on to the database using the driver associated with the database.

Due to hierarchical nature of the changesets or changelogs, all underlying changesets beneath the operating changeset, are considered together into an one big changeset. It is always advisable to keep one leaf changeset associated to one atomic database change and then represents all such changes using parent changeset. You can group changesets as how you feel comfortable around it. Some like to group it in terms of schema and data changes, some group it as per various releases, some do a mix and match of both etc. It is really up to you. So it gives immense flexibility to developers on how they want to arrange the changes.

Contents of changelog/changeset

A changeset must include two types of the information: the metadata about the change and the change details itself.

The metadata about the change includes a number of attributes that uniquely identifies it in the changelog file and defines how the change will be applied. The only two mandatory attributes required are the id and author of the changeset. These tags, along with the classpath location and name of the XML file create a unique identifier for that changeSet. An empty changelog looks like below:

A changeset looks like below:


<changeSet author="mohit" id="91">
<createTable tableName="Customers">
<column name="CustomerID" type="nchar(5)">
<constraints primaryKey="true" primaryKeyName="PK_Customers"/>
</column>
<column name="CompanyName" type="nvarchar(40)">
<constraints nullable="false"/>
</column>
<column name="ContactName" type="nvarchar(30)"/>
<column name="ContactTitle" type="nvarchar(30)"/>
<column name="Address" type="nvarchar(60)"/>
<column name="City" type="nvarchar(15)"/>
<column name="Region" type="nvarchar(15)"/>
<column name="PostalCode" type="nvarchar(10)"/>
<column name="Country" type="nvarchar(15)"/>
<column name="Phone" type="nvarchar(24)"/>
<column name="Fax" type="nvarchar(24)"/>
</createTable>
</changeSet>

For a database changeset to be applied, it must contain both the content and mandatory metadata.

What happens when Liquibase Runs

When Liquibase runs for the first time against a database, it creates two new tables in it: DATABASECHANGELOG and DATABASECHANGELOGLOCK.

DATABASECHANGELOG keeps records of what all changesets have been applied on the databse so far. When Liquibase runs, it takes the name of a changelog file as one of its parameters. It looks into the changelog file and reads all changeset underneath itself and its hierarchy. If it is running for the first time, it starts applying those changesets. After successful application of changeset, it adds a new row in the DATABASECHANGELOG table. This row information includes: id
author, file-name of changeset, MD5 Checksum of the changeset and date/time of application of changeset. A Sample of databasechangelog table may look like below:

data stored in databasechangelog table

With each subsequent run,it again looks at the changesets from beginning and checks what’s stored in the DATABASECHANGELOG table. If the change has already been applied, it skips the changeset and moves on to the next one.

One of the notable columns is the MD5Sum checksum. As mentioned above, it creates a checksum of each of the changeset. So if you modify the file later for any reason, it will have different checksum than the earlier version. When applying new changes, liquibase compares the checksum of existing changesets to what has already been applied to the database. This step ensures that the existing changesets have not been modified. If the checksum is different, liquibase fails. This is a step that guarantees that a changeset once applied is immutable and that any changes to a database schema should be applied as a new changeset.

In next blog post, we’ll see how to get liquibase installed and get started.

2 thoughts on “Include Databases in CI/CD pipeline using Liquibase

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s