In the last few posts on the managing database changes, we discussed how it is useful and what are the various benefits available. One of the core philosophies of the Database as a Source Code involves treating code for Database changes as source code. This is not limited to using a version control system like Git / Subversion / Mercury etc. but it also expands to other areas like designing the proper directory structure, making it scale ready for future changes, minimizing merge conflicts etc. In this blog post, we are going to discuss some of the practices used for organizing database changes when using Liquibase.
Choosing the Version Control System
Git has become the default version control system for most of the Organization these days. However, this does not mean that you should discard your previous knowledge of other version control systems like Subversion, Perforce etc. Determine with your colleagues what knowledge is already available with your team and what you can improve on or build on and then start using the same. There is lot of information available on internet for every major version control system. Also not to forget that most of the CI/CD tools supports all major version control systems. So that should not be an issue as well.
Segregating Changesets into Separate files
With the passage of time during the application lifecycle, more and more changesets will be added to the changelog file. When all developers are writing changesets in one single file, that file can become unmanageable very quickly. It can take longer and longer to load it an editor, understand and edit it, and subsequent handling. Using a single file would also increase possibility of generating conflicts while merging changes. Moreover, every time Liquibase runs the changelog, it has to work on every changeset from the beginning, something that will take longer and longer.
Fortunately, just like code refactoring, Liquibase changelogs can be refactored into hierarchical fashion. That means we can have a single parent changelog file which can include pointers to a number of child changelog files, which in turn can point to more changelog files and so on. Again, a changelog file can include any number of changeset files. All of this is done using the
include attribute available. You can refer the official documentation here for include attribute.
Determine directory structure for files
Another similiar question is on how to arrange the changelogs and changesets in the file system. This is a question often asked and often interpreted in lots of ways by different developers teams. In my personal opinion, the best method is to create a separate directory for each database release and name the directory based on the release. Determine a naming strategy for the directory and rigorously follow that. The relationship between the directories is determined using the refactoring method describe above.
The parent directory contains the parent changelog file. This changelog file includes references to other changelog files in release specific directories. The changelog files in these sub-directories will include references to changelog files for issue specific or feature specific directories.
Another common strategy is to divide the changes using the DB Object and the operation in nature. For example, parent changelog file will point to changelog files present in two sub-directories named ddl and dml. The changelog files present in the sub-directories will further include references to changelog files present in sub-directories like tables, stored procedures, etc. or issue, features, etc.
Either way, discuss and determine what best works for you and follow that strategy. Liquibase does not care much about the directory structure as long as references mentioned are in correct order.
Designing Strategy for minimum conflicts
If your team is doing development in Git, developers will create branches for handling changes and issues and after that, they are merged into the main release branch (release specific or master). There might be movement of selected commits into the release branch as well. If your team is doing development in Subversion, you will be creating branches and tags for handling changes and issues. At time of release, you will merge your changes into the trunk or any specific release branch/tag and release from there.
Either way, there are chances that you will likely face the conflicts when you are merging changes and most of them would be related to the changelog files and rarely it would be about the files containing the changesets. Do note that we have mentioned files here, not the database’s changes itself since we do not need to. All the changes are contained in the files and if you apply the files in a proper order, you’ll be able to recreate the change itself. Whether the change mentioned in the changeset has been applied or not, can be easily identified from the DATABASECHANGELOG table.
If you are getting multiple conflicts related to the files containing the changesets, identify a naming strategy for the files in reference so that you can be able to generate unique file names. One of the common strategy is to name your changeset files after the backlog item number or issue number that has been created for tracking the issue or the feature. If two developers are working on same database objects but they are handling different issues, then there changeset files would be different since they have been assigned different backlog items.
You would also need to consider the way you are generating the value for the id and the author name. Since there is nothing much can be done about the author names, you can use some strategy to generate unique id for each change. This may be again dependent on the backlog item or feature in some way or may be related to datetime of change or any strategy that would provide highly unique id’s.
Also do remember to minimize the changeset contained in the file. The best practice is to always contain a single change in the changeset and one changeset in one file.
Segregate Changes using Contexts
Most of the time, you would have multiple environments like dev, qa, staging, production, etc. You can designate if your change is meant only for specific environment or not and restrict it to that environment only while deploying using Liquibase. You can refer this blog post for more details.
Use Pre-Conditions to Check Database State
Use of pre-conditions would determine if database is in a particular state or not and then run the change. For example, if your changeset contains creating a stored procedure, it would help to know if it already exists or not. Again, before dropping a table, it would help to know if the table contains records or not. Use of pre-conditions wisely, minimizes the errors and also determining the alternative actions. Use of pre-conditions is describe in this blog post.
How Liquibase handles changes?
Always remember how Liquibase works: each changeSet has an id, an author, and a file path which together uniquely identifies it. If the DATABASECHANGELOG table has an entry for that changeSet it will not run it, by default. If it has an entry, it throws an error if the checksum for the changeset in the file doesn’t match what was stored on the last run, again by default. To alter this default behavior, there are couple of additional attributes like
runAlways='true", which will cause to always run the changeset irrespective of whether it has run previously or not. Similarly, there is another attribute
runOnChange="true", which will apply the change only if file contents has been changed. If its not changed and the checksum matches, it will move on further.
Summary and Notes
Most of the time, it should not matter if the changelog file includes the pointers to other changeset. You should be able to include your changesets and let Liquibase determine if it needs to apply all changesets mentioned or select few. There will always be edge cases and you may see issues after merging and deploying changes. In such a case, your developers need to sit together and decide what needs to be kept and what needs to be discarded. Future conflicts can be avoided by proper planning and collaboration and learning from the mistakes made.