Any nontrivial application requires a database that will evolve over time to support new features or to fix scaling problems with queries. Changing your database schema by hand in a console may be fine, but only if your web application is very small, is in beta with a very small audience, or is a hobby project. In any other case, you should write migrations.
Migrations are small pieces of code that update the database schema. Because migrations are code, they can be reproduced in an automated fashion in different environments and they can also be tested in different ways. Usually, you’d use a tool to create and run migrations. This tool will provide a command line interface to start coding a new migration, to run a specific migration, to roll it back or to run all pending migrations in your current setup/environment. These tools are usually included in the ORM or database framework you use. For example, you have tools for Ruby on Rails, Sequelize, knex.js, or Prisma/Lift.
In a small application, you can decide to simply run the migrations before starting the application. I have worked on many projects that had a Node.js script like this:
"start": "./run-migrations && node app.js"
Even if the database is so big that the migrations take a while to run, this is okay(ish) as long as you have a setup where the previous deploy keeps serving requests until the new deploy is up and running. But what if the previous deploy doesn't work with the new database changes?
For example, what if you have renamed a column, or added a column that makes a query in the old code ambiguous? It could be a while before all requests are handled by the new deploy, and in the meantime the old code will produce errors whenever it references the old column name. You may decide to shut down the service for a moment, run the migration, and then reestablish the service.
However, this will annoy the users of your service, so you'd have to schedule a maintenance window, notify your users by email, put a warning on your site, and so on. Is it worth the effort? Is there a better way?
There’s a solution to this problem that will allow you to keep your service up and running and your users won’t notice anything. It’s based on what I consider the golden rule about migrations — when migrating a database, the current code must work fine with both the previous and the new database schema.
How do we guarantee this? There are two ways:
This way you are testing your code both before and after the migration. You test it after the migration because your CI does that explicitly, and you have already tested it before the migration: if you haven't changed any application code or tests, then the current code was already tested against the old database schema in the previous deploy.
Doing migrations this way also brings another benefit: you can decouple running the migrations from your deployment process, which allows you to do long-running migrations, for example when you need to backfill data.
If you create new columns or tables to denormalize some data or calculations, you may need to iterate over all records to calculate those values for existing records, which means that some database changes require a multi-step process of schema migrations and deploys.
For example, let's say you have items with a created_at date and a sold_at date, and you want to add a new feature that shows items by the amount of time they were in stock and lets users sort by it. If you only showed the amount of time and didn't sort by it, you could do the calculation when rendering the UI. But if you want to sort by that criterion, you'll want the value to already be calculated in the database, in a new, indexed column.
So, first, you’ll create a new column and the index in a migration. Here is an example using knex:
exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.integer('time_in_stock')
    t.index(['time_in_stock'])
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropIndex(['time_in_stock'])
    t.dropColumn('time_in_stock')
  })
}
Then you’ll change your application code so new items have the time calculated when sold:
function soldItem(item) {
  const now = new Date()
  item.update({
    sold_at: now,
    time_in_stock: now - item.created_at // new calculation
  })
}
Next, you’ll write and run a migration to calculate the values for existing records sold before you made the changes that start calculating the new column:
exports.up = function(knex, Promise) {
  return knex.raw(
    'UPDATE items SET time_in_stock = UNIX_TIMESTAMP(sold_at) - UNIX_TIMESTAMP(created_at) WHERE time_in_stock IS NULL'
  )
}
Finally, you'll be able to add application code to show the calculations and let your users sort by them. Depending on your case (and unlike this sold-items example), you may also want to add a NOT NULL constraint to the new column with another migration once all the records in the database have been backfilled.
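The sorting itself can then be a plain query on the new column (for example, knex('items').orderBy('time_in_stock') on the application side). If you do decide to add the constraint afterwards, a sketch of that follow-up migration with knex could look like this. It assumes every row has already been backfilled and that your database supports altering columns; alter() redefines the column, so the type is repeated.

exports.up = function(knex, Promise) {
  return knex.schema.alterTable('items', function(t) {
    // Every existing row must already have a value before this runs
    t.integer('time_in_stock').notNullable().alter()
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.alterTable('items', function(t) {
    t.integer('time_in_stock').nullable().alter()
  })
}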
Let's look at some other examples that require a multi-step process. Suppose you want to delete a table or a column. First, you must stop writing to that table or column:
item.update({
  a_column: a_value
  // legacy_column: value // Stop writing to the column
})
Then you’ll be able to write a migration that drops it physically from the database:
exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('legacy_column')
  })
}
If the column you want to drop has a NOT NULL constraint, you'll also need a first migration that removes the constraint; otherwise, you won't be able to simply stop writing to that column.
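As a sketch, that first migration could simply redefine the column as nullable. This assumes the legacy column is an integer; with knex, alter() rewrites the column definition, so you repeat its type.

exports.up = function(knex, Promise) {
  return knex.schema.alterTable('items', function(t) {
    // Drop the NOT NULL constraint so the application can safely stop writing to the column
    t.integer('legacy_column').nullable().alter()
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.alterTable('items', function(t) {
    t.integer('legacy_column').notNullable().alter()
  })
}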
The simplest example is moving data from one column to another in the same table. You may want to do this because the new column has a different data type, or simply a different name. Let's see all the steps involved. First, you'll need to create the new column or table:
exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.integer('new_column')
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('new_column')
  })
}
Then you'll make application changes to start writing the information to both columns:
item.update({
  legacy_column: value,
  new_column: value // Start writing to both columns
})
Then you’ll do a migration to backfill the new column so older records have the right values in both columns:
exports.up = function(knex, Promise) {
  return knex.raw(
    'UPDATE items SET new_column = legacy_column WHERE new_column IS NULL'
  )
}
Then you'll change your application code to read from and write to only the new column:
// No longer writing to the legacy column
item.update({
  new_column: value
})
Finally, you’ll be able to drop the old column or table:
exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('legacy_column')
  })
}
If you have a lot of data to modify or backfill, it's better not to do it in a single SQL statement, because you may lock your tables and the service will stop working or perform poorly. It's better to query a batch of records that need to be updated and then run the SQL statement only on those records, with something like UPDATE ... WHERE id IN (…list of ids from previous query). You'll do this in a loop: first query a few records, then update them.
You'll also need to throttle the updates; otherwise, you'll trigger many write operations in a short period of time, which can hurt the health of your service.
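Here is a minimal sketch of such a batched, throttled backfill script, reusing the legacy_column/new_column example from above. The batch size, the pause, and the connection setup are assumptions you'd tune for your own database.

// Batched, throttled backfill (a sketch; column names, batch size, and pause are assumptions)
const knex = require('knex')(require('./knexfile')) // assumes knexfile exports the connection config

const BATCH_SIZE = 1000 // tune for your database
const PAUSE_MS = 500    // throttle between batches

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function backfill() {
  while (true) {
    // Query a small batch of records that still need the new value
    const rows = await knex('items')
      .whereNull('new_column')
      .select('id')
      .limit(BATCH_SIZE)

    // Nothing left to update: the backfill is done (or can be resumed later)
    if (rows.length === 0) break

    // Update only this batch of ids
    await knex('items')
      .whereIn('id', rows.map((row) => row.id))
      .update({ new_column: knex.raw('legacy_column') })

    // Throttle so the database isn't flooded with writes
    await sleep(PAUSE_MS)
  }
}

backfill()
  .then(() => process.exit(0))
  .catch((err) => {
    console.error(err)
    process.exit(1)
  })

Because each iteration only picks up rows where new_column is still null, you can stop the script at any point and rerun it later without redoing work.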
This way of implementing long-running migrations also lets you stop and resume the migration whenever you want. This is especially useful because migrations like these may run into problems with some records. If that happens, this strategy allows you to fix the issue and rerun the migration.
In my experience, it's common to create a new column and want to backfill it, only to find that the calculation returns null for some records (perhaps because very old records don't have all the information needed), so the update fails because of the NOT NULL constraint. In these cases, you can default to a specific value or drop the constraint, depending on your case.
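For instance, a variation of the earlier backfill migration could fall back to a default with COALESCE. The zero default here is just an assumption; pick whatever makes sense for your data.

exports.up = function(knex, Promise) {
  // COALESCE falls back to 0 when the legacy value is missing,
  // so the NOT NULL constraint on new_column is not violated
  return knex.raw(
    'UPDATE items SET new_column = COALESCE(legacy_column, 0) WHERE new_column IS NULL'
  )
}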
Evolving your database schema without downtime, and thus without disrupting your users, may require many steps, but I find it better, less stressful, and safer than shutting down servers and racing against the clock. It's also common to see applications that, right after an update, need to run some cleanup while the user waits, which is a bad user experience too.
These best practices apply not only to web applications but also to mobile applications with embedded databases.