Alberto Gimeno Ecosystem engineer at @github. Sometimes I write about JavaScript, Node.js, and frontend development.

How to migrate a database schema at scale

5 min read 1621

Any nontrivial application requires a database that will evolve over time to support new features or to fix scaling problems with queries. It may be fine to change your database schema by hand in a console but only if your web application is very small, is in beta with a very low audience, or is a hobby project. In any other case, you should write migrations.

What’s a migration

Migrations are small pieces of code that update the database schema. Because migrations are code, they can be reproduced in an automated fashion in different environments and they can also be tested in different ways. Usually, you’d use a tool to create and run migrations. This tool will provide a command line interface to start coding a new migration, to run a specific migration, to roll it back or to run all pending migrations in your current setup/environment. These tools are usually included in the ORM or database framework you use. For example, you have tools for Ruby on Rails, Sequelize, knex.js, or Prisma/Lift.

How should we run these migrations

In a small application, you can decide to just run the migrations before running the application. I have worked in many projects that had a Node.js script like this:

"start": "./run-migrations && node app.js"

Even if the database is so big that the migrations take a while to run this is okay(ish) if you have a setup where the previous deploy keeps serving the requests until the new deploy is up and running. But what if the previous deploy does not work with the new database changes?

For example, what if you have renamed a column or you’ve added a column that in the old code produces ambiguity in a query? It could be a while before all of the requests are handled by the new deploy that will produce errors if the code tries to do it with the old name of the column. You may decide to shut down the service for a moment and then run the migration and reestablish the service.

However, this will annoy users using the service so you should schedule a maintenance window and tell your users via email, or put a warning on your site, etc. Is it worth the effort? Is there a better way?

There’s a solution to this problem that will allow you to keep your service up and running and your users won’t notice anything. It’s based on what I consider the golden rule about migrations — when migrating a database, the current code must work fine with both the previous and the new database schema.

How do we guarantee this? There are two ways:

  • A deploy that modifies the database schema must not contain changes to the application code
  • In your CI, run the tests after migrations are ran

This way you are testing your code before and after doing the migration. You test after because you are doing that explicitly in your CI, and before because if you haven’t changed any application code, this is true because the code hasn’t changed and you haven’t changed any tests, so the current code was already tested with the old database schema in the previous deploy.

Backfilling data

Doing migrations this way also brings another benefit. You can decouple running the migrations from your deployment process, which allows you to do long-running migrations. For example, if you need to backfill data.

We made a custom demo for .
No really. Click here to check it out.

If you create new columns or tables to denormalize some data or calculations, you may need to iterate over all records to calculate those values for existing records, which means that some database changes require a multi-step process of schema migrations and deploys.

For example, let’s say you have items that have a created_at date and a sold_at date and you want to add a new feature to your application to show and be able to sort items by the amount of time they were in stock. If it was only showing the amount of time and not sorting, you could do this calculation when rendering the UI. But if you want to sort by that criteria you’ll want the data to already be calculated in the database in a new column and index it.

So, first, you’ll create a new column and the index in a migration. Here is an example using knex:

exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.integer('time_in_stock')
    t.index(['time_in_stock'])
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropIndex(['time_in_stock'])
    t.dropColumn('time_in_stock')
  })
}

Then you’ll change your application code so new items have the time calculated when sold:

function soldItem(item) {
  const now = new Date()
  item.update({
    sold_at: now,
    time_in_stock: now - item.created_at // new calculation
  })
}

Next, you’ll write and run a migration to calculate the values for existing records sold before you made the changes that start calculating the new column:

exports.up = function(knex, Promise) {
  return knex.raw('UPDATE items SET time_in_stock = UNIX_TIMESTAMP(sold_at) - UNIX_TIMESTAMP(created_at) WHERE time_in_stock IS NULL')
}

Finally, you’ll be able to add application code to show the calculations and let your users sort by them. Depending on your case (and unlike the sold items example) maybe you’ll also want to add a NOT NULL constraint to the new column with another migration once all the records in the database have been backfilled.

Other examples

Let’s look at some other examples that will require a multi-step process.

Deleting a table or a column

First, you must stop writing to that table or column:

item.update({
  a_column: a_value
  // legacy_column: value // Stop writing to the column
})

Then you’ll be able to write a migration that drops it physically from the database:

exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('legacy_column')
  })
}

If you want to drop a column that has a NOT NULL constraint you’ll also need the first migration to just remove that constraint. Otherwise, you won’t be able to just stop writing to that column.

Moving data from one column or table to another

The simplest example is moving data from one column to another in the same table. You may want to do this just because the new one has a different data type or just a different name. Let’s see all the steps involved. First, you’ll need to create the new column or table:

exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.integer('new_column')
  })
}

exports.down = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('new_column')
  })
}

Then you’ll need to do application changes to start writing the information to both columns:

item.update({
  legacy_column: value,
  new_column: value // Start writing to both columns
})

Then you’ll do a migration to backfill the new column so older records have the right values in both columns:

exports.up = function(knex, Promise) {
  return knex.raw('UPDATE items SET new_column = legacy_column WHERE new_column IS NULL')
}

Then you’ll change your application code to start reading and writing only from the new column:

// No longer writing to the legacy column
item.update({ new_column: value })

Finally, you’ll be able to drop the old column or table:

exports.up = function(knex, Promise) {
  return knex.schema.table('items', function(t) {
    t.dropColumn('legacy_column')
  })
}

Caveats with long-running migrations

If you have a lot of data you need to modify or to backfill, it’s better not to do it in a single SQL statement, because you may lock your tables and the service will stop working or it’ll work poorly. It’s better to query some records that need to be updated and then run the SQL statement only in those records. Maybe doing something like UPDATE ... WHERE id IN (…list of ids from previous query). You’ll do this in a loop, first querying a few records, then updating them.

You’ll need to throttle the updates because otherwise you’ll be triggering many write operations in a short period of time and that may also have a bad impact on the sanity of your service.

This way of implementing long-running migrations also allows you to stop and resume the migration whenever you want. This is especially useful because migrations like these may encounter problems in some records. If that’s the case this strategy allows you to fix the migration and rerun it.

In my experience, it’s frequent that you created a new column and want to backfill it, but sometimes the calculations you want to backfill it with returns null, maybe because very old records don’t have all the information to do the calculations, and when updating the row you get errors because of the NOT NULL constraint. In these cases, you can default to a specific value or drop the constraint, depending on your case.

Conclusion

Evolving your database schema without downtimes and thus without disrupting your users may require many steps, but I find it better, less stressful, and safer than shutting down servers and running against the clock. It’s common to see some applications that get updated and then they require some cleanup at the beginning and make the user wait which is also a bad user experience.

These best practices are also applicable not only to web applications but also to mobile applications with embedded databases.

Plug: , a DVR for web apps

LogRocket is a frontend application monitoring solution that lets you replay problems as if they happened in your own browser. Instead of guessing why errors happen, or asking users for screenshots and log dumps, LogRocket lets you replay the session to quickly understand what went wrong. It works perfectly with any app, regardless of framework, and has plugins to log additional context from Redux, Vuex, and @ngrx/store.

In addition to logging Redux actions and state, LogRocket records console logs, JavaScript errors, stacktraces, network requests/responses with headers + bodies, browser metadata, and custom logs. It also instruments the DOM to record the HTML and CSS on the page, recreating pixel-perfect videos of even the most complex single-page apps.

.
Alberto Gimeno Ecosystem engineer at @github. Sometimes I write about JavaScript, Node.js, and frontend development.

Leave a Reply