First and foremost: I don’t intend to diss any of the modules mentioned in this post. A lot of hard work has gone into each and every one of them. They are used by production applications all around the world which merrily respond to plenty of requests every day. I’ve also deployed applications using ORMs and regret nothing.
ORMs are powerful tools. The ORMs we’ll be examining in this post are able to communicate with SQL backends such as SQLite, PostgreSQL, MySQL, and MSSQL. The examples in this post will make use of PostgreSQL, which is a very powerful open source SQL server. There are ORMs capable of communicating with NoSQL backends, such as the Mongoose ORM backed by MongoDB, but we won’t be considering those in this post.
First, run the following commands to start an instance of PostgreSQL locally. It will be configured so that requests made to the default PostgreSQL port on `localhost:5432` are forwarded to the container. It will also write its files to disk in your home directory, so subsequent instantiations will retain the data we’ve already created.
Now that you’ve got a database running, we need to add some tables and data to it. This will let us query the data and get a better understanding of the various layers of abstraction. Run the next command to start an interactive PostgreSQL prompt:
At the prompt, type in the password from the previous code block, `hunter12`. Now that you’re connected, copy and paste the following queries into the prompt and press enter.
You now have a populated database. You can type `quit` to disconnect from the `psql` client and get control of your terminal back. If you ever want to run raw SQL commands again, you can rerun that same `docker run` command.
Finally, you’ll also need to create a file named `connection.json` containing the following JSON structure. The Node applications later in this post will use it to connect to the database.
Layers of Abstraction
Before diving into too much code, let’s clarify a few different layers of abstraction. As with most things in computer science, adding layers of abstraction involves tradeoffs. With each added layer, we attempt to trade a decrease in performance for an increase in developer productivity (though the trade doesn’t always pay off).
Low Level: Database Driver
This is basically as low-level as you can get — short of manually generating TCP packets and delivering them to the database. A database driver is going to handle connecting to a database (and sometimes connection pooling). At this level you’re going to be writing raw SQL strings and delivering them to a database, and receiving a response from the database. In the Node.js ecosystem there are many libraries operating at this layer. Here are three popular libraries:
- mysql: MySQL (13k stars / 330k weekly downloads)
- pg: PostgreSQL (6k stars / 520k weekly downloads)
- sqlite3: SQLite (3k stars / 120k weekly downloads)
Each of these libraries essentially works the same way: take the database credentials, instantiate a new database instance, connect to the database, and send it queries in the form of a string and asynchronously handle the result.
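The part you write yourself at this layer is the SQL text and its parameter values. Below is a minimal sketch of that piece, runnable without a database. The `pg` driver’s `client.query()` accepts a config object with `text` and `values` properties and uses PostgreSQL’s `$1`, `$2` positional placeholders. The table and column names (`dish`, `ingredient`, `item`) are hypothetical and may not match the schema created earlier.

```javascript
// Builds a parameterized query for the pg driver. The table and column
// names (dish, ingredient, item) are hypothetical placeholders.
function ingredientsForDishQuery(dishName) {
  return {
    // $1 is a PostgreSQL positional placeholder; the driver sends the
    // value separately, so dishName is never spliced into the SQL text.
    text: [
      'SELECT item.name, ingredient.quantity, ingredient.unit',
      'FROM ingredient',
      'JOIN item ON item.id = ingredient.item_id',
      'JOIN dish ON dish.id = ingredient.dish_id',
      'WHERE dish.name = $1',
    ].join('\n'),
    values: [dishName],
  };
}

const query = ingredientsForDishQuery('Chicken Tikka Masala');
console.log(query.text);
console.log(query.values); // [ 'Chicken Tikka Masala' ]

// With a connected pg client (not created in this sketch), you would run:
//   const { rows } = await client.query(query);
```

Keeping the value out of the SQL string lets the driver treat it strictly as data.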
Here is a simple example using the `pg` module to get a list of ingredients required to cook Chicken Tikka Masala:
Middle Level: Query Builder
This is the intermediary level between the simpler Database Driver modules and a full-fledged ORM. The most notable module operating at this layer is Knex, which can generate queries for a few different SQL dialects. Knex depends on one of the aforementioned driver libraries, so you’ll need to install the particular ones you plan on using with it.
- knex: Query Builder (8k stars / 170k weekly downloads)
When creating a Knex instance, you provide the connection details along with the dialect you plan on using, and you can then start making queries. The queries you write will closely resemble the underlying SQL. One nicety is that you can generate dynamic queries programmatically. This is much more convenient than concatenating strings together to form SQL, an approach that often introduces security vulnerabilities such as SQL injection.
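The security problem with concatenation is easiest to see side by side. This is a minimal sketch using a hypothetical `things` table with a `color` column, comparing a concatenated query with a pg-style parameterized one:

```javascript
// Contrasts string concatenation with placeholders. The "things" table
// and "color" column are hypothetical.
function concatenatedQuery(color) {
  // UNSAFE: user input becomes part of the SQL text itself.
  return "SELECT * FROM things WHERE color = '" + color + "'";
}

function parameterizedQuery(color) {
  // Safer: the SQL text is fixed; the input travels separately as a value.
  return { text: 'SELECT * FROM things WHERE color = $1', values: [color] };
}

const malicious = "red' OR '1'='1";

console.log(concatenatedQuery(malicious));
// SELECT * FROM things WHERE color = 'red' OR '1'='1'
// The OR clause is always true, so every row would match.

console.log(parameterizedQuery(malicious));
```

With placeholders, the malicious string is compared against `color` as a literal value instead of being parsed as SQL.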
Here is a simple example using the `knex` module to get a list of ingredients required to cook Chicken Tikka Masala:
High Level: ORM
This is the highest level of abstraction we’re going to consider. When working with ORMs we typically need to do a lot more configuration ahead of time. The point of an ORM, as the name implies, is to map a record in a relational database to an object (typically, but not always, a class instance) in our application. What this means is that we’re defining the structure of these objects, as well as their relationships, in our application code.
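Stripped of everything else, the mapping step looks roughly like the sketch below. Flat rows, such as those a join between a dish table and an ingredient table might return, become class instances with a relationship between them. The row shape and the `Dish` and `Ingredient` classes are hypothetical. A real ORM like Sequelize also generates the SQL, validates data, and persists changes on top of this.

```javascript
// Hypothetical model classes standing in for what an ORM would define.
class Ingredient {
  constructor(name, quantity, unit) {
    this.name = name;
    this.quantity = quantity;
    this.unit = unit;
  }
}

class Dish {
  constructor(id, name) {
    this.id = id;
    this.name = name;
    this.ingredients = []; // the "has many" relationship
  }
}

// Groups flat joined rows into Dish instances, each holding its Ingredients.
function mapRowsToDishes(rows) {
  const dishes = new Map();
  for (const row of rows) {
    if (!dishes.has(row.dish_id)) {
      dishes.set(row.dish_id, new Dish(row.dish_id, row.dish_name));
    }
    dishes.get(row.dish_id).ingredients.push(
      new Ingredient(row.item_name, row.quantity, row.unit),
    );
  }
  return [...dishes.values()];
}

// Hypothetical rows, shaped like the result of a dish/ingredient join.
const rows = [
  { dish_id: 1, dish_name: 'Chicken Tikka Masala', item_name: 'Chicken', quantity: 1, unit: 'lb' },
  { dish_id: 1, dish_name: 'Chicken Tikka Masala', item_name: 'Yogurt', quantity: 1, unit: 'cup' },
];

const [dish] = mapRowsToDishes(rows);
console.log(dish instanceof Dish, dish.ingredients.length); // true 2
```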
- sequelize: (16k stars / 270k weekly downloads)
- bookshelf: Knex-based (5k stars / 23k weekly downloads)
- waterline: (5k stars / 20k weekly downloads)
- objection: Knex-based (3k stars / 20k weekly downloads)
In this example, we’re going to look at the most popular of the ORMs, Sequelize. We’re also going to model the relationships represented in our original PostgreSQL schema using Sequelize. Here is the same example using the Sequelize module to get a list of ingredients required to cook Chicken Tikka Masala:
Now that you’ve seen how to perform similar queries using the different abstraction layers, let’s dive into the reasons you should be wary of using an ORM.
Reason 1: You’re learning the wrong thing
A lot of people pick up an ORM because they don’t want to take the time to learn the underlying SQL (Structured Query Language). The belief is often that SQL is hard to learn and that, by learning an ORM, we can write our applications in a single language instead of two. At first glance, this seems to hold up: an ORM is written in the same language as the rest of the application, while SQL has a completely different syntax.
There is a problem with this line of thinking, however: ORMs are some of the most complex libraries you can get your hands on. The surface area of an ORM is very large, and learning it inside and out is no easy task.
Once you have learned a particular ORM, that knowledge likely won’t transfer well. This is true if you switch from one platform to another, such as from JS/Node.js to C#/.NET. Perhaps less obviously, it is also true if you switch from one ORM to another within the same platform, such as from Sequelize to Bookshelf in Node.js. Consider the following ORM examples, which each generate a list of all recipe items that are vegetarian:
The syntax for a simple read operation varies greatly between these examples. As the operation you’re trying to perform grows in complexity, such as operations involving multiple tables, the ORM syntax will vary between implementations even more.
There are dozens of ORMs for Node.js alone, and hundreds across all platforms. Learning all of those tools would be a nightmare!
Lucky for us, there are really only a few SQL dialects to worry about. By learning how to generate queries using raw SQL you can easily transfer this knowledge between different platforms.
Reason 2: Complex ORM Calls Can Be Inefficient
Recall that the purpose of an ORM is to take the underlying data stored in a database and map it into an object that we can interact with in our application. This often introduces inefficiencies when we use an ORM to fetch certain data.
Consider, for example, the queries we first looked at in the section on layers of abstraction. In that query, we simply wanted a list of ingredients and their quantities for a particular recipe. First we made the query by writing SQL by hand. Next, we made the query by using the Query Builder, Knex. Finally, we made a query by using the ORM, Sequelize. Let’s take a look at the queries which have been generated by those three commands:
Hand-written with “pg” Driver:
This first query is exactly the one we wrote by hand. It represents the most succinct method to get exactly the data we want.
When we prefix this query with `EXPLAIN` and send it to the PostgreSQL server, we get an estimated cost of 34.12.
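For reference, PostgreSQL’s `EXPLAIN` prints each plan node with an estimated start-up cost and total cost, written as `cost=start..total`, in the planner’s arbitrary cost units rather than milliseconds. A figure like 34.12 is most likely the total cost of the top plan node. Here is a small sketch that pulls both numbers out of one line of `EXPLAIN` text output. The `parseExplainCost` helper is hypothetical, and the sample line only illustrates the format and is not real output from this post’s queries.

```javascript
// Extracts the estimated start-up and total cost from one line of
// PostgreSQL EXPLAIN text output, e.g. "(cost=0.00..34.12 rows=... width=...)".
function parseExplainCost(line) {
  const match = /cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)/.exec(line);
  if (match === null) return null; // not a plan-node line
  return { startup: Number(match[1]), total: Number(match[2]) };
}

// Hypothetical top line of an EXPLAIN plan, used only to show the format.
const sample = 'Hash Join  (cost=12.50..34.12 rows=10 width=64)';
console.log(parseExplainCost(sample)); // { startup: 12.5, total: 34.12 }
```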
Generated with “knex” Query Builder:
This next query was mostly generated for us, but due to the explicit nature of the Knex Query Builder, we should have a pretty good expectation of what the output will look like.
Newlines have been added by me for readability. Other than some minor formatting and the unnecessary table names in my hand-written example, these queries are identical. In fact, once the `EXPLAIN` query is run, we get the same cost of 34.12.
Generated with “Sequelize” ORM:
Now let’s take a look at the query generated by an ORM:
Newlines have been added by me for readability. As you can tell, this query is a lot different from the previous two. Why is it behaving so differently? Because of the relationships we’ve defined, Sequelize is trying to get more information than we asked for. In particular, we’re getting information about the Dish itself when we really only care about the Ingredients belonging to that Dish. The cost of this query, according to `EXPLAIN`, is 42.32.
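In relative terms, using the two `EXPLAIN` estimates above, the ORM-generated query is estimated to be roughly a quarter more expensive than the hand-written and Knex versions:

```latex
\frac{42.32 - 34.12}{34.12} = \frac{8.20}{34.12} \approx 0.24
```

These are planner estimates rather than measured timings, but the extra cost comes entirely from fetching data that was never requested.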
Reason 3: An ORM Can’t Do Everything
Not all queries can be represented as an ORM operation. When we need to generate these queries, we have to fall back to writing the SQL by hand. This often means a codebase with heavy ORM usage will still have a few handwritten queries strewn about it. The implication is that, as developers working on one of these projects, we end up needing to know BOTH the ORM syntax and some of the underlying SQL syntax.
A common situation that doesn’t work well with ORMs is a query containing a subquery. Consider the situation where I know I have already purchased all the ingredients for Dish #2 in our database, but I still need to purchase whatever ingredients are needed for Dish #1. To get this list, I might run the following query:
To the best of my knowledge, this query cannot be cleanly represented using the aforementioned ORMs. To combat these situations it’s common for an ORM to offer the ability to inject raw SQL into the query interface.
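Conceptually, the subquery expresses a set difference: the items needed for Dish #1, minus the items already bought for Dish #2. In SQL, this is typically written with `NOT IN` or `NOT EXISTS` around a subquery. Below is the same logic sketched in memory to make the intent explicit. The `ingredientRows` data and the `itemsStillNeeded` helper are hypothetical.

```javascript
// Hypothetical ingredient rows linking dishes to the items they need.
const ingredientRows = [
  { dish_id: 1, item_id: 10 },
  { dish_id: 1, item_id: 11 },
  { dish_id: 1, item_id: 12 },
  { dish_id: 2, item_id: 11 },
  { dish_id: 2, item_id: 13 },
];

// Items needed for `needDish` that are not already covered by `haveDish`:
// the in-memory equivalent of "item_id NOT IN (subquery)".
function itemsStillNeeded(rows, needDish, haveDish) {
  const alreadyBought = new Set(
    rows.filter((r) => r.dish_id === haveDish).map((r) => r.item_id),
  );
  return rows
    .filter((r) => r.dish_id === needDish && !alreadyBought.has(r.item_id))
    .map((r) => r.item_id);
}

console.log(itemsStillNeeded(ingredientRows, 1, 2)); // [ 10, 12 ]
```

Doing this in application code means fetching rows only to discard them, which is why it belongs in the query itself.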
Sequelize offers a `.query()` method to execute raw SQL as if you were using the underlying database driver. With both the Bookshelf and Objection ORMs, you get access to the raw Knex object you provide during instantiation and can use it for its Query Builder powers. The Knex object also has a `.raw()` method to execute raw SQL. Sequelize additionally provides a `Sequelize.literal()` method, which can be used to intersperse raw SQL in various parts of a Sequelize ORM call. But in each of these situations, you still need to know some underlying SQL to generate certain queries.
Query Builders: The Sweet Spot
Using the low-level Database Driver modules is rather enticing. There is no overhead when generating a query for the database, since we write the query manually. The number of dependencies our project relies on is also minimized. However, generating dynamic queries can be very tedious, and in my opinion this is the biggest drawback of using a simple Database Driver.
Consider, for example, a web interface where a user can select criteria when they want to retrieve items. If there is only a single option that a user can input, such as color, our query might look like the following:
SELECT * FROM things WHERE color = ?;
This single query works nicely with the simple Database Driver. However, suppose the color is optional and there’s a second optional field called `is_heavy`. We now need to support a few different permutations of this query:
SELECT * FROM things; -- Neither
SELECT * FROM things WHERE color = ?; -- Color only
SELECT * FROM things WHERE is_heavy = ?; -- Is Heavy only
SELECT * FROM things WHERE color = ? AND is_heavy = ?; -- Both
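With a plain driver, supporting all four permutations usually means assembling the `WHERE` clause by hand while keeping the placeholder numbering and the values array in sync. Below is a minimal sketch using pg-style `$n` placeholders instead of the `?` shown above. The `buildThingsQuery` function and its filter object are hypothetical. Every new optional filter adds another branch, and that bookkeeping is the tedium a Query Builder takes off your hands. In Knex, the equivalent is conditionally chaining `.where()` calls onto the query.

```javascript
// Builds a SELECT for the hypothetical "things" table from optional filters,
// using pg-style positional placeholders ($1, $2, ...).
function buildThingsQuery(filters = {}) {
  const conditions = [];
  const values = [];

  if (filters.color !== undefined) {
    values.push(filters.color);
    conditions.push(`color = $${values.length}`);
  }
  if (filters.isHeavy !== undefined) {
    values.push(filters.isHeavy);
    conditions.push(`is_heavy = $${values.length}`);
  }

  let text = 'SELECT * FROM things';
  if (conditions.length > 0) {
    text += ' WHERE ' + conditions.join(' AND ');
  }
  return { text, values };
}

console.log(buildThingsQuery());                                // Neither
console.log(buildThingsQuery({ color: 'red' }));                // Color only
console.log(buildThingsQuery({ isHeavy: true }));               // Is Heavy only
console.log(buildThingsQuery({ color: 'red', isHeavy: true })); // Both
```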
However, due to the aforementioned reasons, a fully featured ORM isn’t the tool we want to reach for either.
Using a Query Builder is a fine solution as long as you fully understand the underlying SQL it is generating. Never use it as a tool to hide from what is happening at a lower layer; only use it as a matter of convenience, in situations where you know exactly what it’s doing. If you ever find yourself questioning what a generated query actually looks like, you can add a debug field to the `Knex()` instantiation call. Doing so looks like this:
In fact, most of the libraries mentioned in this post include some sort of method for debugging the calls being executed.
We’ve looked at three different layers of abstraction for database interactions: the low-level Database Drivers, Query Builders, and the high-level ORMs. We’ve also examined the tradeoffs of each layer and the SQL queries each one generates. These include the difficulty of generating dynamic queries with a Database Driver, the added complexity of ORMs, and finally the sweet spot of using a Query Builder.
Thank you for reading and be sure to take this into consideration when you build your next project.
Once you’re done following along you may run the following commands to completely remove the docker container and remove the database files from your computer: