Advanced SQL in Rails - Part 1
The more I work with Rails apps, the more I love ActiveRecord. It’s a really elegant abstraction over your data layer, and lets you focus on business logic instead of crafting SQL statements. For the majority of use cases, this works great. But as apps grow in both database size and complexity, we can start to see some compelling reasons to get “closer to the metal” and work more directly with our database.
It’s no secret that databases are fast. For complex aggregate functions that involve processing data from thousands or hundreds of thousands of rows, databases can easily outperform any implementation in Ruby. ActiveRecord gives us some power here, too (shout-out to .group!). But what if we wanted to go further?
In part 1 of this post, I’m going to cover two powerful features common to most relational databases today: window functions and views. In part 2, I’ll discuss how you can leverage their power from right within Rails. I’m using Postgres, but the examples I show should work in your RDBMS of choice (with a few tweaks to syntax here and there).
Your average (ha!) aggregate function returns just that: an aggregated result. As a simple example, let’s say I wanted to get the balance of a bank account by summing all of the transactions:
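As a minimal sketch, here is that kind of query in Postgres. The transactions table, its id, date and amount columns, and the sample rows are all assumptions made up for illustration; the table is taken to hold only the debit account's entries.

```sql
-- Hypothetical sample data: table name, columns and rows are assumptions.
WITH transactions (id, date, amount) AS (
  VALUES
    (1, DATE '2016-08-01', 100.00),
    (2, DATE '2016-08-02', -25.50),
    (3, DATE '2016-08-02', -10.00),
    (4, DATE '2016-08-03',  40.00)
)
-- Collapse every row into one aggregated value.
SELECT SUM(amount) AS balance
FROM transactions;
-- balance: 104.50
```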
With the sum() function, we get back a single result. Now, what if we wanted to return all the records for the ‘debit’ account, with a running total? For instance, we might want to construct a view that looks like this:
This is where window functions come in. They allow you to compute aggregate functions for each individual row using a ‘window’ into the query that can slice the data up in different ways. In this example, for any given row, we can ask the database to compute the value for ‘balance’ by taking the results from our original query and drawing a ‘window’ around a subset of the rows, then sum the result. Here’s what that query would look like:
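A sketch of such a query, again over made-up sample rows (the table and column names are assumptions, not a real schema):

```sql
-- Hypothetical sample data: table name, columns and rows are assumptions.
WITH transactions (id, date, amount) AS (
  VALUES
    (1, DATE '2016-08-01', 100.00),
    (2, DATE '2016-08-02', -25.50),
    (3, DATE '2016-08-02', -10.00),
    (4, DATE '2016-08-03',  40.00)
)
SELECT
  id,
  date,
  amount,
  -- Running total: sum of amount over all rows up to and including this one.
  SUM(amount) OVER (ORDER BY date, id) AS balance
FROM transactions
ORDER BY date, id;
-- id | date       | amount | balance
--  1 | 2016-08-01 | 100.00 |  100.00
--  2 | 2016-08-02 | -25.50 |   74.50
--  3 | 2016-08-02 | -10.00 |   64.50
--  4 | 2016-08-03 |  40.00 |  104.50
```

Including id in the ORDER BY keeps the ordering unambiguous when two transactions share a date, so each row gets its own running total.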
We can construct a window function using OVER. Everything between the parentheses defines how the window will be dynamically constructed for each row. Here, we say that we want to create a column called ‘balance’ which will contain the sum of the amount column, but we want to calculate it by considering only the rows up to and including the current row, as sorted by date and ID.
Whew! This is where an animation might come in handy:
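Without a picture, one way to see what the window is doing is to write the same calculation out by hand. The sketch below uses made-up sample rows (the table and column names are assumptions) and computes the balance twice: once with the window's frame spelled out explicitly, and once with a correlated subquery that sums every row up to and including the current one. Because id makes the ordering unique, both columns should agree.

```sql
-- Hypothetical sample data: table name, columns and rows are assumptions.
WITH transactions (id, date, amount) AS (
  VALUES
    (1, DATE '2016-08-01', 100.00),
    (2, DATE '2016-08-02', -25.50),
    (3, DATE '2016-08-02', -10.00),
    (4, DATE '2016-08-03',  40.00)
)
SELECT
  t.id,
  t.date,
  t.amount,
  -- The window, with its frame written out in full.
  SUM(t.amount) OVER (
    ORDER BY t.date, t.id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS balance,
  -- The same idea by hand: sum every row that sorts at or before this one.
  (SELECT SUM(earlier.amount)
     FROM transactions AS earlier
    WHERE (earlier.date, earlier.id) <= (t.date, t.id)) AS balance_by_hand
FROM transactions AS t
ORDER BY t.date, t.id;
```

The subquery version rescans the table for every row, which is part of why the window function is the better fit once the table gets large.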
In addition to just sorting the result set in different ways, we can also compute values by partitioning the result set, essentially grouping each row into different ‘buckets’ before calculation. For instance, what if our transactions table had an ‘account’ column, and we wanted to display a table containing the transactions from every account, with a running balance for each account:
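Here is a sketch of what that could look like. The account column, the account names and the sample rows are assumptions for illustration:

```sql
-- Hypothetical sample data: table name, columns and rows are assumptions.
WITH transactions (id, account, date, amount) AS (
  VALUES
    (1, 'debit',   DATE '2016-08-01', 100.00),
    (2, 'savings', DATE '2016-08-01', 500.00),
    (3, 'debit',   DATE '2016-08-02', -25.50),
    (4, 'savings', DATE '2016-08-03', -50.00),
    (5, 'debit',   DATE '2016-08-03',  40.00)
)
SELECT
  account,
  id,
  date,
  amount,
  -- The running total restarts for each account.
  SUM(amount) OVER (PARTITION BY account ORDER BY date, id) AS balance
FROM transactions
ORDER BY account, date, id;
-- account | id | date       | amount | balance
-- debit   |  1 | 2016-08-01 | 100.00 |  100.00
-- debit   |  3 | 2016-08-02 | -25.50 |   74.50
-- debit   |  5 | 2016-08-03 |  40.00 |  114.50
-- savings |  2 | 2016-08-01 | 500.00 |  500.00
-- savings |  4 | 2016-08-03 | -50.00 |  450.00
```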
We’ve kept our ORDER BY clause, but we’ve added PARTITION BY. Here’s what’s actually happening:
Both of these examples could be done at your ActiveRecord layer, maybe with clever usage of scopes and virtual attributes. But as your datasets grow, these kinds of calculations become prohibitively expensive to do in-memory. Offloading processing like this to the database becomes an attractive option.
In our examples above, the ‘balance’ column has always been a virtual column. We don’t store it in the database; we compute it on-the-fly every time we run the query. We could add a column for it in our table, but then we’d need to ensure that it gets calculated correctly every time a record in the table is created, updated or deleted. Plus, since we know we’re going to need these queries a lot for displaying to the user, it would be nice if we could store it for easy access.
Database views allow you to do just that: store a query, and access it as if it were a table. You get the benefit of a common interface through which to access your data, without worrying about the complexities of persisting dynamic values.
Creating a view is pretty easy in Postgres. Let’s create one with our example query above:
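A minimal sketch, assuming a simple transactions table. The schema here is hypothetical; only the view name, debit_account_activity, comes from this post.

```sql
-- Hypothetical schema: the table and its columns are assumptions.
CREATE TABLE transactions (
  id     serial PRIMARY KEY,
  date   date NOT NULL,
  amount numeric(10, 2) NOT NULL
);

-- Store the running-balance query under a name.
CREATE VIEW debit_account_activity AS
SELECT
  id,
  date,
  amount,
  SUM(amount) OVER (ORDER BY date, id) AS balance
FROM transactions;
```

The view definition has no outer ORDER BY, so queries against it should ask for the row order they want.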
Next time we want to access that data, we can just query the view as if it were a table instead of rewriting the entire query:
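For example, assuming the debit_account_activity view already exists with date, amount and balance columns (the column names are assumptions):

```sql
-- Assumes the debit_account_activity view has already been created.
SELECT *
FROM debit_account_activity
ORDER BY date, id;

-- Filtering works like it would on a table. The balance still accounts for
-- earlier rows, because the window is evaluated inside the view before the
-- outer WHERE is applied.
SELECT date, amount, balance
FROM debit_account_activity
WHERE date >= DATE '2016-08-02'
ORDER BY date, id;
```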
Note that using a view in this way is analogous to creating a method; when you call SELECT * FROM debit_account_activity, Postgres will simply run the query you gave it when you created the view.
Many databases, including Postgres, have another type of view, called a materialized view. Materialized views actually persist the results of the query as if they were a table. Because of this, however, they need to be refreshed whenever the underlying data changes, so they’re best for scenarios where real-time data is not a priority.
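In Postgres, a sketch of that might look like the following. The name debit_account_activity_snapshot is made up for illustration, and the example assumes the debit_account_activity view already exists.

```sql
-- Hypothetical name; assumes the debit_account_activity view exists.
-- The query runs once here and its results are stored.
CREATE MATERIALIZED VIEW debit_account_activity_snapshot AS
SELECT *
FROM debit_account_activity;

-- Reads come from the stored results, not from a fresh run of the query.
SELECT *
FROM debit_account_activity_snapshot
ORDER BY date, id;

-- After the underlying transactions change, recompute the stored results.
REFRESH MATERIALIZED VIEW debit_account_activity_snapshot;
```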
So far, we’ve seen how we can use views and window functions to construct efficient queries in SQL, avoiding many of the common pitfalls encountered when pulling together data for a view. In Part 2 of this post, I’ll discuss how we can use this to our advantage within Rails applications. Stay tuned!
Update 2016/08/12: Advanced SQL in Rails - Part 2 is up on the blog. Head over and check it out!