Search is a hard problem. Thankfully, a lot of really smart people have spent a lot of time on it and come up with some awesome tools. Most of our projects involve some kind of search functionality, and often tuning search indexes on the database server will get us enough performance to launch the minimum viable product. But what if we have 350,000 things that need fulltext search with ordering and boosting, or 50,000 things with extremely complex access controls in a high-traffic environment that needs to search quickly and scale up to tens of millions of things? Here's a low-level look at some of the specific steps we've taken to make Rails scale. Can Rails scale? We disagree with the doubters. Onward!
Once we've exhausted the ability of the database layer, there are two paths to take. Even though people call me "old school" for it, I like to start with caching things in memcache. I'll add dalli to my project, set it up as the Rails.cache store, and get started. This begins as a pretty simple pattern. Here's an example of caching something expensive:
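A minimal sketch of that pattern, assuming a hypothetical `Story` model with a slow class-level computation:

```ruby
class Story
  # Stand-in for a slow query or calculation
  def self.expensive_stats
    { stories: 350_000 }
  end

  def self.cached_stats
    # Computed once on a cache miss, then served from memcache
    # until the entry expires
    Rails.cache.fetch("stories/stats", expires_in: 1.hour) do
      expensive_stats
    end
  end
end
```

On the first call the block runs and the result is written to the cache; subsequent calls within the hour are served straight from memcache.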
This works great for things that are expensive to calculate, but it has two problems: it doesn't speed up actual searching, and when the cache is cold the expensive calculation still happens inline, so we have to start doing crazy things like this:
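Stripped down, that workaround might look something like this sketch (the model, method names, and cache key are illustrative): on a miss we enqueue the computation as a background job and return an empty result, and a `force` flag bypasses the cache entirely.

```ruby
class Story
  # Hypothetical expensive computation
  def self.compute_related_ids(id)
    [id + 1, id + 2]
  end

  def self.cached_related_ids(id, force = false)
    return compute_related_ids(id) if force # skip the cache for tests/debugging

    cached = Rails.cache.read("stories/#{id}/related_ids")
    return cached if cached

    # Cold cache: enqueue the computation with delayed_job's `.delay`
    # instead of blocking this request, and serve an empty result for now
    delay.warm_related_ids(id)
    []
  end

  def self.warm_related_ids(id)
    Rails.cache.write("stories/#{id}/related_ids", compute_related_ids(id))
  end
end
```

The request that hits the cold cache gets an empty result back immediately; the background worker fills the cache so later requests get real data.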
`.delay` is a magic method provided by delayed_job which enqueues this expensive computation as a background job. All those `force` parameters are there to bypass the cache, which makes debugging and testing a lot easier, because this gets very crazy very fast when you have tens of different things you're caching, some of which include results from other cached values. Typically it makes sense for some cached values to be stored in the cache permanently, with code that updates them nightly or whenever their values change, since an empty cache is pretty useless when you're counting on it.
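For those permanent entries, one approach is a small refresh method that recomputes and overwrites the value, called from a nightly job and from whatever code changes the underlying data. A sketch with hypothetical names:

```ruby
class StatsCache
  KEY = "stories/stats".freeze

  # Called from a nightly job (cron/rake) and from model callbacks,
  # so readers never hit a cold cache
  def self.refresh
    Rails.cache.write(KEY, Story.expensive_stats)
  end

  def self.read
    Rails.cache.read(KEY)
  end
end
```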
Unfortunately, it's very hard to get caching right, and you will have page loads that take tens of seconds when things aren't cached at the moment you need them. Also, queuing up thousands of delayed jobs with `.delay` (because the cache is cold) means thousands of database inserts, which isn't quick either! Something like Resque helps a bit here, though it's a little more involved to get running than delayed_job. Perhaps most annoying, things can get deleted out from under you, so you have to handle the possibility that an id you are using may no longer exist. Here's one way of doing that:
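Here's a sketch of that guard, assuming we are resolving a list of cached ids back into records: load whatever still exists in one query and quietly drop the rest.

```ruby
def stories_for(cached_ids)
  by_id = Story.where(id: cached_ids).to_h { |s| [s.id, s] }
  # Preserve the cached ordering, dropping any id whose record
  # was deleted out from under us
  cached_ids.filter_map { |id| by_id[id] }
end
```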
So you've gotten to this point: caching isn't enough, and searching needs to happen and be fast. Enter Solr and sunspot (which our very own Andy Lindeman works on!). Adding these to your project is explained very well in the sunspot documentation, and if your app is deployed on Heroku like many of ours are, there's a reasonably priced websolr addon that Just Works with the sunspot gem. Adding it to a model is very straightforward:
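For instance, a hypothetical `Connection` model (the field names here are illustrative) might be indexed with sunspot's `searchable` DSL like so:

```ruby
class Connection < ActiveRecord::Base
  searchable do
    text :name, boost: 2.0 # fulltext fields, with name weighted higher
    text :description
    integer :owner_id      # attribute field we can filter/scope on
    time :created_at       # sortable timestamp
  end
end
```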
Searching is also pretty straightforward, but our usage is a little more complex than the basics: we want to make sure that every connection id returned both matches the search parameters and falls within a potentially huge set of IDs that we are allowed to access.
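In sunspot's search DSL, that constraint might be sketched like this, where `allowed_ids` is assumed to come from the access-control layer:

```ruby
def restricted_search(query, allowed_ids)
  Connection.search do
    fulltext query          # match the user's search terms...
    with(:id, allowed_ids)  # ...but only within ids we may access
    paginate page: 1, per_page: 30
  end
end
```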
Solr searching is very fast, especially compared to querying bigger datasets in Postgres, and using its `search_ids` method (instead of `search`) avoids loading all of the matching objects, so we can rely on the controller/view pattern above to load only the objects we are actually displaying (pagination of the resulting array happens in the real use case). This, combined with Rails 3.1's Identity Map feature or something similar, means we can serve pages and pages of search results very quickly (200ms or so), doing fulltext matching and displaying information from within the objects, all without having to hit the database.
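That id-only flow might be sketched like this (names and the page size are illustrative): fetch matching ids from Solr, then instantiate only the ones being displayed.

```ruby
def restricted_search_ids(query, allowed_ids, per_page = 30)
  ids = Connection.search_ids do
    fulltext query
    with(:id, allowed_ids)
  end
  # Only the ids on the current page are turned into real objects
  Connection.where(id: ids.first(per_page))
end
```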
Everything Just Works on Heroku, and for running things locally, there's a Procfile and foreman for that:
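A Procfile along these lines (process names and the Solr port are illustrative) lets foreman bring up the web server, a delayed_job worker, and a local Solr together:

```
web: bundle exec rails server -p $PORT
worker: bundle exec rake jobs:work
solr: bundle exec sunspot-solr run -p 8982
```

`foreman start` then runs all three locally.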
What's your secret to scaling up?