

Never use Resque for serial jobs


A long time has passed since we last spoke about processing background jobs. Much has changed in the tooling for asynchronously processing long-running tasks in Ruby and Rails. Most recently we’ve favored Resque, especially now that Heroku’s Cedar stack supports it.

There’s one problem with Resque: enforcing strictly serial job semantics is impossible without custom development or limiting the number of workers.

So, NEVER use Resque for serial jobs, OR read on to find out how we resolved this dilemma with resque-lonely_job, a new Resque plugin.

First, let’s review the simple solution: one queue, one worker. Voila, you now have strictly serial job completion.
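In Resque terms, that means pointing exactly one worker at the queue with the standard `resque:work` rake task (assuming Resque’s rake tasks are loaded in your Rakefile; the queue name here is illustrative):

```shell
# Run a single worker against the one queue; as long as no other worker
# ever reads this queue, its jobs complete strictly in order.
QUEUE=serial_imports rake resque:work
```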

However, your awesome new serial job semantics reduce throughput. What happens when only some of the jobs in your queue depend upon serial completion? We found ourselves in just such a situation recently.

Here’s the situation. One of our clients would like to offload data importing to background tasks. Due to operational constraints outside of our control and the multi-tenancy requirement of the application, we place all import jobs in the same queue. With a default Resque job and multiple workers, there exists the possibility that jobs for a given account could be executed in parallel. Since there exists a hard dependency between jobs associated with the same account, we need to find a way to enforce serial job semantics.

To do this, we created resque-lonely_job. With Lonely_job, a worker’s before_perform hook attempts to grab a mutex lock. The interesting aspect of Lonely_job is that you may override which characteristic of the job is used as the mutex.

Since the redis_key method receives all the arguments of the perform method, you can use any level of granularity to distinguish which jobs must follow the serial ordering semantics.
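As a sketch of what that looks like (assuming the gem’s documented interface; the stub module below only exists so the snippet stands alone without the resque and resque-lonely_job gems installed):

```ruby
# Stub standing in for the real gem, for illustration only; in an app
# you would `require 'resque-lonely_job'` instead.
module Resque
  module Plugins
    module LonelyJob; end
  end
end unless defined?(Resque::Plugins::LonelyJob)

class ImportJob
  extend Resque::Plugins::LonelyJob
  @queue = :imports

  # redis_key receives the same arguments as perform, so keying the
  # mutex on account_id serializes jobs per account while still letting
  # jobs for different accounts run in parallel.
  def self.redis_key(account_id, *_args)
    "lonely_job:#{@queue}:#{account_id}"
  end

  def self.perform(account_id, import_id)
    # ... do the import work for this account ...
  end
end
```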

Now, if a worker fails to acquire the lock, it re-enqueues the job.

Uh oh, watch out! Here’s where you can get into trouble. Remember, we want strict serial ordering among jobs, and there’s a distinct possibility that jobs may be re-ordered. To examine whether a job may be performed, a worker removes the first job from the queue. A race condition can occur when two workers each pop a job, both of which are blocked because a third worker is already processing a job for the same mutex. When re-enqueueing the blocked jobs, the two workers may reverse their order.

Furthermore, with the default re-enqueueing behavior of placing popped jobs at the back of the queue, jobs that were originally behind the two blocked jobs may now run before them!

Yikes! This is unacceptable! Moreover, all this crazy complexity exists because Resque provides no before_dequeue hook. There is another way.

At this point, we have a set of arguments which are passed to the job’s perform method. We’ll subdivide the arguments into two sets. Subset A of the arguments is used as the mutex; let’s call it the account_id to give us something concrete to work with. Subset B is the actual payload that identifies the work to be performed. In Rails, this could be an ActiveRecord id, so let’s call it the import_id. The general approach is to create a custom enqueue method for our Job that divorces the account_id from the import_id. Check out the sample implementation.
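A minimal sketch of the idea follows. The in-memory hash stands in for a Redis list keyed on the account; a real implementation would RPUSH the import_id onto that list and call Resque.enqueue there instead:

```ruby
# Sketch of the custom-enqueue approach: payload order lives in a
# per-account list, not in the Resque queue itself.
class ImportJob
  @work = Hash.new { |h, k| h[k] = [] }

  class << self
    attr_reader :work
  end

  # Custom enqueue: record the payload (import_id) in the per-account
  # list, then enqueue a job carrying only the mutex key (account_id).
  def self.enqueue(account_id, import_id)
    work[account_id] << import_id
    # Resque.enqueue(ImportJob, account_id)  # real version
  end

  # perform pops the next import_id for the account itself, so the order
  # of jobs inside the Resque queue no longer matters.
  def self.perform(account_id)
    import_id = work[account_id].shift
    # ... run the import identified by import_id ...
    import_id
  end
end
```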

With this implementation, we no longer have to worry about the order of the jobs in the Resque queue as our “ImportJob:#{account_id}” contains the order of the work to be performed.

What other techniques or frameworks do you use to enforce serial job order semantics?
