
When your job is too tough for cron

Larry Garfield

Cron jobs are good. Cron jobs are nice. But sometimes, a cron job just isn't enough. Running a task now and again is fine for some jobs, but what if you have a really big task? A task that needs the muscle of a long-running process?

For the big tasks, a cron job just isn't going to cut it.

What you need is a dedicated worker instance, and that's exactly what Platform.sh now provides: worker application instances.

What's wrong with Cron?

Cron tasks are fine for what they're good for: relatively cheap, single tasks that run at a specific time or times. Platform.sh has supported custom cron jobs since the stone age. (That's about 3 years in Internet time.) They have a number of limitations, though:

  • Cron tasks on Platform.sh run on the same container as your application instance, which means they're competing for resources.
  • A Cron task succeeds or fails entirely, making it a poor fit for a task that can be worked on incrementally.
  • We limit cron jobs to running no more often than every 5 minutes, which means a task that needs to be done "now, but not in the web request" may happen as long as 5 minutes later.
  • A running cron task blocks a new code deploy.
  • If a cron task is still running when it's next triggered, the two runs may step on each other's toes and confuse the application state unless you're very, very careful about how the task is written.

As a general rule of thumb, cron jobs should be used when something needs to happen at a specific time: for example, transferring a CSV export every day, just after midnight.
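For instance, a nightly export like that could be declared as a cron task in .platform.app.yaml; the task name and script below are hypothetical, but the spec/cmd keys follow Platform.sh's cron configuration:

```yaml
crons:
  csv-export:
    # Run at five past midnight, every day.
    spec: '5 0 * * *'
    cmd: 'php export-csv.php'
```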

What’s a worker?

Workers are more general and flexible than cron jobs. They typically work iteratively through fine-grained tasks, like a queue, and because they're a persistent process they can pick up a task immediately once it's enqueued.

On Platform.sh a worker is a parallel instance of your application that doesn’t respond to web requests. Instead, it runs a different, persistent background process. It’s the exact same code, just running a process other than listening to incoming web requests.

This makes it incredibly easy to first implement your application in a traditional, synchronous manner, then with trivial changes make the heavy lifting lazy and asynchronous. This is useful for bulk processing; for handling large queues; for long-running tasks; or anything else that needs to be done "as quickly as possible, but don't block the web request for it."
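As a sketch of that pattern (the function names and the in-memory queue here are purely illustrative; a real deployment would use Redis, RabbitMQ, or similar):

```python
import queue

task_queue = queue.Queue()  # stand-in for a real queue backend


def send_welcome_email(address):
    """The slow, heavy operation (hypothetical)."""
    return f"emailed {address}"


# Synchronous version: the web request waits for the slow work.
def signup_sync(address):
    return send_welcome_email(address)


# Asynchronous version: the web request only enqueues the task and
# returns immediately; the worker (running the same code) does the
# heavy lifting later.
def signup_async(address):
    task_queue.put(address)
    return "ok"


def worker_step():
    """One iteration of the worker's loop: take a task, do the work."""
    return send_welcome_email(task_queue.get())
```

The trivial change is swapping signup_sync for signup_async in the web code path; everything else stays the same.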

(If what you need is a totally separate application, possibly written in a different language, check out our multi-application support.)

It's also possible to mix-and-match. For example, a weekly mass-mailing could be triggered by a cron task that runs once a week and enqueues a long list of emails to contact. A worker would then immediately begin churning through that queue and sending out emails one by one. (Sending email is far more time-consuming than just enqueuing the tasks to be done.) That's far more robust, as well as potentially faster; you can even create multiple identical workers to work through the queue even faster.
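In .platform.app.yaml terms, that mix-and-match setup might look something like this (the task names and scripts are hypothetical):

```yaml
# A weekly cron enqueues the work; a persistent worker drains the queue.
crons:
  weekly-mailing:
    spec: '0 3 * * 1'
    cmd: 'php enqueue-mailing.php'

workers:
  mailer:
    commands:
      start: |
          php worker.php
```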

Using Platform.sh workers can make your web application faster and more responsive while simultaneously reducing the latency of your background jobs and making the whole system more robust and easier to manage.

Great, so how's it work?

Setting up a worker is quite simple. For the most basic case, just add something like the following to your .platform.app.yaml file:


workers:
  queue:
    commands:
      start: |
          php worker.php

That will cause your application to be deployed twice, in two separate containers: one to handle web requests (exactly as it does now), and a second container named "queue" with the exact same code that will run worker.php instead of a web server. It doesn't have to be a PHP script, of course. It can be any command your application container can run.
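For illustration, here's what a minimal worker process could look like in Python; the in-memory list and polling loop are stand-ins, since a real worker would block on a Redis list, a RabbitMQ channel, or similar rather than poll:

```python
import time


def process(task):
    """Stand-in for the real work, e.g. sending one email."""
    return f"sent:{task}"


def run_worker(tasks, poll_interval=1.0, max_idle_polls=None):
    """Pop tasks one at a time; sleep briefly whenever the queue is empty.

    max_idle_polls exists only so the sketch can terminate; a real
    worker runs forever and lets the platform restart it on deploy.
    """
    results = []
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        if tasks:
            idle = 0
            results.append(process(tasks.pop(0)))
        else:
            idle += 1
            time.sleep(poll_interval)
    return results
```

Because a worker fails one task at a time rather than all-or-nothing, a crash mid-queue loses at most the task in flight, not the whole batch.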

For example, a Drupal site can use the Drush Waiting Queue module to run its queue as a real queue, rather than piggy-backing on cron. A Ruby on Rails application can use sidekiq with a persistent Redis instance to churn through a long queue of background tasks. Symfony or Laravel developers can do the same with the PHP port of Resque, while Python people can opt for Celery.

Workers can also be backed by a dedicated queue server, such as RabbitMQ, that offers more fine-grained message-queue functionality. It's your app, so pick the approach that works best for you.

Workers are, of course, much more flexible than the few lines shown above; see the documentation for the full low-down on how to customize a worker instance to your needs, and even spin up multiple workers for different tasks.
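For example, a single .platform.app.yaml can declare several workers, each running its own command (the names and scripts below are hypothetical):

```yaml
workers:
  queue:
    commands:
      start: |
          php worker.php
  mails:
    commands:
      start: |
          php mail-worker.php
```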

For the tough jobs, bring a real worker process to the task.