Route all the things

20 December, 2017

Larry Garfield

Director of Developer Experience

Platform.sh has always placed great value on customer flexibility. That flexibility can seem bewildering at times, however, as users have an ample supply of buttons and dials to customize their application architecture, hosting, and workflow. In particular, the routing system built into every project offers an enormous amount of flexibility (single application, multi-application, microservice, arbitrary redirects, etc.), but how exactly does it work?

Since we've just added new functionality for users hosting many domains, let's use this opportunity to take a deep dive into the routing system.

Short version: You can now specify an {all} placeholder in .platform/routes.yaml that maps all incoming domains to the same application, making multi-domain applications much easier to manage.

Long version:

How Platform.sh does routing

Every project on Platform.sh has a master environment for production, but it can also have many other environments for testing and development, potentially one for each Git branch. Each branch/environment has its own copy of the application, its own database server, and its own data, all on a series of lightweight containers.

Each environment also has its own "router", which is also a container. All requests to any application in your project go through this router.

But for a given incoming request, how does our infrastructure know which project and router container it should go to?

That's where routing comes in, which is controlled from the routes.yaml file. There are actually two layers of routing involved, one "edge router" and one "environment router".

The edge router maps requests arriving in a region to the right environment router.
The environment router maps incoming requests to the right application container.

Both routers are updated every time you deploy to any branch, based on the information in the routes.yaml file and the domains configured in the project. Those get merged to produce essentially giant lookup tables in each router. (It's a bit more complex than that, but close enough for a blog.)

Suppose we look at the industry standard website, https://example.com/. Its routes.yaml file would most likely contain 2 standard entries:

https://{default}/:
  type: upstream
  upstream: app:http
https://www.{default}/:
  type: redirect
  to: https://{default}/

The project is also configured with a single domain, example.com. That means at deploy time on the master branch the above entries effectively turn into:

https://example.com/:
  type: upstream
  upstream: app:http
https://www.example.com/:
  type: redirect
  to: https://example.com/
http://example.com/:
  type: redirect
  to: https://example.com/
http://www.example.com/:
  type: redirect
  to: https://example.com/

That's right, four entries. By default, Platform.sh automatically creates a redirect from any HTTP route to its HTTPS equivalent. We're just doing our part to make the web more secure for everyone.
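The expansion above can be sketched in a few lines of Python. This is only an illustration of the lookup-table idea, not Platform.sh's actual deploy code; the function name expand_routes and the dictionary layout are assumptions made for this example. As in the listing above, the HTTP twin of a redirect route points straight at that route's target.

```python
def expand_routes(routes, default_domain):
    """Replace {default} and add an HTTP->HTTPS redirect for each HTTPS route.

    Hypothetical helper that mirrors the expansion described in the post.
    """
    expanded = {}
    for pattern, config in routes.items():
        url = pattern.replace("{default}", default_domain)
        resolved = dict(config)
        if "to" in resolved:
            resolved["to"] = resolved["to"].replace("{default}", default_domain)
        expanded[url] = resolved

    for url, config in list(expanded.items()):
        if not url.startswith("https://"):
            continue
        http_url = "http://" + url[len("https://"):]
        # A redirect route's HTTP twin goes straight to the final target;
        # an upstream route's HTTP twin goes to the HTTPS URL itself.
        target = config["to"] if config["type"] == "redirect" else url
        # Do not overwrite an HTTP route that was configured explicitly.
        expanded.setdefault(http_url, {"type": "redirect", "to": target})
    return expanded


routes = {
    "https://{default}/": {"type": "upstream", "upstream": "app:http"},
    "https://www.{default}/": {"type": "redirect", "to": "https://{default}/"},
}
table = expand_routes(routes, "example.com")
for url, rule in table.items():
    print(url, rule)
```

Running it on the two-entry routes.yaml from above yields the same four entries shown in the expanded listing.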

Two things happen with that list. First, the router container for that specific environment is created and the above list of domain instructions is turned into a configuration file for it. Three of the rules are just redirects, while the first entry tells the router to proxy the incoming request to the application container named app in the same environment.

Second, there are two domains in the list: www.example.com and example.com. It doesn't matter what their configuration is; an entry is added to the edge router's lookup table saying that both of those domains should be forwarded to the router container for that environment. Our edge router is a custom, high-throughput proxy server that focuses on just one thing: proxying incoming requests to the correct router container.

When a request comes in for example.com (like the request you send to view this page), the edge router looks up "example.com" in its table and finds that the request should go to a particular router container. It then proxies the request to that router. The router sees the request and that it should forward it to the application container with which it was deployed. The request then gets proxied to the application container where an Nginx/PHP-FPM setup is waiting for it, which sends back a response. (You could just as easily be using Ruby or Python or Go on the app container; everything else is exactly the same.)
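The two lookups in that request flow can be modelled roughly as follows. This is a simplified sketch, not how the real routers are implemented: the table contents, the router name router-master, and the route function are all made up for illustration, and the sketch matches on scheme and host only, ignoring paths.

```python
from urllib.parse import urlsplit

# Edge router: domain -> environment router (hypothetical router name).
EDGE_TABLE = {
    "example.com": "router-master",
    "www.example.com": "router-master",
}

# Environment routers: route -> instruction.
ROUTER_TABLES = {
    "router-master": {
        "https://example.com/": {"type": "upstream", "upstream": "app:http"},
        "https://www.example.com/": {"type": "redirect", "to": "https://example.com/"},
        "http://example.com/": {"type": "redirect", "to": "https://example.com/"},
        "http://www.example.com/": {"type": "redirect", "to": "https://example.com/"},
    },
}


def route(url):
    """Return (action, target) for a request URL."""
    parts = urlsplit(url)
    # Layer 1: the edge router only cares about the domain.
    router = EDGE_TABLE.get(parts.hostname)
    if router is None:
        return ("no-route", None)
    # Layer 2: the environment router decides what to do with the request.
    rule = ROUTER_TABLES[router].get(f"{parts.scheme}://{parts.hostname}/")
    if rule is None:
        return ("no-route", None)
    if rule["type"] == "redirect":
        return ("redirect", rule["to"])
    return ("proxy", rule["upstream"])


print(route("https://example.com/some/page"))
print(route("http://www.example.com/"))
```

The first call ends up proxied to the app container, while the second is answered by the router itself with a redirect.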

Strictly speaking, that Nginx process on the application container is a third layer of "routing", and the one that offers end-users the most configuration. Depending on the .platform.app.yaml file, it could serve static files from disk, hand the request off to PHP-FPM or a Node.js process, add additional headers to the response, and many other things.

Of the three, only the router container does any caching, assuming its configuration and the HTTP headers of the response tell it to. (More on that another time.)

Multiple environments

This multi-layer routing system offers an incredible amount of flexibility at surprisingly little overhead. In particular, it makes it possible for us to create an effectively infinite number of additional environments for your Git branches. Suppose we have an update branch where we are testing an update to one of the Drupal modules that powers this site. We obviously need a new domain for that environment, since example.com is already used by the production site. Our system generates that new domain dynamically based on the branch name and project. That's where the common "gibberish domain" for dev branches comes from: $branch_id-$project_id.$region.

On a dev branch, then, the exact same process happens as in production. The only difference is that instead of looking at the configured domain for the site (example.com), we use the generated domain. Otherwise the process is identical.

Multiple applications

Platform.sh also supports multiple related applications in the same environment. That could be a front-end application and a backend API. It could be a microservices setup. It could be a dynamic website and a queue worker in another language. It could be a static website with one directory served by a dynamic blog application. (Yep, you can route subdirectories to different applications in routes.yaml. Neat, huh?) Or whatever else works for your use case.

Let's pretend for a moment that we wanted to redesign our website to be a static main site with a WordPress blog at blog.example.com. We would modify our routes.yaml file like so:

https://{default}/:
  type: upstream
  upstream: site:http
https://www.{default}/:
  type: redirect
  to: https://{default}/
https://blog.{default}/:
  type: upstream
  upstream: blog:http

There are now two applications defined: site has whatever static site generator builds the main site, and blog is the WordPress blog. Each lives in its own directory with its own .platform.app.yaml file.

Now on deploy, the edge router gets three entries (example.com, www.example.com, blog.example.com), all pointing at the router container for the environment. That router now has six entries:

Three HTTP->HTTPS redirects
One www.example.com->example.com redirect
One example.com->site container proxy
One blog.example.com->blog container proxy

We could of course have put the blog at example.com/blog instead, and had one fewer domain registered in the edge router.

It's also possible to add wildcard subdomains. Say we want all subdomains of example.com to be handled by WordPress, not just blog. Then we'd need only do:

https://*.{default}/:
  type: upstream
  upstream: blog:http

Both the edge router and the router container will now get a wildcard configuration and everything will get proxied as you'd expect. The one caveat is that Let's Encrypt doesn't support wildcard certificates just yet, so for the moment you'll need to bring your own wildcard SSL certificate. That's expected to change in January, though, and we'll start supporting automatic Let's Encrypt wildcard certificates as soon as they tell us we can.
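To picture how a wildcard entry might sit alongside exact entries in a lookup table, here is a small sketch. The matching rules are assumptions for the sake of the example (exact entries win, and a wildcard also covers deeper subdomains); the post does not spell out the real precedence rules, and lookup_host is a hypothetical helper.

```python
def lookup_host(table, host):
    """Find the entry for a host, falling back to wildcard entries.

    Assumed rules: an exact entry beats a wildcard, and "*.example.com"
    matches any depth of subdomain but not example.com itself.
    """
    if host in table:
        return table[host]
    labels = host.split(".")
    # Strip one leading label at a time: shop.example.com -> *.example.com
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in table:
            return table[wildcard]
    return None


table = {
    "example.com": "site",
    "*.example.com": "blog",
}
print(lookup_host(table, "example.com"))
print(lookup_host(table, "shop.example.com"))
```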

Multiple domains

Now we get to the fun part. What happens if you have multiple apex domains, not just subdomains? Suppose we want to have exampleblog.com instead of blog.example.com. What then?

Platform.sh allows you to associate any number of domains with a project, but only one gets associated with the special {default} placeholder. Fortunately, the others can still be used in routes.yaml literally. To wit:

https://{default}/:
  type: upstream
  upstream: site:http
https://www.{default}/:
  type: redirect
  to: https://{default}/
https://exampleblog.com/:
  type: upstream
  upstream: blog:http

In production, this works exactly as you'd expect. It does the exact same thing as the blog.example.com version, except with a different domain name. It's on a development branch that it becomes interesting. What happens to exampleblog.com?

Simple: We toss that domain name into the mix to produce the generated domain name. Strictly speaking we always do, but if the domain is just {default} you don't see it. Any static part of the domain in routes.yaml is simply prepended to the domain name we generate, so in the "update" branch we end up with exampleblog.com.update-abc123-cdhuk7d6hhcsg.us.platform.sh. In fact, our previous blog.example.com example would have a dev branch of blog.update-abc123-cdhuk7d6hhcsg.us.platform.sh.
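The prepending rule is easy to express in code. The sketch below assumes the generated domain for the branch is already known and only shows how the static part of a route's host is attached to it; dev_domain is a hypothetical helper, not part of any Platform.sh tooling.

```python
def dev_domain(route_host, generated):
    """Map a host from routes.yaml to its development-branch host."""
    if route_host == "{default}":
        # Nothing static to prepend, so the generated domain is used as-is.
        return generated
    if route_host.endswith(".{default}"):
        # Keep only the static prefix, e.g. "blog" from "blog.{default}".
        prefix = route_host[: -len(".{default}")]
        return f"{prefix}.{generated}"
    # A literal domain is prepended in full.
    return f"{route_host}.{generated}"


generated = "update-abc123-cdhuk7d6hhcsg.us.platform.sh"
for host in ("{default}", "blog.{default}", "exampleblog.com"):
    print(host, "->", dev_domain(host, generated))
```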

(At this point those of you who have been with us for a while are probably wondering where the ---s went. As of December 2017, we are using periods rather than dashes to separate the generated parts of the domain, which makes the domains more compatible with Let's Encrypt. Existing projects were left unchanged to avoid breaking any links or DNS records you may already have. If you want to switch your project to using dots, though, just file a ticket through your project interface and we'll flip the switch for you.)

Route all the domains!

The final use case to consider is a single container hosting multiple apex domains. A common example of this setup for our clients would be with the Drupal Domain Access module. Domain Access lets a single Drupal site serve multiple domains, with subsets of the same content available depending on which domain is being viewed. That is, one application would serve both (for example) example.com and example.net. How do we make the router handle that?

It's now super-simple. First you'd configure both domains in your project. It doesn't matter which is the default. Then, you'd put the following in your routes.yaml:

https://{all}/:
  type: upstream
  upstream: app:http
https://www.{all}/:
  type: redirect
  to: https://{all}/

The {all} placeholder iterates over all configured domains and makes an entry for each. On production, therefore, the above configuration would expand out to:

https://example.com/:
  type: upstream
  upstream: app:http
https://www.example.com/:
  type: redirect
  to: https://example.com/
https://example.net/:
  type: upstream
  upstream: app:http
https://www.example.net/:
  type: redirect
  to: https://example.net/

(Plus an HTTP->HTTPS redirect for each that I've omitted for space.) That will result in four entries (one for each domain) being added to the edge router, all pointing at the router container, and the router container will get eight entries in its lookup table (the four above plus the HTTP->HTTPS redirect for each). If we add a third domain, another two effective routes will get added to the list.
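The {all} expansion amounts to a loop over the configured domains. The following sketch illustrates that idea only; expand_all is a hypothetical name, and it assumes every value in a route's configuration is a plain string, as in the examples in this post.

```python
def expand_all(routes, domains):
    """Produce one concrete route per configured domain for {all} patterns."""
    expanded = {}
    for pattern, config in routes.items():
        if "{all}" not in pattern:
            # Routes without the placeholder pass through untouched.
            expanded[pattern] = dict(config)
            continue
        for domain in domains:
            # Substitute the domain in both the route and its settings.
            resolved = {key: value.replace("{all}", domain)
                        for key, value in config.items()}
            expanded[pattern.replace("{all}", domain)] = resolved
    return expanded


routes = {
    "https://{all}/": {"type": "upstream", "upstream": "app:http"},
    "https://www.{all}/": {"type": "redirect", "to": "https://{all}/"},
}
table = expand_all(routes, ["example.com", "example.net"])
for url, rule in table.items():
    print(url, rule)
```

With two domains this produces the four HTTPS routes listed above; adding a third domain to the list adds two more.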

What about development environments? The same pattern applies. The update branch would effectively turn into:

example.com.update-abc123-cdhuk7d6hhcsg.us.platform.sh
www.example.com.update-abc123-cdhuk7d6hhcsg.us.platform.sh
example.net.update-abc123-cdhuk7d6hhcsg.us.platform.sh
www.example.net.update-abc123-cdhuk7d6hhcsg.us.platform.sh

And then of course you can mix and match all of the above functionality as desired.

Even more flexibility

Each of those routes, of course, can also carry a variety of other configuration. Path-specific redirects, caching, TLS configuration, and so on can be configured per-route. As mentioned, you can even set routes for specific paths that will have their own configuration; you can use that to set different cache rules for a mostly-static site vs. a blog vs. a WebSocket path, for instance. Or create redirects from old, legacy URLs to current ones, right in the router. Or... various other things. Our documentation has more details on the many other options available.

So that's routing on Platform.sh. Your site is under your control, just as it should be. We just make it all work for you.