PHPun with FFI: C PHP run

Larry Garfield
Director of Developer Experience
26 Feb 2020

In our last exciting episode, we covered the basics of compiling C code, both as a stand-alone executable and as a shared library. Today we’re going to plug that C library into PHP and get it all running on Platform.sh.

FFI FFI fo PHPum

FFI, or Foreign Function Interface, is a feature of many languages to allow that language to call code written in another language. PHP got that functionality in the new PHP 7.4, although as one might expect there are some bumps in the road.

One concern is that PHP is, by nature, accessed by remote systems. That creates a natural security risk. With FFI, if you could exploit any sort of hole in an application, then you could potentially achieve a remote code execution hole for system level code. On a security scale from 1–10 that ranks a “holy crap!", so by default PHP doesn’t even support that. FFI is only enabled by default from the CLI or in preloaded code.

There’s a caveat there, however. Preloading (also new in PHP 7.4, more on that in a bit) relies on the opcache. So does FFI. The opcache, however, is disabled by default since on the CLI it has nowhere to persist cached opcodes from one execution to the next. To use FFI with the CLI, therefore, we’re going to need to manually enable the opcache.

FFI on the CLI

Let’s start with some code to use the points library we created earlier. First, we’ll want a structure in PHP to mimic the point struct. (That’s not required, but mirroring value objects on both sides makes the API easier to follow.)

class Point
{
   public int $x;
   public int $y;

   public function __construct(int $x, int $y)
   {
       $this->x = $x;
       $this->y = $y;
   }
}

The next step is to tell PHP about the library. There’s a couple of ways of doing so, but we’ll start with cdef():

$ffi = FFI::cdef(file_get_contents('points.h'), __DIR__ . '/points.so');

FFI::cdef() is not an alphabet lesson gone wrong; it stands for “C definition” and takes two parameters: a string that defines the C structures to expose and the path to the .so file in which they can be found.

In this case, it’s easiest to just read in the .h file for the definition. However, PHP’s FFI doesn’t use a standard C header parser. It has its own, which as of this writing is… rather weak. It doesn’t support a lot of syntax that is completely legal in modern C code and is in fact quite common in most real C programs today. While custom written libraries like this can reuse their header, most real production C libraries will have headers that are too complex for PHP’s FFI to handle. That means you’ll need to reimplement your own.

That’s actually fine; the coupling between a header file and a library is much weaker than, say, a PHP interface and class. The header file is just saying, “There exists, somewhere, a function named distance that takes 2 points.” The .so file itself publishes “BTW, I’ve got a function named distance that takes 2 points.” As long as those line up at loading time, it will work out.

It also means that if we want to expose only a small portion of a larger library, we can write a header file that only declares those functions and structs we want to expose to PHP, leaving the rest inaccessible. That can be useful with larger libraries.

We won’t go into it here, but Anthony Ferrara has written a tool called FFIMe that wraps the FFI API to make it easier to work with and also handles preprocessing C headers into the subset of the language that PHP supports. It’s not perfect, but it’s worth investigating for serious FFI use.

We can now use that $ffi variable to access the parts of the .so file that are exposed.

// inline.php

// ...

$p1 = new Point(3, 4);
$p2 = new Point(7, 9);

$cp1 = $ffi->new('struct point');
$cp2 = $ffi->new('struct point');

$cp1->x = $p1->x;
$cp1->y = $p1->y;
$cp2->x = $p2->x;
$cp2->y = $p2->y;

$d = $ffi->distance($cp1, $cp2);

print "Distance is $d\n";

First, we make two Point objects. Then we create two variables that are bridges to the point struct from the header file. Those variables are of type FFI\CData and act as an adapter between PHP land and C land. As they’re objects we can use them as objects and they handle the translation back to C, so copying our Point PHP object values to them is a straightforward and boring process.

Finally, we can call the distance method on the $ffi object, which is an adapter for the C function. The data gets handed off to the C library which produces a double (aka a float), and returns it. Running this code dutifully prints out Distance is 6.4031242374328.

Or, well, almost. Recall that we said above that FFI requires a working opcache, which the CLI doesn’t have by default. We can either enable it in php.ini or on the command line itself. The latter is easier:

$ php -d opcache.enable_cli=true inline.php
Distance is 6.4031242374328

Now we get the output we want.

Setting up in advance

There’s one problem here, though. Instantiating the FFI library, parsing the header file, and setting up the appropriate adapters in the background takes a non-trivial amount of time, especially if the C library is larger than this simple example. On the CLI there’s no good way to avoid that cost on every request, but once we connect this code to a web request we want to skip that overhead if possible.

Fortunately, FFI offers an alternative way to bridge to a C library that has no runtime overhead that leverages preloading. Preloading is another new PHP 7.4 feature that lets you load class and function definitions into PHP-FPM’s memory once at server startup and then never reload them again. We’ve written about preloading before, and it’s a really neat addition.

You can also initialize a C library via preloading and keep it in memory, avoiding the initialization cost in future requests. By default, that is the only way you can use it in a web request, for the aforementioned security reasons. It’s possible to enable FFI::cdef() to work in web requests for development, but please never ever do so in production.

We’ll show preloading on the command line first. Since we’re using preloading anyway, we’ll move the Points class definition to another file named classes.php, like it’s 2006 all over again. (It’s retro.) Next, we need to make some additional changes to the header file by adding these two lines at the beginning:

#define FFI_SCOPE "POINTS"
#define FFI_LIB "./points.so"

The FFI_SCOPE specifies a unique identifier by which PHP FFI will recognize this library. FFI_LIB specifies the library file that it refers to. Note that the path is evaluated relative to the system library path, which does not include the current directory by default. That means the ./ is required, and if missing it will fail to load.

Another important caveat for the header file: As of this writing, a bug in PHP prevents it from parsing the header file if there are any comment lines prior to a #define line. All #define statements must come before any comments. We’ve reported this bug to PHP (see previous link) and hopefully it will get resolved, or at least documented, in a future version.

Now we can go about setting up a preload script. In this case it’s a very simple file:

<?php
// preloader.php

declare(strict_types=1);

FFI::load(__DIR__ . "/points.h");
opcache_compile_file(__DIR__ . "/classes.php");

The opcache_compile_file() call tells the preloader to load everything in classes.php into PHP’s shared memory. The other call, FFI::load(), tells PHP to load the header file then create and save the FFI adapter for that definition. The library file to use and the identifier for that library are specified in the header file.

Finally, we can setup another test script to see the preloaded version in action; it can be nearly identical to the inline version, just with a change to the FFI setup code:

<?php
// preload.php

$ffi = \FFI::scope("POINTS");

$p1 = new Point(3, 4);
$p2 = new Point(7, 9);

$cp1 = $ffi->new('struct point');
$cp2 = $ffi->new('struct point');

$cp1->x = $p1->x;
$cp1->y = $p1->y;
$cp2->x = $p2->x;
$cp2->y = $p2->y;

$d = $ffi->distance($cp1, $cp2);

print "Distance is $d\n";

The $ffi adapter is now created using FFI::scope() and specifying the previously loaded library (POINTS) that we want to use. Because the library is already loaded into memory in the preload step, this is a much cheaper call. The rest of the code is identical.

Also note that since we preloaded it, there’s no need to require the classes.php file. It’s already in memory and usable.

To run this script from the command line, we need to specify the preload script to use, again either inline or via php.ini:

$php -d opcache.preload="preloader.php" -d opcache.enable_cli=true preload.php
Distance is 6.4031242374328

Et voilà.

Improving the API

The FFI API can feel a bit unnatural at times; in fact a lot of the time, especially once you start using its more advanced features that we’re not going to get into here. Fortunately, as programmers we have a standard solution for clunky APIs: “Hey look, an abstraction!”

There’s two main tweaks we’re going to make to our code base. First, we’ll let Point objects convert themselves to an FFI variable:

class Point
{
   public int $x;
   public int $y;

   public function __construct(int $x, int $y)
   {
       $this->x = $x;
       $this->y = $y;
   }

   public function toStruct($ffi)
   {
       $cp = $ffi->new('struct point');
       $cp->x = $this->x;
       $cp->y = $this->y;
       return $cp;
   }
}

Second, we’ll wrap the FFI logic up into another object. There’s a dozen ways to do so depending on the context of what you’re doing, but we’ll go for simple now just to make a point (no pun intended):

class PointApi
{
   private static $ffi = null;

   public function __construct()
   {
       static::$ffi ??= \FFI::scope("POINTS");
   }

   public function distance(Point $p1, Point $p2): float
   {
       $cp1 = $p1->toStruct(static::$ffi);
       $cp2 = $p2->toStruct(static::$ffi);

       return static::$ffi->distance($cp1, $cp2);
   }
}

A given FFI scope is safe to reference multiple times, so rather than try to make it a global singleton and inject it, we’ll just stash it into a private static variable.

Now, we can simplify our example script to just this:

<?php
declare(strict_types=1);

$p1 = new Point(3, 4);
$p2 = new Point(7, 9);

$api = new PointApi();

$d2 = $api->distance($p1, $p2);

print "Distance is $d2\n";

No FFI in sight, but that’s what’s happening behind the scenes.

Compiling on FFIday

The next question, of course, is how to get all of this code working for web requests. It’s fairly straightforward on Platform.sh, but the moving parts are essentially the same on any server.

We’ll start with our latest code in a project repository and assume the routes.yaml and an empty services.yaml are already setup. Then we’ll add a web directory with a simple web script that is just the preload.php script from a moment ago, with our cleaned up API.

The interesting part is in the .platform.app.yaml file, where there’s a few moving parts. We’re going to compile the shared library on every build so we always have the latest version of it. Fortunately gcc is already available on all Platform.sh environments so that’s trivial to do. Here’s the complete example, with relevant comments:

name: app

type: php:7.4

# FFI is an extension. It must be explicitly enabled.
runtime:
   extensions:
       - ffi

# Setting a variable in the `php` namespace makes it an ini setting. This block
# tells PHP-FPM to run `preloader.php` on startup. It's the same file we saw before.
variables:
   php:
       opcache.preload: 'preloader.php'

hooks:
   build: |
       # Using the Makefile we defined in part 1, compile the `points.so` file.
       set -e
       make points.so

web:
   locations:
       "/":
           root: "web"
           passthru: "/index.php"

disk: 128

Now on every git push the points library will be recompiled to a new .so file. Then at deploy time, PHP-FPM will start up, load the FFI extension, and run the one-time preloader.php script to initialize our POINTS FFI scope (plus whatever else we want to do). The web configuration is as simple as can be; every request goes to our basic index.php file, which is the same code we’ve already seen.

If we wanted to ensure that the FFI library worked from the command line, say for a cron task, the easiest way to do it is to make sure the cron command includes both the opcache.preload and opcache.enable_cli directives as above. Both are required.

But is it wise? That’s iFFI

Whew! Not much code, but a lot of concepts. Is it worth it?

As is usually the case, the answer is “It depends”. Most of the time? No, it’s not. For a trivial case like this it’s counter productive.

For one, modern PHP is pretty darned fast in its own right. The performance benefits of moving mundane PHP code to C are marginal at best most of the time; eliminating a database call or two will give a much better speed up for much less effort.

For another, FFI has its own overhead. Every time you call across the PHP/C boundary, there is a translation overhead converting data structures from one style to the other. Just accessing a variable through FFI can be twice as slow as if the variable were in PHP. For trivial code like this example, it is almost certainly slower than just implementing distance() in PHP directly.

So when is it appropriate to move logic from PHP to C using FFI? I see two main use cases:

  1. When you have a significant amount of heavy processing to do. Often that will be done offline in a command line script or a worker process, but it might be done in a web request if it needs to be dynamic. Think image processing, machine learning, heavy graph manipulation, and other tasks that might entail hundreds or thousands of objects or loop iterations if done in PHP. Those will be faster in a precompiled library.

  2. When you have an existing library from someone else that you want to leverage. Historically this would require writing a PHP extension, which is much harder and more involved than using FFI. With FFI, however, you can now take an existing C library, wrap it in FFI, and use it from PHP just as you would were it an extension. The use cases are similar: image processing, machine learning, and other CPU intensive tasks for which there is a plethora of existing code written in C or C++ already. In this case, you’ll almost certainly need to custom-write a header file or use FFIMe to bridge it to PHP. I would strongly recommend also building a nicer API atop FFI, as we did in this example, to make it easier for end users to use.

If your use case falls into one of those areas, you now have a new tool with which to tackle the problem.

Gabriel Couto has collected a series of examples of using FFI with existing libraries where reimplementing them in PHP would make no sense at all. Now that you have a good grasp of FFI basics, the examples there offer the next level up.

Of course, C isn’t the only way to build shared libraries. More on that in our next chapter.