PHPun with FFI: Just enough C

Larry Garfield
Larry Garfield
Director of Developer Experience
20 Feb 2020

One of the new and exciting features of PHP 7.4 is its support for Foreign Function Interface, or FFI. FFI provides a much easier way to load code from other, faster languages into PHP than writing an extension. That said, easier doesn’t mean trivial. There is some work involved, and it’s not appropriate in all situations.

Anatomy of a C library

The most common language for FFI integration is C, in part because of its ubiquity and in part because it’s the most common language for people who want to squeeze the most performance out of their computer. C doesn’t work like PHP, though, so let’s go over some basics of C.

To be clear, this is not a tutorial on C. It’s a tutorial on building C, and covers just barely enough to get you running with PHP-FFI. How Stuff Works has a reasonable crash course on C itself if you are interested.

C is a compiled language. That means you write source code in one file and then run a compile command on it, which produces a separate, machine-readable-only file. That’s in contrast to PHP and other interpreted languages, which will compile the source file into executable code on the fly and keep it in memory, discarding it when done.

C source code lives in a file that ends in .c. The code can be compiled into several different forms:

  • A stand-alone executable, or “binary”
  • A static library, which can then be combined with other static libraries into a stand-alone executable
  • A shared library, which can be loaded at runtime by a stand-alone executable or several stand-alone executables

A single binary is the easiest to build, but for PHP-FFI we need a shared library. We’ll build both for demonstration purposes.

Another aspect of C is that the source file alone is not enough. C also has “header files,” which end in .h, that define the interface of a package. It’s a similar concept to interfaces in C, but more general. The header file defines the functions and data types that a library exposes to other libraries. PHP doesn’t really have an equivalent, but it’s somewhat akin to “exports” for Javascript modules; not every function or data type needs to be made public.

Write your own

Let’s start with a simple header file to demonstrate how it all works. Here’s our initial points.h:

struct point {
   int     x;
   int     y;
};

double distance(struct point first, struct point second);

This file defines one struct called point. A struct is similar to a class, but it has no methods. (Historically, a class is “a struct with methods” rather than the other way around.) This struct is composed of 2 integers, x and y.

The header also declares one public function, distance, that takes two point structs and returns a double. That’s C-speak for “double-precision floating point number,” which PHP developers will recognize as “float.” Just like a PHP interface it does not define the body, just the signature. The point of the header file is to tell other code, “This is what I’m going to look like when compiled.”

We’ll come back to the header file when we talk about FFI, but that’s enough for C.

Now we need a file for the implementation of distance:

#include "points.h"
#include <math.h>

double distance(struct point first, struct point second) {
   double a_squared = pow((second.x - first.x), 2);
   double b_squared = pow((second.y - first.y), 2);

   return sqrt(a_squared + b_squared);
}

The #include lines specify what dependencies this library has. The first, in quotes, indicates the points.h file in the same directory. That’s what we wrote a moment ago. The second, <math.h>, means the math.h header file that’s available in the system’s standard library directory. The standard C library’s complex math functions are provided in a separate library from the baseline, because when they were first defined in the 1980s, math features on CPUs were expensive and rare, so people would often skip them and reimplement just the few bits they needed themselves. That’s no longer the case, but old standards die hard.

The distance method computes the distance between two points using our old friend the Pythagorean Theorem. The pow() and sqrt() functions come from the math library. (Side note: The sqrt() function is pronounced “squirt.” I will tolerate no disagreement on this point.)

Finally, we have a third file that will make use of the distance() function. A stand-alone executable needs to have a function named main(), which is what gets executed when the program runs. Ours looks like this:

#include <stdio.h>
#include "./points.h"

int main() {

   struct point p1;
   struct point p2;

   p1.x = 3;
   p1.y = 4;
   p2.x = 7;
   p2.y = 9;

   printf("Distance is: %f\n\n", distance(p1, p2));

   return 0;
}

It should be fairly familiar at this point. #include <stdio.h> provides the function signatures for the standard input/output library, while #include "./points.h" gives this file access to the functions from our points library. The main() function creates two points and then computes and prints their distance. printf() is part of the stdio library and works basically the same as in PHP. (Or rather, PHP’s printf() is just a thin wrapper on C’s.)

Make it build

Now that we have our source files, we need to build them. This is actually a multi-step process in C. Of course, you’ll need a C compiler and other build tools. On a typical Linux system there is a package named build-essentials (or similar) that you can install that will include everything we mention here. For other platforms consult their documentation as installing them yourself can be a bit involved.

Because there’s several steps involved we’re going to use a “make file.” GNU Make is the original granddaddy task runner; Ant, Phing, Grunt, Gulp, DoIt, and the rest are all reimplementations of Make. Make is itself a very thin layer on top of shell scripting, for better or worse. (I would link you to its documentation, but unfortunately its documentation is shockingly arcane given how simple Make itself is.)

We’ll start with a file named Makefile (capitals matter) with a single “build target”:

# Makefile

points.o: points.h points.c
  gcc -c points.c

That creates a single build target, points.o, which depends on two files: points.h and points.c. It’s typical for build targets to be the name of the file they produce. Indented under that line is one or more shell commands to run. (Note: The indentation must use a tab character, not spaces. Them’s the rules, I didn’t make ‘em.)

gcc is the GNU Compiler Collection and is the most widely used C compiler, but not the only. It has about 14 million possible options, of which we’re going to use about four. The -c switch means “Compile this file, but don’t do any of the other steps.” The output of that command is a new file named (surprise!) points.o. This is the “object file,” or the raw machine code version of points.c.

If we run nm points.o, it will list the “symbols” in the object file. (nm is short for “name” and dates from an era where memory was so tight and keyboards so hard to use that programmers didn’t believe in vowels.)

0000000000000000 T distance
             	U _GLOBAL_OFFSET_TABLE_
             	U pow
             	U sqrt

That lists four functions encoded into the object file: Our distance function has its source code included (hence the T), while pow, sqrt`, and some internal bookkeeping are also mentioned but not defined internally. That’s an indication to the next step, “Hey, I need these things.”

We also want a target to compile the main.c file:

main.o: main.c
  gcc -c main.c

Next we have a choice. We can build a stand-alone executable for main, or we can turn our points code into a shared library. For demonstration’s sake we’ll do both:

main: main.o points.o
  gcc -o main main.o points.o -lm

This gcc command reads “compile to the output (-o) file main, using object files main.o and points.o, and link against library m.” The target main depends on the other targets main.o and points.o so will run those targets if necessary to produce those files if they don’t already exist.

The m warrants extra mention; it can be read as “Here’s another object file to include, named libm, in the standard library directory on the system.” The “lib” part of the name gets dropped in the -l command. (That’s lowercase L, for those reading this in a sans-serif font.) m in this case is the C standard math library we mentioned before. (See previous note about programmers in the 80s being allergic to typing full words.)

The process of taking multiple object files and plugging them into each other is called “linking.” When they’re all plugged together into a single executable file, it’s called “static linking.”

We can now run make main at the command line. That will:

  • Compile points.c to a machine code file points.o.
  • Compile main.c to a machine code file main.o.
  • Superglue those two files together, along with the C standard math library, and wrap it up into a format that the operating system knows how to execute.

Now running ./main on the command line should produce the following output:

$ ./main
Distance is: 6.403124

Yay.

There’s two more steps we need, though. First, main is not useful for PHP-FFI. For that we need a shared library. That will take another build target in Makefile:

points.so: points.o
	# Wrap the object file into a shared object.
	gcc -shared -o points.so points.o -lm

That command reads, “Create a shared library, output to points.so, by combining points.o and the libm library.” It’s very similar to the command for main, but this time we produce a .so file, or “Shared Object” file. That takes the same code as the stand-alone binary, but packages it up differently with hooks for other C programs to load it into memory separately and access it on-demand. It’s the .so file that we’ll need for PHP-FFI.

There’s also one other target we need to be complete, and that’s clean:

clean:
  rm -f *.o *.so main

Any time you’re compiling code, you want to be able to wipe out your compiled versions and start from a clean slate. In this case we just rm all compiled files. Make sure you add .o and .so to your .gitignore file, too, since you don’t want to commit those to Git.

Here’s our final Makefile so far:

points.o: points.h points.c
  gcc -c points.c

points.so: points.o
  gcc -shared -o points.so points.o -lm

main.o: main.c
  gcc -c main.c

main: main.o points.o
  gcc -o main main.o points.o -lm
  chmod +x main

clean:
  rm -f *.o *.so main

Onward to PHP!

Building a C program can easily get vastly more involved and complex than this trivial example. The goal today has been to get our feet wet and explain enough of the anatomy of a C library that we can start plugging it into PHP’s FFI layer. The most important parts in the end will be points.so and points.h, both of which we’ll need for PHP.

We’ll cover FFI in the next chapter.

(Many thanks to Anthony Ferrara for his help explaining how to build C.)