PHPun with FFI: Getting Rust-ic

php

02 March, 2020

Larry Garfield

Director of Developer Experience

So far in our FFI series, we've covered how to compile a C library and plug it into FFI on Platform.sh.

However, FFI doesn't only work with C. It works with anything that can compile to the C ABI. Many languages can, although the best ones to use are those that do not have their own runtime or garbage collection, as either can result in very odd mismatches when sharing data structures. The most popular runtime-free C-alternative language these days is Rust, and guess what? It works with FFI, too.

More alphabet soup

ABI stands for Application Binary Interface. It's the same concept as an Application Programming Interface (API), but more low-level. Whereas an API specifies things like "This function takes an integer as its first parameter and a string as the second," an ABI specifies things like "The first four bits mean this thing, the next three mean this other thing, and the bits are in Little Endian order."

To make things even more difficult, ABIs also depend on the CPU architecture, operating system, and a few other factors. That's why code compiled for a 64-bit x86 architecture on Linux won't work on a 32-bit ARM processor running FreeBSD: The ABI is different, even if the source code is the same, so the bits are literally not in the place the operating system expects.

On the upside, it means that two compiled languages can interact as long as they use the same ABI, and many compiled languages let you pick what ABI to compile to.
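To make the "Little Endian order" point concrete, here is a minimal Rust sketch (not part of the points library; the helper names are made up for illustration) showing that the same 32-bit value has two different byte layouts depending on the convention an ABI picks:

```rust
// Illustrative helpers only; these names are not from the points library.

/// Bytes of `value` as a little-endian machine would store them.
fn little_endian(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Bytes of `value` as a big-endian machine would store them.
fn big_endian(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

fn main() {
    let value: u32 = 0x1234_5678;
    // Same number, opposite byte order in memory.
    println!("little endian: {:02x?}", little_endian(value));
    println!("big endian:    {:02x?}", big_endian(value));
}
```

Two libraries that disagree on details like this cannot share data directly, which is why both sides of an FFI call have to agree on one ABI.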

Rust in a nutshell

As with C, we won't cover all of Rust; the official Rust documentation is already quite good and covers both the language and how to build Rust in detail, so we'll just cover the relevant bits here. I'm also going to assume that you've installed Rust on your system, per the documentation.

Let's start by porting our points library from C to Rust. Unlike C, Rust does not have a separate header file. Instead, like most modern languages, it has a module and package system that lets you explicitly expose certain symbols to others. It also really wants to build code as part of a package, which in Rust is called a "crate." We'll follow its lead.

We'll put our crate into a directory called points for better organization. A crate's definition starts with a Cargo.toml file. (Cargo is the Rust package manager, akin to Composer for PHP folks.) In ours, we'll say we're creating both a "lib" package type (for building native Rust) and a "cdylib" package type (for building a C-compatible library).

[package]
name = "points"
version = "0.1.0"
authors = ["Larry Garfield"]
description = "A demonstration library to show how to build Rust for use with PHP-FFI."
edition = "2018"
build = "build.rs"

[dependencies]

[lib]
crate-type = ["cdylib", "lib"]

Most of it is pretty self-explanatory, but note the crate-type, which is the only non-default value. As with C, we don't really need to build a stand-alone executable, but I'm including it for completeness.

Code in a cargo package lives in a src directory, and the main file for the library is named lib.rs. Here's our point library ported to it:

pub struct Point {
   pub x: i8,
   pub y: i8,
}

pub fn compute_distance(p1: Point, p2: Point) -> f64 {
   let a_squared = (p2.x as f64 - p1.x as f64).powf(2.0);
   let b_squared = (p2.y as f64 - p1.y as f64).powf(2.0);

   (a_squared + b_squared).sqrt()
}

#[cfg(test)]
mod tests {
   use super::*;

   #[test]
   fn test_distance() {
       let p1 = Point{x: 3, y: 2};
       let p2 = Point{x: 6, y: 6};
       assert_eq!(compute_distance(p1, p2), 5.0);
   }
}

compute_distance() is the same as our previous distance() function, just renamed for reasons I'll explain in a moment. The Point struct is nearly the same but capitalized per Rust standards. It also explicitly specifies an 8-bit integer, as Rust requires us to specify the size of an int type. Of note, both the struct and the function are marked as pub, meaning code outside of the crate will be able to access them. Without that, they'd be hidden from other libraries such as FFI itself.
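One detail worth spelling out is why compute_distance() casts each coordinate to f64 before subtracting. With 8-bit integers the difference between two coordinates may not fit back into an i8. The sketch below uses two hypothetical helpers (they are not part of the library) to compare the two approaches:

```rust
// Hypothetical helpers for illustration; not part of the points library.

/// Subtract in i8. Returns None when the result does not fit in 8 bits.
fn delta_in_i8(a: i8, b: i8) -> Option<i8> {
    b.checked_sub(a)
}

/// Widen to f64 first, the way compute_distance() does, then subtract.
fn delta_in_f64(a: i8, b: i8) -> f64 {
    b as f64 - a as f64
}

fn main() {
    // 127 - (-128) = 255, which is outside the i8 range of -128..=127.
    println!("{:?}", delta_in_i8(-128, 127));
    println!("{}", delta_in_f64(-128, 127));
}
```

Casting first costs nothing here, since the result has to be a float for the square root anyway.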

Rust also includes a built-in testing framework as part of the language tooling, so we've included a simple test here on principle, because that's just sweet.

Now let's add our main executable, which we'll call main.rs:

use points;

fn main() {

   let p1 = points::Point{x: 3, y: 4};
   let p2 = points::Point{x: 7, y: 9};

   println!("Distance is: {}", points::compute_distance(p1, p2));
}

Nothing really exciting here from our perspective, but of note the main file is part of a different logical crate than the library, so it needs to use the points library in order to have access to it.

We can now run our simple test with cargo test and build the application with cargo build --release. The --release flag tells Cargo to produce an optimized release build rather than a debug build (the default). The compiled output will live in a target/debug or target/release directory, accordingly.

Running the resulting points/target/release/points file will dutifully print Distance is: 6.4031242374328485.

Rust for C

Compiling our new Rust library into a C-compatible library takes a few tweaks. First, we need to tell Rust that certain parts of the code should be compiled to the C-ABI, so that they "look like C" to other libraries. Update lib.rs like so:

#[repr(C)]
pub struct Point {
   pub x: i8,
   pub y: i8,
}

#[no_mangle]
pub unsafe extern "C" fn distance(p1: Point, p2: Point) -> f64 {
   compute_distance(p1, p2)
}

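The #[repr(C)] on Point tells the compiler to "represent" the struct the way C would: its fields are laid out in memory in declaration order, using C's size and alignment rules, rather than in whatever layout Rust would otherwise choose.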
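As a quick sanity check of what that layout looks like, the following sketch prints the size, alignment, and field offset of the struct. The offset_of_y() helper is made up for this illustration and is not part of the library:

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
pub struct Point {
    pub x: i8,
    pub y: i8,
}

/// Byte offset of the `y` field from the start of the struct.
/// (Illustrative helper, not part of the points library.)
fn offset_of_y() -> usize {
    let p = Point { x: 0, y: 0 };
    let base = &p as *const Point as usize;
    let y = &p.y as *const i8 as usize;
    y - base
}

fn main() {
    // Two one-byte fields, declared x then y: 2 bytes total, y at offset 1.
    println!("size: {}", size_of::<Point>());
    println!("align: {}", align_of::<Point>());
    println!("offset of y: {}", offset_of_y());
}
```

Those numbers are what a C compiler would produce for a struct of two int8_t fields, which is what lets PHP's FFI read and write the struct directly.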

The second part is a simple wrapper function for our compute_distance() function. It's not strictly necessary to separate them in this case, but it's good practice. That's because we need to mark the function in a couple of ways:

#[no_mangle] means Rust should not mess with the name of the function. Normally the compiler is free to rename things as part of internal code generation, but this flag tells it not to so that the name as seen from C is predictable.
extern "C" means "Compile this bit in an external format, specifically C." There are other extern options that aren't relevant for us here.
unsafe means "You know all that intensive memory safety checking that Rust is famous for? Don't bother." It's the "Trust me, I know what I'm doing" flag. It's not strictly necessary in this case but often is when you need to manually marshal, say, strings from C to Rust and back again, which requires doing things Rust won't normally let you do because they're dangerous.
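Our points library never passes strings, but for a sense of what that marshalling involves, here is a minimal sketch. The shout(), free_string(), and shout_safely() functions are hypothetical and exist only for this illustration. A real export would also carry #[no_mangle] as described above; it is left off here so the snippet compiles unchanged regardless of Rust edition.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical C-facing function: returns an upper-cased copy of `input`.
/// The caller owns the result and must release it with free_string().
pub unsafe extern "C" fn shout(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    // Unsafe: Rust cannot verify that the pointer is valid or NUL-terminated.
    let borrowed = unsafe { CStr::from_ptr(input) };
    let upper = borrowed.to_string_lossy().to_uppercase();
    match CString::new(upper) {
        // into_raw() hands ownership of the buffer to the caller.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical companion: takes ownership back so Rust can free the memory.
pub unsafe extern "C" fn free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Safe wrapper that plays the role of the C (or PHP) caller.
fn shout_safely(text: &str) -> String {
    let input = CString::new(text).expect("no interior NUL bytes");
    let raw = unsafe { shout(input.as_ptr()) };
    let result = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    unsafe { free_string(raw) };
    result
}

fn main() {
    println!("{}", shout_safely("points"));
}
```

The pattern to notice is ownership: memory allocated by Rust has to be handed back to Rust to be freed, which is exactly the kind of bookkeeping the compiler can't check across the C boundary.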

Now when we run cargo build --release, we'll get an additional file named target/release/libpoints.so. The name is slightly different, but it's the same idea as the points.so file we built with C, and PHP will recognize it just the same.

PHP still needs a header file, though. We're again going to get caught by PHP's weak header file parser. There are tools to produce a C-compatible header file from a Rust library, but sadly they use syntax that is beyond the bare bones version supported by PHP. For completeness we'll show how to hook that up, but we're going to have to manually modify it afterward.

Generating the header file

First, add the following to the Cargo.toml file:

[build-dependencies]
cbindgen = "^0.6.0"

That adds a new dependency during build only, so it won't be included in the library itself. It installs the "C Binding Generator" (cbindgen) package. On build, Cargo will download that package if necessary, compile it if necessary, and then let you use it.

Second, we need a program to use it. There's a special program that gets built and used during compilation that we can leverage. build.rs goes in the main package directory, not in the src directory:

extern crate cbindgen;

fn main() {
   let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();

   cbindgen::generate(crate_dir)
       .expect("Unable to generate C bindings.")
       .write_to_file("points.h");
}

This is fairly boilerplate code. All it does is load the cbindgen library, then tell it "analyze the current crate and write its equivalent C header file to points.h."

Now when we compile the code, we'll get an extra points.h file produced in the crate directory:

#include <cstdarg>
#include <cstdint>
#include <cstdlib>

struct Point {
 int8_t x;
 int8_t y;
};

extern "C" {

double distance(Point p1, Point p2);

} // extern "C"

It's very similar to the one we wrote before, but with some notable differences:

It #includes several other libraries by default that we don't need. It doesn't hurt anything.
The Point struct uses int8_t values instead of int. That matches the i8 fields on the Rust side, and PHP's FFI handles it the same way.
There's an extra extern block around the distance() definition. That sadly breaks the PHP FFI header parser.
The parameters to distance() are missing the struct keyword. That's legal in C++, which is what cbindgen generates by default, but not in plain C, and it breaks the FFI parser.

While it's in concept possible to manipulate the output as it's being generated from build.rs, it's frankly less work to just copy the file and modify it ourselves. We can commit our modified version to Git and just ignore the generated one, unless we change the code in the future and need to regenerate it.
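For the curious, a rough sketch of what that post-processing could look like is below. It is plain string manipulation using only the standard library; simplify_header() and SAMPLE are names invented for this example, and it only handles the patterns that appear in our generated header (it would not, for instance, fix a function that returns a struct).

```rust
/// A trimmed copy of the generated header, used as sample input.
const SAMPLE: &str = "#include <cstdint>\n\nstruct Point {\n int8_t x;\n int8_t y;\n};\n\nextern \"C\" {\n\ndouble distance(Point p1, Point p2);\n\n} // extern \"C\"\n";

/// Hypothetical helper: strips the parts PHP's FFI parser rejects
/// and adds the `struct` keyword to function parameters.
fn simplify_header(header: &str) -> String {
    // Collect struct names from lines such as "struct Point {".
    let struct_names: Vec<&str> = header
        .lines()
        .filter_map(|line| {
            line.trim()
                .strip_prefix("struct ")?
                .strip_suffix('{')
                .map(str::trim)
        })
        .collect();

    let mut out = Vec::new();
    for line in header.lines() {
        let trimmed = line.trim();
        // Drop includes and the extern "C" wrapper lines.
        if trimmed.starts_with("#include")
            || trimmed.starts_with("extern \"C\"")
            || trimmed.starts_with("} // extern")
        {
            continue;
        }
        let mut line = line.to_string();
        // Only function declarations (lines with a parameter list) are rewritten.
        if trimmed.contains('(') {
            for name in &struct_names {
                line = line
                    .replace(&format!("({} ", name), &format!("(struct {} ", name))
                    .replace(&format!(", {} ", name), &format!(", struct {} ", name));
            }
        }
        out.push(line);
    }
    out.join("\n")
}

fn main() {
    println!("{}", simplify_header(SAMPLE));
}
```

Even this small version needs special cases, which is a fair illustration of why editing the file by hand is the more pragmatic route for a header this size.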

Here's the modified header file:

#define FFI_SCOPE "POINTS"
#define FFI_LIB "./points/target/release/libpoints.so"

struct Point {
 int8_t x;
 int8_t y;
};

double distance(struct Point p1, struct Point p2);

It looks nearly identical to the one for C (which shouldn't be surprising), but with the size-specific integer type and a different path for the FFI_LIB variable.

Speaking of ignoring things, ensure the generated points.h file and the target directory are excluded via .gitignore.

Rusty PHP

The PHP code for leveraging our new Rust-ic libpoints.so is virtually identical to what we saw before for C. The only difference is that when the code refers to struct point, it now needs to say struct Point (note the capital), and the paths to where the header and .so files live are different. Otherwise, nothing changes.

Score.

Rust on Platform.sh

There's a bit more work to do in order to build Rust on Platform.sh, as currently the Rust compiler does not come standard on app containers. It's not difficult to install, though. We have a HowTo on installing Rust posted on our community forum. In short, we need only modify our build hook to this:

set -e
curl https://sh.rustup.rs > rustup.sh
sh rustup.sh -y --default-toolchain stable
rm rustup.sh
. "$HOME/.cargo/env"
cd points
cargo build --release

(It's almost identical to the post above, but we have to change into the right directory first.)

That will compile the Rust library on every build, using a freshly downloaded copy of the Rust toolchain.

There's one catch, though: compiling Rust (or any compiled language, really) can be slow. The Rust compiler does a pretty good job of caching intermediate steps to minimize wasted time, but in the current setup that cache isn't available on the next build, because each build starts fresh and nothing from the previous build is carried over. Is there a way to preserve that?

The Build Cache

There is. The Platform.sh build environment includes a "build cache" directory, defined in the $PLATFORM_CACHE_DIR environment variable, that persists from one build to the next. That makes it a useful place to stash build data that is not needed at runtime but we do want to reuse from one build to another. Take note, though, that the build environment is branch-agnostic, meaning the cache will be used by all builds, regardless of what branch they're on.

There are a number of ways one could leverage the build cache. For our case, to keep it simple, what we're going to do is:

Install Rust in a directory in the build cache.
Copy our source code to the build cache.
Compile our source code in the build cache.
Copy the build cache back to the application directory, now that it includes the built libpoints.so file.

It would be slightly more efficient to only copy back the compiled file we want, but that may impact the path the file lives at. This approach keeps things simple at the cost of a few extra files. If our compilation were much larger than this simple example that might be worth the effort, but for now this is sufficient to demonstrate the concept.

Here's our new, cached build hook:

set -e

cd $PLATFORM_CACHE_DIR
mkdir -p rust
# Tell Rust to install to the build cache, not to the home dir.
export CARGO_HOME=$PLATFORM_CACHE_DIR/rust/.cargo
export RUSTUP_HOME=$PLATFORM_CACHE_DIR/rust

# Copy the latest source into the build cache directory.
cd rust
rsync -r $PLATFORM_APP_DIR/points/ .
# Only install Rust if it's not already installed.
if [ ! -f rustup.sh ]; then
   curl https://sh.rustup.rs > rustup.sh
   sh rustup.sh -y --default-toolchain stable
fi
# Use the CARGO_HOME set above rather than a hard-coded cache path.
export PATH=$CARGO_HOME/bin:$PATH

# Build the library.
cargo build --release

# Copy the build cache result back to the application directory.
cd $PLATFORM_APP_DIR/points
rsync -r $PLATFORM_CACHE_DIR/rust/ .

Et voilà. If we ever want to clear the cache and start over, the Platform CLI command platform project:clear-build-cache will wipe it clean for the next build.

In closing

As with C libraries, Rust-through-FFI should not be taken lightly. In most use cases it's not going to be worth it to port your application to Rust, as IO will swamp any benefits you see.

If you have an existing library to leverage, however, or you're doing something especially CPU intensive, then FFI may be worth considering. It's a bit clunky for now but it does work, and it works well on Platform.sh.

A complete, working example of both the C and Rust versions is available on our examples site. Just push it to a new Platform.sh project to see it in action or to reference it for your own work.