Using AWS S3 snapshot repository for Elasticsearch


18 August, 2021

Sergii Dolgushev

Lead Developer

Contextual Code

Contextual Code specializes in enterprise-level projects for state government agencies. We routinely tackle difficult web content management implementations, migrations, integrations, customizations, and operations. We know what it takes to get a project off the ground and onto the web.

We use Platform.sh as our primary hosting platform because it’s incredibly flexible and offers a wide range of services that are easy to set up.

There are several scenarios, such as creating additional backups or syncing data to local development environments, in which you might need to extract data from those services. With Platform.sh this is often simple: for example, you can dump a MariaDB/MySQL database with a single Platform.sh CLI command (platform db:dump).

But in some cases, extracting the data you need is not so simple, and more advanced tools are necessary. We covered one such case in our Backup Solr on Platform.sh blog post. Today we’ll cover another: how to use an AWS S3 snapshot repository for Elasticsearch on Platform.sh.

Getting started

First, let's make sure we have an Elasticsearch service in .platform/services.yaml:

elasticsearch:
  type: elasticsearch:7.2
  disk: 256

Then let’s expose the service to the application through a relationship named elasticsearch in .platform.app.yaml:

relationships:
  elasticsearch: elasticsearch:elasticsearch

Also, in the AWS Management Console, we need to:

Create a new AWS S3 bucket
Use AWS IAM to create a new user with read and write permissions for the newly created bucket
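The same two console steps can also be scripted. The sketch below assumes the AWS CLI is installed and configured with an account that is allowed to manage S3 and IAM; the script name and the bucket, user, and policy names are hypothetical. The inline policy is scoped to the one bucket and lists the bucket-level and object-level actions that the Elasticsearch repository-s3 documentation recommends for a snapshot repository.

```shell
#!/usr/bin/env bash
# Sketch: create the snapshot bucket and a dedicated IAM user with the AWS CLI
# instead of the console. Assumes the AWS CLI is installed and configured.
# Bucket, user, and policy names are hypothetical placeholders.

# Print an IAM policy limited to one bucket, with the S3 actions the
# repository-s3 plugin needs to list, read, write, and delete snapshot files.
snapshot_bucket_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Create the bucket and the user, attach the policy inline, and print a new
# access key pair (AccessKeyId / SecretAccessKey) for that user.
create_snapshot_bucket_and_user() {
  local bucket="$1" user="$2"
  aws s3 mb "s3://${bucket}"
  aws iam create-user --user-name "${user}"
  aws iam put-user-policy \
    --user-name "${user}" \
    --policy-name "${bucket}-es-snapshots" \
    --policy-document "$(snapshot_bucket_policy "${bucket}")"
  aws iam create-access-key --user-name "${user}"
}

# Only run when called with both arguments, e.g.:
#   bash create-snapshot-bucket.sh my-es-snapshots es-snapshot-user
if [ "$#" -eq 2 ]; then
  set -euo pipefail
  create_snapshot_bucket_and_user "$1" "$2"
fi
```

aws s3 mb creates the bucket in the CLI's default region. The AccessKeyId and SecretAccessKey printed by the last command are the values to use for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY when registering the repository.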

Registering the Elasticsearch S3 snapshot repository

The Elasticsearch S3 repository plugin is extremely easy to enable on Platform.sh: we just need to add repository-s3 to configuration.plugins for the elasticsearch service in .platform/services.yaml:

elasticsearch:
  type: elasticsearch:7.2
  disk: 256
  configuration:
    plugins:
      - repository-s3
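After deploying, it can be worth confirming that the plugin is actually loaded before registering a repository. A minimal sketch, to be run inside the app container where PLATFORM_RELATIONSHIPS and jq are available; the function names are hypothetical. It relies on the _cat/plugins API, where h=component limits the output to the plugin name column.

```shell
#!/usr/bin/env bash
# Sketch: check that repository-s3 is loaded after the services.yaml change.
# Run inside the app container (platform ssh).

# Succeeds if the plugin names read from stdin include repository-s3.
has_repository_s3() {
  grep -qx 'repository-s3'
}

check_repository_s3() {
  local base_url
  # Build http://host:port from the "elasticsearch" relationship
  base_url=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode \
    | jq -r '.elasticsearch[0] | "http://\(.host):\(.port)"')
  # h=component prints one plugin name per line (repeated for each node)
  if curl -s "${base_url}/_cat/plugins?h=component" | has_repository_s3; then
    echo "repository-s3 is installed"
  else
    echo "repository-s3 is not installed" >&2
    return 1
  fi
}
```

Sourcing the file in the SSH session and running check_repository_s3 should print "repository-s3 is installed" once the deployment has finished.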

After we deploy this change, we need to SSH into the application container and register a new snapshot repository by running the following commands:

# SSH to the Platform.sh app container
platform ssh

# Replace the value for these variables
AWS_BUCKET_NAME="<YOUR_AWS_BUCKET_NAME>"
AWS_ACCESS_KEY_ID="<YOUR_AWS_ACCESS_KEY_ID>"
AWS_SECRET_ACCESS_KEY="<YOUR_AWS_SECRET_ACCESS_KEY>"

# Extract Elasticsearch host and port from relationships
ES_HOST=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].host')
ES_PORT=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].port')

# Register the snapshot repository
curl -X PUT "http://${ES_HOST}:${ES_PORT}/_snapshot/aws-s3?pretty" -H 'Content-Type: application/json' -d'{
 "type": "s3",
 "settings": {
   "bucket": "'"${AWS_BUCKET_NAME}"'",
   "client": "default",
   "access_key": "'"${AWS_ACCESS_KEY_ID}"'",
   "secret_key": "'"${AWS_SECRET_ACCESS_KEY}"'"
 }
}'
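Splicing shell variables into a single-quoted JSON string works, but the quoting is easy to get wrong, and the secret key ends up on curl's command line. As an alternative, here is a sketch that builds the same request body with jq -n --arg, so every value is JSON-escaped, and passes it to curl on standard input. It assumes the three variables are set as above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: register the aws-s3 repository with a jq-built JSON body.
# Assumes AWS_BUCKET_NAME, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set
# in the current shell, and PLATFORM_RELATIONSHIPS is available.

# Print the repository definition as JSON; jq escapes each value.
s3_repository_body() {
  jq -n \
    --arg bucket "$AWS_BUCKET_NAME" \
    --arg access_key "$AWS_ACCESS_KEY_ID" \
    --arg secret_key "$AWS_SECRET_ACCESS_KEY" \
    '{type: "s3", settings: {bucket: $bucket, client: "default",
      access_key: $access_key, secret_key: $secret_key}}'
}

register_s3_repository() {
  local base_url
  base_url=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode \
    | jq -r '.elasticsearch[0] | "http://\(.host):\(.port)"')
  # --data-binary @- reads the body from stdin, so the secret is not part of
  # curl's command-line arguments.
  s3_repository_body | curl -s -X PUT "${base_url}/_snapshot/aws-s3?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Because the variables in the session above are not exported, source this file in the same shell (or export the variables first), then call register_s3_repository.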

Once the repository is registered, every snapshot we create in the aws-s3 repository will be stored in the AWS S3 bucket.
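To check that the registration really works, we can call the verify repository API (POST _snapshot/aws-s3/_verify), which asks the cluster's nodes to check that they can access the repository and, on success, lists the nodes that verified it. A minimal sketch for the app container; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the aws-s3 snapshot repository from the app container.

# Build the Elasticsearch base URL from the "elasticsearch" relationship.
es_base_url() {
  echo "$PLATFORM_RELATIONSHIPS" | base64 --decode \
    | jq -r '.elasticsearch[0] | "http://\(.host):\(.port)"'
}

# Ask Elasticsearch to verify the repository (defaults to aws-s3).
verify_snapshot_repository() {
  local repo="${1:-aws-s3}"
  curl -s -X POST "$(es_base_url)/_snapshot/${repo}/_verify?pretty"
}

# Example: bash verify-repository.sh aws-s3
if [ "$#" -ge 1 ]; then
  verify_snapshot_repository "$1"
fi
```

An error response here (for example, an access denied message from S3) usually points to the bucket name or the IAM user's permissions.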

Creating the new Elasticsearch snapshots

We’ll use a simple bash script, make-elasticsearch-snapshot.sh, placed in the root of the project and executed in the app container:

#!/usr/bin/env bash
set -euo pipefail

# Extract snapshot parameters
SNAPSHOT_ID=$(date +"%Y%m%d-%H%M%S")
# Snapshot names must be lowercase, so normalize project and branch names
SNAPSHOT_NAME=$(echo "${PLATFORM_PROJECT}-${PLATFORM_BRANCH}-${SNAPSHOT_ID}" | tr '[:upper:]' '[:lower:]')
SNAPSHOT_DATE=$(date +"%Y-%m-%d %H:%M:%S")

# Extract Elasticsearch host and port from relationships
ES_HOST=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].host')
ES_PORT=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].port')

# Create a new snapshot and wait until it has finished
curl -X PUT "http://${ES_HOST}:${ES_PORT}/_snapshot/aws-s3/${SNAPSHOT_NAME}?wait_for_completion=true&pretty" -H 'Content-Type: application/json' -d'{
 "ignore_unavailable": true,
 "include_global_state": false,
 "metadata": {
   "taken_by": "Platform.sh cron",
   "taken_on": "'"${SNAPSHOT_DATE}"'",
   "taken_because": "Daily backup"
 }
}'
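One thing the script does not catch: curl exits with status 0 whenever it receives an HTTP response, even if Elasticsearch reports an error, so a failed snapshot could go unnoticed in the cron output. With wait_for_completion=true the response contains the snapshot's final state, which jq -e can check. A minimal sketch; the function name snapshot_succeeded and the RESPONSE variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail when the create-snapshot response does not report SUCCESS.
# With wait_for_completion=true the response includes snapshot.state; an
# error body, PARTIAL, or FAILED makes this check return non-zero.

# Succeeds only if the JSON on stdin has snapshot.state == "SUCCESS".
snapshot_succeeded() {
  jq -e '.snapshot.state == "SUCCESS"' > /dev/null
}

# Hypothetical use at the end of make-elasticsearch-snapshot.sh:
#   RESPONSE=$(curl -s -X PUT "<same URL and body as above>")
#   echo "$RESPONSE"
#   echo "$RESPONSE" | snapshot_succeeded || exit 1
```

This way the script exits with a non-zero status instead of appearing to succeed when the snapshot did not complete.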

Then register the script as a cron job named elasticsearch_snapshot in .platform.app.yaml (this example runs it every day at 23:15):

crons:
  # ... other cron jobs ...
  elasticsearch_snapshot:
    spec: '15 23 * * *'
    cmd: bash make-elasticsearch-snapshot.sh

And deploy it:

git add .platform.app.yaml make-elasticsearch-snapshot.sh
git commit -m "Added Elasticsearch snapshot cron job"
git push

After this is deployed, we can run the script in the app container:

# SSH to the Platform.sh app container
platform ssh

# Run the newly deployed script
bash make-elasticsearch-snapshot.sh

The new snapshot will be created and stored in our AWS S3 bucket.

Using Elasticsearch snapshots

We can get a list of available snapshots by running the following commands:

# SSH to the Platform.sh app container
platform ssh

# Extract Elasticsearch host and port from relationships
ES_HOST=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].host')
ES_PORT=$(echo "$PLATFORM_RELATIONSHIPS" | base64 --decode | jq -r '.elasticsearch[0].port')

# Get the list of available snapshots
curl -X GET "http://${ES_HOST}:${ES_PORT}/_cat/snapshots/aws-s3?v"
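Because the script names each snapshot <project>-<branch>-<YYYYmmdd-HHMMSS>, sorting the names for one environment also sorts them chronologically. A small sketch (the function name latest_snapshot is hypothetical) that picks the newest successful snapshot from the output of _cat/snapshots/aws-s3?h=id,status, which prints one "id status" line per snapshot:

```shell
#!/usr/bin/env bash
# Sketch: pick the newest SUCCESS snapshot whose name starts with a prefix.
# Input (stdin): "id status" lines, as printed by _cat/snapshots?h=id,status.

latest_snapshot() {
  local prefix="$1"
  # Keep successful snapshots with the given prefix; the fixed-width
  # timestamp suffix makes a byte-wise sort chronological.
  awk -v p="$prefix" '$2 == "SUCCESS" && index($1, p) == 1 { print $1 }' \
    | LC_ALL=C sort | tail -n 1
}

# Hypothetical use in the app container:
#   curl -s "http://${ES_HOST}:${ES_PORT}/_cat/snapshots/aws-s3?h=id,status" \
#     | latest_snapshot "${PLATFORM_PROJECT}-${PLATFORM_BRANCH}-"
```

The prefix has to match the snapshot names exactly, including case.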

Our next steps would be:

Register the same S3 snapshot repository in our local Elasticsearch
Choose the snapshot we want to restore on our local installation
Restore the snapshot on our local Elasticsearch:

curl -X POST "http://<LOCAL_ELASTICSEARCH>/_snapshot/aws-s3/<SNAPSHOT_NAME>/_restore"
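The local side of these steps can be sketched as follows. It assumes (hypothetically) a local node at http://localhost:9200, override it with LOCAL_ES, running a version that can read the Platform.sh snapshots. Locally we control the node, so the credentials can go into the Elasticsearch keystore as s3.client.default.* settings instead of the repository settings. The repository is registered with readonly set to true, because Elasticsearch recommends that only one cluster write to a shared repository. The function names and script name are ours.

```shell
#!/usr/bin/env bash
# Sketch: restore a Platform.sh snapshot into a local Elasticsearch node.
#
# One-time setup on the local node, from the Elasticsearch home directory:
#   bin/elasticsearch-plugin install repository-s3
#   bin/elasticsearch-keystore add s3.client.default.access_key
#   bin/elasticsearch-keystore add s3.client.default.secret_key
# then restart the node so the plugin and credentials are loaded.

LOCAL_ES="${LOCAL_ES:-http://localhost:9200}"

# JSON body for a read-only S3 repository pointing at the given bucket.
readonly_repository_body() {
  printf '{"type":"s3","settings":{"bucket":"%s","client":"default","readonly":true}}' "$1"
}

# Register the bucket read-only, so the local cluster never writes to the
# repository that the Platform.sh cluster uses.
register_readonly_repository() {
  readonly_repository_body "$1" | curl -s -X PUT "${LOCAL_ES}/_snapshot/aws-s3?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Restore the snapshot's indices. A restore fails for indices that already
# exist locally and are open, so delete or close those first.
restore_snapshot() {
  curl -s -X POST "${LOCAL_ES}/_snapshot/aws-s3/$1/_restore?wait_for_completion=true&pretty"
}

# Example: bash restore-local.sh <bucket-name> <snapshot-name>
if [ "$#" -eq 2 ]; then
  register_readonly_repository "$1"
  restore_snapshot "$2"
fi
```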

Once these steps are done, the data from our Platform.sh Elasticsearch has been copied into our local installation, and we can repeat the process whenever we need to.

Now it’s your turn

I hope you found this post interesting and useful. Hopefully it illustrates how flexible and extensible the Platform.sh framework is. Feedback and comments are appreciated. Happy snapshotting!

(Reprinted with permission.)