When thinking about cloud services, backup is often overlooked because we hear so much about “ephemeral”, “self-healing” and “repositories”.
Sometimes, however, applications need to be backed up in order to achieve an RTO that suits the customer’s business objectives.
Automation is key here, so in this document we want to show how easy it is to set up a fully automated system using Lambda functions written in Python and scheduled on a daily basis to fulfil these requirements.
Last but not least, storage usage is also important. If you have hundreds of snapshots and you don’t delete them appropriately, you end up with terabytes of old, useless data. Removing snapshots based on a retention policy is therefore a very important part of this process too.
Let’s first define the prerequisites for getting this working properly in our AWS account:
Set Up IAM Permissions
- Go to Services, IAM, Create a new Role
- Write the name (ebs-lambda-worker)
- Select AWS Lambda
- Don’t select any policy, click Next, and Create Role.
- Select the new role, and click Create Role Policy
- Go to Custom Policy, click Select
- Write a Policy Name (snapshot-policy), and paste the content of the following gist.
- What we have just done is allow this role to create and delete snapshots, create tags and modify snapshot attributes. We have also granted permission to describe EC2 instances and to view logs in CloudWatch.
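For readers who prefer to script this step, here is a minimal sketch of what such a policy could look like, built as Python dictionaries. The action list is an assumption reconstructed from the description above (the original gist may differ), the log actions are the ones a Lambda function typically needs to write its own logs, and `create_role` is a hypothetical helper that expects a boto3 IAM client.

```python
import json

# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Assumed permissions, reconstructed from the description in the text;
# this is NOT the original gist.
SNAPSHOT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
        {
            "Effect": "Allow",
            "Action": ["ec2:Describe*"],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute",
                "ec2:ResetSnapshotAttribute",
            ],
            "Resource": "*",
        },
    ],
}


def create_role(iam, role_name="ebs-lambda-worker", policy_name="snapshot-policy"):
    """Create the role and attach the inline policy (iam is a boto3 IAM client)."""
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(SNAPSHOT_POLICY),
    )
```

Calling `json.dumps(SNAPSHOT_POLICY, indent=2)` gives the JSON text that can be pasted into the Custom Policy editor, while `create_role(boto3.client("iam"))` would perform the same steps as the console walkthrough.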
Create Lambda Backup Function
This first function backs up every instance in our account, in the region where the Lambda function is deployed, that has a “Backup” or “backup” tag key. The tag does not need a value.
Before creating the function, I would like to briefly explain what it does. The script searches for all instances that have a tag with the key “Backup” or “backup”. Once it has the list of instances, it collects the EBS volumes attached to each one to build the list of volumes to back up. It also looks for a “Retention” tag key, whose value is used as the retention period in days. If an instance has no such tag, a default of 7 days is used for each of its EBS volumes.
After creating a snapshot, the function adds a “DeleteOn” tag to it, indicating the date on which it should be deleted, calculated from the Retention value. A second Lambda function, explained later in this document, performs the deletion.
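To make the retention rule concrete, here is a small sketch of how the “DeleteOn” value could be derived. The helper names and the YYYY-MM-DD date format are assumptions made for illustration; the text only states that the tag holds the deletion date and that the default retention is 7 days.

```python
import datetime

DEFAULT_RETENTION_DAYS = 7  # default stated in the text


def retention_days(tags):
    """Return the "Retention" tag value as an int, or the default if absent."""
    for tag in tags or []:
        if tag["Key"] == "Retention":
            return int(tag["Value"])
    return DEFAULT_RETENTION_DAYS


def delete_on(today, days):
    """Return the DeleteOn tag value (assumed format: YYYY-MM-DD)."""
    return (today + datetime.timedelta(days=days)).strftime("%Y-%m-%d")
```

For example, a snapshot taken on 2016-01-01 for an instance without a “Retention” tag would get `delete_on(datetime.date(2016, 1, 1), 7)`, which is `"2016-01-08"`.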
Steps to create the function:
- Go to Services, Lambda, and click Create a Lambda Function
- Skip the blueprint screen
- Write a name for it (ebs-backup-worker)
- Select Python 2.7 as a Runtime option
- Paste the code below
- Select the previously created IAM role (ebs-lambda-worker)
- Click Next and Create Function
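As an illustration of the behaviour described above, a backup function along these lines might look like the following. This is a minimal sketch, not the original code of this post: `backup_instances` is a hypothetical name, the snapshot description and the YYYY-MM-DD format of “DeleteOn” are assumptions, and the EC2 client is passed in as a parameter so the logic can be tried outside Lambda.

```python
import datetime


def backup_instances(ec2, today=None):
    """Snapshot every EBS volume of instances tagged Backup/backup.

    ec2 is a boto3 EC2 client. Returns the IDs of the snapshots created.
    """
    today = today or datetime.date.today()
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "tag-key", "Values": ["Backup", "backup"]}]
    )
    created = []
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                # Retention in days; 7 is the default described in the text.
                days = int(tags.get("Retention", 7))
                delete_on = (today + datetime.timedelta(days=days)).strftime("%Y-%m-%d")
                for mapping in instance.get("BlockDeviceMappings", []):
                    if "Ebs" not in mapping:
                        continue  # skip non-EBS devices
                    snap = ec2.create_snapshot(
                        VolumeId=mapping["Ebs"]["VolumeId"],
                        Description="Backup of %s (%s)"
                        % (instance["InstanceId"], mapping["DeviceName"]),
                    )
                    # Tag the snapshot so the prune function knows when to delete it.
                    ec2.create_tags(
                        Resources=[snap["SnapshotId"]],
                        Tags=[{"Key": "DeleteOn", "Value": delete_on}],
                    )
                    created.append(snap["SnapshotId"])
    return created


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    return backup_instances(boto3.client("ec2"))
```

Because the client is created without an explicit region, the function only sees instances in the region where it is deployed, which matches the behaviour described earlier.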
Create Lambda Prune Snapshot Function
Our snapshots are now created by the previous function but, as explained at the beginning of this document, we need to remove them when they are no longer needed.
Reminder: by default, every instance that has a “Backup” or “backup” tag is backed up. If no “Retention” tag is added, snapshots for the instance are removed after a week, because the backup Lambda function adds a “DeleteOn” tag to each snapshot with the date on which it must be deleted.
Using the same steps as before, create the function (ebs-backup-prune)
Use the following code:
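A prune function matching the description above could be sketched as follows. As before, this is an illustrative sketch rather than the original code: `prune_snapshots` is a hypothetical name, and it assumes the “DeleteOn” tag holds a date in YYYY-MM-DD format.

```python
import datetime


def prune_snapshots(ec2, today=None):
    """Delete our own snapshots whose DeleteOn date is today or earlier.

    ec2 is a boto3 EC2 client. Returns the IDs of the snapshots deleted.
    """
    today = (today or datetime.date.today()).strftime("%Y-%m-%d")
    pages = ec2.get_paginator("describe_snapshots").paginate(
        OwnerIds=["self"],  # only snapshots owned by this account
        Filters=[{"Name": "tag-key", "Values": ["DeleteOn"]}],
    )
    deleted = []
    for page in pages:
        for snap in page["Snapshots"]:
            tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
            # YYYY-MM-DD strings sort in date order, so a plain string
            # comparison is enough; untagged snapshots are never matched.
            if tags.get("DeleteOn", "9999-12-31") <= today:
                ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
                deleted.append(snap["SnapshotId"])
    return deleted


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    return prune_snapshots(boto3.client("ec2"))
```

Comparing with "today or earlier" instead of an exact match is a deliberate choice in this sketch: if a daily run is missed, overdue snapshots are still cleaned up on the next run.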
So, you now have two working functions that back up EBS volumes into snapshots and remove those snapshots on the date that “DeleteOn” specifies. Now it is time to automate them using the Event sources feature of Lambda.
Schedule our Functions
Both functions need to run at least once a day. To set that up:
- Go to Services, Lambda, click on the function name
- Click on Event sources
- Click on Add event source
- Select Scheduled Event
- Name: backup-daily or remove-daily, depending on the function you are scheduling
- Schedule expression: rate(1 day)
- Click Submit
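The same schedule can also be created from code. The sketch below assumes the scheduled event is backed by a CloudWatch Events rule and uses the boto3 `events` and `lambda` clients; `schedule_daily`, the statement ID and the target ID are hypothetical names chosen for this example.

```python
def schedule_daily(events, lambda_client, rule_name, function_name, function_arn):
    """Run a Lambda function once a day through a scheduled rule.

    events and lambda_client are boto3 clients for "events" and "lambda".
    Returns the ARN of the rule.
    """
    # Same expression as in the console steps above.
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Point the rule at the function.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule["RuleArn"]
```

It would be called once per function, for example with `rule_name="backup-daily"` for ebs-backup-worker and `rule_name="remove-daily"` for ebs-backup-prune, passing each function's own ARN.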