Basics – Using Avamar to Protect S3 Bucket Data

Technically you shouldn’t need to back up object stores such as Amazon S3. Object stores are designed with massive durability levels, and best practices for protection include:

  • Enable versioning
  • Use replication
  • Use object write splitters (with versioning and replication applied downstream, as required)
  • Use snapshots

(OK, as far as I’m aware, snapshotting an S3 bucket isn’t an option – though it is with Azure’s equivalent object storage.)

Fantasy scene involving bucket and clouds
Yay! Buckets!

Beware that, when fighting monsters, you yourself do not become a monster … for when you gaze long into the abyss, the abyss gazes also into you.

Friedrich Nietzsche

So, how does one back up a bucket¹ of data using Avamar?

Here’s a list of what you’ll need:

  • Avamar (preferably with a Data Domain system, unless the bucket is small)
  • A bucket
  • A Linux system running S3FS – s3fs-fuse.

For my setup, I used:

  • Avamar 19.1 server (AVE) connected to a DDVE running DDOS 6.2.0.30
  • Linux server running CentOS 7.7.1908, with s3fs-fuse-1.85.el7.
  • Curl/libcurl 7.29.0-54.el7.
  • Permissions enabled such that the Linux system can access the bucket.
  • Avamar client 18.1.100-33 installed on the Linux system.

The s3fs-fuse software is an open-source project designed around the idea of presenting an object store as a mountable filesystem. Once the bucket is visible as a filesystem on a host, you’ve got additional options for dealing with the data it contains.

I won’t document the s3fs-fuse configuration process – the project documentation is average, at best, but it can be muddled through.
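That said, actually installing it on CentOS 7 is straightforward, since s3fs-fuse is packaged in EPEL. As a minimal sketch (assuming EPEL isn’t already enabled on the host):

# yum install -y epel-release
# yum install -y s3fs-fuse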

Starting with an empty bucket
There’s a hole in my bucket (it’s empty)

I started with an empty bucket and uploaded a couple of files to it. (I’m using the free AWS experimentation account, so I’m not going to load up an example with hundreds of gigabytes.)

Uploading some files to the bucket

When uploading to the bucket, I chose the standard S3 storage class, since a small amount of standard S3 storage is included in the free experimentation accounts.
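If you prefer the command line to the console, the equivalent upload via the AWS CLI looks something like the following – the filename is purely a placeholder, and this assumes the AWS CLI is already installed and configured:

# aws s3 cp ./example.txt s3://bucketNameGoesHere/ --storage-class STANDARD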

To enable access to my bucket, I configured IP-level access, since I was accessing it from a fixed IP address in my home lab. (In practice, you’d probably want to grant access to a VPC running within AWS, as this would be faster than doing it over the Internet.) My bucket policy looked like the following:

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Sid": "IPAllow",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::bucketNameGoesHere",
                "arn:aws:s3:::bucketNameGoesHere/*"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "A.B.C.D/32"
                }
            }
        }
    ]
}

(Bucket permissions will be set and maintained by your cloud team, so the example I’ve provided just shows the construct I used.)

I also configured a user within my AWS account with permissions only for buckets, and then wrote the access key and secret key in the format ACCESS:SECRET to /root/.passwd-s3fs on my Linux server.
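That’s only a couple of commands – note that s3fs will refuse to use a credentials file readable by other users, hence the chmod:

# echo "ACCESSKEYGOESHERE:SECRETKEYGOESHERE" > /root/.passwd-s3fs
# chmod 600 /root/.passwd-s3fs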

I created a directory, /s3, on the Linux server and issued the following mount command:

# s3fs bucketNameGoesHere /s3 -o retries=1 \
    -o passwd_file=/root/.passwd-s3fs -o use_path_request_style \
    -o dbglevel=info -f -o curldbg \
    -o endpoint=bucketNameGoesHere.amazonaws.com \
    -o url=https://s3-regionNameGoesHere.amazonaws.com

The debug options were essential for telling me what I was doing wrong, though the debug output leaves a lot to be desired – as does the s3fs manpage. That being said, after a few attempts at formatting the options differently, I got a successful mount:

Accessing the bucket under s3fs-fuse
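A quick sanity check confirms the bucket is presenting as an ordinary (if slightly unusual) filesystem:

# df -h /s3
# ls -l /s3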

Now, just to make things interesting, I added a subfolder:

Adding content to the bucket
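You can do this through the AWS console or straight through the s3fs mount – via the mount it’s just ordinary filesystem manipulation (the names here are purely illustrative):

# mkdir /s3/exampleFolder
# cp /path/to/someFiles/* /s3/exampleFolder/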

Right, so what about Avamar? I installed and registered the client software², then configured a policy. There’s really not much to the policy at all – here are the first few screens of it:

Naming the policy
Selecting the Clients for the Policy
Adopting a Dataset/Creating a New One
Dataset Selection Details

For my policy, I configured a dataset selection of just /s3. (Here’s where you’d need to be cautious – even with source-side deduplication, bucket backups will cost you in reads, so factor that into your cost modelling and choose an appropriate backup frequency.)

Running the policy while s3fs is mounted with the debugging options mentioned earlier will generate a lot of output. Ideally, once you’re satisfied things are working, adjust your s3fs mount command to do the appropriate logging and run in the background. (You may even want to enable local filesystem caching and back up that cache instead.)
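As a sketch only (the cache directory and logging level here are assumptions, not what I actually ran), a quieter, backgrounded mount with local caching might look like:

# mkdir -p /var/cache/s3fs
# s3fs bucketNameGoesHere /s3 -o retries=3 \
    -o passwd_file=/root/.passwd-s3fs -o use_path_request_style \
    -o use_cache=/var/cache/s3fs -o dbglevel=warn \
    -o url=https://s3-regionNameGoesHere.amazonaws.com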

For my environment, the activity monitor shows the backup completing quite quickly:

Backup Complete

Now, what about data recovery? Let’s start with a recovery to an alternate location:

Initiating an Avamar Recovery

Here’s the walk-through of the restore wizard:

Restore Wizard: Select Backup
Restore Wizard: Select Destination Client
Restore Wizard: Select Content
Restore Wizard: Select Recovery Destination Folder
Restore Wizard: More Options (none used for this recovery)
Restore Wizard: Summary Page

After clicking ‘Finish’, the restore was initiated and it finished relatively quickly according to the ‘Activity’ view:

Restore Complete

And here’s a look at the recovered content:

Verifying the recovered content
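If you want more than an eyeball check, a recursive diff against the live mount does the job – /restoreTargetGoesHere below is simply whatever destination folder you nominated in the wizard:

# diff -r /s3 /restoreTargetGoesHere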

But what about recovering back into the bucket?

Oops! I better recover my bucket!

I won’t run through the entire restore wizard again, but here are the two key screens where we select the content to recover and leave the destination as the original location:

Restoring S3 Bucket Data Back to the Source

The obligatory view of the completed restore via activity monitoring:

Restore Completed

And finally, verification that we did indeed recover the data:

Verifying the Recovered Data
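You can also confirm it from the AWS side with the CLI:

# aws s3 ls s3://bucketNameGoesHere/ --recursive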

So, there you have it! That’s how you can back up and recover data in an S3 bucket (or other object stores compatible with s3fs-fuse) with Avamar. I’ll wrap up with a few architectural recommendations:

  • Unless absolutely necessary, place the system that’ll mount the bucket for backup topologically near the object store. (For this walk-through I hosted my client in my home lab rather than within AWS, but in production you’d want it in a VPC close to the bucket.)
  • Deduplication is essential here – you want to limit your egress costs, after all.
  • Be careful with bucket permissions, of course! Getting them wrong can leave you with a horrendous security issue!
  • Consider using versioning and other forms of native bucket protection for day-to-day recovery, and treat the Avamar backup as a compliance-level copy, executed at reduced frequency.

Footnotes

  1. I must say, ‘back up a bucket’ is an inconvenient phrase that’s pretty much perfect for triggering my old speech impediment.
  2. I’m part-way through upgrading from Avamar 18.1 to Avamar 19.2. So my Avamar server was on the interim Avamar 19.1 version, but my client packages were still 18.1.
