Thoughts on how to Protect PaaS

As organisations become more mature in their dealings with the cloud, they recognise that certain cloud decisions (AWS vs Azure, IaaS vs PaaS) have operational impacts.

One that I frequently run into: a customer has chosen to use cloud-managed databases (e.g. RDS in AWS), but then decides they want (or a regulator demands) copies/backups of the data outside of the cloud, or at another cloud provider. The backup options provided by the platform generally don't let you export data outside of the platform in any simple way. The problem with any traditional approach is that we usually need to install agents in order to access the data efficiently (e.g. RMAN for Oracle), and that kind of access to the underlying platform tends not to be available in a PaaS offering.

So what can a customer do?

I will presume you are familiar with Data Domain Virtual Edition and BoostFS (if not, read this, this and this before continuing). There are a number of tools that can make logical copies of a database. What this means is that instead of reading the data files directly, we run queries against the database to extract all the table definitions and data. This is obviously a much less efficient way of getting at the data but, in the case of PaaS, it may be the only option.

Tools like:

  • sqlpackage, bcp (SQL Server)
  • exp (Oracle)
  • mysqldump (MySQL)
  • many others

all have the ability to connect to a remote database and logically extract data. So if we install these tools on a host and mount BoostFS on that host, we now have a way of extracting data from RDS (and other managed databases) and storing/replicating that data efficiently.
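For example, extracting one RDS MySQL instance straight onto a BoostFS mount is a one-liner; in the sketch below the endpoint, user, `$MYSQL_PASS` variable and the /boostfs1 mount point are all placeholder assumptions:

```shell
# Placeholders/assumptions: the RDS endpoint, user, $MYSQL_PASS and the
# /boostfs1 mount point. BoostFS must already be mounted there.
TODAY=$(date +%d%m%Y)             # one directory per day, e.g. 12022019
mkdir -p "/boostfs1/$TODAY"
mysqldump -P 3306 \
  -h mydb.abcdefgh.us-east-1.rds.amazonaws.com \
  -u admin --password="$MYSQL_PASS" mydb \
  > "/boostfs1/$TODAY/mydb"
```

Because the dump lands on deduplicated storage, daily copies of a slowly changing database cost very little extra space.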

Here are my thoughts on how to automate this:

  1. Use the AWS CLI to work out which databases need to be ‘backed up’
  2. Use the native logical tool to extract the data to a BoostFS file system
  3. Add ‘backup’ info into a reporting MySQL database (for extraction by DPA)
  4. Schedule DD snapshots (and MTree replication) every day to efficiently create off-site copies of the data (using DD deduplication)
  5. Test recovering the data in another cloud, on-prem, or another region by using the same tools in reverse
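Step 5 deserves emphasis: a logical dump restores with the same client tools, so a recovery test in another region or cloud is just the dump replayed. A sketch, where the target endpoint, credentials and dump path are all hypothetical:

```shell
# Hypothetical restore target; endpoint, credentials and paths are assumptions.
TARGET=restore-test.abcdefgh.eu-west-1.rds.amazonaws.com
mysql -P 3306 -h "$TARGET" -u admin --password="$MYSQL_PASS" \
  -e 'CREATE DATABASE IF NOT EXISTS mydb_restored'
mysql -P 3306 -h "$TARGET" -u admin --password="$MYSQL_PASS" \
  mydb_restored < /boostfs1/12022019/jdvmysql
```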

So here is some code that works (I used Perl, I know it’s not trendy). It carries a bunch of technical debt, but I wanted to show that this can be done.

Initialisation (setting some variables)

# set the variables for the script

$AWSCLI = "/usr/local/bin/aws";
$MYSQLDUMP = "/bin/mysqldump";
$MYSQL= "/bin/mysql";

# Information for all mysql databases

$MYSQLPASS= "Password_123";

# BoostFS directory (assume that it is already mounted)

$BOOSTFSDIR= "/boostfs1";

# Information for reporting database

$REPDB= "jdvmysql";
$REPDBTABLE = "report";
$REPDBUSER = "mysqluser";
$REPDBPASS = "Password_123";
$REPDBSERVER = "jdvmysql.6db48nd933.us-east-1.rds.amazonaws.com";

Create a directory for each day (it should really be one per backup) and use the AWS CLI to get a list of all RDS databases. We are only going to concentrate on MySQL (but this could easily be extended to include other engines).

#
# Code Start
#

($day, $month, $year) = (localtime)[3,4,5];
$month += 1;       # localtime months run 0-11
$year  += 1900;    # and years are offset from 1900
$TODAY = "$day$month$year";

# create the directory for today's backups (skip if it already exists)

if ( ! -d "$BOOSTFSDIR/$TODAY") {
    mkdir("$BOOSTFSDIR/$TODAY") or die("Can't create directory for backups: $!\n");
}

# get the list of MySQL databases
# TBD need to do this for each region
# need to have run 'aws configure' to set up credentials before running the AWS CLI

open(DBS," $AWSCLI rds describe-db-instances --query 'DBInstances[*].[DBInstanceArn,Engine,DBInstanceIdentifier,Endpoint.Address,MasterUsername,AllocatedStorage]' --no-paginate --output table |") or die ("Can't run AWS CLI");
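As an aside, the table output above is fiddly to parse; `--output text` emits tab-separated fields, which are much easier to split. A sketch of the same call (assuming the same `--query` fields, and that `aws configure` has been run):

```shell
# Tab-separated output is easier to script against than --output table.
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceArn,Engine,DBInstanceIdentifier,Endpoint.Address,MasterUsername,AllocatedStorage]' \
  --output text |
awk -F'\t' '$2 == "mysql" { print $3, $4, $5 }'   # identifier, endpoint, master user
```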

Iterate through each line returned by the AWS call, ignoring anything that is not of type ‘mysql’.

foreach $db_instance (<DBS>) {
    my ($pad1, $db_arn, $db_type, $mysql, $db_server, $db_username, $size, $pad2)
        = split(/\|/, $db_instance);

    # skip table borders and title rows that don't have all the fields
    next unless (defined($size));

    # remove leading/trailing whitespace from every field
    s/^\s+|\s+$//g for ($db_arn, $db_type, $mysql, $db_server, $db_username, $size);

    # drop the line if it is not a mysql instance
    next if ($db_type ne "mysql");
    print("Found mysql instance $mysql\n");

Having found a MySQL database, run the engine-specific logical copy command (in this case mysqldump), dump it to the BoostFS filesystem, and then insert information about the backup into a reporting database (so that DPA can extract it).

    # run the mysqldump command to create the dump file in the BoostFS file system
    # if the endpoint is "None" the DB is still being created (or deleted), so skip it
    next if ($db_server eq "None");

    $start_time = localtime();
    system("$MYSQLDUMP -P 3306 -h $db_server -u $db_username --password='$MYSQLPASS' $mysql > $BOOSTFSDIR/$TODAY/$mysql");
    $retval = $? >> 8;
    $end_time = localtime();

    # insert the backup information into the reporting table

    if ($retval) {
        $status = "Failed";
    } else {
        $status = "Success";
    }
    $retval2 = system("$MYSQL -P 3306 -h $REPDBSERVER -u $REPDBUSER --password=\"$REPDBPASS\" $REPDB -e 'INSERT INTO $REPDBTABLE values(\"$start_time\", \"$end_time\", \"$mysql\", \"$size\", \"$db_arn\", \"$status\")'");
}
close(DBS);

Now that all of this has happened, let’s have a look at the reporting database:

[root@ip-172-31-90-95 backup_scripts]# mysql -P 3306 -h jdvmysql.6db48nd933.us-east-1.rds.amazonaws.com -u mysqluser -p --password="Password_123"
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 271
Server version: 5.6.40 Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> use jdvmysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [jdvmysql]> select * from report;
+--------------------------+--------------------------+----------+------+------------------------------------------+---------+
| start                    | end                      | dbname   | size | longname                                 | status  |
+--------------------------+--------------------------+----------+------+------------------------------------------+---------+
| Tue Feb 12 23:52:08 2019 | Tue Feb 12 23:52:08 2019 | jdvmysql | 20 | arn:aws:rds:us-east-1:118186027194:db:jd | Success |
+--------------------------+--------------------------+----------+------+------------------------------------------+---------+
1 row in set (0.00 sec)

MySQL [jdvmysql]> quit

Now all we need to do is set up a custom external report in DPA and add the script to a cron job.
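The cron entry might look something like this (the script name, paths and 01:00 schedule are assumptions; adjust to taste):

```
# m h dom mon dow  command -- run the extract at 01:00 every day
0 1 * * * /usr/bin/perl /root/backup_scripts/rds_backup.pl >> /var/log/rds_backup.log 2>&1
```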
