
MySQL and PostgreSQL are the most popular open-source relational databases. If you are looking to move from on-premises to a public cloud, AWS offers several options with rich functionality:
- Run these databases on AWS EC2 compute instances and self-manage them.
- Leverage managed services from AWS using RDS for MySQL or PostgreSQL.
- Use AWS Aurora for MySQL or PostgreSQL.
- Use AWS Aurora Serverless for MySQL or PostgreSQL.
That's a lot of options, and it might even be overwhelming. Let's look at the pros and cons of each one.
MySQL and PostgreSQL databases in AWS EC2
In this case, you choose the right EC2 instance type based on your compute and memory needs, and the right storage tier based on your IOPS requirements. You then install the MySQL or PostgreSQL database software and manage the infrastructure, OS, and database yourself, just as you did on-premises. The burden of keeping it all running smoothly is still on you.
If you and the application team want a fully managed offering (database as a service), consider using AWS RDS for MySQL or PostgreSQL.
AWS RDS for MySQL and PostgreSQL
AWS RDS is a managed service offering from AWS: AWS takes care of OS and database patch management, upgrades, and snapshots.
When your application team needs a production database, you can deliver it in minutes. When load increases (say, before Thanksgiving or Christmas), you can add compute and memory capacity in minutes.
When your database size is growing and you need to assign more storage, you can do it in minutes.
If you want to improve performance by offloading reads from production RDS to read replicas, you can set it up in minutes.
If you need high availability in another Availability Zone with synchronous replication, you can enable it with a checkbox, again in minutes.
As you can see, everything can be done in minutes. This reduces the operational burden and increases the speed at which your application team can deliver functionality. Everybody wins.
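As a rough sketch of what these operations look like through the API, the dicts below mirror the parameters you would pass to boto3's RDS client calls (create_db_instance, modify_db_instance, create_db_instance_read_replica). The identifiers and instance classes are made-up examples, not recommendations:

```python
# Hedged sketch: these dicts show the shape of RDS API requests for the
# scenarios above. All names and sizes here are hypothetical examples.

def provision_request(db_id: str, engine: str) -> dict:
    """Parameters to create a managed MySQL or PostgreSQL instance."""
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": engine,                  # "mysql" or "postgres"
        "DBInstanceClass": "db.m5.large",  # compute/memory tier
        "AllocatedStorage": 100,           # GiB; can be grown later in minutes
        "MultiAZ": True,                   # synchronous standby in another AZ
    }

def scale_up_request(db_id: str) -> dict:
    """Parameters to add compute/memory before a seasonal peak."""
    return {
        "DBInstanceIdentifier": db_id,
        "DBInstanceClass": "db.m5.2xlarge",
        "ApplyImmediately": True,
    }

def read_replica_request(source_id: str, replica_id: str) -> dict:
    """Parameters to offload reads to a read replica."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
```

In practice each dict would be unpacked into the corresponding boto3 call, e.g. `rds.create_db_instance(**provision_request("app-db", "postgres"))`.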
But there is a downside.
All RDBMSs use redo log files to record transactions. They do this to maintain ACID guarantees, so that in the rare event the system goes down, the redo logs can be used to replay committed transactions.
The challenge is that every logical database write triggers multiple physical writes to the storage layer: first to the redo log files and later to the actual data files/pages as well. IO capacity between the database and storage that could have been spent purely on writing transactions is consumed by this unavoidable housekeeping.
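The effect of this write amplification can be sketched with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not AWS measurements:

```python
# Illustrative sketch: each logical commit costs several physical writes
# (redo log + data page + checkpoint/double-write overhead), so the
# storage IOPS budget supports fewer commits than it might appear.

def effective_commits_per_sec(storage_iops: int, writes_per_commit: int) -> float:
    """Logical commits per second a given storage IOPS budget supports."""
    return storage_iops / writes_per_commit

# Hypothetical example: 3000 provisioned IOPS, ~3 physical writes per
# logical commit leaves ~1000 commits/sec instead of 3000.
print(effective_commits_per_sec(3000, 3))  # 1000.0
```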
AWS Aurora
AWS Aurora addresses the IO overhead between the database server and storage with an innovation in the storage layer.
For Aurora, “the log is the database”. i.e. when the database server writes to redo log, there is no additional IO performed between the database server and storage. Instead, the storage layer takes care of applying redo logs when the pages are read.
Thus, compute and storage can scale independently, delivering great performance. Because of this unique architecture, Amazon claims Aurora can deliver up to 5x the throughput of MySQL and 3x the throughput of PostgreSQL.
Aurora also offers additional benefits such as:
- It writes six copies of data across three Availability Zones with a 4-of-6 write quorum, so it can survive the failure of an entire Availability Zone, plus one more node for reads, without losing data.
- You can have up to 15 low-latency read replicas. This further boosts performance by offloading reads from the production database to the replicas.
- You get continuous backups to S3. You can recover to any point-in-time.
- If the system crashes, recovery is very fast because Aurora doesn't have to replay redo log files at startup. Instead, the storage layer applies the redo records automatically as pages are read.
- Aurora also lets you create storage-optimized clones in minutes. These clones are copy-on-write, so they consume extra storage only as data changes, and they are writable. You can use them for test/dev purposes.
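The quorum arithmetic behind the first bullet can be sketched in a few lines. The constants reflect Aurora's published design (six copies across three AZs, write quorum of 4, read quorum of 3); the helper functions are just illustrative:

```python
# Sketch of Aurora's quorum arithmetic: 6 data copies spread across
# 3 Availability Zones (2 per AZ), write quorum 4-of-6, read quorum 3-of-6.

COPIES, WRITE_QUORUM, READ_QUORUM = 6, 4, 3

def can_write(failed_copies: int) -> bool:
    """Writes succeed while at least WRITE_QUORUM copies remain."""
    return COPIES - failed_copies >= WRITE_QUORUM

def can_read(failed_copies: int) -> bool:
    """Reads succeed while at least READ_QUORUM copies remain."""
    return COPIES - failed_copies >= READ_QUORUM

print(can_write(2))  # True: a whole AZ (2 copies) can fail
print(can_read(3))   # True: an AZ plus one more node can fail for reads
print(can_write(3))  # False: writes wait until copies are repaired
```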
You can provision an AWS Aurora instance by choosing a specific compute and memory instance.
Or you can choose the serverless option, where capacity grows and shrinks automatically, measured in Aurora Capacity Units (ACUs).
If you still prefer to run MySQL and PostgreSQL databases on EC2 instances, that's fine. You can start there, make sure your multi-tier application works well in AWS, and then, at some point, easily migrate the databases from EC2 to AWS Aurora.
Cloud Backup, Cloud Disaster Recovery, and Database Cloning for Test Data Management and Analytics
While you are running your databases in AWS EC2, you need an enterprise-grade data management solution in AWS: backup, recovery, and database cloning for test/dev and data warehouse analytics. Pick a solution that delivers:
- Incremental-forever backups with database consistency. This reduces your RPO (recovery point objective).
- Instant mount and recovery in minutes, even for multi-TB databases. This reduces your RTO (recovery time objective) to minutes.
- Reuse of backups to rapidly provision database clones, in minutes, for data warehouse and analytics or Dev/QA environments.
Schedule a short discussion with an Actifio subject matter expert for the data management use cases in AWS or any other cloud such as Azure, GCP, and IBM cloud.
For more details on how AWS Aurora works, check out this blog.