What Is AWS DMS And Why You Shouldn’t Use It As An ELT

What Is AWS DMS And Why You Shouldn’t Use It As An ELT

November 8, 2024 data engineering 0

Recently, I’ve encountered a few projects that used AWS DMS, which is almost like an ELT solution. Whether it was moving data from a local database instance to S3 or some other data storage layer. It was interesting to see AWS DMS used in this manner.

But it’s not what DMS was built for.

As the name suggests, it was built as a database migration service. Which is very different from what you might need an ELT solution for. ELT and ETL solutions allow the end user to do more than just move data from point A to point b; you can actually run transforms and other processes.

So, in this article, I want to cover two topics.

First, what is AWS DMS, and are some more appropriate alternatives for ELTs?

What is AWS DMS?

AWS Database Migration Service (DMS) is a solution that makes it easier for end-users to migrate data from a source database into AWS. Whether you’re transitioning from an on-premises database to Amazon Web Services (AWS) or migrating between different databases in the AWS ecosystem, DMS simplifies this process.

Key Features of AWS DMS

  1. Minimal Downtime: DMS is designed to keep your source database fully operational during the migration. This reduces the risks associated with downtime, ensuring your business can continue to operate while the data moves in the background.
  2. Wide Database Support: It supports various database engines, including MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon RDS databases, among others. It can also migrate data between on-premises databases and Amazon Redshift, a data warehouse.
  3. Homogeneous and Heterogeneous Migrations: DMS supports both homogeneous (e.g., MySQL to MySQL) and heterogeneous (e.g., Oracle to PostgreSQL) database migrations. This makes it flexible for various use cases, whether sticking to the same database or switching to a new one.
  4. Ongoing Replication: With DMS, you can set up continuing replication between your source and target databases, allowing you to keep data continuously synchronized. This is helpful for hybrid environments or staging before the final cutover. This is also why some people use it to ingest data into a data warehouse or reporting system.
  5. Data Transformation and Validation: It offers the ability to transform data during migration to ensure compatibility between source and target databases. Additionally, it validates data during the transfer to ensure that no records are lost or corrupted. But this is minimal.

Why Should You Consider AWS DMS?

Migrating databases is often seen as a complex and risky process. Very few people I know use a script to migrate data. Unless they need some fine-tuned control. Instead, in the past, they’ve likely used solutions like RedGate that make it easy to migrate not only data but often code as well.

In a similar way, AWS DMS reduces these challenges by offering a highly managed, reliable solution. The service helps you streamline the process, providing robust monitoring and control over migrations.

Is AWS DMS A Good ELT Tool?

The short answer is no. 

Don’t get me wrong, AWS DMS can move data from point A to point B, but it is generally limited.

It’s limited to databases.

It’s also limited in how you can configure it to ingest data.

That’s why you should likely consider some form of either ELT or custom solution. There are plenty of options; it’ll just depend on what use case you’d like.

Extracting Data From Databases For Analytics

There are several vital reasons why AWS DMS is not a good fit for traditional data engineering workflows.

But if you need some bullets, here are some:

  • Doesn’t Allow For Transforms: AWS DMS is designed primarily for database migration with minimal data transformation features. It lacks the complex transformation logic that is typically needed in robust ETL workflows.
  • No Support for Advanced Data Processing: For more advanced data processing, aggregations, or transformations, AWS DMS falls short compared to full-fledged ETL tools like Apache NiFi, Talend, or AWS Glue, which offer better flexibility and processing power.
  • Schema Change Management: AWS DMS doesn’t handle schema evolution well during the migration or sync process. Any changes to the source schema may require manual intervention, making it cumbersome for ongoing ELT/ETL tasks.
  • Latency in Real-Time Processing: While DMS supports continuous replication, there may be some latency, making it less ideal for real-time data pipelines that demand near-instantaneous data availability.
  • Error Handling and Data Quality: The error handling and logging mechanisms in AWS DMS are not as robust as those in dedicated ETL tools, which can make troubleshooting and data quality assurance more difficult.
  • Customization and Extensibility: AWS DMS is not easily customizable for specific workflows or data processing needs. Tools like AWS Glue, Databricks, or Apache Airflow provide more flexibility for tailored data engineering tasks.
  • Limited Integrations: AWS DMS mainly focuses on database migrations and has fewer connectors and integration options compared to comprehensive ETL platforms, which support various data sources, APIs, and file types.
  • Pricing Model: Depending on the use case, the pricing model for AWS DMS can become more expensive than other ETL services, especially for continuous data migrations or transformations.
  • Connector Limitations: DMS supports only CDC database sources and Amazon database targets, requiring VPC for all targets. It doesn’t support some cross-region migrations.
  • Initial Load Process: The initial full load of source data by table captures the table at that specific point in time. Subsequent changes are cached and applied before transitioning into CDC-based replication. This approach, reminiscent of old Attunity methods, requires table locks and can burden the database at scale.
  • Scalability Issues: DMS has scalability limitations, such as a 100GB memory limit for replication instances, which complicates handling large tables. This often necessitates manually splitting tables into smaller segments.

Alternatives To AWS DMS

When it comes down to it, there is no shortage of tools. You can just check out the ELT section of the MAD Landscape below. 

data landscape

Again, although AWS DMS isn’t a solution, you should use ELT or at least EL, it has been. So here are three options you have for actual ELTs.

Custom Built Solution

It goes without saying that you can just write your own code to extract data from a database. Especially if you only have a few tables and not a large amount of data. Connect to your database, extract it into a file, and push it into your data warehouse. Maybe you’ve even created a direct connection and can skip the file. But if you can’t, it’s pretty straightforward to write.

Estuary 

Looking away from custom scripts, there are many options in the EL world, ranging from open-source to paid. As this article was spurred by a few possible clients who were using AWS DMS to perform their ELT and who asked about Estuary, I’ll reference it. Estuary Flow provides “Managed CDC, a simple, efficient change data capture from databases with minimal impact and latency. It also offers seamless backfills and real-time streaming out of the box, making it a comprehensive solution for data management.

Upsolver

I’ll also call out Upsolver, which can also perform a similar task. They can help your team ingest data into their data lake, perform SQL transformations, and create iceberg tables. They also offer the ability to ingest data from multiple sources, such as databases, and would generally be a better option than AWS DMS at the correct scale. 

AWS DMS And More

AWS DMS has its place, and I have a few clients that use it, often to move databases from on-prem to the cloud. But it’s not built as an ELT solution. So, although it can move data from point A to B, you’ll likely quickly find yourself fighting against various limitations.

So consider either building a solution that meets your needs or looking into pre-built solutions. That will allow your team to actually perform the required transforms and processes without being hindered. 

Also! Don’t forget to check the articles below.

Back To The Basics With SQL: Understanding Hash, Merge, and Nested Joins

How to analyze merchant homepages visually at scale with SQL

The Data Engineer’s Guide to ETL Alternatives

Explaining Data Lakes, Data Lake Houses, Table Formats and Catalogs

How to cut exact scoring moments from Euro 2024 videos with SQL

How To Modernize Your Data Strategy And Infrastructure For 2025