Category: data engineering

parsing pdfs with python

Challenges You Will Face When Parsing PDFs With Python – How To Parse PDFs With Python

Scraping data from PDFs is a rite of passage if you work in data. Someone somewhere always needs help getting invoices parsed, contracts read through, or any of dozens of other use cases handled. Most of us will turn to Python and our trusty list of Python libraries and start plugging away. Of course, there are many challenges…

November 19, 2024
unstructured data analytics

What is Unstructured Data? A Guide to Storage, Processing, and Analysis

Much of the data we have used for analysis in traditional enterprises has been structured data. It’s easy for humans to break down, understand, and, in turn, draw insights from. Yet much of the data being created today, and much that will be created, comes in some unstructured format. However, the digital era…

November 13, 2024

What Is AWS DMS And Why You Shouldn’t Use It As An ELT

Recently, I’ve encountered a few projects that used AWS DMS almost like an ELT solution, whether to move data from a local database instance to S3 or to some other data storage layer. It was interesting to see AWS DMS used in this manner. But it’s not what DMS was built for. As…

November 8, 2024
how to lead a data team

9 Must-Watch Videos for Aspiring Data Leaders: Bridging Tech and Business for Data Team Success

Leading data teams can be challenging. You’ve got management and non-technical teams constantly reaching out with ad-hoc data requests, and you’re likely trying to figure out which tools will work best without breaking the bank. Not to mention, you’ve got to bridge the gap between business and technology. All while trying to grow your data…

November 6, 2024

Real-time Analytics Vs Stream Processing – What Is The Difference?

One of the holy grails that many data teams seem to chase is real-time data analytics. After all, if you can have real-time analytics, you can make better decisions faster. However, real-time data analytics is often conflated with stream processing. These are two different concepts that are crucial to understanding how to…

September 3, 2024
airbyte alternatives

4 ELT Alternatives To Airbyte – How To Ingest Your Data

Getting data out of source systems and into a data warehouse or data lake is one of the first steps in making it usable by analysts and data scientists. The question is, how will your team do that? Will they write custom data connectors, pay for an out-of-the-box data connector, or perhaps…

May 8, 2024
change data capture time

Terms You Should Know If You’re Planning To Use Change Data Capture

If you’ve worked in data long enough, then you’ve likely come across the term change data capture. Often called CDC, change data capture involves tracking and recording changes in a database as they happen, and then transmitting these changes to designated targets. This can be crucial because some pipelines, in particular batch pipelines, don’t capture…

April 29, 2024
data engineering videos

10 Great Videos To Help You Learn Data Engineering

How data is structured, managed, and processed will continue to grow in importance as the demand for AI and machine learning increases. It’s unavoidable that as businesses demand that their data teams implement AI, they will also realize that data engineers are a crucial piece of the data pipeline. That means, if you’re looking for…

April 20, 2024
apache druid architecture

Apache Druid’s Architecture – How Druid Processes Data In Real Time At Scale

Recently, I wrote an article diving into what Druid is and which companies are using it. Now I want to take a deeper dive into Apache Druid’s architecture. Apache Druid has several unique features that allow it to be used as a real-time OLAP database. Everything from its various nodes and processes that each have unique…

March 11, 2024
ssis migration project

Alternatives to SSIS (SQL Server Integration Services) – How To Migrate Away From SSIS

SQL Server Integration Services (SSIS) comes with a lot of functionality useful for extracting, transforming, and loading data. It can also play important roles in application development and other projects. But SSIS is far from the only platform that can provide these services. You might seek alternatives to SSIS because you want a more agile…

February 27, 2024