Category: big data

How To Automate PDF Data Extraction – 3 Different Methods To Parse PDFs For Analytics

If you work in data, then at some point in your career, you’ll likely need to parse data from a PDF. You might need to parse thousands of PDFs to pull out invoice information. Or maybe you need to parse financial filing documents such as 10-Ks. This can seem challenging at first. After all,…
October 2, 2024

Real-time Analytics Vs Stream Processing – What Is The Difference?

One of the holy grails that many data teams seem to chase is real-time data analytics. After all, if you can have real-time analytics, you can make better decisions faster. However, real-time data analytics and stream processing are often conflated. These are two different concepts that are crucial to understanding how to…
September 3, 2024

Terms You Should Know If You’re Planning To Use Change Data Capture

If you’ve worked in data long enough, then you’ve likely come across the term change data capture. Often called CDC, change data capture involves tracking and recording changes in a database as they happen, and then transmitting these changes to designated targets. This can be crucial because some pipelines, in particular batch pipelines, don’t capture…
April 29, 2024

Apache Spark Vs Apache Flink – What Is The Difference?

As data increased in volume, velocity, and variety, so did the need for tools that could help process and manage those larger data sets arriving at ever faster speeds. As a result, frameworks such as Apache Spark and Apache Flink became popular because of their ability to handle big data processing…
April 25, 2024

5 Real-Time Data Processing and Analytics Technologies – And Where You Can Implement Them

No matter your industry, you’ll often need to make split-second business decisions in the digital age. Real-time data can help you do just that. It’s information that’s made available as soon as it’s created, meaning you don’t need to wait around for the insights you need. Real-time data processing can satisfy the ever-increasing demand for…
March 1, 2024

7 Data Engineering Projects To Put On Your Resume

Starting new data engineering projects can be challenging. Data engineers can get stuck on finding the right data for their data engineering project or picking the right tools. Many of my YouTube followers agree: they confirmed in a recent poll that starting a new data engineering project was difficult. Here were the key…
May 21, 2023

OLTP Vs OLAP – What Is The Difference

If you’re relying on your OLTP system to provide analytics, you might be in for a surprise. While it can work initially, these systems aren’t designed to handle complex queries. Adding databases like MongoDB and Cassandra only makes matters worse, since they aren’t friendly to SQL, the language most analysts and data practitioners are used to.…
May 8, 2023

Do You Need A Modern Data Stack Consultant

A modern data stack consultant plays an important role in companies looking to become data-driven. They help companies design and deploy centralized data sets that are easy to use and reliable. They do so by using cloud-based solutions that help automate data pipelines and processes with less code than in…
January 25, 2023

What Is The State Of Data Engineering And Infrastructure In 2023

2022 is coming to an end. What is the state of data infra? Are Snowflake and Databricks still fighting over total cost of ownership? Is everyone switching to DuckDB? Are data engineers all learning Rust? Let’s try to answer these questions. Building field-level lineage for modern data systems Lineage…
January 18, 2023

Should We Get Rid Of ETLs?

AWS has jumped on the bandwagon of removing the need for ETLs. Snowflake announced this both with their hybrid tables and their partnership with Salesforce. Now, I do take a little issue with the name “Zero ETLs”, because on the surface the functionality described is often closer to a zero-integration future, which probably…
December 30, 2022