What Is The State Of Data Engineering And Infrastructure In 2023

January 18, 2023 | Analytics Engineering, Big Data, Data Driven Culture
Photo by Sigmund on Unsplash

2022 is coming to an end. What is the state of data infra?

Are Snowflake and Databricks still fighting over total cost of ownership?

Is everyone switching to DuckDB?

Are data engineers all learning Rust?

Let’s try to answer these questions.

 

Building field-level lineage for modern data systems

Lineage is a critical component of any root cause, impact analysis, and overall analytics health assessment workflow. But it hasn’t always been easy to create, particularly at the field level. In this session, Mei Tao, Helena Muñoz, and Xuanzi Han (Monte Carlo) tackle this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). Learn how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.

Presenters: 

  • Mei Tao: Mei Tao is a product manager at Monte Carlo, a data reliability company. Prior to joining Monte Carlo, Mei worked in product management at NEXT Trucking and Product Strategy at Coinbase. She received her MBA from Harvard Business School and her B.S. in Statistics from the University of California, Berkeley.
  • Helena Muñoz: Helena Muñoz is a senior front-end engineer at Monte Carlo. Previously, she served as senior software engineer at Portchain and a team lead at Infragistics.
  • Xuanzi Han: Xuanzi Han is a senior back-end software engineer at Monte Carlo. Previously, she worked on Uber’s Marketplace Intelligence team building the systems to deploy the global ridesharing platform at scale.

Hamilton: Natively bringing software engineering best practices to Python data transformations

At Stitch Fix, a data science team’s feature generation process was causing iteration and operational frustrations in delivering time-series forecasts for the business. The problem wasn’t the scale of the data; it was their code. In this talk I’ll present Hamilton, a novel open source Python framework that solved their pain points by changing their working paradigm.

Specifically, Hamilton enables a simpler and more productive approach for data science & data engineering teams to create, maintain, execute, and scale both the code (human) and computational sides of feature/data transforms. In this talk I’ll cover the motivation & backstory, what the Hamilton paradigm is and how it works, and the journey thus far.

Data Cloud Optimization & FinOps: An Efficiency Multiplier

Data Clouds like Snowflake and Databricks have become the cornerstone of data-driven innovation as enterprises move their data and analytics efforts to the cloud to expedite the time to insight for business decisions and develop modern data apps. Additionally, the industry is transitioning to consumption-based pricing models. As companies increasingly run workloads in the cloud, cost governance, workload optimization and operational discipline are becoming key priorities for data engineering teams and finance executives. With a better understanding of cloud usage and an easier way to optimize resources, both engineering and finance teams can more effectively enforce accountability and improve the efficiency of their Data Cloud investments. 

A new generation of tools is needed to continuously optimize data efficiency, balance costs without slowing down innovation, and improve cross-functional team productivity. Listen as Mingsheng Hong, co-founder & CEO at Bluesky, provides insight into a new generation of data tools that help users observe and optimize the way they use data and automate safeguards that allow experimentation and innovation, all while preventing wasteful spending.

Sign Up

If you’re looking to sign up, you can do so here.

In addition to our amazing talks we have several great sponsors.

Sponsors

Select Star gives you an automated data catalog, lineage, and usage analysis across thousands of datasets, so you & your team can find and understand data easily.

The Data Stack Show: Conversations at the intersection of data engineering and business.