Preparing Your Data Infrastructure for 2025: Lessons from the Past, Strategies for the Future

research@theseattledataguy.com, December 3, 2024

When I broke into the data world, everyone wanted to hire data scientists who would help their companies become more data-driven. There were statistics about the exabytes of data we were creating and the value it would provide.

However, a few years into my career, the data world started to pivot, or at the very least there was a sudden focus on data engineering.

Now we are about to go into 2025, and many companies are pushing for AI. Everyone wants AI for everything. Leadership wants to be able to go back to their board and say, “Guess what, we have AI implemented in our product.” It’s starting to feel like 2012 all over again.

So, if your team wants to make use of your data, whether for AI or just to answer key business questions, let’s talk about how you can better prepare your data infrastructure for 2025.

Lessons from the Past—Why Data Projects Fail

Let’s take a few lessons from the past. After all, although history may not repeat exactly, it often rhymes. Around 2012, the Harvard Business Review published an article, “Data Scientist: The Sexiest Job of the 21st Century,” which proclaimed that data science was that role. Suddenly everyone in college wanted to become a data scientist. Back then, these roles offered salaries of $100k or more, which for millennials was a lot of money.

However, while new employees wanted to be data scientists, companies were watching Facebook, Google, and other big tech companies start using data to their advantage. Everyone was running towards data science. Many companies quickly learned, though, that their data was not in order. Many of their data warehouses were still siloed, some of their data pipelines had stopped updating, and their data catalog hadn’t been updated in three years.

In many cases, data scientists were forced to become data engineers, even if they didn’t want to. Everyone started using Hadoop, and job descriptions seemed to require data scientists to be PhDs, software engineers, statistical analysts, and everything else in between. Slowly, some companies realized they needed to get their data house in order before really diving into a data science strategy.

Current Data Maturity Assessment

Now, before diving into how your team can better prepare your data infrastructure, not just for AI but for a vast set of use cases, let’s do a quick sanity check.

What are your current team’s capabilities? Now is a great time to assess them by asking the following questions:

Can You Integrate Data? – One of the goals of traditional data warehousing is to allow end-users to join data from multiple sources. At companies like Facebook, this is far easier than at most other companies. Most of their applications are built from the ground up and talk to each other, making integration easy. This isn’t the case for many enterprises. I have seen project management systems that wouldn’t talk to hour-tracking systems and plenty of “joining on name” examples that go wrong.
Do You Trust Your Data? – If this list were in order of importance, this would be first. Do you, or more importantly your stakeholders, trust your data? If they don’t, building and implementing AI into your systems isn’t going to end well. You need to start from a position of trust.
Does Your Team Know What Your KPIs Are? – Lots of data teams know a lot about technology, but not all data teams are well plugged in and understand the core metrics of their business. If the data team doesn’t have at least a baseline understanding of your company, its core metrics, and how they can be influenced, then you’ll want to improve their knowledge. It’ll allow them to make better decisions, and hopefully build better AI products.
Do People Know Where Data Comes From? – I have come into plenty of projects where one data analyst knows where part of the data is, another knows a few tidbits about another data set, and they are the only two people who really know where the data comes from. If they left, the data team’s projects might keep working for a little while, but eventually something would break and no one would know where the problem came from.
Is There A Clear Method For Interdisciplinary Collaboration? – Lots of companies focus on technologies and how they could amplify their ability to get value from data. But one of the places many companies can improve is their ability to work across teams. The more teams know about what is going on in other departments, the more they can see how to work together. That is especially true of the data team. The more they understand the challenges other teams are facing, the more they can actually help.
Can You Scale? – Does your current infrastructure scale with your business needs? Does your data platform make it easy to deploy ML models? As your company grows, the volume, velocity, and variety of data often increase. In fact, I often tell businesses that it’s a good sign. You’re likely selling more products or increasing engagement. Many organizations build systems that work for their current state but fail to plan for future growth. Evaluate whether your pipelines, storage solutions, and processing power can handle growth effectively without breaking the bank. AI models, in particular, can be resource-intensive, so scalability becomes even more critical.
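
To make the “joining on name” problem from the integration question concrete, here is a minimal sketch in plain Python. The two record sets, their field names, and the `join_on` helper are all hypothetical and made up for illustration. The example assumes both systems happen to carry a shared `employee_id`, which in practice is often exactly what is missing.

```python
# Hypothetical exports from two systems that were never designed to talk to each other.
projects = [
    {"employee_id": 101, "name": "Alex Kim", "project": "Billing revamp"},
    {"employee_id": 102, "name": "Alex Kim", "project": "Data catalog"},
    {"employee_id": 103, "name": "Jordan Lee", "project": "Churn model"},
]
hours = [
    {"employee_id": 101, "name": "Alex Kim", "hours": 32},
    {"employee_id": 102, "name": "Alex Kim", "hours": 12},
    {"employee_id": 103, "name": "Jordan  Lee", "hours": 40},  # stray double space
]


def join_on(key, left, right):
    """Inner-join two lists of dicts on one shared key (illustrative helper)."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, "hours": right_row["hours"]}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


by_name = join_on("name", projects, hours)
by_id = join_on("employee_id", projects, hours)

# Compare row counts and total hours for the two join strategies.
print(len(by_name), sum(row["hours"] for row in by_name))
print(len(by_id), sum(row["hours"] for row in by_id))
```

With this sample data, the name join returns four rows instead of three: the two people named Alex Kim are cross-matched, and Jordan Lee silently disappears because of a formatting difference. The ID join returns one row per person.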

These are just some of the key questions your business should consider as you get AI-ready or, if you have already started, as you deploy your AI systems. With that behind us, let’s talk about data infrastructure.

Key Principles for Building a Future-Ready Data Infrastructure

As your data team is building your data infrastructure, they should keep a few key principles in mind.

Build for Flexibility

Flexibility is the cornerstone of a future-ready data infrastructure. This was the case before AI and it’ll be the case long after. You need to be able to adapt your infrastructure to future needs and to changing technologies. Begin by adopting a modular architecture, one where removing or replacing a single component doesn’t break the rest. These architectures help decentralize data ownership and make it easier to adapt to evolving requirements for AI workloads. Using cloud platforms also allows you to leverage elasticity and on-demand resources, which are essential for AI and analytics. Cloud does come with its own costs, so be mindful that there are some benefits to using on-prem solutions. But that’s a point for a different article.
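
As a rough illustration of what “modular” can mean in code, here is a minimal sketch in which a pipeline depends only on small interfaces rather than on a specific tool. Every name here (`Source`, `Sink`, `InMemorySource`, `ListSink`, `CountingSink`, `run_pipeline`) is hypothetical. Real components would wrap a warehouse, an API, or a file store.

```python
from typing import Iterable, Protocol


class Source(Protocol):
    """Anything that can hand the pipeline rows."""

    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    """Anything that can accept rows and report how many it took."""

    def write(self, rows: Iterable[dict]) -> int: ...


class InMemorySource:
    def __init__(self, rows):
        self._rows = list(rows)

    def read(self):
        return iter(self._rows)


class ListSink:
    """Keeps every row it receives."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        batch = list(rows)
        self.rows.extend(batch)
        return len(batch)


class CountingSink:
    """Drop-in replacement that only counts rows."""

    def __init__(self):
        self.total = 0

    def write(self, rows):
        count = sum(1 for _ in rows)
        self.total += count
        return count


def run_pipeline(source: Source, sink: Sink) -> int:
    """Drop rows with a missing amount, then hand the rest to the sink."""
    cleaned = (row for row in source.read() if row.get("amount") is not None)
    return sink.write(cleaned)


source = InMemorySource([{"amount": 5}, {"amount": None}, {"amount": 7}])
print(run_pipeline(source, ListSink()))
print(run_pipeline(source, CountingSink()))  # swapped sink, pipeline unchanged
```

The point of the sketch is that `run_pipeline` never names a concrete source or sink, so either one can be replaced without touching the rest.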

Emphasize Data Quality

Without high-quality data, even the most advanced AI systems will fail. That’s why your data team needs to develop data quality systems that help ensure data is as error-free as possible. It also helps build trust and allows you to fully automate your data systems. There are a few different ways data teams may set up checks.

For example, companies using dbt might implement dbt tests, or perhaps integrate a solution like Elementary, which provides data freshness, anomaly, and row count checks out of the box. Still others will create Airflow tasks that run prior to pushing data into production to ensure the data is accurate.
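
Whatever tool runs them, the checks themselves tend to be simple. Below is a minimal sketch in plain Python of a row count, freshness, and not-null check that could run before data is promoted to production. It is not dbt, Elementary, or Airflow code. The function names, the `loaded_at` and `amount` columns, the six-hour freshness threshold, and the sample batch are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_row_count(rows, minimum):
    """Pass when the batch has at least `minimum` rows."""
    return len(rows) >= minimum


def check_freshness(rows, column, max_age, now):
    """Pass when the newest timestamp in `column` is within `max_age` of `now`."""
    if not rows:
        return False
    newest = max(row[column] for row in rows)
    return now - newest <= max_age


def check_not_null(rows, column):
    """Pass when no row has a missing value in `column`."""
    return all(row.get(column) is not None for row in rows)


def run_checks(rows, now):
    """Run every check and return {check_name: passed}."""
    return {
        "row_count": check_row_count(rows, minimum=1),
        "freshness": check_freshness(rows, "loaded_at", timedelta(hours=6), now),
        "amount_not_null": check_not_null(rows, "amount"),
    }


# Hypothetical batch waiting to be promoted to production.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": 1, "amount": 20.0,
     "loaded_at": datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)},
    {"order_id": 2, "amount": None,
     "loaded_at": datetime(2025, 1, 2, 10, 0, tzinfo=timezone.utc)},
]

results = run_checks(batch, now)
failed = [name for name, passed in results.items() if not passed]
print(failed)  # a non-empty list means: do not publish this batch
```

In this sample, the batch is fresh and non-empty but has a missing `amount`, so the not-null check is the one that blocks the load.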

Ensure Governance and Security

Data governance and security are critical as organizations work with increasingly sensitive data. Develop strong governance frameworks to establish ownership, compliance, and accountability across teams. Implement role-based access control to ensure that only authorized users can access specific datasets. Encryption and auditability should also be prioritized to safeguard data from breaches and meet regulatory requirements.
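
To show the shape of role-based access control with an audit trail, here is a minimal sketch in plain Python. In practice this usually lives in your warehouse or platform’s permission system rather than in application code. The role names, dataset names, users, and helper functions below are all hypothetical.

```python
# Hypothetical mapping of roles to the datasets they may read.
ROLE_GRANTS = {
    "analyst": {"sales_summary"},
    "data_engineer": {"sales_summary", "raw_orders"},
    "finance": {"sales_summary", "payroll"},
}

# Hypothetical mapping of users to roles.
USER_ROLES = {
    "dana": {"analyst"},
    "sam": {"data_engineer", "finance"},
}

audit_log = []  # every access attempt is recorded, allowed or not


def can_read(user, dataset):
    """True when any of the user's roles grants access to the dataset."""
    roles = USER_ROLES.get(user, set())
    return any(dataset in ROLE_GRANTS.get(role, set()) for role in roles)


def read_dataset(user, dataset):
    """Check access, record the attempt, and refuse unauthorized reads."""
    allowed = can_read(user, dataset)
    audit_log.append({"user": user, "dataset": dataset, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{user} may not read {dataset}")
    return f"rows from {dataset}"  # placeholder for a real query


print(read_dataset("sam", "payroll"))
try:
    read_dataset("dana", "payroll")
except PermissionError as error:
    print(error)
print(len(audit_log))
```

Access is granted to roles rather than to individuals, and denied attempts are logged alongside allowed ones, which is what makes the trail useful for audits.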

Integrate ML/AI Pipelines

To prepare for AI-driven workloads, align your existing data engineering pipelines with machine learning tooling such as MLflow or a feature store. These tools help integrate AI models into production and keep data and models versioned and reproducible. Reproducibility is especially critical in AI to maintain trust and reliability. By integrating ML capabilities early on, your infrastructure can support iterative development and experimentation, making it easier to scale AI initiatives over time.
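
The core idea behind versioned, reproducible runs can be sketched without any particular tool. The example below is not MLflow’s API. It is a plain-Python stand-in that fingerprints the training data and parameters so that two runs can be compared. The `fingerprint` and `record_run` helpers and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 of any JSON-serializable object (dict key order ignored)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_run(training_rows, params):
    """Capture what is needed to tell whether two runs used the same inputs."""
    return {
        # Note: row order matters here; sort rows first if order is not meaningful.
        "data_version": fingerprint(training_rows),
        "params_version": fingerprint(params),
        "params": params,
    }


# Hypothetical training data and model parameters.
training_rows = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]
params = {"model": "logistic_regression", "C": 1.0}

run_a = record_run(training_rows, params)
run_b = record_run(training_rows, params)
run_c = record_run(training_rows + [{"customer_id": 3, "churned": 0}], params)

print(run_a["data_version"] == run_b["data_version"])  # same inputs
print(run_a["data_version"] == run_c["data_version"])  # data changed
```

If a model’s results change between runs, comparing these two versions is a quick way to tell whether the data moved, the parameters moved, or neither did.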

Steps to Prepare Your Data Team for 2025

Now there are plenty of areas your team can focus on to be more AI-ready for 2025. You can improve your data tooling, upskill employees, and so on. We’ll cover a few of these below, but really, much of your focus should go toward improving the alignment between your business and your data team.

Upskill and Empower Your Team

To prepare for the evolving demands of AI and advanced analytics, invest in building your team’s skills. You can do so by providing training in AI technologies, MLOps, and advanced data engineering tools. If you’re looking for help upskilling your data team, our team can help. Feel free to reach out and we’ll help put together programs to upskill your teams.

Align Data Strategy with Business Needs

AI and data-driven initiatives should solve real business problems. This is where many businesses and data teams make mistakes. They pick projects that sound good and are “cool” but don’t drive business impact. If you’re looking for a place to really understand how to find projects and develop a data strategy that aligns with your business needs, then check out this article.

Build a Culture of Experimentation

There is a fine balance between making sure your team focuses on high-impact, high-ROI data projects and testing out new ideas. After all, innovation thrives in an environment that encourages experimentation. Set up sandbox environments where teams can safely test AI technologies without disrupting core systems. Promote iterative development to refine ideas and deliver better solutions over time. Also, give your team space to test out ideas and fail. Not all data projects will be successful, which is OK as long as failure is only occasional.

Strengthen Collaboration Across Teams

The most successful data initiatives involve cross-functional collaboration. Just because a team implements a model successfully doesn’t mean the outcome will be good for the business. In fact, I have a few stories where what seemed like a successful project implementation eventually had to be rolled back because, although the project technically worked, the outcome wasn’t positive for the business. So encourage your data team to partner with other departments, from marketing to operations, to ensure your projects address the organization’s broader needs. This interdisciplinary approach often leads to more meaningful outcomes.

Look For Tooling To Help Make Your Life Easier

Tools and solutions without good processes can only take you so far. But there are a lot of tools out there that can make your life considerably easier as an engineer. Instead of spending time on tasks like creating API data connectors, you can use an out-of-the-box ELT solution. Instead of trying to build your own version of a data orchestrator, you can use something out of the box like Airflow. If you need help finding the right data tools, feel free to reach out!

The Time to Act Is Now

Data can be a valuable asset to a company. But if you rush into building AI and data science solutions without first building reliable data sets and the infrastructure that makes it easy to build data products, you will end up with little more than a fancy MVP.

This takes time. It can be done, but it requires more than just signing expensive contracts with data tooling vendors. It requires improving your team’s processes and the communication between them and the departments around them. Teams must align on business goals, ensure stakeholders trust the data, and develop a culture of collaboration and continuous improvement.

By taking a deliberate and structured approach, you’ll not only be able to build AI solutions but also create a robust data foundation that drives meaningful business outcomes. The companies that succeed in 2025 and beyond will be the ones that start today, investing in the fundamentals and fostering a vision for innovation and adaptability.