How To Set Up Your Data Analytics Team For Success – Centralized vs Decentralized vs Federated Data Teams

September 13, 2023

Success in the data world hinges on team setup.

I’ve delved into onboarding and standards in previous articles, but never into the structure of data teams. Typically, there are three configurations: Centralized, Decentralized, and Federated. Most companies I’ve seen use a mix of these.

While the newest tech breakthroughs grab headlines, team organization is the unsung hero. A well-structured team boosts business impact, streamlines communication, and enhances information sharing among internal data units.

So, let’s dive into these setups and their real-world implications.

Addressing Hybrid Teams

Before we go further, remember: data teams aren’t confined to one structure.

For instance, you might centralize your data engineers but have your data analysts decentralized. Or, you could have one unified data team for the whole business.

As we explore each setup, keep in mind that centralized, decentralized, and federated are just high-level categories. Many hybrid setups are possible.

Centralized Data Teams

Most companies start with a centralized data team. It is easy to set up and is generally the most strategic choice early on.

Your data engineers and analysts work together, and the business units come to them with their project needs. Many people seem to believe this approach is found mostly at start-ups.

However, I have worked with several large organizations that used a centralized data team. Generally, this was because the organization treated data more like traditional IT, viewing it as a cost center rather than a strategic partner.

In other words, they really wanted traditional reporting and not much more.

Pros of a Centralized Data Team:

  1. Consistency: Having a centralized team ensures consistency in data management practices, tools, and technologies across the organization. As companies expand, they tend to pick up more and more solutions. Suddenly, instead of just using Snowflake, a company might be using Snowflake, BigQuery, Postgres, and more. When data work is centralized, you can avoid some of this sprawl (at least a little).
  2. Economies of Scale And Limiting Duplicative Work: Centralizing resources can lead to cost savings because it eliminates duplicated effort and allows for better use of resources. Duplicated work actually wasn’t uncommon at Facebook: you might take on a project that another team or individual was already working on, and it’s hard to know that when you’re in your own little data pod.
  3. Knowledge Sharing: Centralizing the data team can facilitate better knowledge sharing and collaboration among team members. This can happen because they sit in the same place and talk daily about what is going on in the business, or simply because they can all see the same Kanban board.

Cons of a Centralized Data Team:

  1. Slower Response Time: A centralized team may be slower to respond to individual business units’ specific needs and requirements, as there may be more layers of bureaucracy and decision-making. This can also push your operations teams to create their own shadow data teams, defeating the whole purpose of the centralized approach. The problem is often amplified when the data team is small and lacks the budget of a sales or marketing organization.
  2. Risk of Being Out of Touch: The centralized data team may not have as deep an understanding of the specific needs and challenges of individual business units, leading to solutions that are less tailored or effective. It can also create an us-vs.-them feeling because, instead of being aligned with the business, the data team’s goals may be more internally focused.
  3. Potential for Lower Engagement: Business units may feel less ownership of and engagement in data projects if those projects are entirely managed by a separate centralized team. This reinforces the us-vs.-them feeling. If the business feels the data team isn’t being empathetic or isn’t producing the results it wants, it will seek out other options.

Why Companies Move Away From A Centralized Data Team

There are many reasons companies move away from centralized data teams. Here are a few I have seen:

  • Marketing, sales, and other departments are building their own shadow data teams anyway.
  • Scale and rapid growth often push companies to decentralize their data teams so they can respond more effectively to business questions. At the very least, they may give teams like sales designated analysts.
  • A company may want to better align the work of its data analysts and other data professionals with the goals of individual business units.

Decentralized Data Teams

Decentralized teams are sometimes also referred to as embedded teams. Now, this is where things can get tricky; some companies I have worked with (or for) have a centralized data infrastructure and data engineering team.

However, the analysts are the ones who are decentralized or “embedded.” This gives each department a specialized analyst who works inside it and is better aligned with its needs.

At the same time, the data engineering and infrastructure teams can benefit from economies of scale by standardizing the core infrastructure (this can actually start to lead toward the federated approach discussed later).

Another approach is to integrate both data engineers and analysts into functional pods. In my experience, though, this is when companies start to federate technology too.

Pros of Decentralized Data Teams:

  1. Responsiveness: Decentralized teams can respond quickly to the specific needs and requirements of individual business units because they are closer to the action and have a deeper understanding of the unit’s particular challenges. Of course, this can be for better or worse. Suddenly a data analyst can become the “can you pull this data real quick” person. If there is no barrier between the analyst and the marketing team, the analyst may end up acting as a service org of one.
  2. Tailored Solutions: Decentralized teams can develop solutions tailored to the specific needs and challenges of individual business units. This can work for a while, but in my experience, having some form of central data repository is beneficial. Even if every team manages its own namespace or schema, a central repository makes dealing with regulations and compliance (among other things) far easier.
  3. Contextual Understanding: Probably one of the biggest benefits is that data professionals gain far better context on what they are working on. When you’re actually embedded in the team, meeting with its leads and working on its problems daily, it’s easier to figure out how to prioritize projects and how to deliver them.

Cons of Decentralized Data Teams:

  1. Inconsistency: Having decentralized teams can lead to inconsistencies in data management practices, tools, and technologies across business units. This makes it harder to integrate data across the organization and may result in lower data quality and reliability. It also makes switching teams harder, since people moving to a new team must learn both the business context and the new tools.
  2. Duplication of Efforts: Decentralized teams may duplicate effort, as different business units develop similar solutions independently. This happens a lot. When I worked at a hospital, I built several dashboards that other teams were also working to deliver, and we didn’t learn about the duplicated work until both teams had finished.
  3. Challenges in Knowledge Sharing: Decentralized teams may struggle to share knowledge and best practices across the organization because they are more siloed. That said, I find this can be mitigated with systems that encourage people to search through each other’s code and work, such as data catalogs and automated wiki pages.
  4. Difficulty in Aligning with Organizational Goals: In my experience working on decentralized teams, this is the most salient issue. Decentralized teams are closely aligned with their business units, but it’s easy for them to lose sight of the bigger picture.

Why Companies Look Into Adding A Federation Layer

I haven’t personally seen a company switch from a decentralized approach to a more federated one as an explicit business goal. However, here are the reasons I suspect it happens:

  • The company is dealing with far too many contracts and vendors and wants to simplify what it manages.
  • The company wants to make it easier for its analysts, data engineers, and data scientists to move between teams.
  • The company wants to re-center some of the data teams’ goals so they align not just with the business units but with the organization as a whole.

Federated Data Teams

When I researched this topic, several articles treated federated as a form of decentralized rather than a separate concept. Perhaps this is because a decentralized setup can quickly lead to some federation of those teams and their standards.

From my perspective, a federated model means that the teams, the work they complete, and the goals they take on remain decentralized. The individual data teams are united under one company and its goals, but their work is separate.

However, tooling and processes tend to be set and managed by a centralized team.

For example, at Facebook, most teams took on the work they felt was important to complete. When it came to tooling, however, there wasn’t much choice (people were free to use what they wanted, but the amount of support provided for the core tools made it difficult not to use them).

Other teams decided on and managed the core tooling and general infrastructure:

  • Everyone used Dataswarm for task orchestration.
  • Everyone used Spark or Presto to process data (and even that ran through Dataswarm).
  • Everyone used Tableau or an internal BI tool for dashboards.
  • All of those tables were centralized in a single data catalog and data lineage solution.

This made it easy for any data engineer to move between teams and allowed Facebook to streamline its onboarding. In addition, even when buying from external vendors, a company benefits from picking a single solution for each need, as Facebook did with Presto and Dataswarm, because it reduces operations work and the number of maintenance contracts.

Pros of the Federated Model:

  1. Contextual Understanding: As in the decentralized approach, each business unit still has an embedded data team, so that team keeps the context of what its business partners need.
  2. Skill Development And Transferability: If all the teams use the same solutions, data engineers or analysts who transfer teams don’t face the same learning curve. They only need to learn the new business, not the tools.
  3. Economies of Scale And Duplicative Work: By centralizing the expertise and resources, the organization can achieve economies of scale and avoid duplication of efforts. You don’t need to have one team using Airflow, one team using SSIS, and another using who knows what. Instead, you can have one managed instance that everyone is using.

Cons of the Federated Model:

  1. Potential for Bureaucracy: There is a risk that the federated approach could become bureaucratic and slow to respond to the specific needs and challenges of individual business units. This in turn can lead to a slowdown in innovation.
  2. Slows Innovation: This point is interesting. If your data infrastructure team is focused on constantly innovating and improving, you won’t run into this issue. But if the core data infrastructure team isn’t incentivized to improve its infrastructure, then when individual business units need better infrastructure, they may not get it.
  3. Some Engineers May Find This Boring: At Facebook, one of the issues many data engineers faced was boredom (I had at least a dozen calls or 1:1s with them about it). When they were hired, they expected to set up data infrastructure, work on Big Data problems, and write a lot more code. Instead, they found that most of their work was heavy on SQL and Airflow-like orchestration. Don’t get me wrong, some data engineering teams were very code- and infrastructure-heavy. However, because much of the baseline infrastructure was managed by software engineers, it took away some of the fun for some data engineers.

As a final thought about federated teams, I’d love to hear about a case where a company switched from a federated approach to a centralized one, if that has ever happened.

Finding The Right Fit

Finding a structure that keeps data teams from feeling overworked and out of alignment, while still getting business teams results quickly, can help ensure your business has a successful data program.

Data teams can drive a lot of value for companies. However, it’s crucial to put data teams in environments that allow them to provide that value. Yes, tools play a role in that, as do the people you hire.

And so does how you organize your teams. The right team organization balances business alignment, speed, collaboration, and maintenance costs. These are the key trade-offs to weigh when you set up your data organization.

Thanks so much for reading, and I will see you in the next one!
