How To Scale Your Data Team’s Impact Without Scaling Costs
As you expand your analytical processes and capabilities, you’ll inevitably increase costs. But there are concrete ways to keep those costs from growing at an unsustainable rate. That was the topic of a panel at the Modern Data Stack Conference featuring Maura Church, former director of data science and data engineering at Patreon.
As many heads of data are currently being forced to downsize or rationalize their spending, the question becomes, “How do you build a data stack without increasing costs drastically?” To help answer this question, I interviewed Maura to get an idea of how she set up a successful data team.
As Maura put it, once you’ve proven the power of data and your data team at your company, the next questions are, “How can I get more?” and “Will it come at a greater cost?” If so, are there trade-offs you can make to keep maximizing your output without running up a massive cloud bill?
In this article, we will discuss how Maura grew the data engineering and data science teams at Patreon and made sure to balance costs and output.
Growing Patreon’s Data Team
When Maura started at Patreon, the entire company was only 45 people, with a two-person data team. At first, it grew slowly: they added another member, and Maura became the team’s manager. At the time, they were mostly using Google Sheets and a copy of the production MySQL database for much of their analysis and data management.
Now, after growing the team for several years, they have a data engineering team of about seven people and a data science and analytics team of about 18 people.
Throughout that period, Maura helped take the team from doing analysis in Google Sheets to running a centralized data warehouse in Redshift that powered data use cases across the company, along with a production machine learning ecosystem in Databricks.
Each of these decisions was driven by actual use cases and the business’s need for better, faster reporting and analysis. Similarly, as they shifted to better data warehouses, they also started looking at other solutions to amplify their effectiveness without growing their costs too quickly.
But as time went on, costs grew, not just on the data team but in other departments as well. Maura and her team, in turn, started putting together initiatives and programs to manage this.
All that said, what was really driving up costs?
What Causes Data Team Costs to Grow
To better understand how to scale a data team without drastically increasing costs, we must first understand what drives data team costs up. In Maura’s case, she listed several factors that risked increasing costs.
This included:
- Increasing compute and storage costs
- Tool bloat
- Third-party paid API usage
- Mismatched skill levels for tasks
All of these posed a high cost risk while reducing the value the team delivered.
Here is a deeper dive into how some of these costs can impact a data team.
Tool Bloat and Vendor Costs
In particular, tool bloat and computational costs pose considerable risks to many teams. Yes, one or two small $20k contracts don’t mean too much. But once you’re at 5, 6, or 7 contracts, each $20k+, your company is suddenly spending the equivalent of one or two salaries on tools.
Before even driving value.
We’ll talk a bit more about this later, but Patreon, as a high-growth startup, had hundreds of vendors and contracts across the company contributing to costs, a portion of which the data team was responsible for.
But, of course, some vendors, such as Databricks, don’t charge a flat fee and instead bill by usage, which can also massively drive up costs.
Compute and Databricks
An all too familiar cost to any data team is compute. In this modern era of infinite computation, cloud bills can also be, well, infinite (so to speak). This was another area where Maura and her team saw costs rising. Not to skip to the punchline, but in this case, an easy cost-reduction opportunity was evaluating whether ML models were being run more frequently than the value they added justified.
Now, I am all too familiar with machine learning models running on systems and being the heaviest form of compute. At Facebook, we often had to ask ML engineers to fix or stop their models because they’d run up our quota.
In this case, the ML models could be run less frequently. Of course, Maura’s team first analyzed accuracy and business impact to figure out how often retraining was actually needed, and whether changing the model architecture could reduce compute or other costs.
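Maura’s exact analysis isn’t public, but a minimal sketch of the idea might look like the following: train on one week’s data snapshot, score the model against each subsequent week, and measure how long it takes accuracy to fall below a business-defined floor. The `load_weekly_batches` helper and the accuracy floor here are hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical helper: returns weekly (features, labels) snapshots,
# oldest to newest. Swap in however your warehouse exposes history.
weekly_batches = load_weekly_batches()

ACCURACY_FLOOR = 0.80  # assumed business-defined minimum acceptable AUC


def weeks_until_stale(train_idx, batches, floor=ACCURACY_FLOOR):
    """Train once on week `train_idx`, then score each later week
    until performance drops below the floor."""
    X_train, y_train = batches[train_idx]
    model = GradientBoostingClassifier().fit(X_train, y_train)
    for offset, (X_eval, y_eval) in enumerate(batches[train_idx + 1:], start=1):
        auc = roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])
        if auc < floor:
            return offset  # model went stale after `offset` weeks
    return len(batches) - train_idx - 1  # stayed above the floor all window


# Staleness measured across several starting weeks gives a defensible
# retraining cadence: if models stay above the floor for ~6 weeks, a
# weekly retraining job is burning roughly 6x the compute it needs.
horizons = [weeks_until_stale(i, weekly_batches)
            for i in range(len(weekly_batches) - 8)]
print(f"Median weeks before staleness: {np.median(horizons):.0f}")
```

The same framing applies to scoring jobs: if downstream consumers only read predictions once a day, an hourly batch job is pure waste.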
Let’s take a step back from tactical and easy wins and talk about what data teams need to do to reduce costs from a higher level.
How Can You Avoid Costs Rising Too Fast
Now that we know how data teams can cause costs to explode…
How can you reduce your data infrastructure costs?
How can you reduce your data team costs?
Maura realized several changes were needed, ranging from culture to process.
What Cultural Changes Are Needed
The major shift that needed to occur at Patreon was that data scientists and engineers had to become aware of cost and prioritize it. In the past, it hadn’t been a metric they focused on heavily.
However, as Patreon’s data needs scaled, they started treating cost reduction as a key objective, not just for the data teams but for all teams.
Along with that new team-wide focus on cost savings, Maura noted, they also started celebrating wins big and small, like when an engineer found major cost savings by simply changing one AWS config file.
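The article doesn’t say which config that was, so purely as a hypothetical illustration of a one-file win (the bucket name and prefix are made up), here’s the kind of change that often yields one: an S3 lifecycle rule, set here via boto3, that tiers old warehouse exports down to cheaper storage.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket; the actual config Patreon changed isn't public.
BUCKET = "example-warehouse-exports"

# Move objects to infrequent-access storage after 30 days and expire
# them after a year -- old exports rarely get read, but Standard-tier
# storage bills as if they might at any moment.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-exports",
                "Status": "Enabled",
                "Filter": {"Prefix": "exports/"},
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```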
Creating a culture that cares about costs is an important step. Otherwise, you’ll just end up back with an ever-increasing bill when someone introduces a new expensive process.
What Process Changes Are Needed
Scaling the data team while continuing to keep costs low also required process changes.
The data science team started having a recurring agenda item focused on cost-saving opportunities. They’d essentially ask, “Do you notice a cost-saving opportunity within any of the teams you’re partnering with?”
The finance team also published a list of all their vendors and costs (over 300 vendors) so teams could see what each vendor was costing and double-check whether they were even using it. Another benefit: other employees might recognize a vendor they’d worked with in the past and know whether they could get a discount in the upcoming negotiations.
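A lightweight version of that vendor review is easy to automate. Here’s a hypothetical sketch, assuming finance exports spend as a CSV with `vendor`, `owner_team`, `annual_cost`, and `last_used` columns (the file and column names are assumptions):

```python
import pandas as pd

# Hypothetical export from finance; the schema is an assumption.
vendors = pd.read_csv("vendor_spend.csv", parse_dates=["last_used"])

# Flag contracts nobody has touched in six months -- prime candidates
# to cancel or renegotiate before the next renewal cycle.
cutoff = pd.Timestamp.now() - pd.DateOffset(months=6)
stale = vendors[vendors["last_used"] < cutoff].sort_values(
    "annual_cost", ascending=False
)

print(f"Total vendor spend: ${vendors['annual_cost'].sum():,.0f}")
print(f"Spend on possibly unused tools: ${stale['annual_cost'].sum():,.0f}")
print(stale[["vendor", "owner_team", "annual_cost"]].to_string(index=False))
```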
Do You Need an All-Senior Team?
Another way Maura managed costs was by putting the right person on the right job. For example, it can be tempting to hire all senior employees, since they’ll likely require little to no training. But senior employees are expensive, and there are problems they’ve solved thousands of times over that might be better handled by a junior or mid-level employee with guidance from a senior.
Of course, this is a fine balance; in most cases, you want neither an entire team of juniors nor an entire team of seniors. Each brings benefits, but leaning entirely on one end ignores its downsides.
Overall, Maura and her team constantly weighed what was required against the investments they were making, whether that was how often a model needed to be retrained or the seniority of new hires, which helped keep their costs reasonable.
Conclusion
Simply scaling up your data team while letting costs explode puts your data initiatives at risk. Many teams in the data space are being reduced or cut altogether because the teams weren’t kept lean (or, in one discussion I had, simply because the previous executive staff made bad choices).
Keeping costs low while still delivering impact with data requires a whole-team shift. The culture, processes, and people you put in place will play a significant role.
If you want to learn more about Maura’s experience at Patreon and how she scaled up her data team, check out the upcoming MDS conference!