Practical Data Science Teams – Data Consulting
Operating a data science team is not something that can be learned just by watching lectures and videos on Coursera and Udemy. Don’t get us wrong: they are great places to learn data science and machine learning theory with practice problems.
However, they don’t teach good business practices or how to operate a data team in a business setting. Knowing algorithms and how to use Hadoop is not enough to have an effective data team.
Advice To Data Science Teams
Teams have to work with other departments, maintain software, report to executives, and of course, return business value! Data science, like analytics and business intelligence, is just a tool to help make the business more effective at making money.
None of this is discussed in most data science classes. That is why one of our key focuses is not just custom data science algorithms and models, but also data science team development.
We wanted to offer some great tips that will help your data science team be more successful. These have nothing to do with algorithms and models, and everything to do with how data specialists need to operate in a business:
ROI Vs. Algorithms And Technology
As programmers, data scientists, and engineers, most of us prefer to focus on the technical aspects of our data projects or the software we are developing. The reason we develop products is not solely money, but also to prove that we can do something. It is a challenge! We are problem solvers.
Maybe we want to prove that we can develop an algorithm that can predict whether a product is a Hot Dog or Not a Hot Dog. Just for fun!
However, we data scientists, data consultants, and software engineers are hired by businesses, and at the end of the day, those businesses want to see fiscal results. It doesn’t really matter whether you use a neural network or a support vector machine. What matters is which result saves the most money or brings in the most revenue.
This is important to remember, because the sooner a data scientist or big data analyst figures this out, the more effective they are in their role. Part of being a data scientist is having a slight entrepreneurial spirit.
Data specialists seek out opportunities to save the company money or find new value streams. We are often right too, because we not only understand the business but also have the data to back our insights.
That is one of the values of having a data team that is well attuned to your business. They have data to drive their decisions.
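To make the money-over-technique point concrete, here is a minimal sketch of how two models might be compared on estimated dollar value rather than on which algorithm they use. Everything in it is an assumption for illustration: the `ModelResult` class, the prediction counts, and the dollar figures are made up, and a real project would take them from its own holdout data and finance team.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Hypothetical summary of one model's predictions on a holdout set."""
    name: str
    true_positives: int
    false_positives: int
    false_negatives: int


def net_value(result, value_per_tp, cost_per_fp, cost_per_fn):
    """Estimate the dollar value of acting on a model's predictions."""
    gain = result.true_positives * value_per_tp
    loss = (result.false_positives * cost_per_fp
            + result.false_negatives * cost_per_fn)
    return gain - loss


def pick_by_value(results, value_per_tp, cost_per_fp, cost_per_fn):
    """Return the model with the highest estimated net value."""
    return max(
        results,
        key=lambda r: net_value(r, value_per_tp, cost_per_fp, cost_per_fn),
    )


# Made-up counts: the neural network catches more cases but raises
# far more false alarms than the support vector machine.
neural_net = ModelResult("neural network", 90, 40, 10)
svm = ModelResult("support vector machine", 80, 10, 20)

best = pick_by_value(
    [neural_net, svm],
    value_per_tp=100.0,  # assumed revenue per correct prediction
    cost_per_fp=50.0,    # assumed cost of acting on a false alarm
    cost_per_fn=20.0,    # assumed cost of a missed case
)
print(best.name)
```

With these invented numbers the model that finds fewer cases still comes out ahead, because its mistakes are cheaper. Change the costs and the answer can flip, which is exactly why the business figures matter more than the algorithm's name.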
Data Engineering
One area that can occasionally be rushed is data engineering. It might seem unimportant, or easy to change later. However, if data isn’t engineered in a way that is easy to manipulate and build on, data scientists will have one hell of a time trying to design their algorithms and workflows downstream.
There is a reason data engineers still make up a higher percentage of job postings than data scientists on indeed.com.
How the data is structured plays a large role in analytics. Our team has several members who were originally data engineers, and this is why they are so valuable. They are able to create not only beautiful algorithms but also data pipelines that flow naturally from point A to point B, from data warehouse to algorithm.
Well-engineered data is easy to modify and easy to extend with new modules and report metrics. It may seem strange, but it is all possible with good data engineering!
System Design Is For Data Scientists Too
When designing an algorithm, it can be easy to forget that the results actually need to be put into production.
Data scientists don’t just design an algorithm and finish there. Instead, there is typically a need for some form of data warehouse or data storage center to act as a system that both feeds the models and records the data they produce. The algorithm is not an isolated island that creates dollars on its own.
There will also typically be some form of interface that a user can interact with.
This might be a website or a dashboard, for instance. The purpose is to deliver actionable and understandable insights directly to the end user, rather than making them translate random numbers and outputs from models.
This can get overlooked when simply doing a Kaggle problem or creating a project in class. This is why programs like Galvanize partner their students with actual businesses, because putting an algorithm into production requires more than just developing it.
There are old systems to work with, API documentation to sift through, bugs, workarounds, and of course, corporate politics.
Corporate Politics, Yes, You Will Get Involved
Businesses always have politics. There is no way to get around it. Data science executives and program leaders need to be able to work with other teams and get funding just like the rest of the departments.
This requires understanding what other executives want and need, and making sure they back your projects. If they are not backing your projects, or if they are waiting to stab you in the back (and that happens), your project will fail.
Do not manipulate, but guide other team leaders to your viewpoint, or barter, or compromise. Just make sure you don’t start stepping on everyone’s toes… at least, not until your data team has proven itself a few times. Even then, don’t become too difficult to work with.
Otherwise, no one will ever give your data team resources.
Documentation Is A Data Team’s Friend
Ok, 85% of programmers need to own up to something. They hate documentation. It’s ok, it is not exactly the most fun thing. However, it is important to constantly be documenting!
Don’t wait until the project is over to document!!!
Data science algorithms, data structures, and software need to be constantly documented.
No one is asking your data team to write the next Tom Sawyer. Just keep legible and understandable notes that any other programmer can pick up.
You never know when a team member will leave and leave behind a bunch of half-finished projects with no documentation.
So, for the sake of maintenance, get your data teams documenting their projects as they go. It will save your team hundreds of hours of technical debt and ensure your products continue to operate.
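As a rough illustration of what "legible and understandable notes" can look like when written alongside the code, here is a small sketch. The function, its name, and the business context in the docstring are all hypothetical; the point is only the habit of recording why the code exists, what it expects, and what it does not handle yet.

```python
from datetime import date


def days_since_last_purchase(purchase_dates, today):
    """Return the number of days between the latest purchase and `today`.

    Why it exists: a (hypothetical) churn model uses this as a feature.

    Inputs:
        purchase_dates: list of datetime.date objects, in any order.
        today: the datetime.date to measure from.

    Returns:
        An int number of days, or None if there are no purchases.

    Known gaps: refunds are not excluded, and time zones are ignored.
    """
    if not purchase_dates:
        return None
    return (today - max(purchase_dates)).days


print(days_since_last_purchase(
    [date(2024, 1, 1), date(2024, 1, 10)],
    today=date(2024, 1, 15),
))
```

A few lines like the "Known gaps" note take a minute to write while the work is fresh, and they are the part a new team member is least likely to figure out alone.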
Data Science Projects Need A Software QA and Life Cycle
Data science is an arm of software development. That means it requires a process to ensure the code that is developed is robust and maintainable.
How, you ask?
By having a great QA process for both the code and the data, and making sure there is a standardized process for code to go from development to production.
No, you should not test code on production!
That’s how things go wrong!!!
Don’t get us wrong, you need to push code out, but not at the risk of breaking the build.
Peer reviews, QA, and unit testing can save your data teams a lot of grief. Make sure there are not constant impediments, like an engineer who takes forever to peer review someone else’s code.
At the same time, make sure you’re not developing straight to production!!
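Here is a minimal sketch of what QA for both the code and the data can look like in practice: one small data check, plus unit tests that a peer reviewer or build server could run before anything reaches production. The `validate_orders` function and the `order_id` and `amount` fields are made-up examples, not a real schema, and the tests are written in the plain `assert` style that a runner such as pytest can pick up.

```python
def validate_orders(rows):
    """Return a list of human-readable problems found in the rows.

    Each row is assumed to be a dict with 'order_id' and 'amount' keys.
    That schema is invented for this example.
    """
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        order_id = row.get("order_id")
        amount = row.get("amount")
        if order_id is None:
            problems.append(f"row {index}: missing order_id")
        elif order_id in seen_ids:
            problems.append(f"row {index}: duplicate order_id {order_id}")
        else:
            seen_ids.add(order_id)
        # Amounts must be numeric and non-negative.
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append(f"row {index}: invalid amount {amount!r}")
    return problems


def test_clean_data_passes():
    assert validate_orders([{"order_id": 1, "amount": 9.5}]) == []


def test_bad_data_is_reported():
    rows = [
        {"order_id": 1, "amount": 10},
        {"order_id": 1, "amount": -5},
    ]
    assert validate_orders(rows) == [
        "row 1: duplicate order_id 1",
        "row 1: invalid amount -5",
    ]
```

Checks like these are cheap to run on every change, so bad data and broken code get caught in development instead of in front of the business.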
Data Science Is Not All About Algorithms
The truth about data science and analytics is that it is not a magic bullet. It is really just another tool businesses can use to increase their profits and decrease their costs. When it is operated correctly, when data is engineered well, and when the teams function well with the rest of the business, it creates a huge competitive advantage.
Our data consultants specialize in making sure your team is operating at full capacity. Sure, we love solving data science and machine learning problems.
However, we also offer awesome coaching and workshops to help develop your team members! We come in with both data science and business professionals who know how to make sure your team is operating in step with the business. This way, you are getting the full benefit of using data to drive your value streams! Whether it is big data or small data, let us know how we can serve you today!