What Skills And Tools Do Data Analysts Need?
Photo by Isaac Smith on Unsplash
Analyzing data is a skill required by more than just data analysts and data scientists in 2021.
Many roles require some form of data analysis whether it be done in Excel, Jupyter Notebooks, or Tableau.
However, data analysis is not generally a course taught in college. It might be tangentially taught in an economics or business class, but rarely are the best practices or processes taught as a course.
Yet product managers, marketers, solutions partners, and a wide array of other roles find themselves analyzing data, whether it is to make better decisions on which features to include in software or which marketing strategy to take.
Analyzing data is important.
In this article, we will discuss the tools, skills, and general process for you to perform data analysis. You can use these steps as a set of guidelines to help you out in your future projects.
The Technical Tools For Data Analysis
Data analysts are no longer limited to just using Excel in 2020. Now data analysts are often expected to know many more data analysis tools, like SQL, Power Query, and Power BI, to name a few.
SQL (both a skill and a tool)
Icons made by Pixel perfect from www.flaticon.com
SQL, or Structured Query Language, remains the language of data. There has been some push to simplify this technical layer with drag-and-drop and GUI-based tools. However, if you're looking at a data scientist or data analyst job description, you will likely see SQL listed as a responsibility.
This is because companies of all sizes are looking more to data to help drive their decisions. Even small and medium-sized companies are developing data warehouses to centralize all their data, or at least using some sort of virtualization layer like Denodo or Starburst Data to help them access data across multiple internal data sources like their ERP, CRM, and other internal systems.
SQL has proven to be a very powerful language that helps do things like join database tables, clean up messy data, create basic models, calculate KPIs, and more.
As a data analyst, SQL can be used to help prepare and analyze data before even putting it into another tool like Excel or Jupyter Notebooks.
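As a small illustration of that kind of preparation step, here is a hedged sketch using Python's built-in sqlite3 module. The table names, columns, and KPI here are made up for the example; the point is simply joining tables and calculating a metric in SQL before moving to another tool:

```python
import sqlite3

# In-memory database with two hypothetical tables: customers and orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# Join the tables and calculate a simple KPI: total revenue per region.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(rows)  # [('East', 200.0), ('West', 50.0)]
```

The same query would run against a real warehouse; sqlite3 just makes the sketch self-contained.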
Excel
Excel isn't going anywhere. Even though many analysts and data scientists are moving much of their work into Jupyter notebooks, Excel still plays a role in performing analytics, doing exploratory data analysis, and building reports.
The versatility and simplicity of Excel are what make it popular.
Why learn how to code a pivot table when you can just highlight a table and create one?
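To be fair to that point, here is roughly what the coded equivalent of a simple pivot looks like — a plain-Python sketch with made-up sales records:

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
sales = [
    ("East", "Widget", 100),
    ("East", "Gadget", 50),
    ("West", "Widget", 75),
    ("East", "Widget", 25),
]

# Pivot: rows keyed by region, values are summed amounts per product.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in sales:
    pivot[region][product] += amount

print(dict(pivot["East"]))  # {'Widget': 125, 'Gadget': 50}
```

A few lines, but still more effort than highlighting a table and clicking "PivotTable" — which is exactly Excel's appeal.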
Power Query/Power BI
Power Query and Power BI are tools built to help simplify the data workflow, for both analytics and data visualization.
First, take a look at Power Query. Power Query's goal is to allow data analysts to easily pull in data from multiple sources, whether it be Workday, Salesforce, MySQL, Oracle, etc. On top of that, you can implement different forms of data transformations to help clean and model the data to better fit your analysis.
Finally, a big reason many analysts like using Power Query is because you don’t need to learn or use any code to do any of it.
Microsoft Power BI is a business intelligence platform that provides non-technical business users with tools for aggregating, analyzing, visualizing, and sharing data. Power BI’s user interface is fairly intuitive for users familiar with Excel and its deep integration with other Microsoft products makes it a very versatile self-service tool that requires little upfront training.
Jupyter Notebook
The popularity of Jupyter Notebook goes hand in hand with the increased usage of R and Python as analytical tools. Analysts now can organize their thoughts into concise workbooks rather than code in some random IDE or Notepad++.
Also, the recent surge of online Jupyter Notebooks that are easy to share, version control and integrate into a team’s workflow has only further provided support to analysts.
Essentially, Jupyter Notebook has become some analysts' Excel. You can do a lot of similar tasks, like pivoting, connecting to databases, creating graphs and charts, and aggregating data, but you have the option to run these notebooks on the cloud.
This is a huge advantage when you are running calculations on massive datasets. Have you ever accidentally crashed Excel? Well, imagine being able to run that same workload on a set of distributed GPUs using Dask on a tool like SaturnCloud.
Jupyter Notebook might have a steeper learning curve compared to Excel, but it also allows for a lot more heavy lifting.
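As a taste of that Excel-like work in code, here is a minimal sketch that reads a small (made-up) CSV export and computes summary numbers using only the standard library — the kind of cell you might drop into a notebook:

```python
import csv
import io
import statistics

# Hypothetical CSV export, the kind of file you might otherwise open in Excel.
raw = """month,revenue
Jan,1000
Feb,1200
Mar,900
"""

# Parse the rows and pull out the revenue column as floats.
rows = list(csv.DictReader(io.StringIO(raw)))
revenues = [float(r["revenue"]) for r in rows]

# Total and average monthly revenue.
print(sum(revenues), statistics.mean(revenues))
```

In a real notebook, `io.StringIO(raw)` would simply be an open file or a database cursor.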
Tableau/Looker/And Other Dashboard Builders
These tools are all bundled together because in many regards they are all kind of the same. Yes, they don’t all work the same or have the same features. However, they generally are used to solve the same problem.
How to display your data.
So regardless of which of these tools you use, it is just important that you know at least one of them. Truthfully, it is usually easier to learn the nuances of a specific dashboard tool compared to the skills required to develop a solid dashboard UI.
Since many of these tools are developed to be used by non-technical people, they are usually drag-and-drop based and require very little technical knowledge.
What also makes these tools helpful is that you can use them for data analysis just as much as for building a final dashboard.
Soft Skills
Data analysis is not only statistics, SQL, and scripting. It is also about staying organized, being able to articulate to managers the discoveries that have been unearthed, and skills that are generally not considered technical.
Here is a list of some of the skills I have found to be very helpful for data analysts.
Context Focused
Besides being focused on details, data analysts and data scientists also need to focus on the context behind the data they are analyzing.
This means understanding the needs of the other departments who have requested the project, as well as understanding the processes behind the data they are analyzing.
Data typically represent the processes of a business. This could be a user interacting with an E-commerce site, a patient in a hospital, a project getting approved, software being purchased and invoiced, and so on.
All of these get represented in thousands of data warehouses and databases across the world, and they are often stored slightly differently, with different business rules.
That means that data analysts need to understand those business rules and logic. Otherwise, they can’t perform good analysis; they will make bad assumptions and they will often create dirty and duplicate data.
All because they did not understand the context. Context allows data-focused teams to make clearer assumptions. They are not forced to spend too much time in the hypothesis-phase where they are testing every possible theory. Instead, they can utilize context to help speed up the process of their analysis.
The metadata (i.e., context) around data is like gold to a data scientist. It isn't always there, but when it is, it makes our jobs easier.
Communication
The term data storyteller has become associated with data scientists, but it is also important for anyone who uses data to be good at communicating their findings.
This skill-subset fits in the general skill of communication. Data scientists have access to multiple data sources from various departments. This gives them both the responsibility and the need to clearly explain what they are discovering to executives and SMEs in multiple fields.
They take complex mathematical and technological concepts and create clear and concise messages that executives can act upon.
They're not hiding behind their jargon; they translate their complex ideas into business-speak. Analysts and data scientists alike must be able to take numbers and return clearly stated ROIs and actionable decisions.
This means not only taking good notes and creating solid workbooks. It also means creating solid reports and walk-throughs for other teams.
How do you do that? This could be a post in itself but here are some quick tips to better communicate your ideas in a report or presentation.
- Label every figure, axis, data point, etc.
- Create a natural flow of data and notes in a notebook.
- Make sure to highlight your key findings and sell your conclusion. This is easier said than done when you have lots of data to prove your point.
- Imagine you are telling a story or writing an essay with data.
- Don’t bore your audience to death, keep it sweet and to-the-point.
- Avoid heavy math jargon! If you can’t explain your calculations in plain English, you don’t understand them.
- Peer-review your reports and presentations to ensure maximum clarity.
The video The best stats you’ve ever seen by Hans Rosling is a great example of data storytelling.
Empathetic Listening
Data scientists and analysts aren’t always on the same team as the business owners and managers who come to them with questions. This makes it very important for analysts to listen diligently to what is being asked of them.
Working in large corporations, there is a lot of value in trying to find other teams’ pain-points and problems and help them through them.
This means having empathy. Part of this skill requires experience in the workforce and another part of this skill simply requires the understanding of other human beings.
Why are they asking for the analysis and how can you make it as clear and accurate for them as possible?
Miscommunication with the business owners can happen quite easily. Thus, the combination of listening diligently, as well as listening for what is not being said, is a great asset.
Creative and Abstract Thinking
Creativity and abstract thinking help data scientists better hypothesize possible patterns and features they are seeing in their initial exploration phases.
Combining logical thinking with minimal data points, data scientists can lead themselves to several possible solutions. However, this requires out-of-the-box thinking.
Analytics is a combination of disciplined research and creative thinking. If an analyst is too limited by confirmation bias or process, they might not reach the correct conclusions.
If, on the other hand, they are thinking too wildly and not using basic deduction and induction to drive their search, they could spend weeks trying to answer a simple question as they wander through various data sets without clear goals.
Engineering Mindset
Analysts need to be able to take big problems and data sets and break them down into smaller pieces. Sometimes, the two or three questions asked by a separate team can’t be answered with two or three answers.
Instead, the two or three questions themselves might need to be broken down into small, bite-sized questions that can be analyzed and supported by data.
Only then can the analyst go back and answer the larger questions. This is particularly true for large and complex data sets. It is becoming more and more important to be able to break down analysis into its proper pieces.
Attention to Detail
Analysis requires attention to detail. Just because an analyst or data scientist might be a big-picture type person, it doesn’t mean they are not responsible for figuring out all the valuable details of a project.
Companies, even small ones, have lots of nooks and crannies. There are processes on top of processes, and not understanding those processes and their details affects the level of analysis that can be done.
This is especially true when you're writing complex queries and programming scripts. It is very easy to join a table incorrectly or filter on the wrong thing. Thus, it is key to always double- and triple-check your work. Also, if scripts are involved, peer reviews should be too.
Curiosity
Analysis requires curiosity. We will get into this when we break down the process. However, a step in the analysis process is listing out all the questions you believe are valuable to the analysis. This requires a curious mind that cares to know the answer.
Why is the data the way it is? Why are we seeing patterns? What can we use to find the answer? Who would know?
These are just some of the questions that can help to point analysis in the right direction. You need to have the drive and desire to know why.
Tolerance of Failure
Data science has a lot of similarities to the science field, in the sense that there might be 99 failed hypotheses that lead to one successful solution.
Some data-driven companies only expect their machine learning engineers and data scientists to create new algorithms, or correlations, every 12 to 18 months. This depends on the size of the task and the type of implementation required (e.g. process implementation, technical, policy, etc.).
In all this work there is failure after failure and unanswered question after unanswered question, and analysts have to continue.
The point is to get the answer, or clearly state why you can’t answer the question. However, you can’t just give up because the first few attempts failed.
Analysis can be a black hole for time. Question after question can be incorrect. That is why it is important to have a semi-structured process. One that guides analysts but doesn’t hold them back.
The Data Analytics Process
Pre-Exploratory Data Analysis
After working for nearly a decade across multiple industries, I have had to work on multiple data sets. When I first started working in the data world, I assumed every data set I was provided was accurate and the source of truth.
How many times I was wrong.
The phrase “Trust, but verify” comes to mind.
This is why before starting on an analysis or promising too much I will generally spend time analyzing the data just to ensure it makes sense.
This is somewhat like EDA, except that I am purposely looking for data issues.
High null rates in columns, bad data, data that doesn’t align with similar sources throughout a company, etc.
Anything that could cause an issue down the line (regardless of how good my analysis or model is). You don't want to develop some dashboard, or finish some analysis and write some sort of conclusion, that the manager knows is wrong because you used a bad data source.
Thus, spending a day or two just to make sure your data set is reliable is a good idea.
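The null-rate check mentioned above can be sketched in a few lines — assuming, for illustration, that rows arrive as dictionaries (e.g. from csv.DictReader) and that None or an empty string marks a missing value:

```python
# Hypothetical rows pulled from a source under review.
rows = [
    {"id": "1", "signup_date": "2020-01-05", "plan": "pro"},
    {"id": "2", "signup_date": "", "plan": "free"},
    {"id": "3", "signup_date": None, "plan": "free"},
]

# Fraction of missing values per column; high rates flag unreliable fields.
null_rates = {
    col: sum(1 for r in rows if r[col] in (None, "")) / len(rows)
    for col in rows[0]
}

print(null_rates)  # signup_date has a ~67% null rate here
```

A report like this, run before committing to an analysis, surfaces problem columns while there is still time to go back to the data owner.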
Data Gathering And Assessment
An important step when you first start doing analysis, especially as a data engineering or data science consultant, is to take inventory of what data is available.
Generally, most clients and internal business owners will have a general set of data they want looked at. However, having good insights into what other data sets exist can be useful.
It can be useful in two ways.
First, you might need to bring in some of the data from these data sources. If you didn't know that data existed, then you wouldn't be able to realize this.
Second, other data sources might not be useful in your current situation. However, they might provide useful information in the future. So when you are writing your conclusion you can point future analysts to the follow-up questions as well as what data sets they could pull the information from.
Questions
One important step, which can easily get missed because analysis has a way of dragging you down its rabbit hole, is defining what questions your data team would like to answer.
It doesn’t matter whether the team is a team of data scientists, analysts, marketers, or business owners.
Without a set of questions, you could spend days spinning in circles with no real conclusions.
Setting up questions is like setting goals. It helps you define what "done" is. Without it, you could be endlessly digging into the data.
That being said, each question answered, might lead to 3 new questions. This is fine and natural. It is also part of the process. Again, track these questions!
The purpose of this is not just to create goals, but also so that when your team walks through your notebook and findings with directors, it is easy for everyone to follow. Think of this more like a storybook, just with data! Make sure you can retrace your steps so you can re-explain your analysis.
Exploratory Data Analysis
Exploratory data analysis is a combination of trying to answer the specific questions you have defined as well as openly exploring the data for patterns and issues you did not anticipate.
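A first exploratory pass often starts with simple summary statistics. Here is a minimal sketch using Python's statistics module on a made-up metric:

```python
import statistics

# Hypothetical metric pulled during exploration, e.g. daily order counts.
daily_orders = [12, 15, 9, 30, 14, 11, 13]

# Basic summary statistics: a quick feel for center, spread, and extremes.
summary = {
    "n": len(daily_orders),
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),
    "min": min(daily_orders),
    "max": max(daily_orders),
}

print(summary)  # the outlier (30) pulls the mean above the median
```

Even this crude summary raises a question worth tracking: what happened on the 30-order day? That is the question-generating loop EDA is meant to feed.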
Peer Review Your Results
Mistakes happen, and bad assumptions can be made all the time with analysis. Sadly, unlike in programming, where they are just bugs and somewhat expected, mistakes as analysts or data scientists are just that: mistakes. One number off, and your entire analysis can be off.
Thus, it is important to take a step back on a regular cadence with each project and review the work being done. This is also why it is great to keep a clear and concise notebook. That way, when you are presenting to your team, you can explain each detail easily.
The peer reviewer or reviewers should look for errors, formatting issues, and ease of understanding. If your team can’t understand what you are trying to say, it will be very difficult for a director who hasn’t been working with the data scientists closely to fully grasp the conclusions.
Our team, along with many other programmers and analysts like to say:
“If you can’t explain it simply, you don’t understand it at all.”
Charts And Reporting
One of the challenges that some analysts and data scientists struggle with is that they don’t create a final product with every project. At least, not in the same way that software engineers and data engineers do.
This is where consolidating all your analysis and charts into a final report is a natural final product. It's key to point out that "consolidate" is an important term here. Merely shoving all your charts into a Word document and calling that satisfactory is unhelpful.
Also, putting charts that are interesting but more of a distraction into a final analysis doesn’t provide a lot of value.
So making sure you create an analysis with a clear flow of ideas is important, rather than a scattered set of charts and findings that lack a story. If you believe some of your other findings are worth noting, then it might be best to include them in an appendix or a secondary analysis.
As a data science consultant, I find this has been useful in helping my clients understand what is most important in their data set without distracting them.
Conclusion Writing
Another important step, which is essentially part of report writing, is writing a conclusion. We won't go too in-depth here, but making sure that, as an analyst or data scientist, you include some of your own thoughts is key.
You have worked on the data set more than anyone else. You probably have your thoughts on it. Making sure you take a stand on what you believe your data means is an important part of your report.
Externally Sharing Resources And Reporting
Finally, once you have written your report and reviewed your results it’s time to share them. Companies have various methods for doing this.
You might have an internal tool like Slack with a channel focused on the area you researched. It might be good to share there. You can also do a presentation with the business owners to make sure everyone gets on the same page.
Also, it is worthwhile to then archive the analysis for future use in some shared location so everyone that should have access to it, does.
Data Analysis Is Not Just For Analysts In 2021
As 2020 comes to an end, our team is looking at the trends that are likely to persist. In particular, we are noting that tools like PopSQL, SaturnCloud, and other collaborative data analytics tools are becoming more popular.
Companies aren't just becoming more data-driven; many already are. In 2020 we helped companies ranging from 7-figures to Fortune 100 upskill and improve their data analytics skills.
So all these skills we pointed out are already being implemented heavily.
If you are interested in reading more about data science or data engineering, then read the articles below.
Developing A Data Analytics Strategy For Small Businesses And Start-ups
3 Ways To Improve Your Data Science Teams Efficiency
4 SQL Tips For Data Scientists
How To Improve Your Data-Driven Strategy
What Is A Data Warehouse And Why Use It