Data Scientists: holders of the ‘sexiest job of the 21st century’, a career of the future, a short-lived job, or just a buzzword? Being a Data Scientist has many facets nowadays, so let’s first try to define the term.
According to Wikipedia, “science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions”. If we go back to the original meaning of the term, a Data Scientist is someone who experiments with data, an erudite researcher of sorts who studies fundamental phenomena, either in the data itself or by means of data. Add a touch of pragmatism, and data science can be described as a vast scientific field that extracts knowledge from a given data set through a series of manipulations in order to solve specific problems.
This brings us to our next question: who are these Data Scientists everyone is talking about?
Universities make Machine Learning machines
Many universities now offer data science courses, seemingly a ‘must have’ for higher education. Although they have evolved, most of these training courses turn students into machine learners. They teach them the latest algorithms and how to master Kaggle, focusing mostly on the modeling step of data science’s large value chain.
Machine learning, which emerged in the 1950s, is a set of tools created to allow computers to accomplish tasks ever faster and more precisely. The business context, the reality of the data and its interpretability are sometimes overlooked in order to focus on optimizing metrics that are difficult to use in companies.
Let’s go back to Kaggle to illustrate this: users of this platform apply highly sophisticated techniques to readily available datasets, whose origin is not always known, with the ultimate goal of reaching the highest score on the leaderboard. The famous Netflix challenge was an earlier example of the same kind of competition.
Although Machine Learning knowledge, both theoretical and practical, is critical to solving a problem, it is usually not sufficient in a business environment where the key to resolving an issue is the knowledge and understanding of the business context, the availability of qualified data, and appropriate interpretation tools to harness the results.
This explains what distinguishes Machine Learning specialists from Data Scientists and why it is so important for companies to know which professional to choose to successfully complete their data projects. We should also mention that 50% of data science projects fail, partly due to difficulties in accessing the right skills (IDC study).
But we still have not said precisely who these Data Scientists really are…
The mythical unicorn
Back in 2012, the Harvard Business Review claimed that Data Scientists had the “sexiest job of the 21st century” without giving many details. It is hard to be specific when you know that they do very different things depending on whether they are Data Scientists at Airbnb, JPMorgan Chase or General Motors.
But, regardless of their differences, they all share one trait: their focus on business. Indeed, the aim of data science is to use data to resolve issues in a business, such as predictive maintenance, fraud detection, customized purchasing pathways, or content recommendations, to name a few.
Such projects require a broad range of skills and expertise, namely:
- A solid understanding of the business needs and the ability to come up with a rigorous and pragmatic approach to solving the company’s issues
- Knowledge of technical architecture in order to design and implement the best possible infrastructure to support the project
- A good command of data analysis, statistical description, and visualization tools to understand the data and to guide processing and modeling choices
- Data extraction, preparation, and handling skills
- Statistical and machine learning knowledge, if the project requires modeling
- The ability to interpret analytical and/or modeling results, draw actionable conclusions from those results, and explain said conclusions in a way that can be understood and implemented by project teams
- The ability to support the implementation of the results (for example: creating and automating an actionable solution when fraud is detected, or implementing an efficient strategy for sharing custom content on a website or via an email marketing campaign)
… and to measure the results to constantly improve techniques!
Yes, this list is quite a tall order… and unicorns do not exist.
What if the term “Data Scientist” were actually a misnomer? What if data science were the job of a whole team, with several different skill sets?
Shedding light on the Data Scientist myth
In reality, data science cannot be the responsibility of just one person in a business. Filling this role requires several different profiles:
- A Data Project Manager: with a keen sense of business strategy and a solid understanding of the technical challenges, the data project manager will harness the team’s skills and create a plan to meet the business’s needs while ensuring the plan’s seamless implementation.
- One or more Data Analysts: armed with analytical skills, they prepare and investigate data (using SQL, visualization, or statistical tools) to answer specific questions from the project team and to present results in an impactful and useful way.
- One or more Machine Learners: part statistician, part developer, they work with large amounts of data to identify hidden patterns and predict behaviors or events.
- One or more Data Architects: they design, implement, and manage the overall architecture supporting the data processing, always keeping an eye on scalability, resilience, and the capacity for the solution to evolve.
- One or more Data Engineers: they program and maintain the collection, storage, and distribution of the data, which is in turn used by the Data Analyst(s) and Machine Learner(s). As Data Architects and Data Engineers require similar skill sets, these roles can sometimes be held by the same person.
It is crucial for a company to clarify the different roles of each of these “Data Scientists”. This will help the recruitment process, avoid any misunderstandings regarding the role, foster fulfillment among the team, and ensure the success of data-driven projects.
At fifty-five, we use these principles to adapt our strategies to our clients’ skill sets and available resources, to enable them to reach their goals. This is why our own teams include the different profiles described above.
The goal is not to find the one Data Scientist who can do it all, but to identify the needs and strengths of each person to create a functional and fulfilled data science team, and make full use of everyone’s skills. Ultimately, the key is…governance!