Demystifying Data Science Projects: a Look on the People and Process of Data Science Today
Aho, Timo; Sievi-Korte, Outi; Kilamo, Terhi; Yaman, Sezin Gizem; Mikkonen, Tommi (2020-11)
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202101051053
Description
Non peer reviewed
Abstract
Processes and practices used in data science projects have
been reshaping especially over the last decade. These are different from
their software engineering counterparts. However, to a large extent, data
science relies on software, and, once taken to use, the results of a data
science project are often embedded in software context. Hence, seeking
synergy between software engineering and data science might open
promising avenues. However, while there are various studies on data science
workflows and data science project teams, there have been no attempts
to combine these two very interlinked aspects. Furthermore, existing
studies usually focus on practices within one company. Our study
will fill these gaps with a multi-company case study, concentrating both
on the roles found in data science project teams as well as the process.
In this paper, we have studied a number of practicing data scientists to
understand a typical process flow for a data science project. In addition,
we studied the involved roles and the teamwork that would take place
in the data context. Our analysis revealed three main elements of data
science projects: Experimentation, Development Approach, and Multidisciplinary
team(work). These key concepts are further broken down to
13 different sub-themes in total. The found themes pinpoint critical elements
and challenges found in data science projects, which are still often
done in an ad-hoc fashion. Finally, we compare the results with modern
software development to analyse how good a match there is.
Collections
- TUNICRIS publications [19294]