Towards stochastic data science project management
Introduction
Project management has evolved over time to adapt to the latest innovations[1][2][3]. Lately, IT projects have been expected to deliver innovation faster, and thus require a dedicated project management approach[4]. From deterministic projects in the construction industry to the fast-changing ecosystem of computer science, project management has developed tools and processes to optimize the path leading to an existing solution[5].
More recently, Agile methods[6] allowed computer science projects to improve their success rate from less than 50% to processes that continually succeed in delivering value to the end user[7]. Since 2010, the explosion of accessible data and computing power has made data science[8] an industrial opportunity (fig 0). Data science has been integrated into conventional Agile methodologies alongside software development. However, this has proven to fail, since 85% of data science projects do not deliver value[9].
Figure 0: The evolution of Google search trends for the keyword “data science” since 2004
We assume that the 85% failure rate in data science is due to a change in paradigm between deterministic processes and stochastic ones. Agile methodologies are applied successfully to deterministic projects[10] (e.g. software engineering). However, data science is a stochastic process: it brings uncertainty and makes decision making harder[11]. The lack of visibility on costs and outcomes prevents project management from being applied effectively, thus lowering the chances of bringing a data science project to a successful conclusion[12].
In this context, the success rate will be optimized if we raise the level of predictability and maintain good control over costs. Agile methods remain valuable for deterministic processes, but we consider that Agile tools are not applicable to stochastic ones. Hence, we need to define a new set of tools that embraces the stochastic nature of data science. Those tools (processes or concrete products) need to provide visibility and predictability on potential costs and outcomes, in order to enable effective project management of stochastic processes.
This document describes the building blocks of Agile methods and how they bring value in the context of a deterministic project. It then shows how the same tools fail to bring progress visibility and outcome predictability when applied to data science. We describe an alternative list of components that can replace Agile tools in stochastic processes, and show how they can coexist with Agile ones. A critical review of those tools is also provided.
Agile methods and deterministic project management tools
In computer science, Agile methodologies help a project deliver value[13][14]. These methods spread the uncertainty[15] over the whole duration of the project by introducing a continuous delivery process in which stakeholders meet regularly and often in order to adjust resource allocation. These cycles are called iterations. The key concept behind iterations is to deliver a small portion of the project and confront it with the buyer (who does not necessarily pay, but validates success). Delivery happens regularly throughout the project, to limit wasted resources and ensure the overall project does not spend all its resources without meeting the need. The following figure (1) describes the steps in an iteration.
Figure 1: the user feedback loop
Iterations necessitate processes. Processes are rules that describe how stakeholders must behave. During step 2, regular meetings ensure a continuous communication flow between the stakeholders. Iterations require defining atomic deliverables, to be built in step 3. This allows developers to build a small, well-scoped set of features. Then, feedback (steps 5 and 1) allows stakeholders to measure and track the outcome of iterations. The overall process is called the user feedback loop.
Iterations rely on user stories (the atom), which clearly define what a successful result should look like from a user's point of view. This keeps stakeholders aligned on the iteration objective. Version control commits (the atomic deliverable) make iterations concretely traceable in the product. Next, demos (to get feedback) expose the produced result to all stakeholders. Finally, sprint planning (to define the next iteration) allows stakeholders to align on its content.
Other tools are often used to facilitate operations. Boards (e.g. Kanban) track progress and collaboration throughout a project, Git allows stakeholders to keep track of versions, and analytics tools (e.g. Google Analytics) allow project managers to get feedback from users.
We saw that Agile methodologies rely on a feedback loop to bring value when building a product. The feedback loop is composed of timed steps during which stakeholders formalize the context. Actors use tools to create a system model, including the product, its features, the stakeholders and their interactions. This way, actors are aligned on what is expected from the product, have a history of changes, and have visibility into what is happening at any point in time.
Limitations of these tools in a stochastic paradigm
We have listed the tools used in the latest methodologies as applied to data science. Tools are not the only factor, but they have a high impact on a project's chances of success[16] because they are the building blocks of project planning and management. However, as mentioned before, we know that data science projects fail to deliver value. Hence, we assume that the tools used bear a non-negligible share of responsibility for the failure rate. This section analyzes how Agile tools fail to embrace the stochastic nature of data science, and the consequences that lead to project failure.
Data science operates in an exploratory paradigm. An experiment may or may not yield a result, and the conclusion (if any) can point in any direction. Stochastic processes are harder to control because of the absence of a closed loop[17]. As a result, the concept of iterations is harder to apply, even though it is the factor with the strongest impact on stakeholder satisfaction[18]. Hence, the tools used to manage data science projects are ineffective, because they rely on a closed feedback loop. Next, we will examine how those tools fail to track and control iterations in data science projects.
While roadmaps are a core tool of project management methodologies, they lose their value when applied to data science. In theory, roadmaps rely on two assumptions: first, that the outcome is a concrete deliverable that can bring user value; second, that the outcome will be obtained by the due date. However, a data science iteration may not result in a concrete deliverable (e.g. dataset exploration). Moreover, the stochastic nature of data science makes due dates untrustworthy. As a result, roadmaps fail to deliver visibility on project planning when it comes to data science.
User stories have a clear layout: “as a <user> I want to <perform action> in order to <get value>”. But expressing them at the data science level is hard, because a <user> uses a product, not an algorithm. Hence, <performing an action> requires an interface, generally a visual one such as a website. But a model does not have an actionable interface, and its output is rarely the whole value the user is looking for. For example, one may need a data visualization tool, or simply an abstraction to interact with the model (buttons, filters, …). As a result, data science user stories fail to clearly state the objective and introduce interdisciplinary complexity, resulting in a loss of atomicity.
Version control systems (VCSs) are not designed to deal with data science because they use hash-based comparison to determine whether a change has been made. Hence, any minor change in a dataset (e.g. changing a character or adding an entry) is detected and versioned, whether or not it has an impact on the algorithm. In addition, VCSs do not embed any intelligence: they are not aware of the types, distributions or any meaningful characteristics of a dataset. Yet versioning is useful when it exposes impacting changes on the objects, in order to understand the major evolutions of a system. In the context of a dataset, an impacting modification corresponds to a change in data distributions, changes in columns, formats, etc. As a result, VCSs fail to version datasets in a way that highlights a concrete change in the final product.
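To make the problem concrete, here is a minimal sketch in Python (not tied to any particular VCS's internals) showing that a content hash flags any byte-level difference, however cosmetic:

```python
import hashlib

def content_hash(content: bytes) -> str:
    """Return a SHA-1 digest, the kind of content hash a VCS compares."""
    return hashlib.sha1(content).hexdigest()

original = b"user_id,age\n1,34\n2,29\n"
trivial_edit = b"user_id,age\n1,34\n2,29 \n"  # one added space, no impact on any model

# The digests differ, so a hash-based VCS versions this non-change.
print(content_hash(original) == content_hash(trivial_edit))  # False
```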
Using concise ticket names along with a simple hierarchy (epics, user stories, tasks), column-based boards (e.g. Kanban) provide an overview of project progress in a single page view. However, in data science, a given task does not have a binary state such as “to do” or “done”. Boards focus on concrete outcomes with a single measurable metric stating whether or not the task has been achieved. Ticket names carry valuable information, but tickets do not show the timed evolution of a data science algorithm (e.g. the evolution of its precision or recall). The Agile columns on the boards only allow explorations to sit in the “doing” column, and moving a ticket to “done” is purely arbitrary since stakeholders have not stated a metric of success. Hence, boards fail to report data science exploration as a one-page overview.
In this section, we saw how the deterministic design of Agile tools does not adapt to a stochastic paradigm. As a consequence, data science project management cannot be conducted effectively: those tools, when applied to data science, fail to keep track of progress and prevent control of the overall process, leading to a failure to meet the demand.
An alternative set of tools tailored for stochastic processes
Figure 2: Stochastic processes as an exploration tree
As explained, the core change is that research processes are not led by closed loops of iterations. Deterministic projects can state a hypothesis, act, measure the impact, and conclude on the hypothesis (fig 1). Stochastic approaches, however, rely on experiments that may not follow a logical connection, either between themselves or with the goal of the project, and that output various unrelated conclusions (fig 2). Our alternative approach embraces this change in paradigm by using a different set of management tools for data science.
Figure 3: Exploration map
In a stochastic process, we replace roadmaps with exploration maps. These remove the notion of time scoping and instead rely on short-, mid- and long-term horizons. The short term covers the upcoming, precise, concrete experiments (e.g. apply a logistic regression to split the dataset in two), generally to be conducted within weeks. The mid term is a list of more general research axes (e.g. test an alternative ML approach such as neural networks) that can be explored over the next two to six months. Long-term experiments are very business oriented (e.g. stop focusing on users' baskets and focus instead on their typology), and describe strategies that take more than six months to implement. This approach embraces the stochastic nature of data science by giving an overview of the different directions in which the project may evolve. Moreover, the use of time horizons (short, mid, long) instead of due dates removes the unreliability of strict time planning. As a result, teams can keep their methodology for deterministic disciplines while getting visibility on data science advances and conclusions.
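As an illustration only (the class and field names below are our own, not a prescribed format), an exploration map boils down to a simple structure with three horizons instead of dated milestones:

```python
from dataclasses import dataclass, field

@dataclass
class ExplorationMap:
    """Horizons replace due dates: entries are research directions, not deadlines."""
    short_term: list[str] = field(default_factory=list)  # concrete experiments, within weeks
    mid_term: list[str] = field(default_factory=list)    # research axes, two to six months
    long_term: list[str] = field(default_factory=list)   # business strategies, beyond six months

emap = ExplorationMap(
    short_term=["Apply a logistic regression to split the dataset in two"],
    mid_term=["Test neural networks as an alternative ML approach"],
    long_term=["Stop focusing on users' baskets; focus on their typology"],
)
print(emap.short_term[0])
```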
The alternative to user stories is the experiment. By experiment, we mean specifying the question asked, the possible results, and the associated conclusions. Like sprints, experiments have a duration that allows stakeholders to keep control of resource spending. Whatever the result at the end of the experiment, no more time is spent on the task unless a new experiment is formulated. Experiments can be phrased as follows (angle brackets mark the slots to fill in for each experiment): if <KPI> is <comparison operator> than <threshold value>, then we can conclude <conclusion>. For example: if the proportion of true positives is more than 90%, then we can conclude this is the appropriate algorithm to predict the churn rate of our users. After execution, the experiment is enhanced with the dated actual results and with examples. By expressing experiments instead of user stories, stakeholders state and evaluate concrete, measurable metrics. Moreover, they focus on the data science only. As a result, experiments bring visibility and autonomy to the stakeholders by separating the data science atoms from those of the other disciplines.
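A minimal sketch of this template as a data structure (the class, fields and helper below are illustrative, not part of any existing tool):

```python
import operator
from dataclasses import dataclass

OPS = {"more": operator.gt, "less": operator.lt}

@dataclass
class Experiment:
    """Template: if <KPI> is <comparison> than <threshold> then we can conclude <conclusion>."""
    kpi: str
    comparison: str      # "more" or "less"
    threshold: float
    conclusion: str
    duration_weeks: int  # caps resource spending, like a sprint does

    def evaluate(self, measured_value: float) -> str:
        """Return the conclusion if the KPI crossed the threshold, else close the experiment."""
        if OPS[self.comparison](measured_value, self.threshold):
            return self.conclusion
        return "Threshold not reached: stop here unless a new experiment is formulated."

churn = Experiment(
    kpi="proportion of true positives",
    comparison="more",
    threshold=0.90,
    conclusion="This is the appropriate algorithm to predict our users' churn rate.",
    duration_weeks=2,
)
print(churn.evaluate(0.93))  # prints the conclusion
```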
Software engineers use version control to keep track of changes in the product. In our exploratory paradigm, we suggest that data scientists use data shape control (DSC) to keep track of their datasets. Instead of tracking line changes, a DSC tracks changes in data shapes: if a distribution changes, or a new column is added, the tool exposes the differences. It is then up to the scientist to “commit” a new version of the dataset. A version of a dataset is linked to a model, timed and documented. This way, versioned changes reflect an impact on the overall product, and minor changes do not add noise to the version history. By linking versions to models, timestamps and documentation, we can identify which algorithm and data were used for a given version of the product. As a result, stakeholders are able to match product metrics (e.g. churn rate) with the data science history and decisions.
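Here is a minimal sketch of the diffing side of a DSC, assuming pandas; what counts as an “impacting” change (here, column types and basic distribution moments) is a design choice the team would have to make:

```python
import pandas as pd

def shape_summary(df: pd.DataFrame) -> dict:
    """Capture the characteristics a DSC would track, instead of raw bytes."""
    numeric = df.select_dtypes("number")
    return {
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "means": numeric.mean().round(2).to_dict(),
        "stds": numeric.std().round(2).to_dict(),
    }

def shape_diff(old: dict, new: dict) -> dict:
    """Keep only the parts that changed; an empty result means no impacting change."""
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}

v1 = pd.DataFrame({"age": [34, 29, 41], "basket": [12.0, 8.5, 30.1]})
v2 = v1.copy()
v2.loc[0, "age"] = 35  # a one-cell edit: columns unchanged, moments shift slightly

# The data scientist reviews the diff and decides whether to "commit" a new version.
print(shape_diff(shape_summary(v1), shape_summary(v2)))
```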
While column-based boards are valuable in an execution paradigm, their equivalent in a stochastic process is the experiment report. An experiment report shows the evolution over time of a specific experiment. An experiment is composed of a hypothesis stating precisely which metric of success is evaluated. Hence, experiment reports can show, for each point in time during the experiment, the value of that metric. A graphical display of this metric's evolution gives a clear view of the overall state of the experiment. Additionally, instead of grouping experiments by whether or not they reached their metric of success, they can be ordered by creation date, current metric value, number of attached conclusions, and so on. As a result, stakeholders can visualize the progress of all experiments on a single page, getting an accurate overview of the progress made on the data science part of a project.
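As a sketch (with invented experiment names and data), an experiment report boils down to one timed metric series per experiment, which can then be ordered however the team prefers:

```python
from datetime import date

# One timed metric series per experiment: the evolution a board cannot show.
reports = {
    "churn-logistic-regression": [
        (date(2020, 5, 4), 0.71),
        (date(2020, 5, 11), 0.84),
        (date(2020, 5, 18), 0.91),
    ],
    "churn-random-forest": [
        (date(2020, 5, 4), 0.66),
        (date(2020, 5, 11), 0.69),
    ],
}

# Order experiments by current metric value rather than by "to do"/"done" columns.
by_current_value = sorted(reports.items(), key=lambda kv: kv[1][-1][1], reverse=True)
for name, series in by_current_value:
    trend = " -> ".join(f"{value:.2f}" for _, value in series)
    print(f"{name}: {trend}")
```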
We have assembled a set of tools that put the stochastic nature of data science at the core of their design. By removing deterministic, binary concepts like “to do” or “done”, these tools translate the complexity of data science into intelligible objects that help stakeholders understand the evolution of the process, and thus make decisions. They provide simple yet complete models for formalizing the context of a data science project. This way, stakeholders have visibility on the evolution of the research, along with the flexibility to detach it from any other execution process.
Combining deterministic and stochastic project management tools
Now that we have described alternatives to the Agile tools, this section shows how the two paradigms can be aligned by combining their methodologies. The idea of combining deterministic and stochastic project management tools is far from absurd[19]. However, the tools that help with a stochastic project do not address the problems of an execution-oriented one. Hence, it may be necessary to use both for an interdisciplinary product (e.g. a website relying on visual design, software engineering and a machine learning model).
In essence, Agile is not a methodology[20] but rather a philosophy, in the sense that it does not fix a set of tools and processes[21]. Instead, it is a set of small building blocks that a company must pick and combine to fit its context[22][23] (constraints, culture, timing, etc.). The result of a particular combination may be known as Scrum or Extreme Programming, for example. The alternatives proposed in this document are not a replacement for the Agile ones. Instead, we offer an extension of the Agile building blocks: a complementary set of tools compatible with the conventional ones.
As a first example, let's assume a context where an isolated team of five data scientists works inside a big company. Management's request is to design an algorithm to identify market segments in the users database. The team already uses Agile tools, working in two-week sprints. Management uses a Kanban board to track the ongoing user stories corresponding to the data scientists' exploration steps. Because this small team is composed only of data scientists inside a bigger company, they can cherry-pick the use of experiments. Before, the team expressed their explorations by writing a user story describing a user who does not exist (since the actual user will not use the model in a Python notebook). Using experiments, they now have a clear metric of success to decide whether or not a ticket can be put in the “done” column. Meanwhile, management and other stakeholders do not have to worry about data science concepts, and continue to use the Kanban board to see what has been accomplished so far.
As another example, let's imagine a product team composed of a product manager, two software engineers and one data scientist. The objective is to leverage data science to optimize data center power consumption. The company has embraced Agile methods, but the team considers that the project may well not yield any result. The team can decide to make a high-level roadmap listing the actions to be done by the software engineers (e.g. centralize historical data, configure a supporting infrastructure, develop a data center API, and so on). It is also possible to provide a high-level roadmap of the data science challenges by scoping phases like “data exploration”, “optimization” or “industrialization”. Because the objective is very complex and uncertain, an exploration map is used by the product manager and the data scientist to list the algorithms and experiments to investigate. Before, planning the testing of various approaches to the problem would have resulted in an untrustworthy roadmap that could have negatively impacted the software engineers' execution. Instead, the product manager has a clear idea of the next experiments to come and is able to report the mid-term possibilities to upper management, while offering a time-based roadmap of the project. In this context, the data scientist works alone on his or her scope, and is very close to the software engineers. For this reason, the team can decide not to implement data shape control, which could lower the data scientist's velocity.
As a last example, consider a small company of ten people, including one developer, six data scientists and a web designer. The company is a data studio responsible for developing a custom algorithm for a hospital, predicting the number of new patients arriving the next day. The company may choose to drop conventional Agile tools, except demos with the client, and to perform exploratory design not only on the algorithm but possibly also on the web design of the final tool. As a roadmap, the client is only offered “a three-week exploration phase leading to a six-month building phase in case of promising results during phase 1”. Because the algorithm's output is at the heart of the challenge, the company may use an exploration map along with experiment reports to track progress. Before, the company maintained a collaborative document where data scientists recorded the latest metrics of their latest experimentations. But the document was hard to maintain: it was never up to date, each collaborator expressed things differently, and it did not display the evolution history, so stakeholders could not identify what impact an iteration had on the overall metric of success. By using experiment reports, the whole company can consult an up-to-date view and history of who made any given modification, and of how it affected the algorithm's performance.
These examples show three important traits of our solution. First, the tools can be implemented in an isolated team without forcing others to adapt their own processes or tools. Second, our tools can coexist with Agile ones within a project without creating redundancy or integrity challenges. Third, they are self-sufficient and do not introduce a dependency on conventional Agile methods.
However, using composable project management methods (like Agile) requires self-awareness from a company to implement them successfully. It is a change in culture that can create misunderstandings and requires an investment of resources[24]. Combining Agile blocks with our proposed set of tools is likely to increase the difficulty. As a result, we recommend our proposal for companies with an existing Agile culture that have already reached a good level of maturity in terms of processes and infrastructure. This way, they raise their chances of a successful integration by lowering the number of changes needed. This point is further explored in the next section, which reviews and defines the limits of this paper.
Critical analysis of the proposed alternatives
This paper focuses on alternative tools to replace Agile ones in data science projects. However, the proposed tools and concepts suffer from limitations. This section describes a non-exhaustive list of criticisms worth pointing out.
To begin with, it is important to note that not all alternative tools have been described in this paper. The current state of our research gives a clear view of the tools listed above, but other ideas can be mentioned. As a first example, we did not present any tools to ease collaboration between data science and other disciplines. A data science algorithm often requires infrastructure to run in production[25][26]. While computer engineers may use tools like staging environments and release processes, we did not include tools to formalize the transmission of artifacts from data science to infrastructure management. In the same way, a product requires design and reflection on the end-user experience[27]. On this point, a huge friction exists between the mathematics of data science and the experience of a non-technical user. We did not describe any tools to ensure algorithms are designed to provide the expected behavior from a user's point of view. Overall, we focused on isolating stochastic projects in order to manage them, but in doing so we did not tackle the new challenges that this isolation creates (e.g. effective collaboration).
This article describes a set of concepts. That is, we discuss mental models as they are, without showing their concrete manifestation in a company. To better understand, let's reason about Agile tools. To actually perform version control, actors use software like Git. To display a Kanban board, teams create an account on Trello or Jira. To formalize user stories and track their progress, Kanban boards have a ticket or epic feature allowing one to state the user story. However, this paper does not describe the concrete software that implements the mental models we listed. Hence, implementing this vision requires, for now, using custom software or leveraging workarounds with existing products (e.g. automated Excel files for experiment reports).
Reasoning about mental concepts can be easy, but implementing them in concrete tools, as mandatory as it is, raises the question of technical feasibility. For example, Git relies on hashes to detect changes. Producing such hashes is relatively easy (the algorithms are widely known and implemented), and comparing them is not too challenging either. Hence, and because string comparison is a simple task for a computer, Git brings a lot of value by doing it effectively. However, when it comes to dataset versioning, finding an alternative to hash-based comparison is a challenge. In fact, because of the complexity of the data our world offers, we cannot envision a silver bullet for performing such a comparison. For example, some algorithms may be impacted by a change in a dimension's distribution, but not by the amplitude of the vectors composing it, while another algorithm may behave the opposite way. Developing software that properly handles both situations requires extensive work, along with the use of biases (i.e. choosing to address a particular subset of contexts).
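To make the trade-off concrete, here is one possible building block among many, not a silver bullet: a sketch using a two-sample Kolmogorov-Smirnov test (scipy), where the significance threshold is exactly the kind of bias the paragraph mentions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=42)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)

# Two candidate "changes" to the same column of a dataset:
shuffled = rng.permutation(baseline)                  # reordered values, same distribution
drifted = rng.normal(loc=0.3, scale=1.0, size=5_000)  # an actual distribution shift

# A hash flags both changes; a two-sample KS test only flags the real drift.
for name, sample in [("shuffled", shuffled), ("drifted", drifted)]:
    result = ks_2samp(baseline, sample)
    verdict = "impacting" if result.pvalue < 0.01 else "cosmetic"  # 0.01 is a chosen bias
    print(f"{name}: p-value={result.pvalue:.3g} -> {verdict}")
```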
The problems highlighted in this paper regarding Agile methods are not new. The biggest data companies, like the GAFAMs, or even SMEs with data science at the core of their business model, have often developed tools to address these same problems[28]. As a result, the industry has found workarounds to raise the success rate of stochastic projects. This means the solution provided here may be redundant and replaced by home-made alternatives. For example, a team can use Jira and user stories, but formulate them in a way that separates data concerns (like experiments). Perhaps engineering teams have developed test suites that run against production datasets in order to monitor the product's output. In other words, formalizing and implementing the tools described in this paper may not be the answer to the original success rate problem. Instead, and because workarounds exist, it may be necessary to question why companies have not yet included them in their processes.
In the last paragraph, we concluded that our models are not a solution unless they are actually used by a company. However, changing a company's habits is a risky operation that must be embraced by its employees[29]. Moreover, the concepts in this paper, much like Agile methodologies, are composable and adaptable tools. That means implementing our approach requires not only using the tools, but also adapting the whole culture of the stakeholders, moving towards one that grasps the stochastic nature of data science. For example, if upper management forces data scientists to create roadmaps instead of accepting that timing is untrustworthy, our proposal will not solve the problem. But this article does not describe the culture that fits and accepts the proposed alternatives; we do not discuss the philosophy behind it the way the Agile manifesto does.
The culture change necessary to effectively integrate our tools brings a new set of questions. On one hand, we identified that, for example, the use of roadmaps is ineffective for data scientists. On the other hand, a company still needs to offer timing visibility to its collaborators. As a result, the alternatives proposed in this paper hardly address current market constraints. For example, investors or clients must have certainty on whether or not a project will bring value. In this context, this paper fails to explain how a team must behave in order to comply with investors' and clients' expectations in terms of visibility and predictability.
This section has laid out a list of limitations encountered when confronting our tools with the real world. That is because we focused on the conceptual tools and their characteristics, without showing how they impact the mechanisms already in place (culture, contracts, habits, …). The conclusion will expose the potential axes of research to be conducted in order to fill these gaps.
Conclusion
This essay exposed five main subjects. First, we described the elementary blocks composing the Agile toolbox. We stated that a feedback loop is at the heart of agility, and that the components need to be composed together in order to build a methodology that fits the company's context. Second, we assumed that the stochastic nature of data science is the root cause of the projects' failure rate in this industry, because stochastic research does not have a closed feedback loop. Hence, we highlighted how Agile tools alone fail to support project management when it comes to data science. Third, we described a set of additional tools a team can use in order to bring visibility and control to a data science project, and showed how the proposed tools tackle the limitations of the original ones from the Agile culture. Fourth, we discussed examples of how our proposal can be integrated and combined with an existing Agile culture. Finally, we pointed out some of the known limitations of the ideas described in this essay. While describing these limitations by confronting the concepts with the reality of the industry, two axes of questioning arose.
The first axis focuses on easing the collaboration of data scientists with other disciplines. Isolating stochastic projects to manage them is powerful, but it also brings collaboration challenges by creating silos[30]. Collaboration in the context of data science is expressed in three ways. To begin, data science requires collaboration with computer engineering to maintain data and algorithms in production environments. For example, a data scientist must communicate with a data manager regarding data shapes, location or privacy. Hence, further research may be conducted towards standardizing the transmission of artifacts from an experimental environment (e.g. the data scientist's computer) to a production one (i.e. the servers actually accessed by the clients). To continue, data science often requires a human-machine interface in order to bring value to the end user. Because an algorithm is hard to understand for someone who is not specialized, these interfaces bear a strong responsibility for popularizing and abstracting the underlying data science. For example, a dotted line is often used to show the probabilistic nature of a given value, but this requires the data scientist to describe the data to the designer. Hence, creating a design that helps the user get value from a product requires extensive collaboration between the data scientist, who knows the meaning of the data points, and the designer, who creates the data visualization. As a result, reflection is needed to make the product lifecycle align data science with product design. A good human-machine interface is mandatory to get value from data science, but it does not remove the need to educate the end user. For example, software is released with documentation on how to use it, its features and its limitations. When it comes to data science, algorithms are rarely explained. However, because of the probabilistic, non-deterministic nature of many data science algorithms (especially machine learning), we consider it mandatory that a data scientist deliver documentation for the algorithms produced. This documentation shall allow end users to understand the limits of the product, the potential biases, insisting on the probability that the algorithm fails, and so on. As a result, research is to be conducted to formalize the collaboration between the data scientist and the user in terms of transparency and knowledge.
The second axis of questioning raised by the critical review concerns the impact of a company's culture on the success rate of data projects[31]. Indeed, after describing a set of conceptual tools, it is clear that tools and processes do not bring value unless the company culture is adapted to them. For example, agility is higher when teams are small, interdisciplinary, and have complete ownership of a product (or a subset of it). In the same way, using our methodology requires a culture that embraces data-driven decisions, in which data science is handled at the same level as design and programming processes. But, as we stated, changing a company's culture cannot be forced and is a real challenge. The need for an adapted culture, and the difficulty of changing it, emphasize the need for further exploration of the following three subjects. First, it is important to identify how a company's culture impacts its success compared to the tools it uses. This way, one can measure precisely the role of tooling, thus qualifying the relevance of this article. Second, there is a need to formalize the culture traits necessary to rely on the alternative toolset in data science. For example, research can be conducted to expose the impact of a team's size on the success rate of its data science projects. Third, we can explore how cultures are influenced in the data science industry. For example, it is important to identify the role of open source software, and the impact of the biggest data leaders (e.g. Google, Netflix, Spotify, …). Indeed, in order to elevate the success rate of data science projects, cultures must evolve. We need to understand why companies have not yet embraced a stochastic culture, and how to influence an evolution towards it.
To conclude, the introduction of abstract modeling tools helps us think in a way that embraces the stochastic nature of data science. However, abstract concepts are only the starting point. We saw that implementing concrete software to support the abstract concepts is mandatory for the overall approach to bring any value. We also highlighted the importance of company culture when it comes to data science project management. Creating the required software and influencing the culture at the industry level is a great business opportunity, with various business models available. We must be aware that, realistically, it will take years to actually see the changes and, eventually, a rise in the success rate of data science projects.
References
- [1] Seymour, T., & Hussein, S. (2014). The History Of Project Management. International Journal of Management & Information Systems (IJMIS), 18(4), 233.
- [2] Cleland, D. I. (2004). The Evolution of Project Management. IEEE Transactions on Engineering Management, 51(4), 396–397. https://doi.org/10.1109/tem.2004.836362
- [3] Hall, N. G. (2012). Project management: Recent developments and research opportunities. Journal of Systems Science and Systems Engineering, 21(2), 129–143. https://doi.org/10.1007/s11518-012-5190-5
- [4] Highsmith, J., & Cockburn, A. (2001). Agile software development: the business of innovation. Computer, 34(9), 120–127. https://doi.org/10.1109/2.947100
- [5] Munns, A., & Bjeirmi, B. (1996). The role of project management in achieving project success. International Journal of Project Management, 14(2), 81–87. https://doi.org/10.1016/0263-7863(95)00057-7
- [6] Cohen, D., Lindvall, M., & Costa, P. (2004). An Introduction to Agile Methods. In Advances in Computers (pp. 1–66). Elsevier. https://doi.org/10.1016/s0065-2458(03)62001-2
- [7] Abbas, N., Gravell, A. M., & Wills, G. B. (2010). The Impact of Organization, Project and Governance Variables on Software Quality and Project Success. In 2010 Agile Conference. IEEE. https://doi.org/10.1109/agile.2010.16
- [8] Van Der Aalst, W. (2016). Data Science in Action. In Process Mining (pp. 3–23). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-49851-4_1
- [9] Matt Asay (2017). TechRepublic. In 85% of big data projects fail, but your developers can help yours succeed (https://www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/)
- [10] Ahmed, A., Ahmad, S., Ehsan, N., Mirza, E., & Sarwar, S. Z. (2010). Agile software development: Impact on productivity and quality. In 2010 IEEE International Conference on Management of Innovation & Technology. IEEE. https://doi.org/10.1109/icmit.2010.5492703
- [11] Matsudaira, K. (2015). The science of managing data science. Communications of the ACM, 58(6), 44–47. https://doi.org/10.1145/2745390
- [12] Doug Gray (2019). Caserta Data Blog. In 10 Reasons Why Analytics & Data Science Projects Fail (https://caserta.com/data-blog/reasons-why-data-projects-fail/)
- [13] Serrador, P., & Pinto, J. K. (2015). Does Agile work? — A quantitative analysis of agile project success. International Journal of Project Management, 33(5), 1040–1051. https://doi.org/10.1016/j.ijproman.2015.01.006
- [14] Lindvall, M., Basili, V., Boehm, B., Costa, P., Dangle, K., Shull, F., … Zelkowitz, M. (2002). Empirical Findings in Agile Methods. In Extreme Programming and Agile Methods — XP/Agile Universe 2002 (pp. 197–207). Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-45672-4_19
- [15] Chuck Cobb. High Impact Project Management, Inc. In Management of uncertainty in agile projects (https://managedagile.com/management-of-uncertainty-in-agile-projects)
- [16] Patanakul, P., Iewwongcharoen, B., & Milosevic, D. (2010). An Empirical Study on the use of Project Management Tools and Techniques across Project Life-Cycle and their Impact on Project Success. Journal of General Management, 35(3), 41–66. https://doi.org/10.1177/030630701003500304
- [17] Bar-Shalom, Y., & Tse, E. (1974). Dual effect, certainty equivalence, and separation in stochastic control. IEEE Transactions on Automatic Control, 19(5), 494–500. https://doi.org/10.1109/tac.1974.1100635
- [18] Ferreira, C., & Cohen, J. (2008). Agile systems development and stakeholder satisfaction. In Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries riding the wave of technology - SAICSIT ’08. ACM Press. https://doi.org/10.1145/1456659.1456666
- [19] Henneke, D., & Lüthje, C. (2007). Interdisciplinary Heterogeneity as a Catalyst for Product Innovativeness of Entrepreneurial Teams. Creativity and Innovation Management, 16(2), 121–132. https://doi.org/10.1111/j.1467-8691.2007.00426.x
- [20] Kevin Lonergan (2018). PMIS (Project Management Informed Solution) Project Management Blog. In Why Agile will never be a project management framework (https://www.pmis-consulting.com/why-agile-will-never-be-a-project-management-framework/)
- [21] Cao, L., Mohan, K., Xu, P., & Ramesh, B. (2009). A framework for adapting agile development methodologies. European Journal of Information Systems, 18(4), 332–343. https://doi.org/10.1057/ejis.2009.26
- [22] Jory MacKay (2018). Planio GmbH. In The Ultimate Guide to Implementing Agile Project Management (and Scrum) (https://plan.io/blog/ultimate-guide-to-implementing-agile-project-management-and-scrum/)
- [23] Chow, T., & Cao, D.-B. (2008). A survey study of critical success factors in agile software projects. Journal of Systems and Software, 81(6), 961–971. https://doi.org/10.1016/j.jss.2007.08.020
- [24] Boehm, B., & Turner, R. (2005). Management Challenges to Implementing Agile Processes in Traditional Development Organizations. IEEE Software, 22(5), 30–39. https://doi.org/10.1109/ms.2005.129
- [25] Demchenko, Y., Grosso, P., de Laat, C., & Membrey, P. (2013). Addressing big data issues in Scientific Data Infrastructure. In 2013 International Conference on Collaboration Technologies and Systems (CTS). IEEE. https://doi.org/10.1109/cts.2013.6567203
- [26] Avery, P. (2002). Data Grids: a new computational infrastructure for data-intensive science. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 360(1795), 1191–1209. https://doi.org/10.1098/rsta.2002.0988
- [27] Hise, R. T., O’Neal, L., McNeal, J. U., & Parasuraman, A. (1989). The Effect of Product Design Activities on Commercial Success Levels of New Industrial Products. Journal of Product Innovation Management, 6(1), 43–50. https://doi.org/10.1111/1540-5885.610043
- [28] Andrew Maher (2020). Spotify. In How we improved data discovery for data scientists at Spotify (https://labs.spotify.com/2020/02/27/how-we-improved-data-discovery-for-data-scientists-at-spotify)
- [29] McManus, K. (2003). The challenge of changing culture. Industrial Engineer, 35(1), 18. Gale Academic OneFile. Accessed 29 May 2020.
- [30] Miller, L. C., Jones, B. B., Graves, R. S., & Sievert, M. C. (2010). Merging Silos: Collaborating for Information Literacy. The Journal of Continuing Education in Nursing, 41(6), 267–272. https://doi.org/10.3928/00220124-20100401-03
- [31] Garmendia, J. A. (2004). The Impact of Corporate Culture on Company Performance. Current Sociology, 52(6), 1021–1038. https://doi.org/10.1177/0011392104046620