Can machine learning help us to estimate the costs of software development projects?
Christos Zois
Business Analyst, IT Project Manager & Product Owner | Agile Certified | Specializing in Enterprise Architecture, Visualizing & Simplifying complex Requirements
In today’s fast-changing world, project success is critical to the success of the entire organization, and many projects are terminated once it becomes apparent that they will significantly exceed their planned time and budget. Accurate estimation of project duration and required resources is therefore of prime importance. On the one hand, overestimating time and effort (or budget) can convince management not to approve projects that would otherwise contribute to the organization, because resources appear insufficient or the projected completion date is too late. On the other hand, underestimation may result in the approval of projects that will fail to deliver the expected product within the time and budget available. Despite the critical role of accuracy, examples of incorrect estimation abound, especially in IT projects, resulting in enormous waste of time and money.
In a survey, Da Yang et al. investigated potential causes of inaccurate estimates. The two most severe causes are volatile requirements and unclear requirements. As Lederer and Prasad note in “Nine Management Guidelines for Better Cost Estimating”, volatile requirements may make computing managers and professionals view systems estimating and development as an effort to hit a moving target. The third cause is pressure from senior managers and clients to set or change the estimation results, because many organizations use capacity-related or price-to-win methods. These methods reinforce poor practices and generally produce large overruns. The fourth cause is not having enough resources for estimation. Computing managers and professionals are neither particularly satisfied nor dissatisfied with software cost estimation, so they have little desire to improve estimation methods and processes.
Magazinius et al. conducted a study of six large, mature companies that all have complex product development processes in which estimates are a regular part of organizational practice. All interviewees take an active part in the estimation process and represent the project organization, the line organization, product planning, higher management, or the developers. Based on the results of the study, they defined categories of estimate distortion according to intentionality (intentional or unintentional) and, for intentional changes, direction (increase or decrease). A brief description of these categories is given below.
1) Intentional increases of estimates
An increase of estimates was reported by 11 different respondents, and all such increases were described as intentional. The most commonly reported reason was hiding other activities in the estimated ones: project managers were reported to add activities such as the development of extra functionality, or entire smaller projects, to the estimates for larger projects. Another increase, indicated to be more common, came from project teams who had found that the estimated costs for developing the functionality assigned to them were often too low. By increasing estimates, they tried to ‘buy’ extra development time, to be used for example for testing or maintenance activities.
The interviewees reported that the overall knowledge of software development processes in their firms was low outside the software development department. Respondents described how this limited knowledge could make it easier to mask intentional increases, that is, to distort estimates.
In a similar way, respondents reported that developers had exploited project planners' limited knowledge of basic software development principles, increasing estimates in order to have functionality they considered superfluous excluded. If developers think some functionality or product features are less urgent, less important, or even unnecessary, they sometimes increase the estimates for these parts. The increased costs make it more likely that project planners will remove the functionality or postpone it to a later release.
Another type of intentional increase reported in the interviews aimed at avoiding overspending. Some project managers were said to want to appear competent by ensuring that the estimate they provided was the worst-case estimate and would therefore most likely be higher than the actual project outcome. Other project managers were said to intentionally add padding to their estimates in order to avoid re-estimation due to future changes in requirements.
Another type of intentional increase was attributed to line managers, who increased estimates to enlarge the number of employees in their group or to save the jobs of their staff. Respondents explained that the role of line manager is a difficult one: they sometimes have to lay off people in times of economic downturn or if the company loses an important customer. This can lead to job-securing behavior intended to avoid later frustration or difficult tasks.
2) Intentional decreases of estimates
Ten (10) of the interviewees reported that they have seen some evidence of intentional decrease of estimates. The overlap with respondents reporting increase in estimates was high but not complete.
The most frequently reported reason for this type of estimate change, and also one perceived as frequently occurring, was management pressure to lower the estimates. One of the interviewees stated: ‘(…) removal of functionality is always expected to lead to reduced costs. Certain functionality is a side-product of others and removing it does not decrease the costs’. Thus, it is difficult to decrease costs by removing functionality because there are so many interdependencies, yet the request to decrease the total cost remains. As a consequence, the overall estimate must be decreased even though little or no functionality can actually be removed. This in turn leads to an intentional decrease in the estimate without obvious or rational reasons.
Another common reason for decreasing estimates was reported to be the intentional ‘selling’ of ideas for a project or for features and functionality. This ‘selling’ of ideas to higher management and product planners is a deliberate activity. Some of the interviewees said that when the money runs out in a project, it is almost always possible to add more, and more is most often added. Project managers therefore know that a project, once started, will be finished anyway. This invites such behavior, since getting the project started matters more than what the actual costs will later turn out to be.
3) Unintentional distortions
Risks and tasks are sometimes overlooked and not included in the estimates (missed risks/tasks), leading to unintentional decreases in estimates. For instance, according to one of the interviewees, important risks can be rated as unlikely when the risk information is discussed and documented by project teams. When the risks as perceived by different development teams are summarized and forwarded to higher management, only the highest-ranked risks are summarized for each team, excluding risks that are marked as less severe or less likely to occur. As described by the respondent, there was no intentional tampering with this process, nor were actions taken that directly led to risks being overlooked or information being missed. However, it still leads to decreased estimates.
Two project managers recognized optimism as a bias that affects their subordinates and makes them produce estimates that are too low. Interviewees only reported this bias as something that occurred in other people's behavior. They believed that optimism could not be avoided, as it is unconscious and most people are not aware of how it affects their estimates. This is a type of unintentional distortion that is clearly more related to an individual stakeholder's general disposition or personality than to any intentional choices.
Misunderstanding of requirements and inexperience are two further factors that were reported to change estimates. Neither was considered intentional, and the direction of their effect varied.
ROLE OF MACHINE LEARNING & ARTIFICIAL INTELLIGENCE IN SOFTWARE COST ESTIMATION
Estimating project resources continues to be a critical step in project management, including software project development. The ability to predict the cost or effort of a software project directly affects management's decision to accept or reject any given project. For example, overestimation of software costs can result in wasted resources, while underestimation can lead to project understaffing, budget overruns, and longer delivery time. Either can lead to loss of contracts and thus significant financial losses. Traditionally, researchers have estimated software effort using off-the-shelf algorithmic models such as COCOMO, where the effort is expressed as a function of expected size, or have developed local models using statistical techniques such as stepwise regression. Since algorithmic methods are often unable to adequately model the complex set of relationships that appear in many areas of software development, their results remain inaccurate. Recently, attention has shifted to various forms of machine learning (ML) and artificial intelligence (AI) methods for predicting software development effort.
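To make "effort expressed as a function of expected size" concrete, the following is a minimal Python sketch of the Basic COCOMO equation from Boehm (1981), effort = a × KLOC^b in person-months. The coefficient pairs are Boehm's published values for the three Basic COCOMO project modes; the function name, the dictionary name, and the 32 KLOC example size are illustrative assumptions, not part of the original article.

```python
# Basic COCOMO (Boehm, 1981): effort in person-months as a power function
# of size in thousands of delivered source lines (KLOC).
# The (a, b) pairs are Boehm's published Basic COCOMO coefficients.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Return the estimated effort in person-months for a project of `kloc` KLOC."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


if __name__ == "__main__":
    # Hypothetical 32 KLOC project, estimated under each project mode.
    for mode in BASIC_COCOMO:
        print(f"{mode:>13}: {cocomo_effort(32, mode):.1f} person-months")
```

Because size is the only input, a model of this kind cannot reflect volatile requirements, team experience, or the distortions described earlier, which is one reason such estimates can remain inaccurate.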
ML techniques borrow some features of the human brain, which can solve certain complex problems faster than even the fastest computers. Many machine learning methods make no or minimal assumptions about the form of the function under study (e.g., development effort), but like other methods they rely on historical data. Given a known training set, the learning algorithm creates “rules” that match the data and that, one hopes, fit previously unseen data reasonably well. AI methods can thus automate cost estimation from collected project data, potentially with high accuracy. By combining aspects of human knowledge with computational adaptivity, AI techniques are becoming more important in system modeling than classical mathematical modeling. With such techniques, an intelligent system can be developed that produces effective results and actions based on the observed inputs and outputs of the system.
Researchers have proposed various ML and AI-based methods for predicting the cost of a software project, for example case-based reasoning (CBR). CBR is an incremental, sustained-learning approach that solves a problem by finding the most similar past case and reusing it in the new situation [14]. CBR thus mimics human problem solving. Its basic processes are retrieve, reuse, revise, and retain. The retrieve process addresses a new case by retrieving similar previous cases. The reuse process applies the solution of the retrieved cases to the new case, and the revise process adapts that solution where it does not fit. Finally, the retain process updates the case base by incorporating the new case.
Because the factors behind software cost estimation change rapidly, including technological advances, team skills and experience, and the available programming tools and languages, ML and AI strategies have an advantage over methods that adhere to a fixed mathematical function. ML and AI-based methods can therefore be a well-suited way to build software cost estimates, thanks to their ability to learn from historical data and to adapt to the wide variations that accompany the development of a software project.
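The CBR cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited study: the case base, the two features (size in KLOC and a 1 to 5 team-experience score), the range-normalized Euclidean distance, the mean-effort reuse step, and the size-ratio revision are all hypothetical choices made for the example.

```python
import math

# Hypothetical case base of finished projects (all values are invented).
CASE_BASE = [
    {"kloc": 10.0, "experience": 3.0, "effort": 30.0},
    {"kloc": 12.0, "experience": 4.0, "effort": 33.0},
    {"kloc": 50.0, "experience": 2.0, "effort": 200.0},
    {"kloc": 55.0, "experience": 3.0, "effort": 230.0},
]
FEATURES = ("kloc", "experience")


def _ranges(cases):
    """Min and max of each feature, used to put features on a common scale."""
    return {f: (min(c[f] for c in cases), max(c[f] for c in cases)) for f in FEATURES}


def distance(a, b, ranges):
    """Euclidean distance over range-normalized features."""
    total = 0.0
    for f in FEATURES:
        lo, hi = ranges[f]
        span = (hi - lo) or 1.0  # avoid division by zero for constant features
        total += ((a[f] - b[f]) / span) ** 2
    return math.sqrt(total)


def retrieve(new_case, cases, k=2):
    """Retrieve: the k past cases most similar to the new one."""
    ranges = _ranges(cases)
    return sorted(cases, key=lambda c: distance(new_case, c, ranges))[:k]


def reuse(neighbours):
    """Reuse: take the mean effort of the retrieved cases as a first estimate."""
    return sum(c["effort"] for c in neighbours) / len(neighbours)


def revise(estimate, new_case, neighbours):
    """Revise: scale the estimate by the size ratio (an assumed adjustment rule)."""
    mean_kloc = sum(c["kloc"] for c in neighbours) / len(neighbours)
    return estimate * new_case["kloc"] / mean_kloc


def retain(new_case, actual_effort, cases):
    """Retain: store the finished project so later estimates can draw on it."""
    cases.append({**new_case, "effort": actual_effort})


def estimate(new_case, cases, k=2):
    """Run retrieve, reuse, and revise for a new project."""
    neighbours = retrieve(new_case, cases, k)
    return revise(reuse(neighbours), new_case, neighbours)
```

In use, `estimate({"kloc": 11.0, "experience": 3.0}, CASE_BASE)` draws on the two small past projects, and once the new project finishes, `retain` adds its actual effort to the case base. That last step is what makes the approach incremental: the estimator improves as history accumulates, without refitting a formula.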
REFERENCES
[1] K. Ewusi-Mensah, Z.H. Przasnyski, On information systems project abandonment: an exploratory study of organizational practices, MIS Quarterly 15 (1) (1991) 67–86.
[2] A.L. Lederer, J. Prasad, Causes of inaccurate software development cost estimates, Journal of Systems and Software 31 (1995) 125–134.
[3] Yang, D., Wang, Q., Li, M., Yang, Y., Ye, K., & Du, J. (2008, October). A survey on software cost estimation in the Chinese software industry. In Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (pp. 253-262).
[4] Lederer, A.L. and Prasad, J. 1992. Nine Management Guidelines for Better Cost Estimating. Communications of the ACM. 35, 2 (Feb. 1992), 51-59.
[5] Magazinius, A., Börjesson, S., & Feldt, R. (2012). Investigating intentional distortions in software cost estimation – An exploratory study. Journal of Systems and Software, 85(8), 1770-1781.
[6] M. Jorgensen and M. Shepperd, “A systematic review of software development cost estimation studies,” IEEE Transactions on Software Engineering, vol. 33, no. 1, pp. 33–53, 2007.
[7] F. J. Heemstra, “Software cost estimation,” Information and Software Technology, vol. 34, no. 10, pp. 627–639, 1992.
[8] M. Azzeh, A. B. Nassif, and S. Banitaan, “Comparative analysis of soft computing techniques for predicting software effort based use case points,” IET Software, vol. 12, no. 1, pp. 19–29, 2018.
[9] Boehm, B.W. (1981). Software Engineering Economics. New York: Prentice Hall.
[10] Kok, P., Kitchenham, B.A., and Kirakowski, J. (1990). The MERMAID approach to software cost estimation. In Proc. Esprit Technical Week.
[11] Schank, R. (1982). Dynamic Memory: A theory of reminding and learning in computers and people. Cambridge University Press.
[12] Siddique, N., and Adeli, H. (2013). Computational intelligence: synergies of fuzzy logic, neural networks and evolutionary computing. John Wiley & Sons, Chichester, West Sussex.
[13] Bishop, C. M. (2006). Pattern recognition and machine learning, 1–58. New York: Springer.
[14] Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39-59.
[15] Kolodner, J. L. (1992). An introduction to case-based reasoning. Artificial Intelligence Review, 6(1), 3-34.
[16] Al Asheeri, M. M., & Hammad, M. (2019, September). Machine learning models for software cost estimation. In 2019 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT) (pp. 1-6). IEEE.