Business Intelligence as a question of Supervised Learning for the Prediction of Company Dynamics.
Balancing the use of data and the choice of ML algorithms while integrating business intelligence in a firm

Due to the increasing availability of granular yet high-dimensional firm-level data, machine learning (ML) algorithms have been successfully applied to address multiple research questions related to firm dynamics. Supervised learning (SL), the branch of ML dealing with the prediction of labelled outcomes, has been used to better predict firms’ performance. A range of SL approaches can be used for prediction tasks relevant at different stages of the company life cycle. These stages include

  1. startup and innovation,
  2. growth and performance of companies, and
  3. firms’ exit from the market.

First, consider SL implementations to predict successful startups and R&D projects, and look at how SL tools can be used to analyze company growth and performance. Second, review SL applications to better forecast financial distress and company failure. Lastly, consider the use of SL methods in the light of targeted policies, result interpretability, and causality.

In particular, SL methods improve over standard econometric tools in predicting firm success at an early stage, superior performance, and failure. High-dimensional, publicly available data sets have in recent years contributed to the applicability of SL methods in predicting early success at the firm level and, even more granularly, at the level of single products and projects. While the dimension and content of data sets vary across applications, support vector machine (SVM) and random forest (RF) algorithms are often found to maximize predictive accuracy. Even though the application of SL to predict superior firm performance in terms of returns and sales growth is still in its infancy, there is preliminary evidence, drawn sometimes from structured empirical data and sometimes from unstructured data, that RF can outperform traditional regression-based models while preserving interpretability. Moreover, shrinkage methods, such as the Lasso or stability selection, can help identify the most important drivers of firm success (a minimal sketch follows below). Turning to SL applications in bankruptcy and distress prediction, decision-tree-based algorithms and deep learning methodologies dominate the landscape, with the former widely used in economics due to their higher interpretability, and the latter more frequent in computer science, where interpretability is usually deemphasized in favor of higher predictive performance.
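To make this concrete, below is a minimal, hypothetical sketch in Python (scikit-learn) of the two ideas just mentioned: an L1-penalized (Lasso-type) model to surface candidate drivers of firm success, and a random forest as the predictive benchmark. The data set, variable names, and the binary "success" label are invented placeholders rather than anything used in the literature reviewed here.

```python
# Hypothetical sketch: Lasso-type selection of success drivers + RF prediction.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Invented firm-level data: one row per firm, binary outcome "success".
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "rd_intensity": rng.gamma(2.0, 1.0, n),
    "firm_age": rng.integers(1, 40, n),
    "leverage": rng.beta(2, 5, n),
    "employees": rng.lognormal(3, 1, n),
    "sector_dummy": rng.integers(0, 2, n),
})
# Toy data-generating process, only to make the example runnable.
logit = 0.8 * df["rd_intensity"] - 2.0 * df["leverage"] + 0.02 * df["firm_age"] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.3, random_state=0)

# L1-penalized logistic regression (a Lasso-type shrinkage method) to identify drivers.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
lasso.fit(X_train, y_train)
coefs = pd.Series(lasso[-1].coef_.ravel(), index=df.columns)
print("Non-zero (selected) drivers:\n", coefs[coefs != 0])

# Random forest as the predictive benchmark.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
print(f"RF cross-validated accuracy: {cross_val_score(rf, X_train, y_train, cv=5).mean():.3f}")
```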

In general, the predictive ability of SL algorithms can play a fundamental role in informing targeted policies at every stage of a firm's lifespan:

(1) identifying projects and companies with a high success propensity can aid the allocation of investment resources;

(2) potential high growth companies can be directly targeted with supportive measures;

(3) a greater ability to distinguish viable from non-viable firms can act as a screening device for potential lenders.

As granular firm-level data become increasingly available, many doors will open for future work focusing on SL applications for prediction tasks. The SL algorithms most commonly employed in the firm-dynamics literature, namely decision trees, random forests, support vector machines, and artificial neural networks, deserve closer attention, as they can help make sense of the vast amounts of data at firms' disposal; a minimal comparison of these algorithms is sketched below.
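As a rough illustration of what such a comparison could look like, the following hypothetical Python sketch cross-validates the four algorithm families named above on synthetic data standing in for firm-level features and a binary exit outcome; none of the figures it produces should be read as empirical results.

```python
# Hypothetical sketch: cross-validated comparison of common SL algorithms.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for firm-level predictors and a binary "exit" label.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "neural net (MLP)": make_pipeline(StandardScaler(),
                                      MLPClassifier(hidden_layer_sizes=(32, 16),
                                                    max_iter=1000, random_state=0)),
}

# Compare out-of-sample accuracy; interpretability should be weighed alongside these scores.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:18s} mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```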

Besides reaching high predictive power, it is important, especially for policy-makers, that SL methods deliver tractable and interpretable results. For instance, the US banking regulator has introduced the obligation for lenders to inform borrowers about the underlying factors that influenced their decision to not provide access to credit. Hence, different SL techniques should be evaluated, and firms should opt for the most interpretable method when the predictive performance of competing algorithms is not too different. This is central, as understanding which predictors matter most, or what the marginal effect of a predictor on the output is (e.g., via partial dependence plots), can provide useful insights for scholars and policy-makers. Indeed, data scientists in the firm can enhance models’ interpretability using a set of ready-to-use models and tools that are designed to shed light on the SL black box (see the sketch after the list below). These tools can be grouped into three categories: tools and models for

(1) complexity and dimensionality reduction (i.e., variable selection and regularization via Lasso, ridge, or elastic net regressions);

(2) model-agnostic variable importance techniques (i.e., permutation feature importance, based on how much accuracy decreases when a variable’s values are randomly shuffled; Shapley values and SHAP [SHapley Additive exPlanations]; the decrease in Gini impurity when a variable is chosen to split a node in tree-based methodologies); and

(3) model-agnostic marginal effects estimation methodologies (average marginal effects, partial dependence plots, individual conditional expectations, accumulated local effects).
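By way of example, the sketch below applies two of the model-agnostic tools from categories (2) and (3), permutation feature importance and partial dependence, to a random forest fitted on synthetic placeholder data; it is only meant to show how such tools are invoked, not to reproduce any study.

```python
# Hypothetical sketch: permutation importance and partial dependence for a fitted model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance, partial_dependence

# Synthetic placeholder data for a binary firm-level outcome.
X, y = make_classification(n_samples=1500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# Permutation importance: drop in accuracy when a feature's values are shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(perm.importances_mean)[::-1][:5]:
    print(f"feature {i}: mean importance = {perm.importances_mean[i]:.4f}")

# Partial dependence of the predicted probability on one feature (here, feature 0),
# averaged over the sample.
pd_result = partial_dependence(model, X_test, features=[0], kind="average")
print("partial dependence (first grid points):", np.round(pd_result["average"][0][:5], 3))
```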

Higher standards of replicability should be reached by releasing details about the choice of model hyperparameters, the code, and the software used for the analyses, as well as by releasing the training/testing data to the extent that this is possible, anonymizing them when the data are proprietary, for instance data collected by banks, financial institutions, and business analytics firms. This applies not only to proprietary data but also to data held in jurisdictions with strict privacy regulation, as in the case of the GDPR in Europe.
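One simple, hypothetical way to record such details is sketched below: the model's hyperparameters, the random seed, and the package versions are written to a JSON file that can be released alongside the code. The file name and record structure are arbitrary choices for illustration.

```python
# Hypothetical sketch: recording hyperparameters and software versions for replicability.
import json
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=300, max_depth=8, random_state=42)

run_record = {
    "model": type(model).__name__,
    "hyperparameters": model.get_params(),  # full hyperparameter configuration
    "random_seed": 42,
    "versions": {"scikit-learn": sklearn.__version__, "numpy": np.__version__},
}

# Write the record to disk so it can be published together with the analysis code.
with open("run_record.json", "w") as f:
    json.dump(run_record, f, indent=2, default=str)
```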

Here, I want to stress once more that SL per se is not informative about the causal relationships between the predictors and the outcome; therefore, data engineers who wish to draw causal inferences should carefully check the standard identification assumptions and inspect whether or not they hold in the scenario at hand. Besides not directly providing causal estimates, most of the reviewed SL applications focus on pointwise predictions, where inference is de-emphasized.

Providing a measure of uncertainty about the predictions, e.g., via confidence intervals, and assessing how sensitive predictions are to unobserved data points, are important directions to explore further; a rough illustration of reporting prediction uncertainty is sketched below.
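As a purely illustrative example, the sketch below derives a rough uncertainty band for random forest predictions from the spread of the individual trees' predictions; this is a heuristic, not a formal confidence interval, and the data are synthetic placeholders.

```python
# Hypothetical sketch: a heuristic uncertainty band from the spread of trees in a forest.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Invented regression target, e.g., next-year sales growth.
X, y = make_regression(n_samples=1500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)

# Collect each tree's prediction for the first few test firms and report quantiles.
per_tree = np.stack([tree.predict(X_test[:5]) for tree in rf.estimators_])
lower, upper = np.percentile(per_tree, [5, 95], axis=0)
point = rf.predict(X_test[:5])

for i in range(5):
    print(f"firm {i}: prediction = {point[i]:7.2f}, "
          f"rough 5-95% band = [{lower[i]:7.2f}, {upper[i]:7.2f}]")
```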

When considering how SL algorithms can predict various firm dynamics from “intercompany data” covering information across firms, many aspects come into play. Yet, nowadays companies themselves apply ML algorithms for various clustering and prediction tasks, and this will presumably become more prominent for small and medium-sized enterprises (SMEs) in the coming years. This is because

(1) SMEs are starting to construct proprietary databases,

(2) they are developing the skills to perform in-house ML analyses on these data, and

(3) powerful methods can easily be implemented using common statistical software.

Against this background, I want to stress that applying SL algorithms and economic intuition about the business problem at hand should ideally complement each other. Economic intuition can aid the choice of the algorithm and the selection of relevant attributes, thus leading to better predictive performance. Furthermore, properly interpreting SL results and directing their use requires deep knowledge of the research question being studied, so that intelligent machines remain guided by expert human beings.
