Software Testing & Quality Assurance for Machine Learning Software

Today it is rare to find people who would disagree with the growth and promise of artificial intelligence (AI) in general and machine learning (ML) in particular. More people are talking about AI/ML; more students and professionals are skilling and reskilling themselves in these fields; more executives and business leaders are evaluating the promise of AI/ML; and each day every one of us uses more AI/ML than we did the day before. Research in the field has been leapfrogging ahead at a brisk pace, pushing the boundaries of what machines can do ever farther. I have been actively following the AI/ML scientific literature since 2000, and I cannot remember a time when state-of-the-art numbers were rewritten at a faster rate!

Another remarkable change in the AI/ML field in recent times is the growing importance of the principles and practices of software engineering and large-scale systems. Traditionally, AI/ML systems were developed mostly in the laboratories of academic institutions and a few large technology organizations. Today, ML engineering and development are carried out and used at varying scales in almost every organization, much as general software development spread over the years. However, ML software has distinct characteristics that make it differ considerably from traditional enterprise software development:

  • The model is trained on historical data while the data itself keeps changing. Data quality is often taken for granted, without adequate testing and validation.
  • Performance summaries (accuracy, precision, F1-score) on specific test data sets do not provide assurance of generalization performance on real-world data.
  • Development and operations teams work in silos, leaving quality assurance at the different stages a disconnected 'no-man's land'.
  • Integration efforts are typically underestimated, often hand-waved away as merely plugging models into downstream processes, resulting in bloated or failed integrations.
  • The ongoing evolution of tools and techniques in the ML ecosystem creates a perpetual multitude of moving parts.
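
The first two points can be made concrete with a small sketch. Everything here is illustrative and of my own construction, not from the tutorial: a nearest-centroid classifier stands in for "the model", and a mean shift in the features stands in for real-world data drift. The point is only that a healthy score on an i.i.d. test split says little about performance once the serving data moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two Gaussian classes; `shift` simulates covariate drift at serving time."""
    X0 = rng.normal(loc=-1.0 + shift, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0 + shift, scale=1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

# "Train": a nearest-centroid classifier fitted on historical data.
X_train, y_train = make_data(1000)
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    # Assign each point to the class of its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def accuracy(X, y):
    return float((predict(X) == y).mean())

acc_test = accuracy(*make_data(1000))              # i.i.d. test split
acc_drift = accuracy(*make_data(1000, shift=1.5))  # drifted serving data
print(f"i.i.d. accuracy: {acc_test:.2f}, drifted accuracy: {acc_drift:.2f}")
```

The i.i.d. test accuracy looks reassuring, yet the same frozen model degrades sharply on the shifted data, which is why monitoring input distributions in production matters as much as the offline evaluation report.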

Given these distinct characteristics, the ML software development and deployment life cycle requires considerable adaptation and extension of the traditional software development life cycle (SDLC). In particular, we see opportunities for rigorous quality assurance and comprehensive testing at various stages of ML software development. Many of these challenges and gaps in the quality assurance and testing of AI applications have remained largely unaddressed, contributing to a poor translation rate of ML applications from research to the real world. ML projects have mostly taken an ad hoc approach to quality assurance: they either fall back on traditional software testing approaches, which are inadequate for this domain because of its inherently probabilistic and data-dependent nature, or rely on non-rigorous, self-defined QA methodologies. Hence it is critical for the ML community to keep abreast of the current state-of-the-art techniques in testing ML applications, and this tutorial intends to address that need.

I, along with Sandya Mannarswamy and Saravanan Chidambaram, will be hosting a half-day tutorial session on "Software Testing & Quality Assurance for Machine Learning Applications" at the upcoming CoDS-COMAD conference in Hyderabad on 3rd-5th January, 2020. Besides covering a range of scientific literature from the software engineering and AI/ML domains, we will share our knowledge and insights from conversations with numerous industry experts working on developing and deploying cutting-edge AI/ML. Please consider attending if you are a researcher, developer, practitioner, or user of AI/ML software.

We would love to hear your reaction to this topic, its importance, need, and relevance. Is there a specific sub-topic or problem you feel we should focus on? We are currently preparing the content for the tutorial, so we can still include new material. Let us know in the comments below!

Julien Brault

Sign up for my free newsletter Global Fintech Insider

1 month

Great read!

Sweekar Revanna

Senior Data Engineer | Snowflake - Fivetran- Dbt | Palantir Foundry | Data Science | PySpark

5 years

Dear Sir, I am a Master's student at the University of Stuttgart. My master's thesis was on this research topic. It would be great if we could connect so that I can explain some aspects of the research I conducted. Thanks, Sweekar

Shourya Roy

Senior Vice President - Data Science, ACM Distinguished Member, Member of ACM India Executive Council

5 years

Indeed Anirban - fairness and interpretability are integral and important aspects of QA for ML models. These have become even more critical with the rise of black-box deep neural models. In fact, the FAT (fairness, accountability and transparency) topics are perhaps among the most important research directions in ML today. Would love to have a conversation as per mutual availability. Thanks.

Anirban Chatterjee

Applied Science Leader@WalmartLabs, India

5 years

Testing explainability - if a model can explain its predictions, it is important to keep that within the scope of the testing exercise. A generalised strategy for how to test the explainability of a model would be much appreciated.
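
One generic strategy that exists in the literature is a faithfulness (or "deletion") check: the feature an explanation ranks highest should, when removed, move the prediction more than the feature it ranks lowest. The sketch below is my own illustration, not anything from the post or tutorial; the linear scorer and its per-feature contributions are hypothetical stand-ins for a real model and its explainer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear scorer whose per-feature contributions serve as explanations.
w = np.array([2.0, 0.5, -1.0, 0.1])

def score(x):
    return float(w @ x)

def explain(x):
    """Per-feature contribution: the model's own 'explanation' of its score."""
    return w * x

def explanation_is_faithful(x):
    """Zeroing the top-ranked feature should move the score at least as much
    as zeroing the lowest-ranked feature."""
    contrib = np.abs(explain(x))
    top, low = contrib.argmax(), contrib.argmin()
    base = score(x)
    x_top, x_low = x.copy(), x.copy()
    x_top[top] = 0.0
    x_low[low] = 0.0
    return abs(base - score(x_top)) >= abs(base - score(x_low))

samples = rng.normal(size=(100, 4))
pass_rate = float(np.mean([explanation_is_faithful(x) for x in samples]))
print(f"faithfulness pass rate: {pass_rate:.2f}")
```

For a linear model the check passes by construction, which makes it a good smoke test for the harness itself; run against a black-box explainer, a pass rate well below 1.0 flags explanations that do not track the model's actual behaviour.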

Anirban Chatterjee

Applied Science Leader@WalmartLabs, India

5 years

Another important aspect could be stress testing - more precisely, testing the resilience of a model against adversarial attack. The typical setting is that an adversary wishes to flip the labels of some of the training data to turn the model's decisions in its favour. The model, being aware of the adversary's behaviour, should assume that part of the training data could be corrupted and offer its predictions accordingly. This is extremely important for many practical use cases, especially for online businesses. It would be nice to have a formal strategy to stress test a model for resilience against such scenarios, and to test a model's robustness against both local and global adversaries.
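
The label-flipping setting described in this comment can be turned into a simple stress-test harness: poison the training labels at increasing rates and watch test accuracy degrade. The sketch below is my own illustration under stated assumptions; a 1-nearest-neighbour classifier is used as a deliberately noise-sensitive stand-in model, and the random flips are a crude proxy for a real adversary.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Two Gaussian classes, n points each."""
    X0 = rng.normal(-1.0, 1.0, size=(n, 2))
    X1 = rng.normal(+1.0, 1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

def nn_predict(X_train, y_train, X):
    """1-nearest-neighbour prediction; deliberately sensitive to label noise."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

X_train, y_train = make_data(500)
X_test, y_test = make_data(500)

accs = {}
for flip_rate in (0.0, 0.1, 0.3):
    y_poisoned = y_train.copy()
    n_flip = int(flip_rate * len(y_train))
    idx = rng.choice(len(y_train), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # adversary flips these labels
    preds = nn_predict(X_train, y_poisoned, X_test)
    accs[flip_rate] = float((preds == y_test).mean())
    print(f"flip rate {flip_rate:.0%}: test accuracy {accs[flip_rate]:.2f}")
```

The accuracy-versus-flip-rate curve is the stress-test report: a model intended to be robust to poisoning should degrade gracefully, whereas a sharp drop at small flip rates is a red flag worth catching before deployment.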
