2020 For Course5 Artificial Intelligence Labs: The Year In Review
Tamal Chowdhury, Ph.D.
CTO | Computer Scientist & Mathematician | Artificial Intelligence R&D | Autonomous Computing | Product Engineering
2020 has been one of the most tumultuous years in modern history. The global economy has already contracted by over 4%, human lives and businesses have been severely disrupted, and, at the same time, digital adoption is steadily accelerating in most industries. A high degree of uncertainty still prevails globally, and even after the situation stabilizes, the new world order will be distinctly different from the pre-Covid one.
Amidst this global turbulence, the engineers, researchers and scientists at Course5 AI Labs worked extremely hard to ensure that planned product releases, client commitments, and other deliverables were not impacted. Our efforts were predominantly focused on AI research & development, and on engineering the company's flagship products & platforms. This paper discusses the major technical areas that received most of our focus in 2020, and briefly shares our plans for 2021.
Major AI Research & Engineering Areas In 2020
Seven key focus areas were identified for the year 2020. While the first four were largely new areas of research and development, the others were a continuation of our 2019 efforts.
- Advanced Object Detection
- Human Action & Emotion Recognition
- Neural-Symbolic Reasoning
- Anomaly & Causality Discovery In Noisy Temporal Data
- Cloud-Native AI Development
- Efficiencies In AI Operationalization
- Model Interpretability & Explainability
Focus Area 1: Advanced Object Detection
One of our major focus areas in 2020 was to expand the capabilities of our existing object detection systems. Four important aspects of our R&D in this area are highlighted below.
- Small-object and 3D-object detection: While modern object detectors perform well on regular-sized 2D objects, they often perform poorly on smaller objects. Similarly, 3D objects do not follow any specific orientation, which poses considerable challenges in detecting them. These limitations are compounded in video data by the added complexity of temporal dependencies, and our research was primarily focused on addressing these challenges.
- Low-shot/Few-shot object detection: Most deep learning-based detectors require large corpora of labeled/annotated data for high performance, which is expensive, inefficient and time-consuming. Low-shot/few-shot detection techniques help to address this. Our work largely focused on semi-supervised and weakly-supervised approaches, and addressing class, scale and spatial imbalances.
- Anchor-free methods: Anchor-free detection techniques (e.g., keypoint-based or center-based) eliminate many of the limitations of anchor-based methods, such as the need for anchor-related hyperparameter setup. However, most anchor-free detectors today exhibit average to mediocre performance during production inference, particularly for large-scale workloads. Our research focused on addressing these problems (a minimal decoding sketch follows this list.)
- Deep contextual encoding: Most object detectors do not efficiently capture contexts in computer vision data (e.g., the relationship between different objects, spatial correlations of objects, or the object-text similarities in video frames.) Context-awareness is critical to building efficient AI systems, and our research focused on capturing and encoding deep contexts in video data.
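To make the anchor-free idea concrete, below is a minimal PyTorch sketch of center-based detection decoding, in the spirit of keypoint/center-based detectors. The function name, shapes, and the max-pooling trick used as a cheap non-maximum suppression are illustrative assumptions, not our production implementation.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap: torch.Tensor, k: int = 100):
    """Decode the top-k detections from a (num_classes, H, W) center heatmap."""
    # 3x3 max-pooling acts as a cheap non-maximum suppression:
    # a cell survives only if it is the local maximum of its neighborhood.
    pooled = F.max_pool2d(heatmap.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    peaks = heatmap * (heatmap == pooled).float()

    num_classes, h, w = heatmap.shape
    scores, idx = peaks.flatten().topk(k)
    cls = torch.div(idx, h * w, rounding_mode="floor")
    ys = torch.div(idx % (h * w), w, rounding_mode="floor")
    xs = idx % w
    return cls, ys, xs, scores  # class ids, center coordinates, confidences
```

In a full detector, each surviving peak would be paired with regressed box sizes and offsets; the sketch stops at center extraction.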
Focus Area 2: Human Action & Emotion Recognition
Understanding human actions and facial expressions/emotions is an emerging area of AI research. The past few years have witnessed important innovations, such as two-stream & multi-stream networks, 3D-CNN architectures, and others. Some of the key aspects of our work in these areas are highlighted below:
- Skeleton-based action recognition through attention-based and graph-based architectures that capture both short-term and long-term temporal information in videos (a minimal attention-pooling sketch follows this list.)
- Focus on early action recognition (i.e., recognizing actions before they are completed) in videos to extract maximum information from temporal data, and reinforce the predictive power of the models.
- Adaptive learning-based emotion detection to capture different emotional states as well as the emotional intensity of those states.
- Sophisticated Java-based and Hadoop-based backend systems that enable distributed and parallel processing of heavy-duty workloads, high concurrency, low latency, etc.
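As a flavor of the attention-based approach mentioned in the first bullet, here is a minimal PyTorch sketch of attention pooling over per-frame skeleton features. The module name, layer sizes, and the assumption that skeleton keypoints have already been embedded into per-frame feature vectors are all illustrative.

```python
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    """Score each frame, then pool frames by their learned importance."""
    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # per-frame importance
        self.head = nn.Linear(feat_dim, num_actions)

    def forward(self, frames: torch.Tensor):         # (batch, T, feat_dim)
        weights = torch.softmax(self.score(frames), dim=1)  # (batch, T, 1)
        pooled = (weights * frames).sum(dim=1)       # weighted temporal average
        return self.head(pooled)                     # action logits

# Example: 17 joints x 2 coordinates = 34-dim features per frame, 30 frames.
logits = TemporalAttentionPool(feat_dim=34, num_actions=10)(torch.randn(8, 30, 34))
```

Because the attention weights are available per frame, the same mechanism also lends itself to early action recognition: the model can emit a prediction before the full clip has been observed.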
Focus Area 3: Neural-Symbolic Reasoning
This was the most complex area of research for us in 2020. While neural-symbolic reasoning is often explored for interpretability purposes, our research stemmed from the need to build a cognitive solution for automating highly unstructured manual processes that could not be effectively addressed through regular machine learning or deep learning techniques.
Despite recent advances in deep learning, building cognitive systems for large corpora of unstructured, multi-level hierarchical data is still a big challenge. This is especially true for data with negligible patterns, or where even abundant labeled data cannot capture all variations in patterns. Our research was primarily aimed at building complex human-like logic generation capabilities for one of our flagship products.
The goal was to create an end-to-end AI architecture where (i) knowledge is represented in a symbolic form, (ii) machine learning components are built to learn from that knowledge, and (iii) a reasoning system is built to generate (and apply) complex logic based on the learnings of the machine learning components. Our work is still in the initial phases, and early signs have been encouraging.
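As a toy illustration of this three-part loop (and emphatically not our product's architecture), the snippet below keeps knowledge in a symbolic rule table, takes predicate scores of the kind a learned component might produce, and applies a simple forward-chaining reasoning step. The rules and predicate names are invented for the example.

```python
# (i) Knowledge in symbolic form: concept <- set of required predicates.
RULES = {
    "invoice": {"has_amount", "has_vendor"},
    "receipt": {"has_amount", "has_timestamp"},
}

def reason(predicate_probs: dict, threshold: float = 0.5) -> list:
    """(iii) Forward-chaining over predicates that a neural component (ii)
    has scored, e.g. {'has_amount': 0.93, 'has_vendor': 0.81, ...}."""
    facts = {p for p, prob in predicate_probs.items() if prob >= threshold}
    return [concept for concept, premises in RULES.items() if premises <= facts]

print(reason({"has_amount": 0.93, "has_vendor": 0.81, "has_timestamp": 0.2}))
# -> ['invoice']
```

The real challenge, of course, lies in learning and composing such rules at scale rather than hand-writing them, which is where the machine learning components come in.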
Focus Area 4: Anomaly and Causality Discovery In Noisy Temporal Data
Real-world, multivariate time series data of many domains are characterized by high degrees of noise, complex abnormal patterns, and unstable distributions. Under such circumstances, commonly-used anomaly and causality discovery techniques often become ineffective, particularly on account of two reasons:
- the absence of a consistent understanding of anomalies, especially as time progresses; coupled with the absence of adequate labeled data in many cases
- the difficulties in encoding both the long-term temporal dependencies within each time-series, and the complex inter-correlations between the different time-series
Many existing anomaly detection systems focus primarily on the identification of anomalies, and pay limited attention to diagnosing or explaining their root causes. This becomes a problem in many real-world applications where the first-order, second-order, and higher-order causal factors that create the anomalies also need to be accurately understood. Furthermore, this causality discovery should account for confounding and instantaneous effects, and, more importantly, the time delays between the root causes and the occurrences of their effects.
The above reasons necessitated the development of an unsupervised anomaly and causality discovery system that could be effectively applied to noisy domains. Additionally, this integrated system was designed to determine the severity or impact of each anomaly for better decision-making. Our work involved experimenting with multiple approaches, such as reconstruction-based methods, generative modeling, and convolutional-based architectures.
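To illustrate the reconstruction-based family mentioned above, here is a minimal PyTorch sketch: an autoencoder over sliding windows of a multivariate series, where windows the model reconstructs poorly receive high anomaly scores. Layer sizes and the windowing scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WindowAutoencoder(nn.Module):
    """Compress and reconstruct flattened (window x n_series) slices."""
    def __init__(self, window: int, n_series: int, latent: int = 8):
        super().__init__()
        dim = window * n_series
        self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):                    # x: (batch, window * n_series)
        return self.dec(self.enc(x))

def anomaly_scores(model: WindowAutoencoder, x: torch.Tensor) -> torch.Tensor:
    # High reconstruction error = the window deviates from learned normality.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)
```

The magnitude of the score also gives a crude severity ranking, echoing the severity/impact determination mentioned above; causality discovery requires additional machinery on top.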
Focus Area 5: Cloud-Native AI Development
Our AI systems do not operate in isolation but as the core drivers of our enterprise products and solutions. This requires them to be seamlessly integrated into the products as regular software components, which implies that they need to be architected, designed, and developed in line with modern engineering practices, particularly as cloud-native systems. We established a two-pronged approach to achieve this.
i. Building Cloud-Native Everything
An important architectural decision that we took a few years back was to design and develop all our products and solutions as cloud-native in order to 'build for the future'. This involves the adoption of design patterns and engineering practices that enable high application scalability, extensibility & maintainability; allow seamless portability from one infrastructure ecosystem to another, as well as interoperability with other applications; ensure high performance under traffic from thousands of concurrent users; and provide high availability and safety mechanisms against system failures and security vulnerabilities.
DevOps, distributed engineering, microservices, function-based development, and test-driven development are the standard industry strategies that we deploy.
Moreover, API design and lifecycle management receive significant focus, including critical requirements like API security, forward & backward compatibility, and gateway design. Some key aspects of our API strategy were (and remain) as follows:
- While REST remains our de-facto standard, we also explored gRPC and GraphQL for certain specific requirements (a minimal endpoint sketch follows this list.)
- We prioritize API security, particularly for those services that drive the core runtime capabilities, over the ease of API creation & configuration.
- Multi-purpose APIs are leveraged for traditional systems & components, while single-purpose APIs are the norm for complex systems.
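For flavor, here is a minimal sketch of a versioned REST inference endpoint, assuming FastAPI; the route, request schema, and the hard-coded response (standing in for a real model call) are all hypothetical.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ScoreRequest(BaseModel):
    text: str

@app.post("/v1/score")  # version in the path eases forward/backward compatibility
def score(req: ScoreRequest):
    # A real service would invoke a deployed model here.
    return {"label": "positive", "confidence": 0.9}

# Run with an ASGI server, e.g.: uvicorn service:app (module name assumed)
```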
ii. Migrating Older AI Systems to Cloud-Native Architectures
Our efforts were focused on re-designing, re-factoring and migrating the older AI systems to cloud-native architectures. Special attention was paid to detecting and remediating problems such as common smells, complex glue codes, configuration-related issues, pipeline jungles, and undeclared consumers. This also involved re-writing our existing caching mechanisms, optimizing circuit-breakers, reducing system resource consumption, replacing older libraries with newer ones, upgrading our asynchronous development strategies, and other tasks.
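As one small example of the resilience work, below is a minimal circuit-breaker sketch of the kind referenced above: after a run of consecutive failures, calls to a flaky dependency are short-circuited for a cool-down period instead of being retried immediately. The thresholds and class name are illustrative.

```python
import time

class CircuitBreaker:
    """Open after max_failures consecutive errors; retry after reset_after seconds."""
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency unavailable")
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
            self.failures = 0            # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
```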
Focus Area 6: Efficiencies In AI Operationalization
Efficient operationalization is critical to the success of any product or solution. This is especially true for AI applications, where deployment and production management are fraught with challenges. We adopted a three-pronged strategy to address this.
i. Integrated DataOps - ModelOps - DevOps
In 2020, we significantly invested in transitioning from our traditional DevOps structure to an integrated DataOps-ModelOps-DevOps one. This covers the entire spectrum of machine learning & software development, deployment, and production management, ranging from:
- efficient data pipeline creation to large-scale data orchestration,
- automated code analysis to CI & CD,
- ML metadata-stores to ML feature-stores,
- low-latency model serving to production model performance evaluation,
- schedule-based & on-demand model re-training/revamp to online machine learning.
The new integrated structure allows us to rapidly and efficiently prototype, build, test, deploy and maintain our AI products and solutions.
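As a small, hedged illustration of the ModelOps slice, the snippet below assumes MLflow for experiment tracking; the run name, parameter, and metric are placeholders rather than our actual pipeline.

```python
import mlflow

with mlflow.start_run(run_name="example-model-v2"):
    mlflow.log_param("max_depth", 6)        # training configuration
    mlflow.log_metric("val_auc", 0.91)      # production-relevant evaluation
    # The trained artifact would be logged/registered here for serving.
```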
ii. Automated Machine Learning
Open-source and proprietary AutoML libraries and tools do not always provide enterprise-grade output, particularly for tasks that involve complex learning, deep feature engineering, or high explainability. The need for better inference, higher scalability, lower costs, and greater integration with our DevOps structure necessitated our AutoML efforts, particularly for Computer Vision and NLP workloads in our problem-domain. Two key aspects of our AutoML engineering were (and remain):
- Addressing the highly compute-intensive nature and instability problems of traditional Neural Architecture Search (NAS) frameworks (a toy search-loop sketch follows this list.)
- Building end-to-end AutoML pipelines that directly integrate with our product development setup.
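For intuition, here is a toy random-search baseline over a small architecture space. Real NAS frameworks are far more elaborate (weight sharing, differentiable search, and so on), and the search space and function names below are invented for the example.

```python
import random

# A made-up search space of architecture "knobs".
SPACE = {"layers": [2, 3, 4], "width": [64, 128, 256], "dropout": [0.0, 0.2, 0.5]}

def sample_architecture() -> dict:
    return {knob: random.choice(options) for knob, options in SPACE.items()}

def search(evaluate, trials: int = 20):
    """`evaluate` trains a candidate and returns its validation score."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_architecture()
        score = evaluate(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score
```

The compute cost is dominated by `evaluate`, which is precisely why mitigating the compute-intensive nature of NAS matters.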
iii. Deep Neural Network Compression
Deep neural networks are compute-, memory- and power-intensive, and this often creates problems when deploying AI systems with multiple deep learning models on edge devices, or even in regular CPU environments. As a result, sophisticated compression strategies are needed to optimize these networks for greater production efficiency. As our AI products kept growing in scope and scale, the significance of compression kept increasing as well.
Our compression techniques are based on four approaches: Knowledge Distillation, Low-Rank Matrix Factorization, Network Pruning, and Quantization. We explored and exploited various state-of-the-art and emerging techniques pertaining to these approaches.
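As one concrete instance, here is a minimal PyTorch sketch of the standard knowledge-distillation objective: a weighted sum of the KL divergence against the teacher's softened outputs and the usual hard-label cross-entropy. The temperature and mixing weight are illustrative defaults.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Soft-target KL term (teacher guidance) + hard-label cross-entropy term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2      # rescale so gradients match the hard term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Training a small student network against this loss lets it absorb much of a large teacher's behavior at a fraction of the inference cost.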
Focus Area 7: Model Interpretability & Explainability
Our strategies for model interpretability & explainability rest on four pillars.
Model Explainability: Linear proxy approaches (e.g., LIME, or Local Interpretable Model-Agnostic Explanations), Shapley additive explanations, and logic-based/rule-extracted explanations form most of our explainability methods. Our 2020 focus was to explore more sophisticated techniques, such as network dissection and explainable neural networks (e.g., DeepLIFT.)
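As a minimal usage sketch of the Shapley additive explanations mentioned above, the snippet below assumes the open-source `shap` library and trains a small tree ensemble purely as a stand-in model.

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# A small stand-in model on a public dataset.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)          # efficient for tree ensembles
shap_values = explainer.shap_values(X[:100])   # per-feature attribution per row
```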
Interpretable or White-Box Modeling: We have generally leveraged decision trees for explanations, and explainable boosting machines to address this area. In 2020, we primarily focused on improving the implementation of these techniques.
Visual Interpretation: We have traditionally relied on accumulated local effects, partial dependence/residual plots, correlation network graphs, and conditional expectation visualizations. Our 2020 focus was to further improve the way we deploy these methods.
Sensitivity Modeling: Our focus in 2020 was to deploy adversarial-based techniques for our computer vision workloads.
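One basic instance of such adversarial-based sensitivity probing is the Fast Gradient Sign Method (FGSM), sketched below in PyTorch for a generic vision classifier; the epsilon value is an illustrative default.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon: float = 0.01):
    """Perturb inputs in the direction that maximally increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    return (images + epsilon * images.grad.sign()).detach()

# A model whose predictions flip under such tiny perturbations is highly sensitive.
```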
Special Mention: Transformers
While this was not an explicit focus area, it receives a mention because the transformer architecture was the overwhelming theme of our NLP work in 2020. We studied, explored and exploited multiple types of transformers: GPT-2, BERT, ALBERT, XLNet, DistilBERT, ELECTRA, LongFormer, Reformer, RoBERTa, StructBERT, T5, and others. Moreover, we built the preliminary version of our own (proprietary) transformer for AI components where existing open-source transformers failed to provide the desired results.
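For readers unfamiliar with how quickly these models can be put to work, here is a minimal usage sketch assuming the open-source Hugging Face `transformers` library; the model choice and example sentence are illustrative, and our proprietary transformer is not shown.

```python
from transformers import pipeline

# Masked-language-model inference with a pre-trained checkpoint.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
print(fill_mask("The quarterly revenue [MASK] expectations."))
# Returns candidate tokens (e.g., 'exceeded') with confidence scores.
```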
What's Next?
As we continue our journey in 2021 (and beyond), our goal is to keep strengthening our capabilities and offerings in different technologies related to computer vision, speech recognition, natural language processing, and machine/deep learning. Some of our existing focus areas are expected to gain additional momentum, particularly the ones on human action and emotion detection, and neural-symbolic reasoning. New areas of AI research and development are planned as well.
We expect our Narrow AI Strategy to keep generating significant value for our customers. Complex AI transformations need domain-specific data, strong functional knowledge, and 'precisely-engineered solutions' for critical problems. As a result, focused AI systems that are specifically designed, developed and optimized to address domain-specific or complex problems will continue to be more efficient than generic industry solutions.
An important emerging global trend is the shift away from traditional AI development approaches that are either inefficient (e.g., building task-specific models) or constraint-laden (e.g., high reliance on labeled data.) We expect this trend to gain greater momentum this year, and to continue influencing our internal AI development strategies. This implies more innovations in areas like Active Learning, Multi-Task Learning, One/Few-Shot Learning, and in semi-supervised and self-supervised learning methods. As the open-source ecosystem continues to get stronger, external innovations will keep augmenting our internal ones.
Designing AI applications as truly complex, adaptive systems will be a critical aspect of our architectural evolution this year. As our products are enhanced with advanced features and more deep learning components, the emergent behaviors of the overall systems are expected to increase as well. Hence, state-of-the-art reinforcement learning systems will be designed and developed to address the increased emergent behaviors of our AI products. Similarly, cognitive architectures, particularly those pertaining to the symbolic paradigm, will witness greater adoption in our AI workloads. These architectures enable the modeling of core cognitive abilities such as attention, dynamic action selection, learning, memory, perception, and reasoning.
Another important area that is expected to witness key innovations in 2021 is multi-modal AI development. Intelligence from multiple sources will be integrated to gain a greater understanding of the subjects of interest. For instance, our emotion detection systems will be enhanced by integrating video-based emotions, speech-based emotions, and emotions from conversational systems. Furthermore, we expect increased adoption of advanced optimization techniques, such as evolutionary/genetic algorithms.
Finally, 2020 has been a reasonably decent year for Course5 AI Labs in terms of innovations and new releases, and I am extremely proud of what the team has achieved. We executed more R&D experiments, built more algorithms, wrote (and debugged) more code, shipped more releases, and managed more large-scale production systems than in any preceding year. We also faced setbacks and challenges, but the team had enough grit to keep progressing. We expect 2021 to deliver 1.5x to 2x of all of this.