Reusable, Efficient AI - AI Meets Engineering or Simply call it Engineered!!
Google Pathways: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/


Engineering is the application of science and maths to solve problems. While scientists and inventors come up with innovations, it is engineers who apply these discoveries to the real world.

Have you ever wondered why AI/deep learning does not look like software engineering, but rather like a science? It has been integrated into software engineering processes for a decade, yet for every new problem the models have to be recreated and retrained. There is no concept of reusability, nor of breaking a problem down into simpler, manageable building blocks.

A model trained to forecast stock prices cannot be reused to forecast housing prices without retraining on a large data set. Yet forecasting a price follows the same economic models, which a human being would readily re-apply from stock prices to housing. In fact, in the real world, if housing prices go down, stock brokers will immediately mark down the housing stocks.

That seems to have caught attention at Google and elsewhere. Google is developing an AI architecture, Pathways, that aims to solve the following three problems.

  • Today's AI models are typically trained to do only one thing. Pathways will enable us to train a single model to do thousands or millions of things.
  • Today's models mostly focus on one sense. Pathways will enable multiple senses. For instance, Pathways could enable multimodal models that encompass vision, auditory, and language understanding simultaneously in a single model.
  • Today's models are dense and inefficient. Pathways will make them sparse and efficient: given an input, the model can route through and activate only a few paths, unlike the current state of the art, which activates the entire model. For example, vision activates specific paths, while vision and auditory tasks share some paths.

Behind the scenes, the techniques being employed are 'sparse activation', 'meta-learning' and 'few-shot learning'. One of the many implementations of meta-learning uses a meta-learner based on an LSTM.

The idea behind meta-learning is to create a learner (the model) that can be reused and whose parameters are already initialised when a model for a new class (never seen before by the model) has to be created, rather than the traditional approach where the parameters of every new model are initialised randomly for every new problem. So, for instance, if you already have a meta-learner that can output parameter weights for a model that has learnt to identify certain classes of images, those parameters can be fed into a learner (a new model), and with very few images of the new class the learner can learn and adjust the parameters of both the meta-learner and the learner itself.

For every new class of training, the meta-learner takes the loss function of the learner as input and its own output from the previous time step as state, and produces a new state and new output parameters. This process continues for each new class of image identification, eventually building a meta-learner trained to learn any class of images.
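To make that loop concrete, here is a minimal PyTorch sketch of an LSTM-based meta-learner in the spirit described above. It is an illustration only, not the actual paper or Pathways implementation: the class name `LSTMMetaLearner`, the two-feature input of (gradient, loss) per parameter, and the additive parameter update are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class LSTMMetaLearner(nn.Module):
    """Hypothetical sketch of an LSTM-based meta-learner.

    At each step it consumes the learner's current loss and gradients
    and emits updated parameters for the learner, carrying its own
    (hidden, cell) state across steps and tasks.
    """
    def __init__(self, num_learner_params: int, hidden_size: int = 64):
        super().__init__()
        # Input per parameter and step: [gradient, loss] -> 2 features.
        self.cell = nn.LSTMCell(input_size=2, hidden_size=hidden_size)
        self.to_param_update = nn.Linear(hidden_size, 1)
        self.num_params = num_learner_params

    def forward(self, loss, grads, params, state=None):
        # Per-parameter input: its gradient plus the (shared) scalar loss.
        inp = torch.stack([grads, loss.expand_as(grads)], dim=1)   # (P, 2)
        if state is None:
            h = inp.new_zeros(self.num_params, self.cell.hidden_size)
            c = inp.new_zeros(self.num_params, self.cell.hidden_size)
        else:
            h, c = state
        h, c = self.cell(inp, (h, c))
        # Proposed additive update for each learner parameter.
        new_params = params + self.to_param_update(h).squeeze(1)
        return new_params, (h, c)
```

In a full training loop one would sample few-shot tasks (new image classes), run the learner with the parameters the meta-learner emits, compute the loss on a held-out set for that class, and backpropagate through the meta-learner so that its initialisations improve from task to task.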


This technique also requires very little labelled data to train new classes, because the meta-learner has already initialised the parameters and learning does not start from random parameters.

More info here https://www.youtube.com/watch?v=Kk1I0i6SGzY&list=LL&index=1

Sparse activation allows only a few network paths to be activated based on the input data. For example, a class of tree images may require different activation than a class of flower images, but both can still learn from the same meta-learner with a few shots. The idea is to be efficient with the amount of compute being used by not activating the entire network.
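As a rough illustration of sparse activation, here is a small top-k mixture-of-experts layer in PyTorch, similar in spirit to the routing used in Switch Transformer style models (linked below). The names and sizes (`SparseMoELayer`, 8 experts, top-2 routing) are arbitrary choices for the sketch, not Pathways internals.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer (hypothetical names/sizes).

    A small router scores every expert per input and only the top-k
    experts are run, so most of the network stays inactive for any
    given example -- the essence of sparse activation.
    """
    def __init__(self, dim: int = 128, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, dim)
        scores = self.router(x)                 # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        outputs = []
        for b in range(x.size(0)):
            # Only the selected experts are actually evaluated for this example.
            row = sum(w * self.experts[int(e)](x[b])
                      for w, e in zip(weights[b], idx[b]))
            outputs.append(row)
        return torch.stack(outputs)
```

The point is that compute per example grows with k, not with the total number of experts, so the model can be made much larger without a proportional increase in cost.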


More info on the Switch Transformer and sparse activation for a trillion-parameter model: https://www.dl.reviews/2021/02/10/switch-transformers/

Taking the example of a meta-learner for images: combining a CNN with an LSTM for image classification or video analysis, so that the state of the current output is remembered and updated at each time-step or frame, is a tried-and-tested approach. The same technique can be employed for meta-learning as described above.
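Below is a minimal sketch of that CNN + LSTM pattern, assuming per-frame image inputs; the class name and layer sizes are illustrative only, not a reference architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Sketch of the familiar CNN + LSTM pattern the text refers to.

    A small CNN extracts features per frame/image and an LSTM carries
    state across time-steps; all layer sizes here are arbitrary.
    """
    def __init__(self, num_classes: int = 10, hidden_size: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # -> (batch*time, 32, 1, 1)
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):                  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))  # fold time into batch
        feats = feats.flatten(1).view(b, t, 32) # (batch, time, 32)
        out, _ = self.lstm(feats)               # state carried across frames
        return self.head(out[:, -1])            # classify from last time-step
```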

From an engineering-management perspective, going by the Google blog on Pathways, why did it take two decades for technology companies to realise reusability and efficiency in AI, both of which are engineering traits?

Did we, as an industry, always treat AI as research (science) and never focus on engineering it? What is the cost implication of failing to engineer a solution? How many duplicate models did the industry create, and how much money was spent on creating labelled data-sets? And how much compute did we waste in the absence of sparse activation?
