One Model To Learn Them All

"Abstract Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all."

Authors: Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

Read full paper at https://bit.ly/2tRKJ8z

Kirill Morozov

Software Engineer | Ecommerce | Delivering great scalable and performant cloud applications.

7y

Looks nice; hope there is a way to teach this kid/model over time, building up a continuously running model.

Mike Leaman

Enterprise Data Architect

7y

This part sounds intriguing: "especially since our model shows transfer learning from tasks with a large amount of available data to ones where the data is limited." Did you find this across any particular types of data sets, e.g. more homogeneous or heterogeneous ones?
