Top AI/ML Papers of the Week [31/07 - 06/08]
Source: https://arxiv.org/abs/2307.15189


Over the past week [31-07 to 06-08], I picked out eight scientific articles that I found noteworthy to share with you. Each one is presented with a short synopsis and a link for further reading. At the end of the article, I reflect on how these advances may impact your projects or company in the future!


[1] Digital Twin Brain: a bridge between biological intelligence and Artificial Intelligence

Recent advances in neuroscience and artificial intelligence have created new opportunities for understanding the brain's complexity and emulating it computationally. Cutting-edge research highlights the close relationship between brain structure and function, a link underscored by the success of artificial neural networks. The Digital Twin Brain (DTB) is introduced as a platform to bridge biological and artificial intelligence, built from three core elements: brain structure, bottom-layer models, and applications. Brain atlases provide the structural scaffold that preserves the brain's network within the DTB. Open questions invite interdisciplinary collaboration, and the DTB holds potential to yield insights into intelligence and neurological disorders, advancing artificial intelligence and mental healthcare alike. [Link]


[2] The Hydra Effect: Emergent Self-repair in Language Model Computations

The internal structure of language model computations is investigated using causal analysis, revealing two motifs: (1) adaptive computation, in which ablating one attention layer causes another layer to compensate (termed the Hydra effect), and (2) a counterbalancing function of late MLP layers that downregulates the maximum-likelihood token. Ablation studies show that language model layers are typically loosely coupled: ablating one layer affects only a small number of downstream layers. Notably, these effects arise even in models trained without dropout. The effects are analyzed in the context of factual recall, and the implications for circuit-level attribution in language models are discussed. [Link]
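To make the ablation idea concrete, here is a minimal sketch assuming the Hugging Face transformers GPT-2 implementation. It is a rough proxy for the paper's causal analysis, not the authors' method: it zeroes one attention layer's output and compares downstream attention-output norms against a clean run, looking for compensatory shifts.

```python
# Minimal sketch (assumption: Hugging Face GPT-2): ablate one attention layer
# and see whether downstream attention outputs change to compensate.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
ids = tok("The Eiffel Tower is located in", return_tensors="pt").input_ids

norms = {}
def record(layer_idx):
    def hook(module, inputs, output):
        norms[layer_idx] = output[0].norm().item()  # attention output magnitude
    return hook

def ablate(module, inputs, output):
    # replace the attention output with zeros; keep the rest of the tuple
    return (torch.zeros_like(output[0]),) + output[1:]

recorders = [blk.attn.register_forward_hook(record(i))
             for i, blk in enumerate(model.transformer.h)]

with torch.no_grad():
    model(ids)
clean = dict(norms)                      # norms from the unablated run

ABLATED = 3                              # arbitrary layer to knock out
h = model.transformer.h[ABLATED].attn.register_forward_hook(ablate)
with torch.no_grad():
    model(ids)
h.remove()
for r in recorders:
    r.remove()

for i in range(len(model.transformer.h)):
    if i != ABLATED:
        print(f"layer {i:2d}: {clean[i]:7.2f} -> {norms[i]:7.2f}")
```

If later layers shift noticeably when an earlier one is silenced, that is the kind of compensation the paper names the Hydra effect.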


[3] Experimental Results regarding multiple Machine Learning via Quaternions

This paper presents an experimental study on using quaternions in various ML algorithms. Quaternions, which represent rotations in three-dimensional space, can encode complex data transformations compactly. The study represents and classifies rotation data by randomly generating quaternions, converting them to rotation matrices, and using the matrices as input features. Across multiple machine learning algorithms, the quaternion-based representation demonstrated higher accuracy and significantly improved performance in the reported experiments. The research establishes an empirical foundation for utilizing quaternions in ML tasks. [Link]
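As a rough illustration of that pipeline, the sketch below draws random unit quaternions for two rotation classes (rotations about the x- and z-axes, an illustrative choice not taken from the paper), converts them to rotation matrices with SciPy, and trains a scikit-learn classifier on the flattened matrices.

```python
# Illustrative sketch: random quaternions -> rotation matrices -> classifier.
import numpy as np
from scipy.spatial.transform import Rotation
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def sample_rotations(axis, n=500):
    """Random rotations about a fixed axis, built as unit quaternions."""
    half = rng.uniform(-np.pi, np.pi, n) / 2
    quats = np.column_stack([np.outer(np.sin(half), axis), np.cos(half)])  # x,y,z,w
    return Rotation.from_quat(quats)

classes = [sample_rotations([1.0, 0.0, 0.0]), sample_rotations([0.0, 0.0, 1.0])]
X = np.vstack([r.as_matrix().reshape(len(r), 9) for r in classes])  # 9-dim features
y = np.repeat([0, 1], 500)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("accuracy:", accuracy_score(yte, clf.predict(Xte)))
```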


[4] ToolLLM: Facilitating LLMs to Master 16000+ Real-world APIs

Despite advances in open-source LLMs such as LLaMA and Vicuna, they remain limited in higher-level tasks such as using external tools (APIs), unlike state-of-the-art closed-source LLMs such as ChatGPT. To address this, the paper introduces ToolLLM, a framework for tool-use capabilities that includes ToolBench, an instruction-tuning dataset created using ChatGPT and covering 16,464 real-world APIs. A novel depth-first search-based decision tree (DFSDT) enhances the planning and reasoning abilities of LLMs. An automatic evaluator, ToolEval, is developed, and after fine-tuning LLaMA on ToolBench, the resulting ToolLLaMA shows a remarkable ability to execute instructions and generalize to unseen APIs. A neural API retriever is also devised to recommend appropriate APIs, removing the need for manual selection. [Link]
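The depth-first search component is easy to picture with a small sketch. Here `propose_calls` and `is_solved` are hypothetical stand-ins for the LLM that suggests candidate next API calls and judges completion; this shows DFSDT-style backtracking, not the authors' implementation.

```python
# Schematic DFS over tool-call sequences, with backtracking on dead ends.
from typing import Callable, List, Optional

def dfsdt(state: List[str],
          propose_calls: Callable[[List[str]], List[str]],
          is_solved: Callable[[List[str]], bool],
          max_depth: int = 4) -> Optional[List[str]]:
    """Return the first API-call sequence that solves the task, or None."""
    if is_solved(state):
        return state
    if len(state) >= max_depth:
        return None                      # dead end: backtrack
    for call in propose_calls(state):    # candidate next API calls
        plan = dfsdt(state + [call], propose_calls, is_solved, max_depth)
        if plan is not None:
            return plan
    return None

# Toy usage: find a two-step plan that ends by booking a flight.
plan = dfsdt(
    [],
    propose_calls=lambda s: ["search_flights", "book_flight"],
    is_solved=lambda s: len(s) == 2 and s[-1] == "book_flight",
)
print(plan)  # ['search_flights', 'book_flight']
```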


[5] Discovering Adaptable Symbolic Algorithms from Scratch

Autonomous robots require control policies that adapt quickly to environmental changes. This paper introduces AutoRobotics-Zero (ARZ), a method inspired by AutoML-Zero that discovers zero-shot adaptable policies from scratch. Unlike typical neural-network adaptation, ARZ can construct control algorithms with the full expressive power of a linear register machine. These policies can tune their parameters and alter their inference algorithm in real time. The method is demonstrated on a simulated quadruped robot, evolving control policies that prevent falls when limbs break, a task on which two popular neural network baselines fail. A detailed analysis of a challenging non-stationary task, Cataclysmic Cartpole, confirms ARZ's robustness and its ability to build simple, interpretable control policies. [Link]
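To give a feel for the representation ARZ searches over, here is a toy linear register machine interpreter. The instruction set and the hand-written example program are illustrative assumptions, not taken from the paper (ARZ evolves such programs automatically).

```python
# Toy linear register machine: a straight-line program maps observations
# to a scalar action. ARZ evolves programs of roughly this flavor.
import math

def run_policy(program, obs, n_regs=8):
    """Execute a straight-line register program; regs[0] holds the action."""
    regs = [0.0] * n_regs
    regs[1:1 + len(obs)] = obs                 # load observations into registers
    for op, dst, a, b in program:
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        elif op == "tanh":
            regs[dst] = math.tanh(regs[a])     # b is ignored for unary ops
    return regs[0]

# Hand-written toy policy for a cartpole-like observation
# (position, velocity, angle, angular velocity).
program = [
    ("add", 4, 3, 4),    # r4 = angle + angular velocity
    ("tanh", 0, 4, 0),   # action = tanh(r4)
]
print(run_policy(program, [0.1, -0.2, 0.05, 0.3]))  # ~0.336
```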


[6] Semi-Supervised Meta-Learning for Spatiotemporal Learning

This work applies meta-learning to self-supervised masked autoencoders (MAE) for spatiotemporal learning, aiming to understand the impact of meta-learning on existing state-of-the-art architectures. The study tests spatiotemporal learning in three ways: with the meta-learning architecture alone, with the representation-learning architecture alone, and with a combination of both. A Memory Augmented Neural Network (MANN) is used for meta-learning. Specific experiments include applying a pre-trained MAE and fine-tuning it on a small-scale dataset for video reconstruction; training an MAE encoder for action classification; and applying a pre-trained MAE and fine-tuning it with a MANN backbone for action classification. [Link]
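Below is a minimal sketch of the third configuration, a frozen pre-trained encoder feeding a memory-augmented classification head. Both modules here are simplified stand-ins (the real setup uses a video MAE and a full MANN), so treat this purely as a shape-level illustration.

```python
# Sketch: frozen "MAE" encoder + tiny memory-augmented head for classification.
import torch
import torch.nn as nn

class MemoryHead(nn.Module):
    """MANN-flavored head: attend over learned memory slots, then classify."""
    def __init__(self, dim, n_slots, n_classes):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, dim))
        self.classify = nn.Linear(dim, n_classes)

    def forward(self, z):                                # z: (batch, dim)
        attn = torch.softmax(z @ self.memory.T, dim=-1)  # read weights
        read = attn @ self.memory                        # content read from memory
        return self.classify(z + read)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))  # MAE stand-in
for p in encoder.parameters():
    p.requires_grad = False                              # frozen pre-trained encoder

head = MemoryHead(dim=128, n_slots=32, n_classes=10)
x = torch.randn(4, 3, 16, 16)                            # toy frame batch
print(head(encoder(x)).shape)                            # torch.Size([4, 10])
```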


[7] Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

This work targets the generation latency of large language models (LLMs), which stems largely from the sequential, token-by-token decoding used by most state-of-the-art LLMs. Inspired by how humans think and write, the paper introduces "Skeleton-of-Thought" (SoT): the model first drafts a skeleton of the answer, then parallel API calls or batched decoding fill in the content of each skeleton point simultaneously. SoT not only speeds up generation (up to 2.39x across 11 different LLMs) but can also improve answer quality in terms of diversity and relevance. It represents an initial attempt at data-centric optimization for efficiency and points toward making LLMs reason more like humans. [Link]
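The mechanics are easy to sketch: one call drafts the skeleton, then every point is expanded concurrently. The `llm` argument below is a hypothetical blocking completion function (e.g., a thin wrapper around your provider's API), and the prompts are illustrative, not the paper's.

```python
# Sketch of Skeleton-of-Thought: draft a skeleton, expand points in parallel.
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, llm, max_points=5):
    skeleton = llm(
        f"Give a skeleton of at most {max_points} short bullet points "
        f"answering: {question}"
    )
    points = [p.strip("-• ").strip() for p in skeleton.splitlines() if p.strip()]

    def expand(point):
        return llm(f"Question: {question}\n"
                   f"Expand this point in 2-3 sentences: {point}")

    # parallel expansion of skeleton points is where the latency win comes from
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        bodies = list(pool.map(expand, points))
    return "\n\n".join(f"{p}\n{b}" for p, b in zip(points, bodies))
```

With a locally served model, the thread pool would be replaced by a single batched decode over all points, which is the paper's other variant.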


[8] Med-Flamingo: a Multimodal Medical Few-shot Learner

Medicine requires synthesizing information across modalities, but existing medical generative vision-language models (VLMs) typically need fine-tuning on large datasets, a serious limitation in data-scarce medical applications. This paper proposes Med-Flamingo, a multimodal few-shot learner adapted to the medical domain by continuing pre-training on medical image-text data. Med-Flamingo enables few-shot generative medical visual question answering (VQA) and is evaluated on several datasets, including challenging USMLE-style problems. The first human evaluation involving physicians, conducted through an interactive app, shows that Med-Flamingo improves generative medical VQA performance by up to 20% in clinicians' ratings and enables multimodal medical few-shot adaptations such as rationale generation. [Link]
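For intuition, here is a minimal sketch of assembling a few-shot, interleaved image-text prompt for medical VQA. The `<image>` placeholder convention is an assumption borrowed from Flamingo-style interfaces, and the example questions are invented; the exact prompt format is model-specific.

```python
# Sketch: build a few-shot interleaved prompt; each <image> marks where the
# corresponding image embedding would be spliced in by the model's processor.
def build_fewshot_prompt(examples, query_question):
    parts = [f"<image>Question: {ex['question']} Answer: {ex['answer']}"
             for ex in examples]
    parts.append(f"<image>Question: {query_question} Answer:")
    return "\n".join(parts)

shots = [
    {"question": "Is there a visible fracture?", "answer": "No."},
    {"question": "Which lung shows an opacity?", "answer": "The left lung."},
]
print(build_fewshot_prompt(shots, "Is the heart enlarged?"))
```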


How might these advances impact the future?

The advances in neuroscience and artificial intelligence toward understanding the complexity of the brain and emulating it through computational systems, as seen in the Digital Twin Brain (DTB) concept, could revolutionize our understanding of both biological and artificial intelligence. Such a platform could further propel the development of artificial general intelligence and advance mental healthcare.

The study of the internal structure of language models, such as the Hydra effect and the counterbalancing function of MLP layers, uncovers intricate relationships within these models. These findings could shape our understanding of how models work, leading to more robust and efficient models.

The experimental study on the application of quaternions in machine learning algorithms opens up new pathways for representing and classifying rotation data. This mathematical advancement could significantly enhance the accuracy and performance of prediction tasks, broadening applications in various fields.

The introduction of ToolLLM, a general tool-use framework for open-source large language models, offers a strategic method to facilitate tool-use capabilities within these models. This innovation may streamline the implementation of external tools (APIs) and expand the practical applications of language models.

The proposal of AutoRobotics-Zero (ARZ), a method for discovering zero-shot adaptable policies for autonomous robots, showcases the potential for creating robust control policies that adapt to sudden environmental changes. This could lead to more resilient robotic systems and progress on challenging non-stationary control tasks.

The development of the "Skeleton-of-Thought" (SoT) method to decrease the end-to-end generation latency of large language models offers significant speed gains and potentially better answer quality. This efficiency-oriented approach could make LLM reasoning more human-like, enhancing a wide range of applications.


In conclusion, these advancements set the stage for:

  • Revolutionizing our understanding of biological and artificial intelligence;
  • Shaping our understanding of the intricacies of language models;
  • Enhancing accuracy and performance in prediction tasks through quaternions;
  • Expanding the practical applications of open-source LLMs;
  • Creating resilient control policies for autonomous robots;
  • Advancing medical visual question answering and few-shot adaptations;
  • Improving speed and quality in LLMs through efficiency-oriented approaches.

By leveraging these advancements, organizations can stay at the cutting edge of technological innovation and continue to push boundaries in AI research and development.

If you found value in these insights and reflections, please share and interact. Your participation not only helps spread the information but also contributes to a more engaged and informed community.
