Explore the Future with Gen AI: Your Weekly Passport to Innovation!
Perpetual Block - a Partex Company
Leveraging DeepTech to democratise asset classes
We are back with another exciting edition, ready to dive into the fascinating world of GenAI and the tech trends shaping our future.
Multimodal LLMs
A Multimodal Large Language Model (LLM) is a type of LLM that can process and generate text, as well as other modalities of data, such as images, audio, and video. Multimodal LLMs are trained on large datasets of text and other modalities, and they use this training to learn the relationships between different modalities.
Recently, OpenAI released GPT-4, the latest version of its flagship Multimodal Large Language Model.
Multimodal LLMs use modules that encode not just text but other types of data into the same embedding space. Multimodality can solve problems that the current generation of text-only, unimodal LLMs cannot.
Some key differences between Unimodal LLMs and Multimodal LLMs
The spectrum of approaches to building multimodal LLMs ranges from having the LLM use existing tools or models, to leveraging domain-specific components with an adapter, to joint modelling of a multimodal model.
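One common way to attach a domain-specific encoder to an LLM is a small learned adapter that projects the encoder's output into the LLM's token-embedding space, so an image can be fed in as if it were a word. The sketch below is purely illustrative: the dimensions, weights, and feature values are toy placeholders, not from any real model.

```python
# Toy sketch of the "adapter" approach to multimodality: a frozen image
# encoder produces a feature vector, and a learned linear adapter projects
# it into the LLM's embedding space so it can be prepended to text tokens.

def linear_adapter(image_features, weights):
    """Project an image feature vector into the LLM embedding space."""
    return [
        sum(w * x for w, x in zip(row, image_features))
        for row in weights
    ]

# Toy example: 4-dim image features -> 3-dim "LLM embedding".
image_features = [0.5, -1.0, 0.25, 2.0]
weights = [  # 3 x 4 adapter matrix (learned during training in practice)
    [0.1, 0.0, 0.2, 0.0],
    [0.0, 0.3, 0.0, 0.1],
    [0.5, 0.0, 0.0, 0.0],
]

image_token = linear_adapter(image_features, weights)
print(image_token)  # a pseudo-token the LLM treats like a word embedding
```

In real systems (e.g. adapter-style visual-language models) the projection is trained while the image encoder and often the LLM stay frozen, which is what makes this approach cheap compared to joint multimodal training.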
Examples of Multimodal LLMs
1. OpenAI: GPT-4
GPT-4 has shown human-level performance on numerous professional and academic benchmarks. It can handle prompts involving text and images, which allows users to specify any vision or language task.
2. Microsoft: Kosmos-1
Kosmos-1 natively supports language, perception-language, and vision activities, and it can handle both perception-intensive and natural language tasks.
3. Google: PaLM-E
PaLM-E trains the language model to incorporate raw sensor data from the robotic agent directly. The result is a highly effective robot learning model that is also a state-of-the-art general-purpose visual-language model.
Multimodal Use Cases
Life Sciences
Deep learning algorithms are being used to develop AI-powered medical imaging analysis tools that can detect diseases and abnormalities with greater accuracy and speed than human experts. For example, the company Arterys is using deep learning to develop AI algorithms that can identify subtle signs of heart disease on MRI scans.
Finance
The financial services company Fidelity is using a multimodal LLM to develop a new investment research platform. The platform will use the LLM to analyze financial reports, news articles, and other types of data to identify promising investment opportunities.
Legal
The law firm Dentons is using a multimodal LLM called ROSS to review contracts and identify relevant clauses and provisions. ROSS can process and understand both text and images, which allows it to identify relevant information in contracts more quickly and accurately than human lawyers.
Challenges in building Multimodal LLMs
Conclusion
We are currently at the forefront of a new era in artificial intelligence, and despite its current limitations, multimodal models are poised to take over. The possibilities of multimodal LLMs are endless, and we have only begun to explore their true potential. Given their immense promise, it’s clear that multimodal LLMs will play a crucial role in the future of AI.
Artificial General Intelligence
Artificial intelligence (AI) is a branch of computer science involved in building smart machines capable of performing human-like tasks.
Artificial Intelligence is categorized based on the capacity to mimic human characteristics. Using these characteristics for reference, all artificial intelligence systems fall into one of the following three types:
Let us delve deeper into Artificial General Intelligence!
Artificial general intelligence (AGI) is a hypothetical form of artificial intelligence in which a machine can learn and think like a human. For this to be possible, AGI would need self-awareness and consciousness, so it could solve problems, adapt to its surroundings and perform a broader range of tasks.
“The best way to describe it is that AGI implements logic into the process rather than just applying an algorithm or coded process,” says Amruth Laxman, co-founder of 4Voice.
Researchers from Microsoft and OpenAI claim that GPT-4 could be an early but incomplete example of AGI. As AGI has not yet been fully achieved, future examples of its application might include situations that require a high level of cognitive function, such as autonomous vehicle systems and advanced chatbots.
Why do we need Artificial General Intelligence (AGI)?
AI still struggles with reasoning that requires real-world common sense and basic abstract thinking. LLMs remain statistical engines (neural networks) that predict the next most likely token given a prompt and its context.
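To make "statistical engine that predicts the next most likely token" concrete, here is a toy bigram model built from word counts. A real LLM replaces the count table with a neural network trained on vastly more text, but the interface is the same: given context, return a probability distribution over next tokens.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which word in the corpus.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_token_probs(word):
    """Return a probability distribution over the next token."""
    counts = next_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_token_probs("the"))
# "the" is followed by "cat" twice, "mat" once, "fish" once in the corpus,
# so "cat" gets probability 0.5 and the others 0.25 each.
```

Nothing in this pipeline understands what a cat is; it only tracks co-occurrence statistics, which is why common-sense reasoning does not come for free.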
Takeaways from the above examples:
Characteristics of Artificial General Intelligence
Defining artificial general intelligence is very hard, but there are several characteristics, seen in all humans, that AGI systems should have.
The above characteristics of AGI systems can be built through the following:
Risks of Artificial General Intelligence
The Future of Artificial General Intelligence (AGI)
When AGI will be achieved is a topic of much debate, and its future remains an open question. AI researcher Ben Goertzel has explained that it’s difficult to objectively measure progress toward AGI, as “there are many different routes to AGI, involving the integration of different sorts of subsystems” and there is no “thorough and systematic theory of AGI.” Rather, it’s a “patchwork of overlapping concepts, frameworks, and hypotheses” that are “often synergistic and sometimes mutually contradictory.”
Numbers every LLM Developer should know
A large language model is a deep learning model that is trained on a massive dataset of text and code. It is able to perform a wide range of tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way.
By understanding the following parameters that control their behaviour, we can better harness their power to accomplish a wide range of tasks.
Let us explore how the parameters above can be used to control the behaviour of an LLM:
1. Model Size and Number of Tokens:
The model size refers to the number of parameters in the LLM. A parameter is a variable that the LLM learns during training. A larger model will typically perform better, but it also requires more computing resources to train and run. GPT-3, for example, has 175 billion parameters.
The number of tokens refers to the amount of text the LLM is trained on. A token is a unit of text, such as a word, part of a word, a punctuation mark, or a number. Training on more tokens allows the LLM to generate more accurate and varied text, but it also requires more computing resources.
Generally, larger models, i.e. models with more parameters, require larger datasets for training. Many large language models may be over-parameterised and under-trained, indicating that similar or even better performance could be achieved with smaller models trained on larger datasets.
The Chinchilla paper introduced the concept of compute-optimal models. These models, such as the 70-billion-parameter Chinchilla model, have been shown to outperform larger but under-trained models like GPT-3 on a wide range of downstream tasks.
As per this scaling law, the optimal amount of training data is roughly 20 tokens per parameter. By following the recommendations from Chinchilla, researchers have achieved promising results with smaller, more optimised models.
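The 20-tokens-per-parameter rule of thumb makes for quick back-of-the-envelope arithmetic. The sketch below applies it to a few illustrative model sizes:

```python
# Back-of-the-envelope Chinchilla arithmetic: roughly 20 training tokens
# per model parameter for compute-optimal training.

def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    """Estimate compute-optimal training tokens for a given model size."""
    return n_params * tokens_per_param

for name, params in [("7B", 7e9), ("70B (Chinchilla)", 70e9), ("175B (GPT-3)", 175e9)]:
    tokens = chinchilla_optimal_tokens(params)
    print(f"{name}: ~{tokens / 1e12:.1f} trillion training tokens")
```

For the 70B Chinchilla model this gives about 1.4 trillion tokens, which matches the training budget reported in the paper; by the same rule, a GPT-3-sized model would have needed roughly 3.5 trillion tokens to be compute-optimal.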
2. Temperature:
The temperature is a parameter that controls the randomness of the LLM’s output. A higher temperature produces more varied and creative text, while a lower temperature produces more focused and predictable text.
For example, at a temperature close to 0 the LLM almost always generates the most likely next word. At a temperature of 2.0, it becomes far more likely to pick less probable words, which can make the text more creative but also less coherent.
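Under the hood, temperature simply rescales the model's next-token scores (logits) before they are turned into probabilities. The sketch below uses made-up logit values to show the effect:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more random sampling
```

At low temperature the distribution concentrates on the highest-scoring token (approaching greedy decoding as the temperature approaches 0), while at high temperature the probabilities flatten out and sampling becomes more adventurous.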
3. Context window:
The context window is the number of tokens the LLM can consider when generating text. A larger context window allows the LLM to generate more contextually relevant text, but it also makes training and inference more computationally expensive. For example, with a context window of 2, the LLM would only see the two preceding tokens when predicting the next one.
The context window determines how far back in the text the model looks when generating responses. A longer context window enhances coherence in conversation, crucial for chatbots.
For example, when generating a story, a context window of 1024 tokens can help ensure consistency and context preservation.
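When an input exceeds the context window, a common strategy is to drop the oldest tokens so the model only "sees" the most recent ones. A minimal sketch, with an illustrative window size and token list:

```python
def truncate_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens."""
    return tokens[-window_size:]

history = ["Once", "upon", "a", "time", "there", "was", "a", "robot"]
print(truncate_to_window(history, 4))  # ['there', 'was', 'a', 'robot']
```

This is why long chatbot conversations can "forget" their beginning: anything pushed out of the window simply no longer exists for the model.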
4. Top-k and Top-p:
These techniques filter token selection. Top-k keeps only the k most likely tokens, which helps avoid nonsensical output. Top-p (nucleus sampling) instead keeps the smallest set of tokens whose cumulative probability reaches a threshold p, which adapts the candidate pool to how confident the model is.
For example, if you set Top-k to 10, the LLM will only consider the 10 most probable next words. This produces more fluent text but reduces its diversity. If you set Top-p to 0.9, the LLM samples from the smallest set of words whose combined probability is at least 0.9, which preserves diversity while still excluding very unlikely words.
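The two filters can be sketched in a few lines. The next-token probabilities below are made up for illustration:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}
print(top_k_filter(probs, 2))    # {'cat': 0.5, 'dog': 0.3}
print(top_p_filter(probs, 0.9))  # {'cat': 0.5, 'dog': 0.3, 'fish': 0.15}
```

Note how top-p keeps three tokens here because the top two only reach 0.8 cumulative probability; with a sharper distribution it might keep just one, which is exactly the adaptivity top-k lacks. In practice the surviving probabilities are renormalised before sampling.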
Here are some examples of how we can control the behaviour of an LLM through parameters:
In conclusion, large language model parameters are essential for shaping the capabilities and behaviour of LLMs, making them powerful tools for a wide range of natural language processing tasks.
Exploring Legal Innovations: Case Law and Harvard's Casebert Project
In the complex world of legal research, finding the right information efficiently is vital. Harvard Law School's Caselaw Access Project is changing the game, making legal analysis more accessible and accurate than ever before.
Case law, also known as common law, is the body of law that is the written decisions issued by judges and other legal authorities. Traditionally, legal experts spend endless hours reading and analysing case laws manually, a slow and error-prone process. The overwhelming volume of information often leads to important details being missed. Complex legal nuances often get lost in the process, making it a challenge to extract valuable insights.
Caselaw Access Project and LegalBERT
The Caselaw Access Project offers free, public access to over 6.5 million decisions published by state and federal courts throughout U.S. history, digitising over 40 million pages that represent 360 years of U.S. legal history. The Caselaw Access Project API (CAPAPI) and bulk data service offer researchers a comprehensive resource. With the help of this milestone project, researchers can build AI- and ML-empowered tools tailored specifically for legal texts.
For example, LegalBERT is a language model fine-tuned on legal texts. Powered by the capabilities of a Large Language Model (LLM), LegalBERT can automate complex legal tasks such as classifying legal documents, answering intricate legal questions, and comprehending the intricacies of legal language.
Benefits
These AI-powered tools have a number of potential benefits for caselaw research. For example:
Challenges
There are also a number of challenges associated with the use of AI and ML for caselaw research. For example:
Conclusions
Other law schools are also developing initiatives in the field of AI and ML. The Stanford Law School Center for Legal Informatics and the University of Chicago Law School have developed a number of AI and ML tools for legal research, including the Legal Robot and the Legal AI Platform.
In addition to the benefits and challenges discussed above, it is important to consider the ethical implications of using AI and ML for caselaw research. We have to ensure that AI and ML systems are used in a way that is fair and impartial. It is important for the legal system to be aware of these developments and to be prepared to adapt to the changing landscape of legal research.
Recent Business News
Visa is investing $100 million in a generative AI initiative to support companies developing AI technologies for commerce and payments. This investment will be made through Visa Ventures, the company's corporate investment arm. Visa sees this initiative as an extension of its longstanding leadership in AI and aims to drive innovation, create value for partners and clients, and enable global commerce.
Adidas and Traeger Grills are utilizing generative AI capabilities through AWS' Amazon Bedrock service. Adidas is using the platform to assist engineers in finding information and solutions, while Traeger Grills is leveraging its data visualization capabilities. The technology allows both companies to create visuals, perform calculations, and manage inventory through natural language descriptions. Gartner predicts that conversational user interfaces powered by generative AI will handle 60% of B2B seller work within five years.
Microsoft's partnership with Mercy will utilize its Azure OpenAI Service to provide resources powered by generative AI for communication. This collaboration aims to improve patient care and experiences through the use of advanced AI technologies. Mayo Clinic and Duke Health are also part of Microsoft's efforts to explore the potential of generative AI in healthcare.
The Zoom AI Companion incorporates generative AI capabilities to provide automated meeting summaries, draft email suggestions, and generative AI summaries of chat threads for healthcare customers. This integration of generative AI technology enhances the productivity and efficiency of communication within the Zoom platform.
UiPath and Apprio have partnered to offer automated revenue cycle management services to healthcare organizations. This collaboration simplifies automation practices and provides an immediate return on investment. With the use of AI, Apprio has deployed new automation quickly and cost-effectively. This partnership addresses the growing demand for AI-driven solutions in healthcare to improve treatment and payment offerings.
If you are looking for Generative AI Solutions, check out our offerings at www.perpetualblock.io
Explore a treasure trove of knowledge in our library of articles and newsletters. Dive into the world of cutting-edge insights and stay ahead of the curve. Discover more today! https://www.dhirubhai.net/newsletters/generativeai-newsletter-7108431030507745280/