Top 5 Open Medical AI highlights

May 2024

Summary:

  • Microsoft Unveils Open-Access Pathology Model Microsoft released an open-weight foundation model for whole-slide pathology, confirming that the future of medical AI is open, despite data and license restrictions.
  • OpenBioLLM Outperforms Larger Language Models The OpenBioLLM 70B and 8B models demonstrated superior performance on clinical tasks compared to GPT-3.5 and Meditron-70B.
  • Sum Small: Powerful SOAP Summary Generator Sum Small, a fine-tuned small language model, outperforms GPT-4 in generating SOAP summaries from medical dialogues.
  • Researchers Tout Benefits of Open-Source Generative AI A new paper argues the advantages of open-sourcing generative AI, like advancing research and accessibility, outweigh the risks.
  • Measuring the Openness Spectrum of AI Models A working paper presents a framework for ranking the openness of 11 AI foundation models, challenging the open vs. closed divide.


1. Open Pathology AI

Microsoft has released an open-weight foundation model for whole-slide pathology. The training data has not been opened and the license is overly restrictive, but this is a big leap forward and confirms a hypothesis I made a few years ago: the future of medical AI is open.

Strengths:

  • Open-weight pathology foundation model, accessible to research community
  • Pretrained on the large, diverse, real-world Prov-Path dataset, much larger than TCGA
  • Novel GigaPath architecture effectively captures local & global slide patterns
  • State-of-the-art performance on 26 digital pathology tasks
  • Strong zero-shot vision-language capabilities

Weaknesses:

  • Lower performance on mutation prediction than cancer subtyping
  • Suboptimal pretraining process due to memory constraints
  • Vision-language capabilities still limited for clinical assistant use
  • Full Prov-Path dataset not publicly available
  • The license grants non-commercial rights only

Comment:

When Meta released their first LLaMA model, it came under a non-commercial, research-only license. Meta's licensing decisions have sparked debate within the open-source community, with some arguing that Llama's licenses do not fully align with open-source principles because of the commercial restrictions and the lack of transparency around training data and code. GigaPath isn't open source either: the license is restricted and the training data was not shared. Still, by publishing this model, Microsoft will grow the community and inspire others to take openness one step further. One interesting clause in the license strictly prohibits using any personal data included in the Materials for any purpose other than the authorized research, requires maintaining strict confidentiality of such data, and mandates its immediate destruction upon completion of the research to prevent re-identification. Although no personal data has been shared, it seems they want to protect against data leakage or against users crafting techniques to reveal information about the training data. From the data repository, the data donors appear to come from Karolinska and Radboud, both well-known European university medical centres.

Model: https://huggingface.co/prov-gigapath/prov-gigapath

Code: https://github.com/prov-gigapath/prov-gigapath

Paper: https://www.nature.com/articles/s41586-024-07441-w

License: https://github.com/prov-gigapath/prov-gigapath/blob/main/LICENSE
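As the strengths list notes, GigaPath captures both local and global slide patterns: a tile-level encoder handles individual patches, and a slide-level encoder aggregates context across the whole slide. As a rough illustration of the first preprocessing step, here is a minimal sketch of non-overlapping tile-coordinate generation, assuming 256×256 tiles; the function is illustrative, not GigaPath's actual code.

```python
def tile_coordinates(slide_w, slide_h, tile=256):
    """Return (x, y) top-left corners of non-overlapping tiles
    covering a slide; partial edge tiles are dropped."""
    return [(x, y)
            for y in range(0, slide_h - tile + 1, tile)
            for x in range(0, slide_w - tile + 1, tile)]

# A toy 1024x512 slide yields a 4x2 grid of 256-px tiles.
coords = tile_coordinates(1024, 512)
print(len(coords))  # 8
```

In a real pipeline these coordinates would index into a pyramidal WSI file (e.g. via OpenSlide), with background tiles filtered out before they reach the tile encoder.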

2. OpenBioLLM-70B leads on medical benchmarks

OpenBioLLM-70B delivers state-of-the-art performance on medical benchmarks, while the OpenBioLLM-8B model even surpasses GPT-3.5 and Meditron-70B! The models underwent a rigorous two-phase fine-tuning process, using Llama-3 70B and 8B as base models and leveraging Direct Preference Optimization (DPO) for optimal performance.
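DPO, mentioned above, trains the policy directly on preference pairs without a separate reward model: its per-pair loss is −log σ(β·[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]), where y_w is the preferred and y_l the rejected answer. A minimal sketch of that loss in plain Python, with made-up log-probabilities (this illustrates the formula only, not OpenBioLLM's training code):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen answer more strongly than the
# reference does, the margin is positive and the loss drops below ln 2.
print(dpo_loss(-1.0, -3.0, -2.0, -2.5))  # margin = 1.5
```

In practice this loss is computed over batches of tokenized chosen/rejected pairs, e.g. with a preference-optimization trainer such as the one in Hugging Face's TRL library.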



You can download the models directly from Huggingface today.

  • 70B: https://huggingface.co/aaditya/OpenBioLLM-Llama3-70B
  • 8B: https://huggingface.co/aaditya/OpenBioLLM-Llama3-8B


Summarize Clinical Notes:

OpenBioLLM can efficiently analyze and summarize complex clinical notes, EHR data, and discharge summaries, extracting key information and generating concise, structured summaries.

Use Cases: De-Identification, Biomarker Extraction, Medical Classification, Clinical Entity Recognition, Answering Medical Questions, Summarizing Clinical Notes

3. Omi Health's Sum Small (a Phi-3-mini-4k-instruct fine-tune) outperforms GPT-4

Sum Small is a small language model (SLM) with groundbreaking performance at low cost and low latency, specifically designed to generate SOAP summaries from medical dialogues. It is a fine-tuned version of Microsoft's Phi-3-mini-4k-instruct, trained on Omi Health's medical-dialogue-to-soap-summary dataset. This model demonstrates superior performance compared to larger models like GPT-4.

This model is intended for research and development in AI-powered medical documentation. It is not ready for direct clinical use without further validation and should be integrated with additional safety guardrails before deployment in a medical setting. The model was trained on Omi Health's synthetic medical-dialogue-to-soap-summary dataset, which consists of 10,000 synthetically generated dialogues and corresponding SOAP summaries.


The Sum Small model is released under the MIT License, which permits broad use with fewer restrictions, making it accessible for both commercial and non-commercial use.

https://huggingface.co/omi-health/sum-small
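Sum Small targets a fixed output structure (Subjective, Objective, Assessment, Plan), which lends itself to the kind of simple guardrails the authors recommend before deployment. The sketch below is illustrative only, not Omi Health's code: the prompt wording and the section-check heuristic are assumptions.

```python
SOAP_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")

def build_prompt(dialogue):
    """Wrap a doctor-patient dialogue in a SOAP-summary instruction
    (illustrative; the model's real chat template may differ)."""
    return ("Summarize the following medical dialogue as a SOAP note "
            "with Subjective, Objective, Assessment and Plan sections.\n\n"
            + dialogue)

def has_all_sections(summary):
    """Simple guardrail: verify every SOAP heading appears in the output."""
    return all(section + ":" in summary for section in SOAP_SECTIONS)

note = "Subjective: cough.\nObjective: afebrile.\nAssessment: URI.\nPlan: rest."
print(has_all_sections(note))  # True
```

A check like this only guards output structure; clinical correctness still requires the human validation the model card calls for.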

4. Risks and Opportunities of Open-Source Generative AI

An excellent paper from a Champions League-calibre team of PhD researchers (Oxford, Berkeley, University of Luxembourg, and others) argues that the benefits of open-sourcing generative AI outweigh the marginal increase in risks compared to closed-source models. They warn that over-regulation could be catastrophic for open-source generative AI.

Benefits of Open-Source Generative AI:

  • Advances research by enabling transparency, reproducibility, and fostering new innovations.
  • Can be more affordable and accessible, especially for organizations and individuals with limited resources.
  • Provides more flexibility and customizability to meet diverse needs and contexts.
  • Empowers developers and fosters innovation by giving them more control and autonomy.
  • Improves public trustworthiness through transparency.
  • Can help reduce copyright disputes by providing more transparency around training data.
  • Can drive sustainability in generative AI development by enabling sharing of resources.

Risks of Open-Source Generative AI:

  • Open models can also be used to generate unsafe or harmful content, similar to closed models.
  • Open models cannot be easily rolled back or forced to update, which is a safety concern.
  • Potential risks around social manipulation, mental health impacts, and eroded autonomy, which may be exacerbated under closed-source AI.

In the long-term, the paper suggests that open-sourcing AGI (if achieved) could help increase the likelihood of technical alignment, maintain a balance of power, and enable better decentralized coordination mechanisms - all of which can help mitigate existential risks. It also discusses the potential benefits and non-existential risks of open-sourced AGI.

5. Measuring the Openness of AI Foundation Models: Competition and Policy Implications

Another recent paper presents a comprehensive methodology for measuring the openness of AI foundation models, considering technical, economic, legal, and social factors. The analysis covers 11 prominent AI foundation models, providing a detailed ranking based on 18 variables across three key areas: the knowledge problem, the implicit contracting problem, and the collective action governance problem. The findings challenge the common perception of a clear divide between "open" and "closed" AI models, showing a more nuanced spectrum of openness. Closed-source AI models pose greater anti-competitive risks, making them a clear target for antitrust scrutiny, since open models are more transparent, can be forked, and do not allow the same leveraging of market power. The implications for antitrust enforcers highlight how the degree of openness can serve as a guide for identifying potential anti-competitive risks.
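One way to read the paper's method: each model is rated on a set of variables, grouped by problem area, and the per-area scores combine into a composite openness score that places the model on a spectrum rather than in an open/closed binary. The sketch below illustrates that idea only; the variable names, weights, and equal-area averaging are hypothetical placeholders, not the paper's actual 18 variables.

```python
# Hypothetical variables and weights -- the paper's actual 18 variables
# and their grouping are not reproduced here.
SCORECARD = {
    "knowledge": {"weights_released": 1.0, "training_data_released": 1.0,
                  "code_released": 1.0},
    "implicit_contracting": {"commercial_use_allowed": 1.0,
                             "no_field_restrictions": 1.0},
    "collective_action": {"community_governance": 1.0},
}

def openness_score(model_facts):
    """Average the per-area scores (each variable rated 0..1), so every
    problem area contributes equally to the composite score."""
    area_scores = []
    for area, variables in SCORECARD.items():
        total = sum(w * model_facts.get(v, 0.0) for v, w in variables.items())
        area_scores.append(total / sum(variables.values()))
    return sum(area_scores) / len(area_scores)

# An open-weight model with a non-commercial license and no shared
# training data lands mid-spectrum rather than at either pole.
print(round(openness_score({"weights_released": 1.0, "code_released": 1.0}), 3))  # 0.222
```

The same structure makes the policy point concrete: a model can score high on the knowledge axis while scoring zero on licensing, which is exactly the nuance a binary open/closed label hides.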

Strengths of open-source AI:

  • Promotes innovation and collaboration: Open-source AI models allow for greater collaboration, knowledge sharing, and collective problem-solving within the research community. This can accelerate innovation and the development of new capabilities.
  • Transparency and accountability: Open-source models can be scrutinized, tested, and improved by a wider community, increasing transparency and accountability compared to closed-source alternatives.
  • Lower barriers to entry: Open-source models and tools can reduce the costs and technical expertise required to work with and build upon AI technologies, enabling more individuals and organizations to participate.
  • Avoids vendor lock-in: Open-source models are not tied to a single provider, giving users more flexibility and control over their AI infrastructure.
  • Fosters competition: The availability of open-source models can stimulate competition and innovation among commercial providers, leading to better products and services.

Weaknesses of open-source AI:

  • Sustainability and maintenance challenges: Sustaining and maintaining open-source AI projects can be difficult, as they often rely on volunteer contributions and may lack dedicated funding and resources.
  • Quality and reliability concerns: Without centralized oversight and quality control, open-source AI models may have inconsistent quality, performance, and reliability compared to commercially-backed alternatives.
  • Potential security and safety risks: The open nature of the code and data used in open-source AI models may increase the risk of security vulnerabilities, misuse, or unintended consequences.
  • Commercialization challenges: It can be more difficult for open-source AI projects to generate revenue and sustain commercial viability, which may limit their ability to compete with well-funded proprietary alternatives.
  • Fragmentation and lack of standardization: The proliferation of open-source AI projects and models can lead to fragmentation, making it challenging to ensure interoperability and standardization across the ecosystem.
  • Talent attraction and retention: Open-source AI projects may struggle to attract and retain top AI talent, who may be drawn to higher-paying opportunities in the commercial sector.



Olivia Heslinga

Talk AI with me


Wow, this is wild. I'm a bit concerned with this whole branding of open source. It seems to me the temptation to crowdsource novel ideas and models has already proven too profitable to keep that ethos alive when there are billions on the table. Example: the only case we are publicly aware of so far is OpenAI, and Mistral selling out to Microsoft.
