Jumpstart Your CDx Development: Use ChatGPT to Create an FDA-Approved CDx Database in Just Five Minutes
If you're a CDx lead gearing up to develop a new companion diagnostic (CDx), it's crucial to start by exploring FDA-approved CDx devices. A comprehensive list is available on the FDA's official website: https://www.fda.gov/medical-devices/in-vitro-diagnostics/list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools. However, navigating this list can be daunting: it has no search interface, so pinpointing the information you need is a time-intensive task.
In larger organizations, the usual route is to schedule a meeting with the support team, request a web crawl, and await the creation of a local database. But how long does this process take? Weeks? Months?
Here's a thought: why not do it yourself? No Python programming skills? That's no longer a barrier. With ChatGPT, you can now achieve this in just five minutes! Dive into the ease of accessing and organizing FDA-approved CDx information without the wait.
All you need to do is write two prompts. Before proceeding, it is advisable to download the webpage's content yourself and upload the saved HTML file to ChatGPT; this precaution helps prevent automated requests from being blocked by the FDA website.
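If you'd prefer to script that download step as well, here is a minimal sketch, assuming the requests library is installed; the output filename and the browser-like User-Agent are my own choices, not anything the article prescribes.

import requests

# URL of the FDA list of cleared or approved companion diagnostic devices
LIST_URL = ("https://www.fda.gov/medical-devices/in-vitro-diagnostics/"
            "list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools")

# A browser-like User-Agent is an assumption here, included to reduce the
# chance of the request being rejected
response = requests.get(LIST_URL, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
response.raise_for_status()

# Save the HTML locally so it can be uploaded to ChatGPT
with open('fda_cdx_list.html', 'w', encoding='utf-8') as f:
    f.write(response.text)

With the HTML saved locally, the first prompt is as follows.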
Task Description for an IT Developer:
Develop a Python script to extract URLs that begin with "https://www.accessdata.fda.gov/scripts/cdrh" from the uploaded webpage. The script should efficiently parse the webpage's HTML content to find these specific URLs. Additionally, please ensure the script handles possible exceptions, such as connection errors or changes in webpage structure, and it should output the list of retrieved URLs in a readable format.
Below is the Python code provided by ChatGPT.
from bs4 import BeautifulSoup

# Load the previously downloaded FDA webpage from disk
with open('List of Cleared or Approved Companion Diagnostic Devices (In Vitro and Imaging Tools) _ FDA.html', 'r', encoding='utf-8') as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, 'html.parser')

# Collect every link that points into the FDA device database
fda_urls = []
for link in soup.find_all('a', href=True):
    href = link['href']
    if href.startswith("https://www.accessdata.fda.gov/scripts/cdrh"):
        fda_urls.append(href)

# Print the extracted URLs
for url in fda_urls:
    print(url)
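One caveat worth flagging: if the FDA page links the same device entry more than once, the extracted list will contain duplicates. A minimal, order-preserving deduplication step (my addition, not part of the ChatGPT output):

# Drop duplicate URLs while keeping their original order
fda_urls = list(dict.fromkeys(fda_urls))
print(f"{len(fda_urls)} unique device URLs found")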
Then ask ChatGPT to retrieve the information for each device.
Task Description for an IT Developer:
Develop a Python script to parse specific data from each URL in the list fda_urls.
For context, the webpage's HTML structure contains lines like the following: <TR><TH Align="Left">PMA Number</TH><TD Align="Left">P190033</TD></TR>. In this example, "PMA Number" is the variable name, and "P190033" is its associated value.
Your script should extract and return the values for the following variables in a structured format, preferably a DataFrame:
Device
Generic Name
Applicant
PMA Number
Date Received
Decision Date
Product Code
Docket Number
Notice Date
Advisory Committee
Clinical Trials
Expedited Review Granted?
Combination Product
Approval Order Statement
Ensure the script is robust and can handle potential variations in the webpage's structure. Additionally, include comments in the code for clarity and maintainability.
And the code:
import os
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_fda_data(url):
    # Initialize a session and add a browser-like User-Agent header so the
    # request is less likely to be rejected
    session = requests.Session()
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    # Send a GET request to the URL with the session
    response = session.get(url, headers=headers)
    # Check if the request was successful; skip the page otherwise
    if response.status_code != 200:
        print(f"Failed to retrieve {url}. Status code: {response.status_code}")
        return None
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Variables to extract (excluding "Approval Order Statement" for now)
    variables = [
        "Device",
        "Generic Name",
        "Applicant",
        "PMA Number",
        "Date Received",
        "Decision Date",
        "Product Code",
        "Docket Number",
        "Notice Date",
        "Advisory Committee",
        "Clinical Trials",
        "Expedited Review Granted?",
        "Combination Product"
    ]
    # Extract data for each variable: the value sits in the <td> that
    # follows the <th> holding the variable name
    data = {}
    for variable in variables:
        row = soup.find('th', string=variable)
        if row:
            value_cell = row.find_next_sibling('td')
            data[variable] = value_cell.get_text(strip=True) if value_cell else None
        else:
            data[variable] = None
    # Special handling for "Approval Order Statement", which is not laid out
    # as a <th>/<td> pair
    approval_order_statement = soup.find('span', string='Approval Order Statement')
    if approval_order_statement and approval_order_statement.parent:
        data['Approval Order Statement'] = approval_order_statement.parent.get_text(strip=True).replace('Approval Order Statement', '', 1).strip()
    else:
        data['Approval Order Statement'] = None
    return data

# Scrape each device page, pausing between requests
records = []
for url in fda_urls:
    print(url)
    time.sleep(30)
    data = scrape_fda_data(url)
    if data:
        records.append(data)

# DataFrame.append was removed in pandas 2.0, so build the DataFrame
# from the list of dicts instead
df_all = pd.DataFrame(records)
print(df_all)

# Save the result as an Excel spreadsheet
download_directory = 'your folder'
filename = "PMA_list.xlsx"
df_all.to_excel(os.path.join(download_directory, filename))
Please be aware that it's important to adhere to the FDA's server etiquette and policies. To avoid being blocked by the FDA web servers, I have included the time.sleep(30) command in the code, which introduces a 30-second pause between requests.
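A fixed 30-second pause is the simplest polite-scraping approach. If the connection is flaky, a common alternative is retrying with exponential backoff; here is a hedged sketch of that pattern (the function name and wait times are my own choices, not anything the FDA requires):

import time
import requests

def get_with_backoff(session, url, headers, max_retries=3):
    """Fetch a URL, waiting progressively longer after each failed attempt."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, headers=headers, timeout=30)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # network hiccup; fall through to the wait below
        # Wait 30s, then 60s, then 120s before the next try
        time.sleep(30 * (2 ** attempt))
    return None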
That's all it takes to retrieve information on all FDA-approved CDx devices. Simple, isn't it?
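And once the spreadsheet exists, it behaves like the searchable database the FDA page never offered. A minimal sketch of querying it, assuming the PMA_list.xlsx file produced above; the applicant name in the filter is purely illustrative:

import pandas as pd

# Load the spreadsheet produced by the scraper (requires openpyxl)
df = pd.read_excel('PMA_list.xlsx', index_col=0)

# Example query: all devices whose applicant field mentions a given company
# (the company name below is purely illustrative)
hits = df[df['Applicant'].str.contains('Foundation Medicine', case=False, na=False)]
print(hits[['Device', 'PMA Number', 'Decision Date']])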