Jumpstart Your CDx Development: Use ChatGPT to Create an FDA-Approved CDx Database in Just Five Minutes
If you're a CDx lead gearing up to develop a new companion diagnostic (CDx), it's crucial to start by exploring FDA-approved CDx devices. A comprehensive list is available on the FDA's official website: https://www.fda.gov/medical-devices/in-vitro-diagnostics/list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools. However, navigating this list can be daunting: it has no search interface, so pinpointing the information you need is a time-intensive task.
In larger organizations, the usual route is to schedule a meeting with the support team, request a web crawl, and await the creation of a local database. But how long does this process take? Weeks? Months?
Here's a thought: why not do it yourself? No Python programming skills? That's no longer a barrier. With ChatGPT, you can now achieve this in just five minutes! Dive into the ease of accessing and organizing FDA-approved CDx information without the wait.
All you need to do is write two prompts. Before proceeding, it is advisable to download the webpage's content yourself and upload the saved HTML file to ChatGPT; this precaution helps prevent automated requests from being blocked by the FDA website.
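If you'd prefer to script that download step as well, here is a minimal sketch, assuming the requests library is installed; the output filename and the browser-like User-Agent are my own choices, not anything the article prescribes.

import requests

# URL of the FDA list of cleared or approved companion diagnostic devices
LIST_URL = ("https://www.fda.gov/medical-devices/in-vitro-diagnostics/"
            "list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools")

# A browser-like User-Agent is an assumption here, included to reduce the
# chance of the request being rejected
response = requests.get(LIST_URL, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
response.raise_for_status()

# Save the HTML locally so it can be uploaded to ChatGPT
with open('fda_cdx_list.html', 'w', encoding='utf-8') as f:
    f.write(response.text)

With the HTML saved locally, the first prompt is as follows.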
Task Description for an IT Developer:
Develop a Python script to extract URLs that begin with "https://www.accessdata.fda.gov/scripts/cdrh" from the uploaded webpage. The script should efficiently parse the webpage's HTML content to find these specific URLs. Additionally, please ensure the script handles possible exceptions, such as connection errors or changes in webpage structure, and it should output the list of retrieved URLs in a readable format.
Below is the Python code provided by ChatGPT.
from bs4 import BeautifulSoup

# Load the previously downloaded FDA webpage from disk
with open('List of Cleared or Approved Companion Diagnostic Devices (In Vitro and Imaging Tools) _ FDA.html', 'r', encoding='utf-8') as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, 'html.parser')

# Collect every link that points into the FDA device database
fda_urls = []
for link in soup.find_all('a', href=True):
    href = link['href']
    if href.startswith("https://www.accessdata.fda.gov/scripts/cdrh"):
        fda_urls.append(href)

# Print the extracted URLs
for url in fda_urls:
    print(url)
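One caveat worth flagging: if the FDA page links the same device entry more than once, the extracted list will contain duplicates. A minimal, order-preserving deduplication step (my addition, not part of the ChatGPT output):

# Drop duplicate URLs while keeping their original order
fda_urls = list(dict.fromkeys(fda_urls))
print(f"{len(fda_urls)} unique device URLs found")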
Then ask ChatGPT to retrieve the information for each device.
Task Description for an IT Developer:
Develop a Python script to parse specific data from each URL in the list fda_urls.
For context, the webpage's HTML structure contains lines like the following: <TR><TH Align="Left">PMA Number</TH><TD Align="Left">P190033</TD></TR>. In this example, "PMA Number" is the variable name, and "P190033" is its associated value.
Your script should extract and return the values for the following variables in a structured format, preferably a DataFrame:
Device
Generic Name
Applicant
PMA Number
Date Received
Decision Date
Product Code
Docket Number
Notice Date
Advisory Committee
Clinical Trials
Expedited Review Granted?
Combination Product
Approval Order Statement
Ensure the script is robust and can handle potential variations in the webpage's structure. Additionally, include comments in the code for clarity and maintainability.
And the code:
import os
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_fda_data(url):
    # Initialize a session and add a browser-like User-Agent header so the
    # request is less likely to be rejected
    session = requests.Session()
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    # Send a GET request to the URL with the session
    response = session.get(url, headers=headers)
    # Check if the request was successful; skip the page otherwise
    if response.status_code != 200:
        print(f"Failed to retrieve {url}. Status code: {response.status_code}")
        return None
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Variables to extract (excluding "Approval Order Statement" for now)
    variables = [
        "Device",
        "Generic Name",
        "Applicant",
        "PMA Number",
        "Date Received",
        "Decision Date",
        "Product Code",
        "Docket Number",
        "Notice Date",
        "Advisory Committee",
        "Clinical Trials",
        "Expedited Review Granted?",
        "Combination Product"
    ]
    # Extract data for each variable: the value sits in the <td> that
    # follows the <th> holding the variable name
    data = {}
    for variable in variables:
        row = soup.find('th', string=variable)
        if row:
            value_cell = row.find_next_sibling('td')
            data[variable] = value_cell.get_text(strip=True) if value_cell else None
        else:
            data[variable] = None
    # Special handling for "Approval Order Statement", which is not laid out
    # as a <th>/<td> pair
    approval_order_statement = soup.find('span', string='Approval Order Statement')
    if approval_order_statement and approval_order_statement.parent:
        data['Approval Order Statement'] = approval_order_statement.parent.get_text(strip=True).replace('Approval Order Statement', '', 1).strip()
    else:
        data['Approval Order Statement'] = None
    return data

# Scrape each device page, pausing between requests
records = []
for url in fda_urls:
    print(url)
    time.sleep(30)
    data = scrape_fda_data(url)
    if data:
        records.append(data)

# DataFrame.append was removed in pandas 2.0, so build the DataFrame
# from the list of dicts instead
df_all = pd.DataFrame(records)
print(df_all)

# Save the result as an Excel spreadsheet
download_directory = 'your folder'
filename = "PMA_list.xlsx"
df_all.to_excel(os.path.join(download_directory, filename))
Please be aware that it's important to adhere to the FDA's server etiquette and policies. To avoid being blocked by the FDA web servers, I have included the time.sleep(30) command in the code, which introduces a 30-second pause between requests.
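A fixed 30-second pause is the simplest polite-scraping approach. If the connection is flaky, a common alternative is retrying with exponential backoff; here is a hedged sketch of that pattern (the function name and wait times are my own choices, not anything the FDA requires):

import time
import requests

def get_with_backoff(session, url, headers, max_retries=3):
    """Fetch a URL, waiting progressively longer after each failed attempt."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, headers=headers, timeout=30)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # network hiccup; fall through to the wait below
        # Wait 30s, then 60s, then 120s before the next try
        time.sleep(30 * (2 ** attempt))
    return None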
That's all it takes to retrieve information on all FDA-approved CDx devices. Simple, isn't it?
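And once the spreadsheet exists, it behaves like the searchable database the FDA page never offered. A minimal sketch of querying it, assuming the PMA_list.xlsx file produced above; the applicant name in the filter is purely illustrative:

import pandas as pd

# Load the spreadsheet produced by the scraper (requires openpyxl)
df = pd.read_excel('PMA_list.xlsx', index_col=0)

# Example query: all devices whose applicant field mentions a given company
# (the company name below is purely illustrative)
hits = df[df['Applicant'].str.contains('Foundation Medicine', case=False, na=False)]
print(hits[['Device', 'PMA Number', 'Decision Date']])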