
AI for Cancer Therapeutics: Machine Learning & Biomolecular Modelling of Binding kinetics of CAR-T Cells to Hematologic Neoplastic Cells

Jong Hang Siong

I founded OTONOCO in Singapore to design and build SaaS and Mobile Apps that incorporate Generative and Agentic AI to solve complex problems in the industry

Published: March 23, 2023

Image: The quote was taken from the Narrative Economics course on Coursera with permission from Professor Robert J. Shiller.

The manufacturing of CAR-T cells begins with obtaining blood samples from a patient. T-lymphocytes are purified from the samples. These cells are then artificially activated by presenting them with an antigen of interest using magnetic beads, as illustrated in Fig 1 below.

Upon activation, the cells are genetically engineered by inserting a foreign gene into their genome. The integrated transgene directs the synthesis of a receptor protein that is expressed on the cell surface so that it can bind to antigens on cancer cells. At the manufacturing facility, the transgenic cells are washed and filled into bags before being shipped to the hospital, where they are administered back to the patient from whom they originated. In the blood circulation, the transgenic cells remove neoplastic cells by binding to their surface antigens, which triggers a cascade of events that eventually leads to the destruction of the neoplastic cells (Fig 1-1).

Selecting a suitable target antigen for CAR-T therapy is imperative to increase the probability that the receptor protein binds successfully to the antigen of interest. Another important selection criterion is the strength of binding of the receptor protein to the antigen on neoplastic cells. This aspect of selection is discussed at length in this article. The following diagram shows some of the promising target antigens for CAR-T therapy (Fig 1-2).

Biomolecular Kinetics Modelling of CAR-T Cells Binding

Binding Mechanism of CAR-T Receptor and Cancer Antigen

Studies have found that the coupling of CAR-T cells (denoted C) and neoplastic cells (denoted L) to form the C.L complex is not permanent. The C.L complex may dissociate into free C and L cells before forming C.L again. This phenomenon is similar to a reversible reaction in chemistry (Fig 2-1).

If C and L remain in the circulation long enough, the system reaches equilibrium. Equilibrium is the state at which association and dissociation proceed at the same rate, so the concentrations of free C, free L and the C.L complex no longer change over time. The kinetics of a reversible reaction such as this can be described mathematically using Le Chatelier's principle and the law of mass action. The question now is how we know whether C and L are in equilibrium. One way to determine this is to conduct laboratory experiments that measure the fraction of C.L complex formed when different concentrations of C are added. The dissociation constant, Kd, at equilibrium can then be determined from the experimental data using curve-fitting techniques. Besides the dissociation constant, the association constant, Ka, can be used. In this article, Kd is used as the basis for calculation. The derivation of the design equation, following Le Chatelier's principle, is illustrated in the following diagram:

[C]: concentration of free CAR-T cells

[L]: concentration of free Leukemia/Lymphoma cells

[C.L]: concentration of coupled cells

I present here two schools of thought regarding the justification of the above design equation.

The second school of thought is adopted here because it is theoretically sound and practically plausible. Furthermore, future development of CAR-T technologies favors it. As a consequence, whatever total concentration of C is added, it is close to the concentration of free C, that is, [C]total ≈ [C].
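The effect of treating total C as free C can be checked numerically. The sketch below is only an illustration, not the design equation from the original figure: it assumes the standard single-site relation Kd = [C][L]/[C.L], and the function names and concentrations are hypothetical. It compares the exact mass-balance solution with the approximation that total C equals free C.

```python
import math


def fraction_bound_exact(c_total, l_total, kd):
    """Fraction of L held in the C.L complex, from the exact mass balance.

    Assumes Kd = [C][L]/[C.L] with [C] = c_total - x and [L] = l_total - x,
    which gives the quadratic x^2 - (c_total + l_total + kd) x + c_total*l_total = 0.
    """
    s = c_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * c_total * l_total)) / 2.0
    return complex_conc / l_total


def fraction_bound_approx(c_total, kd):
    """Same fraction under the assumption that free [C] is close to total [C]."""
    return c_total / (kd + c_total)


# Hypothetical values in arbitrary concentration units.
kd = 10.0
for l_total in (0.001, 1.0, 10.0):
    exact = fraction_bound_exact(10.0, l_total, kd)
    approx = fraction_bound_approx(10.0, kd)
    print(f"L_total={l_total:>6}: exact={exact:.4f} approx={approx:.4f}")
```

The two values agree when L is scarce relative to C and Kd, and drift apart as the amount of L approaches the amount of C, which is the regime where the approximation should be used with care.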

Biomolecular Method to Determine Kd

Surrogate Datasets: Imatinib and BCR-ABL Complex Binding

I have used two datasets obtained from MIT course 7.QBWx, Quantitative Biology Workshop, to determine the equilibrium constant as a surrogate for CAR-T binding. The first dataset contains measurements of the fraction of BCR-ABL bound by the therapeutic drug Imatinib.

Imatinib is used to treat hematologic neoplasms such as chronic myelogenous leukemia. It is a tyrosine kinase inhibitor (TKI) marketed by Novartis under the names Gleevec and Glivec. The second dataset contains measurements of the BCR-ABL-Imatinib binding fraction obtained with a fluorescence technique. Binding of Imatinib to BCR-ABL increases the fluorescence intensity picked up by the spectrophotometer.

At a glance, the data points in the first dataset appear to start from the origin (0,0), while those in the second dataset start at close to 500 fluorescence units (FU) of intensity.

Aside, Pathogenesis of Chronic Myelogenous Leukemia

A piece of chromosome 9 and a piece of chromosome 22 break off and trade places. The BCR-ABL gene is formed on chromosome 22 where the piece of chromosome 9 attaches. The changed chromosome 22 is called the Philadelphia chromosome (National Cancer Institute).

Imatinib works by binding close to the ATP binding site of BCR-ABL. This blocks the enzyme activity of the protein semi-competitively.

Calculation of Kd using MATLAB

MATLAB was used to compute Kd values from the datasets. For the first dataset, Equation 6 from Figure 3-1 was used as the fitting equation. The data was read and converted into a matrix. The first column was extracted as variable x and the second column as variable y. The MATLAB curve-fitting function fittype was used to define the model, which was then used to fit the dataset. The following shows the MATLAB code and outcome:

Because of the apparent intercept on the y-axis in the second dataset, a constant denoted b was added to the equation. Another practical consideration concerning the measuring instrument was also included. In the laboratory, no two measurement instruments are absolutely identical (MIT 7.QBWx). Hence, another constant denoted a, which represents the inherent property of the instrument, was introduced as a scale factor on the equation's binding term. As a result, three coefficients had to be determined: a, b and k. The following diagram shows the outcome produced by MATLAB:

The value of Kd was found to be 1.438 x 10^4, with a y-axis intercept of 451.7. The instrument constant a was found to be 2280. The intercept on the y-axis represents background fluorescence: before any binding takes place, the dye already emits some fluorescence. Interestingly, at a concentration equal to Kd the fitted curve passes through about 1600 FU. This is the half-maximal point of the binding term above background, a/2 + b = 2280/2 + 451.7 ≈ 1592 FU.
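The three-coefficient fit described above can be sketched in Python. This is not the MATLAB code used in the article. It assumes the model has the form y = a*x/(k + x) + b, which is inferred from the description of a, b and k, and it runs on synthetic, noise-free points generated from the reported coefficients rather than on the MIT dataset. Instead of a general nonlinear solver (SciPy's `curve_fit` would be the usual choice), it uses only the standard library: for any fixed k the best a and b follow from ordinary linear regression, so only k needs a one-dimensional search. All function names are hypothetical.

```python
import math


def fit_linear_terms(x, y, k):
    """For a fixed k, least-squares a and b in y = a*x/(k + x) + b, plus the SSE."""
    z = [xi / (k + xi) for xi in x]
    n = len(x)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    a = szy / szz
    b = y_mean - a * z_mean
    sse = sum((a * zi + b - yi) ** 2 for zi, yi in zip(z, y))
    return a, b, sse


def fit_binding_curve(x, y, k_low, k_high, iterations=200):
    """Golden-section search over log10(k); a and b are solved in closed form."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log10(k_low), math.log10(k_high)
    for _ in range(iterations):
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if fit_linear_terms(x, y, 10.0 ** m1)[2] < fit_linear_terms(x, y, 10.0 ** m2)[2]:
            hi = m2  # the minimum lies in [lo, m2]
        else:
            lo = m1  # the minimum lies in [m1, hi]
    k = 10.0 ** ((lo + hi) / 2.0)
    a, b, _ = fit_linear_terms(x, y, k)
    return a, b, k


# Synthetic data built from the coefficients reported in the article (not real measurements).
x_data = [0.0, 1e3, 2e3, 5e3, 1e4, 2e4, 5e4, 1e5]
y_data = [2280.0 * xi / (1.438e4 + xi) + 451.7 for xi in x_data]

a_fit, b_fit, k_fit = fit_binding_curve(x_data, y_data, k_low=1e2, k_high=1e6)
print(f"a={a_fit:.1f} b={b_fit:.1f} k={k_fit:.4g}")
print(f"y at x=k: {a_fit / 2.0 + b_fit:.1f} FU")
```

Setting a = 1 and b = 0 reduces the model to a plain fraction-bound curve, which is the form one would expect for a dataset that starts at the origin.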

Machine Learning Approach

There are generally two machine learning approaches to predicting drug-target interactions. One is binary classification, which determines whether an interaction exists for a given drug-target pair. The other is regression, which estimates continuous values that indicate a drug's ability to bind to the target of interest. This ability to bind is also called binding affinity. Many of these methods rely on molecular structure and require three-dimensional (3D) structural information about targets, which is still scarce at the time of writing. To circumvent this limitation, I have used a recently developed graph-based representation learning technique called Affinity2Vec (Thafar et al., 2022), published in Scientific Reports. The authors constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities and equilibrium constants.

Data Processing for Machine Learning

Two datasets were provided by the authors on GitHub to benchmark Affinity2Vec, i.e., the KIBA set and the Davis set. I have used the Davis set to build machine learning regression models to predict the dissociation constant (Kd) and binding affinity for drug-target pairs.

Several variants of each target were created as follows:

Affinity Scores – (1) Logarithm Base 10, (2) Normalized and (3) Exponential
Kd – (1) Original values and (2) Natural Logarithm

The following shows snippets of the Python code that I developed to process and assemble the data. The logic in the code largely follows that recommended by Thafar et al.

Drug IDs and Protein IDs were the first to be retrieved. There were 68 unique Drug IDs and 442 unique Protein IDs. The product of all drug-target combinations resulted in a total of 30056 (68 x 442) drug-target pairs.
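Enumerating every drug-target combination is a Cartesian product. The following minimal sketch uses placeholder identifiers (the real Davis IDs are not reproduced here) to show how the pair list can be built and counted.

```python
from itertools import product

# Placeholder IDs standing in for the 68 drugs and 442 proteins of the Davis set.
drug_ids = [f"DRUG_{i:02d}" for i in range(68)]
protein_ids = [f"PROT_{j:03d}" for j in range(442)]

# Every drug paired with every protein.
pairs = list(product(drug_ids, protein_ids))

print(len(pairs))   # 68 * 442
print(pairs[0])
```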


Equilibrium constant (Kd) data was read as a NumPy object:

Each unique Protein ID was indexed for later use.

This step is similar to the first one: it produces the data frame that stores the drug-target pairs, with the equilibrium constants, Kd, as labels.

Binding Affinity values were retrieved from a pickle file. Three variants of Affinity were computed: logarithm base 10, normalized and exponential.

These variants were added as new columns to the data frame.

Two variants of Equilibrium Constants, Kd were also computed and added as new columns to the data frame.

Drug-Target Combinations obtained from the first step were added to the data frame.
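The target variants described in this section can be sketched as follows. The original code is not shown in the text, so several details here are assumptions: "normalized" is taken to mean min-max scaling to the range 0 to 1, the input values are made up, and the column names are hypothetical.

```python
import math

# Made-up values for four drug-target pairs (not taken from the Davis set).
affinity = [5.0, 6.2, 7.4, 9.1]
kd = [10000.0, 630.0, 40.0, 0.8]

# Affinity variants: log base 10, normalized (assumed min-max), exponential.
aff_min, aff_max = min(affinity), max(affinity)
aff_log10 = [math.log10(v) for v in affinity]
aff_norm = [(v - aff_min) / (aff_max - aff_min) for v in affinity]
aff_exp = [math.exp(v) for v in affinity]

# Kd variants: original values and natural logarithm.
kd_ln = [math.log(v) for v in kd]

# One record per pair, with each variant as its own column.
rows = [
    {
        "AFF_LOG10": a_log,
        "AFF_NORM": a_norm,
        "AFF_EXP": a_exp,
        "KD": k,
        "KD_LN": k_ln,
    }
    for a_log, a_norm, a_exp, k, k_ln in zip(aff_log10, aff_norm, aff_exp, kd, kd_ln)
]

for row in rows:
    print(row)
```

The exponential variant grows very quickly, so it is only practical when the affinity scores are on a small scale, as in this made-up example.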

Targets for Predictions

A total of five targets were created, split into two groups: (1) binding affinity and (2) equilibrium constant, Kd.

An ML model was trained for each of the above targets, and its performance was measured using the test set.

Machine Learning Training

Before training an ML model, the final dataset was split into three parts: 75% training set, 20% test set and 5% held out as unseen data. Driverless AI, a commercially available automated ML platform from H2O.ai, was used to train the models.
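A three-way split of this kind can be sketched with the standard library. The article does not say how the split was performed, so the random shuffle, the seed and the function name below are assumptions made for illustration.

```python
import random


def split_dataset(rows, train_frac=0.75, test_frac=0.20, seed=42):
    """Shuffle a copy of rows and cut it into train, test and unseen parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    unseen = shuffled[n_train + n_test:]  # whatever remains, about 5%
    return train, test, unseen


# Stand-in records; in practice each item would be one drug-target row.
records = list(range(1000))
train_set, test_set, unseen_set = split_dataset(records)
print(len(train_set), len(test_set), len(unseen_set))
```

Fixing the seed keeps the unseen portion stable between runs, so it never leaks into training.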

Performance of ML Model

Driverless AI comes bundled with a number of ML algorithms such as decision trees, generalized linear models, LightGBM, XGBoost, etc. During training, different combinations of features from the data were automatically used to train intermediate models. The performance of these models was evaluated internally over many iterations until the best model was found. Upon completion of training, the test set was used to evaluate the model's performance using the following metrics:

First, consider the predictions of the equilibrium constant, Kd. On the untransformed values the model performance was abysmal, with a MAPE of over 6000% and equally poor results on the other metrics. After transforming Kd with the natural logarithm, performance improved dramatically: MAPE fell to 31%, with MSE and RMSE close to 2.

As for binding affinity, all three variants of this target showed comparable outcomes, with the exponentially transformed variant scoring best. The normalized variant scored poorly, at 33%, relative to the other two variants.

In conclusion, appropriate feature engineering and target transformation are crucial for training machine learning models that perform and generalize well.
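For reference, the metrics named in this article (R2, MAPE, MSE and RMSE) can be computed as in the sketch below. It uses the textbook definitions, which may differ in detail from the scorers inside Driverless AI, and the sample values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAPE (in percent), MSE and RMSE using textbook definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    rmse = math.sqrt(mse)
    # MAPE divides by the true value, so tiny true values inflate it sharply.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / ss_tot
    return {"R2": r2, "MAPE": mape, "MSE": mse, "RMSE": rmse}


# Made-up example: three true values and three predictions.
metrics = regression_metrics([2.0, 4.0, 6.0], [2.0, 4.0, 7.0])
print(metrics)

# Made-up raw Kd values spanning several orders of magnitude.
raw = regression_metrics([0.5, 50.0, 10000.0], [40.0, 60.0, 9000.0])
print(f"MAPE on raw values: {raw['MAPE']:.0f}%")
```

In the second example a single small true value dominates the MAPE, which illustrates how a target spanning several orders of magnitude can produce very large percentage errors before transformation.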

Performance on Unseen Data

The regression model was used to predict Kd on a total of 752 unseen records. The following screenshot shows the performance in H2O Driverless AI:

R2: 0.58
MAPE: 25%
MSE: 1.9
RMSE: 1.4

The resulting performance on predicting Binding Affinity:

R2: 0.58
MAPE: 5%
MSE: 0.3
RMSE: 0.6

The two models above show results comparable to their test set performance. In an actual drug discovery setting, the unseen data could come from experiments that identify surface proteins as disease biomarkers. Curated proteins from past experiments are also ideal candidates to be screened by ML models as potential targets.

Protein and Target Selection

Prediction results of UNSEEN data for Kd and Affinity were combined using DRUG_ID and PROTEIN_NAME as join keys.

Predicted values of Kd and Affinity were standardized to 0 – 1.

In pursuing the best drug-target pairs, I set a new search criterion: pairs with the lowest possible Kd and the highest possible affinity. To achieve this, a new column, KD_AFF_DIFF, was created by subtracting the standardized Kd from the standardized affinity. The data was then sorted in descending order of this difference.
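The selection steps in this section (join, scale to 0 to 1, subtract, sort) can be sketched as follows. The predictions are made up, "standardized to 0 – 1" is assumed to mean min-max scaling, and everything except the DRUG_ID, PROTEIN_NAME and KD_AFF_DIFF names is hypothetical.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly onto the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up predictions keyed by (DRUG_ID, PROTEIN_NAME).
kd_pred = {("D1", "P1"): 2.0, ("D1", "P2"): 6.0, ("D2", "P1"): 4.0}
aff_pred = {("D1", "P1"): 8.0, ("D1", "P2"): 5.0, ("D2", "P1"): 6.5}

# Join: keep only the pairs present in both prediction sets.
keys = sorted(kd_pred.keys() & aff_pred.keys())

kd_std = min_max_scale([kd_pred[key] for key in keys])
aff_std = min_max_scale([aff_pred[key] for key in keys])

# KD_AFF_DIFF = standardized affinity minus standardized Kd, sorted high to low.
ranked = sorted(
    (
        {"DRUG_ID": drug, "PROTEIN_NAME": protein, "KD_AFF_DIFF": a - k}
        for (drug, protein), k, a in zip(keys, kd_std, aff_std)
    ),
    key=lambda row: row["KD_AFF_DIFF"],
    reverse=True,
)

for row in ranked:
    print(row)
```

A pair with the lowest Kd and the highest affinity scores 1, and the opposite extreme scores -1, so the top of the sorted list holds the candidates that satisfy both criteria at once.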

Analysis of Top 5 Predictions

The top 5 pairs obtained were further analyzed using information from PUBCHEM.

Machine Learning Workflow for Discovery
