Automated Peptide Mapping for Quantitative Comparison of Biotherapeutics
Jonathan Jones
Mass spectrometry - data science - scientific marketing. Connect with me to learn about Genedata and Danaher’s mission for #DigitalizingBiopharma
In this article, we explore how Genedata Expressionist streamlines and automates peptide mapping for the quantitative comparison of biotherapeutics - a key area of analysis in biopharmaceutical research and development.
Peptide mapping is an essential analytical technique for characterizing the primary structure of protein-based therapeutics. It is generally used for amino acid sequence confirmation, connectivity assessment, and characterization of post-translational modifications (PTMs). Its high sensitivity to even the smallest covalent structural changes in a protein has also made it a valuable ‘fingerprint’ for comparative analysis. In bioprocess development, for instance, peptide maps are employed for lot-to-lot identity testing. Likewise, peptide mapping is considered a vital step in comparing the sequence and PTMs of an innovator therapeutic protein and a biosimilar.
Recent developments in separation techniques, MS instrumentation, and sample preparation procedures have allowed scientists to implement peptide mapping in their routine biotherapeutics characterization pipelines. While generating peptide maps has become easier, comparing them remains a tedious process. In addition, data collected over time show variability in chromatograms due to shifts in elution times. As these experiments scale up, analysts need to reliably identify and quantify the peptides in these maps in an automated fashion, and to perform comparability studies that report out-of-tolerance Critical Quality Attributes (CQAs). Genedata Expressionist addresses these issues by providing one platform for optimizing the analysis of peptide maps, automating the process, and performing subsequent downstream comparative analysis.
Automated workflow concept
Genedata Expressionist ensures high-quality, efficient, and reliable analysis over time thanks to automated workflows. The basic concept involves two main phases. The first consists of setting up a customized workflow and saving it for later use. This is normally done just once by an expert at the beginning of a project. The second is the execution phase, which simply involves loading the raw data and running the saved workflow with a one-click operation. When data processing is complete, Expressionist provides the user with immediate browsing and downstream analysis capabilities, including statistical tests, visual verification of the results, and customized report generation. The automated workflow concept is generic for all applications supported by Genedata Expressionist. Figure 1 illustrates an automated peptide mapping workflow.
The process of setting up a workflow starts by connecting different workflow nodes - called 'activities' - which are suited to the specific application. This is usually followed by optimizing certain parameter settings to meet the needs of the data type being analyzed. Settings optimization is particularly important for noise reduction, where the requirements may differ between data types or even for the same data acquired on different instruments. Genedata Expressionist offers optimized procedures to obtain good-quality results from MS instruments from different vendors. Following optimization, the customized workflow is saved for future applications. In addition, users with manager roles have the option of locking parameter settings that must not be changed. This introduces the concept of ‘approved’ workflows, which can be shared among lab members. Consequently, ‘approved’ workflows allow downstream data analysis to be standardized in labs working in GxP environments. Automation ensures consistency and efficiency in running standardized workflows, especially if comparisons need to be made over time. After running the saved workflow in Expressionist, it is possible to perform a variety of downstream activities such as comparative analysis. Figure 1 illustrates a typical strategy following a peptide mapping workflow, which starts with manually reviewing results, followed by performing statistical tests, validating the corresponding candidates using visualizers, and finally generating a customized report.
A fully automated peptide mapping workflow
The detailed components of a peptide mapping workflow are shown in Figure 2A. The peptide mapping activity is the core activity of the workflow, where all settings specific to the search can be configured. The workflow also involves several steps of signal pre-treatment prior to the peptide mapping activity. In the case reported in Figure 2, importing raw files is followed by data cleaning to remove the noise characteristic of MS data. Genedata Expressionist offers the flexibility to optimize this step for the specific instruments employed for the analyses. Data cleaning is followed by retention time (RT) alignment, a crucial step when comparing different samples. It corrects for drifts in RT, which can result from technical variability in the chromatography setup. The activity produces aligned chromatograms that allow for accurate quantitative comparisons. It is possible to align samples against each other or to align all samples against a reference, which enables comparisons of data collected over time. After data cleaning and alignment, the objective of the second block of activities is to detect peaks (centers and boundaries) and group isotopic clusters so that they can be submitted to the peptide mapping search.
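To make the idea of aligning samples against a reference concrete, here is a minimal Python sketch. It assumes the simplest possible drift model, a constant RT offset estimated from a few anchor peaks that have already been paired between the reference and the sample. The function names and the numbers are hypothetical, and this is not the alignment algorithm used by Genedata Expressionist, which the article does not describe; real RT drift is usually non-linear.

```python
from statistics import median


def estimate_rt_shift(reference_rts, sample_rts):
    """Median RT offset (sample minus reference) over paired anchor peaks."""
    if len(reference_rts) != len(sample_rts) or not reference_rts:
        raise ValueError("anchor lists must be non-empty and of equal length")
    return median(s - r for r, s in zip(reference_rts, sample_rts))


def align_to_reference(sample_peaks, shift):
    """Subtract the estimated shift from the RT of every (rt, intensity) peak."""
    return [(rt - shift, intensity) for rt, intensity in sample_peaks]


# Hypothetical anchor peaks (minutes): the sample elutes 0.5 min late.
reference_anchors = [10.0, 20.0, 30.0]
sample_anchors = [10.5, 20.5, 30.5]

shift = estimate_rt_shift(reference_anchors, sample_anchors)
aligned = align_to_reference([(10.5, 1000.0), (25.5, 400.0)], shift)
```

Using the median rather than the mean keeps a single mispaired anchor from distorting the estimate. Aligning every run against the same reference is what makes runs acquired at different times comparable to each other.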
The peptide mapping activity, as mentioned, is the activity specific to the application and this is where all calculations related to peptide identifications and quantifications are performed. Figure 2B shows the settings tabs of the peptide mapping activity. The general settings tab shown includes mass tolerances for searching peptides and fragments. It is also possible to specify whether fragmentation spectra are required for identifications or whether matching by mass only is sufficient. The option of manual revision of results can be activated here. If so, a pop-up window with a list of all peptides identified is triggered before the final execution of the activity. This window allows the user to manually accept or reject peptides based on a priori knowledge or manual inspection of the data. Table 1 illustrates such a case. The list of identified glycopeptides includes a candidate where the glycan identified does not fit the expected pattern of classical glycans on an antibody (G0, G0F, and G1F). Importantly, it has the lowest score compared to the other glycopeptides. In this situation, the peptide was considered a false positive and was rejected from the final results list as shown in the table.
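As an illustration of what a mass tolerance means when matching by mass only, the following Python sketch accepts a candidate peptide if its theoretical mass lies within a tolerance, expressed in parts per million (ppm), of the observed mass. The candidate names, masses, and the 10 ppm tolerance are made up for the example, and the functions are hypothetical helpers, not part of Genedata Expressionist or its scoring algorithm.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def match_by_mass(observed_mass, candidates, tolerance_ppm):
    """Return (name, ppm error) for candidates within tolerance, best match first."""
    hits = []
    for name, theoretical in candidates.items():
        err = ppm_error(observed_mass, theoretical)
        if abs(err) <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Illustrative monoisotopic masses in Da (not real peptides).
candidates = {
    "peptide_A": 1000.0000,
    "peptide_B": 1000.0200,
    "peptide_C": 1001.0000,
}

hits = match_by_mass(1000.0050, candidates, tolerance_ppm=10.0)
```

Here only peptide_A falls inside the window (about 5 ppm); peptide_B is roughly 15 ppm away and is excluded. A wider tolerance admits more candidates, which is one reason fragmentation spectra or manual review may be needed to separate true identifications from false positives.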
Genedata Expressionist provides its own search and scoring algorithm for peptide mapping. The sequence information (text or file) needs to be added to the sequence tab. Additional input is required for searching modifications, glycosylation, and disulfide bonding in the subsequent tabs. In the modifications tab, for instance, it is possible to limit the search space by setting restrictions on the number of modifications allowed per peptide and/or on their positions. This helps reduce the number of false positives when several modifications are searched simultaneously. For searching glycosylated peptides, the glycosylation tab provides the option of performing a library-based or customized search, as well as the option to search for partial glycosylation. Disulfide bonding specifications include whether a fixed, scrambled, or de novo search needs to be performed.
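The effect of capping the number of modifications per peptide can be seen with a small counting exercise in Python. The sketch below enumerates which positions of a peptide could carry a modification, under the simplifying assumption that each eligible residue (here M for oxidation and N for deamidation) has exactly one possible modification. The peptide sequence and the function name are hypothetical, and the code only illustrates search-space size, not how Genedata Expressionist performs the search.

```python
from itertools import combinations


def modified_forms(peptide, allowed_residues, max_mods):
    """List tuples of 0-based modified positions with at most max_mods sites."""
    sites = [i for i, aa in enumerate(peptide) if aa in allowed_residues]
    forms = []
    for k in range(0, min(max_mods, len(sites)) + 1):
        forms.extend(combinations(sites, k))
    return forms


# Hypothetical peptide with four modifiable residues (M, N, M, N).
peptide = "MNGMNK"

unrestricted = modified_forms(peptide, {"M", "N"}, max_mods=4)
restricted = modified_forms(peptide, {"M", "N"}, max_mods=1)
```

With four eligible sites, allowing up to four modifications yields 16 candidate forms, while allowing at most one yields 5. Every extra candidate is another chance for a spurious mass match, so a tighter limit directly shrinks the pool of potential false positives.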
After the peptide mapping workflow has been customized (Figure 2A), it is saved, and the parameter settings that need to be kept unchanged are locked in an 'approved' workflow. Running approved peptide mapping workflows allows for standardized and efficient comparative studies such as matching a biosimilar to an innovator therapeutic, assessing batch-to-batch variability, or monitoring manufacturing changes. When new peptide maps need to be analyzed, the saved workflow is simply run. If manual revision of results is activated, the workflow pauses until the user has reviewed the peptide list and accepted the peptides to be used in the final results. Once the results tables are obtained, it is possible to branch out from the saved workflow and perform statistical analysis on the spot (Figure 2C). As shown in the figure, the type of quantitative measure to be used is first specified in the data setup. This is followed by normalizing the data, performing basic statistical tests such as ANOVA, and finally generating a report that lists, for example, the top 20 ranked significantly changing peptides.
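The ranking step described above can be sketched in a few lines of Python. This example computes the one-way ANOVA F statistic for each peptide from its replicate measurements in each sample group, then keeps the highest-scoring peptides. The peptide names and values are invented, the helper names are hypothetical, and ranking by F instead of by p-value is a simplification that gives the same order only when every peptide has the same group sizes. For p-values, a statistics library such as SciPy (scipy.stats.f_oneway) would be the usual choice.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_ranked(peptide_groups, top_n=20):
    """Rank peptides by F statistic, largest first, and keep the top_n."""
    scored = [(name, anova_f(groups)) for name, groups in peptide_groups.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Invented normalized abundances: [reference replicates], [sample replicates].
peptide_groups = {
    "pep_stable": [[10.0, 11.0, 9.0], [10.0, 9.0, 11.0]],
    "pep_changed": [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]],
}

ranked = top_ranked(peptide_groups, top_n=20)
```

In this toy data the group means of pep_stable are identical, so its F statistic is 0, while pep_changed separates cleanly between the two groups and ranks first.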
Comparative analysis of peptide maps
Peptide mapping is quite often a comparative procedure. When compared to a reference, peptide maps are capable of detecting structural alterations. Identifying significant changes between the peptide map of a reference and that of a sample of interest often requires statistical analysis. Here, percent abundance normalization was first employed to allow monitoring of changes in the expression levels of variable modifications (deamidations and oxidations) relative to their unmodified counterparts. The ANOVA test was subsequently performed to detect significantly changing peptides (Table 2). This test is often complemented with an Absent/Present search to identify peptides that appear exclusively in one group of samples and are therefore of particular interest. Importantly, all results tables are linked to visualizers, which are associated with every activity in the workflow. This provides an excellent platform for visualizing significantly changing candidates between samples. The chromatogram view, for instance, is an activity that can provide the classical 1D mirror plots of the chromatograms compared (Figure 3). However, these mirror plots might suffer from problems related to co-eluting peptides or sub-optimal chromatographic separation.
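The two calculations mentioned here, percent abundance normalization and the Absent/Present search, can be illustrated with a short Python sketch. It assumes that percent abundance is the modified signal divided by the sum of the modified and unmodified signals, and that a peptide counts as present in a group if any replicate exceeds an intensity threshold. Both definitions are assumptions made for this example, since the article does not give the exact formulas used in Genedata Expressionist, and all names and numbers are hypothetical.

```python
def percent_abundance(modified, unmodified):
    """Modified form as a percentage of the modified plus unmodified signal."""
    total = modified + unmodified
    return 100.0 * modified / total if total > 0 else 0.0


def absent_present(group_a, group_b, threshold=0.0):
    """Classify a peptide by the group(s) in which any replicate exceeds threshold."""
    in_a = any(x > threshold for x in group_a)
    in_b = any(x > threshold for x in group_b)
    if in_a and in_b:
        return "both"
    if in_a:
        return "only_a"
    if in_b:
        return "only_b"
    return "neither"


# Invented intensities for a deamidated peptide and its unmodified counterpart.
reference_level = percent_abundance(modified=5.0, unmodified=95.0)
stressed_level = percent_abundance(modified=20.0, unmodified=80.0)

# Invented replicate intensities for a peptide seen only in sample B.
status = absent_present(group_a=[0.0, 0.0], group_b=[1200.0, 900.0])
```

Normalizing to the unmodified counterpart makes a modification level comparable across runs that differ in injected amount or overall signal. The Absent/Present check complements ANOVA because a peptide that is missing entirely from one group may have no variance to test.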
The 2D visualizers associated with all the activities of the peptide mapping workflow provide a more accurate way to follow up on results and verify statistically significant quantitative changes. Figure 4 illustrates the 2D visualization of two deamidations, one found by ANOVA (Figure 4A) to be significantly more highly expressed in sample B, and the second found by the Absent/Present search (Figure 4B) to be exclusively present in sample B. These visualizers verify the results of both tests, showing the higher level of expression of the deamidated peptides in sample B (the stressed sample). Additionally, the sequence identity of these peptides can be validated by examining their corresponding fragmentation spectra, which are visualized along with annotations in the peptide mapping activity results. Figure 4C shows the fragmentation spectrum of the deamidated peptide in Figure 4B overlaid with the fragmentation spectrum of its unmodified counterpart. Linked 2D visualizers are powerful tools for validating the results of statistical analysis.
Summary
Automated and standardized data analysis is key in settings where peptide mapping is a routine procedure. Genedata Expressionist provides a flexible software platform that can be tailored to your specific instrumentation and analytical needs. The platform allows workflows to be run in an automated fashion, offering efficient and standardized data analysis. Complete automation can be combined with user validation of results based on a priori knowledge or manual inspection of the data.
From raw data to the final report, Genedata Expressionist offers streamlined high-quality data analysis with considerable time savings.