Pfizer leverages a serverless architecture on AWS to process large amounts of digital biomarker data

The Pfizer Digital Medicine and Translational Imaging team is actively engaged in the development and implementation of digital biomarkers. These biomarkers are generated from data collected through wearable devices during extensive clinical trials. In these trials, participants wear devices equipped with accelerometers and near-body temperature sensors for a specified monitoring duration. Once the monitoring phase concludes, participants return the devices to their respective clinical sites. At these sites, the sensor data from each device is extracted and transmitted to Pfizer for further analysis.


Data capture process

In 2020, Pfizer sought a scalable and budget-friendly approach to digital biomarker processing. The team devised a serverless framework on AWS that combined Amazon S3 for data storage, Amazon ECS for running data processing tasks, and AWS Lambda functions for handling S3 event triggers and consolidating data. Initially, this solution supported a limited set of algorithms and a narrow range of biomarker types. Even so, it enabled Pfizer to process wearable sensor data rapidly at scale (thousands of clinical trial participants at once) while meeting high quality standards and saving time and cost.
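To make the role of the Lambda functions more concrete, here is a minimal sketch of a handler that reacts to an S3 event notification. It is not Pfizer's code: the function names and the returned structure are assumptions for illustration, and only the event layout (`Records`, `s3.bucket.name`, `s3.object.key`) follows the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that are not S3 notifications
        bucket = s3_info["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces arrive as '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: list the uploaded objects to process."""
    uploads = extract_s3_objects(event)
    # A real handler would start a processing task here (for example on ECS);
    # this sketch only reports what it would process.
    return {"to_process": [f"s3://{bucket}/{key}" for bucket, key in uploads]}
```

Keeping the event parsing in its own function makes it easy to test without any AWS resources.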

Solution Build in 2020

In time, the original pipeline needed updating to compute digital biomarkers at scale while maintaining reproducibility and data provenance. The key requirement in redesigning the pipeline was flexibility, so that the platform can:

  • Handle various incoming data sources and different algorithms
  • Handle distinct sets of algorithm parameters
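One common way to meet both requirements is to drive processing from configuration: each step names an algorithm and supplies its own parameters. The sketch below illustrates that idea only. The two algorithms are toy stand-ins rather than SKDH functions, and the `label`, `algorithm`, and `params` field names are hypothetical.

```python
from statistics import mean


def mean_magnitude(samples, scale=1.0):
    """Toy algorithm: scaled mean of absolute sample values."""
    return scale * mean(abs(x) for x in samples)


def count_above(samples, threshold=0.0):
    """Toy algorithm: number of samples above a threshold."""
    return sum(1 for x in samples if x > threshold)


# Registry mapping configuration names to callables (hypothetical names).
ALGORITHMS = {"mean_magnitude": mean_magnitude, "count_above": count_above}


def run_configured(samples, steps):
    """Run each configured step with its own parameter set."""
    results = {}
    for step in steps:
        func = ALGORITHMS[step["algorithm"]]
        results[step["label"]] = func(samples, **step.get("params", {}))
    return results
```

Adding a new algorithm or a different parameter set then means changing the configuration and the registry, not the pipeline itself.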

Pfizer has enhanced its serverless architecture on AWS for processing digital biomarker data, making it more adaptable and configurable. The new framework uses AWS Step Functions and other serverless services for a file processing pipeline, along with a custom Python package for data ingestion and processing.

Pfizer's improved architecture supports multiple data sources and different algorithms. The team also developed a specialized Python package, SciKit-Digital-Health (SKDH), for computing digital biomarkers. The updated solution uses Amazon S3 for storage, Amazon DynamoDB for metadata, AWS Step Functions for process orchestration, AWS Batch with Fargate for data processing, and Amazon SQS for messaging. The architecture includes two processing workflows: one that scans for new configuration files and one that processes study data and metadata files.

The first workflow scans the S3 bucket for new configuration files, which hold study configuration information in YAML format. When it finds a new configuration file, the workflow confirms that the bucket named in the file exists and checks that bucket for relevant data files. It then starts file processing for any files in the study bucket that the configuration covers.
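The logic of this configuration scan can be sketched in plain Python. This is an illustration under stated assumptions, not the actual workflow: the configuration is taken as an already parsed dictionary, the field names `study_bucket` and `data_suffixes` are hypothetical, and bucket contents are passed in as ordinary collections rather than fetched from S3.

```python
def find_new_configs(config_keys, processed_keys):
    """Return YAML config keys that have not been handled yet, sorted."""
    return sorted(
        key for key in config_keys
        if key.endswith((".yaml", ".yml")) and key not in processed_keys
    )


def plan_file_processing(config, existing_buckets, bucket_listing):
    """Validate one parsed study config and list the data files to process.

    config: dict parsed from the YAML file (hypothetical fields
            'study_bucket' and 'data_suffixes').
    existing_buckets: set of bucket names known to exist.
    bucket_listing: mapping of bucket name -> list of object keys.
    """
    bucket = config["study_bucket"]
    if bucket not in existing_buckets:
        raise ValueError(f"study bucket does not exist: {bucket}")
    suffixes = tuple(config["data_suffixes"])
    # Keep only the objects whose names match the configured data suffixes.
    return [key for key in bucket_listing.get(bucket, []) if key.endswith(suffixes)]
```

In a deployed pipeline, each of these checks would more likely be a separate state in the Step Functions workflow, so that failures are visible per step.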

The second workflow handles study data and metadata files. It first optionally writes metadata to the DynamoDB table and checks whether the processing prerequisites have been satisfied, waiting until they are. Once the criteria are met, it triggers a batch job that runs on AWS Fargate. The job computes digital biomarkers using the specified SKDH package version and uploads the results back to the study's S3 bucket.
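The prerequisite check and job submission can be sketched as follows. This is a hedged illustration rather than the production workflow: the helper names, the environment variable names `SKDH_VERSION` and `INPUT_KEY`, and the notion of a "required files" list are assumptions. The returned dictionary uses the parameter names of the AWS Batch SubmitJob API (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`), so it could be passed to boto3 as `batch_client.submit_job(**request)`.

```python
import re


def prerequisites_met(required_files, available_files):
    """True when every file the algorithm needs has arrived."""
    return set(required_files) <= set(available_files)


def build_batch_job_request(study_id, input_key, skdh_version, job_queue, job_definition):
    """Build keyword arguments for an AWS Batch SubmitJob call."""
    # Batch job names allow letters, digits, hyphens, and underscores
    # (up to 128 characters), so replace anything else with a hyphen.
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"{study_id}-{input_key}")[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "environment": [
                # Hypothetical variables the container would read at startup.
                {"name": "SKDH_VERSION", "value": skdh_version},
                {"name": "INPUT_KEY", "value": input_key},
            ]
        },
    }
```

Passing the SKDH version explicitly with each job is one way to keep results reproducible, since the same input and version can be rerun later.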

Digital biomarker catalog and file processing workflow

This architecture allows Pfizer to process large amounts of digital biomarker data quickly and cost-effectively. It also allows them to scale their data processing capabilities to meet the demands of their research.

References:

  1. https://aws.amazon.com/blogs/architecture/large-scale-digital-biomarker-computation-with-aws-serverless-services/
  2. https://www.nikhilmahadevan.com/work/aws-digita-biomarkers

#AWS #Serverless
