AWS, Cloud, HPC, Machine Learning, BioTech

High Performance Computing using Parallel Cluster, Infrastructure Set-up

Key Challenges

Setting up a robust, stable, highly available, and scalable HPC infrastructure with automated deployment and termination aspects posed significant challenges. Additionally, ensuring end-to-end streamlined processes for users, implementing robust security measures to protect data, and facilitating GPU and CPU intensive models on a secured platform were critical challenges addressed during the project.

Key Results

The implementation of a high-performance computing (HPC) cluster infrastructure on AWS has resulted in accelerated processing of large protein datasets, significantly reducing computing time from hours to minutes. This has led to faster delivery of results to users within 45-60 minutes, increased throughput, scalability, and growth potential, ultimately providing IridescentBio with a competitive advantage in their field.

Overview

IridescentBio is headquartered in Leiden, The Netherlands, and focuses on using physicochemical modeling and AI to predict developability for antibody R&D in biopharma. Their cloud tool can reverse-engineer antibody patents to extract a developability profile, or predict that profile when prototyping new sequences. In the 21st century, biology is increasingly intertwined with data, the cloud, AI, and modeling, and biomolecules have become essential foundations for potent therapies.

The importance of patenting and intellectual property (IP) in biopharmaceuticals is set to grow rapidly. IridescentBio's vision is to provide the premium platform for unraveling, modeling, analyzing, and forecasting IP in protein-based pharmaceutical sciences, at every level from atomic structure to practical application. The goal is to empower researchers and businesses to bring groundbreaking therapies to market faster, catalyzing the transformation of biotechnology and generating wide-ranging benefits for society.

Challenges

IridescentBio has developed a web application where users can enter a biopharma patent number or an antibody (protein) sequence. This triggers the generation of 3D data for each sequence, followed by complex calculations that produce reports sent back to the end user. To perform these calculations, they needed a high-performance computing infrastructure that is robust, stable, highly available, and scalable, with automated deployment and termination. The IridescentBio team was looking for a partner who could build and manage this infrastructure on AWS.


IridescentBio wanted to establish an end-to-end streamlined process for the end-users. The user can input a patent number or antibody sequence seamlessly through a user-friendly web interface, triggering a series of calculations and operations in the cloud. Once the processing is complete, a comprehensive report has to be generated and securely delivered to the customer. A top priority was also to ensure the utmost security by implementing robust measures to protect the data, as well as considering all necessary security parameters for data, application hosting, and high-performance computing (HPC) infrastructure.

The deployment of GPU and CPU intensive models, which require significant core hours to complete simulations, has to be facilitated on a single, secured platform. Ankercloud actively supported IridescentBio in designing and implementing the appropriate solution on AWS which includes components such as virtual machines, storage, and networking to optimize resource utilization. Additionally, upon completion of the processing, all data, calculations and the underlying cluster will be systematically destroyed, guaranteeing complete data privacy and security.

Solution

  • The complete infrastructure is hosted on AWS. Ankercloud created a pipeline for:
    1. Setting up an HPC cluster in the AWS environment that completes the desired task automatically, without any human intervention.
    2. Creating a separate HPC environment for each patent ID or antibody sequence.
    3. Deleting the created resources once the task is complete.
  • Amazon API Gateway integrates the web application with the backend infrastructure.
  • AWS Step Functions runs multiple Lambda functions in sequence, as required.
  • Lambda functions create the HPC cluster, run the HPC workload, copy the data back to the website, and terminate the HPC cluster once the tasks are completed.
  • An S3 bucket stores the report and 3D data for each sequence.
  • AWS ParallelCluster, an open source cluster management tool, simplifies deploying and managing the HPC clusters.
  • A dedicated protein database removes any customer dependency on publicly provided protein databases.
  • Two separate environments are provisioned, one for production and one for testing.
  • A CloudWatch dashboard allows the performance of each HPC cluster to be monitored separately.
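
The create, run, copy, and delete steps listed above can be sketched as an AWS Step Functions state machine definition. The following is a minimal illustration, not the pipeline Ankercloud actually built: the state names, the Lambda function names, and the REGION and ACCOUNT_ID placeholders are all assumptions made for this example. It routes every failure to the delete step, so that a cluster is not left running if an earlier step fails.

```python
import json

# Placeholder ARN prefix. REGION and ACCOUNT_ID are not given in the case
# study, and the function names used below are hypothetical.
ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"


def task(function_name, next_state=None, catch_to=None):
    """Build one Amazon States Language Task state that invokes a Lambda."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + function_name}
    if next_state is None:
        state["End"] = True  # terminal state of the workflow
    else:
        state["Next"] = next_state
    if catch_to is not None:
        # On any error, jump to the cleanup state and keep the error details.
        state["Catch"] = [
            {
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": catch_to,
            }
        ]
    return state


def build_definition():
    """Assemble the per-job workflow: create, run, copy results, delete."""
    return {
        "Comment": "One HPC cluster per patent ID or antibody sequence (sketch)",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": task("create-cluster", "RunWorkload", "DeleteCluster"),
            "RunWorkload": task("run-workload", "CopyResults", "DeleteCluster"),
            "CopyResults": task("copy-results", "DeleteCluster", "DeleteCluster"),
            "DeleteCluster": task("delete-cluster"),
        },
    }


# JSON form of the definition, as it would be supplied to Step Functions.
definition_json = json.dumps(build_definition(), indent=2)
```

Cluster creation typically takes several minutes, so a real workflow would likely add a wait-and-poll loop between creating the cluster and running the workload. That loop is omitted here to keep the sketch short.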

Business Outcome

  • Accelerated Processing: The high-performance cluster has significantly accelerated the processing of large protein datasets. Tasks that previously took hours can now be completed in a few minutes, leading to enhanced productivity.
  • Faster Results Delivery: With the reduced processing time, IridescentBio is now able to deliver results to users more quickly. The output files generated from the protein data set processing are uploaded to the client's website within 45-60 minutes, allowing users to access and utilize the information promptly.
  • Increased Throughput: The high-performance cluster's increased computing power and optimized algorithms enable the client to handle a larger volume of protein datasets within a given time frame. This increased throughput enhances the overall capacity of the system and allows for more extensive analysis and processing.
  • Scalability and Growth Potential: The high-performance cluster infrastructure is designed to scale efficiently, accommodating future growth and increased demand for processing large datasets. This scalability provides the client with the flexibility to handle larger data sets and expand their operations without compromising performance.
  • Competitive Advantage: The improved productivity resulting from reduced computing time gives the client a competitive edge in their field. They can now deliver faster results, make more timely decisions, and stay ahead of competitors who may still be grappling with prolonged processing times.
  • Cost Effectiveness: Because AWS resources follow a pay-as-you-go model and each cluster is deleted once its task completes, the client pays only for the time the resources are actually in use.

Overall, the high-performance cluster solution and the reduced computing time have positively impacted the client's productivity, user experience, and competitive positioning. The improved efficiency and throughput, combined with enhanced security and cost effectiveness, have set the stage for further growth and innovation in the client's domain.

Based on the joint project and the AWS set-up, the IridescentBio team was able to launch their platform for their customers. Interested in seeing this project in action? Take a look at: https://www.iridescent.bio/
