Senior Data Engineer

THE AI ACCELERATOR

The AI Accelerator is a new London-based hub within Computational Innovation (CI), a global organisation that brings together expertise in computational biology, human genetics, data excellence and AI.

The purpose of CI’s AI Accelerator is to provision production-quality, versatile, foundational biomedical AI capabilities that can be adapted and deployed to accelerate and improve portfolio decision-making. By furthering understanding of the biology that drives patient outcomes and identifying the mechanisms involved in disease, these capabilities aim to increase the probability of success.

A core component of the AI Accelerator is AI Enablement, a team focused on ensuring that the accelerator’s model provisioning teams can design, build and deploy versatile biomedical foundation models that can enhance human understanding of disease biology and help identify potential targets, biomarkers and patient segments for further research. 

This will be achieved by provisioning AI-ready, integrated, multimodal data for distributed training, managing the model lifecycle and partnering with the IT organisation to ensure that model builders and downstream users have the necessary infrastructure and tooling to prototype, implement, adapt and deploy AI capabilities to advance the portfolio. 

 

THE POSITION

We are seeking a Senior Data Engineer to join the AI Enablement team (@computationalinnovation) and contribute to the design and delivery of robust data engineering pipelines. These pipelines transform harmonised biomedical datasets into AI-ready, integrated assets across multi-omics, clinical and health records, and medical imaging data.

As an experienced, independent data engineer in AI Enablement, you will own significant data engineering workstreams, working within the broader technical direction and architecture set by the Senior Staff Data Engineer. The pipelines and integrated datasets you build will enable model training, fine-tuning and inference.

 

Key Responsibilities  

  • Transform harmonised datasets into AI-ready assets suitable for pre-training and fine-tuning large models, in line with defined standards and specifications
  • Build and maintain entity linking pipelines that connect patients and biomedical entities across modalities 
  • Build and maintain cross-modal integration pipelines to support multimodal training, fine-tuning and inference 
  • Ensure pipelines and datasets are built and operated in accordance with data access permissions, consent conditions and usage restrictions 
  • Maintain data lineage and provenance across all pipelines and datasets
  • Build and maintain biomedical benchmark datasets with versioning and documentation 
  • Write clean, well-tested, well-documented code that meets the team’s engineering standards
  • Contribute to code reviews within the data engineering team 
  • Stay current with advances in data engineering tooling and practices relevant to biomedical AI 

 

Required Qualifications 

  • PhD in Machine Learning, Computer Science, Bioinformatics, Computational Biology or a related quantitative field 
  • Strong hands-on experience in data engineering for machine learning 
  • Experience working with at least one biomedical data modality in a data engineering context 
  • Practical experience with entity linking or record linkage, ideally in a biomedical or clinical context 
  • Strong understanding of biomedical data characteristics, such as variant data formats, expression matrices and clinical coding standards (e.g. SNOMED and ICD-10)
  • Proficiency with modern data engineering tools  
  • Familiarity with data governance frameworks applicable to biomedical and clinical data 
  • Familiarity with Trusted Research Environments or controlled access biomedical data environments 
  • Experience with biomedical ontology systems and identifier mapping across modalities 
  • Contributions to open-source data engineering or bioinformatics tooling 

 

Second-round interviews will take place in the weeks commencing 22nd and 29th June.

This is a hybrid role, with approximately three days a week in the office.

 

WHY THIS IS A GREAT PLACE TO WORK

Boehringer Ingelheim has been recognised as a Top Employer in the UK, demonstrating our commitment to building an exceptional workplace through strong people practices and supportive HR policies.

To learn more about why BI is a great place to work, visit:

https://www.boehringer-ingelheim.co.uk/careers/uk-careers/why-great-place-work