GenoML

GenoML: Genomic variant interpretation with machine learning

Realization
  • HES-SO Fribourg
  • Prof. Beat Wolf
  • beat.wolf@hefr.ch
  • Jonathan Donzallaz
Keywords
  • Machine learning
  • Genomics
  • NGS
Our skills
  • Applied Machine Learning, building pre-production software including deep learning algorithms
Valorisation
  • Technology transfer to industrial partners
Partnership
  • Phenosystems SA
Funding
Swiss Innovation Agency – Innosuisse, under grant 55606.1 INNO-ICT
Schedule
01.06.2021 – 01.06.2022
The goal of this project was to improve DNA sequence analysis by using machine learning to predict sequencing artefacts. In this feasibility study, we explored whether a machine learning model can be trained to detect artefacts that originate from sequencing or alignment errors. We compared our approach with existing methods such as DeepVariant and GATK CNN and achieved similar results with a much faster analysis.

For this project, we built an automated pipeline that generates artificial NGS data with known errors using NEAT-genreads and aligns the reads against the human reference genome. Because the injected errors are known, this let us train a machine learning model on specific data and evaluate its predictions against the ground truth. Finally, we evaluated our method on the well-known HG001 (NA12878) dataset, which allowed us to verify how well the approach transfers from synthetic to real data.
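The evaluation step described above, comparing a model's predictions with the errors that were injected, can be sketched as follows. This is only a minimal illustration and not the project's actual code: the function name `score_artefact_calls`, the `Site` tuple layout, and the choice of precision, recall, and F1 as metrics are all assumptions, and the sites in the example are invented toy data.

```python
from typing import NamedTuple, Set, Tuple

# Assumed representation of a site: (chromosome, 1-based position, ref, alt).
Site = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    f1: float


def score_artefact_calls(flagged: Set[Site], injected: Set[Site]) -> Metrics:
    """Compare sites flagged as artefacts with the known injected errors."""
    tp = len(flagged & injected)   # injected errors that were flagged
    fp = len(flagged - injected)   # flagged sites that were not injected errors
    fn = len(injected - flagged)   # injected errors that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(tp, fp, fn, precision, recall, f1)


# Toy data, invented for illustration only.
injected = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
flagged = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr2", 900, "T", "C")}

metrics = score_artefact_calls(flagged, injected)
print(metrics)
```

Matching sites by exact set membership keeps the sketch short. A real comparison would likely also need to normalise variant representation before matching, which is omitted here.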

The project was carried out as part of an Innocheque with Phenosystems SA.