11 MLM-SOS-ALL: With Machine Learning and Modelling in search of significant features in the SOS-ALL dataset

SOS-ALL Consortium: Baerenfaller Katja (2, 3), Schmid Marco (1, 2), Keller Thomas (1, 2), van Schie Alexander (1, 2), Roelke Heiko (1, 2)

  1. University of Applied Sciences of the Grisons (FHGR), Chur, Switzerland
  2. Center for Data Analytics, Visualization and Simulation (DAViS), Chur, Switzerland
  3. Swiss Institute of Allergy and Asthma Research (SIAF), University of Zurich, Swiss Institute of Bioinformatics, Davos, Switzerland

Aim: The SOS-ALL Consortium acquired a large dataset of RNA sequencing data and extensive questionnaire data from South African children with and without atopic dermatitis living in rural or urban areas. Using biostatistics and modern computational tools, including Machine Learning approaches, we aim to identify genetic and environmental factors responsible for the development of allergic diseases.

Methods: The RNA sequencing data for 149 individuals were processed with the ARMOR workflow for a combined statistical analysis. The questionnaire data were pre-processed into an analysis dataset in which we first identified significant differences between healthy and diseased children and between rural and urban children. We then selected significant features using Machine Learning with the newly developed tool FeatureSelector.
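
The abstract does not name the test used to compare questionnaire answers between groups. As a minimal sketch, assuming a yes/no questionnaire item and a Pearson chi-square test of independence (our choice, not necessarily the consortium's), a rural versus urban comparison could look as follows. Because many items are compared, the sketch also includes the Benjamini-Hochberg adjustment as one common, assumed correction for multiple testing. The function names and counts are hypothetical.

```python
import math

# Hypothetical helpers: the abstract does not specify the test or correction used.

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table: [[a, b], [c, d]], rows = groups (e.g. rural, urban),
    columns = answers (e.g. yes, no). No continuity correction.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for one yes/no item: rows = rural, urban.
stat, p = chi_square_2x2([[30, 20], [15, 35]])
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With small expected counts, Fisher's exact test is the usual alternative to the chi-square approximation.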

Results: The statistical analysis of the RNA sequencing data revealed that the environment had a more pronounced effect on the expression pattern than disease status (healthy versus diseased). Significant differences in living and health conditions between rural and urban areas, as identified from the questionnaire data, were linked with the sequencing data to identify gene expression patterns that reflect these differences and hence the impact of the environment. The newly developed Machine Learning tool FeatureSelector is now being adopted and validated to identify significant features that distinguish healthy from diseased children in the combined dataset.
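
The abstract does not describe how FeatureSelector works, so the sketch below does not reproduce it. As a minimal, hypothetical illustration of the two steps named in the Results (linking questionnaire data with sequencing data, then ranking features by how well they separate healthy from diseased children), it joins both sources on a sample ID and scores each feature with a simple univariate filter: the absolute difference in group means divided by the overall standard deviation. The data layout, function names, feature names, and toy values are all assumptions.

```python
import statistics

# Hypothetical data layout: {sample_id: {feature_name: numeric value}}.
# This univariate filter is a simple stand-in and is NOT FeatureSelector.

def merge_by_sample(questionnaire, expression):
    """Inner-join questionnaire and expression features on sample ID.

    Samples missing from either source are dropped; if a feature name
    occurs in both sources, the expression value wins.
    """
    shared = questionnaire.keys() & expression.keys()
    return {sid: {**questionnaire[sid], **expression[sid]} for sid in sorted(shared)}


def rank_features(samples, labels):
    """Rank features by |mean(diseased) - mean(healthy)| / overall SD.

    labels: {sample_id: 1 for diseased, 0 for healthy}. Assumes every
    sample has the same features and both groups are non-empty.
    """
    features = sorted(next(iter(samples.values())))
    scores = {}
    for f in features:
        cases = [row[f] for sid, row in samples.items() if labels[sid] == 1]
        controls = [row[f] for sid, row in samples.items() if labels[sid] == 0]
        sd = statistics.pstdev(cases + controls)
        # A constant feature carries no information about the groups.
        if sd == 0:
            scores[f] = 0.0
        else:
            scores[f] = abs(statistics.mean(cases) - statistics.mean(controls)) / sd
    # Highest-scoring features first; ties keep alphabetical order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: sample s5 has no expression profile and is dropped by the join.
questionnaire = {"s1": {"pets": 1}, "s2": {"pets": 0}, "s3": {"pets": 0},
                 "s4": {"pets": 1}, "s5": {"pets": 1}}
expression = {
    "s1": {"GENE_A": 5.0, "GENE_B": 2.0},
    "s2": {"GENE_A": 1.0, "GENE_B": 2.0},
    "s3": {"GENE_A": 5.0, "GENE_B": 2.0},
    "s4": {"GENE_A": 1.0, "GENE_B": 2.0},
}
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}
merged = merge_by_sample(questionnaire, expression)
ranking = rank_features(merged, labels)
print(ranking)
```

A univariate filter scores each feature in isolation, whereas model-based Machine Learning selectors can also take interactions between features into account.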

Conclusion: MLM-SOS-ALL is the first Life Science project in the newly established DAViS Center. Through close collaboration between Computational and Life Sciences, we processed a large sequencing dataset and applied modern computational approaches to a complex and diverse biomedical dataset to identify significant features of atopic dermatitis. This demonstrates the capacity of DAViS to enable new computational approaches.