Abstract
The state of a biological sample can be measured in several ways, depending on the research question: gene expression microarrays can quantify the expression levels of thousands of genes simultaneously, while various types of mass spectrometry can measure the abundance of chemical compounds in the sample. Typically, an experiment involves only one type of data, which is then analyzed accordingly. Because the biological systems under investigation do not operate independently but affect each other, this approach does not provide a comprehensive picture of the interacting processes.
The objective of this thesis is to provide modeling techniques that allow the combined analysis of microarray and LC–MS data in order to discover associations between genes and metabolic compounds, as well as tools that aid in drawing conclusions from and comparing such findings. The modeling framework presented in Manuscript I preserves the attributes of the compounds found in LC–MS samples while identifying genes highly associated with them.
The main obstacles that must be overcome with this approach are dimension reduction and variable selection, here done with PARAFAC and LASSO, respectively.
One important drawback of the LASSO has been the lack of inference: the selected variables could simply be the most important among a set of unimportant variables. Manuscript II addresses this problem with a permutation-based significance test for the variables chosen by the LASSO.
Once a set of relevant variables has been selected, it should be compared with previous findings, which becomes a problem of comparing ranked lists. Manuscript III presents a novel solution for these comparisons that uses the standard deviations of the ranks as a measure of agreement. This has advantages over existing methods: in particular, it scales to many lists and the measure has an intuitive interpretation.
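As a rough illustration of the idea of using the standard deviation of ranks as a measure of agreement, the sketch below computes, for each item, the sample standard deviation of its rank across several ranked lists. This is not the procedure from Manuscript III, whose details are not given here; the function name `rank_sd_agreement`, the use of the sample standard deviation, and the requirement that all lists rank the same items are assumptions made for the example.

```python
from statistics import stdev


def rank_sd_agreement(ranked_lists):
    """Return {item: sample SD of its ranks across the lists}.

    Hypothetical helper for illustration only. Assumes at least two
    lists and that every list ranks the same items exactly once.
    A small SD means the lists agree on where the item belongs.
    """
    if len(ranked_lists) < 2:
        raise ValueError("need at least two ranked lists")
    items = set(ranked_lists[0])
    for lst in ranked_lists:
        if len(lst) != len(items) or set(lst) != items:
            raise ValueError("all lists must rank the same items once")
    result = {}
    for item in items:
        # Rank 1 is the top of each list.
        ranks = [lst.index(item) + 1 for lst in ranked_lists]
        result[item] = stdev(ranks)
    return result


# Three toy lists: all agree on "a"; "b" and "c" swap places once.
lists = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
sd = rank_sd_agreement(lists)
```

In this toy case the item on which all lists agree gets a standard deviation of zero, while the two items that swap places get the same positive value. Adding more lists only adds more ranks per item, which is one way to see why such a measure extends naturally to many lists.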
Original language | English |
---|---|
Publisher | Department of Mathematical Sciences, Faculty of Science, University of Copenhagen |
Number of pages | 80 |
Status | Published - 2014 |