PhD Confirmation of Candidature Seminar – Matthew Sutton

Variable Selection for Structured Large Datasets

Student: Matthew Sutton

When: Friday, 17 March 2017, 11:00 AM-12:00 PM

Where: GP-S Block, Level 3, Room 307

Supervisors:

  • Prof Kerrie Mengersen (Principal)
  • Dr Benoit Liquet

Panel Members:

  • Kerrie Mengersen (Chair)
  • Benoit Liquet
  • Ian Turner
  • Dale Nyholt

Abstract:

Over the last century, new technologies have brought about the study of large datasets in multiple disciplines, both in research and in industry. The emergence of these large datasets has resulted in a fundamental change to traditional statistics. In traditional statistics, domain experts worked for years to provide small, clean datasets for statisticians to analyse. The techniques developed in this setting struggle to perform well on large datasets, where a large number of variables (features) may be irrelevant or there may be a large number of outlying observations. In this research we consider large datasets that are common in biomedical research. Typically these datasets contain observations of genes (genomics), mRNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics), and collectively they are known as ‘Omics’ datasets (Schneider and Orchard, 2011).

Variable selection plays a pivotal role in contemporary statistics and in scientific discovery for Omics datasets. Variable selection methods assume that an unknown subset of the predictors exhibits the strongest effects for the underlying system. Identifying this set of variables improves interpretation, reduces computational issues and leads to more stable inferences. While many recent approaches to variable selection have been very successful, the majority of the literature focuses on models with a univariate response.

This research will develop novel methods for analysing large multivariate datasets with additional structure, including underlying group structure, repeated measures, and known strong correlations. These methods will be based on variable selection techniques developed in both the frequentist and Bayesian paradigms.

 
