


Australian Mathematical Sciences Institute Symposium



Statistical Learning




Abstracts

 


Microarray and proteomic-based diagnosis of cancer

by David Bowtell
University of Melbourne


Adam Kowalczyk [1], Andrew Holloway [1], Richard Tothill [1], Ryan van Laar [1], Alex Boussioutas, Paul Waring, Danny Rischin, Anna de Fazio [2], Ken Mitchelhill and David Bowtell [1]
1. Research Division, Peter MacCallum Cancer Centre, Locked Bag 1 A'Beckett St, Melbourne 8006, VIC. Australia. d.bowtell@pmci.unimelb.edu.au
2. Westmead Hospital, Sydney Australia.

The presentation will consist of three parts sampling different aspects of PMCC activities. The first part involves processing of genomic (microarray) data, the second covers proteomic mass spectrometry data, and the third focuses on some advanced issues of predictive modeling.

We have applied microarray analysis to the clinical problem of classification of carcinoma of unknown primary (CUP). CUP affects approximately 5% of cancer patients and occurs when the site of origin of the cancer remains undiagnosed even after extensive clinical work-up. While representing a small fraction of all patients, CUP is a significant cause of cancer death, with a median survival of seven months, in part because therapy cannot be directed to a specific cancer. We have constructed a training set of more than 170 well-characterized primary tumours hybridized to 10.5K spotted cDNA microarrays. We verified that metastatic tumours maintain gene expression patterns that are consistent with the tissue of origin. We have applied different machine learning approaches to the classification of CUP samples and were able to make a clinically plausible prediction of the origin of the metastatic tumour in numerous cases. We have also identified a subset of ovarian cancer cases whose expression profiles are indicative of a non-ovarian site of origin. Our findings indicate that expression profiling allows the rapid identification of unrecognised metastases and may be of use in the clinical management of the disease.

We are also developing proteomic-based approaches to early detection of cancer. Briefly, our approach involves direct LC-electrospray mass spectrometry of low molecular weight components of serum. These experiments generate data sets of millions of detectable features. Our initial results demonstrate that advanced machine learning methods are capable of detecting patterns in the data that discriminate patients with and without gastric cancer.

We conclude our presentation with an overview of our recent experimental and theoretical findings showing that biological data can pose significant challenges for experimental and theoretical machine learning.

 

Algorithms for Supervised Learning with Generalised Sparse Grids

by Markus Hegland
Australian National University


Supervised learning methods using radial basis functions and support vector machines can lead to very large systems of equations. We discuss approximations of these models using generalised sparse grids and how they can deal with some of the computational challenges. Sparse grids have earlier been used successfully in integration and the solution of PDEs and have recently been applied to supervised learning.

The computational efficiency of sparse grids is based on the combination technique, which decomposes the problem into small subproblems that can be solved in parallel. In my talk I will discuss sparse grid approximation and sparse grid algorithms for supervised learning, including some current work with G. Golub, M. Gutknecht and S. Roberts on iterative methods.
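The combination technique mentioned above can be illustrated on a much simpler task than supervised learning. The following is a minimal sketch, assuming two dimensions, the unit square, and plain bilinear interpolation on each anisotropic full grid; the function names and the level convention (mesh width 2**-l, levels starting at 0) are illustrative choices, not taken from the talk. Each call to interp_full_grid is an independent subproblem, which is what makes the approach easy to parallelise.

```python
import math


def interp_full_grid(f, l1, l2, x, y):
    """Bilinear interpolant of f on the full grid with mesh widths
    2**-l1 and 2**-l2, evaluated at (x, y) in the unit square."""
    n1, n2 = 2 ** l1, 2 ** l2
    # Index of the grid cell containing (x, y); clamp so x = 1 or y = 1 works.
    i = min(int(x * n1), n1 - 1)
    j = min(int(y * n2), n2 - 1)
    # Local coordinates inside the cell, each in [0, 1].
    tx = x * n1 - i
    ty = y * n2 - j
    x0, x1 = i / n1, (i + 1) / n1
    y0, y1 = j / n2, (j + 1) / n2
    return ((1 - tx) * (1 - ty) * f(x0, y0) + tx * (1 - ty) * f(x1, y0)
            + (1 - tx) * ty * f(x0, y1) + tx * ty * f(x1, y1))


def combination_interp(f, n, x, y):
    """Two-dimensional combination technique of level n: add the
    interpolants on all grids with l1 + l2 = n and subtract those on
    all grids with l1 + l2 = n - 1."""
    total = 0.0
    for l1 in range(n + 1):
        total += interp_full_grid(f, l1, n - l1, x, y)
    for l1 in range(n):
        total -= interp_full_grid(f, l1, n - 1 - l1, x, y)
    return total


if __name__ == "__main__":
    def smooth(x, y):
        return math.sin(math.pi * x) * math.exp(y)

    # Error of the combined interpolant at an off-grid point.
    for level in (2, 4, 6):
        err = abs(combination_interp(smooth, level, 0.3, 0.7) - smooth(0.3, 0.7))
        print(level, err)
```

In a learning setting each full-grid interpolant would be replaced by a model fitted on that grid, but the way the partial solutions are added and subtracted stays the same.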


References: (for other references see http://datamining.anu.edu.au)

Additive Sparse Grid Fitting, M. Hegland, Curve and Surface Fitting: Saint-Malo 2002, pp. 209-219, Nashboro Press, Brentwood, 2003.

Adaptive Sparse Grids, M. Hegland, Proceedings of 10th Computational Techniques and Applications Conference CTAC-2001, ANZIAM Journal, vol. 44, pp. C335-C353, 2003. Available online.

 

Estimation and Variable Selection in Nonparametric Heteroscedastic Regression

by Robert Kohn
University of New South Wales


The talk considers a Gaussian model with the mean and the variance modeled flexibly as functions of the independent variables. The estimation is carried out using a Bayesian approach that allows the identification of significant variables in the variance function, as well as averaging over all possible models in both the mean and the variance functions. The computation is carried out by a simulation method that is carefully constructed to ensure that it converges quickly and produces iterates from the posterior distribution that have low correlation. Real and simulated examples demonstrate that the proposed method works well. The method in this paper is important because (a) it produces more realistic prediction intervals than nonparametric regression estimators that assume a constant variance; (b) variable selection identifies the variables in the variance function that are important; (c) variable selection and model averaging produce more efficient prediction intervals than those obtained by regular nonparametric regression.

 

Decision trees and their hybrids: business applications. Case study.

by Inna Kolyshkina
PricewaterhouseCoopers


Interest in data mining techniques has been increasing recently amongst the actuaries and statisticians involved in the analysis of insurance data sets which typically have a large number of both cases and variables. Data mining is a modelling methodology based on modern, computer-intensive methods that are statistically reliable and very fast. This methodology can easily handle large amounts of data. It also allows quick selection of predictors out of hundreds available in the data by scanning every variable for predictive potential which guarantees that no important predictive relationship has been left out of the model. A case study is presented showing the application of data mining to a business problem that required modeling risk in health insurance, based on a project recently performed for a large Australian health insurance company by PWC Actuarial. The data mining methods discussed in the case study include: decision trees, multivariate adaptive regression splines and hybrid models that combined decision trees with logistic regression. The non-statistical issues of implementation and client feedback are also discussed.

 

Distance Weighted Discrimination & Geometrical Representation of High Dimension - Low Sample Size data

by Steve Marron
University of North Carolina


The Support Vector Machine is a discrimination method that was developed in the machine learning community. Statistical ideas are used to improve it in the important context of High Dimension - Low Sample Size data, resulting in a new method called Distance Weighted Discrimination.  The ideas are illustrated with some examples from micro-array analysis.  Some unexpected behavior is explained using a non-standard asymptotic analysis as the dimension tends to infinity.

 

BART: Bayesian Additive Regression Trees

by Rob McCulloch
University of Chicago


We consider a Markov chain Monte Carlo (MCMC) algorithm for building an additive model with a sum of trees. The trees themselves are treed models [1], with a separate linear regression model in each terminal node. To adopt a Bayesian approach, we put prior distributions on parameters within each tree and on the sum of trees. The MCMC algorithm used in [1] to train a single tree is extended to the additive framework. The key component of this extension is a step in which a single random tree is drawn conditional on all others in the sum. The extension is straightforward yet powerful, enabling a more flexible set of models. The model and associated training algorithm have some interesting similarities to Boosting and backfitting. If the priors are set so as to heavily regularize individual trees, we see Boosting-like behaviour with a large number of weak learners, each contributing a small amount to the overall model. Since a treed regression is anything but weak, careful attention must be paid to the choice of prior parameters. If instead we relax the regularization, then a smaller number of additive trees will contribute to the model. The iterated draws of each tree conditional on others is similar to a Bayesian version of backfitting [2]. The Bayesian framework and MCMC training algorithm yield a posterior distribution, which can be used to assess uncertainty. For example, posteriors for the number of weak learners and for predictions are easily available.
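The connection to backfitting can be made concrete with a deterministic analogue of the sampler. The sketch below is not the MCMC algorithm of the talk: it assumes a single predictor, replaces the treed linear models with constant-leaf stumps, and replaces each posterior draw with a least-squares refit of one stump to the partial residuals left by all the others. All function names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares stump on one predictor.
    Returns (split, left_mean, right_mean)."""
    best = None
    xs = sorted(set(x))
    # Candidate splits are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        s = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    return best[1:]


def predict_stump(stump, xi):
    s, lm, rm = stump
    return lm if xi <= s else rm


def backfit(x, y, n_trees=3, n_sweeps=5):
    """Cycle through the stumps, refitting each one to the residuals
    left by all the others. Returns the stumps and the residual sum of
    squares recorded after every single-stump update."""
    stumps = [(0.0, 0.0, 0.0)] * n_trees  # every stump starts as the zero function
    history = []
    for _ in range(n_sweeps):
        for j in range(n_trees):
            # Partial residuals: the response minus every stump except stump j.
            partial = [
                yi - sum(predict_stump(t, xi)
                         for k, t in enumerate(stumps) if k != j)
                for xi, yi in zip(x, y)
            ]
            stumps[j] = fit_stump(x, partial)
            rss = sum(
                (yi - sum(predict_stump(t, xi) for t in stumps)) ** 2
                for xi, yi in zip(x, y)
            )
            history.append(rss)
    return stumps, history


if __name__ == "__main__":
    xs = [i / 20 for i in range(21)]
    ys = [(1.0 if xi > 0.3 else 0.0) + (2.0 if xi > 0.7 else 0.0) for xi in xs]
    fitted, rss_history = backfit(xs, ys, n_trees=2, n_sweeps=4)
    print(fitted)
    print(rss_history)
```

Because each update is a least-squares refit with the other stumps held fixed, the recorded residual sum of squares never increases. In the Bayesian version the refit becomes a random draw of one tree conditional on the others, so the iterates explore a posterior distribution instead of converging to a single fit.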

References
[1] Chipman, H., George, E. and McCulloch, R. (2002) Bayesian Treed Models, Machine Learning, 48, 299-320.
[2] Hastie, T. and Tibshirani, R. (2000) Bayesian Backfitting (with comments and a rejoinder by the authors), Statistical Science, 15(3), 196-223.

 

Model Selection in Complex Classes of Models

by Brian Ripley
Oxford University


Model selection for even linear regression remains somewhat controversial, and the difficulties explode for, e.g., classes of neural networks or multilevel models. The talk will be a tutorial on model selection, aiming to make precise the exact aims and assumptions of competing techniques such as cross-validation, model averaging and AIC/BIC/SBC/DIC.

 

Learning from Brain Images

by Brian Ripley
Oxford University


Modern experimental techniques, especially MRI, can collect hundreds of megabytes of data per experimental 'point'. The challenge is to identify the small amount of signal amongst a lot of noise, to determine what is statistically significant and to take spatial structure into account. Jonathan Marchini and I have developed some successful examples in collaboration with Oxford's MRI centres.

 

From Margin-Based Classification to psi-Learning

by Xiaotong Shen
Ohio State University


The concept of large margins plays an important role in analyzing learning methodologies such as Boosting, Neural Networks, and Support Vector Machines (SVMs). In this talk, I will present a classification technique called psi-learning as well as the associated computational tools. While retaining the interpretation of large margins, psi-learning delivers high performance in generalization, especially in nonseparable cases, as it is derived from a direct consideration of generalization errors. The nonconvex minimization involved in psi-learning is solved via global optimization techniques based on d.c. (difference of convex functions) programming. Finally, psi-learning will be illustrated via simulated and benchmark examples.
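The idea behind d.c. programming can be illustrated with a truncated hinge loss. The sketch below is an assumption for illustration only: the abstract does not define the psi function, so a generic nonconvex margin loss of the same flavour is used, and all function names are hypothetical. It shows that this loss, a function of the functional margin u = y f(x), can be written as the difference of two convex functions.

```python
def hinge(u):
    """Convex hinge loss max(0, 1 - u), as used by SVMs."""
    return max(0.0, 1.0 - u)


def shifted_hinge(u):
    """Second convex piece, max(0, -u)."""
    return max(0.0, -u)


def ramp_loss(u):
    """Truncated hinge: equal to the hinge for u >= 0 but capped at 1,
    so badly misclassified points cannot dominate the objective.
    It is bounded and nonconvex."""
    return min(1.0, hinge(u))


if __name__ == "__main__":
    # The nonconvex loss equals a difference of two convex functions.
    for u in (-2.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(u, ramp_loss(u), hinge(u) - shifted_hinge(u))
```

A d.c. algorithm typically exploits such a decomposition by replacing the subtracted convex piece with a linear approximation at the current iterate, which leaves a convex problem to solve at each step.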

 

Gaussian Processes for Estimation and Clustering

by Alex Smola
Australian National University


Gaussian Processes, and their counterpart, Support Vector Machines, are frequently used for estimation of conditional probabilities from data. In this talk I will give an overview of classification and regression algorithms involving kernels, point out the connections between Gaussian Processes and Support Vector Machines, and finally show how they can be used to derive clustering algorithms in a natural fashion.

 

Learning in Vision Projects at UNSW: Overview

by Arcot Sowmya
University of New South Wales


This talk will give an overview of learning in vision research undertaken by the speaker's group at UNSW in the past few years, and will cover both symbolic and statistical learning techniques applied to image data. The talk will discuss novel techniques for learning of relational object models for object recognition, inductive clustering for image segmentation, and multi-view learning for classification in medical images.

 

Local Splitting Criteria for Classification and Regression Trees.

by Ross Taplin
Murdoch University


This talk will report on recent results suggesting that classification and regression trees grown using a local splitting criterion often outperform the corresponding trees grown using a global splitting criterion. While a global splitting criterion optimises the placement of a split under the assumption that no further splits will be made, a local splitting criterion assumes that only local observations will eventually end up in the same leaf of the tree. It therefore places early splits in positions that will allow later splits to successfully model patterns in the data such as interactions. Both the ability of the local splitting criterion to produce single trees that capture some forms of structure in data better and the benefits to model averaging of multiple trees will be considered. This is joint work with Alexandra Bremner, Murdoch University.

 

Statistical Learning for Statisticians

by Matt Wand
University of New South Wales


As a prelude to the symposium, this short talk will look at some of the issues in Statistical Learning from a statistician's viewpoint. Some examples will be given, and connections with methodology familiar to statisticians (e.g. logistic regression) will be made.