Projects

Advice for Prospective MSc or PhD Students

I am looking to work with enjoyable, hard-working, passionate and autonomous students. Have a look at my publication record to see what my research looks like and whether you think you could fit in (please remember that I work in a School of Mathematics and Statistics). Feel free to contact me to discuss any research-related matter! More details, which you should probably read first, are available here.

Statistical Inference for Complex Random Vectors

New statistical methods and computing tools to analyse complex-valued random vectors

(Neuro) Imaging Genetics

Disentangling genetics from environmental factors using medical images

Big Data and Internet of Things: OpenData for everyone!

New statistical methods and computing tools to extract information from big data sets, with a specific focus on those coming from the Internet of Things

Data Science Book

Our new online book on Data Science

Dependence Measures

Finding links in the big data era

Current People

People currently working with me on one of the three projects

Below are the people I currently have the chance to work with on my three research projects: Dependence Measures, (Neuro)Imaging Genetics and Internet of Things.

Faculty

Benjamin Avanzi: Dependence Measures
Gery Geenens: Dependence Measures
Spiridon Penev: Dependence Measures, Internet of Things
Lionel Truquet: Dependence Measures
Benoit Gallix: (Neuro) Imaging Genetics
Benoit Liquet: (Neuro) Imaging Genetics
Pavlo Mozharovskyi: (Neuro) Imaging Genetics, Internet of Things
Myriam Vimond: (Neuro) Imaging Genetics
Wei Wen: (Neuro) Imaging Genetics
Fabien Navarro: Internet of Things

Postdocs

Chunhao Cai: Dependence Measures

PhD Students

Guillaume Boglioni-Beaulieu: Dependence Measures

Honours

Collaborators

Past and present

Former Students

Ph.D. students

  • 2011-2016 Joseph Francois Tagne Tatsinkou, Ph.D. thesis defended on 2016/04/21, Smooth Goodness-of-fit tests in Time Series Models (joint supervision with P. Duchesne), Université de Montréal. He now works as Analyst, Model Development, at Canada Mortgage and Housing Corporation.

  • 2010-2013 Jérémie Riou, Ph.D. thesis defended on 2013/12/11, Multiplicity of tests, and computation of sample size in clinical research (joint supervision with B. Liquet and S. Marque), Université de Bordeaux Segalen. Now Lecturer at the University of Angers, France.

  • 2008-2011 Bastien Marchina, Ph.D. thesis defended on 2012/12/12, Goodness-of-fit tests based on characteristic functions (joint supervision with Gilles Ducharme, MSER Grant), Université Montpellier II. Now a secondary school mathematics teacher.

M.Sc. students

  • 2015-2017 Guillaume Boglioni Beaulieu, Master at DMS, UdeM, Montréal.

  • 2013-2014 Iban Harlouchet, Master at DMS, UdeM, Optimisation d’algorithme d’analyse d’empreintes olfactives (In French), Montréal.

  • 2012-2013 Viet Anh Tran, Master at DMS, UdeM, Le package PoweR : un outil de recherche reproductible pour faciliter les calculs de puissance de certains tests d’hypothèses au moyen de simulations de Monte Carlo (In French), Montréal.

  • 2012-2013 Marc-Olivier Billette, Master at DMS, UdeM, Analyse en composantes indépendantes avec une matrice de mélange éparse (In French), Montréal.

  • 2010-2012 Philippe Delorme, Master at DMS, UdeM, Approximations to the determination of the sample size for testing multiple hypotheses when r among m hypotheses must be significant (In French), Montréal.

  • 2008-2008 Bastien Marchina, Master 2 ICA, On the effect of parameter estimation in limiting $\chi^2$ $U$- and $V$-statistics involving complex-valued components, defended on 2008/11/09 in Grenoble.

Undergraduate Students

See my resume.

Recent Publications

Comment about order of authors:

  • Statistics and Probability journals = strict alphabetical order of authors
  • Neuroscience or other journals = by importance of contribution, or following the rules in the field

Go to my Technical reports and to my Ph.D. thesis.


Technical reports

[1] Lafaye de Micheaux P. Test de normalité pour les résidus d'un modèle ARMA. Master's thesis (DEA), Université Montpellier II et École Nationale Supérieure Agronomique de Montpellier, 127 pages, June 1998.
[2] Ducharme G. and Lafaye de Micheaux P. Goodness-of-fit tests of normality for the innovations in ARMA models. Technical report number 02-02, Groupe de biostatistique et d'analyse des systèmes, Université Montpellier II, 34 pages, February 2002.
[3] Lafaye de Micheaux P. Méthodes statistiques multivariées en IRMF. Research Master's (Master 2) thesis, Institut National Polytechnique de Grenoble, 92 pages, June 2007. (In French).
[4] Coeurjolly J.-F., Drouilhet, R., Lafaye de Micheaux P. and Robineau, J.-F. (February 2009). asympTest: an R package for performing parametric statistical tests and confidence intervals based on the central limit theorem. Technical report hal-00358375. Laboratoire Jean Kuntzmann. Université de Grenoble, 18 pages.
[5] Bordier C., Dojat, M. and Lafaye de Micheaux P. (December 2010). Temporal and Spatial Independent Component Analysis for fMRI data sets embedded in a R package, arXiv 1012.0269v1, 23 pages.
[6] Tabelow K., Clayden J.D., Lafaye de Micheaux P., Polzehl J., Schmid V.J., Whitcher B. (December 2010). Image Analysis and Statistical Inference in Neuroimaging with R, Technical report 1578, Weierstrass Institute for Applied Analysis and Stochastics, 9 pages.

[1] Test de normalité pour les résidus d'un modèle ARMA, June 1998.

DEA thesis.

Download:
Postscript format, PDF format.

Abstract:
The results contained in this document cast a new light on the important problem of testing the residuals of an ARMA model. Indeed, the validation stage, when fitting a model to data, is the decisive step in selecting the best model. The Box and Jenkins three-stage method ends with the validation step, which requires the portmanteau test. An advantage of such portmanteau tests is that they pool information from the correlations at different lags. However, a real disadvantage is that they frequently fail to reject poorly fitting models. In practice, they are more useful in disqualifying unsatisfactory models than in selecting the best-fitting model among closely competing candidates.
We need to stress that if it can be assumed that the white noise process of an ARMA process is Gaussian, then stronger conclusions can be drawn from the fitted model. For example, not only is it possible to specify an estimated mean squared error for predicted values, but asymptotic prediction confidence bounds can also be computed. This being so, we propose to replace the portmanteau test with the one we develop here, which is targeted at testing normality.

Location:
Bibliothèque de l'Université Montpellier II.
Author : Lafaye de Micheaux, Pierre
Title : Test de normalité pour les résidus d'un modèle ARMA / Pierre Lafaye de Micheaux
Editor : Montpellier : Université Montpellier II Sciences et Techniques du Languedoc, [1998]
Collation : 127 f. ; 30 cm
Note : thèse Mém. D.E.A. : Biostatistique-Option Agron.-Santé : Montpellier 2 : 1998
Subject : Biométrie -- thèses
Link: Link to bibliographical entry

[2] Goodness-of-fit tests of normality for the innovations in ARMA models, February 2002.

Technical report number 02-02.
Groupe de Biostatistique et d'Analyse des systèmes.
Université Montpellier II

Download:
Postscript format, PDF format.

Abstract:
In this paper, we propose a goodness-of-fit test of normality for the innovations of an ARMA(p,q) model with known mean or trend. The test is based on the data-driven smooth test approach and is simple to perform. An extensive simulation study is conducted to see if, for moderate sample sizes, the test holds its level throughout the parameter space. The power of the procedure is also explored by simulation. It is found that our test is generally more powerful than existing tests while holding its level throughout most of the parameter space and thus, can be recommended. This meshes with theoretical results showing the superiority of the data-driven smooth test approach in related contexts.

[3] Méthodes statistiques multivariées en IRMF, June 2007. (In French).

Download:
Postscript format, PDF format.

Abstract:
Functional magnetic resonance imaging (fMRI) is a new neuroimaging technique that can localize neuronal activity with great spatial precision and good temporal precision. To detect activated areas in the brain, this method uses local blood oxygenation variations, which are reflected by small variations in a certain kind of image obtained by magnetic resonance. The ability to obtain a functional map of the brain non-invasively gives new opportunities to disentangle the mysteries of the human brain. In this thesis, we describe some nonparametric methods of multivariate analysis of fMRI data: Principal component analysis, Independent component analysis and Projection pursuit. We also try to explain the links between these methods and the different views one can have on them, and we deal with the underlying spatial and temporal aspects. We also provide a computer tool that can simulate, in a very simplified way, fMRI brain signals. This tool enables one to artificially generate fMRI data for which we control many parameters. It will serve as a basis to compare quantitatively the statistical methods presented. We also apply these various statistical methods to a real data set obtained from a human visual fMRI experiment. Finally, we propose various avenues of research that could be explored to pursue this preliminary work.

[4] asympTest: an R package for performing parametric statistical tests and confidence intervals based on the central limit theorem, February 2009.

Download:
Postscript format, PDF format.

Abstract:
This paper describes an R package implementing large sample tests and confidence intervals (based on the central limit theorem) for various parameters. The one and two sample mean and variance contexts are considered. The statistics for all the tests are expressed in the same form, which facilitates their presentation. In the variance parameter cases, the asymptotic robustness of the classical tests depends on the departure of the data distribution from normality measured in terms of the kurtosis of the distribution.
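
The role of kurtosis in the variance case can be illustrated with a one-sample test. The sketch below is written in Python and is not the asympTest code (asympTest is an R package); the function name `asymptotic_variance_test` is hypothetical. It assumes a finite fourth moment and relies on the standard central limit result that $\sqrt{n}(S^2-\sigma^2)$ is asymptotically $N(0,\mu_4-\sigma^4)$, where $\mu_4$ is the fourth central moment.

```python
import math
import statistics


def asymptotic_variance_test(sample, sigma0_sq):
    """Large-sample two-sided test of H0: Var(X) = sigma0_sq.

    Illustrative sketch only: the standard error uses the empirical
    fourth central moment, so no normality assumption is needed.
    Returns the z statistic and its two-sided p-value.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    centred = [x - mean for x in sample]
    var_hat = sum(c * c for c in centred) / n   # second central moment
    m4_hat = sum(c ** 4 for c in centred) / n   # fourth central moment
    spread = m4_hat - var_hat ** 2              # estimates mu_4 - sigma^4
    if spread <= 0.0:
        raise ValueError("degenerate sample: cannot estimate the standard error")
    z = (var_hat - sigma0_sq) / math.sqrt(spread / n)
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value
```

For Gaussian data $\mu_4-\sigma^4=2\sigma^4$, which is the variance implied by the classical chi-square test; for other kurtosis values the two differ, which is why the classical test is not asymptotically robust.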

[5] Temporal and Spatial Independent Component Analysis for fMRI data sets embedded in a R package, December 2010.

Download:
Postscript format, PDF format.

Abstract:
For statistical analysis of functional Magnetic Resonance Imaging (fMRI) data sets, we propose a data-driven approach based on Independent Component Analysis (ICA) implemented in a new version of the AnalyzeFMRI R package. For fMRI data sets, spatial dimension being much greater than temporal dimension, spatial ICA is the tractable approach generally proposed. However, for some neuroscientific applications, temporal independence of source signals can be assumed and temporal ICA then becomes an attractive exploratory technique. In this work, we use a classical linear algebra result ensuring the tractability of temporal ICA. We report several experiments on synthetic data and real MRI data sets that demonstrate the potential interest of our R package.

[6] Image Analysis and Statistical Inference in Neuroimaging with R, December 2010.

Download:
Postscript format, PDF format.

Abstract:
R is a language and environment for statistical computing and graphics. It can be considered an alternative implementation of the S language developed in the 1970s and 1980s for data analysis and graphics (Becker and Chambers, 1984; Becker et al., 1988). The R language is part of the GNU project and offers versions that compile and run on almost every major operating system currently available. We highlight several R packages built specifically for the analysis of neuroimaging data in the context of functional MRI, diffusion tensor imaging, and dynamic contrast-enhanced MRI. We review their methodology and give an overview of their capabilities for neuroimaging. In addition we summarize some of the current activities in the area of neuroimaging software development in R.

Ph.D. thesis

PH.D. THESIS (82 pages) Defence slides 193k

Author: Lafaye de Micheaux Pierre.
Title: Tests d’indépendance en analyse multivariée et tests de normalité dans les modèles ARMA.
Place: Ph.D. thesis carried out under joint supervision (cotutelle). Université Montpellier II and Université de Montréal.
Defence date: 16 December 2002 at Université de Montréal.
Abstract:
We construct a goodness-of-fit test of normality for the innovations of an ARMA(p,q) model with known trend and mean, based on the data-driven smooth test approach and simple to apply. An extensive simulation study is conducted to investigate this test for moderate sample sizes. Our approach is generally more powerful than existing tests. The level is held over most of the parameter space. This agrees with theoretical results showing the superiority of the data-driven smooth test approach in similar contexts. A semi-parametric test of independence (or serial independence) between normally distributed subvectors is proposed, without assuming joint normality of these marginals. The test statistic is a Cramér-von Mises type functional of a process defined from the empirical characteristic function. This process is defined similarly to that of Ghoudi et al. (2001), which is built from the empirical distribution function and used to test independence between univariate marginals. The test statistic can be represented as a V-statistic. The test is consistent against any form of dependence. Weak convergence of the process is established. The asymptotic distribution of the Cramér-von Mises functionals is approximated by the Cornish-Fisher method, using a recurrence formula for the cumulants, and by numerical computation of the eigenvalues in the inversion formula. The test statistic is compared with Wilks' statistic for the parametric hypothesis of independence in the one-way MANOVA model with random effects.

Download:
[PDF A4] 574k, [PDF letter] 574k, [PS A4] 914k, [PS letter] 914k.

[Screen PDF] 2.5M (with hyperlinks and Fortran and C++ programs).
This PDF version is particularly useful. It contains all the Fortran and C++ simulation programs (almost 15,000 lines). The first article also includes a Javascript program that runs the test “live” (see page 49, Javascript Application). It also contains hyperlinks that make reading easier. Best viewed in full-screen mode.

[MathML] 4.9M (Requires Mozilla or Netscape 7.0. Loading takes a while, please be patient …).
[MathML.gz] 318k (The MathML files contain the Fortran and C++ programs).
MathML is a revolutionary new technology that makes it possible to search for mathematical expressions in the text (and to copy and paste them into other applications such as Mathematica). Formulas are no longer stored as images.

Software

A.dep.tests dependogram
  • AnalyzeFMRI (initiated by Marchini, J.). Functions for reading and writing fMRI data in various formats and for processing brain images with ICA methods, plus visualization. cran.r-project.org/package=AnalyzeFMRI

More Details


Some Associated Publications or Talks

Books

  • The website for my books on the R software (with B. Liquet and R. Drouilhet). Versions in French, English, Chinese and Indonesian.

  • The website for the draft of my current book on Data Science.

  • I am also in the process of writing a book entitled Statistical Inference for Complex Random Vectors (with G. Ducharme).

Recent & Upcoming Talks

9th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics (CMStatistics 2016)

Journées de STAtistique de Rennes (jSTAR), 13th edition on Big Data

Australian Statistical Conference in conjunction with the Institute of Mathematical Statistics Annual Meeting

Grants

  • 2014-2019 NSERC Discovery ($70,000)
    • Lafaye de Micheaux P, “Multivariate Methods for the Treatment of High Dimensional Complex Neuroimaging Genetics Data”
  • 2013 Mitacs Accelerate Research Internships Program ($30,000)
    • Lafaye de Micheaux P, Laliberté G, Harlouchet I, “Optimization of Olfactory Fingerprint Analysis Algorithm”
  • 2013 NSERC Research Tools and Instruments ($28,143)
    • Lafaye de Micheaux P (Lead CI), Granville A, Polterovich I, Patera J, Owens R, Lessard S, Murua A, Perron F, “Computational Resources for Research in Mathematics and Statistics”
  • 2010-2015 NSERC Discovery ($60,000)
    • Lafaye de Micheaux P, “Goodness-of-fit Testing and Independent Component Analysis with Applications to Cognitive Neuroscience”
  • Grenoble Institute of Technology (Bonus Qualité Recherche) grant (€122,880)
    • Achard S, Coeurjolly J-F, Lafaye de Micheaux P, Rivet B, Sato M, “MoDyC project (Modelisation of Dynamical Brain Activity)”.

Research Opportunities

RESEARCH OPPORTUNITIES FOR POSTGRADUATE STUDENTS

If you are interested in doing research with me and/or other members of our research projects and if you have the necessary qualifications and background (see below), feel free to contact me via email attaching the following documents in PDF format (I will disregard documents in any other format):

  1. Your CV;
  2. A copy of your academic transcripts;
  3. A brief statement of what area / problems you would like to work on, and what attracts you to this group.

You must have a strong background in statistics/mathematics and good programming skills (preferably in R and C/C++, with experience in Linux).

I will support your application as long as:

  • (a) you are indeed an outstanding student
  • (b) your project proposal is aligned with my research interests.

If you are interested, please first:

Then read:

SCHOLARSHIP LINKS

Scholarships are available for talented and enthusiastic students looking to study for a Ph.D. or a Master’s degree. Nevertheless, they are highly competitive: to be in the running, you need to demonstrate that you are among the top students in your cohort. If you wish to be considered for a scholarship, you will need to apply separately. (Unless advertised, our group does not offer scholarships.)

POSTDOC AT UNSW

If you are looking for a postdoc position, apart from directly advertised positions, some general options include:

HONOURS AT UNSW

If you are an undergraduate student interested in doing Honours in Statistics, we are looking for students with a strong background in statistics and good computational skills. If that’s you, then get in touch.

My Teaching

Current

I am a teaching instructor for the following courses at UNSW Sydney:

2017 MATH1041: Statistics for Life and Social Science (61h)
MATH5806: Applied Regression Analysis

Recent years

2017 R/Shiny : Introduction to RShiny, at French National School of Statistics and Analysis of Information (ENSAI) (9h)
2016 MBDINF14 : Programming with Big Data in R using Distributed Memory, at ENSAI (45h)
2015 MBDSTA02 : Statistical Inference and Hypothesis testing, at ENSAI (18h)
STT2700 : Mathematical Statistics and Data Analysis, at Université de Montréal (39h)
STT6300 : Large Sample Techniques, at Université de Montréal (37h)
2013 STT2400 : Linear regression, at Université de Montréal (36h)
STT6415 : Regression Analysis, at Université de Montréal (42h)
2012 STT1700 : Introductory Statistics, at Université de Montréal (36h)

Industry

Consulting work for the private sector

Industrial Experience

  • 2016 Statistical consultant for BNP Paribas, which is one of the largest banks in the world, with a presence in 75 countries. I applied Deep Learning algorithms.

  • 2014 University supervisor for a student working (under a Mitacs internship) at Odotech, which is an environmental company specialized in real-time monitoring of gas pollutants. Optimisation of an algorithm that analyses olfactory fingerprints.

  • 2011 Statistical consultant for Olea Medical. Identification of predictors for stroke.

  • 2011 Statistical consultant for Danone Research France. Clinical trials.

  • 2008 Co-founder member of the statistical consulting service of the SAGAG team. Operations have now ceased.

  • 2008 Shareholder and co-founder member (with J.-F. Robineau and others) of the start-up CQLS whose aim is to analyze biotechnology data. Operations have now ceased.

  • 2007 Statistical consultant for the company Minvasys, which designs drug-eluting stents. Computation of the sample size required for clinical trials.

  • 2006–2007 Statistical consultant for the company BioArtificial Gel Technologies. This private Canadian dermo-pharmaceutical company, based in Montreal, developed and marketed systems for the progressive release of active agents built on a hydrogel technology platform. Operations have now ceased.

Recent Posts

The conjecture is the following. If you can find a proof of it, please contact me!

Let $X_1, X_2, \ldots, X_n$ be i.i.d. continuous random variables with $E[X_i]=0$, $\mathrm{Var}[X_i]=1$ and $E[\log(|X_1|)] < \infty$. Then

$$n^{-1}\sum_{i=1}^n\log|X_i-\bar{X}_n|1_{\{X_i\ne\bar{X}_n;i=1,\ldots,n\}}\stackrel{L}{\longrightarrow}E[\log|X_1|].$$
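
A quick Monte Carlo check can make the statement concrete. The Python sketch below is only an illustration under stated assumptions: it uses standard normal data (which satisfy $E[X_i]=0$ and $\mathrm{Var}[X_i]=1$), for which $E[\log|X_1|]=-(\gamma+\log 2)/2$ with $\gamma$ the Euler-Mascheroni constant, and it reads the indicator term by term (for continuous data this choice matters only on an event of probability zero). The name `mean_log_abs_deviation` is hypothetical. A simulation of this kind is no substitute for a proof.

```python
import math
import random

# Closed form for a standard normal Z: E[log|Z|] = -(gamma + log 2) / 2.
EULER_GAMMA = 0.5772156649015329
NORMAL_TARGET = -(EULER_GAMMA + math.log(2.0)) / 2.0


def mean_log_abs_deviation(sample):
    """Return n^{-1} * sum_i log|X_i - mean|, skipping zero deviations.

    Terms with X_i equal to the sample mean contribute 0, which is one
    reading of the indicator in the conjecture (an assumption).
    """
    n = len(sample)
    xbar = sum(sample) / n
    total = 0.0
    for x in sample:
        deviation = abs(x - xbar)
        if deviation > 0.0:
            total += math.log(deviation)
    return total / n


if __name__ == "__main__":
    random.seed(2017)  # arbitrary seed, only for reproducibility
    for n in (100, 10_000, 100_000):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        # Print the sample size, the statistic, and the Gaussian target.
        print(n, mean_log_abs_deviation(sample), NORMAL_TARGET)
```

Running the script prints the statistic next to the Gaussian target for increasing sample sizes, so the two columns can be compared by eye.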

Here are some nice references on Deep Learning:

And the most used libraries for Deep Learning:

Cheat sheets:

There are two great R packages:

  1. bookdown to write books using Markdown;
  2. blogdown to create blogs/websites using Hugo/Markdown.

Content can be written using Markdown, LaTeX math, and Hugo Shortcodes. Additionally, HTML may be used for advanced formatting. This article gives an overview of the most common formatting options.

Media

Our research in the spotlight

Nothing yet!

Contact

Current Position: Senior Lecturer

Affiliation: School of Mathematics and Statistics, UNSW Sydney

Meet in person: Office 2050, The Red Centre, Centre Wing, Kensington