Tests of independence in multivariate analysis and tests of normality in ARMA models

by
Pierre LAFAYE DE MICHEAUX
Doctoral thesis carried out under joint supervision (cotutelle)
at the

Département de mathématiques et de statistique
Faculté des arts et des sciences

Université de Montréal
AND

Département des sciences mathématiques
Formation Doctorale Biostatistique
École Doctorale Information, Structures, Systèmes

Université Montpellier II
Sciences et Techniques du Languedoc

Thesis presented to the Faculté des études supérieures of the Université de Montréal
for the degree of Philosophiæ Doctor (Ph.D.) in Statistics
and to
the Université Montpellier II for the degree of Docteur
d’Université in applied mathematics and applications of mathematics

December 2002
© Pierre LAFAYE DE MICHEAUX, 2002
www.theses.umontreal.ca
Université de Montréal
Faculté des études supérieures
and
Université Montpellier II
Laboratoire de Probabilité et Statistiques
This doctoral thesis, carried out under joint supervision (cotutelle) and entitled

Tests of independence in multivariate analysis and tests of normality in ARMA models

was presented and publicly defended at the Université de Montréal by
Pierre LAFAYE DE MICHEAUX
It was evaluated by a jury composed of the following persons:
Thesis accepted on:

December 16, 2002



The art of knowledge and the practice of the virtues,
eternal nobility of the researcher.

To my wife Dominique Delseny and our son Luka.
To my best friend Fabien Baudrier.
To my parents.


Acknowledgements

The first people I wish to honour here are my two research supervisors, Martin Bilodeau and Gilles Ducharme.

I regard Gilles Ducharme as my spiritual father in statistics; he taught me most of what I know of it. From the time of his arrival in Montpellier, which is when I began my studies in this field, I came to appreciate his innovative approach to statistics through teaching that was far more intuitive than heavily formal. He then introduced me to research during my DEA year. Having had a taste of his great experience, of the originality of his research topics and of the soundness of his judgment in mastering the technical difficulties leading to their resolution, I gladly plunged back in for a few more years! He taught me a fascinating profession. I am also grateful to him for his moral support during certain difficult periods, for the financial and material contributions he provided, and for the opportunity he gave me to work with a researcher of great worth in Montréal.

Martin Bilodeau took over with great attentiveness and professionalism. I appreciated his constant reliability, his precious availability and his discretion. I am likewise grateful to him for the scientific knowledge he passed on to me and for the financial resources from which I benefited. In contact with his great intellectual probity, I was able to appreciate and absorb his concern for efficiency and concision. I wish to express here all my esteem for him.

I wish to express my gratitude to the members of the jury, who agreed to take the time to read my work, for their judicious advice and pertinent remarks, and for travelling to attend my defence, that rite of acceptance into a new community.

Thanks to the researchers with whom I was able to have profitable scientific discussions, both in Montpellier (Benoît Cadre, Michel Cuer, Pierre Jacob, Irène Larramendy-Valverde) and in Montréal (Jean-François Angers, Anne Bourlioux, Richard Duncan, Marlène Frigon, Martin Goldstein, Michel Grundland, Anatole Joffe, Christian Léger, Jean-Marc Lina, Urs Maag, Éric Marchand, François Perron, Roch Roy), without forgetting those more anonymous people with whom I exchanged ideas on the Internet.

It is also clear that the research carried out during a doctorate must rest on solid technical foundations. The main architects of the excellent training in statistics that I received are Martin Bilodeau, Yves Lepage and François Perron in Montréal, and Denis Allard, Alain Berlinet, Gilles Caraux, Jean-Pierre Daures, Gilles Ducharme, Jean-François Durand, Ali Gannoun, Pierre Jacob, Jean-Dominique Lebreton, Pascal Monestiez, Robert Sabatier, Gilberte Vignau and Jean-Pierre Vila in Montpellier.

During these “four” years, three of which were devoted to research, I benefited from precious computing support from Christopher Albert, Nicolas Beauchemin, Miguel Chagnon, Baptiste Chapuisat, Brigitte Charnomordic, Marc Fangeaud, Michel Lamoureux, Nathalie Malo, Pascal Neveu and Philippe Vismara, as well as from Leslie Lamport, Linus Torvalds and the thousands of volunteers working on the design and development of LaTeX, Linux and other excellent free software.

I also wish to acknowledge the high-quality administrative support provided by Jacques Bélair, Robert Cléroux, Véronique Hussin, Sabin Lessard, Thérèse Ouellet, Danièle Proulx, Jacqueline Reggiori-Lamoureux, Yvan Saint-Aubin, Danièle Tessier and Janet Zaki at the Université de Montréal, and by Pierrette Arnaud, Michel Averous, Yves Escoufier, Daniel Guin, Bernadette Lacan, Marie-Odile Morda, Florence Picone and Véronique Sals-Vettorel in Montpellier.

All this research would certainly have been longer, more painful and of lesser quality had I not benefited from several grants generously awarded by the Département de mathématiques et de statistique and the Faculté des études supérieures of the Université de Montréal, by the Université de Montréal itself, by the Institut des Sciences Mathématiques de Montréal, and by my two supervisors Martin Bilodeau and Gilles Ducharme. My thanks to the Governments of Canada and Québec for giving me my chance.

All my gratitude to the people I met at the Université Montpellier II, many of whom offered me their friendship: Gérard Biau for his precious advice and his friendship since the DEA year, Sandie Ferrigno for her support with the DEA website, Benoît Frichot for interesting discussions and moral support, Olivier Gimenez (there would be too much to say here), Mariem Mint-el-Mouvid for the good times spent in our office 14 despite our financial difficulties, and Nicolas Molinari for his precious advice and the good times spent in Sauve. A thought for Omar Anza, Élodie Brunel, André Diatta, Bénédicte Fontez, Laurent Gardes and Hassan Mouhanad. All these people contributed, each in their own way, to enlivening the creative space of Building 9. Thanks also to Alain Delcamp, Ali Gannoun and Jérôme Saracco for the good times on the third floor, and to Stéphane Girard and Cécile Amblard for their very warm welcome in Montréal.

Finally, thanks to all my new friends met in Montréal: Yves Atchadé (Benin), Chafik Bouhaddioui (Morocco), Alain Desgagné (Québec), Alexandre Leblanc (Québec), Ghislain Rocheleau (Québec) and Ndeye Rokhaya Gueye (Senegal), who did part of their doctorates at the same time as I did; and also Christopher Albert (USA) and Carole (France), Marie-Soleil Beaudoin (Québec), Pascal Bouchard (Québec), Jean-François Boudreau (Québec), Pascal Croteau (Québec), Alexandre Cusson (Québec), Alina Dyachenko (Russia), Alexis Gerbeau (Québec), Mohammed Haddou (Algeria), Hassiba and Djamal Hellal (Algeria), Abdelaziz Khatouri (Morocco), Vincent Lemaire (France), Nathalie Malo (Québec), Hacène Nedjar (Algeria), Philippe Paradis (Québec), Fritz Pierre (Haiti), Alice Savu (Romania), and Jib and Sarah. You eased my integration. I hope to keep strong ties with most of you.


Table of Contents

  Acknowledgements
  Table of Contents
  Abbreviations and symbols
Chapter 1.  Introduction
Bibliography
Chapter 2.  Goodness-of-fit tests of normality for the innovations in ARMA models
 1.  Introduction
 2.  Smooth test of normality in the ARMA context
 3.  Choosing the order K of the alternative
 4.  Simulation Results
   4.1.  Levels
   4.2.  Power
 5.  An example
Appendix A
Appendix B
Appendix C
Appendix D
Bibliography
Chapter 3.  A multivariate empirical characteristic function test of independence with normal marginals
 1.  Introduction
 2.  Testing independence: the non-serial situation
   2.1.  The case of known parameters
   2.2.  The case of unknown parameters
   2.3.  Relation to V-statistics
   2.4.  Consistency
 3.  Testing independence: the serial situation
   3.1.  The case of known parameters
   3.2.  The case of unknown parameters
 4.  Properties of the limiting processes
 5.  One-way MANOVA model with random effects
 6.  Proofs
   6.1.  Proof of Theorem
   6.2.  Proof of Theorem
   6.3.  Proof of Theorem
   6.4.  Proof of Theorem
   6.5.  Proof of Theorem
   6.6.  Proof of Theorem
   6.7.  Proof of Theorem
Acknowledgements
Bibliography
Chapter 4.  Conclusion
Bibliography
Appendix A.  The Fortran 77 programs of the first article
 A.1.  The compile script
 A.2.  List of the various programs
 A.3.  Empirical means and variances of the various distributions
 A.4.  The MAIN program
 A.5.  The big_prog_ARMApq programs
 A.6.  The creerdat_ARMApq programs
 A.7.  The calcstat.f program
 A.8.  The simulARMApq.f programs
 A.9.  The programs for the “random shock method” of Burn
 A.10.  The programs for simulating the various distributions
 A.11.  The programs for the various Legendre polynomials
 A.12.  Other small utility programs
 A.13.  The program computing the quantiles of the Brockwell and Davis test
Appendix B.  The C++ programs of the second article
 B.1.  Theoretical quantiles
 B.2.  Empirical quantiles
 B.3.  Power compared with that of Wilks’ test
  Curriculum Vitae
  Special documents

List of Tables

List of Figures


Abbreviations and symbols

$\xrightarrow{\mathcal{L}}$ : Convergence in distribution.
$\xrightarrow{fd}$ : Convergence of the finite-dimensional distributions.
$\Rightarrow$ : Weak convergence.
ARMA : Auto-Regressive Moving Average.
i.i.d. : Independent and identically distributed.
$\equiv$ : Functional equality sign.
$\langle\cdot,\cdot\rangle$ : Inner product.
$\Omega$ : Event space.
$P$ : Probability measure on the event space.
$\pi_{t_1,\ldots,t_k}$ : Projection.
$C \equiv C(\mathbb{R}^{p},\mathbb{C})$ : Space of continuous functions from $\mathbb{R}^{p}$ to $\mathbb{C}$.
$\rho$ : Metric on $C(\mathbb{R}^{p},\mathbb{C})$.
m.l.e. : Maximum likelihood estimator.
$^{T}$ : Transpose of a vector.
MANOVA : Multivariate analysis of variance.
EDF : Empirical distribution function.
CF : Cubature formula.

Chapter 1
Introduction

A procedure for determining whether a particular
probabilistic model is appropriate for
a given random phenomenon...
Since the very beginnings of statistics, many statisticians have started their analyses by proposing a distribution for their observations and have then attempted to verify whether that distribution was the right one. Thus, over the years, a large number of such procedures have appeared, gradually forming a vast field of study known as “goodness-of-fit tests”. It is worth noting that, without the tools provided by goodness-of-fit tests, one would have to rely on subjective criteria to validate the quality of a model. Unfortunately, as Fisher (1925)(5) put it so well,
No eye observation of such diagrams, however experienced, is really capable of
discriminating whether or not the observations differ from the expectation
by more than we would expect from the circumstances of random sampling.
The contribution of goodness-of-fit tests is threefold. First, they yield a compact description of the data by assigning them a probability distribution. Second, certain powerful parametric techniques are valid only under the normality assumption. Third, they lead to a better understanding of the mechanism that generated the data, by providing information on the reasons that led to a rejection of the working hypothesis.

Mathematically, the problem is stated as follows. Let $Y$ be a random element whose distribution function $F$ is absolutely continuous with respect to Lebesgue measure. One wishes to test the hypothesis

$$H_0 : F(x) \in \{F_0(x,\theta),\ \theta \in \Theta\} \quad \text{versus} \quad H_1 : F(x) \notin \{F_0(x,\theta),\ \theta \in \Theta\},$$

where $\Theta$ is some parameter space. For this problem, two broad classes of tests can be distinguished. “Omnibus” tests address situations where one has no a priori indication of how the true distribution $F$ might depart from the null hypothesis. They are effective against unspecified alternatives and are generally based on the empirical distribution function or on the empirical characteristic function; examples include the Kolmogorov, Anderson-Darling and Cramér-von Mises tests. “Directional” tests, on the other hand, make it possible to take into account information about the most plausible departures from the null hypothesis. They are constructed so as to detect with greater power certain directions that the distribution of $Y$ might take. In the first part of the research undertaken here, we focused on the second class by adapting the theory of smooth tests, introduced by Neyman (1937)(12), to the particular context of dependent data arising from a distribution that is not fully specified.

Indeed, another important and original branch of statistics concerns the analysis of time series. Its essential characteristic is the dependence of the phenomena under study on time, a concept that is essential both scientifically and philosophically. There are in fact few disciplines that are not confronted with the study of variables evolving over time which one wishes to describe, explain, control or predict. The discipline has its roots in the Middle Ages, as witnessed by a time diagram (representing the inclination of planetary orbits as a function of time) considered one of the oldest in the Western world.

FIG. 1.1: Tenth-century time line, Funkhauser (1936), pp. 260-262(6)

An important advance in the study of time series was to assume that the observed series is generated by a stochastic process $\{Y_t,\ t \in \mathbb{Z}\}$. A condition often imposed on this generating process is that it be second-order stationary. A process of interest satisfying these conditions is the autoregressive moving average (ARMA) process, widely used because of its simplicity. Its introduction requires a few preliminary definitions.

Definition 1. Weak stationarity
A process $\{Y_t;\ t \in \mathbb{Z}\}$ is said to be second-order stationary, or weakly stationary, or stationary of order two, if the following three conditions are satisfied:
(i) $E(Y_t^2) < \infty$, for all $t \in \mathbb{Z}$;
(ii) $E(Y_t) = m$, independent of $t$;
(iii) $\mathrm{Cov}(Y_t, Y_{t+h}) = \gamma(h)$, independent of $t$, for all $h \in \mathbb{Z}$.

We also need the notion of white noise.

Definition 2. White noise
A process $\{\varepsilon_t;\ t \in \mathbb{Z}\}$ is a white noise if it satisfies the following two conditions for all $t \in \mathbb{Z}$:
(i) $E(\varepsilon_t) = 0$;
(ii) $E(\varepsilon_t \varepsilon_s) = \sigma^2$ if $s = t$, and $0$ otherwise.

Building on these two definitions, the ARMA model can be introduced.

Definition 3. ARMA(p,q) model
An autoregressive moving average process of order $(p,q)$ is a stationary process $\{Y_t;\ t \in \mathbb{Z}\}$ satisfying a relation of the form

$$Y_t - \sum_{i=1}^{p} \phi_i Y_{t-i} = \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t, \quad t \in \mathbb{Z},$$

where the $\phi_i$ and $\theta_i$ are real numbers and the error process $(\varepsilon_t;\ t \in \mathbb{Z})$ is a white noise with variance $\sigma^2$.
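As a concrete illustration of Definition 3, the recursion can be simulated directly with Gaussian white noise and zero start-up values. This is a minimal sketch in C++ (one of the two languages of the thesis' programs); all function and variable names here are ours, not those of the thesis code.

```cpp
// Sketch: simulate an ARMA(p, q) path from Definition 3,
// Y_t = sum_i phi_i Y_{t-i} + eps_t + sum_j theta_j eps_{t-j},
// with Gaussian white noise of variance sigma^2 and zero start-up values.
#include <random>
#include <vector>

std::vector<double> simulate_arma(std::size_t T,
                                  const std::vector<double>& phi,    // phi_1..phi_p
                                  const std::vector<double>& theta,  // theta_1..theta_q
                                  double sigma, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> y(T, 0.0), eps(T, 0.0);
    for (std::size_t t = 0; t < T; ++t) {
        eps[t] = noise(gen);
        double v = eps[t];
        for (std::size_t i = 0; i < phi.size(); ++i)
            if (t >= i + 1) v += phi[i] * y[t - i - 1];      // AR part
        for (std::size_t j = 0; j < theta.size(); ++j)
            if (t >= j + 1) v += theta[j] * eps[t - j - 1];  // MA part
        y[t] = v;
    }
    return y;
}
```

In practice one would discard an initial burn-in segment so that the arbitrary start-up values do not influence the sample.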

When modelling a time series, it is common to follow the procedure described by Box and Jenkins (1976)(3), which proceeds in five steps: making the series stationary, deseasonalizing, identification, validation and testing, and forecasting.

In the first part of this study, attention focused on the “validation and testing” phase, and more particularly on the construction of a smooth test of normality for the errors of a fully identified univariate ARMA(p,q) model with known mean.

To this end, Rao's score statistic is a classical tool for building tests of hypotheses of the form $H_0 : \eta = \eta_0$. The resulting test rests on the general principle that the gradient vector of the unrestricted (by $H_0$) model, evaluated at the restricted estimator, asymptotically follows a normal distribution with mean 0 when $H_0$ is true. If the alternative hypothesis is described by some exponential family of dimension $K$, the resulting score test is also called a smooth test, or Neyman's (1937)(12) smooth test of order $K$. The idea of the smooth test is to embed the null density in a more general parametric family, chosen so as to detect the most plausible alternatives should the null hypothesis be false. In this way, the null hypothesis becomes a parametric one, in which one seeks to test the nullity of parameters of the alternative density. This approach provides not only a simple test statistic but also good power against a wide family of alternatives. Consider a sample $\varepsilon_1,\ldots,\varepsilon_n$ of continuous i.i.d. random variables with distribution function $F$, for which one wishes to test the hypothesis $H_0 : F = F_0$. This sample can be transformed into a sample $U_1 = 2F_0(\varepsilon_1) - 1, \ldots, U_n = 2F_0(\varepsilon_n) - 1$ of i.i.d. random variables with density $g$ which, under $H_0$, are uniformly distributed on the interval $[-1,1]$; indeed, for $u \in [-1,1]$, $P(U_i \le u) = P(F_0(\varepsilon_i) \le (1+u)/2) = (1+u)/2$. Using this property, Neyman (1937)(12) proposes to consider the hypothesis

$$H_0 : g(y) = \begin{cases} 1/2 & \text{if } y \in [-1,1], \\ 0 & \text{otherwise} \end{cases}$$

(1)
and to choose, as the embedding family of alternative densities for this uniform law, densities of the form $c \cdot \exp[P(y)]$, where $P(y)$ is a polynomial and $c$ is the normalizing constant. He calls such densities “smooth” alternatives because their graphs are smooth curves crossing the density under $H_0$ a small number of times. Being able to restrict oneself, on physical grounds or by appeal to intuition, to this type of alternative yields a more sensitive test. He then suggests choosing the polynomial $P(y)$ as a linear combination of the elements of one of the systems of polynomials $\pi_0(y), \pi_1(y), \ldots$ orthonormal on the interval $[-1,1]$. Recall that, in such a system, the polynomial $\pi_i(y)$ is of degree $i$ and that, for all $i$ and $j = 0, 1, 2, \ldots$,
$$\frac{1}{2}\int_{-1}^{1} \pi_i(y)\,\pi_j(y)\,dy = \delta_{ij},$$

where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.

These systems of orthonormal polynomials can be constructed in several ways (see Marsden (1974), p. 347(11)). Neyman (1937)(12), for his part, uses the system of Legendre polynomials. The embedding family is then written

$$g_K(y,\eta) = \begin{cases} C(\eta)\,\exp\!\Big[\sum_{i=1}^{K} \eta_i \pi_i(y)\Big] & \text{if } y \in [-1,1], \\ 0 & \text{otherwise,} \end{cases}$$

(2)
where $\eta = (\eta_1,\ldots,\eta_K)^T \in \mathbb{R}^K$, $C(\eta)$ is the normalizing constant, $K$ is called the order of the alternative, and the $\pi_i(\cdot)$, $i = 1,\ldots,K$, are the first $K$ Legendre polynomials. We denote by $g_0(\cdot)$ the density under the null hypothesis, that is, the uniform distribution on the interval $[-1,1]$. Thus, testing the null hypothesis (1) against the hypothesis that the density belongs to the embedding family, $H_1 : g(\cdot) \in \{g_K(\cdot,\eta) : \eta \in \mathbb{R}^K\}$, given in (2), is equivalent to testing
$$H_0 : \eta = 0 \quad \text{versus} \quad H_1 : \eta \neq 0.$$

(3)
It then suffices to construct the score test for hypothesis (3). The score vector is defined by
$$a_n(\eta) = \left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log g_K(Y_i,\eta)}{\partial \eta_1}, \ \ldots, \ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log g_K(Y_i,\eta)}{\partial \eta_K}\right)^T.$$

Rao's (1947)(13) score statistic is then

$$R_K = a_n(\eta_0)^T\, I_{\eta_0}^{-1}\, a_n(\eta_0),$$

where

$$I_{\eta} = E_{\eta}\left[\frac{\partial \log g_K(Y,\eta)}{\partial \eta_i} \cdot \frac{\partial \log g_K(Y,\eta)}{\partial \eta_j}\right]_{K \times K}$$

is the Fisher information matrix for $\eta$. One can then show that, under $H_0$, $R_K \xrightarrow{\mathcal{L}} \chi^2_K$, the chi-square distribution with $K$ degrees of freedom.
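To fix ideas, here is a minimal sketch of the order-$K$ smooth statistic for the fully specified uniform null. For this null, the Fisher information at $\eta = 0$ is the identity (the $\pi_i$ are orthonormal with zero mean under $g_0$), so $R_K$ reduces to a sum of squared standardized components. The helper names and the toy sample are ours, not from the thesis programs.

```cpp
// Sketch: Neyman's smooth test of H0 : g = U[-1,1] for a fully specified
// null, using the first K Legendre polynomials normalized so that
// (1/2) int_{-1}^{1} pi_k^2 = 1. Under H0, R_K is asymptotically chi^2_K.
#include <cmath>
#include <cstdio>
#include <vector>

// Orthonormal Legendre polynomial pi_k(u) = sqrt(2k+1) P_k(u), where P_k
// obeys the recurrence (j+1) P_{j+1} = (2j+1) u P_j - j P_{j-1}.
double pi_k(int k, double u) {
    if (k == 0) return 1.0;
    double pjm1 = 1.0, pj = u;  // P_0, P_1
    for (int j = 1; j < k; ++j) {
        double pjp1 = ((2.0 * j + 1.0) * u * pj - j * pjm1) / (j + 1.0);
        pjm1 = pj;
        pj = pjp1;
    }
    return std::sqrt(2.0 * k + 1.0) * pj;
}

// R_K = sum_{k=1}^K [ n^{-1/2} sum_i pi_k(U_i) ]^2; the information matrix
// at eta = 0 is the identity, so no matrix inversion is needed.
double smooth_stat(const std::vector<double>& u, int K) {
    double rk = 0.0;
    for (int k = 1; k <= K; ++k) {
        double s = 0.0;
        for (double ui : u) s += pi_k(k, ui);
        rk += s * s / u.size();
    }
    return rk;
}

int main() {
    // The U_i = 2 F_0(eps_i) - 1 would come from the data; a toy sample here.
    std::vector<double> u = {-0.8, -0.3, 0.1, 0.4, 0.9, -0.5, 0.2, 0.7};
    std::printf("R_3 = %g\n", smooth_stat(u, 3));  // compare to a chi^2_3 quantile
}
```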

This presentation of Neyman's (1937)(12) test rests on the assumption that $F_0$ is fully specified. A generalization to the case where one or more parameters are unknown was given by Kopecky and Pierce (1979)(10) and Rayner and Best (1988)(14).

Remark 1. One theoretical justification for the choice of the above embedding family rests on Hilbert space theory (see Royden (1968), chap. 10, sec. 8(15)). Indeed, one can show that any function $h$ in the Hilbert space of functions square-integrable with respect to $g_0(y)$ can be written

$$h(y) = \sum_{i=0}^{\infty} \eta_i \pi_i(y).$$

Consequently, the true distribution of $Y$ can be written

$$f(y) = \begin{cases} c \cdot \exp\!\Big[\sum_{i=0}^{\infty} \eta_i \pi_i(y)\Big] & \text{if } y \in [-1,1], \\ 0 & \text{otherwise,} \end{cases}$$

and the proposed embedding family is an approximation of the true density, all the better as $K$ is large.

What precedes can be adapted to data arising from an ARMA-type process; this is the subject of Chapter 2.

Chapter 3 is devoted to a related problem. In the same vein as the preceding one, one could assume that the errors $\varepsilon_t$ of the ARMA model are Gaussian and ask whether these errors are generated by a white noise process or, equivalently in this case, whether there is serial independence between the $\varepsilon_t$. With a view to a future generalization of the results of Chapter 2 to the multivariate ARMA case, it seems worthwhile to construct a test of serial independence for random vectors. Moreover, in order to better grasp the complexity of the problem, it proves advantageous to stage the answer to this question by first building a test of independence in the non-serial case.

The construction of this nonparametric test relies on a characterization of independence introduced by Ghoudi et al. (2001)(7), and the result obtained is a Cramér-von Mises-type statistic of a certain empirical process. Ghoudi et al. (2001)(7) defined their process using the empirical distribution function; here, the empirical process is based on the multivariate empirical characteristic function. To be more formal, consider the random vector $\varepsilon = (\varepsilon^{(1)},\ldots,\varepsilon^{(p)})$, made up of $p$ sub-vectors of dimension $q$, and the vector $t = (t^{(1)},\ldots,t^{(p)})$ partitioned in the same way. Moreover, for any $A \subseteq \{1,\ldots,p\}$ of cardinality greater than 1, introduce the function $\mu_A$ defined by

$$\mu_A(t) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, C_p(t_B) \prod_{j \in A \setminus B} C^{(j)}(t^{(j)})$$

with

$$(t_B)^{(i)} = \begin{cases} t^{(i)} & \text{if } i \in B, \\ 0 & \text{if } i \in I_p \setminus B. \end{cases}$$

Here $C_p$ is the joint characteristic function of $\varepsilon$ and the $C^{(j)}$ are the characteristic functions of the marginals $\varepsilon^{(j)}$. One can then show that $\varepsilon^{(1)},\ldots,\varepsilon^{(p)}$ are independent if and only if $\mu_A \equiv 0$ for every such $A$. Given a sample $\varepsilon_1,\ldots,\varepsilon_n$, this leads quite naturally to defining the process $R_{n,A}$, under the hypothesis of multinormality of the $\varepsilon_i^{(j)}$, by

$$R_{n,A}(t) = \sqrt{n} \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \varphi_{n,p}(t_B) \prod_{i \in A \setminus B} \varphi(t^{(i)}),$$

(4)
where
$$\varphi_{n,p}(t) = \frac{1}{n} \sum_{j=1}^{n} \exp(i \langle t, \varepsilon_j \rangle)$$

is the empirical characteristic function of the sample and $\varphi$ is the characteristic function of a $N_q(0,I)$ distribution. The Cramér-von Mises statistic and the resulting test will be built from this process.
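Computationally, $\varphi_{n,p}$ and $\varphi$ are the elementary building blocks; a minimal C++ sketch (our own names, not the thesis' C++ programs):

```cpp
// Sketch: the multivariate empirical characteristic function
// phi_{n,p}(t) = (1/n) sum_j exp(i <t, eps_j>), and the characteristic
// function phi(t) = exp(-|t|^2/2) of the N_q(0, I) law.
#include <cmath>
#include <complex>
#include <vector>

using Vec = std::vector<double>;

std::complex<double> ecf(const std::vector<Vec>& sample, const Vec& t) {
    std::complex<double> sum(0.0, 0.0);
    for (const Vec& eps : sample) {
        double dot = 0.0;                                 // <t, eps_j>
        for (std::size_t k = 0; k < t.size(); ++k) dot += t[k] * eps[k];
        sum += std::exp(std::complex<double>(0.0, dot));  // e^{i <t, eps_j>}
    }
    return sum / static_cast<double>(sample.size());
}

double phi_normal(const Vec& t) {                         // c.f. of N_q(0, I)
    double n2 = 0.0;
    for (double v : t) n2 += v * v;
    return std::exp(-0.5 * n2);
}
```

The process $R_{n,A}$ of (4) is then assembled from these two functions by summing over the subsets $B \subseteq A$.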

Let us now introduce some definitions and notation, useful in what follows, concerning the weak convergence of random elements in a metric space $S$.

Definition 4. Convergence in distribution
The law of $X$ is by definition the probability measure $P = \mathbb{P}X^{-1}$ on $(S,\mathcal{S})$:

$$P(A) = \mathbb{P}(X^{-1}(A)) = \mathbb{P}\{\omega \in \Omega;\ X(\omega) \in A\} = \mathbb{P}\{X \in A\}, \quad \forall A \in \mathcal{S}.$$

A sequence of random elements $\{X_n\}$ is said to converge in distribution to a random element $X$ on $S$, written $X_n \xrightarrow{\mathcal{L}} X$, if the laws $P_n$ of the $X_n$ converge weakly to the law $P$ of $X$, which is denoted $P_n \Rightarrow P$.

The following theorem (Billingsley (1968), p. 30)(1) is very useful in practice, since it allows one to deduce the weak convergence of the measures induced on $\mathbb{R}$ by various real functions $h$ from weak convergence in general metric spaces.

Theorem 1. Let $h : S \to S'$ be a measurable map into another metric space $S'$, and let $D_h$ be the set of discontinuity points of $h$.
If $P_n \Rightarrow P$ and $P(D_h) = 0$, then $P_n h^{-1} \Rightarrow P h^{-1}$.

We now give a practical characterization of weak convergence, for which the following two definitions are needed first.

Definition 5. Relative compactness
Let $\Pi$ be a family of probability measures on $(S,\mathcal{S})$. $\Pi$ is said to be relatively compact if every sequence of elements of $\Pi$ contains a weakly convergent subsequence.

Definition 6. Finite-dimensional distributions
Let $S = C(E,F)$ be the set of continuous functions from $E$ to $F$ and $\mathcal{S}$ its Borel $\sigma$-field. Let $\{P_n\}$ be a family of probability measures on $S$. The finite-dimensional distributions of the $P_n$ are the measures $P_n \pi_{t_1,\ldots,t_k}^{-1}$, $k = 1, 2, \ldots$, $t_1,\ldots,t_k \in E$, where

$$\pi_{t_1,\ldots,t_k} : C(E,F) \to F^k, \quad x \mapsto (x(t_1),\ldots,x(t_k)).$$

The finite-dimensional distributions of the $P_n$ are said to converge weakly to those of $P$ if $P_n \pi_{t_1,\ldots,t_k}^{-1} \Rightarrow P \pi_{t_1,\ldots,t_k}^{-1}$. When $P_n$ and $P$ are the laws of random elements $X_n$ and $X$, one also writes $X_n \xrightarrow{fd} X$.

It is easy to show that $P_n \Rightarrow P$ if and only if the finite-dimensional distributions of the $P_n$ converge weakly to those of $P$ and $\{P_n\}$ is relatively compact. A simple way of establishing the relative compactness of a family of measures is to use the notion of tightness.

Definition 7. Tightness of a family
A family $\Pi$ of probability measures on a general metric space $S$ is said to be tight if

$$\forall \varepsilon > 0,\ \exists K \text{ compact such that } P(K) > 1 - \varepsilon, \quad \forall P \in \Pi.$$

A family of random elements $\{X_n\}$ is said to be tight if the family of the laws of the $X_n$ is tight.

The result linking the two preceding concepts can be summarized as follows.

Theorem 2. Prohorov's theorem
If $\Pi$ is tight, it is relatively compact. If $S$ is separable and complete and $\Pi$ is relatively compact, then $\Pi$ is tight.

In this vein, a much-used theorem, for example to prove the convergence in distribution of the “classical” Cramér-von Mises statistic, is the following.

Theorem 3. (Theorem 8.1, Billingsley (1968)(1))
Let $P_n$, $P$ be probability measures on $(C[0,1],\mathcal{C})$. If the finite-dimensional distributions of the $P_n$ converge weakly to those of $P$, and if $\{P_n\}$ is tight, then $P_n \Rightarrow P$.

In the case of concern here, the Cramér-von Mises functional, based on the empirical characteristic function, is not even defined on $C(\mathbb{R}^{pq},\mathbb{C})$, and Theorem 1 cannot be used. To resolve this problem, we generalized Theorem 3.3 of Kellermeier (1980)(9). Moreover, in the particular context of a process based on the empirical characteristic function, which takes its values in the set of complex numbers and whose index set is $\mathbb{R}^{pq}$, the problem is considerably more complicated than in the “classical” case. To this end, it is useful to introduce the following elements.

Denote by $C(\mathbb{R}^{pq},\mathbb{C})$ the set of continuous functions from $\mathbb{R}^{pq}$ to $\mathbb{C}$. Schematically, the correspondence of the process $R_{n,A}$ of (4) can be expressed as

$$R_{n,A} : (\Omega,\mathcal{B},\mathbb{P}) \to C(\mathbb{R}^{pq},\mathbb{C}), \quad \omega \mapsto R_{n,A}(\cdot,\omega) : \mathbb{R}^{pq} \to \mathbb{C}, \quad t \mapsto R_{n,A}(t,\omega). \tag{5}$$

A metric $\rho$ is defined on the separable Fréchet space $C(\mathbb{R}^{pq},\mathbb{C})$ by

$$\forall x,y \in C(\mathbb{R}^{pq},\mathbb{C}), \quad \rho(x,y) = \sum_{j=1}^{\infty} 2^{-j}\, \frac{\rho_j(x,y)}{1 + \rho_j(x,y)}$$

with $\rho_j(x,y) = \sup_{|t| \le j} |x(t) - y(t)|$, where $|\cdot|$ denotes the modulus on $\mathbb{C}$. Note that $\rho_j$ is well defined, since a continuous function on a compact set is bounded.

Remark 2. A Fréchet space is a complete linear metric space.

We denote by $\mathcal{S}$ the $\sigma$-field generated by the open sets of $C \equiv C(\mathbb{R}^{pq},\mathbb{C})$ for the metric $\rho$, by $B_j^{pq}$ the closed ball of centre 0 and radius $j$ in $\mathbb{R}^{pq}$, by $C_j \equiv C(B_j^{pq},\mathbb{C})$, and by $\mathcal{S}_j$ the Borel $\sigma$-field on $C_j$ for the metric $\rho_j$.
Also denote by

$$r_j : C(\mathbb{R}^{pq},\mathbb{C}) \to C_j, \quad x \mapsto r_j(x) : B_j^{pq} \to \mathbb{C}, \quad t \mapsto x(t) \tag{6}$$

the restriction of an element $x$ of $C(\mathbb{R}^{pq},\mathbb{C})$ to $C_j$. Since $\mathbb{R}^{pq}$ is locally compact, separable and Hausdorff, we have, by Proposition 14.6 of Kallenberg (1997), p. 260(8),

$$R_{n,A} \xrightarrow{\mathcal{L}} R_A \quad \text{if and only if} \quad r_j(R_{n,A}) \xrightarrow{\mathcal{L}} r_j(R_A),\ \forall j \ge 1.$$

The study of the process $R_{n,A}$ can therefore be restricted to the compact subspaces of $C$.
Now, by Lemma 14.2 of Kallenberg (1997), p. 256(8), $r_j(R_{n,A}) \xrightarrow{\mathcal{L}} r_j(R_A)$ if and only if $r_j(R_{n,A}) \xrightarrow{fd} r_j(R_A)$ and $\{r_j(R_{n,A})\}_n$ is relatively compact. Moreover, by Theorem 14.3 of Kallenberg (1997), p. 257(8), and since $C_j$ is separable and complete, relative compactness and tightness of $\{r_j(R_{n,A})\}_n$ are equivalent. To prove the weak convergence of $\{R_{n,A}\}_n$ to the random element $R_A$, it therefore suffices to show that the finite-dimensional distributions of the $r_j(R_{n,A})$ converge weakly to those of the $r_j(R_A)$ and that, for every $j \ge 1$, $\{r_j(R_{n,A})\}_n$ is tight.

All these results are extensively exploited and detailed in Chapter 3 to construct a semi-parametric test of independence, for multinormal marginals, which can be useful for instance in the study of familial data. They are then generalized to the serial case.

In conclusion, it thus appears that our problem rests on questions rooted in the very beginnings of statistics, while our research relies on recent and innovative tools and techniques borrowed from multivariate analysis (see Bilodeau and Brenner (1999)(2)), from the theory of stochastic processes (see Billingsley (1968)(1)) and from asymptotic methods (see Ferguson (1996)(4)).

Bibliography

[1]     Billingsley, P., 1968. Convergence of probability measures. John Wiley & Sons Inc., New York.

[2]     Bilodeau, M., Brenner, D., 1999. Theory of multivariate statistics. Springer Texts in Statistics. Springer-Verlag, New York.

[3]     Box, G. E. P., Jenkins, G. M., 1976. Time series analysis: forecasting and control, revised Edition. Holden-Day, San Francisco, Calif. (Holden-Day Series in Time Series Analysis).

[4]     Ferguson, T. S., 1996. A course in large sample theory. Chapman & Hall, London.

[5]     Fisher, R. A., 1925. Statistical methods for research workers. Oliver and Boyd, Edinburgh and London.

[6]     Funkhauser, H. G., 1936. A note on a tenth century graph. Osiris, Vol. 1.

[7]     Ghoudi, K., Kulperger, R. J., Rémillard, B., 2001. A nonparametric test of serial independence for time series and residuals. J. Multivariate Anal. 79, 191-218.

[8]     Kallenberg, O., 1997. Foundations of modern probability. Springer-Verlag, New York.

[9]     Kellermeier, J., 1980. The empirical characteristic function and large sample hypothesis testing. J. Multivariate Anal. 10(1), 78-87.

[10]     Kopecky, K. J., Pierce, D. A., 1979. Efficiency of smooth goodness-of-fit tests. J. Amer. Statist. Assoc. 74, 392-397.

[11]     Marsden, J. E., 1974. Elementary classical analysis. W. H. Freeman and Company, New York.

[12]     Neyman, J., 1937. Smooth test for goodness of fit. Skand. Aktuar. 20, 149-199.

[13]     Rao, C. R., 1947. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proc. Cambridge Philos. Soc. 44, 50-57.

[14]     Rayner, J. C. W., Best, D. J., 1988. Smooth tests of goodness of fit for regular distributions. Comm. Statist. Theory Methods 17(10), 3235-3267.

[15]     Royden, H. L., 1968. Real analysis. Macmillan, New York.

Chapter 2
Goodness-of-fit tests of normality for the innovations in ARMA models

This article has been accepted for publication in the Journal of Time Series Analysis.

As is customary in this discipline, the authors are listed in alphabetical order.

The main contributions of Pierre Lafaye de Micheaux to this article are the following:

Goodness-of-fit tests of normality for the innovations in ARMA models
(abbreviated title : Testing the residuals in ARMA)

Gilles R. Ducharme and Pierre Lafaye de Micheaux
Laboratoire de probabilités et statistique, cc051
Université Montpellier II
Place Eugène Bataillon
34095, Montpellier, Cedex 5
France

Abstract
In this paper, we propose a goodness-of-fit test of normality for the innovations of an ARMA(p, q) model with known mean or trend. This test is based on the data driven smooth test approach and is simple to perform. An extensive simulation study is conducted to study the behavior of the test for moderate sample sizes. It is found that our approach is generally more powerful than existing tests while holding its level throughout most of the parameter space, and can thus be recommended. This agrees with theoretical results showing the superiority of the data driven smooth test approach in related contexts.

Key words: ARMA process, Gaussian white noise, Goodness-of-fit test, Normality of residuals, Smooth test.

1. Introduction

Let $(Y_t,\ t \in \mathbb{Z})$ be a stationary process. In this paper, we consider the case where $E(Y_t)$ is known or has been estimated using information outside of the data set. Thus, without loss of generality, we set $E(Y_t) = 0$. Consider the framework where $(Y_t,\ t \in \mathbb{Z})$ obeys the causal and invertible finite order ARMA(p, q) model

$$Y_t - \phi^T Y_{t-1}(p) = \theta^T \varepsilon_{t-1}(q) + \varepsilon_t$$

(1)
where $(\varepsilon_t,\ t \in \mathbb{Z})$ is an innovation process of random variables with mean 0 and autocovariance $E(\varepsilon_t \varepsilon_{t+h}) = \sigma^2 < \infty$ (unknown) if $h = 0$ and 0 otherwise, and where $\phi = (\phi_1,\ldots,\phi_p)^T$, $\theta = (\theta_1,\ldots,\theta_q)^T$, $Y_{t-1}(p) = (Y_{t-1},\ldots,Y_{t-p})^T$ and $\varepsilon_{t-1}(q) = (\varepsilon_{t-1},\ldots,\varepsilon_{t-q})^T$.

A sample $\{Y_1,\ldots,Y_T\}$ is observed and model (1) is fitted by standard methods, for example the unconditional Gaussian maximum likelihood approach (see Brockwell and Davis (1991)(4), pp. 256-257), yielding the estimator $\hat\beta = (\hat\phi^T, \hat\theta^T, \hat\sigma)^T$ of $\beta = (\phi^T, \theta^T, \sigma)^T$.

If it can be safely assumed that the distribution of the $(\varepsilon_t,\ t \in \mathbb{Z})$ generating the $Y_t$'s is of a given form, in particular independent identically distributed (i.i.d.) normal (Gaussian) random variables, then better inference can be obtained from the fitted model. For example, such an assumption is helpful to get accurate confidence or tolerance bounds for a predicted $Y_{T+h}$. Moreover, under this Gaussian assumption, $\hat\beta$ is asymptotically efficient. It is thus important to have a tool to check the null hypothesis

$$H_0 :\ \text{the } \varepsilon_t\text{'s are i.i.d. } N(0,\sigma^2).$$

(2)
As pointed out by Pierce and Gray (1985)(30) and Brockett et al. (1988)(3), other reasons may motivate a test of (2). One such reason is to check the fit of the structural part of (1). Indeed, the process of fitting a model to data often reduces to finding the model whose residuals behave most like a sample of i.i.d. Gaussian variables. In this context, rejection of (2) may indicate lack-of-fit of the entertained ARMA model. We will not elaborate further here on this possibility and assume, in the sequel, that model (1) is not underspecified. Note however that there exist specific tests for detecting lack-of-fit (for a recent review, see Koul and Stute (1999)(22)).

For the problem of testing (2), the few tests available fall roughly into two groups. Tests of the first group use the fact that, for ARMA(p, q) models, normality of the $Y_t$'s induces normality of the $\varepsilon_t$'s and vice versa. Thus a test of the hypothesis that a process $(Y_t,\ t \in \mathbb{Z})$ is Gaussian (Lomnicki (1961)(25); Hinich (1982)(16); Epps (1987)(9)) can serve for problem (2). This presents the advantage of not requiring the values of p and q. But Gasser (1975) and Granger (1976)(13) have shown, and Lutkepohl and Schneider (1989)(26) have confirmed by simulation, that this approach may lose much power. This is because the central limit theorem forces the $Y_t$'s to be close to normality even when (2) is false. Moreover, the adaptation of standard normality tests to dependent data is not an easy task. A small simulation study by Heuts and Rens (1986)(15) has shown that, because of the serial correlation between the $Y_t$'s, the finite null behavior of standard normality tests based on the empirical distribution function (EDF) of the $Y_t$'s is different from what is obtained under i.i.d. data. The same problem appears for tests based on the third or fourth moment of $Y_t$ (see Lomnicki (1961)(25); Lutkepohl and Schneider (1989)(26)) and for Pearson's chi-square test (Moore (1982)(27)).

It thus appears better, when there are reasons to believe that a given ARMA(p, q) model holds, to “inverse filter” the data and compute the residuals $\hat\varepsilon_t$ of the fitted model. These can then be subjected to some test of normality. Tests of the second group are based on this idea and some examples are listed in Hipel and McLeod (1994)(17). However, these and other authors use such tests in conjunction with critical values for i.i.d. data. Since the residuals of an ARMA model are dependent, the null distribution of standard test statistics may be affected and critical values for i.i.d. data may no longer be valid. It turns out that for AR models, there is theoretical evidence that this dependence affects only slightly the critical values, at least when T is large. For an AR(p) model with unknown $E(Y_t)$, Pierce (1985)(30) has shown that the asymptotic null distribution of any test statistic based on the EDF of the residuals coincides with that of the same statistic for i.i.d. data with mean and variance unknown. Thus one can insert the residuals from an AR(p) model into any of the standard EDF-based tests (Kolmogorov-Smirnov, Anderson-Darling) and, if T is large, use the critical values given, for example, in Chapter 4 of D'Agostino and Stephens (1986)(7), to obtain an asymptotically valid test strategy. In the same vein, Lee and Na (2001)(24) have recently adapted the Bickel-Rosenblatt test to this AR setting. Beiser (1985)(2) has found that for the AR(1) model, tests based on the skewness or kurtosis coefficient of the residuals (D'Agostino and Stephens (1986)(7), p. 408) in conjunction with the critical points derived for i.i.d. data produce valid levels if T is large and the AR parameter is not too close to its boundary. This has been confirmed by Lutkepohl and Schneider (1989)(26). See also Andel (1997)(1).

For the general ARMA model, much less is known. Ojeda et al. (1997)(29) show that tests based on quadratic forms in differences between sample moments and expected values of certain non-linear functions of the sample have the same asymptotic distribution under the ARMA model as under i.i.d. data. This suggests that a generalization of Pierce's (1985)(30) theorem to ARMA models could hold although, to our knowledge, no proof of this has been published. In accordance with this conjecture, the practice recommended in many textbooks (see, for example, Brockwell and Davis (1991)(4), p. 314; Hipel and McLeod (1994)(17), p. 241) is to use standard normality tests in conjunction with critical values for i.i.d. data.

In this paper, we develop some tests designed specifically for problem (2) in the ARMA(p, q) context. Our approach is based on the smooth test paradigm introduced by Neyman (1937)(28) and improved by the data driven technology introduced by Ledwina (1994)(23) to select the best order for the test. This approach has been shown in the i.i.d. case to offer many advantages, both theoretically and empirically, over other tests. In particular, the test statistic we recommend for problem (2) is easy to compute, with an asymptotic $\chi^2$ distribution that can be corrected in finite samples to yield a close-to-nominal level. Moreover, as a byproduct of the procedure, diagnostic information is available that helps in understanding which aspects of the null hypothesis are not supported by the data.

Note that we concentrate here on the development of valid tests along this paradigm and do not delve into their theoretical properties (i.e., local power and asymptotic efficiency). We also stress that the tests proposed here are valid solely for the case where $E(Y_t)$ is assumed known. The case where an unknown trend is present in (1) requires a special treatment and is the object of current research.

The paper is organized as follows. In Section 2, we develop the smooth goodness-of-fit test in the ARMA(p, q) context of (1). In Section 3, we describe the data-driven technology that allows one to “fine tune” the test by choosing a good value for its order. In Section 4, a Monte Carlo study is conducted for some values of (p, q) to study the behavior of the proposed tests under the null hypothesis and compare their power to some competitors. It emerges that, under the null hypothesis, one of our data driven smooth tests holds its level over most of the parameter space and, under the alternatives studied, is in general more powerful than other methods. It can thus be recommended as a good tool for problem (2). An example concludes the paper.

2. Smooth test of normality in the ARMA context

Let $\Phi(\cdot)$ be the cumulative distribution function of the $N(0,1)$ distribution with density $\phi(\cdot)$, and let $U_t = 2\Phi(\varepsilon_t/\sigma) - 1$, with density $g(\cdot)$. Under $H_0$ of (2), the $U_t$'s are i.i.d. $U[-1,1]$, so that (2) reduces to testing $g(u) = 1/2$ on $[-1,1]$. The $\varepsilon_t$'s are unobserved, so the test must be based on residuals. Since the process $(Y_t,\ t \in \mathbb{Z})$ is invertible, we have

$$\varepsilon_t = \sum_{r=0}^{\infty} \delta_r Y_{t-r}$$

(1)
where the $\delta_r$'s are functions of $\theta$ and $\phi$ (see (A.2), (A.3) of Appendix A). Let $\hat\delta_r$ be the Gaussian maximum likelihood estimator (m.l.e.) of $\delta_r$ under (2), obtained by plugging in the m.l.e. $\hat\theta$ and $\hat\phi$ under $H_0$. We define the residuals of the fitted ARMA model by
$$\hat\varepsilon_t = \sum_{r=0}^{\infty} \hat\delta_r Y_{t-r}.$$

(2)
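Equivalently, rather than truncating the infinite sum, the residuals can be computed by the finite recursion $\hat\varepsilon_t = Y_t - \sum_i \hat\phi_i Y_{t-i} - \sum_j \hat\theta_j \hat\varepsilon_{t-j}$ with zero start-up values, which amounts to taking $Y_t = 0$ for $t < 1$ as suggested below; a minimal sketch (our names, not the thesis' Fortran code):

```cpp
// Sketch: residuals (2) by inverse filtering, with the convention
// Y_t = 0 and eps_hat_t = 0 for t < 1.
#include <vector>

std::vector<double> residuals(const std::vector<double>& y,
                              const std::vector<double>& phi,    // phi_hat_1..p
                              const std::vector<double>& theta)  // theta_hat_1..q
{
    std::vector<double> e(y.size(), 0.0);
    for (std::size_t t = 0; t < y.size(); ++t) {
        double r = y[t];
        for (std::size_t i = 0; i < phi.size(); ++i)     // - sum phi_i Y_{t-i}
            if (t >= i + 1) r -= phi[i] * y[t - i - 1];
        for (std::size_t j = 0; j < theta.size(); ++j)   // - sum theta_j e_{t-j}
            if (t >= j + 1) r -= theta[j] * e[t - j - 1];
        e[t] = r;
    }
    return e;
}
```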
In practice, some scheme must be used to compute these residuals, for example by taking $Y_t = 0$ if $t < 1$, as in the sketch above. Note that other residuals can be defined for ARMA models (see Brockwell and Davis (1991)(4), Section 9.4), but the definition above is convenient for the following derivation. Consider $\hat U_t = 2\Phi(\hat\varepsilon_t/\hat\sigma) - 1$, $t = 1,\ldots,T$. Let $\{L_k(\cdot),\ k \ge 0\}$ be the normalized (over $[-1,1]$) Legendre polynomials (Sansone (1959)(32)), with $L_0(\cdot) \equiv 1$, satisfying
$$\frac{1}{2}\int_{-1}^{1} L_k(x)\, L_j(x)\,dx = 1 \ \text{ if } k = j \text{ and } 0 \text{ otherwise.}$$

(3)
For some integer $K \ge 1$, consider the density defined on $[-1,1]$ by
$$g_K(u;\omega) = c(\omega)\, \exp\!\left[\sum_{k=1}^{K} \omega_k L_k(u)\right],$$

(4)
where $c(\omega)$ is a normalizing constant such that $c(0) = 1/2$. In the smooth test paradigm, (4) is the $K$-th order alternative, with $g_K(\cdot;0)$ being the $U[-1,1]$ density. Thus, if $g(u)$ can be approximated by (4), (2) reduces to testing $H_0 : \omega = 0$. For this, we use the following route. Let $L_t = (L_1(U_t),\ldots,L_K(U_t))^T$, $\hat L_t = (L_1(\hat U_t),\ldots,L_K(\hat U_t))^T$ and
$$\bar{\hat L} = T^{-1} \sum_{t=1}^{T} \hat L_t.$$

(5)
Under $H_0$, $L_t$ has mean 0 and covariance matrix $I_K$, the $K$-th order identity matrix. Under (4), these moments will differ and (5) can be used to capture departures from the $U[-1,1]$ in the “direction” of $g_K(\cdot;\omega)$. This suggests as a test statistic a quadratic form in $\bar{\hat L}$. To complete the test, we need the null asymptotic distribution of (5). This is given in the following theorem.

Theorem 1. Consider the causal and invertible ARMA(p, q) process of (1), where we assume $1 - \phi_1 z - \ldots - \phi_p z^p$ and $1 + \theta_1 z + \ldots + \theta_q z^q$ have no common zeroes. Under $H_0$, we have

$$\sqrt{T}\, \bar{\hat L} \xrightarrow{\mathcal{L}} N_K\!\left(0,\ I_K - \tfrac{1}{2}\, b_K b_K^T\right)$$

(6)
where $b_K = (b_1,\ldots,b_K)^T$, with $b_k = \int L_k(2\Phi(x)-1)\, x^2 \phi(x)\,dx$. Hence, the smooth test statistic
$$R_K = T\, \bar{\hat L}^T \left(I_K - \tfrac{1}{2}\, b_K b_K^T\right)^{-1} \bar{\hat L} \xrightarrow{\mathcal{L}} \chi^2_K.$$

PROOF. We present an outline of the argument. More details are given in the appendices and in Ducharme and Lafaye de Micheaux (2002)(8). Let

$$I_\beta = \mathrm{Var}\left[\frac{\partial}{\partial \beta} \log\!\left(\frac{1}{\sigma}\,\phi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right)\right]$$

be Fisher's information matrix for $\beta$. From standard results (see Gouriéroux and Monfort (1995)(12), p. 325), we have

$$\sqrt{T}\,(\hat\beta - \beta) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} I_\beta^{-1}\, \frac{\partial}{\partial \beta} \log\!\left(\frac{1}{\sigma}\,\phi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right) + o_P(1).$$

Since $\hat\beta - \beta = O_P(T^{-1/2})$, a Taylor expansion yields

$$\sqrt{T}\, \bar{\hat L} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} L_t + \left(\frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial \beta^T}\, L_t\right) \sqrt{T}\,(\hat\beta - \beta) + o_P(1).$$

(7)
The first term on the right hand side of (7) converges to a $N_K(0, I_K)$. Moreover, it is shown in Appendix A that
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial \beta^T}\, L_t \xrightarrow{P} \left(0_{K \times (p+q)},\ -\frac{1}{\sigma}\, b_K\right) = J_K.$$

(8)
Hence,
$$\sqrt{T}\, \bar{\hat L} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} L_t + \frac{1}{\sqrt{T}}\, J_K I_\beta^{-1} \sum_{t=1}^{T} \frac{\partial}{\partial \beta} \log\!\left(\frac{1}{\sigma}\,\phi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right) + o_P(1) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} B V_t + o_P(1),$$

where $B = (I_K,\ J_K I_\beta^{-1})$ and

$$V_t = \left(L_t^T,\ \frac{\partial}{\partial \beta^T} \log\!\left(\frac{1}{\sigma}\,\phi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right)\right)^T.$$

From Appendix B, it follows that $E(V_t) = 0$ and $\mathrm{Var}(B V_t) = I_K - b_K b_K^T/2$. The central limit theorem yields (6).

It is possible to write $R_K$ in a form that makes it easy to use. A Cholesky decomposition of $I_K - b_K b_K^T/2$ yields $(I_K - b_K b_K^T/2)^{-1} = P P^T$ with $P = (p_{ij})$, an upper triangular matrix. Some algebra gives $p_{ij} = 0$ if $i > j$, while

$$p_{ii} = \sqrt{\frac{2 - \sum_{k=1}^{i-1} b_k^2}{2 - \sum_{k=1}^{i} b_k^2}} \quad \text{and} \quad p_{ij} = \frac{b_i b_j}{\sqrt{\left(2 - \sum_{k=1}^{j-1} b_k^2\right)\left(2 - \sum_{k=1}^{j} b_k^2\right)}} \ \text{ if } j > i.$$

Thus

$$R_K = \sum_{k=1}^{K} \left[\frac{1}{\sqrt{T}} \sum_{t=1}^{T} L_k^*(\hat U_t)\right]^2,$$

where

$$L_k^*(\hat U_t) = \sum_{l=1}^{k} p_{lk}\, L_l(\hat U_t).$$

(9)
Numerical integration gives $(b_2, b_4, \ldots, b_{10}) = (1.23281,\ 0.521125,\ 0.304514,\ 0.205589,\ 0.150771)$, with $b_k = 0$ if $k$ is odd. This yields the first ten “modified” Legendre polynomials
$L_1^*(u) = 1.73u$,
$L_2^*(u) = 6.85u^2 - 2.28$,
$L_3^*(u) = 6.61u^3 - 3.97u$,
$L_4^*(u) = 19.91u^4 - 10.26u^2 - 0.56$,
$L_5^*(u) = 26.12u^5 - 29.02u^3 + 6.22u$,
$L_6^*(u) = 69.84u^6 - 81.84u^4 + 28.36u^2 - 3.06$,
$L_7^*(u) = 103.84u^7 - 167.75u^5 + 76.25u^3 - 8.47u$,
$L_8^*(u) = 260.07u^8 - 450.18u^6 + 247.18u^4 - 38.73u^2 - 1.11$,
$L_9^*(u) = 413.92u^9 - 876.55u^7 + 613.58u^5 - 157.33u^3 + 10.73u$,
$L_{10}^*(u) = 994.51u^{10} - 2250.43u^8 + 1782.83u^6 - 569.92u^4 + 67.54u^2 - 3.58$.
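For illustration, here is a minimal sketch of the statistic for $K = 3$, using the modified polynomials above; the variable names are ours, and a full implementation (like the thesis' Fortran programs) would loop over all ten polynomials.

```cpp
// Sketch: R_3 = sum_{k=1}^{3} [ T^{-1/2} sum_t L*_k(U_hat_t) ]^2 from the
// fitted residuals, with U_hat_t = 2 Phi(eps_hat_t / sigma_hat) - 1.
#include <cmath>
#include <vector>

double Phi(double x) {                       // standard normal c.d.f.
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

double R3(const std::vector<double>& eps_hat, double sigma_hat) {
    double s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (double e : eps_hat) {
        double u = 2.0 * Phi(e / sigma_hat) - 1.0;
        s1 += 1.73 * u;                      // L*_1(u)
        s2 += 6.85 * u * u - 2.28;           // L*_2(u)
        s3 += (6.61 * u * u - 3.97) * u;     // L*_3(u)
    }
    double T = static_cast<double>(eps_hat.size());
    return (s1 * s1 + s2 * s2 + s3 * s3) / T;  // compare to a chi^2_3 quantile
}
```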

Remark 2.1. Theorem 1 shows that we can slightly extend the result of Pierce (1985)(30) and state that neither the estimation of $\phi$ and $\theta$ nor the dependence of the $Y_t$'s has any asymptotic impact on a smooth test of (2) in the ARMA context. In pre-asymptotic situations, these elements and the complexity of the model will affect the null distribution of $R_K$. This will be further explored in the simulations of Section 4.

Remark 2.2. Each term $\left[T^{-1/2}\sum_{t=1}^{T} L_k^*(\hat U_t)\right]^2$ is a component of the test statistic and has an asymptotic $\chi^2_1$ distribution under $H_0$. When the null hypothesis is rejected, some of these components will be large. The simple structure of the first few polynomials in (9) helps in understanding which aspects of the normal are not supported by the data. For example, the first component detects departures from $H_0$ in the “direction” of asymmetry. This diagnostic analysis must be undertaken with some care, however; see Henze (1997)(14) for details.

Remark 2.3. The above methodology can in principle be applied to other distributions than the normal. For location-scale densities, one needs to replace the normal distribution in the definition of Ut and follow the derivation using the new null density. The structure of RK will be similar to what is obtained above but the modified Legendre polynomials will change. For distributions with a shape parameter, the statistic is more complex since the coefficients of these polynomials will in general depend on this unknown shape parameter that must be estimated.

3. Choosing the order K of the alternative

Before applying the test strategy of Section 2, one must choose the value of K. Ideally, this choice should be made so that members of the embedding family $g_K(\cdot;\omega)$ of (4) provide a good approximation to any plausible density $g(\cdot)$ of $U_t$ under the alternative. If K is too small, this approximation may be crude and the test loses power. If K is too large, power dilution can occur since $g_K(\cdot;\omega)$ encompasses unnecessary “directions”.

In practice, the user has only, at best, a qualitative idea of the plausible alternatives and no specific value of K emerges naturally. In the i.i.d. case, some authors (Rayner and Best (1989)(31)) argue that, as a rule of thumb, one can use a trade-off value of K between 2 and 4.

Recently, Ledwina (1994)(23) and Kallenberg and Ledwina (1997a,b)(20, 21) have proposed and explored for i.i.d. data a method to choose adaptively a value for K. At the first step, Schwarz's (1978)(33) criterion is used to choose the value $\hat K$ that seems best in view of the data at hand. The smooth test strategy is then applied using the statistic $R_{\hat K}$. Extensive simulations have shown that, even for small sample sizes, this so-called “data driven smooth test” can yield power close to what could be obtained if one knew the true form of the alternative and had chosen the best value of K accordingly.

So far, this approach has been investigated for i.i.d. data only, but it can be extended to the ARMA context. Choose two integers $1 \le d \le D$ and consider the set of statistics $\{R_d,\ldots,R_D\}$. We seek a rule that will select a good $R_K$ in this set. Write

$$\hat K = \min\left\{\mathrm{Argmax}_{d \le s \le D}\ \{R_s - s \log(T)\}\right\}$$

(1)
and denote by $R_{\hat K(d)}$ the test statistic $R_{\hat K}$ selected by (1) in $\{R_d,\ldots,R_D\}$.
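Rule (1) is a few lines of code; a minimal sketch (our names), taking as input the statistics $R_d,\ldots,R_D$ already computed as in Section 2:

```cpp
// Sketch: Schwarz-type selection rule (1): the smallest maximizer of
// R_s - s log(T) over s = d, ..., D.
#include <cmath>
#include <vector>

int select_order(const std::vector<double>& R,  // R[0] = R_d, R[1] = R_{d+1}, ...
                 int d, double T) {
    int k_hat = d;
    double best = R[0] - d * std::log(T);
    for (int s = d + 1; s < d + static_cast<int>(R.size()); ++s) {
        double crit = R[s - d] - s * std::log(T);
        if (crit > best) { best = crit; k_hat = s; }  // strict ">" keeps the
    }                                                  // smallest argmax
    return k_hat;
}
```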

Theorem 2. Under $H_0$, $\hat K \to d$ in probability and thus $R_{\hat K(d)}$ is asymptotically $\chi^2_d$.

PROOF. Set $e_k = (k-d)\log T$. For $k \ge d$, $P(\hat K = k) \le P(R_k > e_k)$. Now, since each $R_k$ is asymptotically $\chi^2_k$ under $H_0$, as $T$ increases,

$$P(R_k > e_k) \to 0$$

when $k > d$. It follows that $P(\hat K = d) = 1 - P(\hat K \ge d+1) \to 1$.

For finite sample sizes, the asymptotic null distribution of Theorem 2 may not provide a good approximation to that of $R_{\hat K(d)}$, since there is a positive probability that $\hat K \ge d+1$. A simple correction has been developed by Janic-Wroblewska and Ledwina (2000)(18) when $d = 1$ (i.i.d. data). Because of the asymptotic independence between the components of $R_k$, this correction can easily be extended to $d > 1$ and to the present ARMA context. A direct application of the argument in their Section 4 leads to the following approximation, which can be solved for $x$ by numerical integration:

$$P(R_{\hat K(d)} \le x) \approx P(\chi^2_d \le x)\, P(\chi^2_1 \le \log(T)) + \int_{\log(T)}^{x} P(\chi^2_d < x - z)\, \frac{1}{\sqrt{2\pi z}}\, e^{-z/2}\, dz.$$

(2)
Some quantiles corrected through (2) are listed in Table 3.1.

TAB. 3.1: Some quantiles obtained from approximation (2)

            T     α = 0.10   α = 0.05   α = 0.01
d = 1      50      3.692      5.410      8.805
          100      3.275      5.201      8.703
          200      3.057      4.751      8.590
d = 2      50      5.466      7.137     10.807
          100      5.262      6.972     10.684
          200      5.043      6.796     10.558

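For concreteness, here is one way the right-hand side of (2) can be evaluated, after which the corrected critical value is obtained by solving $\mathrm{rhs}(x) = 1 - \alpha$ for $x$ (e.g. by bisection). This is a minimal sketch with our own helper names, not the thesis' code; the chi-square c.d.f. is computed from the regularized lower incomplete gamma series.

```cpp
// Sketch: numerical evaluation of approximation (2).
#include <cmath>

// Regularized lower incomplete gamma P(a, x) by its power series.
double gamma_p(double a, double x) {
    if (x <= 0.0) return 0.0;
    double term = 1.0 / a, sum = term;
    for (int n = 1; n < 500; ++n) { term *= x / (a + n); sum += term; }
    return sum * std::exp(-x + a * std::log(x) - std::lgamma(a));
}

double chisq_cdf(double x, int k) {
    return x > 0.0 ? gamma_p(0.5 * k, 0.5 * x) : 0.0;
}

// Right-hand side of (2) at candidate critical value x, for given d and T;
// the integral over [log T, x] is done by the trapezoidal rule.
double rhs(double x, int d, double T) {
    double lt = std::log(T);
    double val = chisq_cdf(x, d) * chisq_cdf(lt, 1);
    const int m = 2000;
    double h = (x - lt) / m, integ = 0.0;
    for (int i = 0; i <= m; ++i) {
        double z = lt + i * h;
        double f = chisq_cdf(x - z, d) * std::exp(-0.5 * z)
                   / std::sqrt(2.0 * 3.14159265358979 * z);
        integ += (i == 0 || i == m) ? 0.5 * f : f;
    }
    return val + integ * h;
}
```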
One may have the feeling that this data driven approach replaces the problem of selecting $K$ with that of selecting $d$ and $D$. To answer this, Kallenberg and Ledwina (1997a,b)(20, 21) have studied a version of the above procedure where $D$ is allowed to increase with $T$. In the i.i.d. case, they obtain rates connecting these quantities. These rates are theoretically interesting but do not help in practice in selecting a value for $D$. To get more insight, they have conducted extensive simulations. It turns out that the power levels off rapidly as $D$ increases and there is little to be gained by choosing $D$ much greater than 10. As for the choice of $d$, again Kallenberg and Ledwina (1997a)(20) briefly discuss this problem, and it emerges that in their context $d = 1$ or 2 appears reasonable. In the simulation study of the next section we use both these values of $d$ and take $D = 10$.
In closing this section, note that, by plotting $g_{\hat K}(\cdot;\hat\omega)$, where $\hat\omega$ is an estimate of $\omega$, one can get an idea of the true shape of the density when the null hypothesis has been rejected. This can be helpful in finding a more appropriate distribution for the innovations.

4. Simulation Results

To get an idea of the behavior of our test statistics as compared to some competitors, a simulation study was conducted. Samples $\{Y_t,\ t = 1,\ldots,T\}$ from various ARMA(p, q) models were generated with the innovations arising, in the first part of the simulation, from the normal distribution and, in the second, from various alternatives. For each sample, we estimated the parameters of the model and computed the test statistics. From there, we obtained approximations to their level and power. All programs are written in Fortran 77. The subroutines mentioned below are from the Numerical Algorithms Group (NAG) Mark 16 Fortran library.

4.1. Levels

The first part of the simulation study was designed to see if the critical values obtained from the asymptotic $\chi^2$ or from (2) can be relied upon in finite samples. We took $T$ = 50, 100 and 200 and restricted attention to the models MA(2), AR(2), ARMA(1,2), ARMA(2,1) and ARMA(2,2). To generate ARMA(p,q) samples with Gaussian innovations, we used subroutines G05EGF and G05EWF. These samples were submitted to subroutine G13DCF, which returns estimates of the parameters of the model as well as residuals. The definition of these residuals, given at equation (9.4.1) of Brockwell and Davis (1991)(4), differs from (2), but their numerical values are almost identical. These residuals were then submitted to the various tests. The actual level of each test was computed for nominal levels α = 0.10 and 0.05.

Regarding the parameter $\beta$, note that our test statistics are in theory invariant to the choice of $\sigma$, and we took $\sigma = 1$. Numerically, this invariance holds approximately because of the stopping rule in G13DCF. But the finite distribution of our test statistics depends on the values of $\theta$ and $\phi$. To explore this, we have proceeded as follows. First, causality requires that, if $p = 1$, $\phi_1 \in (-1,1)$, while if $p = 2$, $\phi$ must be in the region $C_\phi = \{(\phi_1,\phi_2) : \phi_1 + \phi_2 < 1,\ \phi_2 - \phi_1 < 1,\ |\phi_2| < 1\}$ (Brockwell and Davis (1991)(4), p. 110, ex. 3.2). Similarly, invertibility implies that if $q = 1$, $\theta_1 \in (-1,1)$, while if $q = 2$, $\theta$ must be in $C_\theta = \{(\theta_1,\theta_2) : \theta_1 + \theta_2 < 1,\ \theta_2 - \theta_1 < 1,\ |\theta_2| < 1\}$. In addition, the polynomials $1 - \phi_1 z$ when $p = 1$ and $1 - \phi_1 z - \phi_2 z^2$ when $p = 2$ must have no common zeroes with $1 + \theta_1 z$ when $q = 1$ and $1 + \theta_1 z + \theta_2 z^2$ when $q = 2$.
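Checking membership in these triangular regions is a one-liner; a minimal sketch (our names):

```cpp
// Sketch: the AR(2) causality region C_phi (identical in form to the
// MA(2) invertibility region C_theta).
#include <cmath>

bool in_triangle(double a1, double a2) {
    return a1 + a2 < 1.0 && a2 - a1 < 1.0 && std::fabs(a2) < 1.0;
}
```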

For the AR(2) model, we have taken the values of $\phi$ in the grid of 64 points $\{(-2.0 + 0.25j,\ -0.9 + 0.25k) \in C_\phi :\ j,k \ge 0\}$. A similar grid was used for the MA(2). This makes it possible to see whether the tests maintain the proper critical level over a large section of the parameter space. For the ARMA(1,2), the grid over $\theta$ was reduced to $\{(-2.0 + 0.40j,\ -0.9 + 0.40k) \in C_\theta :\ j,k \ge 0\}$, while $\phi_1 = -0.9 + 0.2j$, $j = 0,\ldots,9$. This gives a set of 250 points on the parameter space of $(\phi_1,\theta)$. For the ARMA(2,1) model, the same was done with $\phi$ and $\theta_1$ instead. Finally, for the ARMA(2,2) model, points $(\phi,\theta)$ satisfying the “no common zeroes” condition were taken in $\{(-1.95 + 0.45j,\ -0.85 + 0.45k) \in C_\phi\} \times \{(-1.95 + 0.45j,\ -0.95 + 0.45k) \in C_\theta\}$. This yields 294 $(\phi,\theta)$ parameter points. For each of these parameter points, 10000 samples of size $T$ were generated as described above.

To summarize the results, the following approach was adopted. A 95% confidence interval for the true level when α = 0.10 is (0.094, 0.106). Similarly, for α = 0.05, 95% of the p-values are expected in the interval (0.046, 0.054). Thus the range of possible p-values was divided into 5 sub-intervals. For α = 10%, these are I1 = (0, 0.085), I2 = [0.085, 0.094), I3 = [0.094, 0.106), I4 = [0.106, 0.115) and I5 = [0.115, 1]. For α = 0.05, I1 = (0, 0.035), I2 = [0.035, 0.046), I3 = [0.046, 0.054), I4 = [0.054, 0.065) and I5 = [0.065, 1]. For each model, the percentage of p-values in each interval was recorded. Table 4.1 reports the results for statistics $R_3$ and $R_{\hat K(2)}$ which, as discussed in Section 3, are representative of the two schools of thought for the choice of K. The results for the AR(2) and ARMA(2,1) models, being similar to those of the MA(2) and ARMA(1,2) respectively, are omitted for brevity (see Ducharme and Lafaye de Micheaux (2002)(8) for more complete results).
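As a quick sanity check of these interval endpoints (a one-line binomial computation, not from the paper): with 10000 replications,
$$0.10 \pm 1.96\sqrt{\frac{0.10 \times 0.90}{10000}} = 0.10 \pm 0.0059 \approx (0.094,\ 0.106), \qquad 0.05 \pm 1.96\sqrt{\frac{0.05 \times 0.95}{10000}} \approx (0.046,\ 0.054).$$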


TAB. 4.1: Distribution (in % of the number of parameter points) of the empirical p-values (based on 10000 replications) for the tests based on R3 and RK̂(2) among 5 sub-intervals.

R3:
                                  Observed level                  Min
Model          T    α      I1     I2     I3     I4    I5     p-level
MA(2)         50    5%    18.8   68.8   12.5    0     0       2.76
(64 points)  100    5%     1.6   50.0   48.4    0     0       3.41
             200    5%     0      9.4   89.1    1.6   0       4.04
              50   10%    23.4   53.1   23.4    0     0       6.49
             100   10%     6.3   20.3   73.4    0     0       7.96
             200   10%     0      7.8   90.6    1.6   0       8.92
ARMA(1,2)     50    5%    47.2   46.4    6.4    0     0       2.43
(250 points) 100    5%     8.0   71.6   20.4    0     0       2.98
             200    5%     0.8   32.4   66.4    0.4   0       3.32
              50   10%    65.6   24.0   10.4    0     0       6.20
             100   10%    21.6   35.2   42.8    0.4   0       6.80
             200   10%     4.0   19.6   75.6    0.8   0       7.42
ARMA(2,2)     50    5%    41.2   57.1    1.7    0     0       2.56
(294 points) 100    5%     5.1   74.1   20.8    0     0       3.09
             200    5%     0.3   27.9   71.8    0     0       3.47
              50   10%    57.8   37.4    4.8    0     0       6.24
             100   10%    21.1   33.7   45.2    0     0       6.88
             200   10%     3.1   18.0   78.6    0.3   0       7.86

RK̂(2):
                                  Observed level                  Min
Model          T    α      I1     I2     I3     I4    I5     p-level
MA(2)         50    5%     0      9.4   46.9   43.7   0       4.12
(64 points)  100    5%     0     14.1   68.8   17.2   0       4.17
             200    5%     0      7.8   87.5    4.7   0       4.23
              50   10%     6.3   14.1   62.5   17.2   0       7.78
             100   10%     6.3    6.3   81.3    6.3   0       8.19
             200   10%     0      6.3   89.1    4.7   0       8.83
ARMA(1,2)     50    5%     0     38.8   46.8   14.4   0       3.53
(250 points) 100    5%     0     34.8   59.2    6.0   0       3.74
             200    5%     0     23.2   75.2    1.6   0       3.80
              50   10%    24.8   31.6   35.2    8.4   0       7.06
             100   10%    13.2   27.2   57.6    2.0   0       7.33
             200   10%     4.4   18.0   76.0    1.6   0       7.61
ARMA(2,2)     50    5%     0     32.0   55.4   12.6   0       3.65
(294 points) 100    5%     0     31.0   62.9    6.1   0       3.75
             200    5%     0     23.8   75.9    0.3   0       3.89
              50   10%    21.4   30.3   47.0    1.4   0       7.14
             100   10%    11.2   24.8   62.6    1.4   0       7.51
             200   10%     2.7   16.7   80.6    0     0       7.62

The actual levels for $R_3$ are concentrated on I1, I2 and I3. The mode of the distribution is generally located on I2 for T = 50 and shifts to I3 as T increases. This leads, at worst, to slightly conservative tests. To appreciate this, the last column of Table 4.1 gives the smallest p-value recorded over the parameter points. For $R_{\hat K(2)}$, the distribution is concentrated on I2, I3 and I4 with, in all cases, a mode centered on I3. For this statistic, the minimal p-values are also closer to the nominal level (no maximal p-value was very far from the upper bound of I4). Thus correction (2) works nicely, at least for the cases considered here.

We also investigated what areas of the parameter space give p-values in I1. Intuitively, one expects these points to be near the boundary. However, the pattern that emerges, which is very similar for both $R_3$ and $R_{\hat K(2)}$, is more precise. For AR(2) models, these points correspond mainly to positive $(\phi_1,\phi_2)$ close to the right boundary of $C_\phi$ and, to a lesser degree, to those with positive $\phi_1$ and negative $\phi_2$ but again close to that boundary. For MA(2) models, the situation is reversed, which is not surprising since $C_\theta = C_\phi$. For ARMA(2,1), the points giving small p-values correspond to positive $(\phi_1,\phi_2)$ combined with values of $\theta_1$ close to -1. Again, for ARMA(1,2) the situation is reversed and small p-values are associated with negative values of $(\theta_1,\theta_2)$ with a value of $\phi_1$ close to 1. Finally, for the ARMA(2,2), the points that yield p-values in I1 are mainly those with positive $(\phi_1,\phi_2)$ and negative $(\theta_1,\theta_2)$.

We have also investigated the behavior under H0 of some other tests that have been recommended in the time series literature for (2). We first considered the Anderson-Darling (AD) test (Pierce (1985)(30)) for case 2 (known mean) used in conjunction with the quantiles given in D’Agostino and Stephens (1986)(7), p. 122. Our simulations show that, for large T, this yields valid critical levels. We also studied a variant of the Shapiro-Wilk test known as the Weisberg and Bingham (1975)(35) (WB) test. To adapt this test to our context where the mean is known, the denominator of equation (9.68) of D’Agostino and Stephens (1986)(7) was replaced by Tσ̂², where σ̂² is the estimate of σ² returned by subroutine G13DCF. Up to the numerical accuracy of procedure G13DCF, this corresponds to the sum of squares of the residuals. Our simulations show that the quantiles for this test can be approximated by Monte Carlo using i.i.d. data, although we found no theoretical result supporting this. Thus, we simulated 100000 samples from an ARMA(0,0) model and computed the empirical quantiles. For T = 50, 100 and 200, we got, for α = 10%, 0.920, 0.958 and 0.978. For 5%, we found 0.899, 0.947 and 0.973. A third approach, the Jarque and Bera (1987)(19), eq. (5), (JB) test was also investigated. Although developed in the linear regression context, this test has been recommended in the time series literature (see Cromwell et al. (1994)(6); Frances (1998)(10)). A summary of the results for these tests in the ARMA(1, 2) model is given in Table 4.2. Also appearing in this table are the levels of the test based on RK̂(1) using quantiles derived from (3.2).
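The Monte Carlo approximation of the WB quantiles just described can be sketched as follows. This is our own sketch: the normal scores use the usual Blom constants, which is one common reading of equation (9.68) of D’Agostino and Stephens (1986)(7), and the denominator is the raw sum of squared residuals as in the known-mean adaptation above; the exact constants of the original routine may differ.

```python
import numpy as np
from scipy.stats import norm

def wb_stat(x):
    """Weisberg-Bingham-type statistic, known-mean variant: squared
    correlation between the ordered sample and approximate normal scores,
    with denominator sum(x**2) = T*sigma2_hat instead of the mean-centred
    sum of squares."""
    x = np.sort(x)
    T = x.size
    m = norm.ppf((np.arange(1, T + 1) - 0.375) / (T + 0.25))  # Blom scores
    return np.dot(m, x) ** 2 / (np.dot(m, m) * np.dot(x, x))

def wb_quantiles(T, n_rep=100000, probs=(0.05, 0.10), seed=0):
    """Empirical lower quantiles of WB under i.i.d. N(0,1) samples (the
    ARMA(0,0) case); small values of WB indicate non-normality."""
    rng = np.random.default_rng(seed)
    stats = np.array([wb_stat(rng.standard_normal(T)) for _ in range(n_rep)])
    return {p: float(np.quantile(stats, p)) for p in probs}
```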

Overall, the best tests, according to the criterion of maintaining the proper level throughout the parameter space, are RK̂(2) followed by RK̂(1) and then R3, AD and WB. In general, the AD test yields distributions of p-values in between those of R3 and RK̂(1). More troublesome is the fact that this test, as well as the WB test, may vastly underestimate the intended level, as can be seen from the minimal p-values (last column of Table 4.2) encountered on the grids. Also, there appears to be a problem with the JB test, as the quantiles obtained from the χ²₂ approximation lead to gross errors. Further simulations indicate that the convergence to the χ²₂ is very slow. The JB statistic is a version of the Bowman and Shenton test statistic which, for i.i.d. data, has a notoriously slow convergence. The simulation results in Lutkepohl and Schneider (1989)(26) tend to show that this is also the case for AR(1) and AR(2) models. In view of this problem, we chose to drop the JB test from further investigation.


TAB. 4.2: Distribution (in % of the number of parameter points) of the empirical p-values (based on 10000 replications) of various tests for the ARMA(1,2) model. AD = Anderson-Darling, WB = Weisberg-Bingham, JB = Jarque-Bera and RK̂(1) = RK̂ with d = 1.

Test      T     α      I1     I2     I3     I4    I5    Min p-level
AD        50    5%    43.8   24.4   20.0    6.8   0        0.54
         100    5%    32.0   34.8   33.2    0     0        0.92
         200    5%    11.6   38.0   49.6    0.8   0        1.50
          50   10%    41.2   16.0   22.8   18.4   1.6      3.38
         100   10%    23.2   24.4   48.0    3.6   0.8      3.93
         200   10%     9.6   13.2   70.0    6.8   0.4      4.65
WB        50    5%    61.2   23.6   15.2    0     0        0.57
         100    5%    39.2   46.0   14.4    0.4   0        0.93
         200    5%    10.8   30.4   56.0    2.8   0        1.60
          50   10%    56.8   16.4   25.6    1.2   0        2.96
         100   10%    37.2   47.2   15.6    0     0        3.55
         200   10%    15.6   36.0   46.0    2.4   0        4.71
JB        50    5%    71.2   28.8    0      0     0        3.13
         100    5%     0.4   99.2    0.4    0     0        3.13
         200    5%     0     85.2   14.4    0.4   0        4.18
          50   10%   100      0      0      0     0        4.96
         100   10%   100      0      0      0     0        5.88
         200   10%    98.8    1.2    0      0     0        7.23
RK̂(1)     50    5%     0     58.0   27.6   14.4   0        3.52
         100    5%    10.0   48.8   39.2    2.0   0        3.32
         200    5%     8.4   33.6   56.4    1.6   0        3.39
          50   10%    43.6   20.8   26.0    9.6   0        4.69
         100   10%    26.0   24.8   48.0    1.2   0        4.81
         200   10%     8.8   16.4   70.4    4.4   0        5.79
4.2. Power

The second part of the simulation was designed to study the power of our tests and allow comparison with the competitors mentioned above. We restricted attention to i.i.d. innovations. We generated samples {Yt, t = 1,...,T} according to model (1) from various alternatives to the normal distribution. These alternatives were taken as the centered versions of the densities listed in Table V of Kallenberg and Ledwina (1997b)(21). They cover a large range of departures from the normal distribution in skewness, kurtosis and shape.

To generate ARMA(p, q) samples {Yt, t = 1,...,T} according to model (1) with non-Gaussian innovations, we used the random shock method (algorithms IA 1 with m = 50 and SA 1 with M = 200) of Burn (1987)(5). To allow a proper comparison of the various tests, we used for each model a set of parameters for which the p-values computed in the first part of the simulation were in I3 for all tests. More precisely, we took ARMA(2, 1): (ϕ, θ1) = (0.8, 0.1, 0.7), ARMA(1, 2): (ϕ1, θ) = (0.7, 0.4, 0.5) and ARMA(2, 2): (ϕ, θ) = (1.05, 0.4, 0.15, 0.85). Also, we took T = 50 (more complete simulations appear in Ducharme and Lafaye de Micheaux (2002)(8)). For each combination of model and alternative distribution, we generated 10000 samples and performed the various tests. From there, empirical powers were computed.
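Burn's algorithms IA 1 and SA 1 are not reproduced here; the sketch below only conveys the idea they formalize, namely simulating the ARMA recursion with centred (possibly non-Gaussian) innovations and discarding a warm-up segment so that the retained stretch is approximately stationary. This is a generic burn-in scheme, not Burn's (1987)(5) algorithms themselves, and the parameter signs are taken as printed above.

```python
import numpy as np

def simulate_arma(phi, theta, T, innov, burn=200):
    """Simulate Y_t = sum_i phi[i] Y_{t-1-i} + eps_t + sum_j theta[j] eps_{t-1-j}.

    innov(n) must return n centred innovations; the first `burn`
    observations are discarded as warm-up.
    """
    p, q = len(phi), len(theta)
    n = T + burn
    eps = innov(n)
    y = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = ar + eps[t] + ma
    return y[burn:]

# Example: an ARMA(1,2) sample of length T = 50 with centred chi-square(4) innovations.
rng = np.random.default_rng(1)
y = simulate_arma([0.7], [0.4, 0.5], T=50, innov=lambda n: rng.chisquare(4, n) - 4.0)
```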

Table 4.3 presents these empirical powers for the tests R3, RK̂(2) and WB when α = 10%. Similar results were obtained for α = 5%. The tests R3 and RK̂(2) behave similarly with, overall, RK̂(2) being slightly better. Both these tests generally dominate the others. The AD approach, not shown here, often yields a power that is much lower than that of these two tests, whereas WB generally lies somewhere in between. For i.i.d. data, the WB test, as a variant of the Shapiro-Wilk test, is considered among the best omnibus tests of normality. In ARMA situations, this does not seem to hold to the same degree.

We have also computed the power of the test based on RK̂(1). The tabulated results are not presented here for brevity. We found that, for T = 50 and symmetric alternatives, the test based on RK̂(1) yields slightly better power than RK̂(2). For asymmetric alternatives, the situation is reversed. But for T = 100, RK̂(2) is more powerful almost everywhere. This behavior of RK̂(1) is explained by the fact that for asymmetric alternatives, R1 yields little, sometimes trivial, power. Moreover, power as a function of K usually levels off at K = 3, and not infrequently at K = 2. This empirical observation is behind the rule of thumb stated in Section 3. Thus, to have good power, the selection rule with d = 1 must give K̂ ≥ 3, which may be difficult. Starting at d = 2 gives a better chance that K̂ ≥ 3 when necessary (a schematic version of such a selection rule is sketched below).
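For orientation only, here is a schematic Schwarz-type data-driven choice of K in the spirit of Ledwina (1994)(23). The exact penalty and the definition of the smooth-test statistics RK are those of Section 3, which this sketch does not reproduce; the dictionary `R` of statistics and the bounds `d`, `D` are hypothetical inputs.

```python
import math

def select_K(R, T, d=2, D=10):
    """Schwarz-type choice of K: maximize the penalized statistic
    R[K] - K*log(T) over K in {d, ..., D} (schematic only; the paper's
    own selection rule is defined in its Section 3)."""
    return max(range(d, D + 1), key=lambda K: R[K] - K * math.log(T))
```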

In view of the results of these simulations, we recommend the use of RK̂(2) for testing (2) when E(Yt) is known. The levels are stable over most of the parameter points and close to nominal for moderate samples. Moreover, the power is generally better than that of the other tests that have been recommended in the time series literature. Finally, the test is very easy to apply.


TAB. 4.3: Empirical power (based on 10000 replications with α = 10%) of various tests when T = 50. The part above the line in the middle of the table corresponds to symmetric alternatives while those below are skewed. The distributions are ordered according to increasing kurtosis. The ARMA(2,1) model has parameter (ϕ, θ1) = (0.8, 0.1, 0.7), the ARMA(1,2) model has parameter (ϕ1, θ) = (0.7, 0.4, 0.5) while the ARMA(2,2) model has parameter (ϕ, θ) = (1.05, 0.4, 0.15, 0.85).










T = 50               ARMA(2,1)                ARMA(1,2)                ARMA(2,2)
Alternatives       R3    RK̂(2)    WB        R3    RK̂(2)    WB        R3    RK̂(2)    WB
SB(0 ;0.5)       83.19  84.76  28.93     73.16  74.90  22.47     63.85  65.44  17.57
TU(1.5)          66.94  67.96  18.27     57.86  59.43  15.07     49.17  50.50  13.62
TU(0.7)          44.47  45.64  12.42     38.69  39.44  11.02     32.58  33.88  10.76
Logistic(1)      20.74  22.75  19.02     18.97  20.87  17.10     18.59  20.36  17.60
TU(10)           94.64  96.64  83.60     89.57  91.62  74.31     85.14  87.26  65.78
SC(0.05 ;3)      33.65  37.38  35.98     32.82  35.65  34.40     30.81  34.69  32.37
SC(0.2 ;5)       96.36  96.77  92.84     94.21  94.72  89.12     92.28  92.96  85.62
SC(0.05 ;5)      62.33  65.22  63.63     61.43  63.86  61.81     58.90  62.20  59.77
SC(0.05 ;7)      74.05  76.12  75.32     73.00  75.22  73.89     72.28  74.04  72.50
SU(0 ;1)         75.96  76.49  66.57     71.73  72.44  62.20     68.60  69.79  59.99
--------------------------------------------------------------------------------------
SB(0.533 ;0.5)   91.09  89.76  59.41     83.62  82.09  48.17     76.29  74.04  36.87
SB(1 ;1)         53.75  56.94  32.60     45.94  48.41  26.18     42.03  44.17  23.84
LC(0.2 ;3)       55.58  57.79  29.72     49.58  52.11  26.19     43.88  46.16  22.62
Weibull(2)       28.10  30.72  21.25     25.63  28.66  18.52     24.32  26.85  18.68
LC(0.1 ;3)       44.10  43.85  35.21     40.62  40.54  31.38     37.00  37.25  27.97
χ2 (df.=10)      41.41  45.84  34.72     37.80  41.98  31.09     35.04  38.91  29.66
LC(0.05 ;3)      29.50  31.25  28.28     28.05  29.86  26.18     25.91  28.32  24.17
LC(0.1 ;5)       96.10  96.00  95.07     93.40  93.03  90.44     91.04  89.90  86.06
SU(-1 ;2)        37.88  38.92  33.93     34.38  36.29  31.12     33.54  65.40  29.87
χ2 (df.=4)       76.13  80.54  69.78     71.19  75.76  63.30     65.96  70.30  57.70
LC(0.05 ;5)      81.48  83.98  84.41     78.07  80.38  80.80     75.48  77.71  77.58
LC(0.05 ;7)      94.26  94.74  96.28     94.05  94.73  96.39     93.44  94.24  95.88
SU(1 ;1)         96.26  96.15  93.98     94.18  94.17  91.39     93.14  93.17  90.14
LN(0 ;1)         99.52  99.68  99.24     98.74  98.85  98.02     97.89  98.16  96.83

5. An example

In the course of a study to forecast the amount of daily gas required, Shea (1987)(34) studied a bivariate time series of T = 366 points. The first component of this time series pertains to differences in daily temperature between successive days (τt); after an iterative process of fitting and diagnostic checking, he found that the following MA(4) model could be entertained:

$$\tau_t = \varepsilon_t + 0.07\,\varepsilon_{t-1} - 0.30\,\varepsilon_{t-2} - 0.15\,\varepsilon_{t-3} - 0.20\,\varepsilon_{t-4}.$$

The residual variance is 2.475. All these parameters are obtained by maximizing the Gaussian likelihood so that problem (2) is of some importance. Shea does not discuss the normality of the innovations in assessing the fit of this model but rather goes on to find a good model for the bivariate series based on an analysis of the residuals’ cross correlation matrix.

An application of our tests yields R3 = 22.85, with a p-value of 0.00004, while RK̂(2) = 22.77 (K̂ = 2) yields a p-value of 0.00003 according to (2). Thus, both tests strongly reject the null hypothesis (2). A complementary analysis helps in understanding which aspect of the Gaussian hypothesis is not supported by the data. We found R1 = 0.15 (p = 0.69) with a skewness coefficient of 0.13. Thus there is no reason to suspect an asymmetric distribution for the innovations. On the other hand, 9.3% of the absolute standardized residuals are greater than 2.5 and the kurtosis is 4.33. Thus, if the model entertained above is correct, the conclusion that emerges from the present analysis is that the τt series could have been generated from innovations with a symmetric distribution having fatter tails than the Gaussian.
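The complementary analysis quoted above reduces to a few summary statistics of the standardized residuals; a minimal sketch follows, where the residual vector `res` is assumed to have been extracted from the fitted MA(4) model (e.g. by the NAG routine used in the paper):

```python
import numpy as np

def residual_summary(res):
    """Skewness, kurtosis and share of large standardized residuals."""
    z = res / res.std()
    skewness = np.mean(z ** 3)             # 0 under normality
    kurtosis = np.mean(z ** 4)             # 3 under normality
    frac_large = np.mean(np.abs(z) > 2.5)  # about 1.2% under normality
    return skewness, kurtosis, frac_large
```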

ACKNOWLEDGMENTS
The authors would like to thank Dr. B.L. Shea for some insight on subroutine G13DCF of the NAG library and for providing them with the data set used in Section 5.

Appendix A

We show that (8) holds under H0. Assume p > 0 and q > 0. It suffices to show that

$$\frac{1}{T}\sum_{t=1}^{T}\frac{\partial}{\partial\sigma}L_k(U_t)\ \xrightarrow{\;P\;}\ E\left[\frac{\partial}{\partial\sigma}L_k(U_t)\right]=-\frac{1}{\sigma}\,b_k, \qquad (A.1.a)$$

$$\frac{1}{T}\sum_{t=1}^{T}\frac{\partial}{\partial\phi_1}L_k(U_t)\ \xrightarrow{\;P\;}\ E\left[\frac{\partial}{\partial\phi_1}L_k(U_t)\right]=0, \qquad (A.1.b)$$

$$\frac{1}{T}\sum_{t=1}^{T}\frac{\partial}{\partial\theta_1}L_k(U_t)\ \xrightarrow{\;P\;}\ E\left[\frac{\partial}{\partial\theta_1}L_k(U_t)\right]=0. \qquad (A.1.c)$$
First,
$$\frac{\partial}{\partial\sigma}L_k(U_t) = -\frac{2\varepsilon_t}{\sigma^2}\,\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right) L_k'(x)\Big|_{x=2\Phi(\varepsilon_t/\sigma)-1} = -\frac{\varepsilon_t}{\sigma^2}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right),\ \text{say}.$$

The law of large numbers yields (A.1.a). For (A.1.b), define, for $r \ge 0$,
$$B_{r-1} = \frac{\partial}{\partial\phi_1}\,\delta_r(\theta,\phi),$$
where, setting $\phi_0 = 1$ and $\gamma_0 = \theta_0 = 1$, we have
$$\delta_r(\theta,\phi) = \delta_r = \sum_{i=0}^{\min(r,p)} \phi_i\,\gamma_{r-i}, \qquad r \ge 0, \qquad (A.2)$$
$$\gamma_r = -\sum_{i=1}^{\min(r,q)} \gamma_{r-i}\,\theta_i, \qquad r \ge 1. \qquad (A.3)$$
Obviously $B_{r-1} = \gamma_{r-1}$ when $r \ge 1$. For $r \ge q$, from Brockwell and Davis (1991)(4), p. 107,
$$\gamma_r = \sum_{i=1}^{j}\sum_{n=0}^{r_i-1} c_{in}\, r^n\, \alpha_i^{-r}$$
for some constants $c_{in}$, where the $\alpha_i$'s are the $j$ distinct roots of $1+\theta_1 z+\cdots+\theta_q z^q$ and $r_i$ is the multiplicity of $\alpha_i$, $i = 1,\dots,j$. Thus, when $r \ge q+1$,
$$B_{r-1} = \sum_{i=1}^{j}\sum_{n=0}^{r_i-1} c_{in}\,(r-1)^n\,\alpha_i^{-r+1}. \qquad (A.4)$$
If $(X_t,\ t \in \mathbb{Z})$ is a weakly stationary process such that $\mathrm{Cov}(X_t,X_{t+h}) \to 0$ as $h \to \infty$, then $\bar{X}_T \xrightarrow{P} E(X_t)$. We apply this result with $X_t = \partial L_k(U_t)/\partial\phi_1$. From (1), we have
$$X_t = \frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right)\frac{\partial \varepsilon_t}{\partial\phi_1} = -\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right)\left[Y_{t-1} - \sum_{r=0}^{\infty}\frac{\partial\delta_r}{\partial\phi_1}\,\theta^{\mathsf T} Y^{(q)}_{t-1-r}\right]. \qquad (A.5)$$
Thus $E(X_t) = 0$. Moreover, $\mathrm{Var}(X_t) < \infty$ as shown in Appendix C, and it is seen by Lemma 1 that $\mathrm{Cov}(X_t,X_{t+h})$ depends only on $h$. Thus $(X_t,\ t \in \mathbb{Z})$ is stationary. We show that $\mathrm{Cov}(X_t,X_{t+h}) \to 0$ as $h \to \infty$. From (A.5), for $h$ large, $\mathrm{Cov}(X_t,X_{t+h}) = d_1\, E[w(\varepsilon_{t+h}/\sigma)]/\sigma$, where
$$d_1 = E\left\{\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right)\left[Y_{t-1} - \sum_{r=0}^{\infty}B_{r-1}\,\theta^{\mathsf T} Y^{(q)}_{t-1-r}\right]\left[Y_{t+h-1} - \sum_{r=0}^{\infty}B_{r-1}\,\theta^{\mathsf T} Y^{(q)}_{t+h-1-r}\right]\right\}. \qquad (A.6)$$
But $|d_1| \le |d_2| + \sum_{j=1}^{q}|\theta_j|\,(|d_{3j}| + |d_{4j}|) + \sum_{i=1}^{q}\sum_{j=1}^{q}|\theta_i\theta_j|\,|d_{5ij}|$, where
$$d_2 = E\left[\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right) Y_{t-1}\,Y_{t+h-1}\right], \qquad d_{3j} = \sum_{r=0}^{\infty} B_{r-1}\,E\left[\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right) Y_{t-r-j}\,Y_{t+h-1}\right],$$
$$d_{4j} = \sum_{r=0}^{\infty} B_{r-1}\,E\left[\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right) Y_{t-1}\,Y_{t+h-r-j}\right]$$
and
$$d_{5ij} = \sum_{r=0}^{\infty}\sum_{r'=1}^{\infty} B_{r-1}\, B_{r'-1}\,E\left[\frac{1}{\sigma}\,w\!\left(\frac{\varepsilon_t}{\sigma}\right) Y_{t-r-i}\,Y_{t+h-r'-j}\right].$$
It can be shown that $d_2$, $d_{3j}$, $d_{4j}$ and $d_{5ij} \to 0$ when $h \to \infty$. The proof for $d_{4j}$, which is typical, is sketched in Appendix D. This yields (A.1.b).

As for (A.1.c), let $A_r = \partial\delta_r(\theta,\phi)/\partial\theta_1$ and write $\gamma_r' = \partial\gamma_r/\partial\theta_1$. From (A.3), we obtain, for $r \ge q$, the system
$$\gamma_r' + \theta_1\gamma_{r-1}' + \cdots + \theta_q\gamma_{r-q}' = -\gamma_{r-1}, \qquad \gamma_r + \theta_1\gamma_{r-1} + \cdots + \theta_q\gamma_{r-q} = 0,$$
from which we find
$$0 = \sum_{j=0}^{q}\theta_j \sum_{i=0}^{q}\theta_i\,\gamma_{r-j-i+1}' = \sum_{h=0}^{2q} a_h\,\gamma_{r-h+1}', \qquad \text{where } a_h = \sum_{\substack{i+j=h \\ 0\le i,j\le q}} \theta_i\theta_j, \quad r \ge 2q+1.$$
Again from Brockwell and Davis (1991)(4), p. 107, we have, for some constants $d_{in}$,
$$\gamma_r' = \sum_{i=1}^{j}\sum_{n=0}^{s_i-1} d_{in}\, r^n\, \beta_i^{-r},$$
where the $\beta_i$'s are the $j$ distinct roots (with multiplicity $s_i$) of $1 + a_1 z + a_2 z^2 + \cdots + a_{2q} z^{2q}$. Now
$$\left(\sum_{i=0}^{q}\theta_i z^i\right)^{\!2} = \sum_{h=0}^{2q}\Big(\sum_{\substack{i+j=h \\ 0\le i,j\le q}}\theta_i\theta_j\Big) z^h = \sum_{h=0}^{2q} a_h z^h,$$
where $a_0 = \theta_0^2 = 1$. This shows that the roots of $1 + a_1 z + a_2 z^2 + \cdots + a_{2q} z^{2q}$ are exactly the same as those of $1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q$, apart from the multiplicity. Thus, we obtain
$$A_r = \sum_{l=0}^{p}\sum_{i=1}^{j}\sum_{n=0}^{s_i-1} d_{in}\,(r-l)^n\,\alpha_i^{-(r-l)}\,\phi_l, \qquad \text{for all } r \ge \max(2q,\, p). \qquad (A.7)$$
By the same argument, using $A_r$ of (A.7) instead of $B_{r-1}$ of (A.4), we get (A.1.c).

Appendix B

We show that $E(V_t) = 0$ and $\mathrm{Var}(V_t) = I_K - b_K b_K^{\mathsf T}/2$. In view of (1),
$$\frac{\partial}{\partial\phi}\log\left[\frac{1}{\sigma}\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right] = -\frac{\varepsilon_t}{\sigma^2}\left[Y^{(p)}_{t-1} - \sum_{r=0}^{\infty}\frac{\partial\delta_r}{\partial\phi}\,\theta^{\mathsf T} Y^{(q)}_{t-1-r}\right],$$
$$\frac{\partial}{\partial\theta}\log\left[\frac{1}{\sigma}\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right] = -\frac{\varepsilon_t}{\sigma^2}\left[\varepsilon^{(q)}_{t-1} - \sum_{r=0}^{\infty}\frac{\partial\delta_r}{\partial\theta}\,\theta^{\mathsf T} Y^{(q)}_{t-1-r}\right]$$
and
$$\frac{\partial}{\partial\sigma}\log\left[\frac{1}{\sigma}\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right] = \frac{1}{\sigma}\left[\left(\frac{\varepsilon_t}{\sigma}\right)^{\!2} - 1\right].$$
It follows that $E(V_t) = 0$ under $H_0$. Moreover, under $H_0$, $\mathrm{Var}(L_t) = I_K$. Thus,
$$\mathrm{Cov}\!\left(L_t,\ \frac{\partial}{\partial\beta}\log\left[\frac{1}{\sigma}\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right]^{\mathsf T}\right) = \left(0\ \cdots\ 0\ \ {-\tfrac{1}{\sigma}b_K}\right) = J_K^{\mathsf T}.$$
Finally,
$$\mathrm{Var}\!\left(\frac{\partial}{\partial\beta}\log\left[\frac{1}{\sigma}\varphi\!\left(\frac{\varepsilon_t}{\sigma}\right)\right]\right) = I_\beta = \begin{pmatrix} C & 0 \\ 0 & \dfrac{2}{\sigma^2} \end{pmatrix},$$
for some matrix $C$ whose exact expression is not needed. Thus
$$\mathrm{Var}(V_t) = I_K - J_K^{\mathsf T}\, I_\beta^{-1}\, J_K = I_K - \frac{b_K b_K^{\mathsf T}}{2}.$$

Appendix C

We show that $\mathrm{Var}(X_t) < \infty$. Without loss of generality, set $\sigma = 1$. This will be assumed here and in the next appendix. Since $Y_t$ is causal, we can write $Y_t = \sum_{j=0}^{\infty}\psi_j\,\varepsilon_{t-j}$ and, from (A.5),
$$\mathrm{Var}(X_t) = E\big[w(\varepsilon_t)^2\big]\; E\left[Y_{t-1} - \sum_{r=0}^{\infty}B_{r-1}\,\theta^{\mathsf T} Y^{(q)}_{t-1-r}\right]^2 = E\big[w(\varepsilon_t)^2\big]\; E\left[\sum_{h=1}^{\infty} d_h\,\varepsilon_{t-h}\right]^2,$$
where
$$d_h = \psi_{h-1} - \sum_{\substack{r+j+l=h \\ 1\le j\le q,\ 0\le r,l\le h-1}} \psi_l\,\gamma_{r-1}\,\theta_j.$$
We now need the following lemma.

Lemma 1. If the ARMA process (1) is causal and invertible, then $\sum_{h=1}^{\infty}|d_h| < \infty$.

PROOF.
From (A.3),
$$\sum_{\substack{r+j+l=h \\ 1\le j\le q,\ 0\le r,l\le h-1}} \psi_l\,\gamma_{r-1}\,\theta_j = \sum_{k=1}^{h}\psi_{h-k}\sum_{\substack{r+j=k \\ 1\le j\le q,\ 0\le r\le h-1}} \gamma_{r-1}\,\theta_j = -\sum_{k=1}^{h}\psi_{h-k}\,\gamma_{k-1}.$$
We also have
$$\sum_{h=1}^{\infty}\sum_{k=1}^{h}|\psi_{h-k}\,\gamma_{k-1}| = \sum_{h=0}^{\infty}\sum_{k=0}^{h}|\psi_{h-k}\,\gamma_{k}| = \sum_{k=0}^{\infty}|\gamma_k|\ \sum_{h=0}^{\infty}|\psi_h|.$$
Thus,
$$\sum_{h=1}^{\infty}|d_h| \le \sum_{h=1}^{\infty}|\psi_{h-1}| + \sum_{h=1}^{\infty}\sum_{k=1}^{h}|\psi_{h-k}\,\gamma_{k-1}| = \sum_{h=0}^{\infty}|\psi_h|\left(\sum_{k=0}^{\infty}|\gamma_k| + 1\right).$$
But, from Brockwell and Davis (1991)(4), p. 87, $\sum_{k=0}^{\infty}|\gamma_k|$ is finite. Since, under the assumptions of Theorem 1, $\sum_{j=0}^{\infty}|\psi_j| < \infty$, the lemma follows.

From this lemma, we conclude that $E\left[\sum_{h=1}^{\infty} d_h\,\varepsilon_{t-h}\right]^2 = \sum_{h=1}^{\infty} d_h^2 < \infty$. Since
$$E\big[w(\varepsilon_t)^2\big] = 4\int_{-\infty}^{\infty} \big[L_k'(2\Phi(x)-1)\big]^2\,\varphi^3(x)\,dx < \infty,$$
the result follows.

Appendix D

Here we sketch the proof that the typical element $d_{4j}$ of inequality (A.6) vanishes. From $Y_t = \sum_{j=0}^{\infty}\psi_j\,\varepsilon_{t-j}$ and the fact that the remainder of a convergent series converges to 0, we have
$$\begin{aligned}
\lim_{h\to\infty} |d_{4j}| &= \lim_{h\to\infty}\left|\sum_{r=0}^{\infty} B_{r-1}\,E\big[w(\varepsilon_t)\,Y_{t-1}\,Y_{t+h-r-j}\big]\right| \le \lim_{h\to\infty} \big|E[w(\varepsilon_t)]\big| \left|\sum_{r=0}^{h-j} B_{r-1}\sum_{a=0}^{\infty}\psi_a\,\psi_{a+h-j-r+1}\right| \\
&\le \big|E[w(\varepsilon_t)]\big| \lim_{h\to\infty}\left\{\left|\sum_{a=0}^{m-1}\psi_a\left[\sum_{r=0}^{a+h-j-m} B_{r-1}\,\psi_{a+h-j-r+1} + \sum_{r=a+h-j-m+1}^{h-j} B_{r-1}\,\psi_{a+h-j-r+1}\right]\right| + \left|\sum_{r=0}^{h-j} B_{r-1}\sum_{a=m}^{\infty}\psi_a\,\psi_{a+h-j-r+1}\right|\right\},
\end{aligned} \qquad (D.1)$$
where $m = \max\{p,\ q+1\}$. For the first term in the limit of (D.1), using the expression for $B_{r-1}$ in (A.4) and that of $\psi_{a+h-j-r+1}$ given in Brockwell and Davis (1991)(4), eq. (3.3.6), we have
$$\sum_{r=q+1}^{a+h-j-m} B_{r-1}\,\psi_{a+h-j-r+1} = \sum_{r=q+1}^{a+h-j-m}\left[\sum_{b=1}^{k}\sum_{l=0}^{r_b-1} c_{bl}\, r^l\,\alpha_b^{-r}\right]\left[\sum_{b'=1}^{k'}\sum_{l'=0}^{r_{b'}-1} \alpha_{b'l'}\,(a+h-j-r+1)^{l'}\,\xi_{b'}^{-(a+h-j-r+1)}\right]$$
$$\le \sum_{b=1}^{k}\sum_{l=0}^{r_b-1}\sum_{b'=1}^{k'}\sum_{l'=0}^{r_{b'}-1}\sum_{d=0}^{l'}\binom{l'}{d}\,\big|c_{bl}\,\alpha_{b'l'}\big|\;\big|\xi_{b'}\big|^{-(a+h-j+1)}(a+h-j+1)^{l'-d} \times \left|\sum_{r=q+1}^{a+h-j-m} r^{l+d}\,\alpha_b^{-r}\,\xi_{b'}^{\,r}\right|.$$
If $|\xi_{b'}| < |\alpha_b|$, the term in braces tends to 0 as $h \to \infty$. Otherwise, let $|\alpha_b| = 1+\epsilon_1 < |\xi_{b'}| = 1+\epsilon_2$ with $\epsilon_1,\ \epsilon_2 > 0$. Then
$$(a+h-j+1)^{l'-d}\,|\xi_{b'}|^{-(a+h-j+1)}\left|\sum_{r=q+1}^{a+h-j-m} r^{l+d}\,\alpha_b^{-r}\,\xi_{b'}^{\,r}\right| \le (a+h-j+1)^{l'-d}\,|\xi_{b'}|^{-(a+h-j+1)}\sum_{r=0}^{a+h-j+1} r^{l+d}\left(\frac{|\xi_{b'}|}{|\alpha_b|}\right)^{\!r}. \qquad (D.2)$$
For all $\epsilon > 0$, there exist constants $C$ and $C'$ such that the left-hand side of (D.2) is bounded above by
$$C\,(a+h-j+1)^{l'-d}\,|\xi_{b'}|^{-(a+h-j+1)}\ \frac{\left(\frac{|\xi_{b'}|}{|\alpha_b|}+\epsilon\right)^{h+a-j+2}-1}{\left(\frac{|\xi_{b'}|}{|\alpha_b|}+\epsilon\right)-1} \;\le\; C'\,(a+h-j+1)^{l'-d}\left(\frac{\frac{|\xi_{b'}|}{|\alpha_b|}+\epsilon}{|\xi_{b'}|}\right)^{\!a+h-j+2}. \qquad (D.3)$$
In (D.3), take $\epsilon > 0$ smaller than $\epsilon_1(1+\epsilon_2)/(1+\epsilon_1)$. Then the right-hand side of (D.3) converges to 0 as $h \to \infty$. This shows that the first term in the limit of (D.1) converges to 0. It follows that the second term also converges to 0. As for the last term in the limit, a similar argument yields that all terms on the right-hand side of (D.1) converge to 0, so that $d_{4j} \to 0$.

Bibliography

[1]     Andel, J., 1997. On residual analysis for time series models. Kybernetika 33(2), 161-170.

[2]     Beiser, A., 1985. Distributions of b1 and b2 for autoregressive errors. Ph.D. thesis, Boston University.

[3]     Brockett, P. L., Hinich, M. J., Patterson, D., 1988. Bispectral-based tests for the detection of Gaussianity and linearity in time series. J. Amer. Statist. Assoc. 83, 657-664.

[4]     Brockwell, P. J., Davis, R. A., 1991. Time series: Theory and Methods, 2nd Edition. Springer-Verlag, New York.

[5]     Burn, D.A., 1987. Simulation of stationary time series. Proceedings of the 1987 Winter Simulation Conference, 289-294.

[6]     Cromwell, J. B., Labys, W. C., Terraza, M., 1994. Univariate tests for time-series models. Sage Publications Inc, Thousand Oaks, California.

[7]     D’Agostino, R. B., Stephens, M. A., 1986. Goodness-of-fit techniques. Statistics: Textbooks and Monographs, 68, New York: Marcel Dekker.

[8]     Ducharme, G.R., Lafaye de Micheaux, P., 2002. Goodness-of-fit tests of normality for the innovations in ARMA models. Tech. rep., Technical report #02-02, Université Montpellier II.

[9]     Epps, T. W., 1987. Testing that a stationary time series is Gaussian. Ann. Statist. 15(4), 1683-1698.

[10]     Frances, P.H., 1998. Time series models for business and economic forecasting. Cambridge University Press, Cambridge.

[11]     Gasser, T., 1975. Goodness-of-fit tests for correlated data. Biometrika 62, 563-570.

[12]     Gouriéroux, C., Monfort, A., 1995. Séries temporelles et modèles dynamiques, 2nd Edition. Economica.

[13]     Granger, C. W. J., 1976. Tendency towards normality of linear combinations of random variables. Metrika 23(4), 237-248.

[14]     Henze, N., 1997. Do components of smooth tests of fit have diagnostic properties? Metrika 45, 121-130.

[15]     Heuts, R.M.J., Rens, S., 1986. Testing normality when observations satisfy a certain low order ARMA-scheme. Computat. Statist. Quarterly 1, 49-60.

[16]     Hinich, M. J., 1982. Testing for Gaussianity and Linearity of a stationary time series. J. Time Ser. Anal. 3(3), 169-176.

[17]     Hipel, K. W., McLeod, A. I., 1994. Time series modelling of water resources and environmental systems. Elsevier Science Publishing Co., New York; North-Holland Publishing Co., Amsterdam.

[18]     Janic-Wroblewska, A., Ledwina, T., 2000. Data driven rank test for two-sample problem. Scand. J. Statist. 27, 281-298.

[19]     Jarque, C.M., Bera, A.K., 1987. A test for normality of observations and regression residuals. Internat. Statist. Review 55(2), 163-172.

[20]     Kallenberg, W.C.M., Ledwina, T., 1997a. Data driven smooth tests for composite hypotheses: comparison of powers. J. Statist. Comput. Simul. 59(2), 101-121.

[21]     Kallenberg, W.C.M., Ledwina, T., 1997b. Data-driven smooth tests when the hypothesis is composite. J. Amer. Statist. Assoc. 92(439), 1094-1104.

[22]     Koul, H. L., Stute, W., 1999. Nonparametric model checks for time series. Ann. Statist. 27(1), 204–236.

[23]     Ledwina, T., 1994. Data-driven version of Neyman’s smooth test of fit. J. Amer. Statist. Assoc. 89(427), 1000-1005.

[24]     Lee, S., Na, S., 2001. On the Bickel-Rosenblatt test for first-order autoregressive models. Statist. Probab. Lett. 56(1), 23-35.

[25]     Lomnicki, Z.A., 1961. Tests for departure from normality in the case of linear stochastic processes. Metrika 4, 37-62.

[26]     Lutkepohl, H., Schneider, W., 1989. Testing for normality of autoregressive time series. Comput. Statist. Quarterly 2, 151-168.

[27]     Moore, D. S., 1982. The effect of dependence on chi squared tests of fit. Ann. Statist. 10(4), 1163-1171.

[28]     Neyman, J., 1937. Smooth test for goodness of fit. Skand. Aktuar. 20, 149-199.

[29]     Ojeda, R., Cardoso, J.F., Moulines, E., 1997. Asymptotically invariant gaussianity test for causal invertible time series. Proc. of IEEE international conference on Acoustics, Speech and Signal Processing 5, 3713-3716.

[30]     Pierce, D. A., Gray, R. J., 1985. Goodness-of-fit tests for censored survival data. Ann. Statist. 13(2), 552-563.

[31]     Rayner, J.C.W., Best, D.J., 1989. Smooth Tests of Goodness-of-Fit. Oxford: Oxford University Press.

[32]     Sansone, G., 1959. Orthogonal functions. New York: Interscience.

[33]     Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6(2), 461-464.

[34]     Shea, B. L., 1987. Estimation of multivariate time series. J. Time Ser. Anal. 8(1), 95-109.

[35]     Weisberg, S., Bingham, C., 1975. An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics 17, 133-134.

Chapter 3
A multivariate empirical characteristic function test of independence with normal marginals

This article has been submitted to the Journal of Multivariate Analysis.

As is customary in this discipline, the authors are listed in alphabetical order.

The main contributions of Pierre Lafaye de Micheaux to this article are the following:

A multivariate empirical characteristic function test of independence with normal marginals
Martin Bilodeau and Pierre Lafaye de Micheaux
Département de mathématiques et de statistique, Université de Montréal, Canada
Abstract
This paper proposes a semi-parametric test of independence (or serial independence) between marginal vectors each of which is normally distributed, without assuming the joint normality of these marginal vectors. The test statistic is a Cramér-von Mises functional of a process defined from the empirical characteristic function. This process is defined similarly to the process of Ghoudi et al. (2001) built from the empirical distribution function and used to test for independence between univariate marginal variables. The test statistic can be represented as a V-statistic. It is consistent against any form of dependence. The weak convergence of the process is derived. The asymptotic distribution of the Cramér-von Mises functionals is approximated by the Cornish-Fisher expansion, using a recursive formula for cumulants, and by numerical evaluation of the eigenvalues in the inversion formula. The test statistic is finally compared with Wilks’ statistic for testing the parametric hypothesis of independence in the one-way MANOVA model with random effects.

Key words : Characteristic function, Independence, Multivariate Analysis, Serial independence, Stochastic processes

1991 MSC: 62H15, 62M99

1. Introduction

Different characterizations have led to various tests of independence. Let $p \ge 1$ be a fixed integer. Consider a partitioned random vector $\varepsilon = (\varepsilon^{(1)},\dots,\varepsilon^{(p)})$ made up of $p$ $q$-dimensional subvectors and a corresponding partition $t = (t^{(1)},\dots,t^{(p)})$ of any fixed vector $t$. Independence of the subvectors may be characterized with the joint distribution function or the joint characteristic function as
$$K_p(t) = \prod_{k=1}^{p} K^{(k)}(t^{(k)}), \quad (1.1) \qquad\qquad C_p(t) = \prod_{k=1}^{p} C^{(k)}(t^{(k)}), \quad (1.2)$$

where $K_p$ and $C_p$ are, respectively, the joint distribution function and the joint characteristic function. The marginal versions are $K^{(k)}$ and $C^{(k)}$, $k = 1,\dots,p$. In the univariate setting ($q = 1$), Blum et al. (1961)(3) proposed an empirical process based on (1.1), whereas Csorgo (1981a)(8) defined a similar process based on (1.2). Feuerverger (1993)(16) proposed an empirical characteristic function version of the Blum et al. (1961)(3) test statistic. He pointed out difficulties with dimensions above 2.

Recently, in the univariate setting, Ghoudi et al. (2001)(18) introduced a new process based on their novel characterization of independence, which is now presented. This characterization for $p = 3$ is implicit in the paper of Blum et al. (1961)(3). For any $A \subseteq I_p = \{1,\dots,p\}$ and any $t \in \mathbb{R}^p$, let
$$\mu_A(t) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, K_p(t_B) \prod_{j \in A \setminus B} K^{(j)}(t^{(j)}).$$
The notation $|A|$ stands for the cardinality of the set $A$ and the convention $\prod_{\emptyset} = 1$ is adopted. The vector $t_B$ is used to make a selection of components of $t$ according to the set $B$,
$$(t_B)^{(i)} = \begin{cases} t^{(i)}, & i \in B; \\ \infty, & i \in I_p \setminus B. \end{cases}$$
Independence can then be characterized: $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$ are independent if and only if $\mu_A \equiv 0$ for all $A \subseteq I_p$ satisfying $|A| > 1$. Cramér-von Mises type functionals of an associated process then lead them to a non-parametric test of independence in the non-serial or serial situation. The interest in their process resides in the simple form of the covariance, which is expressed as a product of covariance functions of the Brownian bridge.

In the multivariate setting ($q \ge 1$), the present paper proposes tests of independence, when subvectors or marginals are normally distributed, built from a process relying on a similar independence characterization based on characteristic functions. Note that the subvectors are not assumed to be jointly multinormal, in which case independence could be tested parametrically with covariances using likelihood ratio tests. Namely, the marginals $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$ are independent if and only if $\mu_A \equiv 0$ for all $A \subseteq I_p$, $|A| > 1$. Here,
$$\mu_A(t) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, C_p(t_B) \prod_{j \in A \setminus B} C^{(j)}(t^{(j)}),$$
where
$$(t_B)^{(i)} = \begin{cases} t^{(i)}, & i \in B; \\ 0, & i \in I_p \setminus B. \end{cases}$$
Normality of the marginals will often be approximately satisfied. For example, data analyses of regression models on which Box-Cox transformations are done in a first stage are common in practice. One should bear in mind, however, that the asymptotics of the tests of independence proposed here would not consider the Box-Cox transformation as data dependent but as a fixed transformation. The goodness-of-fit tests of normality after a data-dependent Box-Cox transformation of Chen et al. (2002) constitute a result in this direction; it will not be pursued here for tests of independence.

The non-serial and serial problems are considered. It is shown that the asymptotic distribution of the proposed process is the same in both cases under the null hypothesis of independence. Moreover, it is established that the estimation of the unknown mean vector and positive definite covariance matrix of the normal marginals does not affect the asymptotic distribution of the process. The proposed Cramér-von Mises type of test statistic is related to V-statistics, for which de Wet and Randles (1987)(13) studied the effect of estimating unknown parameters. The norm on the Euclidean space $\mathbb{R}^m$ will be denoted $\|\cdot\|$, whereas $|\cdot|$ will be the norm on the complex field $\mathbb{C}$. We also let $C(\mathbb{R}^{pq}, \mathbb{C})$ be the space of continuous functions from $\mathbb{R}^{pq}$ to $\mathbb{C}$.

2. Testing independence: the non-serial situation

2.1. The case of known parameters

Let $\varepsilon = (\varepsilon^{(1)},\dots,\varepsilon^{(p)}) \in \mathbb{R}^{pq}$ denote a partition into $p$ subvectors and let $\varepsilon_1,\dots,\varepsilon_n$ be an i.i.d. sample of such $(pq)$-dimensional random vectors. Suppose that the subvectors of the random vectors $\varepsilon_i$ all have the same $N_q(0,I)$ normal distribution, with characteristic function $\varphi$. The problem is that of testing the independence of the marginals, that is, the independence of $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$. This non-serial problem with known parameters is of very limited practical importance. However, it serves as a prototype on which subsequent results are based.

Following Ghoudi et al. (2001)(18), for any $A \subseteq I_p = \{1,\dots,p\}$ and any $t = (t^{(1)},\dots,t^{(p)}) \in \mathbb{R}^{pq}$, let
$$R_{n,A}(t) = \sqrt{n}\sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \varphi_{n,p}(t_B) \prod_{i \in A \setminus B} \varphi(t^{(i)}), \quad (2.1)$$
where
$$\varphi_{n,p}(t) = \frac{1}{n}\sum_{j=1}^{n} \exp(i\langle t, \varepsilon_j\rangle) \quad (2.2)$$
is the empirical characteristic function of the sample. The notation $\langle\cdot,\cdot\rangle$ is for the usual inner product between vectors.
The asymptotic behaviour of these processes is stated next: for different subsets $A$, the associated processes are asymptotically independent, each process being asymptotically Gaussian with a covariance function of a particularly simple form. Specifically, the covariance function is a product of covariance functions of the type encountered by Feuerverger and Mureika (1977)(17), Csorgo (1981a)(8) or Marcus (1981)(26) for the empirical characteristic function process. Another process, defined by Csorgo (1985)(10), has a covariance of a more complicated structure.

Theorem 2.1. If $\varepsilon_1^{(1)},\dots,\varepsilon_1^{(p)}$ are independent, the collection of processes $\{R_{n,A} : |A| > 1\}$ converges in $C(\mathbb{R}^{pq}, \mathbb{C})$ to independent zero-mean complex Gaussian processes $\{R_A : |A| > 1\}$ having covariance function given by
$$C_A(s,t) = \prod_{k \in A}\left[\varphi(t^{(k)} - s^{(k)}) - \varphi(t^{(k)})\,\varphi(s^{(k)})\right] \quad (2.3)$$
and pseudo-covariance function given by
$$\bar{C}_A(s,t) = E[R_A(s)\,R_A(t)] = C_A(s,t). \quad (2.4)$$

The multinomial formula (Ghoudi et al. (2001)(18)) yields the equivalent representation
$$R_{n,A}(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\prod_{k \in A}\left[\exp(i\langle t^{(k)}, \varepsilon_j^{(k)}\rangle) - \varphi(t^{(k)})\right]. \quad (2.5)$$
This i.i.d. average representation is used in the proof of Theorem 2.1.

2.2. The case of unknown parameters

The context is the same as in the preceding subsection, except that the components of the random vectors $\varepsilon_i$ in the sample now all have the same $N_q(\mu,\Sigma)$ normal distribution, where $\mu$ and $\Sigma$, positive definite, are unknown. The problem again is that of testing for independence of the marginals, that is, the independence of the components $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$.

First, define the standardized residual vectors $e_j^{(k)} = S^{-1/2}(\varepsilon_j^{(k)} - \bar{\varepsilon})$, where
$$S = \frac{1}{np}\sum_{j=1}^{n}\sum_{k=1}^{p}(\varepsilon_j^{(k)} - \bar{\varepsilon})(\varepsilon_j^{(k)} - \bar{\varepsilon})^{\mathsf T} \qquad\text{and}\qquad \bar{\varepsilon} = \frac{1}{np}\sum_{j=1}^{n}\sum_{k=1}^{p}\varepsilon_j^{(k)}$$
are, respectively, the sample covariance matrix and the sample mean. Also, let $\bar{\varepsilon}^{(k)} = \frac{1}{n}\sum_{j=1}^{n}\varepsilon_j^{(k)}$ be the sample mean of the $k$th subvectors.

The underlying process is the same, apart from the unknown parameters which are replaced by their sample estimates. The plug-in process is thus
$$\hat{R}_{n,A}(t) = \sqrt{n}\sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \hat{\varphi}_{n,p}(t_B) \prod_{i \in A \setminus B} \varphi(t^{(i)}), \quad (2.6)$$
where
$$\hat{\varphi}_{n,p}(t) = \frac{1}{n}\sum_{j=1}^{n} \exp(i\langle t, e_j\rangle) \quad (2.7)$$
is the empirical characteristic function based on the standardized residuals $e_j = (e_j^{(1)},\dots,e_j^{(p)}) \in \mathbb{R}^{pq}$. The asymptotic behaviour of these processes is stated next, the main conclusion being that the estimation of the unknown parameters does not affect the asymptotic distribution.

Theorem 2.2. If $\varepsilon_1^{(1)},\dots,\varepsilon_1^{(p)}$ are independent, the processes $\{\hat{R}_{n,A} : |A| > 1\}$ converge in $C(\mathbb{R}^{pq}, \mathbb{C})$ to independent zero-mean complex Gaussian processes $\{R_A : |A| > 1\}$ having covariance and pseudo-covariance functions respectively given by $C_A(s,t)$ and $\bar{C}_A(s,t)$ in (2.3) and (2.4).

The same multinomial formula (Ghoudi et al. (2001)(18)) yields the representation
$$\hat{R}_{n,A}(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\prod_{k \in A}\left[\exp(i\langle t^{(k)}, e_j^{(k)}\rangle) - \varphi(t^{(k)})\right]. \quad (2.8)$$
The Cramér-von Mises test statistic proposed is $nT_{n,b,A}$, where, for a given subset $A$,
$$T_{n,b,A} = \frac{1}{n}\int |\hat{R}_{n,A}(t)|^2\,\phi_b(t)\,dt, \quad (2.9)$$
where $\phi_b$ is the $N_{pq}(0, b^2 I)$ density, which acts as a weighting function. The multinomial representation and this appropriate weighting allow this test statistic to be computed explicitly as
$$\frac{1}{n^2}\sum_{l=1}^{n}\sum_{l'=1}^{n}\prod_{k \in A}\left[\exp\!\left(-\frac{b^2}{2}\|e_l^{(k)} - e_{l'}^{(k)}\|^2\right) - (b^2+1)^{-q/2}\exp\!\left(-\frac{1}{2}\,\frac{b^2}{b^2+1}\,\|e_l^{(k)}\|^2\right) - (b^2+1)^{-q/2}\exp\!\left(-\frac{1}{2}\,\frac{b^2}{b^2+1}\,\|e_{l'}^{(k)}\|^2\right) + (2b^2+1)^{-q/2}\right]. \quad (2.10)\text{-}(2.12)$$

Since squared Mahalanobis-type statistics are affine invariant, it follows that $T_{n,b,A}$ is affine invariant. Thus, the asymptotic distribution of this statistic does not depend on unknown parameters.
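As an illustration, the closed form (2.10)-(2.12) transcribes directly into code. The following Python sketch is ours; the array `e` of shape (n, p, q) holding the standardized residuals $e_j^{(k)}$ and the 0-based index set `A` are assumptions of the sketch, not objects from the paper.

```python
import numpy as np

def T_nbA(e, A, b):
    """Cramer-von Mises statistic T_{n,b,A} via the closed form (2.10)-(2.12).

    e : array of shape (n, p, q) of standardized residuals e_j^(k)
    A : subset of {0, ..., p-1} with |A| > 1 (0-based marginal indices)
    b : bandwidth of the N(0, b^2 I) weighting density
    """
    n, _, q = e.shape
    c1 = (b * b + 1.0) ** (-q / 2.0)
    c2 = (2.0 * b * b + 1.0) ** (-q / 2.0)
    prod = np.ones((n, n))
    for k in A:
        x = e[:, k, :]                                     # (n, q)
        d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # ||e_l - e_l'||^2
        g = np.exp(-0.5 * b * b / (b * b + 1.0) * (x ** 2).sum(-1))
        prod *= (np.exp(-0.5 * b * b * d2)
                 - c1 * g[:, None] - c1 * g[None, :] + c2)
    return prod.sum() / n ** 2
```

The double sum over the n² pairs is fully vectorized, so no numerical integration is needed.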

It should be noted that the functional (2.9) defining this test statistic is not continuous; it is not even defined on $C(\mathbb{R}^{pq}, \mathbb{C})$ but only on the subset of square-integrable functions with respect to the measure $\phi_b(t)\,dt$. Thus, the continuity theorem as in Billingsley (1968)(2) cannot be invoked. In order to obtain the asymptotic distribution of this functional, the following generalization of Theorem 3.3 of Kellermeier (1980)(23), on a uniform integrability condition, is proposed. Let $\mathbb{B}_j^{pq}$ be the ball of radius $j$ centered at zero in $\mathbb{R}^{pq}$.

Theorem 2.3. Let $y_n$ and $y$ be random elements of $C(\mathbb{R}^{pq}, \mathbb{C})^2$ such that $y_n \xrightarrow{D} y$ on all compact balls. Let $f : \mathbb{C}^2 \to \mathbb{R}$ be a continuous function and let $G$ be a probability measure on $\mathbb{R}^{pq}$. Define $w_n = \int f(y_n(t))\,dG(t)$ and $w = \int f(y(t))\,dG(t)$. Suppose that $w_n$ and $w$ are well defined with probability one. Moreover, suppose that there exists $\alpha \ge 1$ such that, for all $\epsilon > 0$,
$$\lim_{j\to\infty}\ \limsup_{n\to\infty}\ \int_{\mathbb{R}^{pq}\setminus\mathbb{B}_j^{pq}} E\big|f(y_n(t))\big|^{\alpha}\,dG(t) = 0. \quad (2.13)$$
Then $w_n \xrightarrow{D} w$ as $n \to \infty$.

Using Theorem 2.3, the joint convergence of the Cramér-von Mises functionals can be established.

Theorem 2.4.
$$\int \left(|\hat{R}_{n,A}(t)|^2,\ |\hat{R}_{n,B}(t)|^2\right)\phi_b(t)\,dt \;\xrightarrow{D}\; \int \left(|R_A(t)|^2,\ |R_B(t)|^2\right)\phi_b(t)\,dt,$$
where the integrals are computed componentwise.

All possible subsets $A$ can then be simultaneously accounted for by combining the test statistics as in
$$S_n = n \sum_{|A|>1} T_{n,b,A} \quad (2.14)$$
or
$$M_n = n \max_{|A|>1} T_{n,b,A}. \quad (2.15)$$

2.3. Relation to V-statistics

The statistic $T_{n,b,A}$ is in fact a V-statistic as in de Wet and Randles (1987)(13). It can be represented as
$$T_{n,b,A} = \frac{1}{n^2}\sum_{l=1}^{n}\sum_{l'=1}^{n} h(\varepsilon_l, \varepsilon_{l'};\, \hat{\lambda}_n), \quad (2.16)$$
where $\hat{\lambda}_n = (\bar{\varepsilon}, S)$ consistently estimates the true parameter $\lambda = (0, I)$. The function $h$ at an arbitrary $\gamma = (\mu, \Sigma)$ is defined as
$$h(\varepsilon_l, \varepsilon_{l'};\, \gamma) = \int g(\varepsilon_l, t;\, \gamma)\, g(\varepsilon_{l'}, t;\, \gamma)\,\phi_b(t)\,dt,$$
where, from elementary properties of integrals of odd functions, the function $g$ can be taken real-valued:
$$g(\varepsilon_l, t;\, \gamma) = \prod_{k \in A}\left[\cos\big(\langle t^{(k)}, \Sigma^{-1/2}(\varepsilon_l^{(k)} - \mu)\rangle\big) + \sin\big(\langle t^{(k)}, \Sigma^{-1/2}(\varepsilon_l^{(k)} - \mu)\rangle\big) - \varphi(t^{(k)})\right]. \quad (2.17)\text{-}(2.18)$$
Letting $\mu(t;\, \gamma) = E_{(0,I)}\, g(\varepsilon_l, t;\, \gamma)$, it is seen that $T_{n,b,A}$ is a V-statistic which falls into the case I situation. De Wet and Randles (1987)(13) refer to case I when all first-order partial derivatives of $\mu(t;\, \gamma)$ evaluated at the true parameter $\gamma = \lambda$ vanish. Otherwise, they refer to case II. This is case I here since only $A$'s such that $|A| > 1$ are considered. Thus, the asymptotic distribution of $T_{n,b,A}$ is the same whether one uses $\hat{\lambda}_n$ or $\lambda$ in (2.16). It is not clear, however, how this argument would apply to the joint distribution of $T_{n,b,A}$ and $T_{n,b,B}$. The proof of Theorem 2.4 in the Appendix does not use de Wet and Randles (1987)(13).

For subsets $A$ with $|A| = 1$, the statistic $T_{n,b,A}$ reduces to the statistic used by Baringhaus and Henze (1988)(1) and Henze and Zirkler (1990)(20) to test normality of a given marginal. They showed that the asymptotic distribution is affected by the estimation of the unknown parameters, by establishing case II of de Wet and Randles (1987)(13). Henze and Wagner (1997)(19) treated the same problem with an approach based on stochastic processes.

2.4. Consistency

Consider the alternatives whereby $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$ are distributed as $N_q(\mu,\Sigma)$ but are not independent. Then $S_n \to \infty$ and $M_n \to \infty$. Thus, the test statistics $S_n$ and $M_n$ in (2.14)-(2.15) are consistent against any such alternatives.

The argument to establish consistency is rather trivial, as in Baringhaus and Henze (1988)(1). Recall that $C_p(\cdot)$ is the joint characteristic function of $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$. This argument consists of the following almost sure convergence:
$$T_{n,b,A} = \int \Big| \sum_{B \subseteq A} (-1)^{|A \setminus B|} \exp\!\big(-i\langle \textstyle\sum_{k \in B} t^{(k)},\, S^{-1/2}\bar{\varepsilon}\rangle\big)\; \frac{1}{n}\sum_{j=1}^{n}\exp\!\big(i\textstyle\sum_{k \in B}\langle t^{(k)},\, S^{-1/2}\varepsilon_j^{(k)}\rangle\big) \prod_{i \in A \setminus B}\varphi(t^{(i)}) \Big|^2 \phi_b(t)\,dt$$
$$\xrightarrow{\;a.s.\;}\ \int \Big| \sum_{B \subseteq A} (-1)^{|A \setminus B|} \exp\!\big(-i\langle \textstyle\sum_{k \in B} t^{(k)},\, \Sigma^{-1/2}\mu\rangle\big)\; C_p\big((I_p \otimes \Sigma^{-1/2})\, t_B\big) \prod_{i \in A \setminus B}\varphi(t^{(i)}) \Big|^2 \phi_b(t)\,dt, \quad (2.19)\text{-}(2.22)$$
which equals 0 for all $A$, $|A| > 1$, if and only if $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$ are independent $N_q(\mu,\Sigma)$. Therefore, if $\varepsilon^{(1)},\dots,\varepsilon^{(p)}$ are dependent $N_q(\mu,\Sigma)$, there exists an $A$ such that $nT_{n,b,A} \to \infty$, which suffices to have $S_n \to \infty$ and $M_n \to \infty$.

3. Testing independence: the serial situation

3.1. The case of known parameters

Let $u_1, u_2, \dots$ be a stationary sequence of random vectors $u_i$ distributed as $N_q(0,I)$. It is desired to verify whether the $u_i$'s are independent. Introduce the partitioned random vectors $\varepsilon_i = (u_i,\dots,u_{i+p-1}) \in \mathbb{R}^{pq}$, $i = 1,\dots,n-p+1$. Also, let $R_{n,A}(t)$ be as in (2.1) with the slight modification $\varphi_{n,p}(t) = \frac{1}{n}\sum_{j=1}^{n-p+1} \exp(i\langle t, \varepsilon_j\rangle)$. The main result related to the asymptotic distribution is that the $m$-dependence introduced by the overlapping of the $u_i$'s does not affect the asymptotic distribution. It is the same as in the non-serial case.

Theorem 3.1. If the $u_i$'s are independent, the collection of processes $\{R_{n,A} : |A| > 1\}$ converges in $C(\mathbb{R}^{pq}, \mathbb{C})$ to independent zero-mean complex Gaussian processes $\{R_A : |A| > 1\}$ having covariance and pseudo-covariance functions respectively given by $C_A(s,t)$ and $\bar{C}_A(s,t)$ in (2.3) and (2.4).

As in (2.5), the multinomial formula (Ghoudi et al. (2001)(18)) yields
$$R_{n,A}(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n-p+1}\prod_{k \in A}\left[\exp(i\langle t^{(k)}, \varepsilon_j^{(k)}\rangle) - \varphi(t^{(k)})\right], \quad (3.1)$$
which is useful in the proof of Theorem 3.1.

3.2. The case of unknown parameters

The context is the same as in the preceding section, but here the $u_i$'s all have the same $N_q(\mu,\Sigma)$ normal distribution, where $\mu$ and $\Sigma$, positive definite, are assumed unknown. Again, we want to test whether the $u_i$'s are independent. To this aim, define the random vectors $\varepsilon_i = (u_i,\dots,u_{i+p-1}) \in \mathbb{R}^{pq}$ and $e_i = (\hat{u}_i,\dots,\hat{u}_{i+p-1}) \in \mathbb{R}^{pq}$, $i = 1,\dots,n-p+1$. Also, define the standardized residuals $\hat{u}_i = S^{-1/2}(u_i - \bar{u})$ with the sample covariance matrix $S = \frac{1}{n}\sum_{j=1}^{n}(u_j - \bar{u})(u_j - \bar{u})^{\mathsf T}$ and the sample mean $\bar{u} = \frac{1}{n}\sum_{j=1}^{n} u_j$. Now, let
$$\hat{R}_{n,A}(t) = \sqrt{n}\sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \hat{\varphi}_{n,p}(t_B) \prod_{i \in A \setminus B} \varphi(t^{(i)}), \quad (3.2)$$
where $\hat{\varphi}_{n,p}(t) = \frac{1}{n}\sum_{j=1}^{n-p+1} \exp(i\langle t, e_j\rangle)$. The asymptotic behaviour of these processes is stated next. The main conclusion is that the estimation of the unknown parameters does not affect the asymptotic distribution.

Theorem 3.2. If the $u_i$'s are independent, the processes $\{\hat{R}_{n,A} : |A| > 1\}$ converge in $C(\mathbb{R}^{pq}, \mathbb{C})$ to independent zero-mean complex Gaussian processes $\{R_A : |A| > 1\}$ having covariance and pseudo-covariance functions respectively given by $C_A(s,t)$ and $\bar{C}_A(s,t)$ in (2.3) and (2.4).

Note that the multinomial formula yields
$$\hat{R}_{n,A}(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n-p+1}\prod_{k \in A}\left[\exp(i\langle t^{(k)}, e_j^{(k)}\rangle) - \varphi(t^{(k)})\right], \quad (3.3)$$
and so the Cramér-von Mises test statistic
$$T_{n,b,A} = \frac{1}{n}\int |\hat{R}_{n,A}(t)|^2\,\phi_b(t)\,dt$$
can be computed as
$$\frac{1}{n^2}\sum_{l=1}^{n-p+1}\sum_{l'=1}^{n-p+1}\prod_{k \in A}\left[\exp\!\left(-\frac{b^2}{2}\|e_l^{(k)} - e_{l'}^{(k)}\|^2\right) - (b^2+1)^{-q/2}\exp\!\left(-\frac{1}{2}\,\frac{b^2}{b^2+1}\,\|e_l^{(k)}\|^2\right) - (b^2+1)^{-q/2}\exp\!\left(-\frac{1}{2}\,\frac{b^2}{b^2+1}\,\|e_{l'}^{(k)}\|^2\right) + (2b^2+1)^{-q/2}\right]. \quad (3.4)\text{-}(3.6)$$

This representation shows that $T_{n,b,A}$ is affine invariant. Here again, Theorem 2.3 can be used to obtain

Theorem 3.3.
$$\int \left(|\hat{R}_{n,A}(t)|^2,\ |\hat{R}_{n,B}(t)|^2\right)\phi_b(t)\,dt \;\xrightarrow{D}\; \int \left(|R_A(t)|^2,\ |R_B(t)|^2\right)\phi_b(t)\,dt, \quad (3.7)$$
where the integrals are computed componentwise.

In the serial situation, a subset $A$ and its translate $A + k$ essentially lead to the same statistic $T_{n,b,A}$. Hence, when considering these statistics, only $A$'s such that $1 \in A$ need to be considered. The same statistics (2.14) or (2.15) can be used to perform the statistical test:

$$S_n = n \sum_{|A|>1,\ 1 \in A} T_{n,b,A}, \qquad M_n = n \max_{|A|>1,\ 1 \in A} T_{n,b,A}.$$

4. Properties of the limiting processes

This section shows how to compute the critical values of the Cramér-von Mises variable $T_{b,A} \equiv \int |R_A(t)|^2\,\phi_b(t)\,dt$. This can be achieved either by computing its cumulants and then applying the Cornish-Fisher asymptotic expansion (see Lee and Lin (1992); Lee and Lee (1992)(24)) or by inversion of the characteristic function (see Imhof (1961)(21), an improved version of this algorithm introduced by Davies (1973, 1980)(12), or Deheuvels and Martynov (1996)(14)) after evaluation of the eigenvalues of $C_A$.

The Cramér-von Mises test statistic in (2.9) can also be written in terms of a real process:
$$T_{n,b,A} = \frac{1}{n}\int W_{n,A}^2(t)\,\phi_b(t)\,dt,$$
where
$$W_{n,A}(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\prod_{k \in A}\left[\cos(\langle t^{(k)}, e_j^{(k)}\rangle) + \sin(\langle t^{(k)}, e_j^{(k)}\rangle) - \varphi(t^{(k)})\right]$$
is a real process which converges to a real Gaussian process with the same covariance function $C_A(s,t)$ as in (2.3). Thus the usual Karhunen-Loève expansion holds.

Let $k = |A|$. It is well known that
$$\int_{\mathbb{R}^{pq}} |R_A(t)|^2\,\phi_b(t)\,dt \;\equiv\; T_{b,A} = \sum_{(i_1,\dots,i_k)\in\mathbb{N}^k} \lambda_{(i_1,\dots,i_k)}\, |Z_{(i_1,\dots,i_k)}|^2, \quad (4.1)$$
where the $Z_{(i_1,\dots,i_k)}$ are i.i.d. standard $CN(0,1)$ complex random variables. Also, it is easy to show that $\lambda_{(i_1,\dots,i_k)} = \prod_{l=1}^{k}\lambda_{i_l}$, where the $\lambda_j$'s are the eigenvalues of the integral operator $O$ defined by
$$O(f)(s) = \int_{\mathbb{R}^q} f(t)\,K(s,t)\,\phi_b(t)\,dt \quad (4.2)$$
with
$$K(s,t) = \exp\!\left(-\tfrac{1}{2}\|s-t\|^2\right) - \exp\!\left(-\tfrac{1}{2}(\|s\|^2 + \|t\|^2)\right). \quad (4.3)$$
That is to say, the problem is to solve, in $\lambda$ (and $f$), the linear homogeneous Fredholm integral equation of the second kind
$$\lambda f(s) = \int_{\mathbb{R}^q} f(t)\,K(s,t)\,\phi_b(t)\,dt. \quad (4.4)$$
See Conway (1985)(5) for an introduction to integral operators.

It does not seem possible to solve (4.4) explicitely, but one can compute its eigenvalues using a relation as

qf(ν)eν2 dν = j=1NB jf(νj)

(4.5)
where the parameters Bj and νj = (νj,1,,νj,q), j = 1,,N could be respectively the coefficients and the points of a cubature formula (CF), or also could be obtained by a Monte-Carlo experiment, in which case Bj = 1 N and νj N(0,I), j = 1,,N. A good rule of the thumb is to use a cubature formula when b is small, for example less than one, otherwise use Monte Carlo method.

We used the following formulas : the nth degree Gauss quadrature formula when q = 1, the fifteenth degree CF E2r2 : 15 1 (see (Stroud, 1971, p. 326)(29)) when q = 2 and the seventh degree CF Eqr2 : 7 2 appearing in (Stroud, 1971, p. 319)(29) for q 3. This last formula contains an error, see Stroud (1967)(28) for details.
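To illustrate how a relation like (4.5) turns (4.4) into a finite eigenproblem, here is a Nyström-type Python sketch of the Monte Carlo variant. Sampling the points from N(0, b²I) with equal weights 1/N is our adaptation of (4.5) to the weight ϕ_b; it is not the cubature formulas of Stroud actually used above.

```python
import numpy as np

def eigenvalues_mc(q, b, N=2000, seed=0):
    """Approximate the leading eigenvalues of the operator (4.2) by
    discretizing (4.4) at Monte Carlo points nu_j ~ N(0, b^2 I) with
    equal weights 1/N (Nystrom-type discretization)."""
    rng = np.random.default_rng(seed)
    nu = b * rng.standard_normal((N, q))
    d2 = ((nu[:, None, :] - nu[None, :, :]) ** 2).sum(-1)
    s2 = (nu ** 2).sum(-1)
    # kernel (4.3) evaluated on the point cloud
    K = np.exp(-0.5 * d2) - np.exp(-0.5 * (s2[:, None] + s2[None, :]))
    # with equal weights the Nystrom matrix K/N is symmetric
    return np.sort(np.linalg.eigvalsh(K / N))[::-1]
```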

Moreover, one can notice that all the cumulants of $T_{b,A}$ can be computed explicitly. In fact, the $m$-th cumulant $\kappa_{m,A}(b)$ of $T_{b,A}$ in (4.1) is given by
$$\kappa_{m,A}(b) = (m-1)!\left[\int_{\mathbb{R}^q} K^{(m)}(x,x)\,\phi_b(x)\,dx\right]^{|A|}, \quad (4.6)$$
where
$$\phi_b(x) = (2\pi)^{-q/2}\, b^{-q}\exp\!\left(-\tfrac{1}{2}\|x\|^2/b^2\right) \quad (4.7)$$
and where $K^{(1)}(x,y) = K(x,y)$ and $K^{(m)}(x,y) = \int_{\mathbb{R}^q} K^{(m-1)}(x,z)\,K(z,y)\,\phi_b(z)\,dz$, $m \ge 2$.

Define
$$I_{x,y}^{(1)}(\alpha,\beta,\gamma) = \exp\!\left(\alpha\|x\|^2 + \beta\|y\|^2 + \gamma\langle x,y\rangle\right), \quad (4.8)$$
$$I_{x,y}^{(m)}(\alpha,\beta,\gamma) = \int_{\mathbb{R}^q} I_{x,z}^{(m-1)}(\alpha,\beta,\gamma)\,K(z,y)\,\phi_b(z)\,dz, \qquad m \ge 2. \quad (4.9)$$
One can show that
$$K^{(m)}(x,x) = I_{x,x}^{(m)}\!\left(-\tfrac{1}{2},-\tfrac{1}{2},\,1\right) - I_{x,x}^{(m)}\!\left(-\tfrac{1}{2},-\tfrac{1}{2},\,0\right) \quad (4.10)$$
and the recurrence relation
$$I_{x,x}^{(m)}(\alpha,\beta,\gamma) = \chi_\beta\left[I_{x,x}^{(m-1)}\!\left(\alpha + \frac{\gamma^2}{4c},\ \frac{1}{4c} - \frac{1}{2},\ \frac{2\gamma}{4c}\right) - I_{x,x}^{(m-1)}\!\left(\alpha + \frac{\gamma^2}{4c},\ -\frac{1}{2},\ 0\right)\right],$$
where $\chi_\beta = (2\pi)^{-q/2}\, b^{-q}\int_{\mathbb{R}^q} e^{-c\|z\|^2}\,dz = (1 + b^2 - 2b^2\beta)^{-q/2}$ and $\dfrac{1}{4c} = \dfrac{b^2}{2 + 2b^2 - 4b^2\beta}$. Thus, one can express $K^{(m)}(x,x)$ in terms of $I_{x,x}^{(1)}$ and then use the relation
$$\int_{\mathbb{R}^q} I_{x,x}^{(1)}(\alpha,\beta,\gamma)\,\phi_b(x)\,dx = \left[1 - 2b^2(\alpha+\beta+\gamma)\right]^{-q/2} \quad (4.11)$$
to obtain all the cumulants recursively. Note that this permits a double-check of the preceding computation of the eigenvalues through the relation
$$\kappa_{m,A}(b) = (m-1)!\left[\sum_{j=1}^{\infty}\lambda_j^m\right]^{|A|}. \quad (4.12)$$
Note also that one only needs to compute the cumulants $\kappa_m(b)$ for the case $|A| = 1$, since
$$\kappa_{m,A}(b) = \kappa_m(b)^{|A|}\,\big[(m-1)!\big]^{1-|A|}. \quad (4.13)$$
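The recursion (4.8)-(4.11) is mechanical once $K^{(m)}(x,x)$ is represented as a signed combination of terms $I^{(1)}_{x,x}(\alpha,\beta,\gamma)$. The following Python sketch (ours, under the formulas above) does exactly that; for example, it reproduces $\kappa_1 = 1 - (1+2b^2)^{-q/2}$ when $|A| = 1$.

```python
from math import factorial

def expand_I(alpha, beta, gamma, m, b, q):
    """Write I^{(m)}_{x,x}(alpha,beta,gamma) as a signed sum of
    I^{(1)}_{x,x} terms, via the recurrence below (4.10)."""
    if m == 1:
        return [(1.0, alpha, beta, gamma)]
    chi = (1.0 + b*b - 2.0*b*b*beta) ** (-q / 2.0)
    inv4c = b*b / (2.0 + 2.0*b*b - 4.0*b*b*beta)
    a_new = alpha + gamma*gamma*inv4c
    out = []
    for sign, bb, gg in ((+1.0, inv4c - 0.5, 2.0*gamma*inv4c), (-1.0, -0.5, 0.0)):
        for c0, a1, b1, g1 in expand_I(a_new, bb, gg, m - 1, b, q):
            out.append((sign*chi*c0, a1, b1, g1))
    return out

def cumulant(m, b, q, card_A=1):
    """m-th cumulant kappa_{m,A}(b) of T_{b,A}, via (4.6), (4.10), (4.11)."""
    total = 0.0
    # K^{(m)}(x,x) = I^{(m)}(-1/2,-1/2,1) - I^{(m)}(-1/2,-1/2,0), cf. (4.10)
    for sign, g0 in ((+1.0, 1.0), (-1.0, 0.0)):
        for c0, a1, b1, g1 in expand_I(-0.5, -0.5, g0, m, b, q):
            # integrate each I^{(1)} term against phi_b, cf. (4.11)
            total += sign * c0 * (1.0 - 2.0*b*b*(a1 + b1 + g1)) ** (-q / 2.0)
    return factorial(m - 1) * total ** card_A
```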
The CFs used here are not the only ones available to obtain estimates $\hat{\lambda}_j$ of the $\lambda_j$'s. A good choice is one that minimizes
$$\left|\kappa_m(b) - (m-1)!\sum_{j}\hat{\lambda}_j^m\right|. \quad (4.14)$$
See Cools (1999)(6) and Cools and Rabinowitz (1993)(7) for a comprehensive list of such formulas.
Table 4.1 provides an approximation of the cut-off values obtained from the Cornish-Fisher asymptotic expansion based on the first six cumulants, for b = 0.1.
TAB. 4.1: Critical values of the distribution of Tb,A for b = 0.1.

                     q = 2                                  q = 3
1−α       k = 2      k = 3      k = 4        k = 2      k = 3      k = 4
0.900   0.000733  1.230e-05  2.122e-07     0.00137  3.347e-05  8.707e-07
0.905   0.000745  1.244e-05  2.140e-07     0.00138  3.370e-05  8.741e-07
0.910   0.000758  1.260e-05  2.159e-07     0.00140  3.393e-05  8.777e-07
0.915   0.000771  1.276e-05  2.179e-07     0.00142  3.418e-05  8.815e-07
0.920   0.000785  1.293e-05  2.200e-07     0.00143  3.444e-05  8.854e-07
0.925   0.000800  1.311e-05  2.222e-07     0.00145  3.472e-05  8.896e-07
0.930   0.000815  1.330e-05  2.246e-07     0.00147  3.501e-05  8.939e-07
0.935   0.000832  1.350e-05  2.271e-07     0.00149  3.531e-05  8.986e-07
0.940   0.000850  1.372e-05  2.298e-07     0.00152  3.564e-05  9.035e-07
0.945   0.000870  1.395e-05  2.326e-07     0.00154  3.600e-05  9.088e-07
0.950   0.000891  1.421e-05  2.357e-07     0.00157  3.638e-05  9.145e-07
0.955   0.000915  1.449e-05  2.392e-07     0.00160  3.680e-05  9.207e-07
0.960   0.000941  1.480e-05  2.429e-07     0.00163  3.726e-05  9.275e-07
0.965   0.000971  1.514e-05  2.471e-07     0.00167  3.777e-05  9.351e-07
0.970   0.001005  1.554e-05  2.519e-07     0.00171  3.836e-05  9.437e-07
0.975   0.001045  1.601e-05  2.575e-07     0.00176  3.904e-05  9.537e-07
0.980   0.001093  1.657e-05  2.643e-07     0.00182  3.985e-05  9.656e-07
0.985   0.001156  1.729e-05  2.728e-07     0.00190  4.088e-05  9.805e-07
0.990   0.001243  1.828e-05  2.845e-07     0.00200  4.229e-05  1.000e-06
0.995   0.001390  1.994e-05  3.039e-07     0.00217  4.460e-05  1.033e-06

TAB. 4.2: Distribution of T*b,A for b = 0.1.

                       q = 2                                       q = 3
  x     P[T*b,2 ≤ x] P[T*b,3 ≤ x] P[T*b,4 ≤ x]     P[T*b,2 ≤ x] P[T*b,3 ≤ x] P[T*b,4 ≤ x]
 0.0       0.593        0.566        0.546            0.562        0.536        0.520
 0.2       0.665        0.640        0.622            0.637        0.612        0.598
 0.4       0.725        0.705        0.691            0.702        0.683        0.671
 0.6       0.777        0.761        0.751            0.759        0.745        0.737
 0.8       0.819        0.809        0.802            0.807        0.798        0.794
 1.0       0.854        0.848        0.845            0.848        0.843        0.842
 1.2       0.883        0.881        0.880            0.880        0.880        0.881
 1.4       0.906        0.907        0.908            0.907        0.910        0.913
 1.6       0.925        0.928        0.931            0.928        0.933        0.937
 1.8       0.941        0.944        0.948            0.945        0.951        0.955
 2.0       0.953        0.957        0.961            0.958        0.964        0.969
 2.2       0.963        0.967        0.972            0.968        0.974        0.979
 2.4       0.970        0.975        0.979            0.976        0.982        0.986
 2.6       0.977        0.981        0.985            0.982        0.987        0.990
 2.8       0.982        0.986        0.989            0.986        0.991        0.994
 3.0       0.985        0.989        0.992            0.990        0.994        0.996
 3.2       0.988        0.992        0.994            0.992        0.996        0.997
 3.4       0.991        0.994        0.996            0.994        0.997        0.998
 3.6       0.993        0.995        0.997            0.996        0.998        0.999
 3.8       0.994        0.996        0.998            0.997        0.998        0.999
 4.0       0.995        0.997        0.998            0.997        0.999        0.999
As in Ghoudi et al. (2001)(18), define T*b,A to be the standardised version of Tb,A. Table 4.2 provides the distribution function of this statistic for some values of |A| and q, with b = 0.1, as approximated by Davies' technique. Besides, a C++ program, which computes any cut-off value given the nominal level and vice versa, is available from the authors. In Table 4.3, one can find the empirical percentage points of nTn,b,A (n = 20, 50, 100; b = 0.1, 0.5, 1.0, 3.0; α = 0.1, 0.05) based on N = 10000 Monte Carlo replications, in the non-serial case.
TAB. 4.3: Empirical percentage points of nTn,b,A based on N = 10000 Monte Carlo replications: non-serial case.

                             q = 2                             q = 3
1−α     b      n     k = 2     k = 3      k = 4       k = 2     k = 3      k = 4
0.90   0.1    20   0.000718  1.249e-05  2.371e-07    0.00134  3.460e-05  1.033e-06
              50   0.000718  1.253e-05  2.358e-07    0.00136  3.420e-05  9.748e-07
             100   0.000733  1.229e-05  2.242e-07    0.00137  3.389e-05  9.430e-07
       0.5    20   0.170     0.049      0.016        0.269    0.110      0.050
              50   0.167     0.0481     0.0156       0.266    0.107      0.0484
             100   0.168     0.0471     0.0149       0.266    0.106      0.0472
       1.0    20   0.561     0.333      0.221        0.727    0.555      0.450
              50   0.558     0.327      0.213        0.724    0.547      0.440
             100   0.559     0.326      0.209        0.723    0.544      0.436
       3.0    20   0.938     0.864      0.820        0.981    0.969      0.958
              50   0.940     0.860      0.814        0.983    0.967      0.955
             100   0.938     0.858      0.811        0.983    0.966      0.954
0.95   0.1    20   0.000854  1.485e-05  3.036e-07    0.00151  3.849e-05  1.212e-06
              50   0.000883  1.465e-05  2.840e-07    0.00156  3.751e-05  1.103e-06
             100   0.000874  1.429e-05  2.653e-07    0.00155  3.724e-05  1.027e-06
       0.5    20   0.191     0.0539     0.0184       0.289    0.114      0.053
              50   0.192     0.0520     0.0168       0.289    0.111      0.0505
             100   0.191     0.0513     0.0158       0.289    0.110      0.0486
       1.0    20   0.606     0.345      0.228        0.749    0.562      0.456
              50   0.603     0.338      0.218        0.748    0.553      0.445
             100   0.593     0.336      0.213        0.745    0.549      0.439
       3.0    20   0.955     0.868      0.823        0.984    0.970      0.959
              50   0.954     0.863      0.816        0.987    0.967      0.956
             100   0.953     0.861      0.813        0.986    0.966      0.955

In Table 4.4, one can find the empirical percentage points of nTn,b,A (n = 20, 50, 100; b = 0.1, 0.5, 1.0, 3.0; α = 0.1, 0.05) based on N = 10000 Monte Carlo replications, in the serial case, for p = 4.
TAB. 4.4: Empirical percentage points of nTn,b,A based on N = 10000 Monte Carlo replications for p = 4: serial case.

                             q = 2                             q = 3
1−α     b      n     k = 2     k = 3      k = 4       k = 2     k = 3      k = 4
0.90   0.1    20   0.000712  1.174e-05  2.139e-07    0.00132  3.384e-05  9.931e-07
              50   0.000721  1.205e-05  2.288e-07    0.00135  3.357e-05  9.867e-07
             100   0.000716  1.217e-05  2.261e-07    0.00138  3.335e-05  9.530e-07
       0.5    20   0.173     0.0504     0.0171       0.276    0.1160     0.0547
              50   0.168     0.0484     0.0162       0.269    0.109      0.050
             100   0.170     0.0476     0.0155       0.268    0.107      0.048
       1.0    20   0.567     0.353      0.241        0.737    0.586      0.486
              50   0.561     0.334      0.223        0.728    0.559      0.458
             100   0.561     0.328      0.216        0.727    0.55       0.447
       3.0    20   0.937     0.880      0.842        0.984    0.975      0.966
              50   0.939     0.867      0.825        0.983    0.970      0.960
             100   0.939     0.862      0.818        0.983    0.968      0.957
0.95   0.1    20   0.000841  1.414e-05  2.734e-07    0.00148  3.804e-05  1.176e-06
              50   0.000897  1.412e-05  2.780e-07    0.00154  3.772e-05  1.128e-06
             100   0.000861  1.435e-05  2.737e-07    0.00154  3.688e-05  1.071e-06
       0.5    20   0.196     0.0556     0.0193       0.296    0.122      0.0587
              50   0.192     0.0528     0.0179       0.292    0.114      0.0539
             100   0.189     0.0517     0.0168       0.290    0.111      0.0509
       1.0    20   0.615     0.367      0.255        0.759    0.598      0.498
              50   0.604     0.346      0.231        0.753    0.567      0.466
             100   0.602     0.340      0.222        0.751    0.557      0.452
       3.0    20   0.955     0.886      0.849        0.985    0.976      0.968
              50   0.953     0.871      0.829        0.987    0.971      0.961
             100   0.952     0.865      0.822        0.987    0.969      0.958

5. One-way MANOVA model with random effects

The one-way linear model with random effects
$$\varepsilon_i^{(j)} = \mu + \alpha_i + \delta_i^{(j)}, \qquad i = 1,\dots,n;\ j = 1,\dots,p,$$
where the $\alpha_i \sim N_q(0,\psi)$ and the $\delta_i^{(j)} \sim N_q(0,\Sigma)$ are all mutually independent, provides a joint normal model for the non-serial case. This means that, in this variance component model,
$$\varepsilon_i = (\varepsilon_i^{(1)},\dots,\varepsilon_i^{(p)}) \sim N_{pq}\big(1_p \otimes \mu,\ (I_p \otimes \Sigma) + (1_p 1_p^{\mathsf T}) \otimes \psi\big), \qquad i = 1,\dots,n,$$
are i.i.d. The test of independence amounts to the parametric test of the hypothesis $H_0 : \psi = 0$. The MANOVA decomposition
$$\sum_{i=1}^{n}\sum_{j=1}^{p}(\varepsilon_i^{(j)} - \bar{\varepsilon}_{\cdot\cdot})(\varepsilon_i^{(j)} - \bar{\varepsilon}_{\cdot\cdot})^{\mathsf T} = \sum_{i=1}^{n}\sum_{j=1}^{p}(\varepsilon_i^{(j)} - \bar{\varepsilon}_{i\cdot})(\varepsilon_i^{(j)} - \bar{\varepsilon}_{i\cdot})^{\mathsf T} + p\sum_{i=1}^{n}(\bar{\varepsilon}_{i\cdot} - \bar{\varepsilon}_{\cdot\cdot})(\bar{\varepsilon}_{i\cdot} - \bar{\varepsilon}_{\cdot\cdot})^{\mathsf T} = E + H \quad (5.1)\text{-}(5.3)$$
(see Rencher (2002)(27)) leads to the usual MANOVA table. A dot means averaging over the corresponding index.



TAB. 5.1: MANOVA table

Source    Sum of squares   Degrees of freedom   Expected sum of squares   Null distribution
Between   H                νH = n − 1           (n − 1)Σ + p(n − 1)ψ      Wq(n − 1, Σ)
Within    E                νE = n(p − 1)        n(p − 1)Σ                 Wq(n(p − 1), Σ)
Total     E + H

The test of $H_0 : \psi = 0$ is usually done with Wilks' statistic
$$\Lambda = \frac{\det E}{\det(E+H)} = \prod_{k=1}^{q}\left[1 + \lambda_k(E^{-1}H)\right]^{-1},$$
where the $\lambda_k(E^{-1}H)$ are the eigenvalues of $E^{-1}H$. The null distribution of $\Lambda$ is the $\Lambda_{q,\nu_H,\nu_E}$ distribution. Tables of exact critical points for $\Lambda$ are available but, for $q = 2$, the relation
$$\frac{(\nu_E - 1)}{\nu_H}\;\frac{(1 - \Lambda^{1/2})}{\Lambda^{1/2}} \;\sim\; F_{2\nu_H,\, 2(\nu_E - 1)}$$
holds.
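For q = 2, the F relation above therefore gives an exact test; a minimal Python sketch (the matrices E and H are assumed to have been computed from the decomposition (5.1)-(5.3)):

```python
import numpy as np
from scipy.stats import f as f_dist

def wilks_test_q2(E, H, n, p):
    """Wilks test of H0: psi = 0 for q = 2, via the exact F relation."""
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    nu_H, nu_E = n - 1, n * (p - 1)
    F = (nu_E - 1) / nu_H * (1 - np.sqrt(lam)) / np.sqrt(lam)
    p_value = f_dist.sf(F, 2 * nu_H, 2 * (nu_E - 1))
    return lam, F, p_value
```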


TAB. 5.2: Empirical power of nTn,b,A and Wilks test based on N = 10000 Monte Carlo replications for p = 2, q = 2, μ = 0, Σ = γI2 and Ψ = θI2.