| Wednesday, June 20 | Thursday, June 21 | Friday, June 22 |
|---|---|---|
| 08:30 - 09:15 Welcome | | |
| 09:15 - 09:30 Opening session | | |
| 09:30 - 11:00 P. Bühlmann | 09:00 - 10:30 P. Bühlmann | 09:00 - 10:30 S. Mallat |
| 11:00 - 11:30 Coffee break | 10:30 - 11:00 Coffee break | 10:30 - 11:00 Coffee break |
| 11:30 - 12:30 M. Hoffmann | 11:00 - 12:00 A. Samson | 11:00 - 12:30 Y. Ingster |
| 12:30 - 14:30 Lunch | 12:00 - 14:00 Lunch | 12:30 - 14:00 Lunch |
| 14:30 - 16:00 Y. Ingster | 14:00 - 15:30 S. Mallat | |
| 16:00 - 16:30 Coffee break | 15:30 - 16:00 Coffee break | |
| 16:30 - 17:30 P. Mathé | 16:00 - 17:00 C. Lacour | |
| 17:30 - 18:30 C. Marteau | 17:00 - 18:00 M. Lerasle | |
| 20:00 Gala dinner | Fête de la musique | |
Detailed Program
Wednesday, June 20
09:30 - 11:00 P. Bühlmann
Causal inference in high dimensions I
Understanding cause-effect relationships between variables is of great interest in many fields of science. An ambitious but highly desirable goal is to infer causal effects from observational data, obtained by observing a system of interest without subjecting it to interventions. This would make it possible to circumvent severe experimental constraints or to substantially lower experimental costs. Our main motivation for studying this goal comes from applications in biology.
We first introduce the main concepts of causal inference, graphical models, and identifiability of causal effects. We then focus on high-dimensional estimation based on multiple testing in directed graphical models (i.e., the PC-algorithm), highlight exciting possibilities, and emphasize fundamental limitations. In view of the latter, statistical modeling needs to be complemented with experimental validation: we discuss this in the context of molecular biology for yeast (Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana.
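The skeleton phase of the PC-algorithm mentioned above can be sketched in a few lines. This is a toy illustration, not the speaker's implementation (production versions such as the pcalg R package handle larger conditioning sets and edge orientation); the test here is a Fisher z-test on (partial) correlations, which is valid under Gaussianity:

```python
import numpy as np
from itertools import combinations

def partial_corr(R, i, j, S):
    """Partial correlation of variables i and j given the set S,
    computed from the full correlation matrix R."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(R[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pc_skeleton(X, z_crit=2.576, max_cond=1):
    """Skeleton phase of the PC-algorithm: delete the edge i-j as soon as
    some conditioning set S makes i and j conditionally independent,
    judged by a Fisher z-test (z_crit = 2.576 is the two-sided 1% critical
    value of the standard normal)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    edges = {frozenset((i, j)) for i, j in combinations(range(p), 2)}
    for size in range(max_cond + 1):
        for i, j in combinations(range(p), 2):
            if frozenset((i, j)) not in edges:
                continue
            others = [k for k in range(p) if k not in (i, j)]
            for S in combinations(others, size):
                r = partial_corr(R, i, j, S)
                z = 0.5 * np.log((1 + r) / (1 - r))   # Fisher transform
                if np.sqrt(n - size - 3) * abs(z) < z_crit:
                    edges.discard(frozenset((i, j)))  # independence accepted
                    break
    return edges
```

On data generated from a chain X1 → X2 → X3, the skeleton keeps the edges 1-2 and 2-3 and removes the edge 1-3 once X2 is conditioned on.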
11:30 - 12:30 M. Hoffmann
Statistical inference for structured populations driven by transport-fragmentation equations
We investigate inference in simple models that describe the evolution (in size or age) of a population of bacteria across scales. The size of the system evolves according to a transport-fragmentation equation: each individual grows with a given transport rate and splits into two offspring, according to a binary fragmentation process with unknown division rate that depends on its size. Macroscopically, the system is well approximated by a PDE, and statistical inference transfers into a nonlinear inverse problem. Microscopically, a more accurate description is given by a stochastic piecewise deterministic Markov process, which allows for other methods of inference, at the cost of introducing stochastic dependencies. We will discuss and present some very simple results on the inference of the parameters of the system across scales. Real data analysis is conducted on E. coli experiments. This is ongoing joint work with M. Doumic (INRIA and Paris 6), N. Krell (Rennes 1) and L. Robert (INRA).
14:30 - 16:00 Y. Ingster
Estimation and detection for ellipsoids: application to high-variable functions I
Many classes of high-variable functions of interest correspond to infinite-dimensional ellipsoids. We discuss the problems of estimation and detection of infinite-dimensional vectors belonging to ellipsoids, and we show how the efficiency in these problems is connected with the asymptotics of the "count functions" of the ellipsoid coefficients. We describe a new method for studying the asymptotics of count functions, based on probabilistic machinery.
16:30 - 17:30 P. Mathé
Regularization of statistical inverse problems in Hilbert space
We review linear inverse problems of the form $y^{\sigma} = A x + \sigma \xi$ in Hilbert space, i.e., when the operator $A\colon X \to Y$ acts between Hilbert spaces. The data $y^{\sigma}$ are noisy, the noise $\xi$ is assumed to be Gaussian white noise, and $\sigma$ denotes the noise level. The stable reconstruction of $x$ from noisy data requires 'regularization'. This can be achieved either by 'discretization' or by classical 'regularization schemes'. In general, the solution is sought within a parametric family, say $x_{\alpha}^{\sigma},\ \alpha >0$, where $\alpha$ is a regularization parameter. The quality of $x_{\alpha}^{\sigma}$ strongly depends on a correct choice of $\alpha$; in statistics this is related to 'model selection'. The survey [2] highlights such problems from a statistical perspective.
We will discuss two types of model selection. First we review the Lepski parameter choice under discretization. However, if the operator $A$ is a Hilbert--Schmidt operator, then the (symmetrized) data $z^{\sigma} := A^{\ast}y^{\sigma}$ belong to the Hilbert space $X$ almost surely, and the discrepancy $\|A^{\ast}( A x_{\alpha}^{\sigma} - y^{\sigma})\|$ is well defined. In this case classical regularization theory proposes to use the 'discrepancy principle', i.e., choosing $\alpha$ such that $\|A^{\ast}( A x_{\alpha}^{\sigma} - y^{\sigma})\|\asymp \sigma$. It turns out that this simple usage is not optimal, and a weighted discrepancy should be used instead. This approach has recently been analyzed for general 'linear regularization' and also for 'conjugate gradient iteration' in [1] and [3]. The analysis of (variants of) the discrepancy principle highlights interesting aspects of regularization theory, and we shall explain this.
References
[1] Gilles Blanchard and Peter Mathé. Discrepancy principle for statistical inverse problems with application to conjugate gradient regularization. Technical report, University of Potsdam, 2011.
[2] L. Cavalier. Nonparametric statistical inverse problems. Inverse Problems, 24(3):034004, 19, 2008.
[3] Shuai Lu and Peter Mathé. Varying discrepancy principle as an adaptive parameter selection in statistical inverse problems. Submitted, 2012.
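For a finite-dimensional toy problem, the symmetrized discrepancy rule can be sketched as follows. Tikhonov regularization stands in for a generic regularization scheme, and the calibration constant `tau` and the grid of `alpha` values are illustrative choices, not prescriptions from the references above:

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Tikhonov solution x_alpha = argmin ||A x - y||^2 + alpha ||x||^2."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(p), A.T @ y)

def discrepancy_alpha(A, y, sigma, tau=1.5):
    """Pick the largest alpha on a grid such that the symmetrized
    discrepancy ||A^T (A x_alpha - y)|| drops below tau * sigma * ||A||_HS,
    mimicking the rule ||A*(A x_alpha - y)|| ~ sigma from the abstract
    (the Frobenius norm plays the role of the Hilbert-Schmidt norm)."""
    level = tau * sigma * np.linalg.norm(A)
    for alpha in np.logspace(2, -12, 80):   # from over- to under-smoothing
        x = tikhonov(A, y, alpha)
        if np.linalg.norm(A.T @ (A @ x - y)) <= level:
            return alpha, x
    return alpha, x
```

Run on a severely ill-conditioned smoothing operator, the selected `alpha` gives a far smaller reconstruction error than near-unregularized least squares, which is dominated by amplified noise.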
17:30 - 18:30 C. Marteau
Testing strategies for inverse problems
Joint work with Béatrice Laurent and Jean-Michel Loubes.
We consider nonparametric ill-posed inverse problem models. We will present minimax rates for goodness-of-fit testing problems in such models, obtained in various situations from both asymptotic and non-asymptotic points of view. We will also propose testing strategies that attain these rates, are easy to implement, and are robust with respect to the characteristics of the operator. In particular, we prove that inverting the involved operator is not always necessary. This result opens interesting perspectives, for instance in the specific cases where the operator is difficult to handle.
Thursday, June 21
09:00 - 10:30 P. Bühlmann
Causal inference in high dimensions II
In the second lecture, we largely focus on penalized maximum likelihood estimation in general directed acyclic graphs and in more constrained models. We present novel results on high-dimensional estimation, discuss alternatives to the so-called strong faithfulness condition, and present computational algorithms for the (strongly) non-convex optimization of the penalized likelihood. Identifiability gains are possible in the presence of interventional data, and we outline aspects of active learning for directed acyclic graphs. The various methods and algorithms are illustrated on examples.
11:00 - 12:00 A. Samson
Parametric estimation with particle filter in mixed models defined by stochastic differential equations
Joint work with Sophie Donnet (Univ. Paris Dauphine).
Biological processes are generally measured repeatedly over time for several subjects. The classical statistical approach to analyzing such longitudinal data is mixed models (Pinheiro and Bates, 2000). In biology, the regression function of these mixed models is often described by deterministic models based on ordinary differential equations. However, these functions are not satisfactory when the biological processes involve random behavior that cannot be neglected. We therefore consider physiological models based on stochastic differential equations (SDEs). Our aim is to propose a parametric estimation method for mixed models defined by SDEs.
Parametric estimation for SDEs has been widely studied (Ait-Sahalia, 2002), but its extension to mixed models is not straightforward because the likelihood of these models is not explicit. Several approaches have been proposed for mixed models with a deterministic regression function (Davidian and Giltinan, 1995; Pinheiro and Bates, 2000; Kuhn and Lavielle, 2005). In particular, Kuhn and Lavielle (2005) propose coupling the SAEM algorithm with a Markov chain Monte Carlo (MCMC) method. In the context of mixed models defined by SDEs, Tornoe et al. (2005) propose a method based on the extended Kalman filter, but the convergence of their method is not proved. Donnet and Samson (2008) propose a method based on the SAEM algorithm coupled with an MCMC algorithm. However, the MCMC algorithm, which simulates the hidden SDE at each time point sequentially, does not exploit the temporal structure of the diffusion process; therefore its practical convergence properties are not satisfactory.
In this work, we propose to combine the SAEM algorithm with the Particle MCMC method proposed by Andrieu et al. (2010), which makes use of the temporal structure of the hidden process. We prove the convergence of the estimation algorithm to the maximum of the likelihood. The properties of the method are illustrated on simulated data for a time-inhomogeneous stochastic volatility model.
References
Ait-Sahalia, Y. (2002), Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach, Econometrica, 70, 223-262.
Andrieu, C., Doucet, A. and Holenstein, R. (2010), Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B, 72, 1-33.
Davidian, M. and Giltinan, D.M. (1995), Nonlinear Models for Repeated Measurement Data, Chapman and Hall.
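The likelihood estimate that Particle MCMC builds on comes from a bootstrap particle filter. A generic sketch follows; the model functions passed in are placeholders, not the SDE mixed model of the talk:

```python
import numpy as np

def bootstrap_pf_loglik(y, n_particles, transition, lik, init, rng):
    """Bootstrap particle filter: returns an estimate of the log-likelihood
    of the observations y under a state-space model, the quantity that
    Particle MCMC plugs into its acceptance ratio."""
    x = init(n_particles, rng)
    loglik = 0.0
    for obs in y:
        x = transition(x, rng)            # propagate particles forward
        w = lik(obs, x)                   # observation weights
        loglik += np.log(np.mean(w))      # running likelihood estimate
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]                        # multinomial resampling
    return loglik
```

On a toy model where the hidden states are i.i.d. standard normal and the observation noise is Gaussian, the estimate can be checked against the exact log-likelihood, which is available in closed form.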
14:00 - 15:30 S. Mallat
Lecture 1. Invariant Scattering Kernels for Classification
Signal classification in high dimension requires reducing signal variability within each class while preserving enough information for discrimination. Invariants to the action of groups such as translations, rotations, or scaling are important for many signal classification tasks. Such invariants must also be stable to the action of deformations. We introduce scattering kernels, which are invariant to the action of groups and stable to the action of deformation diffeomorphisms. They are constructed with deep convolutional networks, which iterate over wavelet transforms and modulus operators. The mathematical properties of scattering transforms will be studied with numerical examples on audio signals and images. Applications to image classification will be shown.
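A minimal one-dimensional illustration of the first layer of the cascade: band-pass filtering at dyadic scales, a modulus, then averaging. The Gaussian band-pass filters below are a crude stand-in for a proper wavelet filter bank, but they already exhibit the key property that the averaged modulus coefficients are invariant to (circular) translations:

```python
import numpy as np

def scattering_order1(x, n_scales=4):
    """First-order scattering-style coefficients: average modulus of the
    responses of dyadic band-pass filters, applied in the Fourier domain.
    Filtering here is circular, so the coefficients are exactly invariant
    to circular shifts of x."""
    N = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(N)
    coeffs = []
    for j in range(n_scales):
        center = 0.25 / 2 ** j                  # dyadic center frequencies
        width = center / 2
        H = np.exp(-0.5 * ((freqs - center) / width) ** 2)  # band-pass
        u = np.abs(np.fft.irfft(X * H, n=N))    # filter, then modulus
        coeffs.append(u.mean())                 # average -> invariance
    return np.array(coeffs)
```

Shifting the input signal leaves the coefficient vector unchanged, which is the translation invariance the abstract refers to; the real construction iterates this wavelet-modulus step to recover the information lost by averaging.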
16:00 - 17:00 C. Lacour
Nonparametric estimation of conditional distributions
We consider a sample of independent and identically distributed observations $(X_i,Y_i)_{1\leq i\leq n}$. The aim of this talk is to give a survey of some recent methods for estimating the conditional density of $Y_i$ given $X_i$, defined by $$f(x,y)\,dy = P(Y_i\in dy \mid X_i=x).$$
After giving some motivation for this problem, we first study a penalized projection estimator and give a result in terms of the $L^2$ distance between the conditional density and its estimator. This result can be extended to the case of dependent variables: mixing variables or Markov chains. We deduce the minimax rates of convergence for conditional density estimation in anisotropic function spaces. We also mention estimation with warped bases (Chagny, 2011).
Next we present a penalized maximum likelihood estimator, for which an oracle inequality in Kullback divergence can be obtained (Cohen and Le Pennec, 2011).
The last part is devoted to estimation at a fixed point, for which we use the method of Goldenshluger and Lepski (2011).
The talk is based on joint works with N. Akakpo, K. Bertin, E. Brunel, F. Comte and V. Rivoirard.
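As a point of comparison with the projection estimators discussed above, a basic kernel estimator of the conditional density, with Nadaraya-Watson weights in $x$, can be written in a few lines. The bandwidths are fixed by hand here, whereas the talk is precisely about data-driven choices:

```python
import numpy as np

def cond_density(x0, y_grid, X, Y, hx, hy):
    """Kernel estimate of f(y | x0) on a grid of y values:
    a mixture of Gaussian kernels centered at the Y_i, with
    Nadaraya-Watson weights measuring how close each X_i is to x0."""
    wx = np.exp(-0.5 * ((x0 - X) / hx) ** 2)
    wx = wx / wx.sum()                                # weights in x
    Ky = np.exp(-0.5 * ((y_grid[:, None] - Y[None, :]) / hy) ** 2) \
         / (hy * np.sqrt(2 * np.pi))                  # kernels in y
    return Ky @ wx
```

For data with $Y = 2X + \varepsilon$, the estimated conditional density at $x_0 = 0.5$ peaks near $y = 1$ and integrates to approximately one over the grid.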
17:00 - 18:00 M. Lerasle
V-fold cross-validation and V-fold penalization in least-squares density estimation
Joint work with Sylvain Arlot (CNRS, Ecole Normale Supérieure).
V-fold cross-validation (VFCV) is a widely used method for selecting and calibrating estimators. In this work, we are interested in understanding the influence of the parameter V on the model-selection performance of V-fold procedures such as VFCV (see Geisser, 1975) and V-fold penalties (see Arlot, 2008).
It is known that, if V is bounded, VFCV is asymptotically biased and the selected estimator is asymptotically suboptimal: it satisfies an oracle inequality with leading constant larger than 1. This problem can be solved using a corrected criterion (see Burman, 1989) or V-fold penalties as in Arlot (2008). This fact is however not sufficient to understand completely the influence of V from a non-asymptotic point of view. In particular, simulations show that V-fold penalized estimators exhibit very different performance as V varies, even when the corresponding criteria are unbiased.
The intuition is that choosing a larger V reduces the variance of the criterion: averaging over V partitions should reduce the randomness due to the choice of one particular split. Moreover, a smaller variance of the criterion yields better model-selection performance. This intuition is based on an asymptotic computation of the variance of the V-fold criterion in a particular example (see Burman, 1989). In this work, we want to make these points precise, theoretically and non-asymptotically.
We work in the least-squares density estimation framework. We first show that several well-known cross-validation procedures, such as VFCV or leave-p-out, and exchangeable resampling-based penalization procedures are special instances of V-fold penalization methods. We prove non-asymptotic oracle inequalities for general V-fold penalization methods and V-fold cross-validation criteria, valid for all V, with leading constant asymptotically equal to 1 when the criteria are unbiased and a second-order term that improves as V increases. We compute exactly the variance of V-fold criteria, showing in particular its precise dependence on V. For unbiased criteria, as V grows from 2 to n, this variance is reduced by a multiplicative factor, not by an order of magnitude; this explains why, in practice, it is sufficient to take V=5 or V=10. We finally illustrate these results in a simulation study.
References
ARLOT Sylvain, 2008, V-fold cross-validation improved: V-fold penalization, arXiv:0802.0566v2.
BURMAN Prabir, 1989, A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods, Biometrika, 76(3):503--514.
GEISSER Seymour, 1975, The predictive sample reuse method with applications. J. Amer. Statist. Assoc., 70:320--328.
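A small illustration of V-fold cross-validation in least-squares density estimation, using histograms on [0, 1) as the model collection. The criterion below is the plain VFCV estimate of the least-squares risk, not the corrected or penalized variants analyzed in the talk:

```python
import numpy as np

def hist_density(sample, D):
    """Least-squares histogram estimator on [0, 1) with D equal-width bins;
    returns the D bin heights."""
    counts, _ = np.histogram(sample, bins=D, range=(0.0, 1.0))
    return counts * D / len(sample)

def vfold_criterion(X, D, folds):
    """V-fold estimate of the least-squares risk ||f_hat||^2 - 2 E[f_hat(X)]
    for the D-bin histogram: fit on V-1 folds, evaluate on the held-out one."""
    crit = 0.0
    for fold in folds:
        h = hist_density(np.delete(X, fold), D)
        bins = np.minimum((X[fold] * D).astype(int), D - 1)
        crit += (h ** 2).sum() / D - 2.0 * h[bins].mean()
    return crit / len(folds)

def select_bins(X, candidates, V=5, seed=0):
    """Pick the bin count minimizing the V-fold criterion (same folds
    are reused for all candidate models)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), V)
    crits = [vfold_criterion(X, D, folds) for D in candidates]
    return candidates[int(np.argmin(crits))]
```

On a smooth density such as Beta(2, 2) with n = 1000, the selected bin count lands strictly between the constant model and the finest one, reflecting the bias-variance trade-off the criterion estimates.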
Friday, June 22
09:00 - 10:30 S. Mallat
Lecture 2: Scattering Representation of Stationary Processes
The representation of stationary processes remains an open problem for classifying image and audio textures. Although stationary, such textures are neither Gaussian nor Markovian, and they include transient multiscale structures. High-order moment estimators cannot be used for classification because their variance is too large. Expected scattering coefficients of a stationary process are computed by iterating over wavelet transforms and modulus operators. They have low-variance estimators and depend upon normalized high-order moments. They provide new representations of stationary processes, which also characterize multifractal properties. State-of-the-art image texture classification is demonstrated on standard texture databases. Synthesis of audio textures is also shown, together with musical genre classification.
11:00 - 12:30 Y. Ingster
Estimation and detection for ellipsoids: application to high-variable functions II