Chapter 11 Misc Topics

This chapter collects technical details that are useful for the book but not important enough to include in the main chapters.

11.1 Delta Method

Lemma 11.1 Let \((X,Y),(X_1,Y_1),\dots,(X_N,Y_N)\) be i.i.d. random vectors such that \[ \sqrt{N}\left(\overline{X}-\mathrm{E}(X), \overline{Y}-\mathrm{E}(Y)\right) \] is asymptotically normal. Let \(\mu = \frac{\mathrm{E}(Y)}{\mathrm{E}(X)}\). Then \[ \frac{\sum Y}{\sum X} \to \mu \quad a.s. \] Moreover, \[\begin{equation} \sqrt{N}\left(\frac{\sum (Y - \mu X)}{\sum X} \right) - \sqrt{N}\left(\frac{\sum (Y - \mu X)}{N\mathrm{E}(X)}\right) \to 0 \quad \text{in probability}, \tag{11.1} \end{equation}\] which means \[ \sqrt{N}\left(\frac{\sum (Y - \mu X)}{\sum X} \right) \quad \text{and} \quad \sqrt{N}\left(\frac{\sum (Y - \mu X)}{N\mathrm{E}(X)} \right) \] have the same asymptotic normal distribution.

Proof. \(\frac{\sum Y}{\sum X} \to \mu\) almost surely by the strong law of large numbers. For (11.1), \[\begin{align*} &\sqrt{N}\left(\frac{\sum (Y - \mu X)}{\sum X} \right) - \sqrt{N}\left(\frac{\sum (Y - \mu X)}{N\mathrm{E}(X)}\right) = \\ & \sqrt{N}\left(\frac{\sum (Y - \mu X)}{N\mathrm{E}(X)}\right)\times\left( \frac{N\mathrm{E}(X)}{\sum X} - 1 \right). \end{align*}\] The first factor converges in distribution to a normal random variable and the second factor converges in probability to 0 by the law of large numbers. By Slutsky’s theorem, the product converges in distribution (hence also in probability) to 0.
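As a quick numerical check of Lemma 11.1 (a minimal sketch, not part of the text; the distributions, seed, and sample size are illustrative assumptions), the following simulation confirms that the ratio estimator is consistent for \(\mu\) and that the random-denominator and fixed-denominator quantities in (11.1) are close:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
# Illustrative choices: positive X so the ratio is well defined, Y correlated with X.
X = rng.exponential(scale=2.0, size=N)   # E(X) = 2
Y = 3.0 * X + rng.normal(size=N)         # E(Y) = 6, so mu = E(Y)/E(X) = 3
mu = 3.0

ratio = Y.sum() / X.sum()                # consistent for mu by the lemma

S = Y - mu * X                           # summand of sum(Y - mu * X)
lhs = np.sqrt(N) * S.sum() / X.sum()     # random denominator: sum(X)
rhs = np.sqrt(N) * S.sum() / (N * 2.0)   # fixed denominator: N * E(X)
print(ratio, lhs - rhs)                  # ratio near 3, difference near 0
```

The difference `lhs - rhs` is the product of an asymptotically normal term and a term vanishing in probability, exactly as in the proof.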

11.2 Random Denominator for Independent Randomization Experiments

Let \(Z_i,i=1,\dots,N\) be i.i.d. assignment indicators with \(\mathrm{P}(Z_i=1)=p\), where \(Z_i=1\) means the \(i\)th unit is assigned to treatment. Let \((Y_i(0),Y_i(1)),i=1,\dots,N\) be the potential outcome pairs. We consider the asymptotic distribution of \[ \frac{\sum Z_i Y_i(1)}{\sum Z_i} - \frac{\sum (1- Z_i) Y_i(0)}{\sum (1-Z_i)}. \] Denote \(\sum Z_i\) by \(N_T\) and \(\sum (1-Z_i)\) by \(N_C\), the sample sizes in the treatment and control groups respectively. Because \(N_T\) and \(N_C\) are random rather than fixed, the variance of the quantity above cannot be obtained by simply dividing the variance of \(Z_i Y_i(1)\) by \(N_T\) (and the variance of \((1-Z_i) Y_i(0)\) by \(N_C\)).

The asymptotic variance can be derived using the delta method or, more directly, from Lemma 11.1. Let \(\mu_1 = \mathrm{E}(Y(1))\) and \(\mu_0 = \mathrm{E}(Y(0))\). Applying Lemma 11.1 to each term (with \(\mathrm{E}(Z_i)=p\) and \(\mathrm{E}(1-Z_i)=1-p\) in the role of \(\mathrm{E}(X)\)), \[\begin{align*} \sqrt{N} \left( \frac{\sum Z_i (Y_i(1) - \mu_1)}{\sum Z_i} - \frac{\sum (1- Z_i) (Y_i(0) - \mu_0)}{\sum (1-Z_i)} \right) \end{align*}\] and \[\begin{align} \sqrt{N} \left( \frac{\sum Z_i (Y_i(1) - \mu_1)}{Np} - \frac{\sum (1- Z_i) (Y_i(0) - \mu_0)}{N(1-p)} \right) \tag{11.2} \end{align}\] have the same asymptotic normal distribution. Also note that \[ \mathrm{E}(Z_i(Y_i(1)-\mu_1) \times (1- Z_i) (Y_i(0) - \mu_0)) = 0 \] since \(Z_i(1-Z_i)=0\), so the two terms in (11.2) are uncorrelated and the variance of (11.2) is \[\begin{align} &\frac{\mathrm{Var}\left(Z_i(Y_i(1)-\mu_1)\right)}{p^2} + \frac{\mathrm{Var}\left((1-Z_i)(Y_i(0)-\mu_0)\right)}{(1-p)^2} \notag \\ = & \frac{\mathrm{Var}(Y_i(1))}{p} + \frac{\mathrm{Var}(Y_i(0))}{1-p}. \tag{11.3} \end{align}\] The equality holds because \(Z_i\) and \((Y_i(1),Y_i(0))\) are independent, so \[ \mathrm{Var}\left(Z_i(Y_i(1)-\mu_1)\right) = \mathrm{E}\left(Z_i^2(Y_i(1)-\mu_1)^2 \right) = p \mathrm{E}(Y_i(1)-\mu_1)^2 = p \mathrm{Var}(Y_i(1)) \] and similarly \(\mathrm{Var}\left((1-Z_i)(Y_i(0)-\mu_0)\right) = (1-p)\mathrm{Var}(Y_i(0))\).
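The variance formula (11.3) can be checked by simulation (a minimal sketch; the potential-outcome distributions, assignment probability, and replication count are illustrative assumptions). Over many replications, \(N\) times the variance of the difference-in-means estimator should approach \(\mathrm{Var}(Y(1))/p + \mathrm{Var}(Y(0))/(1-p)\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, reps = 1_000, 0.3, 4_000
# Illustrative potential outcomes: Var(Y(1)) = 4, Var(Y(0)) = 1, true effect = 1.
Y1 = rng.normal(loc=1.0, scale=2.0, size=(reps, N))
Y0 = rng.normal(loc=0.0, scale=1.0, size=(reps, N))
Z = rng.random((reps, N)) < p            # i.i.d. Bernoulli(p) assignments

# Difference-in-means with random denominators N_T and N_C, per replication.
treat_mean = (Z * Y1).sum(axis=1) / Z.sum(axis=1)
ctrl_mean = (~Z * Y0).sum(axis=1) / (~Z).sum(axis=1)
est = treat_mean - ctrl_mean

theory = 4.0 / p + 1.0 / (1 - p)         # (11.3): Var(Y(1))/p + Var(Y(0))/(1-p)
empirical = N * est.var()                # N times the Monte Carlo variance
print(theory, empirical)                 # the two should be close
```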

11.3 M-Estimator and Z-Estimator

Many estimators can be defined as the maximizer of an empirical expectation (an M-estimator) or as the root of an empirical estimating equation (a Z-estimator, where “Z” stands for zero).

Both the mean and the quantiles can be expressed as minimizers of the expectation of a parametrized function \(\psi_\theta(x)\) of a random variable. Replacing the theoretical distribution \(P\) by its empirical version \(\widetilde{P}\), the solution of the empirical version of the same minimization problem is called an M-estimator.

Definition 11.1 Let \(\psi_\theta(x)\) be a family of functions of \(x\) parametrized by \(\theta\). A solution of \[ \min_{\theta} \frac{1}{N}\sum_{i=1}^N \psi_\theta(X_i) \] is called an M-estimator.
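A minimal numerical illustration of Definition 11.1 (the data-generating distribution, grid, and helper `m_estimate` are illustrative assumptions, not from the text): minimizing the empirical criterion with squared loss recovers the sample mean, with absolute loss the sample median, and with the check loss a sample quantile:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=10_000)

def m_estimate(psi, X, grid):
    """Grid-search minimizer of the empirical criterion (1/N) sum_i psi(theta, X_i)."""
    crit = np.array([psi(t, X).mean() for t in grid])
    return grid[crit.argmin()]

grid = np.linspace(0.0, 10.0, 2001)   # spacing 0.005, covers the minimizers here

# psi_theta(x) = (x - theta)^2  ->  sample mean
theta_mean = m_estimate(lambda t, x: (x - t) ** 2, X, grid)
# psi_theta(x) = |x - theta|    ->  sample median
theta_med = m_estimate(lambda t, x: np.abs(x - t), X, grid)
# check loss with tau = 0.9     ->  0.9 sample quantile
tau = 0.9
theta_q = m_estimate(
    lambda t, x: np.where(x >= t, tau * (x - t), (1 - tau) * (t - x)), X, grid
)
print(theta_mean, theta_med, theta_q)
```

Each criterion is convex in \(\theta\), so the crude grid search lands within one grid step of the exact minimizer; in practice one would use a proper optimizer or the closed-form solution.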

The sample quantile and the sample mean are both special cases of M-estimators. The theory of the M-estimator was developed as a generalization of the MLE (so the MLE is a special case of the M-estimator); the “M” here refers to “maximum-likelihood-type” estimator. See endnotes. Under mild regularity conditions, an M-estimator, like the MLE, has an asymptotically normal distribution.

Theorem 11.1 (M-Estimator) To do. From van der Vaart.