Chapter 11 Miscellaneous Topics
This chapter collects technical details that are useful to include in the book but not important enough to appear in the main chapters.
11.1 Delta Method
Lemma 11.1 Let \((X,Y),(X_1,Y_1),\dots,(X_N,Y_N)\) be i.i.d. random vectors with \(\mathrm{E}(X) \neq 0\), such that \[ \sqrt{N}\left(\overline{X}-\mathrm{E}(X), \overline{Y}-\mathrm{E}(Y)\right) \] is asymptotically normal. Let \(\mu = \frac{\mathrm{E}(Y)}{\mathrm{E}(X)}\). Then \[ \frac{\sum Y_i}{\sum X_i} \to \mu \quad \text{a.s.} \] Moreover, \[\begin{equation} \sqrt{N}\left(\frac{\sum (Y_i - \mu X_i)}{\sum X_i} \right) - \sqrt{N}\left(\frac{\sum (Y_i - \mu X_i)}{N\mathrm{E}(X)}\right) \to 0 \quad \text{in probability}, \tag{11.1} \end{equation}\] which means \[ \sqrt{N}\left(\frac{\sum (Y_i - \mu X_i)}{\sum X_i} \right) \quad \text{and} \quad \sqrt{N}\left(\frac{\sum (Y_i - \mu X_i)}{N\mathrm{E}(X)} \right) \] have the same asymptotic normal distribution.
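To see Lemma 11.1 in action, here is a minimal simulation sketch (not from the book; the data-generating process and all variable names are hypothetical). It draws \(X\) from an exponential distribution so that \(\mathrm{E}(X)=1\), sets \(Y = 2X + \varepsilon\) so that \(\mu = 2\), and checks that the random-denominator and fixed-denominator statistics in (11.1) differ by a term that is negligible relative to either statistic.

```python
# Simulation sketch for (11.1); DGP and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, reps = 10_000, 2_000
EX = 1.0                                # X ~ Exponential(1), so E(X) = 1
mu = 2.0                                # Y = 2X + noise, so mu = E(Y)/E(X) = 2
stat_random = np.empty(reps)
stat_fixed = np.empty(reps)
for r in range(reps):
    X = rng.exponential(scale=1.0, size=N)
    Y = mu * X + rng.normal(size=N)
    num = np.sum(Y - mu * X)
    stat_random[r] = np.sqrt(N) * num / np.sum(X)       # random denominator
    stat_fixed[r] = np.sqrt(N) * num / (N * EX)         # fixed denominator

# The difference is an order of magnitude smaller than either statistic,
# consistent with convergence to 0 in probability:
print(np.std(stat_random - stat_fixed))   # small, roughly O(1/sqrt(N))
print(np.std(stat_fixed))                  # O(1)
```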
11.2 Random Denominator for Independent Randomization Experiments
Let \(Z_i, i=1,\dots,N\), be i.i.d. treatment assignment indicators, where \(Z_i=1\) if the \(i\)th unit is assigned to treatment, with \(\mathrm{P}(Z_i=1)=p\). Let \((Y_i(0),Y_i(1)), i=1,\dots,N\), be the potential outcome pairs. We consider the asymptotic distribution of \[ \frac{\sum Z_i Y_i(1)}{\sum Z_i} - \frac{\sum (1- Z_i) Y_i(0)}{\sum (1-Z_i)}. \] Denote \(\sum Z_i\) by \(N_T\) and \(\sum (1-Z_i)\) by \(N_C\), the sample sizes of the treatment and control groups respectively. Because \(N_T\) and \(N_C\) are no longer fixed numbers, the variance of the estimator above cannot be derived simply by dividing the variance of \(Z_i Y_i(1)\) by \(N_T\) and the variance of \((1-Z_i) Y_i(0)\) by \(N_C\).
The asymptotic variance can be derived using the delta method, or more directly from Lemma 11.1. Let \(\mu_1 = \mathrm{E}(Y(1))\) and \(\mu_0 = \mathrm{E}(Y(0))\). By (11.1), \[\begin{align*} \sqrt{N} \left( \frac{\sum Z_i (Y_i(1) - \mu_1)}{\sum Z_i} - \frac{\sum (1- Z_i) (Y_i(0) - \mu_0)}{\sum (1-Z_i)} \right) \end{align*}\] and \[\begin{align} \sqrt{N} \left( \frac{\sum Z_i (Y_i(1) - \mu_1)}{Np} - \frac{\sum (1- Z_i) (Y_i(0) - \mu_0)}{N(1-p)} \right) \tag{11.2} \end{align}\] have the same asymptotic normal distribution. Also note that \[ \mathrm{E}\left(Z_i(Y_i(1)-\mu_1) \times (1- Z_i) (Y_i(0) - \mu_0)\right) = 0 \] since \(Z_i(1-Z_i)=0\); because both terms also have mean zero, the two averages in (11.2) are uncorrelated. The variance of (11.2) is therefore \[\begin{align} &\frac{\mathrm{Var}\left(Z_i(Y_i(1)-\mu_1)\right)}{p^2} + \frac{\mathrm{Var}\left((1-Z_i)(Y_i(0)-\mu_0)\right)}{(1-p)^2} \notag \\ = & \frac{\mathrm{Var}(Y_i(1))}{p} + \frac{\mathrm{Var}(Y_i(0))}{1-p}. \tag{11.3} \end{align}\] The equality holds because \(Z_i\) and \((Y_i(1),Y_i(0))\) are independent, so \[ \mathrm{Var}\left(Z_i(Y_i(1)-\mu_1)\right) = \mathrm{E}\left(Z_i^2(Y_i(1)-\mu_1)^2 \right) = p \mathrm{E}(Y_i(1)-\mu_1)^2 = p \mathrm{Var}(Y_i(1)), \] and similarly \(\mathrm{Var}\left((1-Z_i)(Y_i(0)-\mu_0)\right) = (1-p)\mathrm{Var}(Y_i(0))\).
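The formula (11.3) can be checked by simulation. The sketch below (an assumed outcome model, all names hypothetical) generates potential outcomes with \(\mathrm{Var}(Y_i(0)) = 1\) and \(\mathrm{Var}(Y_i(1)) = 5\), assigns treatment independently with \(p = 0.3\), and compares the empirical variance of the scaled difference in means with \(\mathrm{Var}(Y(1))/p + \mathrm{Var}(Y(0))/(1-p)\).

```python
# Simulation check of (11.3); the outcome model is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(2)
N, reps, p = 5_000, 4_000, 0.3
est = np.empty(reps)
for r in range(reps):
    Y0 = rng.normal(0.0, 1.0, size=N)          # Var(Y(0)) = 1
    Y1 = Y0 + rng.normal(1.0, 2.0, size=N)     # Var(Y(1)) = 1 + 4 = 5
    Z = rng.binomial(1, p, size=N)             # i.i.d. Bernoulli(p) assignment
    diff = Y1[Z == 1].mean() - Y0[Z == 0].mean()
    est[r] = np.sqrt(N) * diff                 # scaled difference in means

print(np.var(est))                # empirical variance across replications
print(5.0 / p + 1.0 / (1 - p))    # asymptotic variance from (11.3)
```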
11.3 M-Estimator and Z-Estimator
Many estimators can be defined as the maximizer (or minimizer) of an empirical expectation, or as its root.
Both the quantile and the mean can be written as the solution of minimizing the expectation of a parametrized function \(\psi_\theta(x)\) of a random variable: the mean minimizes \(\mathrm{E}(X-\theta)^2\) (i.e., \(\psi_\theta(x) = (x-\theta)^2\)), and the median minimizes \(\mathrm{E}|X-\theta|\) (i.e., \(\psi_\theta(x) = |x-\theta|\)). By simply replacing the theoretical distribution \(P\) with its empirical version \(\widetilde{P}\), the solution of the empirical version of the same minimization problem is called an M-estimator.
The sample quantile and the sample mean are thus both special cases of M-estimators. The theory of M-estimation was developed as a generalization of maximum likelihood theory (so the MLE is a special case of an M-estimator); the "M" stands for "maximum-likelihood-like" estimator. See endnotes. When the criterion is smooth in \(\theta\), the minimizer can equivalently be characterized as a root of the derivative; estimators defined as roots of such estimating equations are called Z-estimators ("Z" for zero). Under mild regularity conditions, M-estimators, like the MLE, are asymptotically normal.
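As an illustration, here is a minimal sketch (not from the book; the use of a generic scalar optimizer is an assumption) that recovers the sample mean and sample median as M-estimators by numerically minimizing the empirical criterion \(\sum_i \psi_\theta(x_i)\) for the two choices of \(\psi_\theta\) above.

```python
# M-estimation sketch: minimize the empirical criterion directly and compare
# with the closed-form sample mean and sample median.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000)   # any skewed sample works here

# psi_theta(x) = (x - theta)^2  ->  the M-estimator is the sample mean
mean_hat = minimize_scalar(lambda t: np.sum((x - t) ** 2)).x

# psi_theta(x) = |x - theta|    ->  the M-estimator is the sample median
median_hat = minimize_scalar(lambda t: np.sum(np.abs(x - t))).x

print(mean_hat, x.mean())          # agree up to optimizer tolerance
print(median_hat, np.median(x))    # agree up to optimizer tolerance
```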