High-dimensional Minimum Variance Portfolio Estimation
Based on High-frequency Data
June 1, 2019
Abstract
This paper studies the estimation of the high-dimensional minimum variance portfolio
(MVP) based on high-frequency returns, which can exhibit heteroscedasticity and
may be contaminated by microstructure noise. Under certain sparsity assumptions
on the precision matrix, we propose estimators of the MVP and prove that our portfolios asymptotically achieve the minimum variance in a sharp sense. In addition, we
introduce consistent estimators of the minimum variance, which provide reference targets. Simulation and empirical studies demonstrate the favorable performance of the
proposed portfolios.
Key Words: Minimum variance portfolio; High dimension; High frequency; CLIME
estimator; Precision matrix.
JEL Codes: C13, C55, C58, G11
1 Introduction
1.1 Background
Since the ground-breaking work of Markowitz (1952), the mean-variance portfolio has attracted
significant attention from both academics and practitioners. To implement such a strategy in
practice, accuracy in estimating both the expected returns and the covariance structure
of returns is vital. It has been well documented that the estimation of the expected returns
is more difficult than the estimation of covariances (Merton (1980)), and that the impact on
portfolio performance caused by estimation error in the expected returns is larger than
that caused by error in covariance estimation. These difficulties pose serious challenges
for the practical implementation of Markowitz portfolio optimization. Theoretically,
∗Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.
tcai@wharton.upenn.edu.
†Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706. huj@stat.wisc.edu.
‡Department of ISOM and Department of Finance, Hong Kong University of Science and Technology,
Clear Water Bay, Kowloon, Hong Kong. yyli@ust.hk.
§Department of ISOM, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,
Hong Kong. xhzheng@ust.hk.
mean-variance optimization has been criticized from the portfolio-choice standpoint, for reasons
such as preferences that violate the axioms of choice and dynamically inconsistent behavior, among others.
The minimum variance portfolio (MVP) has received growing attention over the past few
years (see, e.g., DeMiguel et al. (2009a) and the references therein). It avoids the difficulties
in estimating the expected returns and is on the efficient frontier. In addition, the MVP is
found to perform well on real data. Empirical studies in Haugen and Baker (1991), Chan
et al. (1999), Schwartz (2000), Jagannathan and Ma (2003) and Clarke et al. (2006) have
found that the MVP can enjoy both lower risk and higher return compared with some
benchmark portfolios. These features make the MVP an attractive investment strategy in
practice.
The MVP is more natural in the context of high-frequency data. Over short time horizons, the mean return is usually completely dominated by the volatility; consequently, as a
prudent and common practice, when the time horizon of interest is short, the expected returns
are often assumed to be zero (see, e.g., Part II of Christoffersen (2012) and the references
therein). Fan et al. (2012a) make this assumption when considering the management of
portfolios that are rebalanced daily or every other few days. When the expected returns are
zero, the mean-variance optimization reduces to the risk minimization problem, in which
one seeks the MVP.
In addition, there are benefits to using high-frequency data. On the one hand, a large
number of observations can potentially help facilitate a better understanding of the covariance
structure of returns. Developments in this direction in the high-dimensional setting include
Wang and Zou (2010); Tao et al. (2011); Zheng and Li (2011); Tao et al. (2013); Kim et al.
(2016); Aït-Sahalia and Xiu (2017); Xia and Zheng (2018); Dai et al. (2019); Pelger (2019),
among others. On the other hand, high-frequency data allow short-horizon rebalancing
and hence the portfolios can adjust quickly to time variability of volatilities/co-volatilities.
However, high-frequency data do come with significant challenges in analysis. Complications
arise due to heteroscedasticity and microstructure noise, among others.
We consider in this paper the estimation of the high-dimensional MVP using high-frequency data. To be more specific, given $p$ assets whose returns $X = (X_1, \ldots, X_p)^\top$ have covariance matrix $\Sigma$, we aim to find
$$\arg\min_{w}\; w^\top \Sigma w \quad \text{subject to} \quad w^\top \mathbf{1} = 1, \qquad (1.1)$$
where $w = (w_1, \ldots, w_p)^\top$ represents the weights put on different assets, and $\mathbf{1} = (1, \ldots, 1)^\top$ is the $p$-dimensional vector with all entries being 1. The optimal solution is given by
$$w_{\text{opt}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}, \qquad (1.2)$$
which yields the minimum risk
$$R_{\min} = w_{\text{opt}}^\top \Sigma\, w_{\text{opt}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}. \qquad (1.3)$$
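The closed-form solution (1.2)-(1.3) is straightforward to compute when $\Sigma$ is known and well conditioned. The following is a minimal sketch (not from the paper) in Python/NumPy, assuming a positive-definite input matrix.

```python
import numpy as np

def min_variance_portfolio(sigma):
    """Closed-form MVP weights (1.2) and minimum risk (1.3) for a known covariance matrix."""
    p = sigma.shape[0]
    ones = np.ones(p)
    sigma_inv_ones = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1 without forming the inverse
    denom = ones @ sigma_inv_ones                   # 1' Sigma^{-1} 1
    return sigma_inv_ones / denom, 1.0 / denom      # (w_opt, R_min)
```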
More generally, one may be interested in the following optimization problem: for a given vector $\beta = (\beta_1, \ldots, \beta_p)^\top$ and constant $c$,
$$\arg\min_{w}\; \widetilde{w}^\top \Sigma \widetilde{w} \quad \text{subject to} \quad w^\top \mathbf{1} = c, \quad \text{where } \widetilde{w} = (\beta_1 w_1, \ldots, \beta_p w_p)^\top, \qquad (1.4)$$
or its equivalent formulation:
$$\arg\min_{\widetilde{w}}\; \widetilde{w}^\top \Sigma \widetilde{w} \quad \text{subject to} \quad \widetilde{w}^\top \beta^{-1} = c, \quad \text{where } \beta^{-1} := (1/\beta_1, \ldots, 1/\beta_p)^\top. \qquad (1.5)$$
Such a setting applies, for example, to leveraged investment. We remark that the optimization problem (1.5) can be reduced to (1.1) by noticing that if $\widetilde{w}$ solves (1.5), then $\check{w} := (\widetilde{w}_1/\beta_1, \ldots, \widetilde{w}_p/\beta_p)^\top/c$ solves (1.1) with $\Sigma$ replaced by $\widetilde{\Sigma} = \mathrm{diag}(\beta_1, \ldots, \beta_p)\,\Sigma\,\mathrm{diag}(\beta_1, \ldots, \beta_p)$, and vice versa. For this reason, the optimization problems (1.1) and (1.4) (or (1.5)) can be transformed into each other. In the rest of the paper, we focus on problem (1.1).
A main challenge in solving the optimization problem (1.1) comes from high dimensionality, because modern portfolios often involve a large number of assets. See, for example, Zheng and Li (2011), Fan et al. (2012a), Fan et al. (2012b), Ao et al. (2018), Xia and Zheng (2018), and Dai et al. (2019) for discussions of the challenges in, and progress on, vast portfolio management.
On the other hand, the estimation of the minimum risk defined in (1.3) is also a problem
of interest. This provides a reference target for the estimated minimum variance portfolios.
In practice, because the true covariance matrix is unknown, the sample covariance matrix $S$ is usually used as a proxy, and the resulting “plug-in” portfolio, $w_p = S^{-1}\mathbf{1}/(\mathbf{1}^\top S^{-1}\mathbf{1})$, has been widely adopted. How well does such a portfolio perform? This question has been considered in Basak et al. (2009). The following simulation result visualizes their first finding (Proposition 1 therein). Figure 1 shows the risk of the plug-in portfolio based on 100 replications. One can see that the actual risk $R(w_p) = w_p^\top \Sigma w_p$ of the plug-in portfolio can be devastatingly higher than the theoretical minimum risk. On the other hand, the perceived risk $\widehat{R}_p = w_p^\top S w_p$ can be even lower than the theoretical minimum risk. Such contradictory phenomena lead to two questions: (1) Can we consistently estimate the true minimum risk? (2) More importantly, can we find a portfolio with a risk close to the true minimum risk?
[Figure 1 about here: risk (y-axis) plotted against replication (x-axis), comparing the perceived risk, actual risk, and minimum risk.]

Figure 1. Comparison of actual and perceived risks of the plug-in portfolio. The portfolios are constructed based on returns simulated from an i.i.d. multivariate normal distribution with mean zero and covariance matrix $\Sigma$ calibrated from real data; see Section 4 for details. The numbers of assets and observations are 80 and 252, respectively. The comparison is replicated 100 times.
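The phenomenon in Figure 1 is easy to reproduce. Below is a minimal sketch (not the paper's exact calibration): it draws i.i.d. normal returns with a hypothetical covariance matrix, forms the plug-in portfolio from the sample covariance matrix, and compares its perceived and actual risks with the theoretical minimum risk.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 80, 252
# Hypothetical covariance matrix standing in for the one calibrated from real data.
A = rng.standard_normal((p, p))
sigma = A @ A.T / p + 0.1 * np.eye(p)

ones = np.ones(p)
r_min = 1.0 / (ones @ np.linalg.solve(sigma, ones))   # theoretical minimum risk (1.3)

perceived, actual = [], []
for _ in range(100):
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    S = np.cov(X, rowvar=False)
    w_p = np.linalg.solve(S, ones)
    w_p /= ones @ w_p                                  # plug-in weights S^{-1}1 / (1'S^{-1}1)
    perceived.append(w_p @ S @ w_p)                    # perceived risk uses S
    actual.append(w_p @ sigma @ w_p)                   # actual risk uses the true Sigma
print(np.mean(perceived) < r_min < np.mean(actual))    # typically True when p/n is sizable
```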
Because of such issues with the plug-in portfolio, alternative methods have been proposed. Jagannathan and Ma (2003) argue that imposing a no-short-sale constraint helps. More generally, Fan et al. (2012b) study the MVP under the following gross-exposure constraint:
$$\arg\min_{w}\; w^\top \Sigma w \quad \text{subject to} \quad w^\top \mathbf{1} = 1 \ \text{ and } \ \|w\|_1 \le \lambda, \qquad (1.6)$$
where $\|w\|_1 = \sum_{i=1}^p |w_i|$ and $\lambda$ is a chosen constant. They derive the following bound on the risk of the estimated portfolio: if $\widehat{\Sigma}$ is an estimator of $\Sigma$, then the solution to (1.6) with $\Sigma$ replaced by $\widehat{\Sigma}$, denoted by $\widehat{w}_{\text{opt}}$, satisfies
$$|R(\widehat{w}_{\text{opt}}) - R_{\min}| \le \lambda^2\, \|\widehat{\Sigma} - \Sigma\|_\infty, \qquad (1.7)$$
where for any weight vector $w$, $R(w) = w^\top \Sigma w$ stands for the risk measured by the variance of the portfolio return, and for any matrix $A = (a_{ij})$, $\|A\|_\infty := \max_{i,j}|a_{ij}|$. In particular, $\|\widehat{\Sigma} - \Sigma\|_\infty$ is the maximum element-wise estimation error in using $\widehat{\Sigma}$ to estimate $\Sigma$. Fan et al. (2012a) consider the high-frequency setting, where they use the two-scale realized covariance matrix (Zhang et al. (2005)) to estimate the integrated covariance matrix (see Section 2.1 below for related background), and establish concentration inequalities for the element-wise estimation error. These concentration inequalities imply that even if the number of assets $p$ grows faster than the number of observations $n$, one still has $\|\widehat{\Sigma} - \Sigma\|_\infty \to 0$ as $n \to \infty$; see equation (18) in Fan et al. (2012a) for the precise statement. In particular, bound (1.7)
guarantees that, under the gross-exposure constraint, the difference between the risk associated with $\widehat{w}_{\text{opt}}$ and the minimum risk is asymptotically negligible.
The difference between the risk of an estimated portfolio and the minimum risk going to zero, however, may not be sufficient to guarantee (near) optimality. In fact, under rather general assumptions (which do not exclude factor models), the minimum risk $R_{\min} = 1/(\mathbf{1}^\top\Sigma^{-1}\mathbf{1})$ may go to zero as the number of assets $p \to \infty$; see Ding et al. (2018) for a thorough discussion. If indeed the minimum risk goes to 0 as $p \to \infty$, then the difference $|R(\widehat{w}_{\text{opt}}) - R_{\min}|$ going to 0 is not enough to guarantee (near) optimality. Based on the above consideration, we turn to finding an asset allocation $\widehat{w}$ that satisfies a stronger sense of consistency, in that the ratio between the risk of the estimated portfolio and the minimum risk goes to one, i.e.,
$$\frac{R(\widehat{w})}{R_{\min}} \xrightarrow{\;p\;} 1 \quad \text{as } p \to \infty, \qquad (1.8)$$
where $\xrightarrow{\;p\;}$ stands for convergence in probability.
1.2 Main contributions of the paper
Our contributions mainly lie in the following aspects.
We propose estimators of the minimum variance portfolio that can accommodate stochastic
volatility and market microstructure noise, which are intrinsic to high-frequency returns.
Under some sparsity assumptions on the inverse of the covariance matrix (also known as the
precision matrix), our estimated portfolios enjoy the desired convergence (1.8).
We also introduce consistent estimators of the minimum risk. One such estimator does
not depend on the sparsity assumption and also enjoys a CLT.
1.3 Organization of the paper
The paper is organized as follows. In Section 2, we present our estimators of the MVP
and show that their risks converge to the minimum risk in the sense of (1.8). We have
an estimator that incorporates stochastic volatility (CLIME-SV) and one that incorporates
stochastic volatility and microstructure noise (CLIME-SVMN). A consistent estimator of the
minimum risk is proposed in Section 2.4, for which we also establish the CLT. An extension of
our method to utilize factors is developed in Section 3. Section 4 presents simulation results
to illustrate the performance of both portfolio and minimum risk estimations. Empirical
study results based on S&P 100 Index constituents are reported in Section 5. We conclude
our paper with a brief summary in Section 6. All proofs are given in the Appendix.
2 Estimation Methods and Asymptotic Properties
2.1 High-frequency data model
We assume that the latent $p$-dimensional log-price process $(X_t)$ follows a diffusion model:
$$dX_t = \mu_t\, dt + \Theta_t\, dW_t, \quad \text{for } t \ge 0, \qquad (2.1)$$
where $(\mu_t) = (\mu^1_t, \ldots, \mu^p_t)^\top$ is the drift process, $(\Theta_t) = (\theta^{ij}_t)_{1\le i,j\le p}$ is a $p \times p$ matrix-valued process called the spot co-volatility process, and $(W_t)$ is a $p$-dimensional Brownian motion. Both $(\mu_t)$ and $(\Theta_t)$ are stochastic, càdlàg, and may depend on $(W_t)$; all are defined on a common filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0})$.

Let
$$\Sigma_t = \Theta_t\Theta_t^\top := (\sigma^{ij}_t)$$
be the spot covariance matrix process. The ex-post integrated covariance (ICV) matrix over an interval, say $[0,1]$, is
$$\Sigma_{\text{ICV}} = \Sigma_{\text{ICV},1} = (\sigma^{ij}) := \int_0^1 \Sigma_t\, dt.$$
Denote its inverse by $\Omega_{\text{ICV}} := \Sigma_{\text{ICV}}^{-1}$. The ex-post minimum risk, $R_{\min}$, is obtained by replacing the $\Sigma$ in (1.3) with $\Sigma_{\text{ICV}}$.

Let us emphasize that, in general, $\Sigma_{\text{ICV}}$ is a random variable that is only measurable with respect to $\mathcal{F}_1$, and so is $R_{\min}$. It is therefore in principle impossible to construct a portfolio that is measurable with respect to $\mathcal{F}_0$ and achieves the minimum risk $R_{\min}$. Practical implementation of the minimum variance portfolio relies on making forecasts of $\Sigma_{\text{ICV}}$ based on historical data. The simplest approach is to assume that $\Sigma_{\text{ICV},t} \approx \Sigma_{\text{ICV},t+1}$,¹ where $\Sigma_{\text{ICV},t}$ stands for the ICV matrix over the period $[t-1, t]$. Under such an assumption, if we can construct a portfolio $w$ based on the observations during $[t-1, t]$ (and hence measurable only with respect to $\mathcal{F}_t$) that approximately minimizes the ex-post risk $w^\top\Sigma_{\text{ICV},t}\, w$, then, if we hold the portfolio during the next period $[t, t+1]$, the actual risk $w^\top\Sigma_{\text{ICV},t+1}\, w$ is still approximately minimized. In this article we adopt such a strategy.
2.2 High-frequency case with no microstructure noise
We first consider the case when there is no microstructure noise; in other words, one observes the true log-prices $(X^i_t)$.

Our approach to estimating the minimum variance portfolio relies on the constrained $\ell_1$-minimization for inverse matrix estimation (CLIME) proposed in Cai et al. (2011). The original CLIME method is developed under the i.i.d. observation setting. Specifically, suppose one has $n$ i.i.d. observations from a population with covariance matrix $\Sigma$. Let $\Omega := \Sigma^{-1}$ be the corresponding precision matrix. The CLIME estimator of $\Omega$ is defined as
$$\widehat{\Omega}_{\text{CLIME}} := \arg\min_{\Omega_0} \|\Omega_0\|_1 \quad \text{subject to} \quad \|\widehat{\Sigma}\Omega_0 - I\|_\infty \le \lambda, \qquad (2.2)$$
where $\widehat{\Sigma}$ is the sample covariance matrix, $I$ is the identity matrix, and for any matrix $A = (a_{ij})$, $\|A\|_1 := \sum_{i,j}|a_{ij}|$ and, recall, $\|A\|_\infty = \max_{i,j}|a_{ij}|$. The tuning parameter $\lambda$ is usually chosen via cross-validation.
¹ This is related to the phenomenon that the volatility process is often found to be nearly unit root, in which case the one-step-ahead prediction is approximately the current value. The assumption is also used in Fan et al. (2012a) and Dai et al. (2019), among others.
The CLIME method is designed for the following uniformity class of precision matrices. For any $0 \le q < 1$, $s_0 = s_0(p) < \infty$ and $M = M(p) < \infty$, let
$$\mathcal{U}(q, s_0, M) = \Big\{ \Omega = (\Omega_{ij})_{p\times p} : \Omega \text{ positive definite},\ \|\Omega\|_{L_1} \le M,\ \max_{1\le i\le p}\sum_{j=1}^p |\Omega_{ij}|^q \le s_0 \Big\}, \qquad (2.3)$$
where $\|\Omega\|_{L_1} := \max_{1\le j\le p}\sum_{i=1}^p |\Omega_{ij}|$.

When the observations are i.i.d. sub-Gaussian and the underlying $\Omega$ belongs to $\mathcal{U}(q, s_0(p), M(p))$, Cai et al. (2011) establish consistency of $\widehat{\Omega}_{\text{CLIME}}$ when $q$, $s_0(p)$ and $M(p)$ satisfy $M^{2-2q}s_0(\log p/n)^{(1-q)/2} \to 0$; see Theorem 1(a) therein.
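For readers who wish to experiment with the CLIME program (2.2) (and its variants (2.7), (2.13) and (3.2) below, which differ only in the input matrix), the following is a minimal sketch that solves it column by column as a linear program; the symmetrization step follows Cai et al. (2011). In practice, $\lambda$ would be chosen by cross-validation and specialized solvers are preferable for large $p$; this sketch is illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

def clime(sigma_hat, lam):
    """Column-wise linear-programming solution of the CLIME problem (2.2).

    For each unit vector e_j, minimize ||beta||_1 subject to
    ||sigma_hat @ beta - e_j||_inf <= lam, then symmetrize by keeping,
    for each (i, j), the entry of smaller magnitude.
    """
    p = sigma_hat.shape[0]
    omega = np.zeros((p, p))
    c = np.ones(2 * p)                       # objective: sum(u) + sum(v), with beta = u - v
    A_ub = np.block([[ sigma_hat, -sigma_hat],
                     [-sigma_hat,  sigma_hat]])
    for j in range(p):
        e_j = np.zeros(p); e_j[j] = 1.0
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        omega[:, j] = res.x[:p] - res.x[p:]
    # symmetrization step used in Cai et al. (2011)
    keep = np.abs(omega) <= np.abs(omega.T)
    return np.where(keep, omega, omega.T)
```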
In the high-frequency setting, due to stochastic volatility, the leverage effect, etc., the returns are not i.i.d., so the results in Cai et al. (2011) do not apply. Before we discuss how to tackle this difficulty, we remark that the sparsity assumption $\Omega = \Sigma^{-1} \in \mathcal{U}(q, s_0, M)$ appears to be reasonable in financial applications. For example, if the returns are assumed to follow a (conditional) multivariate normal distribution with covariance matrix $\Sigma$, then the $(i,j)$th element of $\Omega$ being 0 is equivalent to the returns of the $i$th and $j$th assets being conditionally independent given the other asset returns. For stocks in different sectors, many pairs might be conditionally independent or only weakly dependent.

Now we discuss how to adapt CLIME to the high-frequency setting. Our goal is to estimate $\Sigma_{\text{ICV}}$, or, more precisely, its inverse $\Omega_{\text{ICV}} := \Sigma_{\text{ICV}}^{-1}$. In order to apply CLIME, we need to decide which $\widehat{\Sigma}$ to use in (2.2).
When the true log-prices are observed, one of the most commonly used estimators of $\Sigma_{\text{ICV}}$ is the realized covariance (RCV) matrix. Specifically, for each asset $i$, suppose the observations at stage $n$ are $\big(X^i_{t^{i,n}_\ell}\big)$, where $0 = t^{i,n}_0 < t^{i,n}_1 < \cdots < t^{i,n}_{N_i} = 1$ are the observation times. Here $n$ characterizes the observation frequency, and $N_i \to \infty$ as $n \to \infty$. The synchronous observation case corresponds to
$$t^{i,n}_\ell \equiv t^n_\ell \quad \text{for all } i = 1, \ldots, p, \qquad (2.4)$$
which, in the simplest equidistant setting, reduces to
$$t^{i,n}_\ell = t^n_\ell = \ell/n, \quad \ell = 0, 1, \ldots, n. \qquad (2.5)$$
In the synchronous observation case (2.4), for $\ell = 1, \ldots, n$, let
$$\Delta X_\ell := X_{t^n_\ell} - X_{t^n_{\ell-1}}$$
be the log-return vector over the time interval $[t^n_{\ell-1}, t^n_\ell]$. The RCV matrix is defined as
$$\widehat{\Sigma}_{\text{RCV}} = \sum_{\ell=1}^n \Delta X_\ell (\Delta X_\ell)^\top. \qquad (2.6)$$
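A minimal sketch of the RCV computation (2.6), assuming synchronously observed log-prices stored as rows of an $(n+1)\times p$ array:

```python
import numpy as np

def realized_covariance(log_prices):
    """Realized covariance matrix (2.6) from synchronous log-prices.

    log_prices: array of shape (n + 1, p), rows are log-prices at t_0, ..., t_n.
    Returns the p x p matrix sum_l dX_l dX_l'.
    """
    returns = np.diff(log_prices, axis=0)   # (n, p) log-return vectors
    return returns.T @ returns              # sum of outer products
```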
We are now ready to define the constrained $\ell_1$-minimization for inverse matrix estimation with stochastic volatility (CLIME-SV), $\widehat{\Omega}_{\text{CLIME-SV}}$:
$$\widehat{\Omega}_{\text{CLIME-SV}} := \arg\min_{\Omega_0} \|\Omega_0\|_1 \quad \text{subject to} \quad \|\widehat{\Sigma}_{\text{RCV}}\Omega_0 - I\|_\infty \le \lambda. \qquad (2.7)$$
The resulting MVP estimator is given by
$$\widehat{w}_{\text{CLIME-SV}} = \frac{\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1}}{\mathbf{1}^\top\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1}}, \qquad (2.8)$$
which is associated with a risk of
$$R_{\text{CLIME-SV}} = \widehat{w}_{\text{CLIME-SV}}^\top\, \Sigma_{\text{ICV}}\, \widehat{w}_{\text{CLIME-SV}} = \frac{(\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1})^\top \Sigma_{\text{ICV}} (\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1})}{(\mathbf{1}^\top\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1})^2}. \qquad (2.9)$$
Next we discuss the theoretical properties of this portfolio estimator. We make the
following assumptions.
Assumption A:

(A.i) There exists $\delta \in (0,1)$ such that for all $p$, $\delta \le \lambda_{\min}(\Sigma_{\text{ICV}}) \le \lambda_{\max}(\Sigma_{\text{ICV}}) < 1/\delta$ almost surely, where for any symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand for its smallest and largest eigenvalues, respectively.

(A.ii) The drift process is such that $|\mu^i_t| \le C_\mu$ for some constant $C_\mu < \infty$, for all $i = 1, \ldots, p$ and $t \in [0,1]$, almost surely.

(A.iii) There exists a constant $C_\sigma$ such that $|\sigma^{ij}_t| \le C_\sigma < \infty$ for all $1 \le i, j \le p$ and $t \in [0,1]$, almost surely.

(A.iv) The observation times $t^n_\ell$ under the synchronous setting (2.4) are such that there exists a constant $C_\Delta$ with
$$\sup_n \max_{1\le \ell\le n} n\,|t^n_\ell - t^n_{\ell-1}| \le C_\Delta < \infty. \qquad (2.10)$$

(A.v) $p$ and $n$ satisfy $\log p/n \to 0$.
Remark 1. Assumptions (A.ii) and (A.iii) are standard in the high-frequency literature: one assumes a bounded drift, the other a bounded volatility. One may relax the bounded volatility assumption by using techniques similar to Lemma 3 of Fan et al. (2012a). Assumption (A.iv) is also widely adopted, although one may need to apply synchronization techniques such as the previous tick (Zhang (2011)) and refresh time (Barndorff-Nielsen et al. (2011)) methods in order to synchronize the observations. Assumption (A.v) allows the number of assets, $p$, to grow faster than the observation frequency.
Theorem 1. Suppose that $(X_t)$ satisfies (2.1) and the underlying precision matrix $\Omega_{\text{ICV}} = \Sigma_{\text{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under Assumptions (A.i)-(A.v), with $\lambda = \eta M\sqrt{\log p/n}$ for some $\eta > 0$, there exist constants $C_1, C_2 > 0$ such that
$$P\left( \frac{R_{\text{CLIME-SV}}}{R_{\min}} - 1 = O\big( M^{2-2q}s_0(\log p/n)^{(1-q)/2} \big) \right) \ge 1 - \frac{C_1}{p^{C_2\eta^2 - 2}},$$
where $R_{\text{CLIME-SV}}$ is defined in (2.9) and $R_{\min} = 1/(\mathbf{1}^\top\Omega_{\text{ICV}}\mathbf{1})$.
Remark 2. Theorem 1 guarantees that, as long as $M^{2-2q}s_0(\log p/n)^{(1-q)/2} \to 0$, the risk of our estimated MVP is consistent in the sense of (1.8). The estimated portfolio is therefore applicable to the ultra-high-dimensional setting where the number of assets can be much larger than the number of observations.
Remark 3. The case where observations are i.i.d. is a special case of the high frequency
setting that we adopt here. In fact, to generate i.i.d. returns under our setting, one just
needs to take constant drift and volatility processes. Our setting is therefore more general
than the i.i.d. observation setting, and all our results readily apply to that case.
2.3 High-frequency case with microstructure noise
In general, the observed prices are believed to be contaminated by microstructure noise. In other words, instead of observing the true log-prices $X^i_{t^{i,n}_\ell}$, for each asset $i$ the observations at stage $n$ are
$$Y^i_{t^{i,n}_\ell} = X^i_{t^{i,n}_\ell} + \varepsilon^i_\ell, \qquad (2.11)$$
where the $\varepsilon^i_\ell$'s represent microstructure noise. In this case, if one simply plugs $Y^i_{t^{i,n}_\ell}$ into the formula of RCV in (2.6), the resulting estimator is not consistent even when the dimension $p$ is fixed. Consistent estimators in the univariate case include the two-scales realized volatility (TSRV, Zhang et al. (2005)), multi-scale realized volatility (MSRV, Zhang (2006)), pre-averaging estimator (PAV, Jacod et al. (2009), Podolskij and Vetter (2009) and Jacod et al. (2019)), realized kernels (RK, Barndorff-Nielsen et al. (2008)), quasi-maximum likelihood estimator (QMLE, Xiu (2010)), estimated-price realized volatility (ERV, Li et al. (2016)) and the unified volatility estimator (UV, Li et al. (2018)). None of these estimators, however, is consistent in the high-dimensional setting.

In this article we choose to work with the PAV estimator. To reduce non-essential technical complications, we work with the equidistant time setting (2.5) (such as data sampled every minute). Asynchronicity can be dealt with by using existing data synchronization techniques. Compared with microstructure noise, asynchronicity is less of an issue; see, for example, Section 2.4 in Xia and Zheng (2018).
To implement the PAV estimator, we fix a constant $\theta > 0$ and let $k_n = [\theta n^{1/2}]$ be the window length over which the averaging takes place. Define
$$\overline{Y}^n_k = \frac{\sum_{i=k_n/2}^{k_n-1} Y_{t_{k+i}} - \sum_{i=0}^{k_n/2-1} Y_{t_{k+i}}}{k_n}.$$
The PAV estimator with weight function $g(x) = x \wedge (1-x)$ for $x \in (0,1)$ is defined as
$$\widehat{\Sigma}_{\text{PAV}} = \frac{12}{\theta\sqrt{n}}\sum_{k=0}^{n-k_n+1} \overline{Y}^n_k\,(\overline{Y}^n_k)^\top - \frac{6}{\theta^2 n}\,\mathrm{diag}\bigg( \sum_{k=1}^n \big(\Delta Y^i_{t_k}\big)^2 \bigg)_{i=1,\ldots,p}. \qquad (2.12)$$
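The following is a minimal sketch of the pre-averaging computation in (2.12), assuming synchronous, equidistant noisy log-prices; the exact window-boundary conventions are an implementation choice meant only to mirror the display above.

```python
import numpy as np

def pav_covariance(Y, theta=1.0):
    """Pre-averaging covariance estimator (2.12) for noisy log-prices.

    Y: array of shape (n + 1, p) of observed log-prices on an equidistant grid.
    Uses the weight function g(x) = min(x, 1 - x), i.e. simple half-window differences.
    """
    n = Y.shape[0] - 1
    k_n = int(theta * np.sqrt(n))
    k_n -= k_n % 2                                   # make the window length even
    half = k_n // 2
    # pre-averaged returns: mean over second half-window minus mean over first half-window
    Ybar = np.array([
        (Y[k + half:k + k_n].sum(axis=0) - Y[k:k + half].sum(axis=0)) / k_n
        for k in range(n - k_n + 2)
    ])
    main = (12.0 / (theta * np.sqrt(n))) * (Ybar.T @ Ybar)
    dY = np.diff(Y, axis=0)
    bias = (6.0 / (theta ** 2 * n)) * np.diag((dY ** 2).sum(axis=0))   # noise-bias correction
    return main - bias
```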
We now define the constrained $\ell_1$-minimization for inverse matrix estimation with stochastic volatility and microstructure noise (CLIME-SVMN), $\widehat{\Omega}_{\text{CLIME-SVMN}}$, as
$$\widehat{\Omega}_{\text{CLIME-SVMN}} := \arg\min_{\Omega_0} \|\Omega_0\|_1 \quad \text{subject to} \quad \|\widehat{\Sigma}_{\text{PAV}}\Omega_0 - I\|_\infty \le \lambda. \qquad (2.13)$$
Correspondingly, we define the estimated MVP, $\widehat{w}_{\text{CLIME-SVMN}}$, by replacing $\widehat{\Omega}_{\text{CLIME-SV}}$ in (2.8) with $\widehat{\Omega}_{\text{CLIME-SVMN}}$. Its associated risk, $R_{\text{CLIME-SVMN}}$, is given by replacing $\widehat{w}_{\text{CLIME-SV}}$ and $\widehat{\Omega}_{\text{CLIME-SV}}$ in (2.9) with $\widehat{w}_{\text{CLIME-SVMN}}$ and $\widehat{\Omega}_{\text{CLIME-SVMN}}$, respectively.
Now we state the assumptions that we need in order to establish the statistical properties of the portfolio $\widehat{w}_{\text{CLIME-SVMN}}$.
Assumption B:

(B.i) The microstructure noise $(\varepsilon^i_\ell)$ is strictly stationary and independent with mean zero, and is also independent of $(X_t)$.

(B.ii) For any $\gamma \in \mathbb{R}$, $E(\exp(\gamma\varepsilon^i_\ell)) \le \exp(C\gamma^2)$, where $C$ is a fixed constant. Suppose also that there exists $C_\varepsilon > 0$ such that $\mathrm{Var}(\varepsilon^i_\ell) \le C_\varepsilon$ for all $i$ and $\ell$.

(B.iii) $p$ and $n$ satisfy $\log p/\sqrt{n} \to 0$.
Theorem 2. Suppose that $(X_t)$ satisfies (2.1) and $\Omega_{\text{ICV}} = \Sigma_{\text{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under Assumptions (A.i)-(A.iv) and (B.i)-(B.iii), with $\lambda = \eta M\sqrt{\log p}/n^{1/4}$ for some $\eta > 0$, there exist constants $C_3, C_4 > 0$ such that
$$P\left( \frac{R_{\text{CLIME-SVMN}}}{R_{\min}} - 1 = O\big( M^{2-2q}s_0(\log p)^{(1-q)/2}/n^{(1-q)/4} \big) \right) \ge 1 - \frac{C_3}{p^{C_4\eta^2 - 2}}.$$
Remark 4. Theorem 2 states that in the noisy case, if $M^{2-2q}s_0(\log p)^{(1-q)/2}/n^{(1-q)/4} \to 0$, then the risk of our estimated MVP is consistent in the sense of (1.8).

Remark 5. The reduction in the rate from $(\sqrt{\log p}/\sqrt{n})^{1-q}$ in Theorem 1 to $(\sqrt{\log p}/n^{1/4})^{1-q}$ in the current theorem is an inevitable consequence of the noise. In fact, the optimal rate in estimating the integrated volatility in the noisy case is $O(n^{1/4})$ (Gloter and Jacod (2001)), as compared with the rate of $O(n^{1/2})$ in the noiseless case.
2.4 Estimating the minimum risk
So far we have seen that under certain sparsity assumptions on the precision matrix, we can
construct a portfolio with a risk close to the minimum risk in the sense of (1.8). Now we
turn to consistent estimation of the minimum risk.
2.4.1 Assuming sparsity: using CLIME based estimator
The CLIME-based estimators we considered in Sections 2.2 and 2.3 also lead to estimators of the minimum risk $R_{\min}$. Recall that $R_{\min} = 1/(\mathbf{1}^\top\Omega_{\text{ICV}}\mathbf{1})$. In the high-frequency setting without noise, our estimator is
$$\widehat{R}_{\text{CLIME-SV}} = \frac{1}{\mathbf{1}^\top\widehat{\Omega}_{\text{CLIME-SV}}\mathbf{1}}, \qquad (2.14)$$
where, recall, $\widehat{\Omega}_{\text{CLIME-SV}}$ is defined in (2.7). Similarly, if microstructure noise is present, then the minimum risk estimator is defined as
$$\widehat{R}_{\text{CLIME-SVMN}} = \frac{1}{\mathbf{1}^\top\widehat{\Omega}_{\text{CLIME-SVMN}}\mathbf{1}}, \qquad (2.15)$$
where $\widehat{\Omega}_{\text{CLIME-SVMN}}$ is defined in (2.13).
We have the following results for these estimators.
Theorem 3. (i) Under the assumptions in Theorem 1, the minimum risk estimator $\widehat{R}_{\text{CLIME-SV}}$ defined in (2.14) satisfies
$$P\left( \frac{\widehat{R}_{\text{CLIME-SV}}}{R_{\min}} - 1 = O\big( M^{2-2q}s_0(\log p/n)^{(1-q)/2} \big) \right) \ge 1 - \frac{C_1}{p^{C_2\eta^2 - 2}}.$$
(ii) Under the assumptions in Theorem 2, the minimum risk estimator $\widehat{R}_{\text{CLIME-SVMN}}$ defined in (2.15) satisfies
$$P\left( \frac{\widehat{R}_{\text{CLIME-SVMN}}}{R_{\min}} - 1 = O\big( M^{2-2q}s_0(\log p)^{(1-q)/2}/n^{(1-q)/4} \big) \right) \ge 1 - \frac{C_3}{p^{C_4\eta^2 - 2}}.$$
2.4.2 Without sparsity assumption: low-frequency i.i.d. returns
CLIME-based estimators work under mild sparsity assumptions on the precision matrix. We now propose another estimator of the minimum risk that does not rely on such sparsity assumptions, although it assumes instead that the observations are i.i.d. normally distributed. This estimator is hence more suitable for the low-frequency setting and can be used to estimate the minimum risk over a long time period.

More specifically, suppose that we observe $n$ i.i.d. returns $X_1, \ldots, X_n$ (possibly at low frequency). Let $S$ be the sample covariance matrix, and let $w_p = S^{-1}\mathbf{1}/(\mathbf{1}^\top S^{-1}\mathbf{1})$ be the “plug-in” portfolio. The corresponding perceived risk is $\widehat{R}_p = w_p^\top S w_p$. We have the following result on the relationship between $\widehat{R}_p$ and the minimum risk $R_{\min}$, based on which a consistent estimator of the minimum risk is constructed.
Proposition 1. Suppose that the returns $X_1, \ldots, X_n \sim_{\text{i.i.d.}} N(\mu, \Sigma)$. Suppose further that both $n$ and $p \to \infty$ in such a way that $\rho_n := p/n \to \rho \in (0,1)$. Then
$$\frac{\widehat{R}_p}{R_{\min}} - (1 - \rho_n) \xrightarrow{\;p\;} 0. \qquad (2.16)$$
Therefore, if we define
$$\widehat{R}_{\min} = \frac{1}{1-\rho_n}\,\widehat{R}_p, \qquad (2.17)$$
then
$$\frac{\widehat{R}_{\min}}{R_{\min}} \xrightarrow{\;p\;} 1. \qquad (2.18)$$
Furthermore, we have
$$\sqrt{n-p}\left( \frac{\widehat{R}_{\min}}{R_{\min}} - 1 \right) \Rightarrow N(0, 2). \qquad (2.19)$$
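A minimal sketch of the estimator (2.17): compute the perceived risk of the plug-in portfolio from i.i.d. (e.g., daily) returns and scale it by $1/(1-\rho_n)$.

```python
import numpy as np

def estimated_min_risk(returns):
    """Minimum-risk estimator (2.17): perceived plug-in risk scaled by 1/(1 - p/n).

    returns: array of shape (n, p) of i.i.d. return vectors, with p < n.
    """
    n, p = returns.shape
    S = np.cov(returns, rowvar=False)
    ones = np.ones(p)
    w_p = np.linalg.solve(S, ones)
    w_p /= ones @ w_p                     # plug-in weights
    r_perceived = w_p @ S @ w_p           # perceived risk, biased downward by factor (1 - p/n)
    return r_perceived / (1.0 - p / n)    # bias-corrected estimate of R_min
```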
The convergence (2.19) in fact reveals a “blessing” of dimensionality: the higher the dimension, the more accurate the estimation.

The convergence (2.16) explains why in Figure 1 the perceived risk is systematically lower than the minimum risk. We also remark that the “plug-in” portfolio is not optimal. In Basak et al. (2009), it is shown that the risk of the plug-in portfolio is on average a higher-than-one multiple of the minimum risk; see Propositions 1 and 2 therein. Under the condition that both $p$ and $n \to \infty$ and $p/n \to \rho \in (0,1)$, their result can be strengthened: the risk of the plug-in portfolio is, with probability approaching one, a larger-than-one multiple of the minimum risk. In fact, using relationship (5) in Basak et al. (2009), it is easy to show that
$$\frac{R(w_p)}{R_{\min}} \xrightarrow{\;p\;} \frac{1}{1-\rho}, \qquad (2.20)$$
where $R(w_p) = w_p^\top\Sigma w_p$ is the risk of the plug-in portfolio.
3 Estimation with Factors
In practice, stock returns often exhibit a factor structure. Prominent examples include the CAPM (Sharpe (1964)) and the Fama-French three-factor and five-factor models (Fama and French (1992, 2015)). Recently, there have also been studies on factor modeling for high-frequency returns; see Aït-Sahalia and Xiu (2017); Dai et al. (2019); Pelger (2019). In this section, we extend our approach to accommodate such a structure.
Let $r_k = (r_{k,1}, \ldots, r_{k,p})^\top$, $k = 1, \ldots, n$, be asset returns, which can be either high-frequency (such as 5-minute returns) or low-frequency (such as daily or weekly returns). We assume that $(r_k)$ admits a factor structure as follows:
$$r_k = \alpha + \Gamma f_k + z_k, \qquad (3.1)$$
where $\alpha$ is a $p \times 1$ unknown vector, $\Gamma$ is a $p \times m$ unknown coefficient matrix, $f_k = (f_{k,1}, \ldots, f_{k,m})^\top$ are factor returns, and $z_k$ is a $p \times 1$ random vector with mean 0 and covariance matrix $\Sigma_{k,0} = (\sigma^{k,0}_{ij})$. We assume that for each $k$, $f_k$ and $z_k$ are independent, and the pairs $(r_k, f_k)$, $k = 1, \ldots, n$, are mutually independent. Let $\Sigma_{r,k} = \mathrm{Cov}(r_k)$ and $\Sigma_{f,k} = \mathrm{Cov}(f_k)$. Note that we allow both $\Sigma_{r,k}$ and $\Sigma_{f,k}$ to depend on the time index $k$, to accommodate stochastic (co-)volatility. Because of this feature, it is impossible to estimate the individual $\Sigma_{r,k}$ or $\Sigma_{f,k}$, but it is possible to estimate their means, namely, $\Sigma_r = \frac{1}{n}\sum_{k=1}^n \Sigma_{r,k}$, $\Sigma_f = \frac{1}{n}\sum_{k=1}^n \Sigma_{f,k}$, and $\Sigma_0 = \frac{1}{n}\sum_{k=1}^n \Sigma_{k,0}$, together with the corresponding $\Omega_0 = \Sigma_0^{-1}$. The estimation procedure requires consistently estimating the means of $(r_k, f_k)$, which entails the condition that the $E(f_k)$'s are the same for all $k$.
To estimate the model, we first calculate the least squares estimators of $\alpha$ and $\Gamma$, denoted by $\widehat{\alpha}$ and $\widehat{\Gamma}$, respectively. The residuals are $e_k := r_k - \widehat{\alpha} - \widehat{\Gamma}f_k$. We then obtain a CLIME estimator of $\Omega_0$, denoted by $\widehat{\Omega}_0$, based on the residuals. Specifically,
$$\widehat{\Omega}_0 := \arg\min_{\Omega_0} \|\Omega_0\|_1 \quad \text{subject to} \quad \|S_e\Omega_0 - I\|_\infty \le \lambda, \qquad (3.2)$$
where $S_e$ is the sample covariance matrix of the residuals $(e_1, \ldots, e_n)$.
Our goal is to estimate the precision matrix $\Omega_r := \Sigma_r^{-1}$. To do so, note that $\Sigma_r = \Gamma\Sigma_f\Gamma^\top + \Sigma_0$; it follows that
$$\Omega_r = (\Gamma\Sigma_f\Gamma^\top + \Sigma_0)^{-1} = \Omega_0 - \Omega_0\Gamma\big(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma\big)^{-1}\Gamma^\top\Omega_0. \qquad (3.3)$$
Therefore, we can define the constrained $\ell_1$-minimization for inverse matrix estimation adjusted for factors (CLIME-F), $\widehat{\Omega}_{\text{CLIME-F}}$, as
$$\widehat{\Omega}_{\text{CLIME-F}} = \widehat{\Omega}_0 - \widehat{\Omega}_0\widehat{\Gamma}\big(S_f^{-1} + \widehat{\Gamma}^\top\widehat{\Omega}_0\widehat{\Gamma}\big)^{-1}\widehat{\Gamma}^\top\widehat{\Omega}_0, \qquad (3.4)$$
where $S_f$ is the sample covariance matrix of $(f_1, \ldots, f_n)$. From here the MVP estimator is
$$\widehat{w}_{\text{CLIME-F}} = \frac{\widehat{\Omega}_{\text{CLIME-F}}\mathbf{1}}{\mathbf{1}^\top\widehat{\Omega}_{\text{CLIME-F}}\mathbf{1}}. \qquad (3.5)$$
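A minimal sketch of the CLIME-F construction, combining the least squares fit of (3.1), the residual-based CLIME estimator (3.2), and the adjustment (3.4). The `clime` routine below is the hypothetical column-wise solver sketched in Section 2.2, not part of the paper.

```python
import numpy as np

def clime_f(returns, factors, lam):
    """Factor-adjusted precision matrix estimator (3.4), a sketch.

    returns: (n, p) asset returns; factors: (n, m) factor returns with m >= 2.
    Relies on the column-wise LP routine `clime` sketched earlier (an assumption).
    """
    n = returns.shape[0]
    F = np.column_stack([np.ones(n), factors])           # regressors [1, f_k]
    coef, *_ = np.linalg.lstsq(F, returns, rcond=None)   # least squares of r on (1, f)
    gamma_hat = coef[1:].T                                # p x m loading estimates
    residuals = returns - F @ coef                        # e_k = r_k - alpha_hat - Gamma_hat f_k
    omega0_hat = clime(np.cov(residuals, rowvar=False), lam)
    S_f = np.cov(factors, rowvar=False)
    middle = np.linalg.inv(S_f) + gamma_hat.T @ omega0_hat @ gamma_hat
    adjust = omega0_hat @ gamma_hat @ np.linalg.solve(middle, gamma_hat.T @ omega0_hat)
    return omega0_hat - adjust                            # Omega_hat_CLIME-F as in (3.4)
```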
To establish the statistical properties for this estimator, the following assumptions are
made.
Assumption C:

(C.i) The number of factors, $m$, is fixed, and $\log p = o(n)$. Moreover, there exist $\eta > 0$ and $K > 0$ such that for all $i = 1, \ldots, m$, $j = 1, \ldots, p$, and $k = 1, \ldots, n$, $E(\exp(\eta f_{ki}^2)) \le K$, $E(\exp(\eta z_{kj}^2/\sigma^{k,0}_{jj})) \le K$, and $\sigma^{k,0}_{jj} \le K$.

(C.ii) $\Sigma_f = \frac{1}{n}\sum_{k=1}^n \mathrm{Cov}(f_k)$ is of full rank, and there exists $K_f > 0$ such that $\|\Sigma_f^{-1}\|_{L_\infty} \le K_f$, where for any matrix $A = (a_{ij})$, $\|A\|_{L_\infty} := \max_i\sum_j |a_{ij}|$.

(C.iii) There exists $\delta \in (0,1)$ such that for all $p$, $\delta \le \lambda_{\min}(\Sigma_0) \le \lambda_{\max}(\Sigma_0) < 1/\delta$.

(C.iv) For all $p$, the eigenvalues of $p^{-1}\Gamma^\top\Gamma$ are bounded away from 0 and $\infty$.
Theorem 4. Suppose that the returns $(r_1, \ldots, r_n)$ follow (3.1) and the precision matrix $\Omega_0 \in \mathcal{U}(q, s_0, M)$. Under Assumptions (C.i)-(C.iv), with probability greater than $1 - O(p^{-1})$, we have, for some constant $C_5 > 0$,
$$\|\widehat{\Gamma} - \Gamma\|_\infty \le C_5\left(\frac{\log p}{n}\right)^{1/2}. \qquad (3.6)$$
Moreover, with $\lambda = C_6 M\sqrt{\log p/n}$ for some sufficiently large positive constant $C_6$, with probability greater than $1 - O(p^{-1})$, we have, for some constants $C_7, C_8 > 0$,
$$\|\widehat{\Omega}_0 - \Omega_0\|_2 \le C_7 M^{1-q}s_0\left(\frac{\log p}{n}\right)^{(1-q)/2}, \qquad (3.7)$$
and
$$\|\widehat{\Omega}_{\text{CLIME-F}} - \Omega_r\|_2 \le C_8 M^{1-q}s_0\left(\frac{\log p}{n}\right)^{(1-q)/2}, \qquad (3.8)$$
where for any symmetric matrix $A$, $\|A\|_2$ stands for its spectral norm.
Remark 6. When $(r_k, f_k)$, $k = 1, \ldots, n$, are i.i.d., we have $\Sigma_{r,k} \equiv \Sigma_r$ and $\Sigma_{f,k} \equiv \Sigma_f$, and Theorem 4 readily applies. We do not impose such a strong assumption and allow both $\Sigma_{r,k}$ and $\Sigma_{f,k}$ to change with $k$, to accommodate the stochastic (co-)volatility in high-frequency returns.
4 Simulation Studies
We simulate the $p$-dimensional log-price process $(X_t)$ from a stochastic factor model:
$$dX_t = \Gamma\Big( \mu_f\, dt + \gamma_t\Sigma_f^{1/2}\, dW^{(1)}_t \Big) + \gamma_t\Sigma_0^{1/2}\, dW^{(2)}_t, \quad \text{for } t \in [0,1], \qquad (4.1)$$
where $\mu_f$ and $\Sigma_f$ are the mean vector and covariance matrix of an $m$-dimensional factor return, $(W^{(1)}_t)$ and $(W^{(2)}_t)$ are independent $m$-dimensional and $p$-dimensional standard Brownian motions, and $(\gamma_t)$ is a stochastic U-shaped process that obeys
$$d\gamma_t = -\rho(\gamma_t - \varphi_t)\, dt + \sigma\, dB_t,$$
where $\rho = 10$, $\sigma = 0.05$,
$$\varphi_t = \sqrt{1 + 0.8\cos(2\pi t)}, \quad \text{for } t \in [0,1],$$
and $(B_t)$ is a standard Brownian motion that is independent of $(W^{(1)}_t, W^{(2)}_t)$. Under such a setting, the ICV matrix is
$$\Sigma_{\text{ICV}} = \left( \int_0^1 \gamma_t^2\, dt \right)\Sigma, \quad \text{where } \Sigma = \Gamma\Sigma_f\Gamma^\top + \Sigma_0.$$
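A minimal Euler-scheme sketch of simulating one trading day of log-prices from (4.1); the inputs ($\mu_f$, $\Sigma_f$, $\Sigma_0$, $\Gamma$ and the initial value of $\gamma_t$) are placeholders for the calibrated quantities described below, and Cholesky factors are used as the matrix square roots.

```python
import numpy as np

def simulate_day(gamma0, mu_f, sigma_f, sigma_0, Gamma, n=391, rho=10.0, sigma=0.05, seed=0):
    """Euler discretization of the stochastic factor model (4.1) on [0, 1].

    Returns an (n + 1, p) array of log-prices and the terminal value of gamma_t.
    mu_f: (m,), sigma_f: (m, m), sigma_0: (p, p), Gamma: (p, m).
    """
    rng = np.random.default_rng(seed)
    p, m = Gamma.shape
    dt = 1.0 / n
    Lf = np.linalg.cholesky(sigma_f)       # Sigma_f^{1/2}
    L0 = np.linalg.cholesky(sigma_0)       # Sigma_0^{1/2}
    X = np.zeros((n + 1, p))
    gamma = gamma0
    for k in range(n):
        t = k * dt
        phi = np.sqrt(1 + 0.8 * np.cos(2 * np.pi * t))          # U-shaped target level
        dW1 = rng.standard_normal(m) * np.sqrt(dt)
        dW2 = rng.standard_normal(p) * np.sqrt(dt)
        dB = rng.standard_normal() * np.sqrt(dt)
        dX = Gamma @ (mu_f * dt + gamma * (Lf @ dW1)) + gamma * (L0 @ dW2)
        X[k + 1] = X[k] + dX
        gamma += -rho * (gamma - phi) * dt + sigma * dB          # OU-type dynamics for gamma_t
    return X, gamma
```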
We now discuss the specification of the parameters in (4.1). The parameter values are calibrated to real data as follows. We work with the daily returns in Year 2012 of the eighty randomly selected S&P 100 Index constituents that are used in our empirical studies (Section 5). The factors are taken to mimic the Fama-French three factors, and we set $\mu_f$ and $\Sigma_f$ to be the sample mean and sample covariance matrix of the daily factor returns in Year 2012. The coefficients $\Gamma$ are taken to be the coefficients estimated by fitting the Fama-French three-factor model to the daily returns of the eighty stocks. Finally, we construct a CLIME estimator $\widehat{\Omega}_0$ based on the residuals as in (3.2), and set $\Sigma_0 = \widehat{\Omega}_0^{-1}$.

We compare our proposed portfolios with several existing methods that are based on low-frequency data. We simulate 11 years of data, each year consisting of $T = 252$ days with $n = 391$ log-prices per day. The first year is taken as the initial training period and the remaining 10 years as the testing period. The log-prices are contaminated with simulated microstructure noise. Specifically, for each stock $i$, the noise is generated from $N(0, \sigma_{e,i}^2)$, where $\sigma_{e,i} \sim \mathrm{Unif}(0.0001, 0.0005)$.
We include the following portfolios in our comparison. All portfolios are constructed using a rolling-window scheme, with different window lengths according to whether a portfolio is a low-frequency or a high-frequency strategy; a sketch of the scheme used for the high-frequency methods is given after the list below. For the methods using low-frequency data (methods (ii)-(vi)), daily returns during the immediately preceding 252 days are used to construct the covariance matrix estimator $\widehat{\Sigma}$ or precision matrix estimator $\widehat{\Omega}$, which is then plugged into (1.2), and the weights are recalculated every 21 days. For the methods using high-frequency data (methods (vii)-(ix)), the portfolio weights are estimated with returns from the immediately preceding 10 days and are re-estimated every 5 days.
(i) Equally-weighted (EW) portfolio. This serves as an important benchmark portfolio as
shown in DeMiguel et al. (2009b);
(ii) FFL portfolio. This is based on Fan et al. (2008) and utilizes the three factors from
the data-generating process;
(iii) Nonlinear shrinkage (NLS) portfolio. This is based on Ledoit and Wolf (2012);
(iv) NLSF portfolio. This is based on Ledoit and Wolf (2017);
(v) CLIME portfolio. To construct this portfolio, daily returns are treated as i.i.d. observations and the original CLIME method is directly applied to estimate the precision
matrix and the portfolio weight;
(vi) CLIME-F portfolio. This is based on Theorem 4 using daily returns and utilizing the
three factors from the data-generating process;
(vii) CLIME-SV(5min) portfolio. This is based on Theorem 1 using 5-min returns;
(viii) CLIME-F(5min) portfolio: This is a high-frequency version of CLIME-F where the
factor model is estimated using 5-min intra-day returns;
(ix) CLIME-SVMN(1min) portfolio. This is based on Theorem 2 using 1-min returns;
(x) Oracle portfolio. This is the optimal portfolio obtained by plugging the ICV matrix into the weight formula (1.2). This portfolio is infeasible because the ICV matrix is only measurable with respect to the filtration at the end of the day, while the portfolio is to be held from the beginning of each day.
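The rolling-window scheme used for the high-frequency methods (referred to before the list) can be sketched as follows; `estimate_weights` is a hypothetical callable standing in for any of the CLIME-SV, CLIME-SVMN or CLIME-F(5min) weight estimators, and daily portfolio returns are approximated by open-to-close log-returns.

```python
import numpy as np

def backtest_high_frequency(daily_panels, estimate_weights, window=10, rebalance=5):
    """Rolling-window scheme used for the high-frequency methods (a sketch).

    daily_panels: list of (n + 1, p) arrays of intraday log-prices, one per day.
    estimate_weights: hypothetical callable mapping a list of panels to portfolio weights.
    Returns the realized daily portfolio returns over the testing period.
    """
    daily_returns = []
    w = None
    for day in range(window, len(daily_panels)):
        if (day - window) % rebalance == 0:            # re-estimate every `rebalance` days
            w = estimate_weights(daily_panels[day - window:day])
        open_to_close = daily_panels[day][-1] - daily_panels[day][0]   # daily log-returns
        daily_returns.append(w @ open_to_close)
    return np.array(daily_returns)
```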
Table 1 reports the annualized standard deviations of these portfolios based on 10 years
of testing.
                   Low Frequency                      High Frequency
Non-factor-based   EW        NLS       CLIME         CLIME-SV(5min)    CLIME-SVMN(1min)
Ann. SD            9.64%     6.29%     6.27%         6.08%             6.01%
Factor-based       FFL       NLSF      CLIME-F       CLIME-F(5min)     Oracle
Ann. SD            8.00%     6.18%     6.10%         5.96%             5.49%

Table 1. Simulation comparison based on the 10-year testing period. For each portfolio, its
annualized standard deviation (Ann. SD) is reported.
We observe from Table 1 that
• Among low-frequency methods, when the factor structure is not taken into consideration, CLIME performs substantially better than EW and is comparable to NLS;
• Our proposed high-frequency methods, CLIME-SV and CLIME-SVMN, work even
better. In particular, CLIME-SVMN can be used on noisy high-frequency data and
attains a substantially lower risk;
• Further incorporating factor structure helps. CLIME-F(5min) attains the lowest risk
among all feasible portfolios.
Next, we examine the estimation of the minimum risk. As discussed in Section 2.1, the daily minimum risk is a random variable that changes from one day to another. Hence, we obtain one estimate $\widehat{R}_{\text{CLIME-SVMN}}$ for each day, giving in total $252 \times 10 = 2520$ estimates. By the law of total variance and under the stationarity assumption, we can then estimate the unconditional daily minimum risk by taking the average of these estimates (ignoring the variation in the expected daily returns). Alternatively, we can use daily returns in the whole sample to estimate the (long-term) minimum risk based on Proposition 1 ($\widehat{R}_{\min}$ based on (2.17)). The results are summarized in Table 2.
Estimator                R̂_min based on (2.17)    R̂_CLIME-SVMN    Oracle
Estimate (annualized)    5.51%                     5.29%            5.49%

Table 2. Annualized unconditional minimum risk estimates.
We see from Table 2 that both estimators give quite accurate estimates.
5 Empirical Studies
In this section, we conduct empirical studies and compare minimum variance portfolios
constructed via different methods as well as the equally-weighted portfolio.
We work with S&P 100 Index constituents during Years 2003-2013. For each year to be tested, we consider those stocks that are included in the Index during both that year and the previous year. For example, when constructing portfolios for Year 2004, the stocks considered are the constituents common to Years 2003 and 2004, and for Year 2005, the stocks used are the constituents common to Years 2004 and 2005.² The observations are 1-min intra-day prices. For methods based on factor models, the Fama-French three factors are the factors utilized.³
. In addition to the portfolios in the Simulation Studies
(except the Oracle portfolio which is infeasible in practice), we add the following portfolio,
which represents the latest development in the literature:
• DLX(5min) portfolio: This is based on Dai et al. (2019) and uses the Fama-French three-factor model. 5-min intra-day observations from the immediately preceding 10 days, together with GICS codes at the industry-group level, are used to construct the covariance matrix estimator $\widehat{\Sigma}$, which is plugged into (1.2) to obtain the portfolio weights. The weights are re-estimated every 5 days.
Note that because this method depends on GICS codes, we cannot implement it in
simulations.
Table 3 reports the annualized standard deviations of all the portfolios under comparison
based on daily returns in the 10-year testing period.
² For stock pool selection, this is slightly forward-looking; it helps to maintain the quality of the data under test, and similar arrangements have been used in other studies. Note that, other than this, we use no future information: our testing is fully out-of-sample.
³ We thank the authors of Dai et al. (2019) for sharing their high-frequency (5-min) factors data with us.
                   Low Frequency                      High Frequency
Non-factor-based   EW        NLS       CLIME         CLIME-SV(5min)    CLIME-SVMN(1min)
Ann. SD            14.84%    9.64%     9.52%         9.56%             9.25%
Factor-based       FFL       NLSF      CLIME-F       DLX(5min)         CLIME-F(5min)
Ann. SD            9.73%     9.50%     9.20%         9.52%             9.13%

Table 3. Portfolio comparison based on S&P 100 Index constituents during Years 2003-2013.
For each portfolio, its annualized standard deviation (Ann. SD) is reported.
From Table 3 we observe similar comparisons to Table 1. Specifically,
• Among low-frequency methods, when the factor structure is not taken into consideration, CLIME outperforms EW and NLS;
• With high-frequency data, CLIME-SVMN improves and attains a substantially lower
risk;
• CLIME-F(5min) performs even better and attains the lowest risk.
As a reference, the minimum risk estimated based on (2.17) with daily returns is 8.28%.
This indicates that our methods approach the minimum risk reasonably well.
Dai et al. (2019) use a slightly different measure for comparison, which is based on
monthly realized volatilities. Specifically, for each portfolio, its monthly realized volatilities
(based on 15-minute returns) are calculated and the average of monthly volatilities (after
annualization) is reported. The comparison under this alternative measure is reported in
Table 4.
                   Low Frequency                      High Frequency
Non-factor-based   EW        NLS       CLIME         CLIME-SV(5min)    CLIME-SVMN(1min)
Ann. Vol.          14.46%    11.04%    10.86%        10.19%            10.12%
Factor-based       FFL       NLSF      CLIME-F       DLX(5min)         CLIME-F(5min)
Ann. Vol.          10.84%    10.82%    10.60%        10.34%            9.92%

Table 4. Comparison of (annualized) average monthly volatilities based on 15-min returns.
We observe that our methods outperform the approach of Dai et al. (2019) under this alternative measure as well.

To implement the MVP in practice, an important issue is the possible presence of extreme positions, which sometimes calls for additional short-sale or gross-exposure constraints. For this reason, we analyze the maximum (absolute) weights of each portfolio during the testing period. Table 5 reports the medians and means of these extreme positions for the different methods.
Non-factor-based       median    mean
NLS                    0.113     0.119
CLIME                  0.071     0.074
CLIME-SV(5min)         0.069     0.070
CLIME-SVMN(1min)       0.089     0.089

Factor-based           median    mean
FFL                    0.128     0.133
NLSF                   0.158     0.163
CLIME-F                0.126     0.126
DLX(5min)              0.163     0.174
CLIME-F(5min)          0.114     0.118

Table 5. Median and mean of the maximum (absolute) weight for each portfolio in the testing
period. Portfolio EW has evenly distributed positions of 0.012 on each asset and is not included
in the table.
We observe from Table 5 that our proposed methods appear to be advantageous in
this regard. In both non-factor-based and factor-based panels, CLIME-based methods have
milder extreme positions. An interesting observation is that factor-based methods lead to
more severe extreme positions.
6 Conclusion
We propose estimators of the minimum variance portfolio in the high-dimensional setting
based on high-frequency data. The desired risk convergence (1.8) for the estimated portfolios
is obtained under certain sparsity assumptions on the precision matrix. We further propose
consistent estimators of the minimum risk, one of which does not require sparsity.
Numerical studies demonstrate that our methods perform favorably. An important conclusion is that high-frequency data can add value to portfolio allocation.
Acknowledgements
We thank the Editor and two anonymous referees for their constructive suggestions. We also
thank the participants of SoFiE 2016 and FERM 2016 conferences for helpful discussions on
the preliminary versions of this paper. We are grateful for insightful comments from Robert
Engle, Raymond Kan, Olivier Ledoit and Dacheng Xiu. We thank the authors of Dai et al.
(2019) for sharing their high-frequency factors data with us. Research partially supported
by NSF Grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056; and
the RGC grants GRF 16305315, GRF 16304317, GRF 16502014, GRF 16518716 and GRF
16502118 of the HKSAR.
REFERENCES
Aït-Sahalia, Y. and Xiu, D. (2017), “Using principal component analysis to estimate a high
dimensional factor model with high-frequency data,” Journal of Econometrics, 201, 384–
399.
Ao, M., Li, Y., and Zheng, X. (2018), “Approaching mean-variance efficiency for large portfolios,” to appear in Review of Financial Studies.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008), “Designing
realized kernels to measure the ex post variation of equity prices in the presence of noise,”
Econometrica, 76, 1481–1536.
— (2011), “Multivariate realised kernels: Consistent positive semi-definite estimators of the
covariation of equity prices with noise and non-synchronous trading,” Journal of Econometrics, 162, 149–169.
Basak, G. K., Jagannathan, R., and Ma, T. (2009), “Jackknife estimator for tracking error
variance of optimal portfolios,” Management Science, 55, 990–1002.
Cai, T. T., Li, H., Liu, W., and Xie, J. (2012), “Covariate-adjusted precision matrix estimation with an application in genetical genomics,” Biometrika, 100, 139–156.
Cai, T. T., Liu, W., and Luo, X. (2011), “A constrained `1 minimization approach to sparse
precision matrix estimation,” Journal of the American Statistical Association, 106, 594–
607.
Chan, L. K., Karceski, J., and Lakonishok, J. (1999), “On portfolio optimization: Forecasting
covariances and choosing the risk model,” Review of Financial Studies, 12, 937–974.
Christoffersen, P. (2012), Elements of Financial Risk Management, Oxford: Academic Press.
Clarke, R. G., De Silva, H., and Thorley, S. (2006), “Minimum-variance portfolios in the US
equity market,” The Journal of Portfolio Management, 33, 10–24.
Dai, C., Lu, K., and Xiu, D. (2019), “Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data,” Journal
of Econometrics, 208, 43–79.
DeMiguel, V., Garlappi, L., Nogales, F. J., and Uppal, R. (2009a), “A generalized approach
to portfolio optimization: Improving performance by constraining portfolio norms,” Management Science, 55, 798–812.
DeMiguel, V., Garlappi, L., and Uppal, R. (2009b), “Optimal versus naive diversification:
How inefficient is the 1/N portfolio strategy?” Review of Financial Studies, 22, 1915–1953.
Ding, Y., Li, Y., and Zheng, X. (2018), “Estimating high dimensional minimum variance
portfolio under factor model,” working paper.
Fama, E. F. and French, K. R. (1992), “The cross-section of expected stock returns,” The
Journal of Finance, 47, 427–465.
— (2015), “A five-factor asset pricing model,” Journal of Financial Economics, 116, 1–22.
Fan, J., Fan, Y., and Lv, J. (2008), “High dimensional covariance matrix estimation using a
factor model,” Journal of Econometrics, 147, 186–197.
Fan, J., Li, Y., and Yu, K. (2012a), “Vast volatility matrix estimation using high-frequency
data for portfolio selection,” Journal of the American Statistical Association, 107, 412–428.
Fan, J., Zhang, J., and Yu, K. (2012b), “Vast portfolio selection with gross-exposure constraints,” Journal of the American Statistical Association, 107, 592–606.
Gloter, A. and Jacod, J. (2001), “Diffusions with measurement errors. I. Local asymptotic
normality,” ESAIM: Probability and Statistics, 5, 225–242.
Haugen, R. A. and Baker, N. L. (1991), “The efficient market inefficiency of capitalization-weighted stock portfolios,” The Journal of Portfolio Management, 17, 35–40.
Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009), “Microstructure
noise in the continuous case: the pre-averaging approach,” Stochastic Processes and their
Applications, 119, 2249–2276.
Jacod, J., Li, Y., and Zheng, X. (2019), “Estimating the integrated volatility with tick
observations,” Journal of Econometrics, 208, 80–100.
Jagannathan, R. and Ma, T. (2003), “Risk reduction in large portfolios: Why imposing the
wrong constraints helps,” The Journal of Finance, 58, 1651–1684.
Kim, D. and Wang, Y. (2016), “Sparse PCA based on high-dimensional Ito processes with
measurement errors,” Journal of Multivariate Analysis, 152, 172–189.
Kim, D., Wang, Y., and Zou, J. (2016), “Asymptotic theory for large volatility matrix estimation based on high-frequency financial data,” Stochastic Processes and their Applications,
126, 3527–3577.
Ledoit, O. and Wolf, M. (2012), “Nonlinear shrinkage estimation of large-dimensional covariance matrices,” The Annals of Statistics, 40, 1024–1060.
— (2017), “Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz
meets Goldilocks,” Review of Financial Studies, 30, 4349–4388.
Li, Y., Xie, S., and Zheng, X. (2016), “Efficient estimation of integrated volatility incorporating trading information,” Journal of Econometrics, 195, 33–50.
Li, Y., Zhang, Z., and Li, Y. (2018), “A unified approach to volatility estimation in the
presence of both rounding and random market microstructure noise,” Journal of Econometrics, 203, 187–222.
Markowitz, H. (1952), “Portfolio selection,” The Journal of Finance, 7, 77–91.
Merton, R. C. (1980), “On estimating the expected return on the market: An exploratory
investigation,” Journal of Financial Economics, 8, 323–361.
Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc.,
New York. Wiley Series in Probability and Mathematical Statistics.
Pelger, M. (2019), “Large-dimensional factor modeling based on high-frequency observations,” Journal of Econometrics, 208, 23–42.
Podolskij, M. and Vetter, M. (2009), “Estimation of volatility functionals in the simultaneous
presence of microstructure noise and jumps,” Bernoulli, 15, 634–658.
Schwartz, T. (2000), “How to beat the S&P 500 with portfolio optimization,” working paper.
Sharpe, W. (1964), “Capital asset prices: A theory of market equilibrium under conditions
of risk,” The Journal of Finance, 19, 425–442.
Tao, M., Wang, Y., and Chen, X. (2013), “Fast convergence rates in estimating large volatility matrices using high-frequency financial data,” Econometric Theory, 29, 838–856.
Tao, M., Wang, Y., Yao, Q., and Zou, J. (2011), “Large volatility matrix inference via combining low-frequency and high-frequency approaches,” Journal of the American Statistical
Association, 106, 1025–1040.
Wang, Y. and Zou, J. (2010), “Vast volatility matrix estimation for high-frequency financial
data,” The Annals of Statistics, 38, 943–978.
Xia, N. and Zheng, X. (2018), “On the inference about the spectral distribution of high-dimensional covariance matrix based on high-frequency noisy observations,” The Annals
of Statistics, 46, 500–525.
Xiu, D. (2010), “Quasi-maximum likelihood estimation of volatility with high frequency
data,” Journal of Econometrics, 159, 235–250.
Zhang, L. (2006), “Efficient estimation of stochastic volatility using noisy observations: a
multi-scale approach,” Bernoulli, 12, 1019–1043.
— (2011), “Estimating covariation: Epps effect, microstructure noise,” Journal of Econometrics, 160, 33–47.
Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005), “A tale of two time scales: Determining integrated volatility with noisy high-frequency data,” Journal of the American
Statistical Association, 100, 1394–1411.
Zheng, X. and Li, Y. (2011), “On the estimation of integrated covariance matrices of high
dimensional diffusion processes,” The Annals of Statistics, 39, 3121–3151.
A Appendix: Proofs
We start with a result about the CLIME estimator. It is a direct consequence of Theorem 6
in Cai et al. (2011) .
Proposition 2. Suppose that $\Omega \in \mathcal{U}(q, s_0, M)$. If $\lambda \ge M\|\widehat{\Sigma} - \Sigma\|_\infty$, where $\widehat{\Sigma}$ is the estimator of $\Sigma$ used to construct the CLIME estimator, then we have
$$\|\widehat{\Omega} - \Omega\|_2 \le Cs_0M^{1-q}\lambda^{1-q}, \qquad (A.1)$$
where $C \le 2(1 + 2^{1-q} + 3^{1-q})4^{1-q}$.
Next, we show that if (A.1) holds and $s_0M^{1-q}\lambda^{1-q} \to 0$, then under Assumption (A.i), for all $p$ large enough, we have both
$$\left| \frac{\mathbf{1}^\top\Omega\mathbf{1}}{\mathbf{1}^\top\widehat{\Omega}\mathbf{1}} - 1 \right| = O\big( \|\widehat{\Omega} - \Omega\|_2 \big) \qquad (A.2)$$
and
$$\left| \frac{R(\widehat{w})}{R_{\min}} - 1 \right| = O\big( \|\widehat{\Omega} - \Omega\|_2 \big), \quad \text{where } \widehat{w} = \frac{\widehat{\Omega}\mathbf{1}}{\mathbf{1}^\top\widehat{\Omega}\mathbf{1}}. \qquad (A.3)$$
In fact, if $s_0M^{1-q}\lambda^{1-q} \to 0$, then (A.1) implies that $\|\widehat{\Omega} - \Omega\|_2 \le \delta/2$ for all $p$ large enough, in which case $\lambda_{\min}(\widehat{\Omega}) \ge \delta/2$ and both bounds follow from Assumption (A.i).
A.1 Proof of Theorem 1
We will show that, under Assumptions (A.i)-(A.v), for some positive constants $C_1, C_2$, we have
$$P\Big( \|\widehat{\Sigma}_{\text{RCV}} - \Sigma_{\text{ICV}}\|_\infty > \eta\sqrt{\log p/n} \Big) \le \frac{C_1}{p^{C_2\eta^2 - 2}}. \qquad (A.4)$$
Then, with $\lambda = \eta M\sqrt{\log p/n}$, by Proposition 2 and (A.3), with probability greater than $1 - C_1/p^{C_2\eta^2-2}$,
$$\frac{R_{\text{CLIME-SV}}}{R_{\min}} - 1 = O\big( M^{2-2q}s_0(\log p/n)^{(1-q)/2} \big),$$
namely, the conclusion of the theorem holds.

We now prove (A.4). We first consider the estimation of integrated volatility and co-volatility.
Lemma 1. Suppose that $(X_t)$ satisfies
$$dX_t = \mu_t\, dt + \sigma_t\, dW_t, \quad t \in [0,1],$$
and there exist constants $C_\mu, C_\sigma$ such that
$$|\mu_t| \le C_\mu \quad \text{and} \quad |\sigma_t| \le C_\sigma < \infty, \quad \text{for all } t \in [0,1] \text{ almost surely}.$$
Suppose further that the observation times $(t^n_i)$ satisfy (2.10). Denote the realized volatility by $[X,X]_t := \sum_{\{i:\, t^n_i \le t\}}\big(X_{t^n_i} - X_{t^n_{i-1}}\big)^2$. Then we have
$$P\left( \sqrt{n}\,\Big| [X,X]_1 - \int_0^1 \sigma_t^2\, dt \Big| > x \right) \le C\exp\left( -\frac{x^2}{32\,C_\sigma^4 C_\Delta^2} \right) \qquad (A.5)$$
for all $0 \le x \le 2C_\sigma^2 C_\Delta\sqrt{n}$, where the constant $C = 2\exp\big(C_\mu^2/(4C_\sigma^2)\big) + \exp\big(C_\mu^2/C_\sigma^2\big)$.

Proof. For notational ease, we shall omit the superscript $n$ and write $t_i$ for $t^n_i$, etc. Define $\widetilde{X}_t = X_0 + \int_0^t \sigma_s\, dW_s$. Then we have $X_t = \widetilde{X}_t + \int_0^t \mu_s\, ds$. Observe that
$$[X,X]_t - \int_0^t \sigma_s^2\, ds = \left( [\widetilde{X},\widetilde{X}]_t - \int_0^t \sigma_s^2\, ds \right) + \sum_{t_i\le t}\left( \int_{t_{i-1}}^{t_i}\mu_s\, ds \right)^2 + 2\sum_{t_i\le t}\big( \widetilde{X}_{t_i} - \widetilde{X}_{t_{i-1}} \big)\int_{t_{i-1}}^{t_i}\mu_s\, ds =: I_1 + I_2 + I_3.$$
We first consider terms $I_1$ and $I_2$. By Eqn. (A.6) in Fan et al. (2012a), we have, for $|\theta| \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
$$E\big( \exp(\theta\sqrt{n}\,I_1) \big) \le \exp\big( 2\theta^2 C_\sigma^4 C_\Delta^2 \big). \qquad (A.6)$$
As to term $I_2$, it is easy to see that $I_2 \le C_\mu^2 C_\Delta/n$. It follows that for $|\theta| \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
$$E\big( \exp(\theta\sqrt{n}(I_1 + I_2)) \big) \le C_1\exp\big( 2\theta^2 C_\sigma^4 C_\Delta^2 \big),$$
where $C_1 = \exp\big(C_\mu^2/(4C_\sigma^2)\big)$. By Markov's inequality, we then obtain that for $0 \le \theta \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
$$P\big( \sqrt{n}|I_1 + I_2| > x \big) \le \exp(-\theta x)\,E\big( \exp\{\theta\sqrt{n}|I_1 + I_2|\} \big) \le 2C_1\exp\big( 2\theta^2 C_\sigma^4 C_\Delta^2 - x\theta \big).$$
Taking $\theta = x/(4C_\sigma^4 C_\Delta^2)$ yields that for $x \in [0, \sqrt{n}\,C_\sigma^2 C_\Delta]$,
$$P\big( \sqrt{n}|I_1 + I_2| > x \big) \le 2C_1\exp\left( -\frac{x^2}{8C_\sigma^4 C_\Delta^2} \right). \qquad (A.7)$$
Next we consider term $I_3$. By the Cauchy-Schwarz inequality, we have
$$|I_3| \le 2\sqrt{[\widetilde{X},\widetilde{X}]_1\,I_2} \le \frac{2C_\mu\sqrt{C_\Delta}}{\sqrt{n}}\sqrt{[\widetilde{X},\widetilde{X}]_1} \le \frac{2C_\mu C_\Delta}{\sqrt{n}}\sqrt{[\widetilde{X},\widetilde{X}]_1}.$$
Therefore,
$$P\big( \sqrt{n}|I_3| > x \big) \le P\big( 4C_\mu^2 C_\Delta^2\,[\widetilde{X},\widetilde{X}]_1 > x^2 \big) \le E\exp\left( \frac{C_\mu^2\,[\widetilde{X},\widetilde{X}]_1}{2C_\sigma^4} - \frac{x^2}{8C_\sigma^4 C_\Delta^2} \right).$$
By (A.6) and the assumption that $|\sigma_t| \le C_\sigma$, we obtain that when $n \ge 2C_\mu^2 C_\Delta^2/C_\sigma^2$,
$$P\big( \sqrt{n}|I_3| > x \big) \le \exp\left( \frac{C_\mu^4 C_\Delta^2}{2C_\sigma^4 n} + \frac{C_\mu^2}{2C_\sigma^2} - \frac{x^2}{8C_\sigma^4 C_\Delta^2} \right) \le C_2\exp\left( -\frac{x^2}{8C_\sigma^4 C_\Delta^2} \right), \qquad (A.8)$$
where $C_2 = \exp(C_\mu^2/C_\sigma^2)$.

Combining (A.7) and (A.8), we see that for all $n \ge 2C_\mu^2 C_\Delta^2/C_\sigma^2$ and all $0 \le x \le 2C_\sigma^2 C_\Delta\sqrt{n}$,
$$P\left( \sqrt{n}\,\Big| [X,X]_1 - \int_0^1 \sigma_t^2\, dt \Big| > x \right) \le P\big( \sqrt{n}|I_1+I_2| > x/2 \big) + P\big( \sqrt{n}|I_3| > x/2 \big) \le (2C_1 + C_2)\exp\left( -\frac{x^2}{32C_\sigma^4 C_\Delta^2} \right).$$
The conclusion of the lemma follows by taking $C = 2C_1 + C_2$.
Now we focus on estimating the co-volatility between two log-price processes. We assume that the two log-price processes $X$ and $Y$ satisfy
$$dX_t = \mu^X_t\, dt + \sigma^X_t\, dW^X_t \quad \text{and} \quad dY_t = \mu^Y_t\, dt + \sigma^Y_t\, dW^Y_t, \qquad (A.9)$$
where $\mathrm{Corr}(W^X_t, W^Y_t) = \rho^{(X,Y)}_t$.

Lemma 2. Suppose that the two processes $(X, Y)$ satisfy (A.9). Furthermore, assume that there exist finite positive constants $C_\mu$, $C_\sigma$ and $C_\Delta$ such that $|\mu^i_t| \le C_\mu$ and $|\sigma^i_t| \le C_\sigma$ a.s. for $i = X, Y$. Suppose also that the two processes are observed at times $\{t^n_i\}$ which satisfy (2.10). Then, we have for all $0 \le x \le 4C_\sigma^2 C_\Delta\sqrt{n}$,
$$P\left( \sqrt{n}\,\Big| [X,Y]_1 - \int_0^1 \sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\, dt \Big| > x \right) \le 2C\exp\left( -\frac{x^2}{128\,C_\sigma^4 C_\Delta^2} \right),$$
where the constant $C = 2\exp\big(C_\mu^2/(4C_\sigma^2)\big) + \exp\big(C_\mu^2/C_\sigma^2\big)$.

Proof. Again we shall omit the superscript $n$ and write $t_i$ for $t^n_i$, etc. Define $Z^\pm = X \pm Y$. Then
$$[X,Y]_1 = \tfrac{1}{4}\big( [Z^+, Z^+]_1 - [Z^-, Z^-]_1 \big). \qquad (A.10)$$
Note that both $Z^\pm$ are stochastic processes with bounded drifts and volatilities. By Lemma 1, for all $0 \le x \le 8C_\sigma^2 C_\Delta\sqrt{n}$,
$$P\left( \sqrt{n}\,\Big| [Z^\pm, Z^\pm]_1 - \int_0^1 \big(\sigma^{Z^\pm}_t\big)^2\, dt \Big| > x \right) \le C\exp\big( -x^2/(512\,C_\sigma^4 C_\Delta^2) \big).$$
It follows from the decomposition (A.10) that
$$P\left( \sqrt{n}\,\Big| [X,Y]_1 - \int_0^1 \sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\, dt \Big| > x \right) \le 2C\exp\big( -x^2/(128\,C_\sigma^4 C_\Delta^2) \big).$$

With the previous two lemmas, we obtain that there exist positive constants $C_1, C_2$ such that for $0 \le x \le 4C_\sigma^2 C_\Delta\sqrt{n}$,
$$\max_{i,j} P\big( \sqrt{n}\,|\widehat{\sigma}^{ij} - \sigma^{ij}| > x \big) \le C_1\exp(-C_2x^2),$$
where $(\sigma^{ij}) = \Sigma_{\text{ICV}}$ and $(\widehat{\sigma}^{ij}) = \widehat{\Sigma}_{\text{RCV}}$. Therefore, by the Bonferroni inequality,
$$P\Big( \|\widehat{\Sigma}_{\text{RCV}} - \Sigma_{\text{ICV}}\|_\infty > \eta\sqrt{\log p/n} \Big) \le \sum_{1\le i,j\le p} P\big( |\widehat{\sigma}^{ij} - \sigma^{ij}| > \eta\sqrt{\log p/n} \big) \le p^2\cdot C_1\exp(-C_2\eta^2\log p) = \frac{C_1}{p^{C_2\eta^2 - 2}}.$$
This finishes the proof.
A.2 Proof of Theorem 2
To prove Theorem 2, similar to the proof of Theorem 1, it is sufficient to provide a bound
on the element-wise estimation error in using Σb PAV to estimate ΣICV. The following result
from Kim and Wang (2016) gives an exponential tail bound for this element-wise estimation
error.
Proposition 3. [Theorem 1 in Kim and Wang (2016)] Suppose that (Xt) satisfies (2.1).
Under Assumption (A.i)-(A.iv) and (B.i)-(B.iii), the PAV estimator Σb PAV satisfies that
P
|(Σb PAV − ΣICV)ij | ≥ x
≤ C3 exp(−
√
nC4x
2
),
where (Σb PAV − ΣICV)ij denotes the ij-th entry of the matrix (Σb PAV − ΣICV), and x is a
positive number in a neighbor of 0, and C3 and C4 are positive constants independent of n
and p.
It follows from the Bonferroni inequality that
\[
P\Bigl(\|\widehat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}}\|_\infty \ge \eta\frac{\sqrt{\log p}}{n^{1/4}}\Bigr)
\le \sum_{1\le i,j\le p} P\Bigl(\bigl|(\widehat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}})_{ij}\bigr| \ge \eta\frac{\sqrt{\log p}}{n^{1/4}}\Bigr)
\le \sum_{1\le i,j\le p} C_3\exp\bigl(-\sqrt{n}\,C_4(\eta\sqrt{\log p}/n^{1/4})^2\bigr)
= p^2 C_3\exp\bigl(-(C_4\eta^2)\log p\bigr) = \frac{C_3}{p^{C_4\eta^2 - 2}}. \tag{A.11}
\]
Therefore, with $\lambda = \eta M\sqrt{\log p}/n^{1/4}$, by (A.3), with probability greater than $1 - C_3/p^{C_4\eta^2 - 2}$,
\[
\frac{R_{\mathrm{CLIME\text{-}SVMN}}}{R_{\min}} - 1 = O\Bigl(M^{2-2q} s_0 (\log p)^{(1-q)/2}/n^{(1-q)/4}\Bigr).
\]
This completes the proof.
A.3 Proof of Theorem 3
Proof. Note that $\widehat R_{\mathrm{CLIME\text{-}SV}}/R_{\min} = \mathbf{1}^{\top}\Omega\mathbf{1}\big/\mathbf{1}^{\top}\widehat\Omega_{\mathrm{CLIME\text{-}SV}}\mathbf{1}$ and $\widehat R_{\mathrm{CLIME\text{-}SVMN}}/R_{\min} = \mathbf{1}^{\top}\Omega\mathbf{1}\big/\mathbf{1}^{\top}\widehat\Omega_{\mathrm{CLIME\text{-}SVMN}}\mathbf{1}$. The result in the noiseless case follows from (A.2) and (A.4), and in the noisy case it follows from (A.2) and (A.11).
A.4 Proof of Proposition 1
Proof. Note that
\[
\frac{\widehat R_p}{R_{\min}} = \frac{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}S^{-1}\mathbf{1}}.
\]
By Theorem 3.2.12 of Muirhead (1982), we have
\[
(n - 1)\,\frac{\widehat R_p}{R_{\min}} \sim \chi^2_{n-p}. \tag{A.12}
\]
The convergence (2.16) immediately follows. Moreover,
\[
\frac{n-1}{n}\,\frac{\widehat R_{\min}}{R_{\min}} = \frac{n-1}{n-p}\,\frac{\widehat R_p}{R_{\min}} \sim \frac{1}{n-p}\,\chi^2_{n-p}.
\]
Therefore, as $n - p \to \infty$,
\[
\sqrt{n-p}\,\Bigl(\frac{n-1}{n}\,\frac{\widehat R_{\min}}{R_{\min}} - 1\Bigr) \Rightarrow N(0, 2),
\]
which trivially implies (2.19).
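Proposition 1 can be checked by simulation when the returns are i.i.d. normal. The sketch below (our own illustration; it assumes $\widehat R_p$ is formed from the usual demeaned sample covariance, and the dimensions and covariance matrix are arbitrary choices) compares the sample mean and variance of $(n-1)\widehat R_p/R_{\min}$ with those of a $\chi^2_{n-p}$ distribution:

    import numpy as np

    rng = np.random.default_rng(3)
    p, n, reps = 20, 60, 2000                  # dimensions and replications (assumptions)

    Sigma = 0.04 * (0.3 * np.ones((p, p)) + 0.7 * np.eye(p))   # any SPD covariance (assumption)
    ones = np.ones(p)
    quad_true = ones @ np.linalg.solve(Sigma, ones)            # 1' Sigma^{-1} 1 = 1 / R_min
    L = np.linalg.cholesky(Sigma)

    stats = np.empty(reps)
    for r in range(reps):
        X = rng.normal(size=(n, p)) @ L.T                      # n i.i.d. N(0, Sigma) observations
        S = np.cov(X, rowvar=False)                            # demeaned sample covariance
        quad_hat = ones @ np.linalg.solve(S, ones)             # 1' S^{-1} 1 = 1 / Rhat_p
        stats[r] = (n - 1) * quad_true / quad_hat              # (n - 1) * Rhat_p / R_min

    print(stats.mean(), n - p)                  # chi-square mean: n - p
    print(stats.var(), 2 * (n - p))             # chi-square variance: 2(n - p)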
A.5 Proof of Theorem 4
The proof relies on the following lemma.
Lemma 3. Let $\xi_1, \ldots, \xi_n$ be independent random variables with mean zero. Suppose that there exist $\eta > 0$ and $K > 0$ such that
\[
E\bigl(e^{\eta|\xi_k|}\bigr) \le K, \quad \text{for all } k = 1, \ldots, n.
\]
Then for all $\ell \in \mathbb{N}$, there exists $C = C(K, \ell) > 0$ such that for all $p$ and $n$ satisfying $\log p/n \le 1/2$,
\[
P\Biggl(\frac{1}{n}\sum_{k=1}^n \xi_k \ge \eta^{-1} C\sqrt{\frac{\log p}{n}}\Biggr) \le \frac{1}{p^{\ell}}.
\]
Proof. The proof follows similar calculations to the proof of Theorems 1(a) and 4(a) in Cai et al. (2011) and uses the result that $|e^x - 1 - x| \le x^2 e^{|x|}$ for any $x \in \mathbb{R}$. Define $t = \eta\sqrt{\log p/n}$. Because $E(\xi_k) = 0$, by Jensen's inequality, $E(e^{t\xi_k}) \ge 1$. It follows that
\[
\log\bigl(E(e^{t\xi_k})\bigr) \le E(e^{t\xi_k}) - 1 = E\bigl(\exp(t\xi_k) - 1 - t\xi_k\bigr) \le E\bigl(t^2\xi_k^2 e^{|t\xi_k|}\bigr) \le C\eta^2 K\,\frac{\log p}{n}.
\]
The last inequality follows from the assumption $\log p/n \le 1/2$ and the fact that there exists $C > 0$ such that
\[
x^2 e^{\eta\sqrt{\log p/n}\,|x|} \le x^2 e^{\eta\sqrt{1/2}\,|x|} \le C e^{\eta|x|}, \quad \text{for all } x \ge 0.
\]
Therefore, for any $C_K > 0$,
\[
P\Biggl(\frac{1}{n}\sum_{k=1}^n \xi_k \ge \eta^{-1}C_K\sqrt{\frac{\log p}{n}}\Biggr)
= P\Biggl(\sum_{k=1}^n t\xi_k \ge t\eta^{-1}C_K\sqrt{n\log p}\Biggr)
= P\Biggl(\sum_{k=1}^n t\xi_k \ge C_K\log p\Biggr)
\le \exp(-C_K\log p)\,E\Biggl(\prod_{k=1}^n \exp(t\xi_k)\Biggr)
\le \exp\bigl(-C_K\log p + C\eta^2 K\log p\bigr),
\]
which is smaller than $1/p^{\ell}$ for all $C_K$ such that $C_K > \ell + C\eta^2 K$.
We now prove Theorem 4.
Proof. To simplify the notation, we assume that $E(f_k) \equiv 0$, as all results remain true with $f_k$ replaced by $f_k - E(f_k)$ in the following proof. We first show that with probability greater than $1 - O(p^{-1})$, for a sufficiently large positive constant $C_1$,
\[
\|S_{f,r} - \Gamma S_f\|_\infty \le C_1\sqrt{\log p/n}, \tag{A.13}
\]
where
\[
S_{f,r} = \frac{1}{n}\sum_{k=1}^n (r_k - \bar r)(f_k - \bar f)^{\top} \quad\text{and}\quad S_f = \frac{1}{n}\sum_{k=1}^n (f_k - \bar f)(f_k - \bar f)^{\top}.
\]
To see this, note that
\[
S_{f,r} = \frac{1}{n}\sum_{k=1}^n \bigl(\Gamma(f_k - \bar f) + z_k - \bar z\bigr)(f_k - \bar f)^{\top} = \Gamma S_f + \frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(f_k - \bar f)^{\top}.
\]
Therefore, it suffices to show that with probability greater than $1 - O(p^{-1})$,
\[
\Bigl\|\frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(f_k - \bar f)^{\top}\Bigr\|_\infty \le C_1\sqrt{\log p/n}. \tag{A.14}
\]
By Lemma 3, we have for some constant $C > 0$,
\[
\max_{1\le i\le p,\,1\le j\le m} P\Biggl(\Bigl|\frac{1}{n}\sum_{k=1}^n z_{ki}f_{kj}\Bigr| \ge C\sqrt{\log p/n}\Biggr) \le \frac{2}{p^2}, \tag{A.15}
\]
\[
\max_{1\le j\le m} P\Bigl(|\bar f_j| \ge C\sqrt{\log p/n}\Bigr) \le 2/p^2, \qquad \max_{1\le i\le p} P\Bigl(|\bar z_i| \ge C\sqrt{\log p/n}\Bigr) \le 2/p^2. \tag{A.16}
\]
These imply (A.14).
Next, recall that $\widehat\Gamma = S_{f,r}S_f^{-1}$ is the least-squares estimate of $\Gamma$. Rewrite the bound (A.13) as
\[
\|(\widehat\Gamma - \Gamma)S_f\|_\infty \le C_1\sqrt{\log p/n}. \tag{A.17}
\]
By Lemma 3, again we have for some constant $C_2 > 0$,
\[
P\Bigl(\|\Sigma_f - S_f\|_\infty \ge C_2\sqrt{\log p/n}\Bigr) \le 2p^{-1}.
\]
Therefore, we have, with probability greater than $1 - O(p^{-1})$,
\[
\|(\widehat\Gamma - \Gamma)\Sigma_f\|_\infty \le \|(\widehat\Gamma - \Gamma)S_f\|_\infty + \|(\widehat\Gamma - \Gamma)(\Sigma_f - S_f)\|_\infty
\le C_1\sqrt{\log p/n} + C_2\|\widehat\Gamma - \Gamma\|_{L_\infty}\sqrt{\log p/n}.
\]
It follows that
\[
\|\widehat\Gamma - \Gamma\|_\infty \le \|(\widehat\Gamma - \Gamma)\Sigma_f\|_\infty\,\|\Sigma_f^{-1}\|_{L_\infty}
\le \|\Sigma_f^{-1}\|_{L_\infty}\Bigl(C_1\sqrt{\log p/n} + C_2\|\widehat\Gamma - \Gamma\|_{L_\infty}\sqrt{\log p/n}\Bigr)
\le \|\Sigma_f^{-1}\|_{L_\infty}\Bigl(C_1\sqrt{\log p/n} + C_2 m\|\widehat\Gamma - \Gamma\|_\infty\sqrt{\log p/n}\Bigr).
\]
By Assumption (C.i), $\log p/n \to 0$ as $n \to \infty$, hence for all large $n$,
\[
\|\widehat\Gamma - \Gamma\|_\infty \le C_1\|\Sigma_f^{-1}\|_{L_\infty}\sqrt{\log p/n} + \frac{1}{2}\|\widehat\Gamma - \Gamma\|_\infty.
\]
Thus, (3.6) holds.
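The estimator $\widehat\Gamma = S_{f,r}S_f^{-1}$ above is simply the multivariate least-squares regression of the $p$ returns on the $m$ factors. The following sketch (our own illustration with simulated i.i.d. data; all dimensions and distributions are assumptions) shows how $\widehat\Gamma$ and the residual covariance $S_e$ used next can be formed, and that the sup-norm error behaves like $\sqrt{\log p/n}$:

    import numpy as np

    rng = np.random.default_rng(5)
    p, m, n = 100, 3, 500                      # dimensions (assumptions)

    Gamma = rng.normal(size=(p, m))            # true loadings
    f = rng.normal(size=(n, m))                # observed factors f_k
    z = rng.normal(0.0, 0.5, size=(n, p))      # idiosyncratic errors z_k
    r = f @ Gamma.T + z                        # returns r_k = Gamma f_k + z_k

    f_c = f - f.mean(axis=0)                   # demeaned factors
    r_c = r - r.mean(axis=0)                   # demeaned returns

    S_f = f_c.T @ f_c / n                      # S_f (m x m)
    S_fr = r_c.T @ f_c / n                     # S_{f,r} (p x m)
    Gamma_hat = S_fr @ np.linalg.inv(S_f)      # least-squares loadings, Gamma_hat = S_{f,r} S_f^{-1}

    resid = r_c - f_c @ Gamma_hat.T            # residuals e_k (demeaned)
    S_e = resid.T @ resid / n                  # residual covariance S_e

    print(np.max(np.abs(Gamma_hat - Gamma)), np.sqrt(np.log(p) / n))   # sup-norm error vs rate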
To prove result (3.7), all we need is a bound on $\|S_e - \Sigma_0\|_\infty$, where $S_e$ is the sample covariance matrix of the residuals $e_k = r_k - \widehat\alpha - \widehat\Gamma f_k$, $k = 1, \ldots, n$. The proof follows similar lines to the proofs of Theorems 4 and 5 in Cai et al. (2012), so we only give a sketch here.
Set $\widehat\Sigma_z = \frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(z_k - \bar z)^{\top}$. By Lemma 3, for some constant $C > 0$, we have
\[
P\Bigl(\|\widehat\Sigma_z - \Sigma_0\|_\infty \ge C\sqrt{\log p/n}\Bigr) \le 2/p. \tag{A.18}
\]
Note also that
\[
S_e - \widehat\Sigma_z = \frac{1}{n}\sum_{k=1}^n \bigl((\Gamma - \widehat\Gamma)(f_k - \bar f) + z_k - \bar z\bigr)\bigl((\Gamma - \widehat\Gamma)(f_k - \bar f) + z_k - \bar z\bigr)^{\top} - \widehat\Sigma_z
\]
\[
= (\Gamma - \widehat\Gamma)S_f(\Gamma - \widehat\Gamma)^{\top} + (\Gamma - \widehat\Gamma)\Biggl(\frac{1}{n}\sum_{k=1}^n f_k z_k^{\top}\Biggr) + \Biggl(\frac{1}{n}\sum_{k=1}^n z_k f_k^{\top}\Biggr)(\Gamma - \widehat\Gamma)^{\top} - (\Gamma - \widehat\Gamma)\bar f\bar z^{\top} - \bar z\bar f^{\top}(\Gamma - \widehat\Gamma)^{\top}.
\]
By (3.6), (A.15), (A.16), (A.17) and Lemma 3, with probability greater than $1 - O(p^{-1})$, the following holds:
\[
\bigl\|(\Gamma - \widehat\Gamma)S_f(\Gamma - \widehat\Gamma)^{\top}\bigr\|_\infty \le C\log p/n,
\]
\[
\Biggl\|(\Gamma - \widehat\Gamma)\Biggl(\frac{1}{n}\sum_{k=1}^n f_k z_k^{\top}\Biggr)\Biggr\|_\infty \le C\log p/n, \quad\text{and}\quad \bigl\|(\Gamma - \widehat\Gamma)\bar f\bar z^{\top}\bigr\|_\infty \le C(\log p/n)^{3/2}.
\]
Combining the bounds above, we obtain that with probability greater than $1 - O(p^{-1})$,
\[
\|S_e - \widehat\Sigma_z\|_\infty \le C\log p/n.
\]
Therefore, by (A.18), with probability greater than $1 - O(p^{-1})$,
\[
\|S_e - \Sigma_0\|_\infty \le C\sqrt{\log p/n}.
\]
Hence, with the tuning parameter $\lambda$ chosen as $CM\sqrt{\log p/n}$, the bound (3.7) follows from Proposition 2.
Finally, regarding (3.8), similar to the proof of the theorem in Fan et al. (2008), we have
\[
\|\widehat\Omega_{\mathrm{CLIME\text{-}F}} - \Omega_r\|_2 \le D_1 + D_2 + D_3 + D_4 + D_5 + D_6, \tag{A.19}
\]
where
\begin{align*}
D_1 &= \|\widehat\Omega_0 - \Omega_0\|_2,\\
D_2 &= \bigl\|\Omega_0\Gamma\bigl[(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma)^{-1} - (S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}\bigr]\Gamma^{\top}\Omega_0\bigr\|_2,\\
D_3 &= \bigl\|\Omega_0\Gamma(S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}(\Gamma - \widehat\Gamma)^{\top}\Omega_0\bigr\|_2,\\
D_4 &= \bigl\|\Omega_0(\Gamma - \widehat\Gamma)(S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}\widehat\Gamma^{\top}\Omega_0\bigr\|_2,\\
D_5 &= \bigl\|\Omega_0\widehat\Gamma(S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}\widehat\Gamma^{\top}(\Omega_0 - \widehat\Omega_0)\bigr\|_2, \quad\text{and}\\
D_6 &= \bigl\|(\Omega_0 - \widehat\Omega_0)\widehat\Gamma(S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}\widehat\Gamma^{\top}\widehat\Omega_0\bigr\|_2.
\end{align*}
Denote $G = (\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma)^{-1}$ and $\widehat G = (S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)^{-1}$.
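The structure of $D_2,\ldots,D_6$ is consistent with comparing the Sherman–Morrison–Woodbury form of the factor-model precision matrix, $\Omega_r = \Omega_0 - \Omega_0\Gamma(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma)^{-1}\Gamma^{\top}\Omega_0$, with its plug-in counterpart built from $\widehat\Omega_0$, $\widehat\Gamma$ and $S_f$. A short numerical check of the Woodbury identity itself (our own illustration; the matrices below are arbitrary) is:

    import numpy as np

    rng = np.random.default_rng(6)
    p, m = 80, 3                                   # dimensions (assumptions)

    Gamma = rng.normal(size=(p, m))                # factor loadings
    Sigma_f = np.diag(rng.uniform(0.5, 2.0, m))    # factor covariance (assumption)
    Sigma_0 = np.diag(rng.uniform(0.2, 1.0, p))    # idiosyncratic covariance (assumption)
    Omega_0 = np.linalg.inv(Sigma_0)

    # direct inverse of the factor-model covariance Gamma Sigma_f Gamma' + Sigma_0
    Omega_r_direct = np.linalg.inv(Gamma @ Sigma_f @ Gamma.T + Sigma_0)

    # Woodbury form: Omega_0 - Omega_0 Gamma (Sigma_f^{-1} + Gamma' Omega_0 Gamma)^{-1} Gamma' Omega_0
    G = np.linalg.inv(np.linalg.inv(Sigma_f) + Gamma.T @ Omega_0 @ Gamma)
    Omega_r_woodbury = Omega_0 - Omega_0 @ Gamma @ G @ Gamma.T @ Omega_0

    print(np.max(np.abs(Omega_r_direct - Omega_r_woodbury)))   # agreement up to machine precision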
First, we notice that, under Assumptions (C.ii)–(C.iv), for some constant $c > 0$,
\[
\lambda_{\min}(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma) \ge \lambda_{\min}(\Gamma^{\top}\Omega_0\Gamma) \ge \lambda_{\min}(\Omega_0)\lambda_{\min}(\Gamma^{\top}\Gamma) \ge cp.
\]
Therefore, $\|G\|_2 = O(p^{-1})$. Moreover, we have
\[
\bigl\|(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma) - (S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)\bigr\|_2 \le \|\Sigma_f^{-1} - S_f^{-1}\|_2 + \|(\widehat\Gamma - \Gamma)^{\top}\widehat\Omega_0(\widehat\Gamma - \Gamma)\|_2 + 2\|\Gamma^{\top}\widehat\Omega_0(\widehat\Gamma - \Gamma)\|_2 + \|\Gamma^{\top}(\widehat\Omega_0 - \Omega_0)\Gamma\|_2.
\]
Note that
\[
\|\Sigma_f^{-1} - S_f^{-1}\|_2 = \|\Sigma_f^{-1}(\Sigma_f - S_f)S_f^{-1}\|_2 \le \|\Sigma_f^{-1}\|_2\,\|\Sigma_f - S_f\|_2\,\|S_f^{-1}\|_2 \le \|\Sigma_f^{-1}\|_2\cdot m\|\Sigma_f - S_f\|_\infty\,\|S_f^{-1}\|_2 = O_p(\|\Sigma_f - S_f\|_\infty),
\]
and
\[
\|(\widehat\Gamma - \Gamma)^{\top}\widehat\Omega_0(\widehat\Gamma - \Gamma)\|_2 \le \|\widehat\Gamma - \Gamma\|_2^2\,\|\widehat\Omega_0\|_2 \le \bigl(\sqrt{pm}\,\|\widehat\Gamma - \Gamma\|_\infty\bigr)^2\|\widehat\Omega_0\|_2 = O_p(p\|\widehat\Gamma - \Gamma\|_\infty^2),
\]
where $\|\widehat\Omega_0\|_2 = O_p(1)$ follows from $\|\Omega_0\|_2 = O(1)$ and (3.7). In addition, by Assumption (C.iv), we have
\[
\|\Gamma^{\top}\widehat\Omega_0(\widehat\Gamma - \Gamma)\|_2 \le \|\Gamma^{\top}\widehat\Omega_0\|_2\,\|\widehat\Gamma - \Gamma\|_2 \le \|\Gamma^{\top}\widehat\Omega_0\|_2\,\sqrt{pm}\,\|\widehat\Gamma - \Gamma\|_\infty = O_p(p\|\widehat\Gamma - \Gamma\|_\infty),
\]
and $\|\Gamma^{\top}(\widehat\Omega_0 - \Omega_0)\Gamma\|_2 = O_p(p\,\|\widehat\Omega_0 - \Omega_0\|_2)$. Combining all the bounds above with (3.6) and (3.7), we obtain that
\[
\bigl\|(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma) - (S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)\bigr\|_2 = O_p(p\,\|\widehat\Omega_0 - \Omega_0\|_2).
\]
It follows that
\[
\|G - \widehat G\|_2 = \|G(G^{-1} - \widehat G^{-1})\widehat G\|_2 \le \|G\|_2\,\bigl\|(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma) - (S_f^{-1} + \widehat\Gamma^{\top}\widehat\Omega_0\widehat\Gamma)\bigr\|_2\,\|\widehat G\|_2 = O_p(\|\widehat\Omega_0 - \Omega_0\|_2)\,\|\widehat G\|_2.
\]
Because $\|\widehat\Omega_0 - \Omega_0\|_2 = o_p(1)$ by (3.7), and $\|\widehat G\|_2 \le \|G\|_2 + \|G - \widehat G\|_2$, we get $\|\widehat G\|_2 = O_p(\|G\|_2) = O_p(p^{-1})$. This further leads to
\[
\|G - \widehat G\|_2 = O_p(p^{-1}\|\widehat\Omega_0 - \Omega_0\|_2).
\]
It follows that
\[
D_2 \le \|\Omega_0\Gamma\|_2^2\,\|G - \widehat G\|_2 = O_p(\|\widehat\Omega_0 - \Omega_0\|_2). \tag{A.20}
\]
Next, by (3.6), we have
\[
D_3 \le \|\Omega_0\Gamma\|_2\,\|\widehat G\|_2\,\|(\Gamma - \widehat\Gamma)^{\top}\Omega_0\|_2 = O_p(\|\widehat\Gamma - \Gamma\|_\infty) = O_p\Bigl(\sqrt{\tfrac{\log p}{n}}\Bigr). \tag{A.21}
\]
Similarly, $D_4 = O_p(\|\widehat\Gamma - \Gamma\|_\infty) = O_p(\sqrt{\log p/n})$.
Finally, since $\|\widehat G\|_2 = O_p(p^{-1})$, by Assumption (C.iv) and (3.6), we have
\[
\|\widehat\Gamma\widehat G\widehat\Gamma^{\top}\|_2 \le \|(\widehat\Gamma - \Gamma)\widehat G(\widehat\Gamma - \Gamma)^{\top}\|_2 + \|\Gamma\widehat G(\widehat\Gamma - \Gamma)^{\top}\|_2 + \|(\widehat\Gamma - \Gamma)\widehat G\Gamma^{\top}\|_2 + \|\Gamma\widehat G\Gamma^{\top}\|_2 = O_p(1).
\]
Therefore,
\[
D_5 \le \|\Omega_0\|_2\,\|\widehat\Gamma\widehat G\widehat\Gamma^{\top}\|_2\,\|\widehat\Omega_0 - \Omega_0\|_2 = O_p(\|\widehat\Omega_0 - \Omega_0\|_2). \tag{A.22}
\]
Similarly, $D_6 = O_p(\|\widehat\Omega_0 - \Omega_0\|_2)$.
Combining all of the above results and using (3.7), we have
\[
\|\widehat\Omega_{\mathrm{CLIME\text{-}F}} - \Omega_r\|_2 = O_p(\|\widehat\Omega_0 - \Omega_0\|_2) + O_p\Bigl(\sqrt{\tfrac{\log p}{n}}\Bigr) = O_p\Biggl(M^{1-q}s_0\Bigl(\frac{\log p}{n}\Bigr)^{(1-q)/2}\Biggr).
\]