
High-dimensional Minimum Variance Portfolio Estimation

Based on High-frequency Data

June 1, 2019

Abstract

This paper studies the estimation of high-dimensional minimum variance portfolio

(MVP) based on the high frequency returns which can exhibit heteroscedasticity and

possibly be contaminated by microstructure noise. Under certain sparsity assumptions

on the precision matrix, we propose estimators of the MVP and prove that our portfolios asymptotically achieve the minimum variance in a sharp sense. In addition, we

introduce consistent estimators of the minimum variance, which provide reference targets. Simulation and empirical studies demonstrate the favorable performance of the

proposed portfolios.

Key Words: Minimum variance portfolio; High dimension; High frequency; CLIME

estimator; Precision matrix.

JEL Codes: C13, C55, C58, G11

1 Introduction

1.1 Background

Since the ground-breaking work of Markowitz (1952), the mean-variance portfolio has caught

significant attention from both academics and practitioners. To implement such a strategy in

practice, the accuracy in estimating both the expected returns and the covariance structure

of returns is vital. It has been well documented that the estimation of the expected returns

is more difficult than the estimation of covariances (Merton (1980)), and the impact on

portfolio performance caused by the estimation error in the expected returns is larger than

that caused by the error in covariance estimation. These difficulties pose serious challenges

for the practical implementation of the Markowitz portfolio optimization. Theoretically, mean-variance optimization has also been criticized on grounds such as preferences that can conflict with the axioms of choice and inconsistent dynamic behavior.

*Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104. tcai@wharton.upenn.edu.
†Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706. huj@stat.wisc.edu.
‡Department of ISOM and Department of Finance, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. yyli@ust.hk.
§Department of ISOM, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. xhzheng@ust.hk.

The minimum variance portfolio (MVP) has received growing attention over the past few

years (see, e.g., DeMiguel et al. (2009a) and the references therein). It avoids the difficulties

in estimating the expected returns and is on the efficient frontier. In addition, the MVP is

found to perform well on real data. Empirical studies in Haugen and Baker (1991), Chan

et al. (1999), Schwartz (2000), Jagannathan and Ma (2003) and Clarke et al. (2006) have

found that the MVP can enjoy both lower risk and higher return compared with some

benchmark portfolios. These features make the MVP an attractive investment strategy in

practice.

The MVP is more natural in the context of high-frequency data. Over short time horizons, the mean return is usually completely dominated by the volatility; consequently, as a prudent common practice, when the time horizon of interest is short, the expected returns are often assumed to be zero (see, e.g., Part II of Christoffersen (2012) and the references therein). Fan et al. (2012a) make this assumption when considering the management of portfolios that are rebalanced daily or every few days. When the expected returns are

zero, the mean-variance optimization reduces to the risk minimization problem, in which

one seeks the MVP.

In addition, there are further benefits of using high-frequency data. On the one hand, the large number of observations can potentially facilitate a better understanding of the covariance structure of returns. Developments in this direction in the high-dimensional setting include Wang and Zou (2010); Tao et al. (2011); Zheng and Li (2011); Tao et al. (2013); Kim et al. (2016); Aït-Sahalia and Xiu (2017); Xia and Zheng (2018); Dai et al. (2019); Pelger (2019),

among others. On the other hand, high-frequency data allow short-horizon rebalancing

and hence the portfolios can adjust quickly to time variability of volatilities/co-volatilities.

However, high-frequency data do come with significant challenges in analysis. Complications

arise due to heteroscedasticity and microstructure noise, among others.

We consider in this paper the estimation of the high-dimensional MVP using high-frequency data. To be more specific, given $p$ assets whose returns $X = (X_1, \ldots, X_p)^{\top}$ have covariance matrix $\Sigma$, we aim to find
\[
\arg\min_{w} \; w^{\top}\Sigma w \quad \text{subject to } w^{\top}\mathbf{1} = 1, \tag{1.1}
\]
where $w = (w_1, \ldots, w_p)^{\top}$ represents the weights put on different assets, and $\mathbf{1} = (1, \ldots, 1)^{\top}$ is the $p$-dimensional vector with all entries being 1. The optimal solution is given by
\[
w_{\mathrm{opt}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}}, \tag{1.2}
\]
which yields the minimum risk
\[
R_{\min} = w_{\mathrm{opt}}^{\top}\Sigma w_{\mathrm{opt}} = \frac{1}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}}. \tag{1.3}
\]
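As a concrete illustration of (1.2) and (1.3), the following minimal NumPy sketch computes the MVP weights and the associated minimum risk from a given covariance matrix; the function name and the toy matrix are ours, not from the paper.

```python
import numpy as np

def mvp_weights(sigma):
    """Minimum variance portfolio weights (1.2) and minimum risk (1.3).

    sigma : (p, p) covariance matrix, assumed symmetric positive definite.
    Returns (w_opt, r_min).
    """
    ones = np.ones(sigma.shape[0])
    # Solve sigma @ x = 1 instead of forming the inverse explicitly.
    x = np.linalg.solve(sigma, ones)
    w_opt = x / (ones @ x)            # weights sum to one by construction
    r_min = 1.0 / (ones @ x)          # minimum variance 1 / (1' sigma^{-1} 1)
    return w_opt, r_min

# Toy example with two correlated assets.
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
w, r = mvp_weights(sigma)
print(w, r)
```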

More generally, one may be interested in the following optimization problem: for a given vector $\beta = (\beta_1, \ldots, \beta_p)^{\top}$ and constant $c$,
\[
\arg\min_{w} \; \widetilde{w}^{\top}\Sigma\widetilde{w} \quad \text{subject to } w^{\top}\mathbf{1} = c, \quad \text{where } \widetilde{w} = (\beta_1 w_1, \ldots, \beta_p w_p)^{\top}, \tag{1.4}
\]
or its equivalent formulation:
\[
\arg\min_{\widetilde{w}} \; \widetilde{w}^{\top}\Sigma\widetilde{w} \quad \text{subject to } \widetilde{w}^{\top}\beta^{-1} = c, \quad \text{where } \beta^{-1} := (1/\beta_1, \ldots, 1/\beta_p)^{\top}. \tag{1.5}
\]
Such a setting applies, for example, in leveraged investment. We remark that the optimization problem (1.5) can be reduced to (1.1) by noticing that if $\widetilde{w}$ solves (1.5), then $\check{w} := (\widetilde{w}_1/\beta_1, \ldots, \widetilde{w}_p/\beta_p)^{\top}/c$ solves (1.1) with $\widetilde{\Sigma} = \mathrm{diag}(\beta_1, \ldots, \beta_p)\,\Sigma\,\mathrm{diag}(\beta_1, \ldots, \beta_p)$, and vice versa. For this reason, the two optimization problems (1.1) and (1.4) or (1.5) can be transformed into each other. In the rest of the paper, we focus on the problem (1.1).

A main challenge of solving the optimization problem (1.1) comes from high-dimensionality

because modern portfolios often involve a large number of assets. See, for example, Zheng

and Li (2011), Fan et al. (2012a), Fan et al. (2012b), Ao et al. (2018), Xia and Zheng (2018),

and Dai et al. (2019) on issues about and progress made on vast portfolio management.

On the other hand, the estimation of the minimum risk defined in (1.3) is also a problem

of interest. This provides a reference target for the estimated minimum variance portfolios.

In practice, because the true covariance matrix is unknown, the sample covariance matrix $S$ is usually used as a proxy, and the resulting “plug-in” portfolio, $w_p = S^{-1}\mathbf{1}/\mathbf{1}^{\top}S^{-1}\mathbf{1}$, has been widely adopted. How well does such a portfolio perform? This question has been considered in Basak et al. (2009). The following simulation result visualizes their first finding (Proposition 1 therein). Figure 1 shows the risk of the plug-in portfolio based on 100 replications. One can see that the actual risk $R(w_p) = w_p^{\top}\Sigma w_p$ of the plug-in portfolio can be devastatingly higher than the theoretical minimum risk. On the other hand, the perceived risk $\widehat{R}_p = w_p^{\top}S w_p$ can be even lower than the theoretical minimum risk. Such contradictory phenomena lead to two questions: (1) Can we consistently estimate the true minimum risk? and (2) More importantly, can we find a portfolio with a risk close to the true minimum risk?


[Figure 1: risk plotted against replication index, showing the perceived risk, the actual risk, and the minimum risk over 100 replications.]

Figure 1. Comparison of actual and perceived risks of the plug-in portfolio. The portfolios

are constructed based on returns simulated from i.i.d. multivariate normal distribution with

mean zero and covariance matrix Σ calibrated from real data; see Section 4 for details.

The numbers of assets and observations are 80 and 252, respectively. The comparison is

replicated 100 times.
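To make the phenomenon in Figure 1 concrete, here is a small self-contained simulation in the spirit of the figure (a hypothetical calibration with p = 80 assets and n = 252 i.i.d. normal returns; the equicorrelation covariance below is a simple stand-in for the real-data calibration of Section 4):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 80, 252
# Hypothetical "true" covariance: equicorrelated assets.
rho, vol = 0.3, 0.01
sigma = vol**2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

ones = np.ones(p)
r_min = 1.0 / (ones @ np.linalg.solve(sigma, ones))   # theoretical minimum risk (1.3)

perceived, actual = [], []
for _ in range(100):
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    S = np.cov(X, rowvar=False)
    w = np.linalg.solve(S, ones)
    w /= ones @ w                                      # plug-in portfolio weights
    perceived.append(w @ S @ w)                        # risk under the estimated S
    actual.append(w @ sigma @ w)                       # risk under the true covariance

print(r_min, np.mean(perceived), np.mean(actual))
# Typically: mean perceived risk < r_min < mean actual risk.
```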

Because of such issues with the plug-in portfolio, alternative methods have been proposed. Jagannathan and Ma (2003) argue that imposing the no-short-sale constraint helps. More generally, Fan et al. (2012b) study the MVP under the following gross-exposure constraint:
\[
\arg\min_{w} \; w^{\top}\Sigma w \quad \text{subject to } w^{\top}\mathbf{1} = 1 \text{ and } \|w\|_1 \le \lambda, \tag{1.6}
\]
where $\|w\|_1 = \sum_{i=1}^{p}|w_i|$ and $\lambda$ is a chosen constant. They derive the following bound on the risk of the estimated portfolio. If $\widehat{\Sigma}$ is an estimator of $\Sigma$, then the solution to (1.6) with $\Sigma$ replaced by $\widehat{\Sigma}$, denoted by $\widehat{w}_{\mathrm{opt}}$, satisfies
\[
|R(\widehat{w}_{\mathrm{opt}}) - R_{\min}| \le \lambda^{2}\,\|\widehat{\Sigma} - \Sigma\|_{\infty}, \tag{1.7}
\]
where for any weight vector $w$, $R(w) = w^{\top}\Sigma w$ stands for the risk measured by the variance of the portfolio return, and for any matrix $A = (a_{ij})$, $\|A\|_{\infty} := \max_{i,j}|a_{ij}|$. In particular, $\|\widehat{\Sigma} - \Sigma\|_{\infty}$ is the maximum element-wise estimation error in using $\widehat{\Sigma}$ to estimate $\Sigma$. Fan et al. (2012a) consider the high-frequency setting, where they use the two-scale realized covariance matrix (Zhang et al. (2005)) to estimate the integrated covariance matrix (see Section 2.1 below for related background), and establish concentration inequalities for the element-wise estimation error. These concentration inequalities imply that even if the number of assets $p$ grows faster than the number of observations $n$, one still has $\|\widehat{\Sigma} - \Sigma\|_{\infty} \to 0$ as $n \to \infty$; see equation (18) in Fan et al. (2012a) for the precise statement. In particular, bound (1.7) guarantees that under the gross-exposure constraint, the difference between the risk associated with $\widehat{w}_{\mathrm{opt}}$ and the minimum risk is asymptotically negligible.

The difference between the risk of an estimated portfolio and the minimum risk going to zero, however, may not be sufficient to guarantee (near) optimality. In fact, under rather general assumptions (which do not exclude factor models), the minimum risk $R_{\min} = 1/\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}$ may go to zero as the number of assets $p \to \infty$; see Ding et al. (2018) for a thorough discussion. If indeed the minimum risk goes to 0 as $p \to \infty$, then the difference $|R(\widehat{w}_{\mathrm{opt}}) - R_{\min}|$ going to 0 is not enough to guarantee (near) optimality. Based on the above consideration, we turn to finding an asset allocation $\widehat{w}$ which satisfies a stronger sense of consistency, namely that the ratio between the risk of the estimated portfolio and the minimum risk goes to one, i.e.,
\[
\frac{R(\widehat{w})}{R_{\min}} \xrightarrow{\;p\;} 1 \quad \text{as } p \to \infty, \tag{1.8}
\]
where $\xrightarrow{\;p\;}$ stands for convergence in probability.

1.2 Main contributions of the paper

Our contributions mainly lie in the following aspects.

We propose estimators of the minimum variance portfolio that can accommodate stochastic

volatility and market microstructure noise, which are intrinsic to high-frequency returns.

Under some sparsity assumptions on the inverse of the covariance matrix (also known as the

precision matrix), our estimated portfolios enjoy the desired convergence (1.8).

We also introduce consistent estimators of the minimum risk. One such estimator does

not depend on the sparsity assumption and also enjoys a CLT.

1.3 Organization of the paper

The paper is organized as follows. In Section 2, we present our estimators of the MVP

and show that their risks converge to the minimum risk in the sense of (1.8). We have

an estimator that incorporates stochastic volatility (CLIME-SV) and one that incorporates

stochastic volatility and microstructure noise (CLIME-SVMN). A consistent estimator of the

minimum risk is proposed in Section 2.4, for which we also establish the CLT. An extension of

our method to utilize factors is developed in Section 3. Section 4 presents simulation results

to illustrate the performance of both portfolio and minimum risk estimations. Empirical

study results based on S&P 100 Index constituents are reported in Section 5. We conclude

our paper with a brief summary in Section 6. All proofs are given in the Appendix.

2 Estimation Methods and Asymptotic Properties

2.1 High-frequency data model

We assume that the latent p-dimensional log-price process (Xt) follows a diffusion model:

\[
dX_t = \mu_t\,dt + \Theta_t\,dW_t, \quad t \ge 0, \tag{2.1}
\]
where $(\mu_t) = (\mu^1_t, \ldots, \mu^p_t)^{\top}$ is the drift process, $(\Theta_t) = (\theta^{ij}_t)_{1\le i,j\le p}$ is a $p \times p$ matrix-valued process called the spot co-volatility process, and $(W_t)$ is a $p$-dimensional Brownian motion. Both $(\mu_t)$ and $(\Theta_t)$ are stochastic, càdlàg, and may depend on $(W_t)$, all defined on a common filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0})$.

Let
\[
\Sigma_t = \Theta_t\Theta_t^{\top} := (\sigma^{ij}_t)
\]
be the spot covariance matrix process. The ex-post integrated covariance (ICV) matrix over an interval, say $[0,1]$, is
\[
\Sigma_{\mathrm{ICV}} = \Sigma_{\mathrm{ICV},1} = (\sigma^{ij}) := \int_0^1 \Sigma_t\,dt.
\]
Denote its inverse by $\Omega_{\mathrm{ICV}} := \Sigma_{\mathrm{ICV}}^{-1}$. The ex-post minimum risk, $R_{\min}$, is obtained by replacing the $\Sigma$ in (1.3) with $\Sigma_{\mathrm{ICV}}$.

Let us emphasize that, in general, $\Sigma_{\mathrm{ICV}}$ is a random variable which is only measurable with respect to $\mathcal{F}_1$, and so is $R_{\min}$. It is therefore in principle impossible to construct a portfolio that is measurable with respect to $\mathcal{F}_0$ and achieves the minimum risk $R_{\min}$. Practical implementation of the minimum variance portfolio relies on making forecasts of $\Sigma_{\mathrm{ICV}}$ based on historical data. The simplest approach is to assume that $\Sigma_{\mathrm{ICV},t} \approx \Sigma_{\mathrm{ICV},t+1}$,¹ where $\Sigma_{\mathrm{ICV},t}$ stands for the ICV matrix in period $[t-1, t]$. Under such an assumption, if we can construct a portfolio $w$ based on the observations during $[t-1, t]$ (and hence measurable with respect to $\mathcal{F}_t$) that approximately minimizes the ex-post risk $w^{\top}\Sigma_{\mathrm{ICV},t}w$, then, if we hold the portfolio during the next period $[t, t+1]$, the actual risk $w^{\top}\Sigma_{\mathrm{ICV},t+1}w$ is still approximately minimized. In this article we adopt such a strategy.

2.2 High-frequency case with no microstructure noise

We first consider the case when there is no microstructure noise; in other words, one observes the true log-prices $\bigl(X^i_{t^{i,n}_\ell}\bigr)$.

Our approach to estimating the minimum variance portfolio relies on the constrained $\ell_1$-minimization for inverse matrix estimation (CLIME) proposed in Cai et al. (2011). The original CLIME method is developed under the i.i.d. observation setting. Specifically, suppose one has $n$ i.i.d. observations from a population with covariance matrix $\Sigma$. Let $\Omega := \Sigma^{-1}$ be the corresponding precision matrix. The CLIME estimator of $\Omega$ is defined as
\[
\widehat{\Omega}_{\mathrm{CLIME}} := \arg\min_{\Omega_0}\ \|\Omega_0\|_1 \quad \text{subject to } \|\widehat{\Sigma}\Omega_0 - I\|_{\infty} \le \lambda, \tag{2.2}
\]
where $\widehat{\Sigma}$ is the sample covariance matrix, $I$ is the identity matrix, and for any matrix $A = (a_{ij})$, $\|A\|_1 := \sum_{i,j}|a_{ij}|$ and, recall that, $\|A\|_{\infty} = \max_{i,j}|a_{ij}|$. The tuning parameter $\lambda$ is usually chosen via cross-validation.

¹This is related to the phenomenon that the volatility process is often found to be nearly unit root, in which case the one-step-ahead prediction is approximately the current value. The assumption is also used in Fan et al. (2012a) and Dai et al. (2019), among others.
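For readers who want to experiment with the estimator, the optimization in (2.2) decouples into one linear program per column: the $j$-th column of $\Omega_0$ solves $\min \|\omega\|_1$ subject to $\|\widehat{\Sigma}\omega - e_j\|_{\infty} \le \lambda$. A minimal sketch using SciPy's linear programming routine is given below (the function name is ours; Cai et al. (2011) additionally use the symmetrization step included at the end):

```python
import numpy as np
from scipy.optimize import linprog

def clime(sigma_hat, lam):
    """CLIME precision matrix estimator (2.2), solved column by column.

    sigma_hat : (p, p) covariance estimate; lam : tuning parameter lambda.
    """
    p = sigma_hat.shape[0]
    omega = np.zeros((p, p))
    # Write omega_j = u - v with u, v >= 0 and minimize 1'u + 1'v.
    c = np.ones(2 * p)
    A_ub = np.block([[ sigma_hat, -sigma_hat],
                     [-sigma_hat,  sigma_hat]])
    for j in range(p):
        e_j = np.zeros(p); e_j[j] = 1.0
        b_ub = np.concatenate([lam + e_j, lam - e_j])   # |Sigma w - e_j| <= lam
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        omega[:, j] = res.x[:p] - res.x[p:]
    # Symmetrize by keeping, for each (i, j), the entry with smaller magnitude.
    keep_ij = np.abs(omega) <= np.abs(omega.T)
    return np.where(keep_ij, omega, omega.T)
```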


The CLIME method is designed for the following uniformity class of precision matrices. For any $0 \le q < 1$, $s_0 = s_0(p) < \infty$ and $M = M(p) < \infty$, let
\[
\mathcal{U}(q, s_0, M) = \Bigl\{ \Omega = (\omega_{ij})_{p\times p} : \Omega \text{ positive definite},\ \|\Omega\|_{L_1} \le M,\ \max_{1\le i\le p}\sum_{j=1}^{p}|\omega_{ij}|^{q} \le s_0 \Bigr\}, \tag{2.3}
\]
where $\|\Omega\|_{L_1} := \max_{1\le j\le p}\sum_{i=1}^{p}|\omega_{ij}|$.

Under the situation when the observations are i.i.d. sub-Gaussian and the underlying $\Omega$ belongs to $\mathcal{U}(q, s_0(p), M(p))$, Cai et al. (2011) establish consistency of $\widehat{\Omega}_{\mathrm{CLIME}}$ when $q$, $s_0(p)$ and $M(p)$ satisfy $M^{2-2q} s_0 (\log p/n)^{(1-q)/2} \to 0$; see Theorem 1(a) therein.

In the high-frequency setting, due to stochastic volatility, the leverage effect, etc., the returns are not i.i.d., so the results in Cai et al. (2011) do not apply. Before we discuss how to tackle this difficulty, we remark that the sparsity assumption $\Omega = \Sigma^{-1} \in \mathcal{U}(q, s_0, M)$ appears to be reasonable in financial applications. For example, if the returns are assumed to follow a (conditional) multivariate normal distribution with covariance matrix $\Sigma$, then the $(i,j)$-th element of $\Omega$ being 0 is equivalent to the returns of the $i$-th and $j$-th assets being conditionally independent given the other asset returns. For stocks in different sectors, many pairs might be conditionally independent or only weakly dependent.

Now we discuss how to adapt CLIME to the high-frequency setting. Our goal is to estimate $\Sigma_{\mathrm{ICV}}$ or, more precisely, its inverse $\Omega_{\mathrm{ICV}} := \Sigma_{\mathrm{ICV}}^{-1}$. In order to apply CLIME, we need to decide which $\widehat{\Sigma}$ to use in (2.2).

When the true log-prices are observed, one of the most commonly used estimators for $\Sigma_{\mathrm{ICV}}$ is the realized covariance (RCV) matrix. Specifically, for each asset $i$, suppose the observations at stage $n$ are $\bigl(X^i_{t^{i,n}_\ell}\bigr)$, where $0 = t^{i,n}_0 < t^{i,n}_1 < \cdots < t^{i,n}_{N_i} = 1$ are the observation times. The index $n$ characterizes the observation frequency, and $N_i \to \infty$ as $n \to \infty$. The synchronous observation case corresponds to
\[
t^{i,n}_\ell \equiv t^n_\ell \quad \text{for all } i = 1, \ldots, p, \tag{2.4}
\]
which, in the simplest equidistant setting, reduces to
\[
t^{i,n}_\ell = t^n_\ell = \ell/n, \quad \ell = 0, 1, \ldots, n. \tag{2.5}
\]
In the synchronous observation case (2.4), for $\ell = 1, \ldots, n$, let
\[
\Delta X_\ell := X_{t^n_\ell} - X_{t^n_{\ell-1}}
\]
be the log-return vector over the time interval $[t^n_{\ell-1}, t^n_\ell]$. The RCV matrix is defined as
\[
\widehat{\Sigma}_{\mathrm{RCV}} = \sum_{\ell=1}^{n} \Delta X_\ell (\Delta X_\ell)^{\top}. \tag{2.6}
\]
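As a quick reference, a direct NumPy implementation of (2.6) from a matrix of synchronously observed log-prices might look as follows (function name ours):

```python
import numpy as np

def realized_covariance(log_prices):
    """Realized covariance matrix (2.6).

    log_prices : (n+1, p) array of log-prices X at times t_0, ..., t_n.
    Returns the p x p RCV matrix, i.e., the sum of outer products of returns.
    """
    dX = np.diff(log_prices, axis=0)   # log-returns, shape (n, p)
    return dX.T @ dX                   # sum_l dX_l dX_l'
```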

We are now ready to define the constrained $\ell_1$-minimization for inverse matrix estimation with stochastic volatility (CLIME-SV), $\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}$:
\[
\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}} := \arg\min_{\Omega_0}\ \|\Omega_0\|_1 \quad \text{subject to } \|\widehat{\Sigma}_{\mathrm{RCV}}\Omega_0 - I\|_{\infty} \le \lambda. \tag{2.7}
\]


The resulting MVP estimator is given by
\[
\widehat{w}_{\mathrm{CLIME\text{-}SV}} = \frac{\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1}}{\mathbf{1}^{\top}\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1}}, \tag{2.8}
\]
which is associated with a risk of
\[
R_{\mathrm{CLIME\text{-}SV}} = \widehat{w}_{\mathrm{CLIME\text{-}SV}}^{\top}\Sigma_{\mathrm{ICV}}\widehat{w}_{\mathrm{CLIME\text{-}SV}} = \frac{(\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1})^{\top}\Sigma_{\mathrm{ICV}}(\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1})}{(\mathbf{1}^{\top}\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1})^{2}}. \tag{2.9}
\]
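Putting the pieces together, the CLIME-SV portfolio can be sketched by chaining the two helper functions introduced above (realized_covariance and clime, both our own illustrative names); in practice λ would be chosen by cross-validation:

```python
import numpy as np

def clime_sv_portfolio(log_prices, lam):
    """CLIME-SV portfolio weights (2.7)-(2.8), assuming noiseless synchronous data.

    Relies on realized_covariance() and clime() from the sketches above.
    """
    sigma_rcv = realized_covariance(log_prices)   # (2.6)
    omega_hat = clime(sigma_rcv, lam)             # (2.7)
    ones = np.ones(omega_hat.shape[0])
    w = omega_hat @ ones
    return w / (ones @ w)                         # (2.8): weights sum to one
```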

Next we discuss the theoretical properties of this portfolio estimator. We make the

following assumptions.

Assumption A:

(A.i) There exists $\delta \in (0,1)$ such that for all $p$, $\delta \le \lambda_{\min}(\Sigma_{\mathrm{ICV}}) \le \lambda_{\max}(\Sigma_{\mathrm{ICV}}) < 1/\delta$ almost surely, where for any symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand for its smallest and largest eigenvalues, respectively.

(A.ii) The drift process is such that $|\mu^i_t| \le C_\mu$ for some constant $C_\mu < \infty$, for all $i = 1, \ldots, p$ and $t \in [0,1]$ almost surely.

(A.iii) There exists a constant $C_\sigma$ such that $|\sigma^{ij}_t| \le C_\sigma < \infty$ for all $i, j = 1, \ldots, p$ and $t \in [0,1]$ almost surely.

(A.iv) The observation times $t^n_\ell$ under the synchronous setting (2.4) are such that there exists a constant $C_\Delta$ with
\[
\sup_{n}\ \max_{1\le \ell \le n}\ n\,|t^n_\ell - t^n_{\ell-1}| \le C_\Delta < \infty. \tag{2.10}
\]

(A.v) $p$ and $n$ satisfy $\log p/n \to 0$.

Remark 1. Assumptions (A.ii) and (A.iii) are standard in the high-frequency literature: one assumes bounded drift, the other bounded volatility. One may relax the bounded volatility assumption by using techniques similar to Lemma 3 of Fan et al. (2012a). Assumption (A.iv) is also widely adopted, although one may need to apply synchronization techniques such as the previous tick (Zhang (2011)) and refresh time (Barndorff-Nielsen et al. (2011)) in order to synchronize the observations. Assumption (A.v) allows the number of assets, $p$, to grow faster than the observation frequency.

Theorem 1. Suppose that $(X_t)$ satisfies (2.1) and the underlying precision matrix $\Omega_{\mathrm{ICV}} = \Sigma_{\mathrm{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under Assumptions (A.i)-(A.v), with $\lambda = \eta M\sqrt{\log p/n}$ for some $\eta > 0$, there exist constants $C_1, C_2 > 0$ such that
\[
P\left( \frac{R_{\mathrm{CLIME\text{-}SV}}}{R_{\min}} - 1 = O\Bigl( M^{2-2q} s_0 \,(\log p/n)^{(1-q)/2} \Bigr) \right) \ge 1 - \frac{C_1}{p^{C_2\eta^2 - 2}},
\]
where $R_{\mathrm{CLIME\text{-}SV}}$ is defined in (2.9) and $R_{\min} = 1/(\mathbf{1}^{\top}\Omega_{\mathrm{ICV}}\mathbf{1})$.

Remark 2. Theorem 1 guarantees that as long as $M^{2-2q} s_0 (\log p/n)^{(1-q)/2} \to 0$, the risk of our estimated MVP is consistent in the sense of (1.8). The estimated portfolio is therefore applicable to the ultra-high-dimensional setting where the number of assets can be much larger than the number of observations.

Remark 3. The case where observations are i.i.d. is a special case of the high frequency

setting that we adopt here. In fact, to generate i.i.d. returns under our setting, one just

needs to take constant drift and volatility processes. Our setting is therefore more general

than the i.i.d. observation setting, and all our results readily apply to that case.

2.3 High-frequency case with microstructure noise

In general, the observed prices are believed to be contaminated by microstructure noise. In

other words, instead of observing the true log-prices

Xi

t

i,n

`



, for each asset i, the observations

at stage n are

Y

i

t

i,n

`

= Xi

t

i,n

`

+ ε

i

`

, (2.11)

where ε

i

`

’s represent microstructure noise. In this case, if one simply plugs

Y

i

t

i,n

`



into the

formula of RCV in (2.6), the resulting estimator is not consistent even when the dimension p

is fixed. Consistent estimators in the univariate case include the two-scales realized volatility

(TSRV, Zhang et al. (2005)), multi-scale realized volatility (MSRV, Zhang (2006)), preaveraging estimator (PAV, Jacod et al. (2009), Podolskij and Vetter (2009) and Jacod et al.

(2019)), realized kernels (RK, Barndorff-Nielsen et al. (2008)), quasi-maximum likelihood

estimator (QMLE, Xiu (2010)), estimated-price realized volatility (ERV, Li et al. (2016))

and the unified volatility estimator (UV, Li et al. (2018)). All these estimators, however,

are not consistent in the high-dimensional setting.

In this article we choose to work with the PAV estimator. To reduce non-essential technical complications, we work with the equidistant time setting (2.5) (such as data sampled

every minute). Asynchronicity can be dealt with by using existing data synchronization

techniques. Compared with microstructure noise, asynchronicity is less an issue; see, for

example, Section 2.4 in Xia and Zheng (2018).

To implement the PAV estimator, we fix a constant θ > 0 and let kn = [θn1/2

] be the

window length over which the averaging takes place. Define

Y

n

k =

Pkn?1

i=kn/2 Ytk+i ?

Pkn/2?1

i=0 Ytk+i

kn

.

The PAV with weight function g(x) = x ∧ (1 ? x) for x ∈ (0, 1) is defined as

Σb PAV =

12

θ

n

n?X

kn+1

k=0

Y

n

k

· (Y

n

k

)

| ?

6

θ

2n

diag Xn

k=1

(?Y

i

tk

)

2

!

i=1,...,p

. (2.12)
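Under the equidistant setting, a compact NumPy sketch of (2.12) might look as follows (our own illustrative code; edge effects and the choice of θ are handled only crudely here):

```python
import numpy as np

def pav_covariance(Y, theta=0.5):
    """Pre-averaging covariance estimator (2.12) with weight g(x) = x ∧ (1 − x).

    Y : (n+1, p) array of noisy log-prices observed on an equidistant grid.
    """
    n = Y.shape[0] - 1
    kn = int(theta * np.sqrt(n))
    kn -= kn % 2                              # use an even window length
    # Pre-averaged increments \bar Y_k^n for k = 0, ..., n - kn + 1.
    bars = np.array([
        (Y[k + kn // 2 : k + kn].sum(axis=0) - Y[k : k + kn // 2].sum(axis=0)) / kn
        for k in range(n - kn + 2)
    ])
    first = (12.0 / (theta * np.sqrt(n))) * (bars.T @ bars)
    dY = np.diff(Y, axis=0)
    bias = (6.0 / (theta ** 2 * n)) * np.diag((dY ** 2).sum(axis=0))
    return first - bias
```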

We now define the constrained l1-minimization for inverse matrix estimation with stochastic volatility and microstructure noise (CLIME-SVMN), ?b CLIME?SVMN, as

?b CLIME?SVMN := arg min

?0

||?0

||1 subject to ||Σb PAV?0 ? I||∞ ≤ λ. (2.13)

Correspondingly, we define the estimated MVP, wbCLIME?SVMN, by replacing ?b CLIME?SV

in (2.8) with ?b CLIME?SVMN. Its associated risk, RCLIME?SVMN, is given by replacing

wbCLIME?SV and ?b CLIME?SV in (2.9) with wbCLIME?SVMN and ?b CLIME?SVMN, respectively.
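In code, the noisy-data portfolio differs from the noiseless sketch only in the covariance input: plug the pre-averaging estimate into the same CLIME step (again using our illustrative helpers):

```python
import numpy as np

def clime_svmn_portfolio(noisy_log_prices, lam, theta=0.5):
    """CLIME-SVMN portfolio weights (2.13), reusing pav_covariance() and clime()."""
    omega_hat = clime(pav_covariance(noisy_log_prices, theta), lam)
    ones = np.ones(omega_hat.shape[0])
    w = omega_hat @ ones
    return w / (ones @ w)
```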


Now we state the assumptions that we need in order to establish the statistical properties of the portfolio $\widehat{w}_{\mathrm{CLIME\text{-}SVMN}}$.

Assumption B:

(B.i) The microstructure noise $(\varepsilon^i_\ell)$ is strictly stationary and independent with mean zero, and is also independent of $(X_t)$.

(B.ii) For any $\gamma \in \mathbb{R}$, $E(\exp(\gamma\varepsilon^i_\ell)) \le \exp(C\gamma^2)$, where $C$ is a fixed constant. Suppose also that there exists $C_\varepsilon > 0$ such that $\mathrm{Var}(\varepsilon^i_\ell) \le C_\varepsilon$ for all $i$ and $\ell$.

(B.iii) $p$ and $n$ satisfy $\log p/\sqrt{n} \to 0$.

Theorem 2. Suppose that $(X_t)$ satisfies (2.1) and $\Omega_{\mathrm{ICV}} = \Sigma_{\mathrm{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under Assumptions (A.i)-(A.iv) and (B.i)-(B.iii), with $\lambda = \eta M\sqrt{\log p}/n^{1/4}$ for some $\eta > 0$, there exist constants $C_3, C_4 > 0$ such that
\[
P\left( \frac{R_{\mathrm{CLIME\text{-}SVMN}}}{R_{\min}} - 1 = O\Bigl( M^{2-2q} s_0 \,(\log p)^{(1-q)/2}/n^{(1-q)/4} \Bigr) \right) \ge 1 - \frac{C_3}{p^{C_4\eta^2 - 2}}.
\]

Remark 4. Theorem 2 states that in the noisy case, if $M^{2-2q} s_0 (\log p)^{(1-q)/2}/n^{(1-q)/4} \to 0$, then the risk of our estimated MVP is consistent in the sense of (1.8).

Remark 5. The reduction in the rate from $(\sqrt{\log p/n})^{1-q}$ in Theorem 1 to $(\sqrt{\log p}/n^{1/4})^{1-q}$ in the current theorem is an inevitable consequence of the noise. In fact, the optimal convergence rate for estimating the integrated volatility in the noisy case is $n^{1/4}$ (Gloter and Jacod (2001)), as compared with the rate of $n^{1/2}$ in the noiseless case.

2.4 Estimating the minimum risk

So far we have seen that under certain sparsity assumptions on the precision matrix, we can

construct a portfolio with a risk close to the minimum risk in the sense of (1.8). Now we

turn to consistent estimation of the minimum risk.

2.4.1 Assuming sparsity: using CLIME-based estimators

The CLIME-based estimators we considered in Sections 2.2 and 2.3 also lead to estimators of the minimum risk $R_{\min}$. Recall that $R_{\min} = 1/\mathbf{1}^{\top}\Omega_{\mathrm{ICV}}\mathbf{1}$. In the high-frequency setting without noise, our estimator is
\[
\widehat{R}_{\mathrm{CLIME\text{-}SV}} = \frac{1}{\mathbf{1}^{\top}\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}\mathbf{1}}, \tag{2.14}
\]
where, recall that, $\widehat{\Omega}_{\mathrm{CLIME\text{-}SV}}$ is defined in (2.7). Similarly, if microstructure noise is present, then the minimum risk estimator is defined as
\[
\widehat{R}_{\mathrm{CLIME\text{-}SVMN}} = \frac{1}{\mathbf{1}^{\top}\widehat{\Omega}_{\mathrm{CLIME\text{-}SVMN}}\mathbf{1}}, \tag{2.15}
\]
where $\widehat{\Omega}_{\mathrm{CLIME\text{-}SVMN}}$ is defined in (2.13).

We have the following results for these estimators.

Theorem 3. (i) Under the assumptions in Theorem 1, the minimum risk estimator $\widehat{R}_{\mathrm{CLIME\text{-}SV}}$ defined in (2.14) satisfies
\[
P\left( \frac{\widehat{R}_{\mathrm{CLIME\text{-}SV}}}{R_{\min}} - 1 = O\Bigl( M^{2-2q} s_0 \,(\log p/n)^{(1-q)/2} \Bigr) \right) \ge 1 - \frac{C_1}{p^{C_2\eta^2 - 2}}.
\]
(ii) Under the assumptions in Theorem 2, the minimum risk estimator $\widehat{R}_{\mathrm{CLIME\text{-}SVMN}}$ defined in (2.15) satisfies
\[
P\left( \frac{\widehat{R}_{\mathrm{CLIME\text{-}SVMN}}}{R_{\min}} - 1 = O\Bigl( M^{2-2q} s_0 \,(\log p)^{(1-q)/2}/n^{(1-q)/4} \Bigr) \right) \ge 1 - \frac{C_3}{p^{C_4\eta^2 - 2}}.
\]

2.4.2 Without sparsity assumption: low-frequency i.i.d. returns

CLIME based estimators work under mild sparsity assumptions on the precision matrix.

Now we propose another estimator of the minimum risk, which does not rely on such sparsity

assumptions, although on the other hand, it assumes that the observations are i.i.d. normally

distributed. This estimator is hence more suitable for the low frequency setting and can be

used to estimate the minimum risk over a long time period.

More specifically, suppose that we observe $n$ i.i.d. returns $X_1, \ldots, X_n$ (possibly at low frequency). Let $S$ be the sample covariance matrix, and let $w_p = S^{-1}\mathbf{1}/\mathbf{1}^{\top}S^{-1}\mathbf{1}$ be the “plug-in” portfolio. The corresponding perceived risk is $\widehat{R}_p = w_p^{\top}S w_p$. We have the following result on the relationship between $\widehat{R}_p$ and the minimum risk $R_{\min}$, based on which a consistent estimator of the minimum risk is constructed.

Proposition 1. Suppose that the returns $X_1, \ldots, X_n \overset{\mathrm{i.i.d.}}{\sim} N(\mu, \Sigma)$. Suppose further that both $n$ and $p \to \infty$ in such a way that $\rho_n := p/n \to \rho \in (0,1)$. Then
\[
\frac{\widehat{R}_p}{R_{\min}} - (1 - \rho_n) \xrightarrow{\;p\;} 0. \tag{2.16}
\]
Therefore, if we define
\[
\widehat{R}_{\min} = \frac{1}{1 - \rho_n}\,\widehat{R}_p, \tag{2.17}
\]
then
\[
\frac{\widehat{R}_{\min}}{R_{\min}} \xrightarrow{\;p\;} 1. \tag{2.18}
\]
Furthermore, we have
\[
\sqrt{n-p}\left( \frac{\widehat{R}_{\min}}{R_{\min}} - 1 \right) \Rightarrow N(0, 2). \tag{2.19}
\]
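A direct implementation of the sparsity-free estimator (2.17) from a window of (roughly i.i.d.) returns could look as follows (function name ours):

```python
import numpy as np

def minimum_risk_estimate(returns):
    """Estimator (2.17): perceived plug-in risk rescaled by 1 / (1 - p/n).

    returns : (n, p) array of i.i.d. return observations with n > p.
    """
    n, p = returns.shape
    S = np.cov(returns, rowvar=False)
    ones = np.ones(p)
    r_perceived = 1.0 / (ones @ np.linalg.solve(S, ones))   # R_hat_p = w_p' S w_p
    return r_perceived / (1.0 - p / n)
```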


The convergence (2.19) actually shows a “blessing” of dimensionality: the higher the dimension, the more accurate the estimation.

The convergence (2.16) explains why in Figure 1 the perceived risk is systematically lower than the minimum risk. We also remark that the “plug-in” portfolio is not optimal. In Basak et al. (2009), it is shown that the risk of the plug-in portfolio is on average a higher-than-one multiple of the minimum risk; see Propositions 1 and 2 therein. Under the condition that both $p$ and $n \to \infty$ and $p/n \to \rho \in (0,1)$, their result can be strengthened: the risk of the plug-in portfolio is, with probability approaching one, a larger-than-one multiple of the minimum risk. In fact, using relationship (5) in Basak et al. (2009), it is easy to show that
\[
\frac{R(w_p)}{R_{\min}} \xrightarrow{\;p\;} \frac{1}{1 - \rho}, \tag{2.20}
\]
where $R(w_p) = w_p^{\top}\Sigma w_p$ is the risk of the plug-in portfolio.

3 Estimation with Factors

In practice, stock returns often exhibit a factor structure. Prominent examples include the CAPM (Sharpe (1964)) and the Fama-French three-factor and five-factor models (Fama and French (1992, 2015)). Recently, there have also been studies on factor modeling for high-frequency returns; see Aït-Sahalia and Xiu (2017); Dai et al. (2019); Pelger (2019). In this section, we extend our approach to accommodate such a structure.

Let $r_k = (r_{k,1}, \ldots, r_{k,p})^{\top}$, $k = 1, \ldots, n$, be asset returns, which can be either high-frequency, such as 5-minute returns, or low-frequency, like daily or weekly returns. We assume that $(r_k)$ admits a factor structure as follows:
\[
r_k = \alpha + \Gamma f_k + z_k, \tag{3.1}
\]
where $\alpha$ is a $p \times 1$ unknown vector, $\Gamma$ is a $p \times m$ unknown coefficient matrix, $f_k = (f_{k,1}, \ldots, f_{k,m})^{\top}$ are factor returns, and $z_k$ is a $p \times 1$ random vector with mean 0 and covariance matrix $\Sigma_{k,0} = (\sigma^{k,0}_{ij})$. We assume that for each $k$, $f_k$ and $z_k$ are independent, and the pairs $(r_k, f_k)$, $k = 1, \ldots, n$, are mutually independent. Let $\Sigma_{r,k} = \mathrm{Cov}(r_k)$ and $\Sigma_{f,k} = \mathrm{Cov}(f_k)$. Note that we allow both $\Sigma_{r,k}$ and $\Sigma_{f,k}$ to depend on the time index $k$, to accommodate stochastic (co-)volatility. Because of such a feature, it is impossible to estimate the individual $\Sigma_{r,k}$ or $\Sigma_{f,k}$, but it is possible to estimate their means, namely, $\Sigma_r = \frac{1}{n}\sum_{k=1}^{n}\Sigma_{r,k}$, $\Sigma_f = \frac{1}{n}\sum_{k=1}^{n}\Sigma_{f,k}$, and $\Sigma_0 = \frac{1}{n}\sum_{k=1}^{n}\Sigma_{k,0}$, together with the corresponding $\Omega_0 = \Sigma_0^{-1}$. The estimation procedure requires consistently estimating the means of $(r_k, f_k)$, which entails the condition that the $E(f_k)$'s are the same for all $k$.

To estimate the model, we first calculate the least squares estimators of $\alpha$ and $\Gamma$, denoted by $\widehat{\alpha}$ and $\widehat{\Gamma}$ respectively. The residuals are $e_k := r_k - \widehat{\alpha} - \widehat{\Gamma} f_k$. We then obtain a CLIME estimator of $\Omega_0$, denoted by $\widehat{\Omega}_0$, based on the residuals. Specifically,
\[
\widehat{\Omega}_0 := \arg\min_{\Omega_0}\ \|\Omega_0\|_1 \quad \text{subject to } \|\widetilde{S}\Omega_0 - I\|_{\infty} \le \lambda, \tag{3.2}
\]
where $\widetilde{S}$ is the sample covariance matrix of the residuals $(e_1, \ldots, e_n)$.


Our goal is to estimate the precision matrix $\Omega_r := \Sigma_r^{-1}$. To do so, note that $\Sigma_r = \Gamma\Sigma_f\Gamma^{\top} + \Sigma_0$; it follows that
\[
\Omega_r = (\Gamma\Sigma_f\Gamma^{\top} + \Sigma_0)^{-1} = \Omega_0 - \Omega_0\Gamma\bigl(\Sigma_f^{-1} + \Gamma^{\top}\Omega_0\Gamma\bigr)^{-1}\Gamma^{\top}\Omega_0. \tag{3.3}
\]
Therefore, we can define the constrained $\ell_1$-minimization for inverse matrix estimation adjusted for factors (CLIME-F), $\widehat{\Omega}_{\mathrm{CLIME\text{-}F}}$, as
\[
\widehat{\Omega}_{\mathrm{CLIME\text{-}F}} = \widehat{\Omega}_0 - \widehat{\Omega}_0\widehat{\Gamma}\bigl(S_f^{-1} + \widehat{\Gamma}^{\top}\widehat{\Omega}_0\widehat{\Gamma}\bigr)^{-1}\widehat{\Gamma}^{\top}\widehat{\Omega}_0, \tag{3.4}
\]
where $S_f$ is the sample covariance matrix of $(f_1, \ldots, f_n)$. From here the MVP estimator is
\[
\widehat{w}_{\mathrm{CLIME\text{-}F}} = \frac{\widehat{\Omega}_{\mathrm{CLIME\text{-}F}}\mathbf{1}}{\mathbf{1}^{\top}\widehat{\Omega}_{\mathrm{CLIME\text{-}F}}\mathbf{1}}. \tag{3.5}
\]
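The factor adjustment (3.4)-(3.5) is a direct matrix computation once $\widehat{\Omega}_0$, $\widehat{\Gamma}$ and $S_f$ are available; a minimal sketch follows (our own helper names, with $\widehat{\Omega}_0$ obtained, e.g., from the clime() sketch applied to the residual covariance):

```python
import numpy as np

def clime_f_weights(omega0_hat, gamma_hat, S_f):
    """CLIME-F precision matrix (3.4) and MVP weights (3.5).

    omega0_hat : (p, p) CLIME estimate of the residual precision matrix.
    gamma_hat  : (p, m) estimated factor loadings.
    S_f        : (m, m) sample covariance matrix of the factors.
    """
    og = omega0_hat @ gamma_hat                                    # Omega_0 Gamma
    middle = np.linalg.inv(np.linalg.inv(S_f) + gamma_hat.T @ og)  # (S_f^{-1} + G' O G)^{-1}
    omega_r = omega0_hat - og @ middle @ og.T                      # Woodbury-type identity (3.4)
    ones = np.ones(omega_r.shape[0])
    w = omega_r @ ones
    return w / (ones @ w)                                          # (3.5)
```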

To establish the statistical properties for this estimator, the following assumptions are

made.

Assumption C:

(C.i) The number of factors, $m$, is fixed, and $\log(p) = o(n)$. Moreover, there exist $\eta > 0$ and $K > 0$ such that for all $i = 1, \ldots, m$, $j = 1, \ldots, p$, and $k = 1, \ldots, n$, $E(\exp(\eta f_{ki}^2)) \le K$, $E(\exp(\eta z_{kj}^2/\sigma^{k,0}_{jj})) \le K$, and $\sigma^{k,0}_{jj} \le K$.

(C.ii) $\Sigma_f = \frac{1}{n}\sum_{k=1}^{n}\mathrm{Cov}(f_k)$ is of full rank, and there exists $K_f > 0$ such that $\|\Sigma_f^{-1}\|_{L_\infty} \le K_f$, where for any matrix $A = (a_{ij})$, $\|A\|_{L_\infty} := \max_i \sum_j |a_{ij}|$.

(C.iii) There exists $\delta \in (0,1)$ such that for all $p$, $\delta \le \lambda_{\min}(\Sigma_0) \le \lambda_{\max}(\Sigma_0) < 1/\delta$.

(C.iv) For all $p$, the eigenvalues of $p^{-1}\Gamma^{\top}\Gamma$ are bounded away from 0 and $\infty$.

Theorem 4. Suppose that the returns $(r_1, \ldots, r_n)$ follow (3.1) and the precision matrix $\Omega_0 \in \mathcal{U}(q, s_0, M)$. Under Assumptions (C.i)-(C.iv), with probability greater than $1 - O(p^{-1})$, we have, for some constant $C_5 > 0$,
\[
\|\widehat{\Gamma} - \Gamma\|_{\infty} \le C_5 \left( \frac{\log p}{n} \right)^{1/2}. \tag{3.6}
\]
Moreover, with $\lambda = C_6 M \sqrt{\log p/n}$ for some sufficiently large positive constant $C_6$, with probability greater than $1 - O(p^{-1})$, we have, for some constants $C_7, C_8 > 0$,
\[
\|\widehat{\Omega}_0 - \Omega_0\|_2 \le C_7 M^{1-q} s_0 \left( \frac{\log p}{n} \right)^{(1-q)/2}, \tag{3.7}
\]
and
\[
\|\widehat{\Omega}_{\mathrm{CLIME\text{-}F}} - \Omega_r\|_2 \le C_8 M^{1-q} s_0 \left( \frac{\log p}{n} \right)^{(1-q)/2}, \tag{3.8}
\]
where for any symmetric matrix $A$, $\|A\|_2$ stands for its spectral norm.

Remark 6. When $(r_k, f_k)$, $k = 1, \ldots, n$, are i.i.d., we have $\Sigma_{r,k} \equiv \Sigma_r$ and $\Sigma_{f,k} \equiv \Sigma_f$, and Theorem 4 readily applies. We do not impose such a strong assumption and allow both $\Sigma_{r,k}$ and $\Sigma_{f,k}$ to change over $k$, to accommodate the stochastic (co-)volatility in high-frequency returns.


4 Simulation Studies

We simulate the $p$-dimensional log-price process $(X_t)$ from a stochastic factor model:
\[
dX_t = \Gamma\bigl( \mu_f\,dt + \gamma_t \Sigma_f^{1/2}\,dW^{(1)}_t \bigr) + \gamma_t \Sigma_0^{1/2}\,dW^{(2)}_t, \quad t \in [0,1], \tag{4.1}
\]
where $\mu_f$ and $\Sigma_f$ are the mean vector and covariance matrix of an $m$-dimensional factor return, $(W^{(1)}_t)$ and $(W^{(2)}_t)$ are independent $m$-dimensional and $p$-dimensional standard Brownian motions, and $(\gamma_t)$ is a stochastic U-shaped process that obeys
\[
d\gamma_t = -\rho(\gamma_t - \varrho_t)\,dt + \sigma\,dB_t,
\]
where $\rho = 10$, $\sigma = 0.05$,
\[
\varrho_t = \sqrt{1 + 0.8\cos(2\pi t)}, \quad t \in [0,1],
\]
and $(B_t)$ is a standard Brownian motion independent of $(W^{(1)}_t, W^{(2)}_t)$. Under such a setting, the ICV matrix is
\[
\Sigma_{\mathrm{ICV}} = \left( \int_0^1 \gamma_t^2\,dt \right)\Sigma, \quad \text{where } \Sigma = \Gamma\Sigma_f\Gamma^{\top} + \Sigma_0.
\]
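For readers who want to reproduce the flavor of this design, the following is a rough Euler-scheme sketch of the volatility factor $\gamma_t$ and of one day of log-prices under (4.1); all numerical choices other than $\rho$, $\sigma$ and $\varrho_t$ are illustrative placeholders rather than the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 391                      # intraday observations per day, as in Section 4
p, m = 10, 3                 # small illustrative dimensions (the paper uses p = 80)
dt = 1.0 / n

# Placeholder parameters standing in for the real-data calibration.
mu_f = np.zeros(m)
Sigma_f = 0.0001 * np.eye(m)
Sigma_0 = 0.0001 * np.eye(p)
Gamma = rng.normal(1.0, 0.3, size=(p, m))

# Euler scheme for the U-shaped volatility factor gamma_t.
rho, sig = 10.0, 0.05
t_grid = np.arange(n + 1) * dt
target = np.sqrt(1.0 + 0.8 * np.cos(2 * np.pi * t_grid))
gamma = np.empty(n + 1)
gamma[0] = target[0]
for k in range(n):
    gamma[k + 1] = gamma[k] - rho * (gamma[k] - target[k]) * dt \
                   + sig * np.sqrt(dt) * rng.standard_normal()

# Log-price increments under (4.1).
Lf = np.linalg.cholesky(Sigma_f)
L0 = np.linalg.cholesky(Sigma_0)
X = np.zeros((n + 1, p))
for k in range(n):
    dW1 = np.sqrt(dt) * rng.standard_normal(m)
    dW2 = np.sqrt(dt) * rng.standard_normal(p)
    X[k + 1] = X[k] + Gamma @ (mu_f * dt + gamma[k] * (Lf @ dW1)) \
               + gamma[k] * (L0 @ dW2)
```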

We now discuss the specification of the parameters in (4.1). The parameter values are calibrated to real data as follows. We work with the daily returns in Year 2012 of eighty randomly selected S&P 100 Index constituents that are used in our empirical studies (Section 5). The factors are taken to mimic the Fama-French three factors, and we set $\mu_f$ and $\Sigma_f$ to be the sample mean and sample covariance matrix of the daily factor returns in Year 2012. As to the coefficients $\Gamma$, they are taken to be the estimated coefficients from fitting the Fama-French three-factor model to the daily returns of the eighty stocks. Finally, we construct a CLIME estimator $\widehat{\Omega}_0$ based on the residuals as in (3.2), and set $\Sigma_0 = \widehat{\Omega}_0^{-1}$.

We shall compare our proposed portfolios with several existing methods that are based on low-frequency data. We simulate 11 years of data, each year consisting of $T = 252$ days and each day of $n = 391$ log-prices. The first year is taken as the initial training period and the remaining 10 years as the testing period. The log-prices are contaminated with simulated microstructure noise. Specifically, for each stock $i$, the noise is generated from $N(0, \sigma_{e,i}^2)$, where $\sigma_{e,i} \sim \mathrm{Unif}(0.0001, 0.0005)$.

We include the following portfolios in our comparison. All portfolios are constructed using a rolling window scheme, with different window lengths according to whether a portfolio is a low-frequency strategy or a high-frequency one. For methods using low-frequency data (methods (ii)-(vi)), daily returns during the immediate past 252 days are used to construct the covariance matrix estimator $\widehat{\Sigma}$ or precision matrix estimator $\widehat{\Omega}$, which is then plugged into (1.2), and the weights are re-calculated every 21 days. For methods using high-frequency data (methods (vii)-(ix)), the portfolio weights are estimated with returns from the immediate past 10 days and are re-estimated every 5 days. A generic sketch of this rolling scheme is given after the list below.

(i) Equally-weighted (EW) portfolio. This serves as an important benchmark portfolio as

shown in DeMiguel et al. (2009b);


(ii) FFL portfolio. This is based on Fan et al. (2008) and utilizes the three factors from

the data-generating process;

(iii) Nonlinear shrinkage (NLS) portfolio. This is based on Ledoit and Wolf (2012);

(iv) NLSF portfolio. This is based on Ledoit and Wolf (2017);

(v) CLIME portfolio. To construct this portfolio, daily returns are treated as i.i.d. observations and the original CLIME method is directly applied to estimate the precision

matrix and the portfolio weight;

(vi) CLIME-F portfolio. This is based on Theorem 4 using daily returns and utilizing the

three factors from the data-generating process;

(vii) CLIME-SV(5min) portfolio. This is based on Theorem 1 using 5-min returns;

(viii) CLIME-F(5min) portfolio: This is a high-frequency version of CLIME-F where the

factor model is estimated using 5-min intra-day returns;

(ix) CLIME-SVMN(1min) portfolio. This is based on Theorem 2 using 1-min returns;

(x) Oracle portfolio. This is the optimal portfolio obtained by plugging the ICV matrix into

the weight formula (1.2). This portfolio is infeasible because ICV is only measurable to

the filtration at the end of the day while the portfolio is to be held from the beginning

of each day.
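The rolling scheme described above (estimate weights from a trailing window, hold them for a fixed number of days, then re-estimate) can be sketched generically as follows; the weight_fn argument would be any of the weight constructors sketched earlier, and all names here are our own:

```python
import numpy as np

def rolling_backtest(daily_returns, weight_fn, window=252, rebalance=21):
    """Generic rolling-window backtest of an MVP weight estimator.

    daily_returns : (T, p) array of daily returns.
    weight_fn     : callable mapping a (window, p) return block to p weights.
    Returns the realized daily portfolio returns over the testing period.
    """
    T, p = daily_returns.shape
    out = []
    w = np.ones(p) / p                       # placeholder until first estimate
    for t in range(window, T):
        if (t - window) % rebalance == 0:    # re-estimate every `rebalance` days
            w = weight_fn(daily_returns[t - window:t])
        out.append(daily_returns[t] @ w)
    return np.array(out)

# Example: annualized standard deviation of the realized portfolio returns.
# ann_sd = np.sqrt(252) * rolling_backtest(returns, my_weight_fn).std()
```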

Table 1 reports the annualized standard deviations of these portfolios based on 10 years of testing.

                  Low Frequency                      High Frequency
Non-factor-based  EW      NLS     CLIME      CLIME-SV(5min)   CLIME-SVMN(1min)
Ann. SD           9.64%   6.29%   6.27%      6.08%            6.01%
Factor-based      FFL     NLSF    CLIME-F    CLIME-F(5min)    Oracle
Ann. SD           8.00%   6.18%   6.10%      5.96%            5.49%

Table 1. Simulation comparison based on a 10-year testing period. For each portfolio, its annualized standard deviation (Ann. SD) is reported.

We observe from Table 1 that

  • Among low-frequency methods, when the factor structure is not taken into consideration, CLIME performs substantially better than EW and is comparable to NLS;

  • Our proposed high-frequency methods, CLIME-SV and CLIME-SVMN, work even better. In particular, CLIME-SVMN can be used on noisy high-frequency data and attains a substantially lower risk;

  • Further incorporating the factor structure helps. CLIME-F(5min) attains the lowest risk among all feasible portfolios.

Next, we examine the estimation of the minimum risk. As we discussed in Section 2.1, the daily minimum risk is a random variable that changes from one day to another. Hence, we obtain one estimate $\widehat{R}_{\mathrm{CLIME\text{-}SVMN}}$ for each day, giving $252 \times 10 = 2520$ estimates in total. By the law of total variance and under the stationarity assumption, we can then calculate the unconditional daily minimum risk by taking the average of these estimates (ignoring the variation in the expected daily returns). Alternatively, we can use daily returns in the whole sample to estimate the (long-term) minimum risk based on Proposition 1 ($\widehat{R}_{\min}$ based on (2.17)). The results are summarized in Table 2.

Estimator               R̂min based on (2.17)   R̂CLIME-SVMN   Oracle
Estimate (annualized)   5.51%                   5.29%          5.49%

Table 2. Annualized unconditional minimum risk estimates.

We see from Table 2 that both estimators give quite accurate estimates.

5 Empirical Studies

In this section, we conduct empirical studies and compare minimum variance portfolios

constructed via different methods as well as the equally-weighted portfolio.

We work with S&P 100 Index constituents during Years 2003 – 2013. For each year to be tested, we consider those stocks that are included in the Index during that year as well as the previous year. For example, when constructing portfolios for Year 2004, the stocks considered are the constituents common to Years 2003 and 2004, and for Year 2005, the stocks used are the constituents common to Years 2004 and 2005.² The observations are 1-min intra-day prices. For methods based on factor models, the Fama-French three factors are the factors utilized.³ In addition to the portfolios in the Simulation Studies

. In addition to the portfolios in the Simulation Studies

(except the Oracle portfolio which is infeasible in practice), we add the following portfolio,

which represents the latest development in the literature:

? DLX(5min) portfolio: This is based on Dai et al. (2019) and uses the Fama-French

three-factor model. 5-min intra-day observations from the immediate past 10 days

with GICS codes at the industry group level are used to construct the covariance

matrix estimator Σb, which is plugged into (1.2) to obtain the portfolio weights. The

weights are re-estimated every 5 days.

Note that because this method depends on GICS codes, we cannot implement it in

simulations.

Table 3 reports the annualized standard deviations of all the portfolios under comparison

based on daily returns in the 10-year testing period.

²For the stock pool selection, this is slightly forward-looking. It helps to maintain the quality of the data under test. Similar arrangements were used in other research works. Note that, other than this, we use no future information; our testing is fully out-of-sample.

³We thank the authors of Dai et al. (2019) for sharing their high-frequency (5-min) factors data with us.


                  Low Frequency                       High Frequency
Non-factor-based  EW       NLS     CLIME      CLIME-SV(5min)   CLIME-SVMN(1min)
Ann. SD           14.84%   9.64%   9.52%      9.56%            9.25%
Factor-based      FFL      NLSF    CLIME-F    DLX(5min)        CLIME-F(5min)
Ann. SD           9.73%    9.50%   9.20%      9.52%            9.13%

Table 3. Portfolio comparison based on S&P 100 Index constituents during Years 2003 – 2013. For each portfolio, its annualized standard deviation (Ann. SD) is reported.

From Table 3 we observe comparisons similar to those in Table 1. Specifically,

  • Among low-frequency methods, when the factor structure is not taken into consideration, CLIME outperforms EW and NLS;

  • With high-frequency data, CLIME-SVMN improves and attains a substantially lower risk;

  • CLIME-F(5min) performs even better and attains the lowest risk.

As a reference, the minimum risk estimated based on (2.17) with daily returns is 8.28%.

This indicates that our methods approach the minimum risk reasonably well.

Dai et al. (2019) use a slightly different measure for comparison, which is based on

monthly realized volatilities. Specifically, for each portfolio, its monthly realized volatilities

(based on 15-minute returns) are calculated and the average of monthly volatilities (after

annualization) is reported. The comparison under this alternative measure is reported in

Table 4.

                  Low Frequency                        High Frequency
Non-factor-based  EW       NLS      CLIME      CLIME-SV(5min)   CLIME-SVMN(1min)
Ann. Vol.         14.46%   11.04%   10.86%     10.19%           10.12%
Factor-based      FFL      NLSF     CLIME-F    DLX(5min)        CLIME-F(5min)
Ann. Vol.         10.84%   10.82%   10.60%     10.34%           9.92%

Table 4. Comparison of (annualized) average monthly volatilities based on 15-min returns.

We observe that our methods outperform the approach of Dai et al. (2019) under this alternative measure as well.

To practically implement MVP, an important issue is the possible presence of extreme

positions, which sometimes calls for additional short-sale or gross-exposure constraints. For

this reason, we analyze the maximum (absolute) weights of each portfolio during the testing

period. Table 5 reports the medians and means of extreme positions for different methods.


Non-factor-based        median   mean
NLS                     0.113    0.119
CLIME                   0.071    0.074
CLIME-SV(5min)          0.069    0.070
CLIME-SVMN(1min)        0.089    0.089

Factor-based            median   mean
FFL                     0.128    0.133
NLSF                    0.158    0.163
CLIME-F                 0.126    0.126
DLX(5min)               0.163    0.174
CLIME-F(5min)           0.114    0.118

Table 5. Median and mean of the maximum (absolute) weight for each portfolio in the testing period. Portfolio EW has evenly distributed positions of 0.012 on each asset and is not included in the table.

We observe from Table 5 that our proposed methods appear to be advantageous in

this regard. In both non-factor-based and factor-based panels, CLIME-based methods have

milder extreme positions. An interesting observation is that factor-based methods lead to

more severe extreme positions.

6 Conclusion

We propose estimators of the minimum variance portfolio in the high-dimensional setting

based on high-frequency data. The desired risk convergence (1.8) for the estimated portfolios

is obtained under certain sparsity assumptions on the precision matrix. We further propose

consistent estimators of the minimum risk, one of which does not require sparsity.

Numerical studies demonstrate that our methods perform favorably. An important conclusion is that high-frequency data can add value to portfolio allocation.

Acknowledgements

We thank the Editor and two anonymous referees for their constructive suggestions. We also

thank the participants of SoFiE 2016 and FERM 2016 conferences for helpful discussions on

the preliminary versions of this paper. We are grateful to insightful comments from Robert

Engle, Raymond Kan, Olivier Ledoit and Dacheng Xiu. We thank the authors of Dai et al.

(2019) for sharing their high-frequency factors data with us. Research partially supported

by NSF Grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056; and

the RGC grants GRF 16305315, GRF 16304317, GRF 16502014, GRF 16518716 and GRF

16502118 of the HKSAR.


REFERENCES

Aït-Sahalia, Y. and Xiu, D. (2017), “Using principal component analysis to estimate a high

dimensional factor model with high-frequency data,” Journal of Econometrics, 201, 384–

399.

Ao, M., Li, Y., and Zheng, X. (2018), “Approaching mean-variance efficiency for large portfolios,” to appear in Review of Financial Studies.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008), “Designing

realized kernels to measure the ex post variation of equity prices in the presence of noise,”

Econometrica, 76, 1481–1536.

— (2011), “Multivariate realised kernels: Consistent positive semi-definite estimators of the

covariation of equity prices with noise and non-synchronous trading,” Journal of Econometrics, 162, 149–169.

Basak, G. K., Jagannathan, R., and Ma, T. (2009), “Jackknife estimator for tracking error

variance of optimal portfolios,” Management Science, 55, 990–1002.

Cai, T. T., Li, H., Liu, W., and Xie, J. (2012), “Covariate-adjusted precision matrix estimation with an application in genetical genomics,” Biometrika, 100, 139–156.

Cai, T. T., Liu, W., and Luo, X. (2011), “A constrained ℓ1 minimization approach to sparse

precision matrix estimation,” Journal of the American Statistical Association, 106, 594–

607.

Chan, L. K., Karceski, J., and Lakonishok, J. (1999), “On portfolio optimization: Forecasting

covariances and choosing the risk model,” Review of Financial Studies, 12, 937–974.

Christoffersen, P. (2012), Elements of Financial Risk Management, Oxford: Academic Press.

Clarke, R. G., De Silva, H., and Thorley, S. (2006), “Minimum-variance portfolios in the US

equity market,” The Journal of Portfolio Management, 33, 10–24.

Dai, C., Lu, K., and Xiu, D. (2019), “Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data,” Journal

of Econometrics, 208, 43–79.

DeMiguel, V., Garlappi, L., Nogales, F. J., and Uppal, R. (2009a), “A generalized approach

to portfolio optimization: Improving performance by constraining portfolio norms,” Management Science, 55, 798–812.

DeMiguel, V., Garlappi, L., and Uppal, R. (2009b), “Optimal versus naive diversification:

How inefficient is the 1/N portfolio strategy?” Review of Financial Studies, 22, 1915–1953.

Ding, Y., Li, Y., and Zheng, X. (2018), “Estimating high dimensional minimum variance

portfolio under factor model,” working paper.

Fama, E. F. and French, K. R. (1992), “The cross-section of expected stock returns,” The

Journal of Finance, 47, 427–465.


— (2015), “A five-factor asset pricing model,” Journal of Financial Economics, 116, 1–22.

Fan, J., Fan, Y., and Lv, J. (2008), “High dimensional covariance matrix estimation using a

factor model,” Journal of Econometrics, 147, 186–197.

Fan, J., Li, Y., and Yu, K. (2012a), “Vast volatility matrix estimation using high-frequency

data for portfolio selection,” Journal of the American Statistical Association, 107, 412–428.

Fan, J., Zhang, J., and Yu, K. (2012b), “Vast portfolio selection with gross-exposure constraints,” Journal of the American Statistical Association, 107, 592–606.

Gloter, A. and Jacod, J. (2001), “Diffusions with measurement errors. I. Local asymptotic

normality,” ESAIM: Probability and Statistics, 5, 225–242.

Haugen, R. A. and Baker, N. L. (1991), “The efficient market inefficiency of capitalization-weighted stock portfolios,” The Journal of Portfolio Management, 17, 35–40.

Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009), “Microstructure

noise in the continuous case: the pre-averaging approach,” Stochastic Processes and their

Applications, 119, 2249–2276.

Jacod, J., Li, Y., and Zheng, X. (2019), “Estimating the integrated volatility with tick

observations,” Journal of Econometrics, 208, 80–100.

Jagannathan, R. and Ma, T. (2003), “Risk reduction in large portfolios: Why imposing the

wrong constraints helps,” The Journal of Finance, 58, 1651–1684.

Kim, D. and Wang, Y. (2016), “Sparse PCA based on high-dimensional Ito processes with

measurement errors,” Journal of Multivariate Analysis, 152, 172–189.

Kim, D., Wang, Y., and Zou, J. (2016), “Asymptotic theory for large volatility matrix estimation based on high-frequency financial data,” Stochastic Processes and their Applications,

126, 3527–3577.

Ledoit, O. and Wolf, M. (2012), “Nonlinear shrinkage estimation of large-dimensional covariance matrices,” The Annals of Statistics, 40, 1024–1060.

— (2017), “Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz

meets Goldilocks,” Review of Financial Studies, 30, 4349–4388.

Li, Y., Xie, S., and Zheng, X. (2016), “Efficient estimation of integrated volatility incorporating trading information,” Journal of Econometrics, 195, 33–50.

Li, Y., Zhang, Z., and Li, Y. (2018), “A unified approach to volatility estimation in the

presence of both rounding and random market microstructure noise,” Journal of Econometrics, 203, 187–222.

Markowitz, H. (1952), “Portfolio selection,” The Journal of Finance, 7, 77–91.

Merton, R. C. (1980), “On estimating the expected return on the market: An exploratory

investigation,” Journal of Financial Economics, 8, 323–361.


Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York.

Pelger, M. (2019), “Large-dimensional factor modeling based on high-frequency observations,” Journal of Econometrics, 208, 23–42.

Podolskij, M. and Vetter, M. (2009), “Estimation of volatility functionals in the simultaneous

presence of microstructure noise and jumps,” Bernoulli, 15, 634–658.

Schwartz, T. (2000), “How to beat the S&P 500 with portfolio optimization,” working paper.

Sharpe, W. (1964), “Capital asset prices: A theory of market equilibrium under conditions

of risk,” The Journal of Finance, 19, 425–442.

Tao, M., Wang, Y., and Chen, X. (2013), “Fast convergence rates in estimating large volatility matrices using high-frequency financial data,” Econometric Theory, 29, 838–856.

Tao, M., Wang, Y., Yao, Q., and Zou, J. (2011), “Large volatility matrix inference via combining low-frequency and high-frequency approaches,” Journal of the American Statistical

Association, 106, 1025–1040.

Wang, Y. and Zou, J. (2010), “Vast volatility matrix estimation for high-frequency financial

data,” The Annals of Statistics, 38, 943–978.

Xia, N. and Zheng, X. (2018), “On the inference about the spectral distribution of highdimensional covariance matrix based on high-frequency noisy observations,” The Annals

of Statistics, 46, 500–525.

Xiu, D. (2010), “Quasi-maximum likelihood estimation of volatility with high frequency

data,” Journal of Econometrics, 159, 235–250.

Zhang, L. (2006), “Efficient estimation of stochastic volatility using noisy observations: a

multi-scale approach,” Bernoulli, 12, 1019–1043.

— (2011), “Estimating covariation: Epps effect, microstructure noise,” Journal of Econometrics, 160, 33–47.

Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005), “A tale of two time scales: Determining integrated volatility with noisy high-frequency data,” Journal of the American

Statistical Association, 100, 1394–1411.

Zheng, X. and Li, Y. (2011), “On the estimation of integrated covariance matrices of high

dimensional diffusion processes,” The Annals of Statistics, 39, 3121–3151.


A Appendix: Proofs

We start with a result about the CLIME estimator. It is a direct consequence of Theorem 6 in Cai et al. (2011).

Proposition 2. Suppose that $\Omega \in \mathcal{U}(q, s_0, M)$. If $\lambda \ge M\|\widehat{\Sigma} - \Sigma\|_{\infty}$, with $\widehat{\Sigma}$ the estimator for $\Sigma$ used to construct the CLIME estimator, then we have
\[
\|\widehat{\Omega} - \Omega\|_2 \le C s_0 M^{1-q}\lambda^{1-q}, \tag{A.1}
\]
where $C \le 2(1 + 2^{1-q} + 3^{1-q})4^{1-q}$.

Next, we show that if (A.1) holds and $s_0 M^{1-q}\lambda^{1-q} \to 0$, then under Assumption (A.i), for all $p$ large enough, we have
\[
\cdots \;\le\; \|\widehat{\Omega} - \Omega\|_2. \tag{A.2}
\]
In fact, if $s_0 M^{1-q}\lambda^{1-q} \to 0$, then (A.1) implies that $\|\widehat{\Omega} - \Omega\|_2 \le \delta/2$ for all $p$ large enough, in which case we have
\[
\cdots \tag{A.3}
\]

A.1 Proof of Theorem 1

We will show that under Assumptions (A.i)-(A.v), for some positive constants $C_1, C_2$, we have
\[
P\Bigl( \|\widehat{\Sigma}_{\mathrm{RCV}} - \Sigma_{\mathrm{ICV}}\|_{\infty} > \eta\sqrt{\log p/n} \Bigr) \le \frac{C_1}{p^{C_2\eta^2 - 2}}. \tag{A.4}
\]
Then, with $\lambda = \eta M\sqrt{\log p/n}$, by Proposition 2 and (A.3), with probability greater than $1 - C_1/p^{C_2\eta^2 - 2}$,
\[
\frac{R_{\mathrm{CLIME\text{-}SV}}}{R_{\min}} - 1 = O\Bigl( M^{2-2q} s_0 \,(\log p/n)^{(1-q)/2} \Bigr),
\]
namely, the conclusion in the theorem holds.

We now prove (A.4). We first consider the estimation of the integrated volatility and co-volatility.

Lemma 1. Suppose that $(X_t)$ satisfies
\[
dX_t = \mu_t\,dt + \sigma_t\,dW_t, \quad t \in [0,1],
\]
and there exist constants $C_\mu, C_\sigma$ such that
\[
|\mu_t| \le C_\mu \quad \text{and} \quad |\sigma_t| \le C_\sigma < \infty, \quad \text{for all } t \in [0,1] \text{ almost surely.}
\]
Suppose further that the observation times $(t^n_i)$ satisfy (2.10). Denote the realized volatility by $[X,X]_t := \sum_{\{i:\, t^n_i \le t\}} (X_{t^n_i} - X_{t^n_{i-1}})^2$. Then, we have
\[
P\Bigl( \sqrt{n}\,\Bigl| [X,X]_1 - \int_0^1 \sigma_t^2\,dt \Bigr| > x \Bigr) \le C\exp\Bigl( -\frac{x^2}{32 C_\sigma^4 C_\Delta^2} \Bigr) \tag{A.5}
\]
for all $0 \le x \le 2C_\sigma^2 C_\Delta\sqrt{n}$, where the constant $C = 2\exp(C_\mu^2/(4C_\sigma^2)) + \exp(C_\mu^2/C_\sigma^2)$.

Proof. For notational ease, we shall omit the superscript $n$ and write $t_i$ for $t^n_i$, etc. Define $\widetilde{X}_t = X_0 + \int_0^t \sigma_s\,dW_s$. Then we have $X_t = \widetilde{X}_t + \int_0^t \mu_s\,ds$. Observe that
\[
[X,X]_t - \int_0^t \sigma_s^2\,ds
= \Bigl( [\widetilde{X},\widetilde{X}]_t - \int_0^t \sigma_s^2\,ds \Bigr)
+ \sum_{t_i \le t} \Bigl( \int_{t_{i-1}}^{t_i} \mu_s\,ds \Bigr)^2
+ 2\sum_{t_i \le t} \bigl( \widetilde{X}_{t_i} - \widetilde{X}_{t_{i-1}} \bigr) \int_{t_{i-1}}^{t_i} \mu_s\,ds
=: I_1 + I_2 + I_3.
\]


We first consider terms $I_1$ and $I_2$. By Eqn. (A.6) in Fan et al. (2012a), we have, for $|\theta| \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
\[
E\bigl[ \exp(\theta\sqrt{n}\cdot I_1) \bigr] \le \exp(2\theta^2 C_\sigma^4 C_\Delta^2). \tag{A.6}
\]
As to term $I_2$, it is easy to see that $I_2 \le C_\mu^2 C_\Delta/n$. It follows that for $|\theta| \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
\[
E\bigl[ \exp(\theta\sqrt{n}(I_1 + I_2)) \bigr] \le C_1\exp(2\theta^2 C_\sigma^4 C_\Delta^2),
\]
where $C_1 = \exp(C_\mu^2/(4C_\sigma^2))$. By Markov's inequality, we then obtain that for $0 \le \theta \le \sqrt{n}/(4C_\sigma^2 C_\Delta)$,
\[
P(\sqrt{n}|I_1 + I_2| > x) \le \exp(-\theta x)\,E\bigl[ \exp\{\theta\sqrt{n}|I_1 + I_2|\} \bigr] \le 2C_1\exp\bigl( 2\theta^2 C_\sigma^4 C_\Delta^2 - x\theta \bigr).
\]
Taking $\theta = x/(4C_\sigma^4 C_\Delta^2)$ yields that for $x \in [0, \sqrt{n}\,C_\sigma^2 C_\Delta]$,
\[
P(\sqrt{n}|I_1 + I_2| > x) \le 2C_1\exp\Bigl( -\frac{x^2}{8C_\sigma^4 C_\Delta^2} \Bigr). \tag{A.7}
\]

Next we consider term $I_3$. By the Cauchy-Schwarz inequality, we have
\[
|I_3| \le 2\sqrt{[\widetilde{X},\widetilde{X}]_1\, I_2}
\le \frac{2C_\mu\sqrt{C_\Delta}}{\sqrt{n}}\sqrt{[\widetilde{X},\widetilde{X}]_1}
\le \frac{2C_\mu C_\Delta}{\sqrt{n}}\sqrt{[\widetilde{X},\widetilde{X}]_1}.
\]
Therefore,
\[
P(\sqrt{n}|I_3| > x) \le P\bigl( 4C_\mu^2 C_\Delta^2\,[\widetilde{X},\widetilde{X}]_1 > x^2 \bigr)
\le E\exp\Bigl( \frac{C_\mu^2\,[\widetilde{X},\widetilde{X}]_1}{2C_\sigma^4} - \frac{x^2}{8C_\sigma^4 C_\Delta^2} \Bigr).
\]
By (A.6) and the assumption that $|\sigma_t| \le C_\sigma$, we obtain that when $n \ge 2C_\mu^2 C_\Delta^2/C_\sigma^2$,
\[
P(\sqrt{n}|I_3| > x) \le \exp\Bigl( \frac{C_\mu^4 C_\Delta^2}{2C_\sigma^4 n} + \frac{C_\mu^2}{2C_\sigma^2} - \frac{x^2}{8C_\sigma^4 C_\Delta^2} \Bigr)
\le C_2\exp\Bigl( -\frac{x^2}{8C_\sigma^4 C_\Delta^2} \Bigr), \tag{A.8}
\]
where $C_2 = \exp(C_\mu^2/C_\sigma^2)$.

Combining (A.7) and (A.8), we see that for all $n \ge 2C_\mu^2 C_\Delta^2/C_\sigma^2$ and all $0 \le x \le 2C_\sigma^2 C_\Delta\sqrt{n}$,
\[
P\Bigl( \sqrt{n}\,\Bigl| [X,X]_1 - \int_0^1 \sigma_t^2\,dt \Bigr| > x \Bigr)
\le P(\sqrt{n}|I_1 + I_2| > x/2) + P(\sqrt{n}|I_3| > x/2)
\le (2C_1 + C_2)\exp\Bigl( -\frac{x^2}{32C_\sigma^4 C_\Delta^2} \Bigr).
\]
The conclusion in the lemma follows by taking $C = 2C_1 + C_2$.


Now we focus on estimating the co-volatility between two log-price processes. We assume that the two log-price processes $X$ and $Y$ satisfy
\[
dX_t = \mu^X_t\,dt + \sigma^X_t\,dW^X_t \quad \text{and} \quad dY_t = \mu^Y_t\,dt + \sigma^Y_t\,dW^Y_t, \tag{A.9}
\]
where $\mathrm{Corr}(W^X_t, W^Y_t) = \rho^{(X,Y)}_t$.

Lemma 2. Suppose that the two processes $(X, Y)$ satisfy (A.9). Furthermore, assume that there exist finite positive constants $C_\mu$, $C_\sigma$ and $C_\Delta$ such that $|\mu^i_t| \le C_\mu$ and $|\sigma^i_t| \le C_\sigma$ a.s. for $i = X, Y$. Suppose also that the two processes are observed at times $\{t^n_i\}$ which satisfy (2.10). Then, we have for all $0 \le x \le 4C_\sigma^2 C_\Delta\sqrt{n}$,
\[
P\Bigl( \sqrt{n}\,\Bigl| [X,Y]_1 - \int_0^1 \sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\,dt \Bigr| > x \Bigr) \le 2C\exp\Bigl( -\frac{x^2}{128C_\sigma^4 C_\Delta^2} \Bigr),
\]
where the constant $C = 2\exp(C_\mu^2/(4C_\sigma^2)) + \exp(C_\mu^2/C_\sigma^2)$.

Proof. Again we shall omit the superscript $n$ and write $t_i$ for $t^n_i$, etc. Define $Z^\pm = X\pm Y$. Then
\[
[X,Y]_1 = \frac14\big([Z^+,Z^+]_1 - [Z^-,Z^-]_1\big). \tag{A.10}
\]
Note that both $Z^\pm$ are stochastic processes with bounded drifts and volatilities. By Lemma 1, for all $0\le x\le 8C_\sigma^2 C_\Delta\sqrt{n}$,
\[
P\Big(\sqrt{n}\,\Big|[Z^\pm,Z^\pm]_1 - \int_0^1(\sigma^\pm_t)^2\,dt\Big| > x\Big) \le C\exp\big(-x^2/(512\,C_\sigma^4 C_\Delta^2)\big),
\]
where $\sigma^\pm_t$ denotes the volatility of $Z^\pm$. It follows from the decomposition (A.10) that
\[
P\Big(\sqrt{n}\,\Big|[X,Y]_1 - \int_0^1\sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\,dt\Big| > x\Big) \le 2C\exp\big(-x^2/(128\,C_\sigma^4 C_\Delta^2)\big).
\]
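For the reader's convenience, the constants in the last step can be tracked explicitly (a worked step using only (A.10) and Lemma 1). Since $(\sigma^+_t)^2 - (\sigma^-_t)^2 = 4\,\sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t$, the decomposition (A.10) gives
\[
\Big|[X,Y]_1 - \int_0^1\sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\,dt\Big|
\le \frac14\Big|[Z^+,Z^+]_1 - \int_0^1(\sigma^+_t)^2\,dt\Big| + \frac14\Big|[Z^-,Z^-]_1 - \int_0^1(\sigma^-_t)^2\,dt\Big|,
\]
so the event $\big\{\sqrt{n}\,|[X,Y]_1 - \int_0^1\sigma^X_t\sigma^Y_t\rho^{(X,Y)}_t\,dt| > x\big\}$ forces one of the two terms on the right to exceed $2x$ on the $\sqrt{n}$ scale. Applying Lemma 1 to $Z^\pm$ (whose drift and volatility are bounded by $2C_\mu$ and $2C_\sigma$, so its constant is again $C$, and the bound is applicable because $2x\le 8C_\sigma^2C_\Delta\sqrt{n}$ when $x\le 4C_\sigma^2C_\Delta\sqrt{n}$) and a union bound yield $2C\exp\big(-(2x)^2/(512C_\sigma^4C_\Delta^2)\big) = 2C\exp\big(-x^2/(128C_\sigma^4C_\Delta^2)\big)$.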

With the previous two lemmas, we obtain that there exist positive constants $C_1, C_2$ such that for $0\le x\le 4C_\sigma^2 C_\Delta\sqrt{n}$,
\[
\max_{i,j} P\big(\sqrt{n}\,|\hat\sigma_{ij} - \sigma_{ij}| > x\big) \le C_1\exp(-C_2 x^2),
\]
where $(\sigma_{ij}) = \Sigma^{\mathrm{ICV}}$ and $(\hat\sigma_{ij}) = \hat\Sigma^{\mathrm{RCV}}$. Therefore, by the Bonferroni inequality,
\[
P\Big(\|\hat\Sigma^{\mathrm{RCV}} - \Sigma^{\mathrm{ICV}}\|_\infty > \eta\sqrt{\log p/n}\Big)
\le \sum_{1\le i,j\le p} P\big(|\hat\sigma_{ij} - \sigma_{ij}| > \eta\sqrt{\log p/n}\big)
\le p^2\cdot C_1\exp\big(-C_2\eta^2\log p\big) = \frac{C_1}{p^{\,C_2\eta^2-2}}.
\]
This finishes the proof.


A.2 Proof of Theorem 2

To prove Theorem 2, similar to the proof of Theorem 1, it is sufficient to provide a bound on the element-wise estimation error in using $\hat\Sigma^{\mathrm{PAV}}$ to estimate $\Sigma^{\mathrm{ICV}}$. The following result from Kim and Wang (2016) gives an exponential tail bound for this element-wise estimation error.

Proposition 3. [Theorem 1 in Kim and Wang (2016)] Suppose that $(X_t)$ satisfies (2.1). Under Assumptions (A.i)--(A.iv) and (B.i)--(B.iii), the PAV estimator $\hat\Sigma^{\mathrm{PAV}}$ satisfies
\[
P\Big(\big|(\hat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}})_{ij}\big| \ge x\Big) \le C_3\exp\big(-\sqrt{n}\,C_4 x^2\big),
\]
where $(\hat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}})_{ij}$ denotes the $ij$-th entry of the matrix $\hat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}}$, $x$ is a positive number in a neighborhood of 0, and $C_3$ and $C_4$ are positive constants independent of $n$ and $p$.

It follows from the Bonferroni inequality that
\[
\begin{aligned}
P\Big(\|\hat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}}\|_\infty \ge \eta\sqrt{\log p}/n^{1/4}\Big)
&\le \sum_{1\le i,j\le p} P\Big(\big|(\hat\Sigma^{\mathrm{PAV}} - \Sigma^{\mathrm{ICV}})_{ij}\big| \ge \eta\sqrt{\log p}/n^{1/4}\Big)\\
&\le \sum_{1\le i,j\le p} C_3\exp\Big(-\sqrt{n}\,C_4\big(\eta\sqrt{\log p}/n^{1/4}\big)^2\Big)
= p^2 C_3\exp\big(-C_4\eta^2\log p\big)
= \frac{C_3}{p^{\,C_4\eta^2-2}}.
\end{aligned} \tag{A.11}
\]
Therefore, with $\lambda = \eta M\sqrt{\log p}/n^{1/4}$, by (A.3), with probability greater than $1 - C_3/p^{\,C_4\eta^2-2}$,
\[
\frac{R_{\mathrm{CLIME\text{-}SVMN}}}{R_{\min}} - 1 = O\Big(M^{2-2q}\,s_0\,(\log p)^{(1-q)/2}/n^{(1-q)/4}\Big).
\]
This completes the proof.

A.3 Proof of Theorem 3

Proof. Note that $\hat R_{\mathrm{CLIME\text{-}SV}}/R_{\min} = \mathbf 1^\top\Omega\mathbf 1\,/\,\mathbf 1^\top\hat\Omega_{\mathrm{CLIME\text{-}SV}}\mathbf 1$ and $\hat R_{\mathrm{CLIME\text{-}SVMN}}/R_{\min} = \mathbf 1^\top\Omega\mathbf 1\,/\,\mathbf 1^\top\hat\Omega_{\mathrm{CLIME\text{-}SVMN}}\mathbf 1$. The result in the noiseless case follows from (A.2) and (A.4), and in the noisy case it follows from (A.2) and (A.11).
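To fix ideas, these ratios are straightforward to evaluate numerically. The following minimal Python sketch compares the true risk of a plug-in MVP and the estimated minimum variance with the true minimum variance; the plug-in forms $\hat w = \hat\Omega\mathbf 1/(\mathbf 1^\top\hat\Omega\mathbf 1)$ and $\hat R = 1/(\mathbf 1^\top\hat\Omega\mathbf 1)$ are assumed (consistent with the expressions above), and all matrices are synthetic.

    import numpy as np

    rng = np.random.default_rng(1)
    p = 50

    # Synthetic "true" precision matrix with some sparsity, and the implied covariance.
    A = rng.standard_normal((p, p)) * (rng.random((p, p)) < 0.05)
    Omega = A @ A.T + np.eye(p)          # true precision matrix Omega = Sigma^{-1} (positive definite)
    Sigma = np.linalg.inv(Omega)
    ones = np.ones(p)

    # A perturbed, symmetrized stand-in for an estimated precision matrix.
    Omega_hat = Omega + 0.01 * rng.standard_normal((p, p))
    Omega_hat = (Omega_hat + Omega_hat.T) / 2

    w_hat = Omega_hat @ ones / (ones @ Omega_hat @ ones)   # plug-in MVP weights (sum to one)
    R_min = 1.0 / (ones @ Omega @ ones)                    # true minimum variance
    R_w = w_hat @ Sigma @ w_hat                            # true risk of the plug-in portfolio
    R_hat = 1.0 / (ones @ Omega_hat @ ones)                # estimated minimum variance

    print(f"R_w/R_min - 1 = {R_w / R_min - 1:.3e}, R_hat/R_min - 1 = {R_hat / R_min - 1:.3e}")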

A.4 Proof of Proposition 1

Proof. Note that
\[
\frac{\hat R_p}{R_{\min}} = \frac{\mathbf 1^\top\Sigma^{-1}\mathbf 1}{\mathbf 1^\top S^{-1}\mathbf 1}.
\]


By Theorem 3.2.12 of Muirhead (1982), we have
\[
(n-1)\,\frac{\hat R_p}{R_{\min}} \sim \chi^2_{n-p}. \tag{A.12}
\]
The convergence (2.16) immediately follows. Moreover,
\[
\frac{n-1}{n}\cdot\frac{\hat R_{\min}}{R_{\min}} = \frac{n-1}{n-p}\cdot\frac{\hat R_p}{R_{\min}} \sim \frac{1}{n-p}\,\chi^2_{n-p}.
\]
Therefore, as $n-p\to\infty$,
\[
\sqrt{n-p}\,\bigg(\frac{n-1}{n}\cdot\frac{\hat R_{\min}}{R_{\min}} - 1\bigg) \Rightarrow N(0, 2),
\]
which trivially implies (2.19).
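As an illustrative check of (A.12) (not needed for the proof), the following minimal Python sketch simulates i.i.d. Gaussian returns, forms $\hat R_p/R_{\min} = \mathbf 1^\top\Sigma^{-1}\mathbf 1\,/\,\mathbf 1^\top S^{-1}\mathbf 1$ with $S$ the sample covariance matrix, and compares the first two moments of $(n-1)\hat R_p/R_{\min}$ with those of a $\chi^2_{n-p}$ distribution; the dimensions are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, n_rep = 120, 20, 2000

    Sigma = np.eye(p)                                      # true covariance (identity for simplicity)
    ones = np.ones(p)
    quad_true = ones @ np.linalg.inv(Sigma) @ ones         # 1' Sigma^{-1} 1 = 1 / R_min

    stats = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        S = np.cov(X, rowvar=False)                        # sample covariance (divisor n - 1)
        stats[r] = (n - 1) * quad_true / (ones @ np.linalg.inv(S) @ ones)

    print(f"mean = {stats.mean():.1f} (chi2_{{n-p}} mean = {n - p}), "
          f"var = {stats.var():.1f} (chi2_{{n-p}} var = {2 * (n - p)})")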

A.5 Proof of Theorem 4

The proof relies on the following lemma.

Lemma 3. Let $\xi_1,\dots,\xi_n$ be independent random variables with mean zero. Suppose that there exist $\eta > 0$ and $K > 0$ such that
\[
E\big(e^{\eta|\xi_k|}\big) \le K, \qquad \text{for all } k = 1,\dots,n.
\]
Then for all $\ell\in\mathbb N$, there exists $C = C(K,\ell) > 0$ such that for all $p$ and $n$ satisfying $\log p/n\le 1/2$,
\[
P\Bigg(\frac{1}{n}\sum_{k=1}^n\xi_k \ge \eta^{-1}C\sqrt{\frac{\log p}{n}}\Bigg) \le \frac{1}{p^\ell}.
\]

Proof. The proof follows similar calculations to the proofs of Theorems 1(a) and 4(a) in Cai et al. (2011) and uses the fact that $|e^x - 1 - x|\le x^2 e^{|x|}$ for any $x\in\mathbb R$. Define $t = \eta\sqrt{\log p/n}$. Because $E(\xi_k) = 0$, by Jensen's inequality, $E(e^{t\xi_k})\ge 1$. It follows that
\[
\log\big(E(e^{t\xi_k})\big) \le E(e^{t\xi_k}) - 1 = E\big(\exp(t\xi_k) - 1 - t\xi_k\big) \le E\big(t^2\xi_k^2 e^{|t\xi_k|}\big) \le C\eta^2 K\,\frac{\log p}{n}.
\]
The last inequality follows from the assumption $\log p/n\le 1/2$ and the fact that there exists $C > 0$ such that
\[
x^2 e^{\eta\sqrt{\log p/n}\,|x|} \le x^2 e^{\eta\sqrt{1/2}\,|x|} \le C e^{\eta|x|}, \qquad \text{for all } x\ge 0.
\]


Therefore, for any $C_K > 0$,
\[
\begin{aligned}
P\Bigg(\frac{1}{n}\sum_{k=1}^n\xi_k \ge \eta^{-1}C_K\sqrt{\frac{\log p}{n}}\Bigg)
&= P\Bigg(\sum_{k=1}^n t\xi_k \ge t\,\eta^{-1}C_K\sqrt{n\log p}\Bigg)
= P\Bigg(\sum_{k=1}^n t\xi_k \ge C_K\log p\Bigg)\\
&\le \exp(-C_K\log p)\,E\Bigg(\prod_{k=1}^n\exp(t\xi_k)\Bigg)
\le \exp\big(-C_K\log p + C\eta^2 K\log p\big),
\end{aligned}
\]
which is smaller than $1/p^\ell$ for all $C_K$ such that $C_K > \ell + C\eta^2 K$.

We now prove Theorem 4.

Proof. To simplify the notation, we assume that $E(f_k)\equiv 0$, as all results remain true with $f_k$ replaced by $f_k - E(f_k)$ in the following proof. We first show that, with probability greater than $1 - O(p^{-1})$, for a sufficiently large positive constant $C_1$,
\[
\|S_{f,r} - \Gamma S_f\|_\infty \le C_1\sqrt{\log p/n}, \tag{A.13}
\]
where
\[
S_{f,r} = \frac{1}{n}\sum_{k=1}^n (r_k - \bar r)(f_k - \bar f)^\top
\quad\text{and}\quad
S_f = \frac{1}{n}\sum_{k=1}^n (f_k - \bar f)(f_k - \bar f)^\top.
\]
To see this, note that
\[
S_{f,r} = \frac{1}{n}\sum_{k=1}^n\big(\Gamma(f_k - \bar f) + z_k - \bar z\big)(f_k - \bar f)^\top
= \Gamma S_f + \frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(f_k - \bar f)^\top.
\]
Therefore, it suffices to show that with probability greater than $1 - O(p^{-1})$,
\[
\Big\|\frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(f_k - \bar f)^\top\Big\|_\infty \le C_1\sqrt{\log p/n}. \tag{A.14}
\]
By Lemma 3, we have for some constant $C > 0$,
\[
\max_{1\le i\le p,\,1\le j\le m} P\Bigg(\Big|\frac{1}{n}\sum_{k=1}^n z_{ki}f_{kj}\Big| \ge C\sqrt{\log p/n}\Bigg) \le \frac{2}{p^2}, \tag{A.15}
\]
\[
\max_{1\le j\le m} P\Big(|\bar f_j| \ge C\sqrt{\log p/n}\Big) \le 2/p^2,
\qquad
\max_{1\le i\le p} P\Big(|\bar z_i| \ge C\sqrt{\log p/n}\Big) \le 2/p^2. \tag{A.16}
\]
These imply (A.14).
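Indeed, writing out the centering terms makes the implication explicit (a worked step; the union bound runs over the $pm + p + m$ events in (A.15)--(A.16)):
\[
\frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(f_k - \bar f)^\top = \frac{1}{n}\sum_{k=1}^n z_k f_k^\top - \bar z\,\bar f^\top,
\]
so on the intersection of the complements of those events every entry is bounded by $C\sqrt{\log p/n} + C^2\log p/n \le (C + C^2)\sqrt{\log p/n}$ (using $\log p/n\le 1$), which gives (A.14) with $C_1 = C + C^2$; the exceptional probability is at most $2(pm + p + m)/p^2 = O(p^{-1})$, treating the number of factors $m$ as bounded, as in the rest of the argument.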


Next, recall that $\hat\Gamma = S_{f,r}S_f^{-1}$ is the least squares estimate of $\Gamma$. Rewrite the bound (A.13) as
\[
\|(\hat\Gamma - \Gamma)S_f\|_\infty \le C_1\sqrt{\log p/n}. \tag{A.17}
\]
By Lemma 3 again, we have for some constant $C_2 > 0$,
\[
P\Big(\|\Sigma_f - S_f\|_\infty \ge C_2\sqrt{\log p/n}\Big) \le 2p^{-1}.
\]
Therefore, we have, with probability greater than $1 - O(p^{-1})$,
\[
\|(\hat\Gamma - \Gamma)\Sigma_f\|_\infty \le \|(\hat\Gamma - \Gamma)S_f\|_\infty + \|(\hat\Gamma - \Gamma)(\Sigma_f - S_f)\|_\infty
\le C_1\sqrt{\log p/n} + C_2\|\hat\Gamma - \Gamma\|_{L_\infty}\sqrt{\log p/n}.
\]
It follows that
\[
\begin{aligned}
\|\hat\Gamma - \Gamma\|_\infty &\le \|(\hat\Gamma - \Gamma)\Sigma_f\|_\infty\,\|\Sigma_f^{-1}\|_{L_\infty}\\
&\le \|\Sigma_f^{-1}\|_{L_\infty}\Big(C_1\sqrt{\log p/n} + C_2\|\hat\Gamma - \Gamma\|_{L_\infty}\sqrt{\log p/n}\Big)\\
&\le \|\Sigma_f^{-1}\|_{L_\infty}\Big(C_1\sqrt{\log p/n} + C_2 m\|\hat\Gamma - \Gamma\|_\infty\sqrt{\log p/n}\Big).
\end{aligned}
\]
By Assumption (C.i), $\log p/n\to 0$ as $n\to\infty$; hence for all large $n$,
\[
\|\hat\Gamma - \Gamma\|_\infty \le C_1\|\Sigma_f^{-1}\|_{L_\infty}\sqrt{\log p/n} + \frac12\|\hat\Gamma - \Gamma\|_\infty,
\]
and rearranging gives $\|\hat\Gamma - \Gamma\|_\infty \le 2C_1\|\Sigma_f^{-1}\|_{L_\infty}\sqrt{\log p/n}$. Thus, (3.6) holds.

To prove result (3.7), all we need to derive is a bound on $\|S_e - \Sigma_0\|_\infty$, where $S_e$ is the sample covariance matrix of the residuals $e_k = r_k - \hat\alpha - \hat\Gamma f_k$, $k = 1,\dots,n$. The proof follows similar lines to the proofs of Theorems 4 and 5 in Cai et al. (2012), so we only give a sketch here.

Set $\hat\Sigma_z = \frac{1}{n}\sum_{k=1}^n (z_k - \bar z)(z_k - \bar z)^\top$. By Lemma 3, for some constant $C > 0$, we have
\[
P\Big(\|\hat\Sigma_z - \Sigma_0\|_\infty \ge C\sqrt{\log p/n}\Big) \le 2/p. \tag{A.18}
\]
Note also that
\[
\begin{aligned}
S_e - \hat\Sigma_z
&= \frac{1}{n}\sum_{k=1}^n\big((\Gamma - \hat\Gamma)(f_k - \bar f) + z_k - \bar z\big)\big((\Gamma - \hat\Gamma)(f_k - \bar f) + z_k - \bar z\big)^\top - \hat\Sigma_z\\
&= (\Gamma - \hat\Gamma)S_f(\Gamma - \hat\Gamma)^\top
+ (\Gamma - \hat\Gamma)\Big(\frac{1}{n}\sum_{k=1}^n f_k z_k^\top\Big)
+ \Big(\frac{1}{n}\sum_{k=1}^n z_k f_k^\top\Big)(\Gamma - \hat\Gamma)^\top\\
&\quad - (\Gamma - \hat\Gamma)\bar f\bar z^\top - \bar z\bar f^\top(\Gamma - \hat\Gamma)^\top.
\end{aligned}
\]
By (3.6), (A.15), (A.16), (A.17) and Lemma 3, with probability greater than $1 - O(p^{-1})$, the following bounds hold:
\[
\big\|(\Gamma - \hat\Gamma)S_f(\Gamma - \hat\Gamma)^\top\big\|_\infty \le C\log p/n,
\]


\[
\Big\|(\Gamma - \hat\Gamma)\Big(\frac{1}{n}\sum_{k=1}^n f_k z_k^\top\Big)\Big\|_\infty \le C\log p/n,
\quad\text{and}\quad
\big\|(\Gamma - \hat\Gamma)\bar f\bar z^\top\big\|_\infty \le C(\log p/n)^{3/2}.
\]
Combining the bounds above, we obtain that with probability greater than $1 - O(p^{-1})$,
\[
\|S_e - \hat\Sigma_z\|_\infty \le C\log p/n.
\]
Therefore, by (A.18) we get, with probability greater than $1 - O(p^{-1})$,
\[
\|S_e - \Sigma_0\|_\infty \le C\sqrt{\log p/n}.
\]
Hence, with the tuning parameter $\lambda$ chosen as $CM\sqrt{\log p/n}$, the bound (3.7) follows from Proposition 2.
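To make the objects $S_f$, $S_{f,r}$, $\hat\Gamma$ and $S_e$ concrete, here is a minimal Python sketch of the least squares factor regression and the residual sample covariance analyzed above; the data-generating choices (dimensions, the form of $\Sigma_0$, and setting $\alpha = 0$ so that centering plays the role of $\hat\alpha$) are hypothetical.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p, m = 500, 100, 3                      # hypothetical sample size, dimension, number of factors

    Gamma = rng.standard_normal((p, m))        # true factor loadings
    f = rng.standard_normal((n, m))            # factor returns f_k
    z = 0.5 * rng.standard_normal((n, p))      # idiosyncratic returns z_k, so Sigma_0 = 0.25 * I
    r = f @ Gamma.T + z                        # asset returns r_k = Gamma f_k + z_k (alpha = 0)

    f_c = f - f.mean(axis=0)                   # centered factors
    r_c = r - r.mean(axis=0)                   # centered returns

    S_f = f_c.T @ f_c / n                      # S_f
    S_fr = r_c.T @ f_c / n                     # S_{f,r}
    Gamma_hat = S_fr @ np.linalg.inv(S_f)      # least squares estimate of Gamma

    e = r_c - f_c @ Gamma_hat.T                # (centered) regression residuals
    S_e = e.T @ e / n                          # sample covariance matrix of the residuals

    print(f"max entrywise error of Gamma_hat: {np.abs(Gamma_hat - Gamma).max():.3f}")
    print(f"max entrywise error of S_e vs Sigma_0: {np.abs(S_e - 0.25 * np.eye(p)).max():.3f}")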

Finally, regarding (3.8), similar to the corresponding proof in Fan et al. (2008), we have
\[
\|\hat\Omega_{\mathrm{CLIME\text{-}F}} - \Omega_r\|_2 \le D_1 + D_2 + D_3 + D_4 + D_5 + D_6, \tag{A.19}
\]
where
\[
\begin{aligned}
D_1 &= \|\hat\Omega_0 - \Omega_0\|_2,\\
D_2 &= \big\|\Omega_0\Gamma\big[(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma)^{-1} - (S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}\big]\Gamma^\top\Omega_0\big\|_2,\\
D_3 &= \big\|\Omega_0\Gamma(S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}(\Gamma - \hat\Gamma)^\top\Omega_0\big\|_2,\\
D_4 &= \big\|\Omega_0(\Gamma - \hat\Gamma)(S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}\hat\Gamma^\top\Omega_0\big\|_2,\\
D_5 &= \big\|\Omega_0\hat\Gamma(S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}\hat\Gamma^\top(\Omega_0 - \hat\Omega_0)\big\|_2, \quad\text{and}\\
D_6 &= \big\|(\Omega_0 - \hat\Omega_0)\hat\Gamma(S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}\hat\Gamma^\top\hat\Omega_0\big\|_2.
\end{aligned}
\]
Denote $G = (\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma)^{-1}$ and $\hat G = (S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)^{-1}$.

First, we notice that, under Assumptions (C.ii)--(C.iv), for some constant $c > 0$,
\[
\lambda_{\min}(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma) \ge \lambda_{\min}(\Gamma^\top\Omega_0\Gamma) \ge \lambda_{\min}(\Omega_0)\,\lambda_{\min}(\Gamma^\top\Gamma) \ge cp.
\]
Therefore, $\|G\|_2 = O(p^{-1})$. Moreover, we have
\[
\begin{aligned}
\big\|(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma) - (S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)\big\|_2
&\le \|\Sigma_f^{-1} - S_f^{-1}\|_2 + \|(\hat\Gamma - \Gamma)^\top\hat\Omega_0(\hat\Gamma - \Gamma)\|_2\\
&\quad + 2\|\Gamma^\top\hat\Omega_0(\hat\Gamma - \Gamma)\|_2 + \|\Gamma^\top(\hat\Omega_0 - \Omega_0)\Gamma\|_2.
\end{aligned}
\]
Note that
\[
\|\Sigma_f^{-1} - S_f^{-1}\|_2 = \|\Sigma_f^{-1}(\Sigma_f - S_f)S_f^{-1}\|_2
\le \|\Sigma_f^{-1}\|_2\,\|\Sigma_f - S_f\|_2\,\|S_f^{-1}\|_2
\le \|\Sigma_f^{-1}\|_2\cdot m\,\|\Sigma_f - S_f\|_\infty\,\|S_f^{-1}\|_2
= O_p(\|\Sigma_f - S_f\|_\infty),
\]
and
\[
\|(\hat\Gamma - \Gamma)^\top\hat\Omega_0(\hat\Gamma - \Gamma)\|_2 \le \|\hat\Gamma - \Gamma\|_2^2\,\|\hat\Omega_0\|_2
\le \big(\sqrt{pm}\,\|\hat\Gamma - \Gamma\|_\infty\big)^2\|\hat\Omega_0\|_2
= O_p\big(p\,\|\hat\Gamma - \Gamma\|_\infty^2\big),
\]


where $\|\hat\Omega_0\|_2 = O_p(1)$ follows from $\|\Omega_0\|_2 = O(1)$ and (3.7). In addition, by Assumption (C.iv), we have
\[
\|\Gamma^\top\hat\Omega_0(\hat\Gamma - \Gamma)\|_2 \le \|\Gamma^\top\hat\Omega_0\|_2\,\|\hat\Gamma - \Gamma\|_2
\le \|\Gamma^\top\hat\Omega_0\|_2\,\sqrt{pm}\,\|\hat\Gamma - \Gamma\|_\infty
= O_p\big(p\,\|\hat\Gamma - \Gamma\|_\infty\big),
\]
and $\|\Gamma^\top(\hat\Omega_0 - \Omega_0)\Gamma\|_2 = O_p(p\,\|\hat\Omega_0 - \Omega_0\|_2)$. Combining all the bounds above with (3.6) and (3.7), we obtain that
\[
\big\|(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma) - (S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)\big\|_2 = O_p\big(p\,\|\hat\Omega_0 - \Omega_0\|_2\big).
\]
It follows that
\[
\|G - \hat G\|_2 = \|G(G^{-1} - \hat G^{-1})\hat G\|_2
\le \|G\|_2\,\big\|(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma) - (S_f^{-1} + \hat\Gamma^\top\hat\Omega_0\hat\Gamma)\big\|_2\,\|\hat G\|_2
= O_p(\|\hat\Omega_0 - \Omega_0\|_2)\,\|\hat G\|_2.
\]
Because $\|\hat\Omega_0 - \Omega_0\|_2 = o_p(1)$ by (3.7), and $\|\hat G\|_2 \le \|G\|_2 + \|G - \hat G\|_2$, we get $\|\hat G\|_2 = O_p(\|G\|_2) = O_p(p^{-1})$. This further leads to
\[
\|G - \hat G\|_2 = O_p\big(p^{-1}\|\hat\Omega_0 - \Omega_0\|_2\big).
\]
It follows that
\[
D_2 \le \|\Omega_0\Gamma\|_2^2\,\|G - \hat G\|_2 = O_p(\|\hat\Omega_0 - \Omega_0\|_2). \tag{A.20}
\]
Next, by (3.6), we have
\[
D_3 \le \|\Omega_0\Gamma\|_2\,\|\hat G\|_2\,\|(\Gamma - \hat\Gamma)^\top\Omega_0\|_2 = O_p(\|\hat\Gamma - \Gamma\|_\infty) = O_p\Big(\sqrt{\log p/n}\Big). \tag{A.21}
\]
Similarly, $D_4 = O_p(\|\hat\Gamma - \Gamma\|_\infty) = O_p\big(\sqrt{\log p/n}\big)$.

Finally, since $\|\hat G\|_2 = O_p(p^{-1})$, by Assumption (C.iv) and (3.6), we have
\[
\|\hat\Gamma\hat G\hat\Gamma^\top\|_2
\le \|(\hat\Gamma - \Gamma)\hat G(\hat\Gamma - \Gamma)^\top\|_2 + \|\Gamma\hat G(\hat\Gamma - \Gamma)^\top\|_2 + \|(\hat\Gamma - \Gamma)\hat G\Gamma^\top\|_2 + \|\Gamma\hat G\Gamma^\top\|_2 = O_p(1).
\]
Therefore,
\[
D_5 \le \|\Omega_0\|_2\,\|\hat\Gamma\hat G\hat\Gamma^\top\|_2\,\|\hat\Omega_0 - \Omega_0\|_2 = O_p(\|\hat\Omega_0 - \Omega_0\|_2). \tag{A.22}
\]
Similarly, $D_6 = O_p(\|\hat\Omega_0 - \Omega_0\|_2)$.

Combining all the results above and by (3.7), we have
\[
\|\hat\Omega_{\mathrm{CLIME\text{-}F}} - \Omega_r\|_2 = O_p(\|\hat\Omega_0 - \Omega_0\|_2) + O_p\Big(\sqrt{\log p/n}\Big)
= O_p\bigg(M^{1-q}\,s_0\Big(\frac{\log p}{n}\Big)^{(1-q)/2}\bigg).
\]
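The decomposition (A.19) amounts to a term-by-term comparison of $\Omega_r = \Omega_0 - \Omega_0\Gamma(\Sigma_f^{-1} + \Gamma^\top\Omega_0\Gamma)^{-1}\Gamma^\top\Omega_0$, the Sherman--Morrison--Woodbury form of $(\Gamma\Sigma_f\Gamma^\top + \Sigma_0)^{-1}$, with its plug-in counterpart built from $\hat\Omega_0$, $\hat\Gamma$ and $S_f$. A minimal Python sketch of this Woodbury-type assembly, with synthetic inputs standing in for the estimates (it is assumed here, consistently with the structure of (A.19), that $\hat\Omega_{\mathrm{CLIME\text{-}F}}$ is formed in exactly this plug-in way):

    import numpy as np

    def factor_precision(Omega0, Gamma, Sigma_f):
        """Woodbury form of (Gamma Sigma_f Gamma^T + Sigma_0)^{-1}, computed from Omega0 = Sigma_0^{-1}."""
        G = np.linalg.inv(np.linalg.inv(Sigma_f) + Gamma.T @ Omega0 @ Gamma)
        return Omega0 - Omega0 @ Gamma @ G @ Gamma.T @ Omega0

    rng = np.random.default_rng(4)
    p, m = 40, 3
    Gamma = rng.standard_normal((p, m))
    Sigma_f = np.eye(m)
    Omega0 = 4.0 * np.eye(p)                   # idiosyncratic precision, i.e. Sigma_0 = 0.25 * I

    Omega_r = factor_precision(Omega0, Gamma, Sigma_f)
    direct = np.linalg.inv(Gamma @ Sigma_f @ Gamma.T + np.linalg.inv(Omega0))
    print("max abs difference from direct inversion:", np.abs(Omega_r - direct).max())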