Comparison with Knockoff Filters in the Low-dimensional Case

The false discovery rate (FDR), formally defined as the expected fraction of falsely chosen features among all selected variables, is of high importance when we carry out a model selection procedure. Controlling this criterion at a low level guarantees that most of the selected variables are true and reproducible. In this chapter, we introduce a method, called the Knockoff Filter, to achieve this target in the low-dimensional case, i.e.

when there are more observations than candidate variables ($n > p$). This method can also be generalized to high-dimensional logistic regression, but that extension is not covered in this paper. (For details, see .)

Knockoff Filter (KF)

As in the previous chapters, we build the method on lasso regression. Our target is to construct sensible test statistics that can be used to test the null hypothesis $\beta_j = 0$ for each candidate variable. An important observation is that this method does not require any knowledge of the noise level $\sigma$, neither in the dummy-variable construction nor in the FDR-controlling theory. The steps of the method are as follows.

Construct knockoff variables

For each candidate variable $X_j$ (the $j$th column of the $n \times p$ design matrix $X$), we normalize it so that the Gram matrix $\Sigma = X^T X$ satisfies $\Sigma_{jj} = \|X_j\|_2^2 = 1$. Then we construct a knockoff copy $\tilde{X}_j$ obeying the following properties:

$\tilde{X}^T \tilde{X} = \Sigma, \qquad X^T \tilde{X} = \Sigma - \mathrm{diag}\{s\},$

where $s$ is a pre-determined $p$-dimensional non-negative vector.

By definition, $\tilde{X}$ has the same correlation structure as the original matrix, since $X_j^T \tilde{X}_k = X_j^T X_k$ for all $j \neq k$. To ensure that the Knockoff Filter is powerful enough to differentiate the true variables from the noise ones, the entries of $s$ should be as large as possible, so that $\tilde{X}_j$ is not too similar to $X_j$. There are various strategies to construct such knockoff variables: fixed-X knockoffs, model-X Gaussian knockoffs, etc.
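As a concrete illustration, the fixed-X construction with an equicorrelated choice of $s$ can be sketched in a few lines of linear algebra. This is a minimal numpy sketch, not a production implementation: the matrix sizes, seed, and the 0.999 shrinkage factor are illustrative choices, and $n \ge 2p$ is assumed so that an orthonormal basis orthogonal to the column span of $X$ exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5

# Normalize columns so the Gram matrix Sigma = X^T X has unit diagonal.
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)
Sigma = X.T @ X

# Equicorrelated choice: s_j = min(1, 2*lambda_min(Sigma)) for every j,
# shrunk slightly so that 2*diag(s) - diag(s) Sigma^{-1} diag(s) stays
# positive definite.
lam_min = np.linalg.eigvalsh(Sigma)[0]
s = np.full(p, min(1.0, 2 * lam_min)) * 0.999
S = np.diag(s)

# U: an n x p orthonormal basis orthogonal to the column span of X.
Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
U = Q[:, p:2 * p]

# C^T C = 2 diag(s) - diag(s) Sigma^{-1} diag(s), via a Cholesky factor
# (tiny jitter added for numerical safety).
Sigma_inv = np.linalg.inv(Sigma)
C = np.linalg.cholesky(2 * S - S @ Sigma_inv @ S + 1e-10 * np.eye(p)).T

# Fixed-X knockoffs satisfying the two Gram conditions:
#   X_tilde^T X_tilde = Sigma,   X^T X_tilde = Sigma - diag(s).
X_tilde = X @ (np.eye(p) - Sigma_inv @ S) + U @ C

print(np.allclose(X_tilde.T @ X_tilde, Sigma, atol=1e-6))  # True
print(np.allclose(X.T @ X_tilde, Sigma - S, atol=1e-6))    # True
```

The final two checks verify exactly the two defining properties above, term by term.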

A tradeoff exists when choosing between these two methods: the former does not require knowledge of the data-generating process, at the expense that the accompanying statistics must satisfy the "sufficiency" and "antisymmetry" properties to perform FDR control. A model-X Gaussian knockoff $\tilde{X}$ is constructed to obey the following two properties:

1. For any subset $S \subseteq \{1, 2, \ldots, p\}$, $(X, \tilde{X})_{\mathrm{swap}(S)} \sim (X, \tilde{X})$. This property is called pairwise exchangeability: swapping the columns of any subset of variables and their knockoffs leaves the joint distribution invariant.
2. $\tilde{X} \perp Y \mid X$. Note that this is guaranteed if $Y$ is not used in the construction.
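When the features are jointly Gaussian with a known covariance matrix, both properties can be satisfied in closed form: given $X = x$, a knockoff is drawn from the conditional Gaussian law $\mathcal{N}\bigl(x - \mathrm{diag}\{s\}\,\Sigma^{-1}x,\; 2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\,\Sigma^{-1}\,\mathrm{diag}\{s\}\bigr)$. A minimal sketch, where the covariance matrix and the equicorrelated choice of $s$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3

# Illustrative known covariance of the Gaussian features.
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])

# Equicorrelated s; must keep 2*diag(s) - diag(s) Sigma^{-1} diag(s) PSD.
s = min(1.0, 2 * np.linalg.eigvalsh(Sigma)[0]) * np.ones(p)
D = np.diag(s)
Sigma_inv = np.linalg.inv(Sigma)

# One observed feature vector.
x = rng.multivariate_normal(np.zeros(p), Sigma)

# Conditional law of the knockoff given X = x for Gaussian features.
mean = x - D @ Sigma_inv @ x
cov = 2 * D - D @ Sigma_inv @ D
x_tilde = rng.multivariate_normal(mean, cov)
```

Since $Y$ is never touched in this sampler, the second property ($\tilde{X} \perp Y \mid X$) holds automatically.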

Note that $(X, \tilde{X})_{\mathrm{swap}(S)}$ is obtained from $(X, \tilde{X})$ by swapping the columns $X_j$ and $\tilde{X}_j$ for every $j \in S$. For example,

$(X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)_{\mathrm{swap}(\{1,2\})} = (\tilde{X}_1, \tilde{X}_2, X_3, X_1, X_2, \tilde{X}_3).$

The Sequential Conditional Independent Pairs algorithm gives an explicit construction:

for j in 1:p {
  sample $\tilde{X}_j$ from $\mathcal{L}(X_j \mid X_{-j}, \tilde{X}_{1:j-1})$
}

To see why this algorithm produces knockoff variables satisfying the pairwise exchangeability condition, refer to Appendix B.

Determine an appropriate statistic

In , a statistic $W$ is defined to have (1) the sufficiency property if $W$ depends only on the Gram matrix and on feature-response inner products, and (2) the antisymmetry property if swapping $X_j$ and $\tilde{X}_j$ results in a change of sign of $W_j$. We introduce the two test statistics most relevant to this paper.

Importance statistics based on the lasso with cross-validation

Fit a linear regression model via penalized maximum likelihood and cross-validation. Then compute the difference statistic $W_j = |Z_j| - |\tilde{Z}_j|$, where $Z_j$ and $\tilde{Z}_j$ are the coefficient estimates for the $j$th variable and its knockoff, respectively. However, this statistic does not satisfy the "sufficiency" condition, which can break the FDR-controlling mechanism, in particular when paired with fixed-X knockoffs.
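A sketch of this difference statistic and a check of its antisymmetry property; the coefficient values below are made up for illustration, standing in for lasso estimates fitted on the augmented design $(X, \tilde{X})$:

```python
import numpy as np

# Hypothetical lasso coefficient estimates for 4 variables and their knockoffs.
Z = np.array([0.9, 0.0, 0.5, 0.1])        # original variables
Z_tilde = np.array([0.1, 0.2, 0.4, 0.1])  # knockoff copies

# Difference statistic: positive when the original coefficient dominates.
W = np.abs(Z) - np.abs(Z_tilde)

# Antisymmetry check: swapping X_j with its knockoff swaps Z_j and
# Z_tilde_j, which flips the sign of W_j.
W_swapped = np.abs(Z_tilde) - np.abs(Z)
print(np.allclose(W_swapped, -W))  # True
```

The antisymmetry holds by construction for any difference of absolute coefficients; it is the sufficiency property that this statistic lacks, as noted above.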

Penalized linear regression statistics for knockoffs

Compute the signed maximum statistic $W_j = \max(Z_j, \tilde{Z}_j) \times \mathrm{sign}(Z_j - \tilde{Z}_j)$, where $Z_j$ and $\tilde{Z}_j$ are the largest values of $\lambda$ at which the $j$th variable and its knockoff, respectively, enter the penalized linear regression model. We would expect $Z_j$ and $\tilde{Z}_j$ to be large for most of the true variables and small for null features, because a large value indicates that the feature enters the lasso model early. On the other hand, a positive value of $W_j$ suggests that $X_j$ is selected before its knockoff $\tilde{X}_j$. As a result, to reject the null hypothesis that a candidate variable is noise, $\beta_j = 0$, we need a large positive value of $W_j$.

Calculate the data-dependent threshold

In this section, we focus on the second statistic from the previous section and explain briefly why the model selection procedure controls the FDR.
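A sketch of the signed maximum statistic; the entry values $Z$ and $\tilde{Z}$ below are hypothetical, standing in for the $\lambda$ values read off a fitted lasso path:

```python
import numpy as np

# Hypothetical entry values: the largest lambda at which each variable (Z)
# and its knockoff (Z_tilde) enters the lasso path.
Z = np.array([1.8, 0.3, 1.2, 0.2])
Z_tilde = np.array([0.4, 0.5, 0.3, 0.2])

# Signed maximum statistic: large and positive when the original variable
# enters the path early and before its knockoff.
W = np.maximum(Z, Z_tilde) * np.sign(Z - Z_tilde)
```

Here variables 1 and 3 enter early and well before their knockoffs ($W_j$ large and positive), variable 2 is beaten by its knockoff ($W_j$ negative), and variable 4 ties with its knockoff ($W_j = 0$), so only the first and third look like true signals.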

Let us remind ourselves how the FDR is defined:

$\mathrm{FDP} = \frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\}}, \qquad \mathrm{FDR} = \mathbb{E}(\mathrm{FDP}).$

Let $\mathcal{W} = \{|W_j| : j = 1, \ldots, p\}$ and suppose $q$ is the target FDR. We can define a data-dependent threshold $T$:

(Knockoff) $\quad T = \min\left\{ t \in \mathcal{W} : \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q \right\},$ with selected model $\hat{S} = \{j : W_j \ge T\}$.

For a noise feature, by our construction it is equally likely that the original variable $X_j$ or its knockoff $\tilde{X}_j$ is selected first into the model, so $\#\{\text{null } j : W_j \le -t\}$ is equal in distribution to $\#\{\text{null } j : W_j \ge t\}$. Hence

$\widehat{\mathrm{FDP}}(t) \triangleq \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \;\ge\; \frac{\#\{\text{null } j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \;\approx\; \frac{\#\{\text{null } j : W_j \ge t\}}{\#\{j : W_j \ge t\}} =: \mathrm{FDP}.$

Note that the inequality is usually tight, since most impactful signals are selected earlier than their knockoffs, i.e. $\#\{j : \beta_j \neq 0 \text{ and } W_j \le -t\}$ is small (only one red square in the example of Figure ?). Hence $\widehat{\mathrm{FDP}}(t)$ can be used as an estimate of the FDP under the knockoff filter, and its magnitude is upper-bounded by $q$ according to the definition of the threshold $T$. This result therefore inspires us to control a quantity that is very close to the FDR.

Theorem. For $q \in [0, 1]$, the knockoff method satisfies

$\mathbb{E}\left[ \frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\} + q^{-1}} \right] \le q,$

where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde{X}$ as fixed.
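The threshold search above can be sketched directly; the $W$ values and the target level $q$ below are made up for illustration:

```python
import numpy as np

# Hypothetical W statistics for 8 variables (positive: original entered first).
W = np.array([1.8, -0.5, 1.2, 0.9, -0.1, 0.7, 1.1, 0.6])
q = 0.2  # target FDR level

# Scan candidate thresholds t in {|W_j|}, smallest first, and stop at the
# first t whose estimated FDP  #{W_j <= -t} / #{W_j >= t}  is at most q.
ts = np.sort(np.abs(W))
T = np.inf
for t in ts:
    if np.sum(W <= -t) / max(1, np.sum(W >= t)) <= q:
        T = t
        break

selected = np.flatnonzero(W >= T)
```

With these numbers, $t = 0.1$ gives an estimated FDP of $2/6 > q$, while $t = 0.5$ gives $1/6 \le q$, so $T = 0.5$ and six variables are selected.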

This quantity converges to the real FDR asymptotically, since the $q^{-1}$ in the denominator has little impact as the model size increases. However, we would still like to control the exact FDR, and this can be achieved by setting the threshold in a slightly more conservative way (conservative meaning $T_+ \ge T$):

(Knockoff+) $\quad T_+ = \min\left\{ t \in \mathcal{W} : \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q \right\},$ with selected model $\hat{S} = \{j : W_j \ge T_+\}$.

The additional "1" in the numerator is essential to derive the FDR control theory when there are extremely few discoveries.

Theorem. For $q \in [0, 1]$, the knockoff+ method satisfies

$\mathrm{FDR} = \mathbb{E}\left[ \frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\}} \right] \le q,$

where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde{X}$ as fixed.
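A sketch of the knockoff+ threshold on illustrative $W$ values (again made up); on this data and at $q = 0.25$ the plain knockoff rule would stop at $t = 0.5$, while the "+1" pushes the threshold to the more conservative $t = 0.6$:

```python
import numpy as np

W = np.array([1.8, -0.5, 1.2, 0.9, -0.1, 0.7, 1.1, 0.6])
q = 0.25  # target FDR level

# Knockoff+: the extra "+1" in the numerator makes the FDP estimate
# conservative, which yields finite-sample FDR control.
ts = np.sort(np.abs(W))
T_plus = np.inf
for t in ts:
    if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
        T_plus = t
        break

selected = np.flatnonzero(W >= T_plus)
```

Here $t = 0.5$ gives $(1 + 1)/6 > q$, so knockoff+ must continue to $t = 0.6$, where $(1 + 0)/6 \le q$; this is exactly the $T_+ \ge T$ behaviour described above.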