Title: | Fitting Mixed Models with Known Covariance Structures |
---|---|
Description: | The main functions are 'emmreml' and 'emmremlMultiKernel'. 'emmreml' solves a mixed model with known covariance structure using the 'EMMA' algorithm. 'emmremlMultiKernel' is a wrapper for 'emmreml' to handle multiple random components with known covariance structures. The function 'emmremlMultivariate' solves a multivariate Gaussian mixed model with known covariance structure using the 'ECM' algorithm. |
Authors: | Deniz Akdemir, Okeke Uche Godfrey |
Maintainer: | Deniz Akdemir <[email protected]> |
License: | GPL-2 |
Version: | 3.1 |
Built: | 2025-02-19 04:28:45 UTC |
Source: | https://github.com/cran/EMMREML |
This function estimates the parameters of the model

$y = X\beta + Zu + e,$

where $y$ is the $n \times 1$ vector of the response variable, $X$ is an $n \times q$ known design matrix of fixed effects, $Z$ is an $n \times l$ known design matrix of random effects, $\beta$ is the $q \times 1$ vector of fixed-effects coefficients, and $u$ and $e$ are independent random vectors with $u \sim N_l(0, \sigma^2_u K)$ and $e \sim N_n(0, \sigma^2_e I_n)$, respectively. It also produces the BLUPs and some other useful statistics, like large-sample estimates of variances and PEV.
emmreml(y, X, Z, K,varbetahat=FALSE,varuhat=FALSE, PEVuhat=FALSE, test=FALSE)
y | $n \times 1$ numeric vector for the response variable |
X | $n \times q$ known design matrix of fixed effects |
Z | $n \times l$ known design matrix of random effects |
K | $l \times l$ matrix of known relationships (covariance structure) for the random effects |
varbetahat | TRUE or FALSE: return the large-sample variances of the fixed-effects estimates? |
varuhat | TRUE or FALSE: return the large-sample variances of the BLUPs? |
PEVuhat | TRUE or FALSE: return the prediction error variances of the BLUPs? |
test | TRUE or FALSE: return the large-sample test statistics and p-values for the fixed effects and the BLUPs? |
Vu | Estimate of $\sigma^2_u$ |
Ve | Estimate of $\sigma^2_e$ |
betahat | BLUEs for $\beta$ |
uhat | BLUPs for $u$ |
Xsqtestbeta | $\chi^2$ test statistics for testing whether the fixed-effect coefficients are equal to zero |
pvalbeta | p-values obtained from large-sample theory for the fixed effects. We report the p-values adjusted by the "p.adjust" function for all fixed-effect coefficients. |
Xsqtestu | $\chi^2$ test statistics for testing whether the BLUPs are equal to zero |
pvalu | p-values obtained from large-sample theory for the BLUPs. We report the p-values adjusted by the "p.adjust" function. |
varuhat | Large-sample variances for the BLUPs |
varbetahat | Large-sample variances for the elements of betahat |
PEVuhat | Prediction error variance estimates for the BLUPs |
loglik | log-likelihood for the model |
# Simulate a relationship matrix K1 from random markers
n <- 200
M1 <- matrix(rnorm(n*300), nrow=n)
K1 <- cov(t(M1))
K1 <- K1/mean(diag(K1))

# Generate phenotypes with random-effect variance 2 and residual variance 1
covY <- 2*K1 + 1*diag(n)
Y <- 10 + crossprod(chol(covY), rnorm(n))

# Training set
Trainset <- sample(1:n, 150)

funout <- emmreml(y=Y[Trainset], X=matrix(rep(1, n)[Trainset], ncol=1),
                  Z=diag(n)[Trainset,], K=K1)

# Prediction accuracy in the test set
cor(Y[-Trainset], funout$uhat[-Trainset])
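The returned components can be combined for prediction and quick summaries. The following sketch is an illustration only, not part of the package, and assumes the objects funout, Y, n, K1, and Trainset from the example above are still in the workspace.

# Predicted phenotypes for all individuals: the estimated intercept
# (betahat) plus the BLUPs (uhat).
yhat <- as.numeric(funout$betahat) + as.numeric(funout$uhat)

# Ratio of the random-effect variance to the total variance, a
# heritability-type summary on this scale (K1 was scaled to mean diagonal 1).
funout$Vu / (funout$Vu + funout$Ve)

# Refitting with test=TRUE also returns the chi-square statistics and
# adjusted p-values documented above (Xsqtestbeta, pvalbeta, Xsqtestu, pvalu).
funout2 <- emmreml(y=Y[Trainset], X=matrix(rep(1, n)[Trainset], ncol=1),
                   Z=diag(n)[Trainset,], K=K1, test=TRUE)
head(funout2$pvalu)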
The main functions are emmreml and emmremlMultiKernel. emmreml solves a mixed model with known covariance structure using the EMMA algorithm in Kang et al. (2008). emmremlMultiKernel is a wrapper for emmreml to handle multiple random components with known covariance structures. The function emmremlMultivariate solves a multivariate Gaussian mixed model with known covariance structure using the ECM algorithm in Zhou and Stephens (2012).
Package: | EMMREML |
Type: | Package |
Version: | 3.1 |
Date: | 2015-07-20 |
License: | GPL-2 |
Deniz Akdemir, Okeke Uche Godfrey
Maintainer: Deniz Akdemir <[email protected]>
Kang, Hyun Min, Zaitlen, Noah A., Wade, Claire M., Kirby, Andrew, Heckerman, David, Daly, Mark J., and Eskin, Eleazar (2008). Efficient control of population structure in model organism association mapping. Genetics.
Zhou, Xiang, and Stephens, Matthew (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics.
This function is a wrapper for emmreml to fit a Gaussian mixed model with multiple random effects with known covariances. The model fitted is

$y = X\beta + Z_1 u_1 + Z_2 u_2 + \dots + Z_k u_k + e,$

where $y$ is the $n \times 1$ vector of the response variable, $X$ is an $n \times q$ known design matrix of fixed effects, $Z_j$ is an $n \times l_j$ known design matrix of random effects for $j = 1, 2, \dots, k$, $\beta$ is the $q \times 1$ vector of fixed-effects coefficients, and $u_1, u_2, \dots, u_k$ and $e$ are independent random vectors with $u_j \sim N_{l_j}(0, \sigma^2_{u_j} K_j)$ and $e \sim N_n(0, \sigma^2_e I_n)$, respectively. The function produces the BLUPs for the $l = l_1 + l_2 + \dots + l_k$ dimensional random effect $u = (u_1', u_2', \dots, u_k')'$.

The variance parameters for the random effects are estimated as $(\hat{\sigma}^2_{u_1}, \hat{\sigma}^2_{u_2}, \dots, \hat{\sigma}^2_{u_k}) = (\hat{w}_1 \hat{\sigma}^2_u, \hat{w}_2 \hat{\sigma}^2_u, \dots, \hat{w}_k \hat{\sigma}^2_u)$, where $\hat{w}_1, \hat{w}_2, \dots, \hat{w}_k$ are the kernel weights. The function also provides some useful statistics, like large-sample estimates of variances and PEV.
emmremlMultiKernel(y, X, Zlist, Klist, varbetahat=FALSE,varuhat=FALSE, PEVuhat=FALSE, test=FALSE)
y | $n \times 1$ numeric vector for the response variable |
X | $n \times q$ known design matrix of fixed effects |
Zlist | list of random-effects design matrices of dimensions $n \times l_1$, ..., $n \times l_k$ |
Klist | list of known relationship matrices of dimensions $l_1 \times l_1$, ..., $l_k \times l_k$ |
varbetahat | TRUE or FALSE: return the large-sample variances of the fixed-effects estimates? |
varuhat | TRUE or FALSE: return the large-sample variances of the BLUPs? |
PEVuhat | TRUE or FALSE: return the prediction error variances of the BLUPs? |
test | TRUE or FALSE: return the large-sample test statistics and p-values for the fixed effects and the BLUPs? |
Vu | Estimate of the overall random-effect variance $\sigma^2_u$ |
Ve | Estimate of $\sigma^2_e$ |
betahat | BLUEs for $\beta$ |
uhat | BLUPs for $u$ |
weights | Estimates of the kernel weights $\hat{w}_1, \hat{w}_2, \dots, \hat{w}_k$ |
Xsqtestbeta | A $\chi^2$ test statistic for testing whether the fixed-effect coefficients are equal to zero |
pvalbeta | p-values obtained from large-sample theory for the fixed effects. We report the p-values adjusted by the "p.adjust" function for all fixed-effect coefficients. |
Xsqtestu | A $\chi^2$ test statistic for testing whether the BLUPs are equal to zero |
pvalu | p-values obtained from large-sample theory for the BLUPs. We report the p-values adjusted by the "p.adjust" function. |
varuhat | Large-sample variances for the BLUPs |
varbetahat | Large-sample variances for the elements of betahat |
PEVuhat | Prediction error variance estimates for the BLUPs |
loglik | log-likelihood for the model |
#### Example
# Data from a Gaussian process with three
# (total four, including residuals) independent
# sources of variation
n <- 80
M1 <- matrix(rnorm(n*10), nrow=n)
M2 <- matrix(rnorm(n*20), nrow=n)
M3 <- matrix(rnorm(n*5), nrow=n)

# Relationship matrices
K1 <- cov(t(M1))
K2 <- cov(t(M2))
K3 <- cov(t(M3))
K1 <- K1/mean(diag(K1))
K2 <- K2/mean(diag(K2))
K3 <- K3/mean(diag(K3))

# Generate data
covY <- 2*(.2*K1 + .7*K2 + .1*K3) + diag(n)
Y <- 10 + crossprod(chol(covY), rnorm(n))

# Training set
Trainsamp <- sample(1:80, 60)

funout <- emmremlMultiKernel(y=Y[Trainsamp],
  X=matrix(rep(1, n)[Trainsamp], ncol=1),
  Zlist=list(diag(n)[Trainsamp,], diag(n)[Trainsamp,], diag(n)[Trainsamp,]),
  Klist=list(K1, K2, K3),
  varbetahat=FALSE, varuhat=FALSE, PEVuhat=FALSE, test=FALSE)

# Kernel weights
funout$weights

# Correlation of predictions with true values in the test set
uhatmat <- matrix(funout$uhat, ncol=3)
uhatvec <- rowSums(uhatmat)
cor(Y[-Trainsamp], uhatvec[-Trainsamp])
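As described in the model details above, the fitted kernel weights split the overall random-effect variance across the kernels, so per-kernel variance components can be recovered from the output. A minimal sketch, for illustration only, assuming funout from the example above is still available:

# Per-kernel variance component estimates: sigma^2_{u_j} = w_j * sigma^2_u,
# using the returned kernel weights and the overall estimate Vu.
sigma2_kernels <- as.numeric(funout$weights) * funout$Vu
sigma2_kernels

# Residual variance estimate, for comparison
funout$Ve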
This function estimates the parameters of the model

$Y = BX + GZ + E,$

where $Y$ is the $d \times n$ matrix of response variables, $X$ is a $q \times n$ known design matrix of fixed effects, $Z$ is an $l \times n$ known design matrix of random effects, $B$ is the $d \times q$ matrix of fixed-effects coefficients, and $G$ and $E$ are independent matrix-variate variables with $G \sim N_{d \times l}(0, V_G, K)$ and $E \sim N_{d \times n}(0, V_E, I_n)$, respectively. It also produces the BLUPs for the random effects G and some other statistics.
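The matrix-variate assumption on $G$ means that $\mathrm{cov}(\mathrm{vec}(G)) = K \otimes V_G$. The short simulation sketch below illustrates that structure; it is not part of the package, and VG and K here are hypothetical placeholders.

# Illustration only: draw G from the matrix-variate normal N_{d x l}(0, VG, K),
# so that cov(vec(G)) = K %x% VG (Kronecker product), via Cholesky factors.
d <- 3; l <- 20
VG <- diag(d)                        # hypothetical trait covariance V_G
K  <- 0.5^abs(outer(1:l, 1:l, "-"))  # hypothetical relationship matrix
G  <- t(chol(VG)) %*% matrix(rnorm(d*l), d, l) %*% chol(K)
dim(G)  # d x l: one row per trait, one column per random-effect level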
emmremlMultivariate(Y, X, Z, K,varBhat=FALSE,varGhat=FALSE, PEVGhat=FALSE, test=FALSE,tolpar=1e-06, tolparinv=1e-06)
Y | $d \times n$ matrix of response variables (one row per trait) |
X | $q \times n$ known design matrix of fixed effects |
Z | $l \times n$ known design matrix of random effects |
K | $l \times l$ matrix of known relationships for the random effects |
varBhat | TRUE or FALSE: return the large-sample variances of the fixed-effects estimates? |
varGhat | TRUE or FALSE: return the large-sample variances of the BLUPs? |
PEVGhat | TRUE or FALSE: return the prediction error variances of the BLUPs? |
test | TRUE or FALSE: return the large-sample test statistics and p-values for the fixed effects and the BLUPs? |
tolpar | tolerance parameter for convergence |
tolparinv | tolerance parameter for matrix inversion |
Vg | Estimate of $V_G$ |
Ve | Estimate of $V_E$ |
Bhat | BLUEs for $B$ |
Gpred | BLUPs for $G$ |
XsqtestB | $\chi^2$ test statistics for testing whether the fixed-effect coefficients are equal to zero |
pvalB | p-values obtained from large-sample theory for the fixed effects. We report the p-values adjusted by the "p.adjust" function for all fixed-effect coefficients. |
XsqtestG | $\chi^2$ test statistics for testing whether the BLUPs are equal to zero |
pvalG | p-values obtained from large-sample theory for the BLUPs. We report the p-values adjusted by the "p.adjust" function. |
varGhat | Large-sample variances for the BLUPs |
varBhat | Large-sample variances for the elements of B |
PEVGhat | Prediction error variance estimates for the BLUPs |
l <- 20
n <- 15
m <- 40

M <- matrix(rbinom(m*l, 2, .2), nrow=l)
rownames(M) <- paste("l", 1:nrow(M))

beta1 <- rnorm(m)*exp(rbinom(m, 5, .2))
beta2 <- rnorm(m)*exp(rbinom(m, 5, .1))
beta3 <- rnorm(m)*exp(rbinom(m, 5, .1)) + beta2

g1 <- M%*%beta1
g2 <- M%*%beta2
g3 <- M%*%beta3

e1 <- sd(g1)*rnorm(l)
e2 <- (-e1*2*sd(g2)/sd(g1) + .25*sd(g2)/sd(g1)*rnorm(l))
e3 <- 1*(e1*.25*sd(g2)/sd(g1) + .25*sd(g2)/sd(g1)*rnorm(l))

y1 <- 10 + g1 + e1
y2 <- -50 + g2 + e2
y3 <- -5 + g3 + e3

Y <- rbind(t(y1), t(y2), t(y3))
colnames(Y) <- rownames(M)
cov(t(Y))
Y[1:3, 1:5]

K <- cov(t(M))
K <- K/mean(diag(K))
rownames(K) <- colnames(K) <- rownames(M)

X <- matrix(1, nrow=1, ncol=l)
colnames(X) <- rownames(M)

Z <- diag(l)
rownames(Z) <- colnames(Z) <- rownames(M)

SampleTrain <- sample(rownames(Z), n)
Ztrain <- Z[rownames(Z) %in% SampleTrain, ]
Ztest <- Z[!(rownames(Z) %in% SampleTrain), ]

## For a quick answer, tolpar is set to 1e-4. Correct this in practice.
outfunc <- emmremlMultivariate(Y=Y%*%t(Ztrain),
  X=X%*%t(Ztrain), Z=t(Ztrain), K=K,
  tolpar=1e-4, varBhat=FALSE, varGhat=FALSE,
  PEVGhat=FALSE, test=FALSE)

Yhattest <- outfunc$Gpred%*%t(Ztest)

cor(cbind(Ztest%*%Y[1,], Ztest%*%outfunc$Gpred[1,],
  Ztest%*%Y[2,], Ztest%*%outfunc$Gpred[2,],
  Ztest%*%Y[3,], Ztest%*%outfunc$Gpred[3,]))

outfuncRidgeReg <- emmremlMultivariate(Y=Y%*%t(Ztrain),
  X=X%*%t(Ztrain), Z=t(Ztrain%*%M), K=diag(m),
  tolpar=1e-5, varBhat=FALSE, varGhat=FALSE,
  PEVGhat=FALSE, test=FALSE)

Gpred2 <- outfuncRidgeReg$Gpred%*%t(M)
cor(Ztest%*%Y[1,], Ztest%*%Gpred2[1,])
cor(Ztest%*%Y[2,], Ztest%*%Gpred2[2,])
cor(Ztest%*%Y[3,], Ztest%*%Gpred2[3,])
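Since Vg and Ve estimate the trait covariance matrices $V_G$ and $V_E$, they can be converted into correlation matrices. A short follow-up sketch, for illustration only, assuming outfunc from the example above is available:

# Genetic and residual correlations among the three traits, derived from
# the estimated covariance matrices.
cov2cor(outfunc$Vg)  # genetic correlation matrix
cov2cor(outfunc$Ve)  # residual correlation matrix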