In these tests, we will test compare the new iregnet against the old version. The vital factor contains censoring type and distribution. The four parts below are different censoring type. The benchmark tool we used is microbenchmark. Gaussian is the main distribution in these four part. There is four steps in this part of the test.
Step 1, set the largest number of row & col, use get_xy() function from iregnet to get X & y for our test.
Step 2, change the length of row or col, it depend on the specific test
Step 3, set the distribution as gaussian and chose one kind of censoring types, use microbenchmark to test new/old iregnet at the same time,
Step 4, get the mean/min/max data from the result of microbenchmark, add them to the data.table. jump to step 2 until the X reaching target length.
Step 5, use ggplot to print the result data.table and save the image.
Here are all the codes during every tests.
Every tests below will list the censoring types, distribution and which one's length is auto increase (row or col).
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | None | Row |
Note: This test contains glmnet package and we specially increase the length of row to one million.
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Right | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Left | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Interval | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | None | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Right | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Left | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Interval | Row |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | None | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Right | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Left | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Gaussian | Interval | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | None | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Right | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Left | Col |
Distribution | Censoring type | Length auto increase |
---|---|---|
Logistic | Interval | Col |
In these tests, we will test compare the new iregnet against the old version. However, this time we will test in real data set. Such as Prostate, Ovarian and iregnet::penalty.learning.
Also, test codes, distributions and the data set will be listed below in every test.
Distribution : Gaussian
Data Set : Prostate
Censoring Type : None censoring type
Test code :
library(oldiregnet)
library(iregnet)
library(microbenchmark)
library(glmnet)
library(testthat)
alpha <- 0.6
data(prostate,package="ElemStatLearn")
pros <- subset(prostate,select=-train,train==TRUE)
ycol <- which(names(pros)=="lpsa")
X.unscaled <- as.matrix(pros[-ycol])
y.unscaled <- pros[[ycol]]
M <- matrix(
colMeans(X.unscaled), nrow(X.unscaled), ncol(X.unscaled), byrow=TRUE)
X.centered <- X.unscaled - M
sd.vec <- apply(X.unscaled, 2, sd)
S <- diag(1/sd.vec)
X.scaled <- X.centered %*% S
dimnames(X.scaled) <- dimnames(X.unscaled)
m <- mean(y.unscaled)
sigma <- sd(y.unscaled)
y.scaled <- (y.unscaled - m)/sigma
X <- X.scaled
y <- y.scaled
fit_s <- survreg(Surv(y, rep(1, length(y))) ~ X, dist = "gaussian")
fit_i <- iregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F)
lambda_path <- fit_i$lambda * (fit_i$scale ** 2)
fit_g <- glmnet(X, y, "gaussian", lambda = lambda_path, standardize=F, maxit=1e5, thresh=1e-7, alpha=alpha)
expect_equal(as.double(fit_s$coefficients), as.double(fit_i$beta[, fit_i$num_lambda]), tolerance = 1e-3)
expect_equal(as.double(fit_i$beta), as.double(coef(fit_g)), tolerance=1e-3)
time.result <- microbenchmark(iregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F),
oldiregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F),
glmnet(X, y, "gaussian", lambda = lambda_path, standardize=F, maxit=1e5, thresh=1e-7, alpha=alpha),
times=1000L)
time.result
Test Result :
Unit: milliseconds
Function | min | lq | mean | median | uq | max | neval |
---|---|---|---|---|---|---|---|
iregnet | 4.52781 | 4.658523 | 4.865495 | 4.837288 | 4.890895 | 8.955306 | 1000 |
old-iregnet | 11.44788 | 11.493286 | 11.763516 | 11.543399 | 11.697040 | 16.923785 | 1000 |
glmnet | 1.642537 | 1.767328 | 1.984859 | 1.878334 | 2.080702 | 6.842187 | 1000 |
Distribution : Gaussian
Data Set : Ovarian
Censoring Type : Right censoring type
Test code :
library(oldiregnet)
library(iregnet)
library(microbenchmark)
library(testthat)
data("ovarian")
x <- cbind(ovarian$ecog.ps, ovarian$rx)
fit_s <- survreg(Surv(futime, fustat) ~ x, data = ovarian, dist = "gaussian")
fit_i <- iregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4)
expect_equal(fit_s$coefficients,
fit_i$beta[, fit_i$num_lambda], tolerance = 1e-3)
time.result <- microbenchmark(iregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4),
oldiregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4),
times=1000L)
time.result
Test Result :
Unit: milliseconds
Function | min | lq | mean | median | uq | max | neval |
---|---|---|---|---|---|---|---|
iregnet | 2.836270 | 2.875178 | 2.956248 | 2.890251 | 2.960390 | 7.504668 | 1000 |
old-iregnet | 2.987433 | 3.009679 | 3.092105 | 3.026380 | 3.086982 | 6.876124 | 1000 |
Distribution : Gaussian
Data Set : penalty.learning
Censoring Type : Interval censoring type
Test code :
library(oldiregnet)
library(iregnet)
library(microbenchmark)
library(testthat)
data(penalty.learning)
chrom.vec <- sub(":.*", "", rownames(penalty.learning$X.mat))
table(chrom.vec)
train.chroms <- c("chr1", "chr9")
sets <-
list(train=chrom.vec %in% train.chroms,
validation=! chrom.vec %in% train.chroms)
X.train <- penalty.learning$X.mat[sets$train,]
y.train <- penalty.learning$y.mat[sets$train,]
fit <- iregnet(
X.train, y.train,
unreg_sol=FALSE,
standardize=TRUE,
debug=1,
maxiter=1e5)
time.result <- microbenchmark(iregnet(X.train, y.train, unreg_sol=FALSE, standardize=TRUE, debug=1, maxiter=1e5),
oldiregnet(X.train, y.train, unreg_sol=FALSE, standardize=TRUE, debug=1, maxiter=1e5),
times=10L)
time.result
Test Result :
Unit: seconds
Function | min | lq | mean | median | uq | max | neval |
---|---|---|---|---|---|---|---|
iregnet | 6.536424 | 6.551682 | 6.561503 | 6.566411 | 6.573030 | 6.578601 | 10 |
old-iregnet | 8.626131 | 8.660122 | 9.365527 | 8.662445 | 8.667893 | 15.649150 | 10 |
From the benchmark figures and data, we can find that new iregnet is always better than previous version in performance across every test. The speed of iregnet is actually improved. Also the new iregnet get closer to glmnet in none censoring type test with gaussian distribution. We can draw a conclusion that iregnet get better performances with speed improvements after the optimization.