iregnet - Benchmark

Part one : Performance between new/old iregnet in random dateset (Test 1 - Test 16)

Test Process

In these tests, we will test compare the new iregnet against the old version. The vital factor contains censoring type and distribution. The four parts below are different censoring type. The benchmark tool we used is microbenchmark. Gaussian is the main distribution in these four part. There is four steps in this part of the test.
Step 1, set the largest number of row & col, use get_xy() function from iregnet to get X & y for our test.
Step 2, change the length of row or col, it depend on the specific test
Step 3, set the distribution as gaussian and chose one kind of censoring types, use microbenchmark to test new/old iregnet at the same time,
Step 4, get the mean/min/max data from the result of microbenchmark, add them to the data.table. jump to step 2 until the X reaching target length.
Step 5, use ggplot to print the result data.table and save the image.
Here are all the codes during every tests.
Every tests below will list the censoring types, distribution and which one's length is auto increase (row or col).

Test Result

Test 1

Distribution Censoring type Length auto increase
Gaussian None Row

Note: This test contains glmnet package and we specially increase the length of row to one million.

Test 2

Distribution Censoring type Length auto increase
Gaussian Right Row

Test 3

Distribution Censoring type Length auto increase
Gaussian Left Row

Test 4

Distribution Censoring type Length auto increase
Gaussian Interval Row

Test 5

Distribution Censoring type Length auto increase
Logistic None Row

Test 6

Distribution Censoring type Length auto increase
Logistic Right Row

Test 7

Distribution Censoring type Length auto increase
Logistic Left Row

Test 8

Distribution Censoring type Length auto increase
Logistic Interval Row

Test 9

Distribution Censoring type Length auto increase
Gaussian None Col

Test 10

Distribution Censoring type Length auto increase
Gaussian Right Col

Test 11

Distribution Censoring type Length auto increase
Gaussian Left Col

Test 12

Distribution Censoring type Length auto increase
Gaussian Interval Col

Test 13

Distribution Censoring type Length auto increase
Logistic None Col

Test 14

Distribution Censoring type Length auto increase
Logistic Right Col

Test 15

Distribution Censoring type Length auto increase
Logistic Left Col

Test 16

Distribution Censoring type Length auto increase
Logistic Interval Col

Part Two : Performance between new/old iregnet in specific data set (Test 17 - Test 19)

Test Process

In these tests, we will test compare the new iregnet against the old version. However, this time we will test in real data set. Such as Prostate, Ovarian and iregnet::penalty.learning.
Also, test codes, distributions and the data set will be listed below in every test.

Test Result

Test 17

Distribution : Gaussian
Data Set : Prostate
Censoring Type : None censoring type
Test code :

    library(oldiregnet)
    library(iregnet)
    library(microbenchmark)
    library(glmnet)
    library(testthat)
    alpha <- 0.6

    data(prostate,package="ElemStatLearn")
    pros <- subset(prostate,select=-train,train==TRUE)
    ycol <- which(names(pros)=="lpsa")
    X.unscaled <- as.matrix(pros[-ycol])
    y.unscaled <- pros[[ycol]]
    M <- matrix(
      colMeans(X.unscaled), nrow(X.unscaled), ncol(X.unscaled), byrow=TRUE)
    X.centered <- X.unscaled - M
    sd.vec <- apply(X.unscaled, 2, sd)
    S <- diag(1/sd.vec)
    X.scaled <- X.centered %*% S
    dimnames(X.scaled) <- dimnames(X.unscaled)
    m <- mean(y.unscaled)
    sigma <- sd(y.unscaled)
    y.scaled <- (y.unscaled - m)/sigma

    X <- X.scaled
    y <- y.scaled

    fit_s <- survreg(Surv(y, rep(1, length(y))) ~ X, dist = "gaussian")
    fit_i <- iregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F)

    lambda_path <- fit_i$lambda * (fit_i$scale ** 2)

    fit_g <- glmnet(X, y, "gaussian", lambda = lambda_path, standardize=F, maxit=1e5, thresh=1e-7, alpha=alpha)

    expect_equal(as.double(fit_s$coefficients), as.double(fit_i$beta[, fit_i$num_lambda]), tolerance = 1e-3)
    expect_equal(as.double(fit_i$beta), as.double(coef(fit_g)), tolerance=1e-3)

    time.result <- microbenchmark(iregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F),
                                  oldiregnet(X, cbind(y, y), "gaussian", maxiter=1e5, thresh=1e-7, standardize=F, alpha=alpha, scale=1, estimate_scale=F),
                                  glmnet(X, y, "gaussian", lambda = lambda_path, standardize=F, maxit=1e5, thresh=1e-7, alpha=alpha),
                                  times=1000L)
    time.result

Test Result :
Unit: milliseconds

Function min lq mean median uq max neval
iregnet 4.52781 4.658523 4.865495 4.837288 4.890895 8.955306 1000
old-iregnet 11.44788 11.493286 11.763516 11.543399 11.697040 16.923785 1000
glmnet 1.642537 1.767328 1.984859 1.878334 2.080702 6.842187 1000

Test 18

Distribution : Gaussian
Data Set : Ovarian
Censoring Type : Right censoring type
Test code :

    library(oldiregnet)
    library(iregnet)
    library(microbenchmark)
    library(testthat)
    data("ovarian")
    x <- cbind(ovarian$ecog.ps, ovarian$rx)

    fit_s <- survreg(Surv(futime, fustat) ~ x, data = ovarian, dist = "gaussian")
    fit_i <- iregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4)

    expect_equal(fit_s$coefficients,
                 fit_i$beta[, fit_i$num_lambda], tolerance = 1e-3)

    time.result <- microbenchmark(iregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4),
                                  oldiregnet(x, Surv(ovarian$futime, ovarian$fustat), family="gaussian", alpha=1, intercept = T, threshold=1e-4),
                                  times=1000L)
    time.result

Test Result :
Unit: milliseconds

Function min lq mean median uq max neval
iregnet 2.836270 2.875178 2.956248 2.890251 2.960390 7.504668 1000
old-iregnet 2.987433 3.009679 3.092105 3.026380 3.086982 6.876124 1000

Test 19

Distribution : Gaussian
Data Set : penalty.learning
Censoring Type : Interval censoring type
Test code :

    library(oldiregnet)
    library(iregnet)
    library(microbenchmark)
    library(testthat)
    data(penalty.learning)

    chrom.vec <- sub(":.*", "", rownames(penalty.learning$X.mat))
    table(chrom.vec)
    train.chroms <- c("chr1", "chr9")
    sets <-
      list(train=chrom.vec %in% train.chroms,
           validation=! chrom.vec %in% train.chroms)
    X.train <- penalty.learning$X.mat[sets$train,]
    y.train <- penalty.learning$y.mat[sets$train,]
    fit <- iregnet(
      X.train, y.train,
      unreg_sol=FALSE,
      standardize=TRUE,
      debug=1,
      maxiter=1e5)

    time.result <- microbenchmark(iregnet(X.train, y.train, unreg_sol=FALSE, standardize=TRUE, debug=1, maxiter=1e5),
                                  oldiregnet(X.train, y.train, unreg_sol=FALSE, standardize=TRUE, debug=1, maxiter=1e5),
                                  times=10L)

    time.result

Test Result :
Unit: seconds

Function min lq mean median uq max neval
iregnet 6.536424 6.551682 6.561503 6.566411 6.573030 6.578601 10
old-iregnet 8.626131 8.660122 9.365527 8.662445 8.667893 15.649150 10

Summary of the Benchmark

From the benchmark figures and data, we can find that new iregnet is always better than previous version in performance across every test. The speed of iregnet is actually improved. Also the new iregnet get closer to glmnet in none censoring type test with gaussian distribution. We can draw a conclusion that iregnet get better performances with speed improvements after the optimization.

2017-07-29 18:55 161
Comments
Write a Comment