Gsoc2017 Speed optimizations for iregnet

For The Easy Test

Test environment: OS X Yosemite 10.10.5
Main R package used: microbenchmark, glmnet, ElemStatLearn

The code in test procedure:

library(ElemStatLearn)
library(microbenchmark)
library(iregnet)
library(glmnet)

x.model = as.matrix(scale(prostate[,1:8]))

res.for.lasso <- microbenchmark(res.for.iregnet<-iregnet(x.model, matrix(prostate$lpsa, 97, 2), alpha=1), res.for.glmnet<-glmnet(x.model, prostate$lpsa), times=1000L)

plot(res.for.lasso)
plot(res.for.glmnet)
plot(res.for.iregnet)

Test Result:

res.for.lasso

res.for.glmnet

res.for.iregnet

Summary

According to the test result, two functions return a approximate result but function glmnet was almost Seven times(not precise) faster than function iregnet in handling a lasso problem with no censored data(using prostate cancer data set).


For The Medium Test

Test environment: OS X Yosemite 10.10.5
Main R package used: Rperform

The code in test procedure:

setwd("~/Desktop/iregnet-Rperform/")
library(Rperform)

plot_branchmetrics(test_path = "tests/testthat/test_elemStatsLearn.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/get_xy.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_log_dist.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_penaltyLearning.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_predict.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_survival_glmnet.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_validation.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

plot_branchmetrics(test_path = "tests/testthat/test_zeros.r", metric = "time", branch1 = "optimize", branch2 = "master", save_data = F, save_plots = F)

Test Result:

get_xy.r

test_elemStatsLearn.r

test_log_dist.r

test_penaltyLearning.r

test_predict.r

test_survival_glmnet.r

test_validation.r

test_zeros.r

Summary

In this test, the results from Rperform::plot_branchmetrics which compare timings for the commits on the optimize branch against the master branch show a good performance of optimize branch.

Optimizations in optimize branch result in speed improvements among almost the whole test above and compare to the master branch we can find that some vital part of c++ code in iregnet_fit.cpp was optimized.The Hard Test below will mention the whole speed analysis of the iregnet R package.


For The Hard Test

Test environment: OS X Yosemite 10.10.5
Profiler Tool: Google-gperftools
Code Profiler: CPU profiler (1000 interrupt / second)
Analysis Tools: pprof

Test Data:

Group Number of variables(row) Number of objects(col)
The 10 10000
Group 10 100000
One 10 1000000
--- --- ---
The 10 1000
Group 100 1000
Two 1000 1000

The code in test procedure:

#Package iregnet version 2016.11.02 - branch master
library(iregnet) 

#Prepare X&y (variables and objects is the test data)
set.seed(10)
X <- matrix(rnorm(variables*objects), objects, variables)
y <- matrix(rnorm(objects*2), objects, 2)
y <- t(apply(y, 1, sort))

#Calling the target function
#Use num_lambda=100 and family=gaussian as default parameter
iregnet(X, y, alpha = 0.5)

Profiler Result:

Group One:

Outputs Top entries in text form

Outputs Graph images

Group Two:

Outputs Top entries in text form

Outputs Graph images

Conclusion of the Profiler Result

According to the text&image we can find that

  1. Most of time separate in iregnet.so and libsystem_m.dylib node.
  2. When more variables used, time spending in iregnet.so during the whole procedure apparently increase.
  3. Adding more objects, the time of iregnet.so spent still close to half(even more than half).

The iregnet.so file is linked to the iregnet.o file which is compiled by iregnet_fit.cpp file.
The libsystem_m.dylib is a dynamic library containing most of mathematics operation in OS X.
Obviously most of the time spend in iregnet R package is the iregnet_fit.cpp file and for more specific information we need to analyze in this c++ file deeply.

Profiling the c++ code

In this way, I set up the code to time the concrete part in iregnet_fit.cpp file.I have been created a new branch called profiler with timing code in iregnet_fit.cpp in my iregnet repository forked from anujkhare/iregnet.

I separate the code of iregnet_fit.cpp into several main parts and use the clock() function to record the them.
In the end of the procedure, the code will calculate the time spend in the percentage of all the parts and print the result.

Then build&install this local package and run two test Rscript(test_elemStatsLearn.r & test_log_dist.r) to analyze the result.

Profiling Result:

Timing in test_elemStatsLearn.r (four times)

Timing in test_log_dist.r (four distribution)
gaussian

loggaussian

logistic

loglogistic

Conclusion of the Analysis Result

As the images show, we can learn that in the main process most of the time(Nearly 100%) spend in lambad loop, so

I add the code to profiling the lambda loop and the this time I find that several parts are the mainly process which spend much time during the whole procedure.

Below are the parts of code taking the most time in the iregnet R package:
(sort by desc,according to mainly result,not absolute)

Process 2
/* calling the function defined in distributions.cpp */
loglik = compute_grad_response(w, z, &scale_update, REAL(y), REAL(y) + n_obs, eta, scale, status, n_obs, transformed_dist, NULL, debug==1 && m == 0);
Process 3
sol_num = sol_denom = 0;
for (ull i = 0; i < n_obs; ++i) {
  eta[i] = eta[i] - X(i, k) * beta[k]; 
  sol_num += (w[i] * X(i, k) * (z[i] - eta[i])) / n_obs;
  sol_denom += (w[i] * X(i, k) * X(i, k)) / n_obs;
}
sol_num *= -1; sol_denom *= -1;
Process 6
for (ull i = 0; i < n_obs; ++i) {
  eta[i] = eta[i] + X(i, k) * beta[k];
}
Process 4
if (intercept && k == 0) {
  beta_new = sol_num / sol_denom;
} else {
  beta_new = soft_threshold(sol_num, lambda_seq[m] * alpha) /
             (sol_denom + lambda_seq[m] * (1 - alpha));
}
Process 5
double abs_change = fabs(beta_new - beta[k]);
if (abs_change > threshold) {
  if(debug==1 && max_iter==n_iters[m])printf("iter=%d lambda=%d beta_%lld not converged, abs_change=%f > %f=threshold\n", n_iters[m], m, k, abs_change, threshold);
  flag_beta_converged = 0;
  beta[k] = beta_new;
}

Summary

In this test,I firstly use the Google-gperftools(CPU profiler) to find out which mainly process taking much time in the iregnet R package,and then I set up clock() function in separated part of iregnet_fit.cpp to figure out the specific C++ code that taking the most time.

In order to speed optimizations for iregnet,the part of c++ code above in iregnet_fit.cpp need to be focused and also the function defined in distributions.cpp is equally important.

2017-01-26 23:56 271
Comments
Write a Comment