Performance of LASSO when one or more covariate(s) is/are Missing Not at Random (MNAR)

About

This is my M.Sc. final year project.

I did this project under the supervision of my mentor Dr. Sumanta Adhya, WBSU.

In this project, I examine how LASSO performs variable selection under multicollinearity when the data contain missing values and the missingness is not at random (MNAR). I investigated different LASSO solutions on simulated data sets, looking for a method that works well in this situation. I propose a new methodology, “Inverse Probability Weighted Logistic Lasso Estimation”, which gives a better solution than complete case analysis under the MNAR mechanism.

Here I compare a total of five LASSO solution techniques: “LASSO on Original Data set (when all known)”, “LASSO on Complete Data set (removing all missing observations)”, “IPW-LASSO on Complete Data set using known (actual) missing probabilities”, “IPW-LASSO on Complete Data set using estimated (MLE) missing probabilities”, and “IPW-LASSO on Complete Data set using estimated (Logistic LASSO) missing probabilities”. I show that “IPW-LASSO on Complete Data set using estimated (Logistic LASSO) missing probabilities” is a better solution than simple complete case analysis when the missingness mechanism is MNAR.
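
To make the IPW-LASSO idea concrete, here is a minimal Python sketch of the "known probabilities" variant. It is not the project's R code. It assumes the estimator is a weighted lasso in which each complete case is weighted by the inverse of its probability of being observed, and it solves that problem by plain coordinate descent. The names `ipw_lasso` and `soft_threshold`, the simulated data, and the value of `lam` are all hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def ipw_lasso(X, y, obs_prob, lam, n_iter=200):
    """Weighted lasso by coordinate descent (illustrative, no intercept).

    Minimises (1/2n) * sum_i w_i * (y_i - x_i' beta)^2 + lam * sum_j |beta_j|
    over the complete cases, with w_i = 1 / obs_prob[i] (assumed IPW form).
    """
    n, p = len(X), len(X[0])
    w = [1.0 / pi for pi in obs_prob]
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Weighted correlation of column j with the partial residual
            rho = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / denom
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical simulation: two correlated covariates, only x1 affects y,
# and the chance of observing a row depends on x1 itself (MNAR-style).
random.seed(1)
X_cc, y_cc, pi_cc = [], [], []
for _ in range(300):
    x1 = random.gauss(0.0, 1.0)
    x2 = 0.8 * x1 + 0.6 * random.gauss(0.0, 1.0)
    y_val = 2.0 * x1 + random.gauss(0.0, 1.0)
    pi = 1.0 / (1.0 + math.exp(-(0.5 + x1)))  # known observation probability
    if random.random() < pi:  # keep complete cases only
        X_cc.append([x1, x2])
        y_cc.append(y_val)
        pi_cc.append(pi)

print(ipw_lasso(X_cc, y_cc, pi_cc, lam=0.1))
```

In the "estimated" variants, `obs_prob` would be replaced by fitted probabilities from a logistic regression (MLE) or a logistic lasso. That estimation step is not shown here.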

Keywords: MNAR, Logistic Regression, LASSO, IPW, IPW-LASSO.

Click the Slide button above to see the project presentation.

Click the Report button above to see the project document.

Click the GitHub button above to see the R code.

Rajesh Majumder
PhD Student, Statistician, Research Assistant