There are several graphics available for visualizing missing data including the VIM
package. However, I wanted a plot specifically for looking at the nature of missingness across variables and a clustering variable of interest to support data preparation in multilevel propensity score models (see the multilevelPSA
package). The following examples uses data from the Programme of International Student Assessment (PISA; see pisa
package).
The required packages can be downloaded from github. Note that the pisa
package is approximately 75mb.
> require(devtools)
> install_github('multilevelPSA', 'jbryer')
> install_github('pisa', 'jbryer')
The following will setup the data to be plotted. There is a pisa.setup.R
script included in the multilevelPSA
package that is included to assist with a demo there. Among many things, it creates a vector psa.cols
that defines the variables of interest in performing a propensity score analysis. These are the variables where missingness needs to be addressed.
> require(multilevelPSA)
> require(pisa)
> data(pisa.student)
> pkgdir = system.file(package='multilevelPSA')
> source(paste(pkgdir, '/pisa/pisa.setup.R', sep=''))
> student = pisa.student[,psa.cols]
> student$CNT = as.character(student$CNT)
And finally, to create the graphic use the plot.missing
command.
> plot.missing(student[,c(4:48)], student$CNT)