R

Converting a list to a data frame

There are many situations in R where you have a list of vectors that you need to convert to a data.frame. This question has been addressed over at StackOverflow and it turns out there are many different approaches to completing this task. Since I encounter this situation relatively frequently, I wanted my own S3 method for as.data.frame that takes a list as its parameter. I should note that it only works with atomic vectors (i.
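As a rough illustration of the problem, here is a minimal base-R sketch. It is not the author's S3 method. The helper name `rows_to_df` is hypothetical, and it assumes every list element is an atomic vector of the same length that represents one row.

```r
# Hypothetical helper (not the author's as.data.frame method):
# turn a list of equal-length atomic vectors, one per row, into a data frame.
rows_to_df <- function(x, stringsAsFactors = FALSE) {
  stopifnot(is.list(x), all(vapply(x, is.atomic, logical(1))))
  # rbind stacks the vectors into a matrix, one row per list element;
  # column names come from the names of the first vector, if any
  m <- do.call(rbind, x)
  as.data.frame(m, stringsAsFactors = stringsAsFactors)
}

rows <- list(c(a = 1, b = 2), c(a = 3, b = 4))
df <- rows_to_df(rows)
```

Because `rbind` builds a matrix first, vectors of mixed types are coerced to a single common type (for example, character). Keeping column types distinct would require a per-column approach instead.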

Comparing two data frames with different numbers of rows

I posted a question on StackOverflow about an efficient way to compare two data frames that have the same column structure but different rows. What I would like to end up with is an n x m logical matrix, where n and m are the numbers of rows in the first and second data frames, respectively, and the value in the *i*th row and *j*th column indicates whether all the values in row i of the first data frame are equal to those in row j of the second.
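One straightforward way to build such a matrix is to compare each column pairwise with `outer()` and combine the per-column results with `&`. This is a sketch under stated assumptions, not necessarily the approach suggested in the StackOverflow answers: the name `compare_rows` is hypothetical, and both data frames are assumed to have identical column names and atomic (or factor) columns.

```r
# Hypothetical helper: n x m logical matrix where [i, j] is TRUE when
# row i of df1 equals row j of df2 in every column.
compare_rows <- function(df1, df2) {
  stopifnot(identical(names(df1), names(df2)))
  result <- matrix(TRUE, nrow(df1), nrow(df2))
  for (col in names(df1)) {
    a <- df1[[col]]
    b <- df2[[col]]
    # factors with different level sets cannot be compared directly
    if (is.factor(a)) a <- as.character(a)
    if (is.factor(b)) b <- as.character(b)
    # outer() compares every value of a with every value of b
    result <- result & outer(a, b, "==")
  }
  # NA values yield NA unless another column already differs (FALSE & NA is FALSE)
  result
}

df1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- data.frame(x = c(2, 3), y = c("b", "z"))
m <- compare_rows(df1, df2)
```

The loop runs once per column, so memory use is one n x m matrix per step rather than one per row pair.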

Version 1.0 of sqlutils available on CRAN

Version 1.0 of sqlutils has been released to CRAN. The sqlutils package is designed to manage a library of SQL files. The package grew out of the needs of our Office of Institutional Research, where the vast majority of analyses are conducted on data from our Student Information System (SIS), which is stored in an Oracle database. Many of our analyses and reports are derived from the same types of datasets but from easily extracted parameters (e.
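The general idea of a parameterized SQL library can be sketched in base R. This sketch does not use the sqlutils API: the `:name` placeholder syntax and the helper names `load_sql` and `fill_sql` are assumptions made for illustration only.

```r
# Hypothetical helpers, NOT the sqlutils API.
# Read a .sql file into a single string.
load_sql <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Replace placeholders of the form :name with values from a named list.
# Values are inserted as raw text with no escaping, so this is not safe
# for untrusted input; backslashes in values would also be interpreted by gsub.
fill_sql <- function(sql, params) {
  for (p in names(params)) {
    # \b keeps :term from also matching the start of :term_id
    pattern <- paste0(":", p, "\\b")
    sql <- gsub(pattern, as.character(params[[p]]), sql, perl = TRUE)
  }
  sql
}

tf <- tempfile(fileext = ".sql")
writeLines(c("SELECT *", "FROM enrollment", "WHERE term = :term"), tf)
q <- fill_sql(load_sql(tf), list(term = "201310"))
```

Keeping each query in its own file lets the same SQL be reused across reports, with only the parameters changing.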

Interactive SQL in R

I recently taught a very basic introductory SQL workshop and needed a way for participants to interact with SQL statements. There are, of course, many tools for interfacing with a database, but since we are all R users I thought it would be nice to be able to interact without leaving R. Although this interface is fairly basic, being able to type in a SQL statement and get the results back as an R data frame provides all the advantages of having the data in R.
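A minimal read-eval-print loop along these lines can be built on the DBI package. This is a sketch, not the workshop's actual code: the function name `sql_repl` and its `read_line` argument (which makes the loop testable outside an interactive session) are hypothetical, and it assumes a DBI-compatible connection such as one created with RSQLite.

```r
library(DBI)

# Hypothetical interactive SQL loop: read a statement, run it, print the
# resulting data frame. An empty line, "quit", or "exit" ends the loop.
sql_repl <- function(con, read_line = function() readline("SQL> ")) {
  results <- list()
  repeat {
    stmt <- trimws(read_line())
    if (stmt == "" || tolower(stmt) %in% c("quit", "exit")) break
    # report SQL errors without leaving the loop
    res <- tryCatch(dbGetQuery(con, stmt),
                    error = function(e) {
                      message("Error: ", conditionMessage(e))
                      NULL
                    })
    if (!is.null(res)) {
      print(res)
      results[[length(results) + 1]] <- res
    }
  }
  # return every result set so it can be used later as ordinary data frames
  invisible(results)
}
```

In an interactive session, `sql_repl(con)` prompts with `SQL> ` until an empty line is entered.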

Reading Codebook Files in R

One issue I repeatedly encounter when starting to work with a new dataset is the codebook. In general, I prefer to load a codebook into R like any other data source, specifically as a data frame. Ideally, that means one data frame that provides the variable names with descriptions and any other available metadata, and a separate list of named vectors that can be used to recode factors.
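A small sketch of that structure follows. It assumes a hypothetical codebook where value labels are stored as named character vectors keyed by the numeric code. The object names and the helper `recode_factors` are illustrative only, not the author's code.

```r
# Variable-level metadata as a data frame (hypothetical example)
codebook <- data.frame(
  variable    = c("gender", "grade"),
  description = c("Student gender", "Grade level"),
  stringsAsFactors = FALSE
)

# Value labels as a list of named vectors: names are codes, values are labels
value_labels <- list(
  gender = c("1" = "Male", "2" = "Female"),
  grade  = c("9" = "Freshman", "10" = "Sophomore")
)

# Convert each labelled column to a factor, using the codes as levels
recode_factors <- function(df, labels) {
  for (v in intersect(names(labels), names(df))) {
    lab <- labels[[v]]
    df[[v]] <- factor(as.character(df[[v]]),
                      levels = names(lab), labels = unname(lab))
  }
  df
}

dat <- data.frame(gender = c(1, 2, 2), grade = c(9, 10, 9))
out <- recode_factors(dat, value_labels)
```

Codes that do not appear in the label vector become `NA`, which makes unexpected values easy to spot.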

Function for Generating LaTeX Tables with Decimal Aligned Numbers

The xtable package is tremendously useful for generating LaTeX tables from data frames. It is also fairly easy to customize the output to handle some special cases of LaTeX formatting. The xtable.decimal function creates a LaTeX table in which numeric columns are vertically aligned on the decimal point. In addition to specifying the LaTeX alignment code, it also creates appropriate column headers so that each column name spans the two resulting columns.
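The underlying trick can be illustrated without xtable. Each numeric column is split at the decimal point into two LaTeX columns with the `r@{.}l` specification, and the header uses `\multicolumn{2}{c}{...}` so that the name spans both. This is a hypothetical base-R sketch: `decimal_tabular` is not the author's `xtable.decimal`, and it assumes there are no missing values and at least one decimal digit.

```r
# Hypothetical sketch (not xtable.decimal): build a decimal-aligned tabular.
decimal_tabular <- function(df, digits = 2) {
  stopifnot(digits >= 1)  # a decimal point is needed to split on
  num <- vapply(df, is.numeric, logical(1))
  # each numeric column becomes two LaTeX columns joined by a literal "."
  align <- paste(ifelse(num, "r@{.}l", "l"), collapse = "")
  # numeric headers span both halves of the split column
  header <- ifelse(num, sprintf("\\multicolumn{2}{c}{%s}", names(df)), names(df))
  cells <- lapply(seq_along(df), function(j) {
    x <- df[[j]]
    if (!num[j]) return(as.character(x))
    # fixed-digit formatting, then the decimal point becomes a column break
    sub(".", " & ", formatC(x, format = "f", digits = digits), fixed = TRUE)
  })
  rows <- do.call(paste, c(cells, sep = " & "))
  c(sprintf("\\begin{tabular}{%s}", align),
    paste0(paste(header, collapse = " & "), " \\\\"),
    "\\hline",
    paste0(rows, " \\\\"),
    "\\end{tabular}")
}

out <- decimal_tabular(data.frame(item = c("a", "b"), value = c(1.5, 12.25)))
```

`writeLines(out)` prints the LaTeX source. The `@{.}` column separator supplies the decimal point itself, which is why the point is removed from the cell text.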

Using (R) Markdown, Jekyll, & GitHub for a Website

Markdown has been growing in popularity for writing documents on the web. With the introduction of R Markdown (see also Jeromy Anglim’s post on getting started with R Markdown) and knitr, publishing R analyses on the web has become much simpler. I recently converted my website from WordPress to Jekyll. Jekyll is a “static site generator” and is the framework used by GitHub Pages. You can view the complete source for this website on GitHub at https://github.
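One way to fit R Markdown into a Jekyll site is to knit each `.Rmd` file into a `.md` file in Jekyll's `_posts` directory and let Jekyll build the site from there. The sketch below assumes that layout. The `_source` directory name and the `render_posts` helper are hypothetical and do not necessarily reflect how this site is built.

```r
library(knitr)

# Hypothetical layout: R Markdown sources in `src_dir`, Jekyll posts in `out_dir`.
render_posts <- function(src_dir = "_source", out_dir = "_posts") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  rmd <- list.files(src_dir, pattern = "\\.Rmd$", full.names = TRUE)
  outputs <- character(0)
  for (f in rmd) {
    out <- file.path(out_dir, sub("\\.Rmd$", ".md", basename(f)))
    # skip posts whose Markdown output is already newer than the source
    if (!file.exists(out) || file.mtime(out) < file.mtime(f)) {
      knit(f, output = out, quiet = TRUE)
    }
    outputs <- c(outputs, out)
  }
  invisible(outputs)
}
```

YAML front matter at the top of each `.Rmd` file passes through knitting unchanged, so Jekyll can still read the post's title and layout. The modification-time check avoids re-running every analysis each time the site is rebuilt.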

Fifty Shades of Grey in R

My wife went out to her book group tonight and their book of the month was 50 Shades of Grey. Sadly, all I could think of was that plotting 50 shades in R would be a neat exercise.

```r
library(ggplot2)

# 50 grey levels from 10 to 255 in steps of 5
shades <- seq(10, 255, 5)
grey50 <- data.frame(
  x = rep(1:10, 5),
  y = rep(1:5, each = 10),
  # hex colour code for each grey level
  c = rgb(shades, shades, shades, maxColorValue = 255),
  # black labels on light tiles, white labels on dark tiles
  t = ifelse(shades > 255 / 2, "black", "white")
)
ggplot(grey50, aes(x = x, y = y, fill = c, label = c, color = t)) +
  geom_tile() +
  geom_text(size = 4) +
  scale_fill_identity() +
  scale_color_identity() +
  ylab(NULL) +
  xlab(NULL)
  # + theme(axis.  (remainder truncated in the excerpt)
```

Fun with coin flips

We all know that an unbiased coin lands heads 50% of the time and tails 50% of the time. But what happens if you flip it many times? Do you expect the same number of heads and tails? What if we took a cumulative sum where heads = +1 and tails = -1? What would that sum be? Here is a function that will do this n times and plot it.
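A minimal sketch of such a simulation, which is a simple random walk, is shown below. The function name `coin_walk` and its `plot` argument are hypothetical; this is not the author's function.

```r
# Hypothetical sketch: flip a fair coin n times (heads = +1, tails = -1)
# and track the running total.
coin_walk <- function(n, plot = TRUE) {
  flips <- sample(c(-1, 1), n, replace = TRUE)
  walk <- cumsum(flips)
  if (plot) {
    plot(walk, type = "l", xlab = "Flip", ylab = "Cumulative sum")
    abline(h = 0, lty = 2)  # reference line at an even split
  }
  invisible(walk)
}

set.seed(1)
w <- coin_walk(1000, plot = FALSE)
```

Every flip is 50/50, but after n flips the cumulative sum has mean 0 and standard deviation sqrt(n). As n grows, the sum therefore tends to wander farther from zero in absolute terms, even though the proportion of heads gets closer to one half.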

Visualizing Missing Data

There are several tools available for visualizing missing data, including the VIM package. However, I wanted a plot specifically for looking at the nature of missingness across variables and a clustering variable of interest, to support data preparation in multilevel propensity score models (see the multilevelPSA package). The following examples use data from the Programme for International Student Assessment (PISA; see the pisa package). The required packages can be downloaded from GitHub.
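The numbers behind such a plot can be computed as the proportion of missing values for each variable within each cluster. The sketch below is a hypothetical base-R illustration, not the function from multilevelPSA. The helper name `missing_by_cluster` and its long-format output are assumptions.

```r
# Hypothetical helper: proportion of NA values for every variable within
# every level of a clustering variable, returned in long format.
missing_by_cluster <- function(df, cluster) {
  vars <- setdiff(names(df), cluster)
  g <- factor(df[[cluster]])
  out <- expand.grid(cluster = levels(g), variable = vars,
                     stringsAsFactors = FALSE)
  out$prop_missing <- mapply(function(cl, v) mean(is.na(df[[v]][g == cl])),
                             out$cluster, out$variable, USE.NAMES = FALSE)
  out
}

df <- data.frame(school = c("A", "A", "B", "B"),
                 x = c(1, NA, 3, 4),
                 y = c(NA, NA, 1, NA))
res <- missing_by_cluster(df, "school")
# One way to draw it, e.g. as a heat map with ggplot2:
# ggplot2::ggplot(res, ggplot2::aes(variable, cluster, fill = prop_missing)) +
#   ggplot2::geom_tile()
```

The long format makes it easy to spot clusters where a variable is entirely missing (a proportion of 1), which is exactly the situation that causes trouble when fitting separate models within each cluster.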