When I went to school we were always taught the “i before e, except after c” rule for spelling. But how accurate is this rule? Kevin Marks tweeted today the following:
»@uberfacts: There are 923 words in the English language that break the “I before E” rule. Only 44 words actually follow that rule.« Science
— Kevin Marks (@kevinmarks) March 25, 2013 Not sure where he came up with that result, but seems simple enough to verify.
There are many situations in R where you have a list of vectors that you need to convert to a data.frame. This question has been addressed over at StackOverflow and it turns out there are many different approaches to completing this task. Since I encounter this situation relatively frequently, I wanted my own S3 method for as.data.frame that takes a list as its parameter. I should note that it only works with atomic vectors (i.
I posted a question over on StackOverflow on an efficient way of comparing two data frames with the same column structure, but with different rows. What I would like to end up with is an n x m logical matrix where n and m are the number of rows in the first and second data frames, respectively; and the value at the *i*th row and *j*th column indicates whether all the values from row i from data frame one is equal to row j from data frame two.
Version 1.0 of sqlutils has been released to CRAN. The sqlutils package is designed to manage a library of SQL files. This package grew out of the needs of an Office of Institutional Research where the vast majority of analysis is conducted on data from our Student Information System (SIS) which is stored in an Oracle database. A lot of our analyses and reports are derived from the same types of datasets but from easily extracted parameters (e.
I recently taught a very basic introduction to SQL workshop and needed a way to have participants interact with SQL statements. Obviously there are lots of tools to interface with a database, but since we are all R users I thought it would be nice to be able interact without leaving R. Although this interface is fairly basic, the fact that we can type in a SQL statement and get the results as an R data frame provides all the advantages of having data in R.
One issue I continuously encounter when starting to work with a new dataset is that of the codebook. In general, I prefer to load a codebook into R like any other data source, specifically as a data frame. And ideally, one data frame to provides the variable names with descriptions and any other meta data available, and a separate list of named vectors that can be used to recode factors.
The xtable package is tremendously useful for generating LaTeX tables from data frames. It is also pretty easy to customize the output to handle some special cases of LaTeX formatting. The xtable.decimal function will create a LaTeX table where numeric columns will be vertically aligned on the decimal point. In addition to specifying the LaTeX alignment code it will also create appropriate column titles so that the column name spans the two resulting columns.
Introduction Markdown has been growing in popularity for writing documents on the web. With the introduction of R Markdown (see also Jeromy Anglim’s post on getting started with R Markdown) and knitr, R Markdown has simplified the publishing of R analysis on the web. I recently converted my website from Wordpress to Jekyll. Jekyll is a “static site generator” and is the framework used by GitHub Pages. You can view the complete source for this website on Github at https://github.
My wife went out to her book group tonight and their book of the month was 50 Shades of Grey. Sadly, I could think of is that plotting 50 shades in R would be a neat exercise.
require(ggplot2) grey50 <- data.frame( x = rep(1:10, 5), y = rep(1:5, each=10), c = unlist(lapply(seq(10,255,5), FUN=function(x) { rgb(x,x,x, max=255) })), t = unlist(lapply(seq(10,255,5), FUN=function(x) { ifelse(x > 255/2, 'black', 'white') })) ) ggplot(grey50, aes(x=x, y=y, fill=c, label=c, color=t)) + geom_tile() + geom_text(size=4) + scale_fill_identity() + scale_color_identity() + ylab(NULL) + xlab(NULL) + theme(axis.
We all know that the odds of flipping an unbiased coin is 50% heads, 50% tails. But what happens if you do this a lot of times. Do you expect the same number of heads and tails? What if we took a cumulative sum where heads = +1 and tails = -1. What would that sum be? Here is a function that will do this n times and plot it.