The str function is perhaps the most useful function in R. It gives a compact overview of the structure of any object. When I teach R, especially to people coming from SPSS, str applied to a data frame shows the information they are used to seeing on SPSS's Variable View tab. However, sometimes I want to display the information str returns in a better format (e.g. as an HTML or LaTeX table). I wrote a function, strtable, that provides the same information as str.data.frame but returns the results as a data.frame. This gives much more flexibility in controlling how the output is formatted. Specifically, it returns a data.frame with four columns: variable, class, levels, and examples.
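To make the idea concrete, here is a minimal sketch of how a str-style summary can be collected into a data.frame with those four columns. It is a hypothetical stand-in, not the author's strtable (whose source is listed at the end of this post): the function name str_summary, its n.levels argument, and the exact text in each cell are assumptions made for illustration.

```r
# Hypothetical sketch: a str()-style summary returned as a data.frame.
# This is NOT the author's strtable(); names and formatting are assumptions.
str_summary <- function(df, n = 4, n.levels = 4) {
  rows <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.factor(x)) {
      lv <- levels(x)
      # Quote the first n.levels level labels; mark any that were omitted
      shown <- paste0('"', head(lv, n.levels), '"', collapse = ", ")
      if (length(lv) > n.levels) shown <- paste0(shown, ", ...")
      cls <- paste("Factor w/", length(lv), "levels")
      lev <- shown
      vals <- paste0('"', as.character(head(x, n)), '"')
    } else {
      cls <- class(x)[1]
      lev <- NA_character_  # non-factors have no levels
      vals <- as.character(head(x, n))
    }
    ex <- paste(vals, collapse = ", ")
    if (length(x) > n) ex <- paste0(ex, ", ...")
    data.frame(variable = nm, class = cls, levels = lev, examples = ex,
               stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames into a single table
  do.call(rbind, rows)
}

s <- str_summary(iris)
print(s, na.print = "")
```

Because the result is an ordinary data.frame, it can be passed to knitr::kable(s, format = "html") or knitr::kable(s, format = "latex") to get the HTML or LaTeX tables mentioned above; setting options(knitr.kable.NA = "") leaves the empty levels cells blank.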
The function can be sourced from a Gist using the devtools package.
devtools::source_gist('4a0a5ab9fe7e1cf3be0e')
For the first example, we’ll use the iris data frame.
data(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
The strtable function has five parameters:
n: show the first n elements as examples.
width: the maximum width, in characters, of the examples shown.
levels: the number of factor levels to show.
width.levels: the maximum width, in characters, of the levels shown.
factor.values: a function defining how factor examples are printed. Possible values are as.character or as.integer.
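The interplay of n, width, and factor.values can be illustrated with a small helper. This is a hypothetical sketch, not code taken from strtable: the name format_examples, its default values, and the rule of cutting an over-long string to width characters ending in "..." are all assumptions.

```r
# Hypothetical helper illustrating what n, width, and factor.values control.
# Not taken from strtable(); defaults and the truncation rule are assumptions.
format_examples <- function(x, n = 4, width = 60, factor.values = as.character) {
  vals <- head(x, n)
  if (is.factor(vals)) {
    vals <- factor.values(vals)      # labels or integer codes
  }
  if (is.character(vals)) {
    vals <- paste0('"', vals, '"')   # quote text values, as str() does
  } else {
    vals <- as.character(vals)
  }
  out <- paste(vals, collapse = ", ")
  if (length(x) > n) out <- paste0(out, ", ...")
  # Keep at most `width` characters, marking the cut with "..."
  if (nchar(out) > width) out <- paste0(substr(out, 1, width - 3), "...")
  out
}

format_examples(iris$Species)
format_examples(iris$Species, factor.values = as.integer)
```

Passing the conversion function itself, rather than a flag, means any function that maps a factor to a vector could be supplied, which is presumably why strtable accepts as.character or as.integer directly.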
print(strtable(iris), na.print='')
## variable class levels
## Sepal.Length numeric
## Sepal.Width numeric
## Petal.Length numeric
## Petal.Width numeric
## Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
## examples
## 5.1, 4.9, 4.7, 4.6, ...
## 3.5, 3, 3.2, 3.1, ...
## 1.4, 1.4, 1.3, 1.5, ...
## 0.2, 0.2, 0.2, 0.2, ...
## "setosa", "setosa", "setosa", "setosa", ...
print(strtable(iris, factor.values=as.integer), na.print='')
## variable class levels
## Sepal.Length numeric
## Sepal.Width numeric
## Petal.Length numeric
## Petal.Width numeric
## Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
## examples
## 5.1, 4.9, 4.7, 4.6, ...
## 3.5, 3, 3.2, 3.1, ...
## 1.4, 1.4, 1.3, 1.5, ...
## 0.2, 0.2, 0.2, 0.2, ...
## 1, 1, 1, 1, ...
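The as.integer output works because a factor is stored as integer codes that index its levels vector. The snippet below, using only base R and the iris data, shows that round trip. The same mapping explains the diamonds output below: the cut levels run Fair, Good, Very Good, Premium, Ideal, so "Ideal", "Premium", "Good" appear as 5, 4, 2.

```r
# A factor stores integer codes that index into its levels.
sp <- iris$Species[c(1, 51, 101)]   # one row from each species
codes <- as.integer(sp)             # the underlying codes
labels <- as.character(sp)          # the level labels
# Indexing the levels by the codes recovers the labels.
levels(sp)[codes]
```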
Here’s a second example using the diamonds data from the ggplot2 package.
library(ggplot2)
data(diamonds)
str(diamonds)
## 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
print(strtable(diamonds), na.print='')
## variable class levels
## carat numeric
## cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
## color Factor w/ 7 levels "D", "E", "F", "G", ...
## clarity Factor w/ 8 levels "I1", "SI2", "SI1", "VS2", ...
## depth numeric
## table numeric
## price integer
## x numeric
## y numeric
## z numeric
## examples
## 0.23, 0.21, 0.23, 0.29, ...
## "Ideal", "Premium", "Good", "Premium", ...
## "E", "E", "E", "I", ...
## "SI2", "SI1", "VS1", "VS2", ...
## 61.5, 59.8, 56.9, 62.4, ...
## 55, 61, 65, 58, ...
## 326, 326, 327, 334, ...
## 3.95, 3.89, 4.05, 4.2, ...
## 3.98, 3.84, 4.07, 4.23, ...
## 2.43, 2.31, 2.31, 2.63, ...
print(strtable(diamonds, factor.values=as.integer), na.print='')
## variable class levels
## carat numeric
## cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
## color Factor w/ 7 levels "D", "E", "F", "G", ...
## clarity Factor w/ 8 levels "I1", "SI2", "SI1", "VS2", ...
## depth numeric
## table numeric
## price integer
## x numeric
## y numeric
## z numeric
## examples
## 0.23, 0.21, 0.23, 0.29, ...
## 5, 4, 2, 4, ...
## 2, 2, 2, 6, ...
## 2, 3, 5, 4, ...
## 61.5, 59.8, 56.9, 62.4, ...
## 55, 61, 65, 58, ...
## 326, 326, 327, 334, ...
## 3.95, 3.89, 4.05, 4.2, ...
## 3.98, 3.84, 4.07, 4.23, ...
## 2.43, 2.31, 2.31, 2.63, ...
Here’s the source code from Gist: