str Implementation for Data Frames | Jason Bryer

The str function is perhaps the most useful function in R. It provides a compact summary of the structure of an object. When I teach R, especially to those coming from SPSS, the str function for data frames provides the information they are used to seeing on the variable view tab. However, sometimes I want to display the information str returns in a better format (e.g. as an HTML or LaTeX table). I wrote a function, strtable, that provides the same information as str.data.frame but returns the results as a data.frame. This provides much more flexibility for controlling how the output is formatted. Specifically, it returns a data.frame with four columns: variable, class, levels, and examples.

The function can be sourced from Gist using the devtools package.

devtools::source_gist('4a0a5ab9fe7e1cf3be0e')

For the first example, we’ll use the iris data frame.

data(iris)
str(iris)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

The strtable function has five parameters:

  • n the first n elements to show
  • width maximum width in characters for the examples to show
  • n.levels the first n levels of a factor to show
  • width.levels maximum width in characters for the levels to show
  • factor.values function defining how factor examples should be printed. Possible values are as.character or as.integer.

print(strtable(iris), na.print='')
##      variable              class                              levels
##  Sepal.Length            numeric
##   Sepal.Width            numeric
##  Petal.Length            numeric
##   Petal.Width            numeric
##       Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                                     examples
##                      5.1, 4.9, 4.7, 4.6, ...
##                        3.5, 3, 3.2, 3.1, ...
##                      1.4, 1.4, 1.3, 1.5, ...
##                      0.2, 0.2, 0.2, 0.2, ...
##  "setosa", "setosa", "setosa", "setosa", ...
print(strtable(iris, factor.values=as.integer), na.print='')
##      variable              class                              levels
##  Sepal.Length            numeric
##   Sepal.Width            numeric
##  Petal.Length            numeric
##   Petal.Width            numeric
##       Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                 examples
##  5.1, 4.9, 4.7, 4.6, ...
##    3.5, 3, 3.2, 3.1, ...
##  1.4, 1.4, 1.3, 1.5, ...
##  0.2, 0.2, 0.2, 0.2, ...
##          1, 1, 1, 1, ...

Here’s a second example using the diamonds data from the ggplot2 package.

data(diamonds, package='ggplot2')
str(diamonds)
## 'data.frame':	53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
print(strtable(diamonds), na.print='')
##  variable              class                                      levels
##     carat            numeric
##       cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##     color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##   clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##     depth            numeric
##     table            numeric
##     price            integer
##         x            numeric
##         y            numeric
##         z            numeric
##                                    examples
##                 0.23, 0.21, 0.23, 0.29, ...
##  "Ideal", "Premium", "Good", "Premium", ...
##                     "E", "E", "E", "I", ...
##             "SI2", "SI1", "VS1", "VS2", ...
##                 61.5, 59.8, 56.9, 62.4, ...
##                         55, 61, 65, 58, ...
##                     326, 326, 327, 334, ...
##                  3.95, 3.89, 4.05, 4.2, ...
##                 3.98, 3.84, 4.07, 4.23, ...
##                 2.43, 2.31, 2.31, 2.63, ...
print(strtable(diamonds, factor.values=as.integer), na.print='')
##  variable              class                                      levels
##     carat            numeric
##       cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##     color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##   clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##     depth            numeric
##     table            numeric
##     price            integer
##         x            numeric
##         y            numeric
##         z            numeric
##                     examples
##  0.23, 0.21, 0.23, 0.29, ...
##              5, 4, 2, 4, ...
##              2, 2, 2, 6, ...
##              2, 3, 5, 4, ...
##  61.5, 59.8, 56.9, 62.4, ...
##          55, 61, 65, 58, ...
##      326, 326, 327, 334, ...
##   3.95, 3.89, 4.05, 4.2, ...
##  3.98, 3.84, 4.07, 4.23, ...
##  2.43, 2.31, 2.31, 2.63, ...
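
Because the result is an ordinary data.frame, it can be passed to any table formatter. The following is a minimal sketch of producing HTML and LaTeX tables with knitr::kable. It assumes the knitr package is installed (the post itself does not name a formatter), and it uses a small hand-built data.frame called tab as a stand-in for the result of strtable(iris) so that it runs without sourcing the Gist.

```r
library(knitr)  # assumption: knitr is installed; the post does not name a formatter

# Stand-in for strtable(iris): a data.frame with the same four columns.
# Only two rows are included to keep the example short.
tab <- data.frame(
  variable = c("Sepal.Length", "Species"),
  class    = c("numeric", "Factor w/ 3 levels"),
  levels   = c(NA, '"setosa", "versicolor", "virginica"'),
  examples = c("5.1, 4.9, 4.7, 4.6, ...",
               '"setosa", "setosa", "setosa", "setosa", ...'),
  stringsAsFactors = FALSE
)

# Print missing values as empty cells, similar to na.print='' above.
options(knitr.kable.NA = '')

html_tab  <- kable(tab, format = "html")   # HTML <table> markup
latex_tab <- kable(tab, format = "latex")  # LaTeX tabular environment

print(html_tab)
print(latex_tab)
```

With the real function sourced, tab could simply be replaced by strtable(iris) or strtable(diamonds).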

Here’s the source code from Gist:
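
The Gist itself is not reproduced in this text. To give a rough idea of how such a function could be put together, here is a minimal sketch in base R. It is not the author's code: the name strtable_sketch, the helper preview, and all default values are assumptions made for illustration. Only the argument names and the four output columns are taken from the description above.

```r
# Hypothetical sketch, NOT the code from the author's Gist.
# Argument names follow the parameter list in the post; defaults are assumed.
strtable_sketch <- function(df, n = 4, width = 60, n.levels = n,
                            width.levels = width,
                            factor.values = as.character) {
  stopifnot(is.data.frame(df))

  # Collapse the first k values into one comma-separated string, append
  # ", ..." when values were left out, and cut the result to w characters.
  preview <- function(x, k, w) {
    shown <- head(x, k)
    if (is.character(shown)) shown <- paste0('"', shown, '"')
    txt <- paste(shown, collapse = ", ")
    if (length(x) > k) txt <- paste0(txt, ", ...")
    if (nchar(txt) > w) txt <- paste0(substr(txt, 1, w - 3), "...")
    txt
  }

  data.frame(
    variable = names(df),
    # Factors are labelled with their number of levels, as in the post's output.
    class = vapply(df, function(x) {
      if (is.factor(x)) paste("Factor w/", nlevels(x), "levels") else class(x)[1]
    }, character(1), USE.NAMES = FALSE),
    # Levels are only meaningful for factors; other columns get NA.
    levels = vapply(df, function(x) {
      if (is.factor(x)) preview(levels(x), n.levels, width.levels) else NA_character_
    }, character(1), USE.NAMES = FALSE),
    # factor.values decides whether factor examples appear as labels or codes.
    examples = vapply(df, function(x) {
      if (is.factor(x)) x <- factor.values(x)
      preview(x, n, width)
    }, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

print(strtable_sketch(iris), na.print = '')
print(strtable_sketch(iris, factor.values = as.integer), na.print = '')
```

The sketch keeps every column as a character vector, so the result can be printed, filtered, or formatted like any other data.frame. The real function may differ in its defaults and in how it truncates long strings.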
