This function extracts time-labeled variables (or any variables with consistent naming) from a data frame by name, keeping them in the order in which they appear in the data frame. It is especially useful for longitudinal or time-series data sets in which each row (subject) has many measurement occasions for each measure. Code and examples after the jump.
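frame_by_name <- function(data, word){
  # names of all columns whose name matches `word`, in data-frame order
  .selected_names <- names(data)[grep(word, names(data))]
  # pull out each matching column and collect them in a new data frame
  as.data.frame(sapply(.selected_names, function(.name){
    data[[.name]]
  }))
}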
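One thing to be aware of: because `sapply` simplifies its result to a matrix, matching columns of different types (say, a numeric score and a character note) get coerced to a single common type. Here is a minimal sketch of a type-preserving variant, assuming plain column subsetting is acceptable for your use. The name `frame_by_name_typed` and the toy data are hypothetical, not part of the original recipe.

```r
# Hypothetical variant: subset the matching columns directly, so each
# column keeps its own type and the original column order is preserved.
frame_by_name_typed <- function(data, word) {
  data[, grep(word, names(data)), drop = FALSE]
}

# Small made-up data set with one numeric and one character "mood" column.
toy <- data.frame(
  id = 1:3,
  mood1 = c(0.2, -1.1, 0.5),
  mood1_note = c("ok", "low", "ok"),
  stringsAsFactors = FALSE
)

mood <- frame_by_name_typed(toy, "mood")
sapply(mood, class)  # the numeric column stays numeric
```

The `drop = FALSE` argument keeps the result a data frame even when only one column matches.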
Let's generate a data set in which depression and anxiety have each been measured multiple times per subject:
my_data <- data.frame(
  id = 1:50,
  depression1 = rnorm(50),
  anxiety1 = rnorm(50),
  depression2 = rnorm(50),
  anxiety2 = rnorm(50),
  depression3 = rnorm(50),
  anxiety3 = rnorm(50)
)
my_data
Let's say I want to do stuff with the depression variables (like plot them) without doing something like this:
depression <- data.frame(
  depression1 = my_data$depression1,
  depression2 = my_data$depression2,
  depression3 = my_data$depression3
)
as this could get old fast with more occasions of measurement per subject. Instead, just run the following:
depression <- frame_by_name(my_data, "depression")
depression
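Since the function passes `word` straight to `grep`, it is treated as a regular expression and matches any column name that contains it. That is fine for the data above, but a column such as `depression_total` would be picked up too. A small sketch of the difference, using made-up column names (`depression_total` is an assumption for illustration only):

```r
# Column names are made up for illustration.
cols <- c("id", "depression1", "anxiety1", "depression2",
          "depression_total", "anxiety2")

# A bare word matches every name that contains it.
grep("depression", cols, value = TRUE)
# "depression1" "depression2" "depression_total"

# An anchored pattern keeps only names of the form depression<digits>.
grep("^depression[0-9]+$", cols, value = TRUE)
# "depression1" "depression2"
```

Passing the anchored pattern as `word` should restrict the result to the numbered occasions.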
Now you can do fun things with the data, for example transforming it into univariate (stacked/tall/long) form (the second half of that recipe uses this function).
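As a rough idea of what the long form looks like, here is a minimal sketch using base R's `stack()`. This is a stand-in technique rather than the function from the recipe mentioned above, and the `wide` data frame and the `occasion` and `id` column names are assumptions made for the example.

```r
# Made-up wide data: one row per subject, one column per occasion.
wide <- data.frame(
  depression1 = c(0.1, 0.4),
  depression2 = c(0.3, 0.2),
  depression3 = c(0.5, 0.9)
)

# stack() puts all values in one column and records the source
# column name in a second column.
long <- stack(wide)
names(long) <- c("depression", "occasion")

# Subject id is the row number in the wide data, repeated per occasion.
long$id <- rep(seq_len(nrow(wide)), times = ncol(wide))

# Turn "depression1", "depression2", ... into the occasion number.
long$occasion <- as.integer(sub("depression", "", as.character(long$occasion)))
long
```

Each subject now contributes one row per occasion, which is the shape most plotting and mixed-model functions expect.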