Select time variables in a (multivariate/wide) data frame by name


This function extracts time-labeled variables (or any variables with a consistent naming pattern) from a data frame by name, in the order in which they appear in the data frame. It is especially useful for longitudinal or time-series data sets in wide format, where each row (subject) has many measurement occasions for each measure. Code and examples after the jump.

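frame_by_name <- function(data, word){
	# Names of the columns that match the regular expression `word`,
	# in the order in which they appear in the data frame
	.selected_names <- grep(word, names(data), value = TRUE)
	# Subset directly so each column keeps its original type;
	# drop = FALSE returns a data frame even if only one column matches
	data[, .selected_names, drop = FALSE]
}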
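One thing worth knowing: `grep` treats `word` as a regular expression, so a plain string matches any column name that merely contains it. The sketch below illustrates this on a made-up vector of column names (`depression_total` is a hypothetical column, not part of the example data in this recipe) and shows how an anchored pattern narrows the match to the numbered occasions.

```r
# Hypothetical column names, for illustration only
nms <- c("id", "depression1", "anxiety1", "depression2", "depression_total")

# A plain string matches every name that contains it
loose <- grep("depression", nms, value = TRUE)

# An anchored pattern keeps only names of the form depression<number>
strict <- grep("^depression[0-9]+$", nms, value = TRUE)

print(loose)
print(strict)
```

The same anchored pattern can be passed as the `word` argument of `frame_by_name` if your data frame has columns with overlapping names.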

Let's generate a data set where depression and anxiety have each been measured three times per subject:

my_data <- data.frame(
    id=1:50,
    depression1=rnorm(50),
    anxiety1=rnorm(50),
    depression2=rnorm(50),
    anxiety2=rnorm(50),
    depression3=rnorm(50),
    anxiety3=rnorm(50)
)
my_data

Let's say I want to work with just the depression variables (to plot them, for example) without building the data frame by hand like this:

depression <- data.frame(
    depression1=my_data$depression1,
    depression2=my_data$depression2,
    depression3=my_data$depression3
)

as this could get old fast with more occasions of measurement per subject. Instead, just run the following:

depression <- frame_by_name(my_data, "depression")
depression
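As an illustration of the plotting use mentioned earlier, here is a minimal sketch that draws one line per subject across the three occasions. It builds its own stand-in `depression` data frame with the same shape as the one above so that it runs on its own; the axis labels and the use of `matplot` are my choices, not part of the original recipe.

```r
set.seed(42)  # stand-in data with the same layout as the example above
depression <- data.frame(
    depression1 = rnorm(50),
    depression2 = rnorm(50),
    depression3 = rnorm(50)
)

# matplot draws one line per column, so transpose to get
# occasions in rows and subjects in columns
trajectories <- t(as.matrix(depression))

matplot(trajectories, type = "l", lty = 1, xaxt = "n",
        xlab = "Occasion", ylab = "Depression")
axis(1, at = seq_len(nrow(trajectories)))
```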

Now you can do fun things with the data, for example, transform it into univariate (stacked/tall/long) form (that recipe uses this function in its second half).
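For readers who want to see what the wide-to-long step can look like, here is a minimal sketch using base R's `reshape()`. This is not the linked recipe's code, just one possible approach; the small `wide` data frame and the column name `occasion` are assumptions made for the example.

```r
set.seed(1)
# Small stand-in data set in wide form: one row per subject
wide <- data.frame(
    id = 1:5,
    depression1 = rnorm(5),
    depression2 = rnorm(5),
    depression3 = rnorm(5)
)

# Pick out the time-labeled columns by name, as frame_by_name does
dep_names <- grep("depression", names(wide), value = TRUE)

# Stack the three occasions into a single `depression` column,
# with `occasion` recording which measurement each row came from
long <- reshape(
    wide,
    direction = "long",
    varying   = dep_names,
    v.names   = "depression",
    timevar   = "occasion",
    times     = seq_along(dep_names),
    idvar     = "id"
)

head(long)
```

Each subject now contributes one row per occasion, which is the layout most mixed-model and plotting functions expect.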