This one came from the R-Help mailing list, so I thought I'd document the challenge and solutions here. The inquirer had a data set with 3 variables, V1, V2 and V3. If there were multiple observations having the same values on V1 and V2, they wanted to keep the one with the largest value of V3, and delete the others.
So, in the example below, they wanted to delete the first observation:
V1 V2 V3 3 3 1 3 3 4
Four solutions were given:
Peter's:
l <- split(d$V3, list(d$V1,d$V2)) ixl <- lapply(l, function(x) { if ((n <- nrow(x)) == 2) seq_len(n) != which.min(x) else rep(TRUE, n) }) ix <- unsplit(ixl, list(d$V1,d$V2)) d[ix,]
Greg's:
my.df2 <- my.df[order(my.df$V3, decreasing=TRUE),] my.df3 <- my.df2[ !duplicated( my.df2[,c('V1','V2')] ), ]
Jim's:
x <- cbind(V1=sample(1:3,20,TRUE), V2=sample(1:3,20,TRUE), V3=sample(20)) x V1 V2 V3 [1,] 2 2 1 [2,] 1 2 6 [3,] 3 2 10 [4,] 3 1 11 [5,] 3 2 5 [6,] 3 2 7 [7,] 2 1 19 [8,] 3 3 13 [9,] 1 3 2 [10,] 3 3 20 [11,] 3 3 18 [12,] 2 1 4 [13,] 3 2 3 [14,] 3 2 12 [15,] 3 1 17 [16,] 2 3 9 [17,] 2 3 8 [18,] 1 1 16 [19,] 3 2 15 [20,] 3 3 14 x.max <- do.call('rbind', by(x, list(x[,1], x[,2]), function(.sub){ + .sub[which.max(.sub[,3]),] + })) x.max V1 V2 V3 18 1 1 16 7 2 1 19 15 3 1 17 2 1 2 6 5 2 2 1 19 3 2 15 9 1 3 2 16 2 3 9 10 3 3 20
Sundar's solution, which is functionally the same as Greg's:
## some sample data d <- read.table(textConnection("V1 V2 V3 3 3 2 3 3 4 3 3 1 3 2 1 3 2 5"), header = TRUE)
## the code d <- d[rev(do.call("order", d)), ] d <- d[!duplicated(d[1:2]), ] d