This problem was posed on the R-help mailing list:
"I have a problem defining a pattern to replace the digits using for example 'sub'. Removing the ".tif" part works fine using"
sub('.tif',"",x)
"but how do I get rid of the four preceding digits?"
This is a perfect example of when regular expressions come in handy; in fact, the inquirer was already using one, the pattern ".tif", in the sub function. However, the period "." before tif is the match-any-character operator in POSIX regular expressions. So the call above says: in each element of the vector x, replace the first occurrence of the pattern (any character followed by a "t", an "i", and an "f") with nothing ("").
The first thing we do is escape the period by changing "." to "\\.", so that it matches only a literal period. (In an R string the backslash itself must be escaped, so the regex escape \. is written "\\.".) Then we precede that with four "[[:digit:]]"s, each of which matches a single digit.
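To see why the unescaped period matters, here is a small sketch using a made-up file name (the name "my_tif_file.tif" is an assumption for illustration, not part of the original question). Because "." matches any character, the first match is "_tif" in the middle of the string rather than the ".tif" extension:

```r
# Hypothetical file name in which "tif" appears twice
y <- "my_tif_file.tif"

# Unescaped pattern: "." matches any character, so the leftmost
# match is "_tif", and the extension is left in place
unescaped <- sub('.tif', '', y)
print(unescaped)
# [1] "my_file.tif"
```

The inquirer's file names happen to contain "tif" only in the extension, which is why the original call appeared to work.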
x <- c("060907_17_3_5_1_1_2909.tif", "060907_17_3_5_2_1_2910.tif", "060907_17_3_5_3_1_2911.tif")
sub('[[:digit:]][[:digit:]][[:digit:]][[:digit:]]\\.tif', '', x)
[1] "060907_17_3_5_1_1_" "060907_17_3_5_2_1_" "060907_17_3_5_3_1_"
This can be stated more succinctly by following the "[[:digit:]]" with the interval operator and a count, "{4}", which repeats the preceding item exactly four times.
sub('[[:digit:]]{4}\\.tif', '', x)
[1] "060907_17_3_5_1_1_" "060907_17_3_5_2_1_" "060907_17_3_5_3_1_"
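As a quick sanity check, the long and short patterns can be compared directly. This is only a sketch; the variable names long_form and short_form are introduced here for illustration:

```r
x <- c("060907_17_3_5_1_1_2909.tif",
       "060907_17_3_5_2_1_2910.tif",
       "060907_17_3_5_3_1_2911.tif")

# Four explicit digit classes versus one class with an interval count
long_form  <- sub('[[:digit:]][[:digit:]][[:digit:]][[:digit:]]\\.tif', '', x)
short_form <- sub('[[:digit:]]{4}\\.tif', '', x)

# Both patterns should strip the same text from every element
print(identical(long_form, short_form))
# [1] TRUE
```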
Note, another way to solve this problem would be to do as Paul suggested:
substr("060907_17_3_5_1_1_2909.tif", start = 1, stop = 18)
But this requires (1) counting where the substring needs to stop, and (2) having exactly the same number of characters before the digits every time (e.g. test_2909.tif would not be trimmed correctly).
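The difference between the two approaches can be sketched by running both on names of different lengths. The vector files and the result names below are assumptions made for this illustration:

```r
# One name from the original data and one shorter name
files <- c("060907_17_3_5_1_1_2909.tif", "test_2909.tif")

# Position-based: keeps the first 18 characters regardless of content
by_position <- substr(files, start = 1, stop = 18)
print(by_position)
# [1] "060907_17_3_5_1_1_" "test_2909.tif"

# Pattern-based: removes four digits followed by ".tif" wherever they occur
by_pattern <- sub('[[:digit:]]{4}\\.tif', '', files)
print(by_pattern)
# [1] "060907_17_3_5_1_1_" "test_"
```

The shorter name has fewer than 18 characters, so substr returns it unchanged, while the regular expression trims both names as intended.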