R and regex: find all occurrences

In R there are many functions that work with a pattern written as a regular expression. Today I needed to deal with one of these functions: str_locate_all (doc) from stringr

My goal was to find "223777_at [Chip: U133B]" in a series of strings like the following one:

text <- "11753227_s_at [Chip: PrimeView]; 223777_at [Chip: HT_HG-U133B]; 223777_PM_at [Chip: U133_Plus_PM]; 48336_at [Chip: U95B]; 223777_at [Chip: GeneProfilingArray]; g13477210_3p_at [Chip: U133_X3P]; MmugDNA.4759.1.S1_at [Chip: Rhesus]; 11753227_s_at [Chip: HG-U219]; ADXECADA.19261_s_at [Chip: Xcel]; ADXECRS.13279_at [Chip: Xcel]; ADXECRS.13279_x_at [Chip: Xcel]; 223777_at [Chip: U133B]; 223777_at [Chip: U133_Plus_2]; RC_T49570_at [Chip: Hu35KsubB]"

The way to find the location (in my case all the locations) of the pattern number slash at white-space bracket Chip two-points U133B brecket follows:

str_locate_all(text, pattern="[0-9]+_at \\[Chip: U133B\\]")

This returns a list of matrix with the locations (start and end point) of all the occurrences found by the given pattern. Take care, so the brackets need to be escaped by double slash.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: