Playing with TCGA .CEL files and TCGA Barcodes

Today I want a file relating the names of the .CEL files from TCGA, the barcodes for these samples and the definition of the sample type in the three available forms (numeric, short and description). For example:

filename                   barcode                         sampletype_numeric   sampletype_short   sampletype_desc
TCGA_666_A01_0070X01.CEL   TCGA-ZZ-A6AW-01A-01A-X00D-AB    01                   TP                 Primary solid Tumor
TCGA_666_A02_0070X01.CEL   TCGA-ZZ-A6AW-10A-01A-X00G-AB    10                   NB                 Blood Derived Normal
TCGA_666_A03_0070X01.CEL   TCGA-ZZ-A6AW-01A-01A-X00D-AB    01                   TP                 Primary solid Tumor
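
One thing to watch when reading a table like this back into R: the numeric sample type has a leading zero ("01"), which is lost if the column is parsed as a number. A minimal sketch, using a temporary file with one row of the example above, that reads every column as character:

```r
# Write one example row to a temporary tab-separated file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
    "filename\tbarcode\tsampletype_numeric\tsampletype_short\tsampletype_desc",
    "TCGA_666_A01_0070X01.CEL\tTCGA-ZZ-A6AW-01A-01A-X00D-AB\t01\tTP\tPrimary solid Tumor"
), tmp)

# colClasses="character" keeps "01" as text instead of converting it to 1.
tbl <- read.delim(tmp, colClasses = "character")
print(tbl$sampletype_numeric)
```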

In order to provide a reusable way to create this file I wrote the following function in R:

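tcga_barc_tbl <- function(typefile, mapfile, outfile="TABLE_TCGA.TSV") {
    # mapfile: FILE_SAMPLE_MAP.txt (tab-separated); typefile: sampleType.csv
    nnD <- read.delim(mapfile, header=TRUE)
    stD <- read.delim(typefile, header=TRUE, sep=",")

    # Sample type code: 4th barcode field without its trailing vial letter ("01A" -> "01")
    x <- lapply(as.character(nnD[ , "barcode.s."]), function(x){
            wd <- strsplit(strsplit(x, ",")[[1]], "-")[[1]][4]
            substr(wd, 1, nchar(wd)-1)
    })

    # Note: stD is indexed by row position, so row i must hold sample type code i
    df <- data.frame(filename=nnD[ , "filename"], barcode=nnD[ , "barcode.s."],
        sampletype_numeric=unlist(x),
        sampletype_short=stD[as.numeric(unlist(x)), "Short.Letter.Code"],
        sampletype_desc=stD[as.numeric(unlist(x)), "Definition"])

    # Assumption: the original condition is not recoverable; here outfile=NULL returns the table
    if (is.null(outfile)) {
        return(df)
    } else {
        write.table(df, file=outfile, quote=FALSE, sep="\t", row.names=FALSE)
    }
}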
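
To make the barcode parsing step concrete, here is a small sketch of the same idea on its own. The helper name sample_type_code is hypothetical and only used for illustration; it takes the first barcode of a comma-separated list, picks the fourth dash-separated field and keeps its first two characters.

```r
# Hypothetical helper: extract the two-digit sample type code from a TCGA barcode.
sample_type_code <- function(barcode) {
    first <- strsplit(barcode, ",")[[1]][1]    # first barcode if several are listed
    field <- strsplit(first, "-")[[1]][4]      # e.g. "01A" (sample type + vial letter)
    substr(field, 1, 2)                        # keep only the sample type digits
}

code_tumor  <- sample_type_code("TCGA-ZZ-A6AW-01A-01A-X00D-AB")
code_normal <- sample_type_code("TCGA-ZZ-A6AW-10A-01A-X00G-AB")
print(c(code_tumor, code_normal))
```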

The function tcga_barc_tbl needs the FILE_SAMPLE_MAP.txt and the sampleType.csv. The first one comes in the .tar.gz file when downloading from TCGA – Data Matrix. The second one can be generated at codeTableReport.
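
The function indexes the sample type table by row position, which only works if row i of sampleType.csv holds code i. A possible alternative is to match on the code itself. The sketch below assumes the table has a column named Code (an assumption, check the header of your own file) and uses a toy two-row table built from the example above instead of the real CSV; lookup_sample_type is a hypothetical helper.

```r
# Toy stand-in for sampleType.csv; the "Code" column name is an assumption.
stD <- data.frame(
    Code = c(1, 10),
    Definition = c("Primary solid Tumor", "Blood Derived Normal"),
    Short.Letter.Code = c("TP", "NB"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: find each code in the Code column instead of using it as a row index.
lookup_sample_type <- function(codes, types) {
    idx <- match(as.numeric(codes), as.numeric(types[ , "Code"]))
    types[idx, c("Short.Letter.Code", "Definition")]
}

res <- lookup_sample_type(c("01", "10", "01"), stD)
print(res)
```

With this two-row table, plain row indexing with code 10 would return NA, while the match-based lookup still finds the right row.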

Feel free to use it as you like.
