SnpMatrix and SnpMatrix Container from Affymetrix Genome Wide SNP 6.0

The class SnpMatrix comes from the package snpStatss, available at Bioconductor [1,2]. So, first of all, we load this package:

library(snpStats)

Then we load the calls file. The calls file is obtained using the tool apt-probeset-genotype from the Affymetrix Power Tools (APT) [3, 4]. Since we are processing .CEL files from Affymetrix Genome Wide SNP 6.0, the APT uses the calling algorithm called birdseed (v2) so the produced file will be called birdseed-v2.calls.txt. Follows the call to the APT I used :

apt-probeset-genotype 
    -o results_dir 
    -c ~/lib_affy6/GenomeWideSNP_6.cdf 
    --set-gender-method cn-probe-chrXY-ratio 
    --chrX-probes ~/lib_affy6/GenomeWideSNP_6.chrXprobes 
    --chrY-probes ~/lib_affy6/GenomeWideSNP_6.chrYprobes 
    --special-snps ~/lib_affy6/GenomeWideSNP_6.specialSNPs 
    --read-models-birdseed ~/lib_affy6/GenomeWideSNP_6.birdseed-v2.models 
    -a birdseed-v2 
    CELs/*.CEL

You must understand that my library files are stored at ~/lib_aff6 and than the generated files by the APT will be stored at results_dir.

Now, let’s load the file containing the calls:

gnds <- read.delim("birdseed-v2.calls.txt", header=TRUE, comment.char="#")

After loading the genotypes it is the time to load the annotation information [5]:

annotation.file <- "~/GenomeWideSNP_6/GenomeWideSNP_6.na33.annot.csv"
ann <- read.csv(annotation.file, comment.char="#", header=TRUE)
ann <- ann[ , c("Probe.Set.ID", "dbSNP.RS.ID", "Chromosome", "Physical.Position", "Allele.A", "Allele.B", "Strand")]

We just need to keep the annotation information for the markers the APT was able to call so we can reduce the rows on the ann with:

rownames(ann) <- ann$Probe.Set.ID
ann <- ann[gnds$probeset_id, ]

Finally we can remove the first column of ann, the one containing the probeset’s id and rename the columns:

ann <- ann[ , -1]
colnames(ann) <- c("dbsnp_rs_id", "chromosome",  "position", "allele_a", "allele_b",  "strand")

For the SnpMatrix‘s side we create it in two steps: first remove the probeset’s id column and second coercing the data.frame to a matrix and to a SnpMatrix:

snpm <- new("SnpMatrix", as(gnds[,-1], "matrix"))

And, at the end, we create the SnpMatrix Container and save it into a .rda file:

smc <- list()
smc$genotype <- snpm
smc$map <- ann

save(smc, file="controls.rda")

##References##

  • [1] Bioconductor: link
  • [2] snpStats: link
  • [3] Affymetrix Power Tools: link
  • [4] Library Files: link
  • [5] Annotation File: link
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: