From SnpSet to MAP and PED files for Affymetrix Genome Wide SNP

The following post is restricted to CEL files from Affymetrix GenomeWideSNP 5.0 and GenomeWideSNP 6.0.

The idea is to generate a MAP and a PED file from an amount of CEL files from one of those two technologies (in the post GenomeWideSNP 6.0).

First of all we must know the content of each file. The fields in a MAP file are:

  • Chromosome
  • Marker ID
  • Genetic distance
  • Physical position

While in a PED file they are:

  • Family ID
  • Sample ID
  • Paternal ID
  • Maternal ID
  • Sex (1=male; 2=female; other=unknown)
  • Affection (0=unknown; 1=unaffected; 2=affected)
  • Genotypes (space or tab separated, 2 for each marker. 0=missing)

Now, lets see what we have:

[carles@apollo affy6]$ ls
AR00.CEL  AR02.CEL  AR04.CEL  AR06.CEL  AR08.CEL
AR01.CEL  AR03.CEL  AR05.CEL  AR07.CEL  AR09.CEL

Ten CEL files of GenomeWideSNP 6.0 which are need to be converted to plink format.

To start with the conversion we open R and we load the two packages we will use:

library(crlmm)
library(genomewidesnp6Crlmm)

Next we load the CEL names:

> celfiles <- list.celfiles()
> celfiles
 [1] "AR00.CEL" "AR01.CEL" "AR02.CEL" "AR03.CEL" "AR04.CEL" "AR05.CEL"
 [7] "AR06.CEL" "AR07.CEL" "AR08.CEL" "AR09.CEL"

We create a SnpSet with the crlmm method:

crlmmresult <- crlmm(celfiles, verbose=TRUE)

Now we load the annotation file [1] for the technology:

ann <- read.csv("GenomeWideSNP_6.na32.annot.csv", skip=21, row.names=1, as.is=TRUE)

And we use it to create the content of the MAP file:

mapfile <- annot[ , c("Chromosome", "dbSNP.RS.ID", "ChrX.pseudo.autosomal.region.1", "Physical.Position")]

For PED file we clean the annotations to get only the ones we need. For that we request the genotypes to the SnpSet using the method calls:

calls <- calls(crlmmresult)
ann <-ann [rownames(calls), ]

We are going to need two matrices to store the values of the alleles:

alleleA <- alleleB <- matrix("", nrow=nrow(calls), ncol=ncol(calls), dimnames=dimnames(calls))

We look for the alleles on annotations and we check the strand to get the correct ones:

allA <- ann$Allele.A
allComplA <- c("A","C","G","T")[match(allA, c("T", "G", "C", "A"))]
forwA <- ifelse( ann$Strand=="+", allA, allComplA )

allB <- ann$Allele.B;
allComplB <- c("A","C","G","T")[match(allB, c("T", "G", "C", "A"))]
forwB <- ifelse(ann$Strand=="+", allB, allComplB)

With the correct alleles we fill the matrices:

for(ii in 1:ncol(calls)) {
    alleleA[ , ii] <- ifelse(calls[ , ii] < 3, forwA, forwB)
    alleleB[ , ii] <- ifelse(calls[ , ii] < 2, forwA, forwB)
}

Now we create a variable called ped to store all the content of the PED file and we put in it the alelles:

ped <- matrix("",ncol=nrow(alleleA) * 2 + 6, nrow=ncol(alleleA))
for(ii in 1:ncol(calls)) {
  ped[ii, (1:nrow(alleleA)) * 2 + 5] <- alleleA[ , ii]
  ped[ii, (1:nrow(alleleB)) * 2 + 6] <- alleleB[ , ii]
}

To fill the first 6 columns of the PED file I loaded an “info file” attached to my CEL files with the following content:

> info
      family_id      samples_id patern_id mattern_id gender_id affetion
 [1,] "homo sapiens" "AR00"     "0"       "0"        "1"       "0"
 [2,] "homo sapiens" "AR02"     "0"       "0"        "1"       "0"
 [3,] "homo sapiens" "AR04"     "0"       "0"        "1"       "0"
 [4,] "homo sapiens" "AR06"     "0"       "0"        "1"       "0"
 [5,] "homo sapiens" "AR08"     "0"       "0"        "1"       "0"
 [6,] "homo sapiens" "AR01"     "0"       "0"        "2"       "0"
 [7,] "homo sapiens" "AR03"     "0"       "0"        "2"       "0"
 [8,] "homo sapiens" "AR05"     "0"       "0"        "2"       "0"
 [9,] "homo sapiens" "AR07"     "0"       "0"        "2"       "0"
[10,] "homo sapiens" "AR09"     "0"       "0"        "2"       "0"

So to fill the first 6 columns is as easy as:

ped[ , 1] <- info[ , "family_id"]
ped[ , 2] <- info[ , "samples_id"]
ped[ , 3] <- info[ , "patern_id"]
ped[ , 4] <- info[ , "matern_id"]
ped[ , 5] <- info[ , "gender_id"]
ped[ , 6] <- info[ , "affetion"]

Finally we only need to save the variables mapfile and ped to disk with:

write.table(mapfile, file="Affy6.map", quote=FALSE, row.names=FALSE, col.names=FALSE)
write.table(ped, file="Affy6.ped", quote=FALSE, row.names=FALSE, col.names=FALSE )

References

  • [1] Annotation files for Affymetrix arrays: link
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: