The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics.
There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, and it can be difficult to identify the method best suited to a given inquiry.
We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark consisting of one or two spike-in studies and, optionally, a dilution study (details below). Because the truth is known for these data, it is possible to identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest, and motivated by the presence of appropriate data.
In conjunction with the release of a graphics toolbox as part of the Bioconductor Project, we have created this web-based tool.
We invite all interested parties to put their probe summary methods to the test in a friendly competition. See the submission form below. Download the benchmark data and develop one or more probe summaries. Return the expression-level data, and we'll tell you how you did on this set of tasks. The new assessments (and the original assessments) show how everyone is doing.
Summaries need not be serious attempts at a complete expression measure. The submission form contains a check-box for exclusion from the competition. If you are interested in normalization, run competing normalization procedures, take a simple average over the probes in each set, and see how the different methods do. The goal is threefold. In addition to vetting the toolbox and competing for bragging rights, this will be an opportunity to systematically examine the strengths and weaknesses of the various approaches to probeset summary.
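As a rough illustration of the normalization exercise described above, the following base R sketch compares two normalization procedures using a plain average of log2 intensities as the probe set summary. Everything in it is an assumption made for illustration: the toy matrix pm, the probe set labels, the file names, and the helper functions scale_norm, quantile_norm and summarize_sets are hypothetical and are not part of the benchmark toolbox. Real input would be probe-level intensities read from the benchmark CEL files.

```r
set.seed(1)

# Hypothetical probe-level data: 6 probes (rows) on 3 arrays (columns),
# grouped into two probe sets. IDs and file names are invented.
pm <- matrix(rexp(18, rate = 1 / 500) + 50, nrow = 6,
             dimnames = list(NULL, c("a1.CEL", "a2.CEL", "a3.CEL")))
probeset <- rep(c("set1_at", "set2_at"), each = 3)

# Normalization 1: rescale each array so its median equals the overall median.
scale_norm <- function(m) {
  sweep(m, 2, apply(m, 2, median) / median(m), "/")
}

# Normalization 2: quantile normalization. Every array is given the same
# empirical distribution, namely the mean of the sorted columns.
quantile_norm <- function(m) {
  target <- rowMeans(apply(m, 2, sort))
  out <- apply(m, 2, function(col) target[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(m)
  out
}

# Probe set summary: simple average of log2 intensities over the probes
# in each set. rowsum() and table() both order the groups alphabetically.
summarize_sets <- function(m, sets) {
  rowsum(log2(m), group = sets) / as.vector(table(sets))
}

expr_scale    <- summarize_sets(scale_norm(pm), probeset)
expr_quantile <- summarize_sets(quantile_norm(pm), probeset)
```

Each result is a matrix with probe set IDs as rownames and file names as colnames, so the two candidate summaries can be written out and submitted separately to see how the normalizations compare.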
For more details, read the manuscript [pdf].
Affymetrix's Spike-in hgu95a Experiment CEL files [gzip-compressed tar-archive]
Affymetrix's Spike-in hgu133a Experiment CEL files [gzip-compressed tar-archive]
Request Gene Logic's Dilution Experiment CEL files [available only on CD/DVD]
In the event of problems, contact Gene Logic directly by telephone or e-mail.
If x is a matrix with probe set IDs as rownames and the filenames as colnames, the call
write.table(data.frame(x, check.names=FALSE), file="filename.csv", sep=",", col.names=NA, quote=FALSE)
should do the trick.
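A minimal sketch of that call on a toy matrix, with a read-back step to check the file, might look like the following. The probe set IDs, file names, and values are placeholders invented for illustration, and the read-back check is our own suggestion rather than part of the submission procedure.

```r
# Toy stand-in for a real expression matrix: 3 probe sets x 2 arrays.
# IDs, file names and values are placeholders, not benchmark data.
x <- matrix(c(7.1, 8.4, 6.9, 7.3, 8.2, 7.0), nrow = 3,
            dimnames = list(c("1000_at", "1001_at", "1002_f_at"),
                            c("chipA.CEL", "chipB.CEL")))

# Write to a temporary directory so the sketch leaves no stray files.
out <- file.path(tempdir(), "filename.csv")
write.table(data.frame(x, check.names = FALSE), file = out,
            sep = ",", col.names = NA, quote = FALSE)

# Read the file back: the first column holds the probe set IDs, and
# check.names = FALSE keeps the file names unchanged as column names.
y <- as.matrix(read.csv(out, row.names = 1, check.names = FALSE))
```

With col.names=NA, write.table adds an empty header field above the row names, so the header lines up with the data columns when the file is read back with row.names = 1.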
Submission form
This website designed and maintained by Rafael A. Irizarry and Harris A. Jaffee