MICROARRAY TECHNOLOGY FOR GENETIC ENGINEERS / BI
by Cara Goldenberg
"Hi-tech Microarray Technology Holds Future Breakthroughs for Genetic
Engineers, and Future Earnings For Business Strategists"
Thanks to the sustained efforts of committed academic institutions such as
Stanford University and MIT, the complete genomes of many species have already
been mapped. Now, with a working draft of the human genome in place and
seemingly endless avenues for experimentation, it is the private sector's turn
to cash in. Companies aspiring to multi-billion-dollar status, such as
Affymetrix and Incyte Technologies, are competing head-to-head to develop,
update, and market instrumentation that lets cutting-edge researchers gather
their data in a manner commensurate with the sophistication of their
experimental designs (reported price tag: $175,000 for microarray slides and
the tools that collect, store, and analyze the data). The package includes
microarray chips, typically glass slides that cost between $500 and $2,000
each, cannot be reused, and contain the entire genome of the species being
examined. But what good is the data generated by these hi-tech, high-priced
techniques if no reliable method exists to analyze it? Data is only as
valuable as its interpreter, and given the volume of data that microarrays
generate, no single researcher or team of researchers could be expected to
make heads or tails of it without the aid of a computer. Yet computer analysis
of microarray data does not appear to have produced any significant
breakthrough so far. The potential for gain seems limitless, yet these
invaluable data sources remain largely "untapped."
A void has existed for some time in microarray data analysis. Efforts to
create more diverse and cost-effective alternatives have so far yielded more
options for data collection, not for data analysis. Those chiefly responsible
for these efforts include geneticist Patrick Brown, his former graduate
student Joseph DeRisi, and bioinformatics expert Michael Eisen, all from
Stanford University. In the mid-1990s, Brown and engineering student Dari
Shalon devised a substantially less expensive way of generating microarrays,
geared toward studying gene expression patterns in yeast.
While advances in data collection have been rapid, data analysis methods
remain limited and lag behind them. The primary existing method is called
clustering. The principle: given a set of data points, each having a set of
attributes, and a measure of similarity among them, clustering groups the
points so that points in the same cluster are more similar to one another,
while points in different clusters are less similar. For gene expression data,
each data point is a gene, and its attributes are its expression levels across
a set of environmental conditions. Similarity measures include Euclidean
distance, when the attributes are continuous, and other problem-specific
measures.
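To make the principle concrete, here is a minimal sketch of one common form of
clustering, hierarchical agglomerative clustering with average linkage and
Euclidean distance, written in plain Python. It is an illustration under
stated assumptions, not the method of any particular vendor or research group:
the function names, gene names, and expression values are all hypothetical,
and merging simply stops once a chosen number of clusters, k, remains.

```python
import math
from itertools import combinations

# Minimal sketch: all names and data below are hypothetical illustrations.

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2)
    return total / (len(c1) * len(c2))

def agglomerative_cluster(profiles, k, dist=euclidean):
    """Repeatedly merge the two closest clusters until k clusters remain.

    profiles: dict mapping gene name -> list of expression levels,
              one value per environmental condition.
    Returns a sorted list of clusters, each a sorted list of gene names.
    """
    clusters = [[name] for name in profiles]  # start with one gene per cluster
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles, dist
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical expression levels for four genes under three conditions.
expression = {
    "geneA": [2.0, 2.1, 0.1],
    "geneB": [1.9, 2.2, 0.0],
    "geneC": [0.1, 0.2, 3.0],
    "geneD": [0.0, 0.3, 2.9],
}
print(agglomerative_cluster(expression, k=2))
# -> [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Passing a different `dist` function, for example one based on correlation,
changes what "similar" means without touching the merging loop, which is one
way the problem-specific measures mentioned above could be plugged in.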
While clustering methods are not without value, a more intuitive and
insightful model for data analysis is required. As Eisen puts it, "What is
needed instead is a holistic approach to analysis of genomic data that focuses
on illuminating order in the entire set of observations, allowing biologists
to develop an integrated understanding of the process being studied." He
adds, "An important test of the value of this approach comes when we examine
the identity of the clustered genes at varying levels of identity." Thus, if
expression patterns were explicitly sought among genes with known
co-expression relationships (that is, genes known to be interdependent), the
results would be more intuitive and readily interpretable. Moreover, a gene of
unknown function whose expression behaves like that of one or more
already-characterized genes could be inferred to take part in the same
process, and could then be labeled and classified accordingly: the ultimate
goal of any experiment based on gene-expression data.
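The inference described above is often called "guilt by association." The
sketch below assumes one simple way to carry it out: score each annotated gene
by the Pearson correlation between its expression profile and that of the
uncharacterized gene, then take a majority vote among the k best-correlated
genes that clear a threshold. The function names, the 0.8 threshold, and the
gene roles and values are hypothetical choices for illustration, not a
published procedure.

```python
import math
from collections import Counter

# Minimal sketch: names, roles, threshold, and data are hypothetical.

def pearson(a, b):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_a * var_b)

def infer_role(unknown_profile, annotated, k=3, min_corr=0.8):
    """Guess a role for an uncharacterized gene by majority vote of the k
    annotated genes whose profiles correlate most strongly with it.

    annotated: dict mapping gene name -> (role, expression profile).
    Returns the winning role, or None if no neighbour reaches min_corr.
    """
    scored = sorted(
        ((pearson(unknown_profile, profile), role)
         for role, profile in annotated.values()),
        reverse=True,  # highest correlation first
    )
    neighbours = [role for corr, role in scored[:k] if corr >= min_corr]
    if not neighbours:
        return None
    return Counter(neighbours).most_common(1)[0][0]

# Hypothetical annotated genes measured under four conditions.
annotated = {
    "geneA": ("ribosome biogenesis", [1.0, 2.0, 3.0, 4.0]),
    "geneB": ("ribosome biogenesis", [1.1, 2.1, 2.9, 4.2]),
    "geneC": ("stress response", [4.0, 3.0, 2.0, 1.0]),
}
print(infer_role([0.9, 2.0, 3.1, 3.9], annotated, k=2))
# -> ribosome biogenesis
```

Returning None when nothing clears the threshold keeps the sketch from
labeling a gene on the strength of weak or negative correlations.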
The upshot: two lessons for two very different types of professionals. For
the genomic scientist, a more intuitive, sophisticated method of microarray
data analysis is in the works, with the potential to yield clear-cut,
definitive results that even a business leader can easily interpret. And for
the business leader, remember this: if your data analysis software yields
results that are not readily interpretable, the impact is fleeting at best.
Time and resources invested in developing more easily interpretable analysis
will pay dividends, rendering today's purely quantitative methods obsolete
and paving the way for significantly more meaningful, qualitative data
analysis.
Cara Goldenberg is a member of the Research and Development team at Virtual
Gold, a data mining and business intelligence company in Hartsdale, NY. Cara
is currently working on projects related to data mining of bioinformatics data
as she prepares to enter her junior year as a chemistry major at Yale
University, where she is pursuing a Bachelor of Science/Master's degree. Her
previous areas of research include canine von Willebrand's disease, a fatal
bleeding disorder in dogs (at Michigan State University); and the development
of Nuclear Magnetic Resonance-based experimental techniques (at Yale). Cara
also has previous work experience at Chase Manhattan Bank and the New York
City Charter Revision Commission under Mayor Rudolph Giuliani. Her e-mail
address is cara.goldenberg@yale.edu.
For more information, see www.virtualgold.com