
Features:
CRUISING WITH THE TOP DOWN
by J. William Bell, NCSA
University of Illinois and NCSA researchers, using an uncommon mass
spectrometer and the Alliance's Condor computing system, craft a new method of
identifying proteins and characterizing changes in those proteins.
It should probably come as little surprise that James Watson--of Watson and
Crick and their vision of DNA's double-helical structure--understands
proteomics' power. At a biotechnology symposium in early 2003, Watson referred
to DNA as the "script" and proteins as the "actors."
Neil Kelleher, an assistant chemistry professor at the University of Illinois
at Urbana-Champaign, couldn't agree more.
"The realization of our society's expectations for 21st century medicine
depends on further insights into biology at the level of proteins," he says,
"not just DNA." These insights will rely on "a new, dominant methodology for
the collection and interpretation of proteomic data."
Kelleher and his research team hope their "top-down" approach to identifying
and characterizing proteins will become that new method. An advanced mass
spectrometer, the Alliance's Condor computing system at the University of
Wisconsin, and NCSA's expertise have all been integral to this nascent
approach's development.
Humans Are Too Complex
Ian Brooks, an NCSA research programmer on the project who is also a
biochemist, explains the importance of Watson's distinction between actor and
script: "For a simple organism--a bacteria, for example--you only need their
genome [to address central questions about cell biology on a whole-system
level]. For that organism, there is a one-to-one correspondence between a gene
and the protein that it expresses."
"Humans are too complex," he continues. "Their gene sequence doesn't tell you
what proteins are going to be made. Many proteins come from a single gene."
Though there's much to be learned from the gene sequence (see "A science of
big numbers" in this issue of Access), it tells researchers far less when
they're studying mammals. In fact, the Kelleher group has found that about 10
percent of proteins are chemically different from the proteins expected when
analyzing the gene sequence alone.
In complex critters like mammals, proteins might vary from those typically
expressed due to variations in DNA transcription or RNA splicing during
expression or due to chemical alterations to proteins' amino acids following
production. Thus, many different proteins can result from any single snippet
of genetic code. Once expressed, and in some cases altered, the proteins go
about their business--whether they're enzymes that regulate biochemical
activity, hormones such as insulin, or the structural basis for your body's
hair, muscles, and skin.
The Kelleher team is out to characterize what are called post-translational
modifications. Post-translational modifications change proteins' properties
chemically by chopping up the proteins or "decorating" certain amino acids
within the protein. Tacking on a phosphate, for example, fundamentally changes
the protein. It might make an active enzyme of the protein or flip a molecular
logic switch from 0 to 1.
According to Brooks, there are about 300 known post-translational
modifications that can alter a protein and therefore influence its function or
location in a cell. Formal, direct analyses have been "limited to date and
totally unsystematic," says Kelleher.
A system such as the Kelleher team's would allow researchers to characterize
all of the possible modifications that a specific protein might undergo. It
would also allow researchers to catalog the various modification states so the
next team through the breach wouldn't have to duplicate the effort.
Separate, Sort, And Identify
To begin the identification and characterization, cells are broken apart.
The hundreds of proteins they contain are separated by mass using gel
electrophoresis, and a special soap allows further, improved separation using
liquid chromatography. The proteins are sorted into groups of about five to 15
similarly sized particles. Proteins that are expressed from the same section
of the genetic sequence but that differ due to post-translational
modifications tend to fall in the same group.
Electrophoresis, which separates molecules using electrical current, and
liquid chromatography, which separates molecules by passing them through
sticky pores, are nothing special for chemists and biologists. Accusations of
ordinariness, however, should probably end there.
Once the proteins are separated into like-sized sets, they are fed into a
Fourier-transform mass spectrometer with a 9.4 tesla superconducting magnet.
There are fewer than six research labs in the world with this type of
instrument, according to Kelleher, and only the Kelleher team is pursuing the
top-down approach to identifying and characterizing proteins.
The mass spectrometer ionizes the proteins and sets them spinning in the
instrument's magnetic field. With a bit of computational analysis, the
instrument's read on the spin can be converted into a mass measurement because
the ionized protein's spin in the gas phase is a function of its mass and
charge. For a given charge, the faster the protein ion spins, the lower its
mass. Masses, in turn, can be
compared to a database and equated to a particular type of protein.
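
The link between spin and mass described above can be sketched with the textbook cyclotron relation, f = qB / (2 * pi * m). This is a minimal illustration and not the team's analysis software: the function names are hypothetical, the 9.4 tesla field comes from the article, and the sketch ignores real-world corrections such as instrument calibration and the mass of the charge-carrying protons.

```python
import math

# Physical constants (SI values).
ELEMENTARY_CHARGE_C = 1.602176634e-19
DALTON_KG = 1.66053906660e-27

# Field strength quoted in the article for the team's magnet.
FIELD_TESLA = 9.4


def cyclotron_frequency_hz(mass_da, charge, field_tesla=FIELD_TESLA):
    """Ideal cyclotron frequency of an ion with the given mass and charge state."""
    return charge * ELEMENTARY_CHARGE_C * field_tesla / (
        2 * math.pi * mass_da * DALTON_KG
    )


def mass_from_frequency_da(freq_hz, charge, field_tesla=FIELD_TESLA):
    """Invert the same relation: recover the ion mass from a measured frequency."""
    return charge * ELEMENTARY_CHARGE_C * field_tesla / (
        2 * math.pi * freq_hz * DALTON_KG
    )


if __name__ == "__main__":
    # Hypothetical ions with the same charge: the lighter one spins faster.
    light = cyclotron_frequency_hz(10_000.0, charge=10)
    heavy = cyclotron_frequency_hz(20_000.0, charge=10)
    print(f"10 kDa ion: {light:.0f} Hz, 20 kDa ion: {heavy:.0f} Hz")
    print(f"Recovered mass: {mass_from_frequency_da(light, charge=10):.1f} Da")
```

Under these assumptions, an ion at 1,000 daltons per unit charge circles at roughly 144 kHz in a 9.4 tesla field, and doubling the mass at fixed charge halves the frequency.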
Of Traditional Character
Identification of proteins, therefore, is a straightforward enterprise that
can be carried out using a more ho-hum mass spectrometer. Characterizing the
post-translational modifications--tracing the steps between the protein
initially expressed by the gene and the protein at work, or at times wreaking
havoc, in your body--is a far more daunting prospect.
In a traditional characterization scheme, a protein is broken into many small
pieces, typically less than 20 amino acids long. Various enzymes chew the
protein into these pieces after separation but before the sample enters the
mass spectrometer. Researchers predict the type and size of the resulting
fragments based on the make-up of the protein and the types of enzymes used.
They compare these predicted values to the values that result from the mass
spectrometer. If the values match, that section of the protein is assumed to
be unmodified. If they do not, it is assumed a modification has taken place.
This approach is powerful and time-tested. But since many fragments go
missing, fingering modifications is difficult. The smaller the piece, the less
likely it is to be detected and properly identified. The method also tends to
go south if numerous fragments have been modified because it becomes more
difficult to determine which modified fragment is really the cousin of the
predicted fragment.
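
The predict-and-compare logic of the traditional scheme can be sketched as follows. This is an illustration under stated assumptions, not the software any lab actually uses: it assumes a single enzyme, trypsin, with the common simplified rule of cutting after K or R unless the next residue is P; the function names, the example sequence, and the 0.01 dalton tolerance are all hypothetical. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide


def digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Predicted mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def compare(sequence, observed_masses, tolerance=0.01):
    """Map each predicted peptide to True if some observed mass matches it."""
    return {
        peptide: any(
            abs(peptide_mass(peptide) - observed) <= tolerance
            for observed in observed_masses
        )
        for peptide in digest(sequence)
    }


if __name__ == "__main__":
    # Hypothetical protein; the first peptide carries an added phosphate.
    sequence = "GASKPVTRLLEK"
    observed = [peptide_mass("GASKPVTR") + 79.96633, peptide_mass("LLEK")]
    for peptide, matched in compare(sequence, observed).items():
        print(peptide, "unmodified" if matched else "modified or missing")
```

The sketch also shows the weakness noted above: an unmatched peptide could be modified or simply undetected, and the comparison alone cannot tell which.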
Screaming Through Samples, Screaming For Condor
With the new mass spectrometer online, the Kelleher team is eschewing
tradition and using their top-down approach. Kelleher, then a PhD candidate,
was part of the Cornell University team that first proposed the method in a
1999 Journal of the American Chemical Society article.
Inside the mass spectrometer, an infrared laser cuts each protein ion into two
pieces (in almost every case). Predicted and derived masses are compared as
they are in traditional methods. Because the fragments are much larger and,
most importantly, because the fragments include one of the protein sequence's
terminal ends, it is easier to determine what modifications have taken place
where. By comparing data gathered from cutting different copies of the same
protein in different places, the team can characterize multiple modifications
more readily and do so with markedly better accuracy.
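
One way to picture why terminal fragments help is the sketch below. It is a simplified illustration, not the team's method: it assumes a single modification, considers only fragments that start at the first residue, and ignores fragment-ion chemistry by treating a fragment's mass as the plain sum of its residue masses. The function names, example sequence, and tolerance are hypothetical. Each copy of the protein is cut at a different position; the modification must lie between the longest fragment that shows no mass shift and the shortest fragment that does.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_mass(sequence, length):
    """Simplified predicted mass of the first `length` residues."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[:length])


def localize_shift(sequence, observed_by_length, tolerance=0.01):
    """Bracket a single modification using fragments that include residue 1.

    observed_by_length maps fragment length to its observed mass.
    Returns (first_residue, last_residue, shift) with 1-based positions,
    or None when no fragment deviates from its prediction.
    """
    last_clean = 0
    for length in sorted(observed_by_length):
        shift = observed_by_length[length] - prefix_mass(sequence, length)
        if abs(shift) <= tolerance:
            last_clean = length  # modification lies beyond this cut
        else:
            return last_clean + 1, length, shift
    return None


if __name__ == "__main__":
    # Hypothetical protein with a phosphate (+79.96633 Da) on residue 11.
    sequence = "MKTAYIAKQRSLE"
    phosphate = 79.96633
    observed = {
        4: prefix_mass(sequence, 4),
        9: prefix_mass(sequence, 9),
        11: prefix_mass(sequence, 11) + phosphate,
        12: prefix_mass(sequence, 12) + phosphate,
    }
    print(localize_shift(sequence, observed))
```

More cut positions narrow the bracket, which matches the article's point that data from copies cut in different places pin modifications down more precisely.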
The Kelleher team first used the system to study the proteins of Methanococcus
jannaschii, an autotrophic archaeon that lives at the bottom of the ocean, and

Saccharomyces cerevisiae, also known as baker's yeast. Results were published
in 2002 in the journals Nature Biotechnology and Analytical Chemistry,
respectively.
Today, the targets are proteins from human cells. The team hopes to process
some 100 million cells, identifying and characterizing the modifications of
any protein that occurs more than 1,000 times in each cell and is less than
600 amino acids long. A Web portal, called ProSightPTM, serves as a
clearinghouse for the data and provides tools for others doing protein
analysis.
Once it's running at full capacity, the team's mass spectrometer will create
about one gigabyte of data per day and will operate 24-7. With numbers like
that, traditional methods of converting the newly produced data into masses
and equating those masses to particular proteins and protein fragments are not
an option.
The calculations that go into the analysis of an individual sample are not
intense or time-consuming. "You can do it on a fast desktop in less than 10
minutes," says Brooks. "But, by the end of the year, they're going to be
producing five to seven hundred datasets a day…At that rate, you're looking at
one to four days of computing time if the calculations are run serially."
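
The arithmetic behind Brooks's estimate can be checked with a few lines. The article gives only "less than 10 minutes" per dataset, so the per-dataset times below (3 and 8 minutes) are assumptions chosen for illustration, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 60 * 24


def serial_days(datasets_per_day, minutes_per_dataset):
    """Days of computing needed to process one day's datasets one at a time."""
    return datasets_per_day * minutes_per_dataset / MINUTES_PER_DAY


if __name__ == "__main__":
    # Assumed low and high cases within the ranges quoted in the article.
    low = serial_days(500, 3)
    high = serial_days(700, 8)
    print(f"Low case: {low:.2f} days, high case: {high:.2f} days")
```

With these assumed inputs the serial backlog lands at roughly one to four days of computing for every day of instrument time, so a single desktop would fall further behind each day.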
Rather than letting all that excess mass spectrometer capacity go to waste,
Kelleher and company teamed up with NCSA to port the analysis software, called
THRASH, to run on the Alliance's Condor system, which pools idle time on
desktop systems to allow for high-throughput computing. One of Kelleher's
graduate students, Jeff Johnson, completed part of the work--working with
Brooks and Peter Andrews of Eastern Illinois University--in a matter of days.
"It was a problem that just screamed out for Condor," says Brooks. "You don't
need a huge amount of memory. The jobs have short run times and are easily
crunched. What you want here is not one computer that can crunch one big
problem, but a lot of computers that can crunch lots of little ones."
As a result of this natural fit, the portion of the analysis that converts the
raw mass spectrometric data into database-ready queries is screaming along on
Condor. With some 200 processors working in tandem, analyses that might have
taken days to complete are now finished in 30 minutes. Other portions of the
analysis process are likely to be moved to Condor in the future, according to
Kelleher and Brooks, providing plenty of opportunities for interaction between
NCSA and Kelleher's rapidly maturing group.
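
Why many small, independent jobs suit a pool of processors can be sketched with a simple scheduling simulation. This is not Condor and uses none of its interfaces; it is a stand-in that hands each job to whichever worker frees up first. The job count and the 10-minute job length are illustrative assumptions, and the function name is hypothetical. The 200-processor figure comes from the article.

```python
import heapq


def makespan_minutes(job_minutes, workers):
    """Wall-clock time to finish all jobs when each goes to the first free worker."""
    finish_times = [0.0] * workers  # when each worker next becomes free
    heapq.heapify(finish_times)
    for minutes in job_minutes:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + minutes)
    return max(finish_times)


if __name__ == "__main__":
    # Assumed workload: 600 independent jobs of 10 minutes each.
    jobs = [10] * 600
    print("One processor:", makespan_minutes(jobs, 1), "minutes")
    print("200 processors:", makespan_minutes(jobs, 200), "minutes")
```

Because no job depends on another, the simulated wall-clock time drops almost in proportion to the number of workers, which is the property Brooks describes.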
Relevant URLs: --Access story: http://access.ncsa.uiuc.edu/Stories/topdown/