HPCwire
 The global publication of record for High Performance Computing / October 10, 2003: Vol. 12, No. 40


Features:

CRUISING WITH THE TOP DOWN
by J. William Bell, NCSA

University of Illinois and NCSA researchers, using an uncommon mass spectrometer and the Alliance's Condor computing system, craft a new method of identifying proteins and characterizing changes in those proteins.

It should probably come as little surprise that James Watson--of Watson and Crick and their vision of DNA's double-helical structure--understands proteomics' power. At a biotechnology symposium in early 2003, Watson referred to DNA as the "script" and proteins as the "actors."

Neil Kelleher, an assistant chemistry professor at the University of Illinois at Urbana-Champaign, couldn't agree more.

"The realization of our society's expectations for 21st century medicine depends on further insights into biology at the level of proteins," he says, "not just DNA." These insights will rely on "a new, dominant methodology for the collection and interpretation of proteomic data."

Kelleher and his research team hope their "top-down" approach to identifying and characterizing proteins will become that new method. An advanced mass spectrometer, the Alliance's Condor computing system at the University of Wisconsin, and NCSA's expertise have all been integral to this nascent approach's development.

Humans Are Too Complex

Ian Brooks, an NCSA research programmer on the project who is also a biochemist, explains the importance of Watson's distinction between actor and script: "For a simple organism--a bacteria, for example--you only need their genome [to address central questions about cell biology on a whole-system level]. For that organism, there is a one-to-one correspondence between a gene and the protein that it expresses."

"Humans are too complex," he continues. "Their gene sequence doesn't tell you what proteins are going to be made. Many proteins come from a single gene." Though there's much to be learned from the gene sequence (see "A science of big numbers" in this issue of Access), it tells researchers far less when they're studying mammals. In fact, the Kelleher group has found that about 10 percent of proteins are chemically different than the proteins expected when analyzing the gene sequence alone.

In complex critters like mammals, the proteins actually produced can differ from those the gene sequence predicts. Variations in DNA transcription or RNA splicing during expression can change them, as can chemical alterations to the proteins' amino acids following production. Thus, many different proteins can result from any single snippet of genetic code. Once expressed, and in some cases altered, the proteins go about their business--whether they're enzymes that regulate biochemical activity, hormones such as insulin, or the structural basis for your body's hair, muscles, and skin.

The Kelleher team is out to characterize what are called post-translational modifications. Post-translational modifications change proteins' properties chemically by chopping up the proteins or "decorating" certain amino acids within the protein. Tacking on a phosphate, for example, fundamentally changes the protein. It might make an active enzyme of the protein or flip a molecular logic switch from 0 to 1.

According to Brooks, there are about 300 known post-translational modifications that can alter a protein and therefore influence its function or location in a cell. Formal, direct analyses have been "limited to date and totally unsystematic," says Kelleher.

A system such as the Kelleher team's would allow researchers to characterize all of the possible modifications that a specific protein might undergo. It would also allow researchers to catalog the various modification states so the next team through the breach wouldn't have to duplicate the effort.

Separate, Sort, And Identify

To begin the identification and characterization, cells are broken apart. The hundreds of proteins inside are separated by mass using gel electrophoresis, and a special soap allows further, improved separation using liquid chromatography. The proteins are sorted into groups of about 5 to 15 similarly sized molecules. Proteins that are expressed from the same section of the genetic sequence but that differ due to post-translational modifications tend to fall in the same group.

Electrophoresis, which separates molecules using electrical current, and liquid chromatography, which separates molecules by passing them through sticky pores, are nothing special for chemists and biologists. Accusations of ordinariness, however, should probably end there.

Once the proteins are separated into like-sized sets, they are fed into a Fourier-transform mass spectrometer with a 9.4 tesla superconducting magnet. There are fewer than six research labs in the world with this type of instrument, according to Kelleher, and only the Kelleher team is pursuing the top-down approach to identifying and characterizing proteins.

The mass spectrometer ionizes the proteins and sets them spinning in the instrument's magnetic field. With a bit of computational analysis, the instrument's read on the spin can be converted into a mass measurement because the frequency at which an ionized protein circles in the gas phase depends on its mass-to-charge ratio. For a given charge, the faster the protein ion spins, the lower its mass. Masses, in turn, can be compared to a database and matched to a particular type of protein.
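The relation between spin and mass can be sketched with the textbook formula for an ion circling in a uniform magnetic field, f = qB / (2*pi*m). The snippet below is only an idealized illustration, not the team's software: the function names and the example ion are hypothetical, and a real instrument relies on calibration rather than this bare formula. Only the 9.4 tesla field strength comes from the article.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge
DALTON_KG = 1.66053906660e-27          # kilograms per dalton
FIELD_TESLA = 9.4                      # magnet strength quoted in the article


def cyclotron_frequency_hz(mass_da, charge, field_tesla=FIELD_TESLA):
    """Ideal cyclotron frequency f = qB / (2*pi*m) for an ion."""
    q = charge * ELEMENTARY_CHARGE_C
    m = mass_da * DALTON_KG
    return q * field_tesla / (2.0 * math.pi * m)


def mass_to_charge(frequency_hz, field_tesla=FIELD_TESLA):
    """Invert the relation: m/z in daltons per elementary charge."""
    return (ELEMENTARY_CHARGE_C * field_tesla
            / (2.0 * math.pi * frequency_hz * DALTON_KG))


# Hypothetical example ion: a 10,000 Da protein carrying 10 charges.
f = cyclotron_frequency_hz(10_000.0, 10)
print(f"{f / 1e3:.1f} kHz corresponds to m/z {mass_to_charge(f):.2f}")
```

Because the frequency depends on mass divided by charge, the charge state has to be known or inferred before a frequency can be turned into a protein mass.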

Of Traditional Character

Identification of proteins, therefore, is a straightforward enterprise that can be carried out using a more ho-hum mass spectrometer. Characterizing the post-translational modifications--tracing the steps between the protein initially expressed by the gene and the protein at work, or at times wreaking havoc, in your body--is a far more daunting prospect.

In a traditional characterization scheme, a protein is broken into many small pieces, typically fewer than 20 amino acids long. Various enzymes chew the protein into these pieces after separation but before the sample enters the mass spectrometer. Researchers predict the type and size of the resulting fragments based on the make-up of the protein and the types of enzymes used. They compare these predicted values to the values measured by the mass spectrometer. If the values match, that section of the protein is assumed to be unmodified. If they do not, it is assumed a modification has taken place.
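The predict-and-compare step of the traditional scheme can be sketched roughly as follows. This is an illustrative sketch only: the peptide sequences, the observed masses, the tolerance, and the function names are all hypothetical, and the residue table is a small subset of the standard monoisotopic masses, just enough for the example.

```python
# Monoisotopic residue masses in daltons (subset, enough for this example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "K": 128.09496,
}
WATER = 18.01056  # added once per free peptide


def peptide_mass(sequence):
    """Predicted mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def classify_fragments(predicted_peptides, observed_masses, tolerance_da=0.01):
    """Label each predicted peptide by whether some observed mass matches it."""
    labels = {}
    for peptide in predicted_peptides:
        expected = peptide_mass(peptide)
        matched = any(abs(m - expected) <= tolerance_da for m in observed_masses)
        # No match means the piece was modified or simply went undetected.
        labels[peptide] = "unmodified" if matched else "unmatched"
    return labels


# Hypothetical digest: one peptide observed as predicted, one observed
# about 79.966 Da heavier (the mass a phosphate group adds).
observed = [330.1539, 426.1879]
print(classify_fragments(["GASP", "VTK"], observed))
```

The "unmatched" label shows the ambiguity of the approach: a missing match may mean the piece was modified, or only that it was never detected.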

This approach is powerful and time-tested. But since many fragments go missing, fingering modifications is difficult. The smaller the piece, the less likely it is to be detected and properly identified. The method also tends to go south if numerous fragments have been modified because it becomes more difficult to determine which modified fragment is really the cousin of the predicted fragment.

Screaming Through Samples, Screaming For Condor

With the new mass spectrometer online, the Kelleher team is eschewing tradition and using their top-down approach. Kelleher, then a PhD candidate, was part of the Cornell University team that first proposed the method in a 1999 Journal of the American Chemical Society article.

Inside the mass spectrometer, an infrared laser cuts each protein ion into two pieces (in almost every case). Predicted and derived masses are compared as they are in traditional methods. Because the fragments are much larger and, most importantly, because the fragments include one of the protein sequence's terminal ends, it is easier to determine what modifications have taken place where. By comparing data gathered from cutting different copies of the same protein in different places, the team can characterize multiple modifications more readily and do so with markedly better accuracy.
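Why terminal fragments help can be shown with a small sketch. It assumes a single modification of known mass and a set of N-terminal fragments from different copies of the same protein, each cut at a different place. All numbers and names are hypothetical, and this is not the team's algorithm, only the bracketing logic the paragraph above describes.

```python
def localize_modification(protein_length, cuts, shift_da, tolerance_da=0.01):
    """Bracket the residue range that may carry one modification.

    cuts: iterable of (cut_after, predicted_mass, observed_mass) for
    N-terminal fragments covering residues 1..cut_after.
    Returns an inclusive 1-based (low, high) range.
    """
    low, high = 1, protein_length
    for cut_after, predicted, observed in cuts:
        delta = observed - predicted
        if abs(delta - shift_da) <= tolerance_da:
            # Fragment carries the extra mass: the site is at or before the cut.
            high = min(high, cut_after)
        elif abs(delta) <= tolerance_da:
            # Fragment matches its prediction: the site is after the cut.
            low = max(low, cut_after + 1)
    return low, high


PHOSPHATE_DA = 79.96633  # mass added by one phosphate group

# Hypothetical 100-residue protein cut after residues 30, 60, and 45.
cuts = [
    (30, 3300.0, 3300.0),
    (60, 6600.0, 6600.0 + PHOSPHATE_DA),
    (45, 4950.0, 4950.0 + PHOSPHATE_DA),
]
print(localize_modification(100, cuts, PHOSPHATE_DA))
```

Each additional cut site narrows the bracket, which is why data from differently cut copies of the same protein is valuable.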

The Kelleher team first used the system to study the proteins of Methanococcus jannaschii, an autotrophic archaeon that lives at the bottom of the ocean, and Saccharomyces cerevisiae, also known as baker's yeast. Results were published in 2002 in the journals Nature Biotechnology and Analytical Chemistry, respectively.

Today, the targets are proteins from human cells. The team hopes to process some 100 million cells, identifying and characterizing the modifications of any protein that occurs more than 1,000 times in each cell and is less than 600 amino acids long. A Web portal, called ProSightPTM, serves as a clearinghouse for the data and provides tools for others doing protein analysis.

Once it's running at full capacity, the team's mass spectrometer will create about one gigabyte of data per day and will operate 24-7. With numbers like that, traditional methods of converting the newly produced data into masses and equating those masses to particular proteins and protein fragments are not an option.

The calculations that go into the analysis of an individual sample are not intense or time-consuming. "You can do it on a fast desktop in less than 10 minutes," says Brooks. "But, by the end of the year, they're going to be producing five to seven hundred datasets a day…At that rate, you're looking at one to four days of computing time if the calculations are run serially."
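The arithmetic behind Brooks's estimate is easy to reproduce. In the sketch below, the dataset counts come from his quote, while the per-dataset times of 3 and 10 minutes are assumptions chosen to span "less than 10 minutes"; the function name is hypothetical.

```python
MINUTES_PER_DAY = 60 * 24


def serial_days(datasets_per_day, minutes_per_dataset):
    """Days of computing needed to process one day's datasets one at a time."""
    return datasets_per_day * minutes_per_dataset / MINUTES_PER_DAY


for datasets in (500, 700):
    for minutes in (3, 10):  # assumed per-dataset run times
        days = serial_days(datasets, minutes)
        print(f"{datasets} datasets x {minutes} min = {days:.1f} days")
```

Whenever the result exceeds one day, a single machine falls further behind the instrument with every day of operation.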

Rather than letting all that excess mass spectrometer capacity go to waste, Kelleher and company teamed up with NCSA to port the analysis software, called THRASH, to run on the Alliance's Condor system, which pools idle time on desktop systems to allow for high-throughput computing. One of Kelleher's graduate students, Jeff Johnson, completed part of the work--working with Brooks and Peter Andrews of Eastern Illinois University--in a matter of days.

"It was a problem that just screamed out for Condor," says Brooks. "You don't need a huge amount of memory. The jobs have short run times and are easily crunched. What you want here is not one computer that can crunch one big problem, but a lot of computers that can crunch lots of little ones."

As a result of this natural fit, the portion of the analysis that converts the raw mass spectrometric data into database-ready queries is screaming along on Condor. With some 200 processors working in tandem, analyses that might have taken days to complete are now finished in 30 minutes. Other portions of the analysis process are likely to be moved to Condor in the future, according to Kelleher and Brooks, providing plenty of opportunities for interaction between NCSA and Kelleher's rapidly maturing group.
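A back-of-the-envelope model shows how 200 processors can turn days into about half an hour. The sketch assumes every job takes the same time, every processor is available at once, and scheduling costs nothing. Those are idealizations, since Condor draws on idle desktops whose availability varies. The 600 datasets and 10 minutes per job are illustrative values taken from the ranges Brooks quotes, and the function name is hypothetical.

```python
import math


def wall_clock_minutes(jobs, processors, minutes_per_job):
    """Ideal completion time when equal-length jobs run in full rounds."""
    rounds = math.ceil(jobs / processors)
    return rounds * minutes_per_job


jobs, processors, minutes_per_job = 600, 200, 10
serial = jobs * minutes_per_job
parallel = wall_clock_minutes(jobs, processors, minutes_per_job)
print(f"serial: {serial / 60 / 24:.1f} days, pooled: {parallel} minutes")
```

Under these assumptions each processor handles three jobs, which is why many small, independent jobs suit a pool of ordinary machines.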

Relevant URLs: Access story: http://access.ncsa.uiuc.edu/Stories/topdown/

