December 2000
Vol. 9, No. 12, pp. 15–16.
Computers in Chemistry
A Sophisticated Star Search

[Photo: satellite dish] Distributed computing uses idle PC time to support projects like SETI.

Distributed computing, as a concept, has been around for decades. Even in the early days of computing, the idea of parceling a large project into many smaller projects to be tackled by separate processors was attractive, at least theoretically. In 2000, the theory has collided with a reality in which millions of near-gigaflop personal computers sit mostly idle in homes, college dorm rooms, and businesses. Distributed computing projects are attracting millions of participants, studying everything from intergalactic life to protein folding. Although music sharing via Napster is probably the largest distributed project to date, there’s plenty of distributed activity with more serious intent.

Is Anybody Out There?
By far the most popular distributed computing project is SETI@home, with more than 1 million eager participants. SETI stands for the Search for Extraterrestrial Intelligence, an effort that has looked for alien communication by various means. Its primary focus is radio: signals collected by the mammoth Arecibo radio telescope in Puerto Rico, and by other telescopes, are analyzed by Fourier transform for repetitive or distinct patterns that could be extraterrestrial communications.
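As a rough illustration of the Fourier-transform step, the sketch below buries a steady tone in random noise and then looks for the frequency bin that stands out in the power spectrum. It is not the SETI@home code. The sample count, tone frequency, amplitude, and function names are all assumptions chosen for illustration, and it uses a plain discrete Fourier transform rather than a fast one.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Return the power in each frequency bin using a direct DFT (O(n^2))."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):  # real input: only the first half is non-redundant
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(total) ** 2 / n)
    return spectrum


def strongest_bin(samples):
    """Return (index of the strongest bin, peak power / mean power)."""
    spectrum = power_spectrum(samples)
    peak = max(range(1, len(spectrum)), key=lambda k: spectrum[k])  # skip bin 0 (DC)
    mean = sum(spectrum[1:]) / (len(spectrum) - 1)
    return peak, spectrum[peak] / mean


# Hypothetical test signal: unit-variance noise plus a steady tone in bin 40.
random.seed(1)
N = 256
TONE_BIN = 40
signal = [
    random.gauss(0.0, 1.0) + 1.0 * math.sin(2 * math.pi * TONE_BIN * t / N)
    for t in range(N)
]

peak_bin, ratio = strongest_bin(signal)
print(peak_bin, round(ratio, 1))
```

The tone is hard to pick out in the raw samples, but in the spectrum all of its energy piles into one bin while the noise is spread thinly across every bin. That concentration is what makes a narrow, persistent signal detectable.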

The basic computing problem here is simple—there’s a lot of sky up there and a lot of radio spectrum. So the SETI@home project assigns a small portion of the sky, and a particular radio frequency range, to volunteers. Volunteers sign up at http://setiathome.berkeley.edu, download the SETI program, and are assigned small bundles of radio data to process. The software, when it’s running, acts as a screensaver, displaying graphical results of the analysis. SETI software is available for Windows, Macintosh, and Linux-based systems.

So far, there’s been radio silence from the little green men, but the SETI display on a computer screen is regarded as the mark of a cool user in dorm rooms from Boston to Tokyo. Interest in life elsewhere in the universe is so strong that SETI has attracted many more volunteers than originally anticipated, encouraging SETI administration to expand the frequency spectrum being studied.

Light-Years to Angstroms
Back on Earth, on a much smaller scale, proteins assemble from a chain of amino acids into a compact three-dimensional form through a process called “folding.” One specific three-dimensional form is often essential for biological function; other structures that are possible with the same chain have different properties. The folding process involves so many intramolecular interactions that, to date, only a handful of folding sequences have been successfully simulated by computer.

To better understand how proteins fold, Stanford University’s Pande Group is seeking computing help by taking the same approach as SETI@home. The group would like Internet users to put their PCs (running Windows or Linux) to work on a massively distributed protein folding simulation.

The Folding@Home program is designed to run simulations of how proteins fold. “Once you know what the shape is, you can guess at what it does,” said Vijay Pande, assistant professor of chemistry at Stanford University and principal investigator on the project. “By doing these folding simulations, we can understand which genes activate particular proteins and perhaps how to treat diseases that emanate from the dangerous and deadly ones.”

Simple proteins assemble in about 10,000 nanoseconds, and one 400-MHz computer can simulate 1 nanosecond of assembly time in about a day, Pande said. This means 1000 participants would be needed to simulate a simple protein fold in 10 days. More complex proteins can take far longer.
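The arithmetic behind that estimate can be written out in a few lines. This is only a sketch: the function name is made up for illustration, and it assumes the simulation divides evenly among machines with no coordination overhead, which is an idealization.

```python
def days_to_simulate(fold_time_ns, ns_per_machine_day, machines):
    """Wall-clock days needed if the work splits evenly across all machines."""
    machine_days = fold_time_ns / ns_per_machine_day  # total work, in machine-days
    return machine_days / machines


# Figures quoted by Pande: a 10,000-ns fold, 1 ns simulated per PC per day.
print(days_to_simulate(10_000, 1, 1))      # → 10000.0 (one 400-MHz PC alone)
print(days_to_simulate(10_000, 1, 1_000))  # → 10.0 (1000 volunteers)
```

A single PC would need more than 27 years for one simple fold, which is why the project depends on a large pool of volunteers.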

The project built its service with the Mithral Client-Server SDK, a software developer’s kit designed for quickly building SETI@home-like projects. Some of the details of Folding@Home are still undergoing peer review, but the latest information can be found on the Folding@Home Web site (www.stanford.edu/group/pandegroup/Cosm/).

2^3021377-1: Largest Prime
Distributed computing is making an impact on other areas of pure science as well. Roland Clarkson, a student at California State University Dominguez Hills, discovered the 37th Mersenne prime number early in 1998 (Mersenne primes have the form 2^n – 1). The new prime number, 2^3021377 – 1, is 909,526 digits long and was discovered using a 200-MHz Pentium computer running part-time for 46 days. Clarkson is one of more than 4000 worldwide volunteers participating in the Great Internet Mersenne Prime Search (GIMPS). The GIMPS project was started in January 1996 and has discovered 10 primes to date using Windows-based software.
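The standard way to check whether a number of the form 2^p – 1 is prime is the Lucas-Lehmer test, which is the test GIMPS relies on. The plain-Python version below is only a sketch for small exponents, not the heavily optimized GIMPS software, and the function names are chosen here for illustration. A second helper counts decimal digits so the 909,526-digit figure can be checked without ever writing out the number.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test: is 2**p - 1 prime? Assumes p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits in 2**p - 1."""
    return math.floor(p * math.log10(2)) + 1


small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print([p for p in small_primes if is_mersenne_prime(p)])
# → [2, 3, 5, 7, 13, 17, 19, 31]
print(mersenne_digits(3021377))
# → 909526
```

Each pass through the loop squares a number as large as 2^p – 1, and the loop runs about p times. For exponents in the millions, that repeated squaring of million-digit numbers is what keeps a single PC busy for weeks.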

Recognizing that prime numbers don’t have the widespread appeal of extraterrestrial life, the Electronic Frontier Foundation is offering a $100,000 award to the first person to discover a 10 million-digit prime number. Tempering enthusiasm for this work is the warning that a 500-MHz PC will take a full year to test a single Mersenne exponent with only 1 chance in 250,000 of finding such a prime number. The details of how GIMPS will handle this award can be found at www.mersenne.org/prize.htm.

Distributed Answers
There’s scarcely an area of science that doesn’t need more computing power, so distributed computing is spreading rapidly into new areas. Here are just a few examples:

Code Breaking. Encryption is a favorite topic at www.distributed.net. Working on the RC5-64 challenge, its assembled PCs can try out keys at 105.11 gigakeys per second. This phenomenal rate is the main reason the site found the winning key in the CS-Cipher encryption contest in January 2000.

Climate Dynamics. The site www.climate-dynamics.rl.ac.uk/ is devoted to distributed computing of atmospheric effects and is operated by the Rutherford Appleton Laboratory (RAL). Climate dynamics research at RAL focuses on the use of global observing systems to study the climate system as a single entity. RAL is specifically interested in the use of observational data to evaluate climate models, such as those that simulate climate change from natural and human factors. The distributed program Casino-21 lets users model the ocean–atmosphere interface and near-surface ocean using remote sensing data.

AIDS Research. To accelerate the discovery of drugs to fight HIV and AIDS, Entropia, Inc., and the Olson laboratory at the Scripps Research Institute, have partnered to offer millions of people a way to join the fight through distributed computing. Through Entropia’s FightAIDS@Home project (www.fightaidsathome.org), Scripps scientists will have a computational system to model the evolution of drug resistance, a major problem in AIDS research.

The AIDS virus is a particularly difficult target because of its ability to mutate over time. The virus has a “sloppy” way of copying its genome from one generation to the next, leading to a large number of viruses with slightly different characteristics. As a result, work in the Olson laboratory has focused on computational approaches to model the evolution of drug resistance and design more robust drugs. This work places greater demands on computational drug docking because trial drugs must be docked with not one but potentially millions of variants of the viral proteins. By using Entropia’s distributed Internet computing grid, the Scripps researchers can process calculations on these millions of variants.

Artificial Intelligence Research. A very curious distributed project is available at http://golem03.cs-i.brandeis.edu/download.html. The Golem@Home Project is an artificial intelligence experiment that simulates the evolution of mechanical structures called “creatures” and, once again, displays the results as a screensaver. When Golem is activated, it evolves the bodies and brains of electromechanical robots and animates some of them on-screen. Occasionally, if a network connection is available, one or a few evolved creatures might migrate from one user’s computer to another Golem screensaver that happens to be active on the net. Creatures born on a user’s computer are copyrighted to that user. To minimize human intervention in this experiment, users have little control over the evolution of these creatures, which evolve autonomously.
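To put the code-breaking rate quoted above in perspective, the sketch below estimates how long it would take to try every possible 64-bit key at 105.11 gigakeys per second. The function name is hypothetical, and the calculation assumes the rate stays constant, which a growing volunteer pool would not guarantee.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(key_bits, keys_per_second):
    """Years needed to try every key of the given length at a constant rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


full_search = years_to_exhaust(64, 105.11e9)
print(round(full_search, 1))      # → 5.6 (years to try every key)
print(round(full_search / 2, 1))  # → 2.8 (years on average before a hit)
```

Even at a rate no single machine could approach, a 64-bit keyspace takes years to search, which explains why such contests are tackled by large distributed efforts.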

So, Where’s Chemistry?
Although quantum chemistry was one of the first important areas in the development of distributed computing, it has lagged in the “@Home” style of computing for two reasons. First, there are many more devotees of extraterrestrial life than there are fans of self-consistent field or MINDO calculations, so quantum calculations can’t realistically expect millions of volunteers. Second, the majority of distributed computing applications in chemistry have a proprietary aspect and are deliberately kept off the Internet for reasons of corporate security. But FightAIDS@Home and Folding@Home show that biochemistry has found a niche, and other chemical applications will likely follow, lured by the promise of near-infinite computing power.

 


Charles Seiter, Ph.D., is a former chemistry professor who joined the personal computer revolution in the 1980s in northern California. He has designed various scientific software applications and has written 20 books on computing, and he contributes regularly to PC World and Macworld. Comments and questions for the author can be addressed to the Editorial Office by e-mail at tcaw@acs.org, by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.
