Computer, Heal Thyself
IBM is developing computer systems that monitor themselves and repair glitches as they arise—which could dramatically cut the cost of network maintenance.
Computer networks are fragile and temperamental creatures. They’re prone to unpredictable software glitches and mechanical failures. They’re vulnerable to traffic bottlenecks and hostile intrusions. They’re difficult to diagnose when things go wrong. They’re also extremely expensive, because it takes a lot of people to keep all those finicky machines humming. According to the Standish Group, tech department salaries account for as much as 45 percent of the total cost of running large computing clusters—the labyrinth of application servers, workstations, storage systems, and peripherals that lies at the heart of all networked businesses.
Researchers at IBM think there’s a better way to keep these systems running. “If the demand for IT management continues at the current rate, soon everyone will be a systems administrator,” jokes Robert Morris, director of IBM’s Almaden Research Center in San Jose. Morris believes that tedious, labor-intensive tasks such as updating software, modifying settings, formatting drives, recovering lost data, and optimizing network traffic should take place automatically, behind the scenes, in much the same way that the human autonomic nervous system monitors and adjusts the activity of the heart, lungs, and circulatory system without any conscious effort. With that idea in mind, IBM has launched an ambitious initiative to develop hardware and software systems that can take care of themselves.
“It’s a holistic approach,” says Morris, who is IBM’s leading evangelist for autonomic computing. “Systems will self-manage and self-repair, learning from mistakes, always aware of what can help them get things done.”
It will take IBM researchers several years to create fully automated computer systems. But autonomic technology is already moving out of the lab and into the real world. Marketed as Project eLiza—IBM’s brandspeak for autonomic computing products—several of IBM’s newest servers use autonomic systems to configure themselves on networks and order replacement parts when things go wrong.
For IBM, the human autonomic system is more than just a metaphor. In a fully realized autonomic computing setup, a complex network will take orders from a software “brain” that responds to changing conditions by following programmed directives and operational protocols. Software agents, the antibodies of a digital immune system, will watch for performance degradations and automatically rewrite corrupt code. Like wounds that heal by forming protective scabs, servers will draw from banks of spare parts to repair crashed storage systems, instinctively reinstalling files to shield them from harm. Inventory databases could anticipate market conditions and optimize themselves on the fly, ordering stock and adjusting prices as competition heats up, much as the human body’s “fight or flight” response increases heart rate and blood pressure during stressful situations.
Donna Dillenberger, a senior scientist at IBM, is developing an autonomic technology called enterprise workload management, or eWLM. This is networking software built around a set of general performance guidelines, which it follows independently, adapting as conditions change. “Say there’s a disaster and you want to make sure that victims can reach their insurance company before less critical customers,” Dillenberger explains. “In the case of a bottleneck, eWLM would automatically change the settings to improve quality of service.”
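The prioritization Dillenberger describes can be pictured with a small sketch. This is not eWLM itself; the class name `WorkloadQueue`, the service classes "routine" and "disaster_claim", and the rank numbers are all hypothetical, chosen only to illustrate how a queue might reorder waiting requests when a policy changes.

```python
import heapq
from itertools import count


class WorkloadQueue:
    """Hypothetical request queue; a lower rank number is served first."""

    def __init__(self, ranks):
        self._ranks = dict(ranks)   # service class -> rank (assumed policy)
        self._heap = []
        self._arrival = count()     # tie-breaker that preserves arrival order

    def submit(self, request_id, service_class):
        entry = (self._ranks[service_class], next(self._arrival),
                 service_class, request_id)
        heapq.heappush(self._heap, entry)

    def set_ranks(self, ranks):
        """Apply a new policy and re-rank everything already waiting."""
        self._ranks = dict(ranks)
        self._heap = [(self._ranks[cls], arrival, cls, req)
                      for _, arrival, cls, req in self._heap]
        heapq.heapify(self._heap)

    def drain(self):
        """Return request ids in the order they would be served."""
        served = []
        while self._heap:
            served.append(heapq.heappop(self._heap)[3])
        return served


# Normal conditions: every class has the same rank, so service is first come, first served.
queue = WorkloadQueue({"routine": 1, "disaster_claim": 1})
queue.submit("routine-1", "routine")
queue.submit("claim-1", "disaster_claim")
queue.submit("routine-2", "routine")

# A bottleneck is detected: the policy changes so disaster claims jump ahead.
queue.set_ranks({"routine": 1, "disaster_claim": 0})
served = queue.drain()
```

In this toy version a person still decides which class matters most; the autonomic part is that the system applies the new ranking to waiting traffic on its own.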
Autonomic technology will also appear in advanced hardware designs. One of the key building blocks will be a network of cheap, redundant storage devices known as a RAID (redundant array of independent discs) system. When one disc in a RAID system malfunctions, the affected data is automatically rebuilt on a healthy backup disc from the redundant information kept elsewhere in the array.
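One common way a RAID array recovers from a failed disc is parity: an extra disc stores the XOR of the data discs, so any single missing disc can be recomputed from the survivors. The article does not say which RAID scheme IBM has in mind, so the sketch below is an assumption for illustration only, and the helper names `xor_blocks` and `rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(discs, failed_index):
    """Recompute the contents of one failed disc from all the surviving discs."""
    survivors = [d for i, d in enumerate(discs) if i != failed_index]
    return xor_blocks(survivors)


# Three data discs plus one parity disc (a simplified, single-stripe array).
data = [b"heal", b"thys", b"elf!"]
parity = xor_blocks(data)
array = data + [parity]

# Disc 1 fails; its contents are recomputed onto a spare without human help.
array[1] = None
recovered = rebuild(array, 1)
```

Because XOR is its own inverse, combining the parity block with the two surviving data blocks yields exactly the bytes that were lost.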
A massive, 1,000-terabyte storage system known as CISB (collective intelligent storage bricks) takes this idea a step further by linking a number of RAID systems in what amounts to a self-maintaining storage network of storage networks. Each storage brick would tie together microprocessors, some memory, and a network of RAID systems. The bricks could grow and adapt instantly, recognizing when new drives come online, formatting them, adding software, integrating them with the rest of the system, and rebuilding missing data, all without calling on humans for help. IBM researchers hope to build a prototype CISB system by the end of the year; commercial versions could appear as early as 2004.
All well and good. But there are dangers associated with building proactive machines and computer networks that operate independently. HAL 9000, the murderous mainframe made famous in Stanley Kubrick’s classic film 2001: A Space Odyssey, is probably the best-known illustration of the downside of self-aware, self-maintaining computer systems. Yet such tales are not just the stuff of science fiction. “Remember the stock market crash of 1987?” asks Benjamin Kuipers, a professor of computer science at the University of Texas at Austin. In part, automated trading programs caused the 1987 crash when they began dumping shares as the market fell. Kuipers believes in the potential of autonomic computing, but he warns that “an eventual catastrophe” akin to Black Monday might prove difficult to foresee.
To minimize these risks, self-checks and safety protocols can be inserted into autonomic systems. And, of course, these systems will always require some human oversight.
That may be comforting news for IT workers and systems administrators who worry that autonomic computing could put them out of a job. If autonomic computing successfully reduces the number of human baby-sitters required to keep complex computer systems running, it may also make some IT professionals more valuable than ever before. Much like the rigorously trained workers who man the control booths in today’s automated steel mills, some IT specialists will handle higher-level tasks that encompass strategic planning and sales forecasting. “The IT worker of the future will need to be fairly broad and more system-oriented,” says IBM’s Morris. “They’ll have to get better at understanding business needs.” Autonomic computing may indeed reduce the overall cost associated with maintaining computer networks. But for many of the best IT professionals, it could also lead to promotions instead of pink slips.
Copyright © Michael Behar. All Rights Reserved.