As technology advances, scientific fields will face a “data avalanche” that the Internet will play a large role in handling, Jim Gray said in a recent on-campus presentation to a mix of graduate students and faculty.
“The Internet’s going to be flooded with scientific data,” said Gray, manager of Microsoft’s eScience Group. “How’s it going to work? What language will it be in?”
During the presentation, which was part of the PERNET Computer Science Graduate Seminar Series, Gray said the scientific community has produced almost a terabyte's worth of scientific data this year, and it will soon approach a petabyte. These are huge amounts of data. In comparison, an 80-gigabyte Apple iPod can hold about 20,000 songs. If the iPod had a terabyte or a petabyte of capacity, it would be able to hold about 250,000 or 250 million songs, respectively.
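The iPod comparison is simple proportional scaling, and a short sketch makes the arithmetic easy to check. It assumes decimal units (1 terabyte = 1,000 gigabytes, 1 petabyte = 1,000,000 gigabytes) and that song count grows linearly with capacity. The function and constant names are illustrative only and do not come from Gray's talk.

```python
# Decimal storage units (an assumption; binary units would shift the totals slightly).
GB = 10**9
TB = 10**12
PB = 10**15

# Figures quoted in the article: an 80 GB iPod holds about 20,000 songs.
IPOD_CAPACITY_BYTES = 80 * GB
IPOD_SONGS = 20_000


def songs_for_capacity(capacity_bytes):
    """Scale the 80 GB / 20,000-song figure linearly to another capacity."""
    return capacity_bytes * IPOD_SONGS // IPOD_CAPACITY_BYTES


print(songs_for_capacity(TB))  # 250000
print(songs_for_capacity(PB))  # 250000000
```

Under these assumptions each song takes about 4 megabytes, which is how 20,000 songs fit in 80 gigabytes.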
The newest branch of computational science will be one that Gray called X-info (X stands for a specific science, as in “bio-info” for biology), which involves building the right software and algorithms for computers to effectively analyze and share increasingly large amounts of data over the Internet.
Professor Dragutin Petkovic, chair of the computer science department at SF State, said he agreed with Gray’s ideas.
“He addresses a very important topic,” Petkovic said. “We collaborate with him on several projects that deal with X-info … and we train students in those projects.”
The skills Gray said he thought students should have when they’re finished with school seemed to catch the room by surprise.
“I got my Ph.D. in a year and a half, and it was the biggest mistake of my life,” he said with a grin. “Learn your core computer science stuff, but take philosophy. A lot of stuff companies will pay you to learn, but while you’re here you have to learn how to get along with people and how to play as a team.”
Gray said almost every field is bringing in an X-info department to help organize data. A good X-info team takes three kinds of computer scientists: those who are good with computer-human interfaces, those who are good at building the system (or “plumbers”), and those who are good at mining large amounts of data and making it usable.
“And then you need a bullshitter like me to get those three groups together,” he said with a laugh.
One problem with computers today, Gray said, is that they are good at finding needles in haystacks but lousy at finding haystacks. In other words, computers can quickly locate a specific item but have trouble sorting through large amounts of data and finding patterns.
Another problem Gray mentioned was a lack of standard representation.
“How can I represent a galaxy on a computer? Or an ant?” he asked. “If it’s between you and me, we can talk, but how do I get a computer to communicate to another computer what I mean?”
The network infrastructure for the burgeoning amount of data needs to be built, Gray said, and one example of a step forward is the World-Wide Telescope project, also known as the Virtual Observatory.
As technology advances in the astrophysical community, the amount of raw data essentially doubles each year. The Virtual Observatory project – a collaboration between the astrophysical community, Microsoft, NASA and others – connects various worldwide telescopes’ data on the Internet. Professionals, hobbyists, and kids can use the Sloan Digital Sky Survey (http://cas.sdss.org/dr5/en/) to find out about millions of points in the universe.
Astronomy data is perfect for this type of project, Gray said, because there’s no commercial value to it, most of it is public, it’s going to constantly increase, and it’s a “nerd’s dream.”
Even with technology moving at warp speed, Gray said he thinks people should be able to slow down their lives.
“I’m a bit of a lunatic, but I believe that if you live until 2050, you’ll live forever. Medicine will be so advanced that it won’t let you die,” he said. “Go and wander for a year in Europe because you’re going to live forever, so pace yourself.”