Analysis & Commentary: JOURNEY TO THE INTERNET'S UNKNOWN REGIONS

As reported by Tim McDonald

You heard it a couple of years ago: The World Wide Web is far deeper and more vast than previously imagined. That is terrific news. But where are all those fascinating, hidden pockets of information that are supposedly out there, and why can't we find them on Yahoo?

"No one can argue with the fact the Web is huge and it continues to grow at an astronomical rate," Giga Information Group analyst Laura Ramos said. "Really, what people are dealing with is how to harvest the valuable information out there, because there is a lot of good stuff and there is a lot of junk," Ramos said.

Billions More Documents

Experts estimate that the "surface Web" contains 1 billion to 2 billion documents, while the "deep Web" could contain as many as 550 billion. Put another way, the surface Web contains about 19 terabytes of information, while the deep Web contains about 7,500 terabytes. A terabyte is a measure of data storage. One terabyte is the equivalent of about 1,600 CDs or 1,000 gigabytes.

There are more than 200,000 deep Web sites, more than half of which are located in topic-specific databases. About 95 percent of information on the deep Web is available to the public and is not subject to subscription fees.

Why Is It Hidden?

Many surfers cannot find these sites, however, because each page generally is not linked to many other pages. Full-text search engines get their listings in one of two ways: Site developers can submit addresses to a search engine, asking to be indexed; or a search engine can use "spiders," which depend on links from existing sites to discover new ones.

While there is a huge amount of information on the deep Web, much of it is valuable primarily to researchers, scholars and the merely curious, so it may have few, if any, links. Without such links, search engines can find such sites only by chance.
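
To see why unlinked pages stay out of reach, consider a minimal sketch of how a link-following spider discovers pages. Everything here is an assumption made for illustration: the site names, the tiny in-memory link map standing in for the Web, and the crawl function itself. It is not how any particular search engine works. The spider starts from pages it already knows and follows links outward, so a page that nothing links to is never reached unless its address is submitted directly.

```python
from collections import deque

# Hypothetical miniature Web: each page maps to the pages it links to.
LINKS = {
    "portal.example": ["news.example", "shop.example"],
    "news.example": ["portal.example", "blog.example"],
    "shop.example": [],
    "blog.example": ["news.example"],
    "archive.example": ["portal.example"],  # links out, but nothing links in
}


def crawl(seeds, links):
    """Return every page reachable from the seed pages by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# A spider that starts from one well-known page.
indexed = crawl(["portal.example"], LINKS)
hidden = set(LINKS) - indexed
print("indexed:", sorted(indexed))
print("never discovered:", sorted(hidden))

# If the site developer submits the address, it becomes a seed and is indexed.
with_submission = crawl(["portal.example", "archive.example"], LINKS)
print("after submission:", sorted(with_submission))
```

In this toy example the archive page links to the portal, but no page links back to it, so the spider never finds it. Adding the address as a seed plays the role of a site developer submitting it to the search engine.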
Also, more and more information is being stored by governments, universities and corporations in monster databases. These databases cannot be accessed by conventional search engines, which identify "static" pages rather than the "dynamic" pages used by large databases. Information in such databases can be accessed only by a direct query.

Theoretically, search engines create and maintain their own databases in an effort to index the entire Web. But even the biggest and best search engines can index only between one-third and one-half of all publicly available documents.

Matching Users to Engines

There are more than 3,000 search engines on the Web, and many of them claim to mine the deep Web. The Big Hub, for example, claims to have an index of more than 3,000 topic-specific, searchable databases in more than 300 categories. Beaucoup.com, which calls itself the "ultimate source of free information," has links to more than 2,500 databases and directories.

The Document Delivery Service offers a search for services that retrieve publicly available documents, including government reports, patents and military information. And the WebSearch Alliance directory tries to help match users to the right tool amid the tangle of deep Web search engines.

'Ultimate Personal Search Tool?'

Of course, there are other deep Web search tools that offer their services for a fee. Bright Planet, a South Dakota company that first effectively publicized the deep Web, has various products to help businesses find and retrieve relevant data. These products include Deep Query Manager, Deep Web Directory and Complete Planet, which features a free deep Web search site that will delve into 100,000 of the 200,000 deep Web databases.

Bright Planet also has LexiBot, which searches among 2,200 deep Web databases and search engines simultaneously -- a number the company claims is more than double the specialty resources available in any other tool.
"The LexiBot is the ultimate personal search tool," Bright Planet spokesperson Brian Bjerke said. "Two thousand two hundred deep Web databases and search engines is a huge leap forward in numbers. But [just] as important is the high quality of information retrieved."

Enterprise Crossroads

"I think we're at a very interesting point in the [enterprise] search market," Giga's Ramos said. "A year ago, it looked like search was going to be a kind of commodity, with a lot of e-commerce applications. Now, if you look at the way the market has changed, there is a new complexion here. I think we're on the verge of the second generation of search.

"The first focused on, 'How do I get a handle on all the content that's out there, either inside my company or on the Internet?'" Ramos noted. "The second generation will focus on that, but more on context -- mainly, what are the users trying to do when they're searching, and how do I bring back more relevant information to them?"

Semantic Web

There is also ongoing research related to the "semantic Web." Today's Web is basically a "publishing medium," a huge warehouse in which text and images are stored. Semantic Web proponents want to turn the Web into a more interactive place where information can be interpreted and exchanged, and where software agents roam from page to page, performing sophisticated tasks for users.

Instead of merely displaying information on screens, computers will "understand" what they are displaying, according to supporters of the semantic Web.

"Ultimately, we'll be able to utilize a series of helpers to help us manage our day-to-day activities and automate a lot of the things we do -- calendaring, coordination, resource discovery -- things like that," Eric Miller, head of the W3C's Semantic Web Activity at MIT, said.

Buried Treasure

So, as it turns out, many of the Internet's unknown regions are accessible after all.
Given a reasonable investment of time and patience, a curious surfer can discover new territory, especially with the help of sites that allow surfers to dive into the deep Web in search of buried treasure. But scouring the deep Web requires time and effort. In the end, many of the Web's useful pages may remain buried unless new search technologies are invented to unlock its isolated depths.