Northedge, R. (2007). Google and beyond: Information retrieval on the World Wide Web. The Indexer, 25, 192-195.
What were the early days of the Internet like? How does its structure shape the way we find our information today? Richard Northedge explores these questions in his brief but informative article about websites, the problems that come with a large Internet, and search engines.
The history of search engines began in 1990 with a program called Archie. In 1991, Tim Berners-Lee created the World Wide Web, which would come to host general directories and search engines for the Internet. Northedge goes on to explain that in 1994, David Filo and Jerry Yang created the Yahoo! Directory, still in use when he wrote. The original site was a simple directory built from their own lists of websites, but it quickly grew into a large search engine. However, in 1998, Larry Page and Sergey Brin, then graduate students at Stanford, launched Google, another search engine that eventually overtook Yahoo! in usage. Why did this happen? Clay Shirky, a web commentator, argues that Yahoo! declined because it relied on human classification techniques, which work well only under certain conditions. There are generally four such guidelines:
- The collection is small and made up of written texts on a particular subject (in Shirky's term, a “corpus”).
- The texts are “fixed and unchanging.”
- The categories are formally defined.
- The users and the classifiers share a vocabulary and knowledge of the subject in question; this is probably the most difficult condition to meet.
These guidelines show why organization and strict rules for categorizing information matter; it also follows that the further a search engine's domain strays from these conditions, the worse searching becomes for people on the Internet. Therein lies the problem: search engines of that time did not operate under those conditions. How, then, do we overcome this dilemma? The World Wide Web is made up of enormous collections, and people are not knowledgeable about every subject. Northedge goes on to describe the Internet as a huge library that houses all kinds of information, with the search engine as its librarian. How do we determine which search engine is best? What makes a good means of extracting information? As with a good librarian, a search engine's quality is measured by the size of its collection, its speed at gathering information and delivering it to the user, its availability, and, finally, its accuracy. A good search engine outperforms a directory because it is always available and offers open, free-text searching instead of an index with a limited vocabulary.
According to Northedge, what makes Google so exceptional is “Googlebot,” a program that crawls the Web and saves billions of pages for Google's index. Googlebot is only one of several programs working together, much like a team of librarians each handling a different task to get information to the user. How does Google return relevant information? Northedge points to PageRank, the ranking algorithm created by Page and Brin, stating, “PageRank provides a way for Google to put more relevant pages at the top of its lists of search results. It makes use of thousands of individual human decisions without requiring direct human input into the relevancy calculations” (2007).
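To make the idea of “individual human decisions” concrete, the sketch below follows the textbook power-iteration form of PageRank with the commonly cited damping factor of 0.85. It is an illustration under those assumptions, not Google's production system and not a method Northedge describes; the function name, its parameters, and the three-page link graph are hypothetical. Each link a person places on a page acts as a vote, and a page passes its score along to the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by power iteration (textbook formulation).

    links maps each page to the list of pages it links to (hypothetical input).
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start with every page equally important.
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Baseline share: the "random surfer" occasionally jumps to any page.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a vote; the page splits its score among them.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page site: both other pages link back to "home".
site = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
scores = pagerank(site)
```

In this example "home" receives the highest score because the other two pages link to it. No one ever rates "home" directly; its importance emerges from the linking choices that the site's authors made, which is the point Northedge makes about PageRank.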
The way search engines are used and set up, however, is changing quickly. With more people browsing the Internet and using “tagging” (labeling content with their own keywords), it is easier than ever to find information. But this open organization also invites junk web pages and spam, which cause problems and clog up the Internet. Researchers are now trying to figure out how to narrow searches on the Internet to relevant information only. Northedge predicts, “The search engines of the future will have computer-generated indexes, but the data contained in those indexes may well be driven by datasets produced by human indexing techniques and human linguistic research” (2007).
This article is highly relevant to our needs in this day and age, when almost all of our information comes from the Internet or online databases. By relating search engines back to library and information science (LIS), Northedge gave me a way of looking at the Internet that I had never considered before. Information specialists in our field are currently grappling with the same issues as they try to give users better information. I now feel I have a better understanding of the complicated world of the Internet, a service that our generation so often takes for granted.