Summary for Lecture #11- Rafferty

Rafferty, P. (2001). The representation of knowledge in library classification schemes. Knowledge Organization 28:180-91.

Pauline Rafferty, English lecturer and professor, begins her monster of an article by delving into the history of subject classifications, which were predominantly man-centered. In the old days (the 1800s), these systems were mainly based on philosophical and positivist concepts, in which the progress of science produces a more evolved man and a more evolved system of information. These ideas were meant to be universal; however, as Rafferty explains, we now know that this view of classification does not hold. These organizers and librarians always strove for the ideal order of things. But what is ideal? The answer is not a simple one.

In the late 19th and early 20th centuries, classification systems were very much intertwined with philosophical ideals. In 1876, Melvil Dewey produced the Dewey Decimal System, and many other classification systems were created by others around the same time. During that time, it was assumed that there were general classification schemes, which meant that there was a specific order to things. This order began with the main, most important classes (philosophy, religion, science, etc.) and was based on the natural ebb and flow of function and rationality. In other words, the order of things was next to godliness. Rafferty herself said it best: “The discourse of the classification schemes sets limits, rules, and regulations about what and how things can be referred to within libraries, and this has consequences in wider social terms because libraries are primary institutions of learning and of acculturation” (2001). There was also an emphasis on notational language, which is based on symbols and was assumed to be an international “language” understood in all nations. The problem arises when many of the symbols carry other meanings or serve other purposes in other languages.

In 1939, Henry Evelyn Bliss wrote The Organization of Knowledge in Libraries, which challenged the philosophy of the metaphysically based classifications set by Dewey. Bliss believed that classifications should be based on “scientific and educational consensus” and should become increasingly detailed. This was a turning point in history: Dewey was now beginning to be criticized for being biased towards a culturally determined viewpoint. Belgian bibliographers Paul Otlet and Henri La Fontaine developed a variant of the Dewey Decimal System called the Universal Decimal Classification (UDC). It is much more complicated than the DDC, not only on the spine, but in the indexes as well. Otlet really wanted to find a way to make notational language universally understood. He wanted to take the “fluff” out of the way and get to the bare bones of the text to categorize it correctly.

As always, technology poses new questions and challenges for the world of classification. Hypertexts, which link to other texts, audio files, and images, are being used and developed more and more. It was not until the very end of the article that I began to realize what Rafferty was leading up to, and it makes perfect sense. Perhaps one way to look at classification, both in the library and in the surrounding world, is to see that notations, subjects, works, and texts are all related to each other, and that removing a central, “most important” idea puts everything on a level playing field, where knowledge is easier to access and closer to being obtainable by all.

Summary for Lecture #10- Northedge

Northedge, R. (2007). Google and beyond: Information retrieval on the World Wide Web. The Indexer 25:192-195.

What were the early days of the Internet like? How does its structure shape how we find our information today? Richard Northedge dives into these questions in his brief but informative article about websites, the problems that come with a large Internet, and search engines.

Search engines began in 1990 with a program called Archie. In 1991, Tim Berners-Lee created the World Wide Web, which held general directories and search engines for the Internet. Northedge goes on to explain that in 1994, David Filo and Jerry Yang created the still-used search engine Yahoo! Directory. The original site was a simple directory of their own lists of websites, but it quickly grew into a large search engine. However, in 1998 Larry Page and Sergey Brin, who were graduate students at Stanford, launched Google, another search engine that eventually flew past Yahoo! in usage. Why did this happen? Clay Shirky, a web commentator, says that Yahoo! declined because it did not use enough human classification techniques. There are generally four guidelines that fall into this category:

  1. Indexing works best with a small collection of written texts on a particular subject (a “corpus,” in Shirky’s term).
  2. The text is “fixed and unchanging.”
  3. The categories are formally defined.
  4. The users and classifiers on the search engine have a shared vocabulary and knowledge of the subject in question; this is probably the most difficult variable.

Having these guidelines for search engines helps to show the need for organization and strict rules for categorizing information; it can also be said that the further a search engine strays from these rules or its domain, the worse off searching will be for humans on the Internet. But therein lies the problem. Search engines at that time did not have those standards. How do we overcome this dilemma, then? The World Wide Web is vast, with large collections, and humans are not knowledgeable about every subject all the time. Northedge continues, saying that the Internet is essentially a huge library that houses all kinds of information, and a search engine is its librarian. How do we determine which search engine is best? What constitutes a good mode of extracting information? Like that of a good librarian, the quality of a search engine is measured by the size of the collection, its speed at gaining and providing the information to the user, its availability, and finally, its accuracy. A good search engine surpasses a directory in service due to its unlimited availability as well as its open search option instead of a limited vocabulary index.

According to Northedge, what makes Google so exceptional is its program called “Googlebot,” which crawls, analyzes, and saves billions of web pages for Google’s indexes. It is only one part of Google that does this, so it is similar to having multiple librarians working on different tasks that all help get the information to the user. How does Google give relevant information? Northedge mentions the Google program PageRank, created by Page and Brin, stating, “PageRank provides a way for Google to put more relevant pages at the top of its lists of search results. It makes use of thousands of individual human decisions without requiring direct human input into the relevancy calculations” (2007).
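
To make the idea of “individual human decisions” concrete, here is a minimal sketch of the link-voting idea usually associated with PageRank. It is only an illustration under stated assumptions: the four-page web, the function name `pagerank`, and the damping value of 0.85 are hypothetical choices for this example, and none of it comes from Northedge’s article or from Google’s actual system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    links maps each page name to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: each key links to the pages in its list.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],  # links out, but nobody links to it
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Each link here acts like a vote made by whoever wrote the linking page, so a page that many others point to rises to the top, while a page nobody links to stays near the bottom. No person has to judge relevance directly.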

The way search engines are used and set up, however, is quickly changing. With more people using the Internet for browsing and using “tagging,” it is easier than ever to find information. However, this open organization also lends itself to junk web pages and spam, which cause problems and clog up the Internet. Now researchers are trying to figure out how to limit the scope of the Internet to only relevant information. Northedge predicts, “The search engines of the future will have computer-generated indexes, but the data contained in those indexes may well be driven by datasets produced by human indexing techniques and human linguistic research” (2007).

This article is so relevant to our needs in this day and age, when almost all of our information comes from the Internet or online databases. Northedge takes the example of search engines and relates it back to LIS, which gives me a picture of the Internet I had never seen before. Information specialists in our field are currently going through the same issues in trying to find a way to give better information to users. I now feel I have a better understanding of the complicated world of the Internet, a service which our generation so greatly takes for granted.