I remember when libraries prided themselves on their information desk. You could call up the local public library and ask, say, "What is the maximum air speed velocity of an African swallow?" and someone would rush to provide you with a definitive answer. As school children we spent countless hours looking up "facts" in the reference section of the library.
We all knew the World Book Encyclopedia was easier to read, but if you wanted the absolute truth, you had to use the Encyclopaedia Britannica. We may have been naïve, but there was a certainty about the printed word. Not total brainless acceptance, but if it was in the reference section of the library, then it probably spoke the truth. In fact, most nonfiction books could be expected to have a modicum of veracity about them.
There were, of course, the True Ghost Story and Strange Tales books with chapters such as "The Mystery of the Flatwoods Monster," but everyone knew they were not the sort of thing you wrote a book report on. The really wacky stuff was published by small specialty houses, and you knew to distrust them. Vanity presses were the venue of last resort for the truly bizarre. In general, though, a work of nonfiction from a "reputable" publishing house would be expected to be what it was called–nonfiction.
I am not sure when all that changed–maybe it was the many books about John F. Kennedy's death after the unbelievable Warren Commission report. By the mid-'80s, nonfiction came to mean just about whatever you wanted it to. About 1984, I remember picking up a book in Heathrow in preparation for a long flight home. That book was called Holy Blood, Holy Grail.
I always had been interested in medieval history and in the Grail legend in particular. I thought I would be reading a sociological or historical tract. What I was amazed to discover was that the authors had assembled some 2,000 years' worth of rumor and legend, thrown in a modicum of history, mixed it up with various religious cults, and presented what pretty much amounted to a new history of Christ, the Christian Church, the Crusades, and just about everything else. The book was fascinating and fun to read, but it definitely wasn't "fact"–at best it could be called "theory." I'm not suggesting that particular book was a watershed event, but at least it caused me to rethink the way I treated the written word.
Interestingly enough, the authors of Holy Blood now are engaged in a lawsuit with the author of the puzzlingly popular book The Da Vinci Code. They claim Dan Brown "stole" a lot of his story from their book. What makes that really interesting is if the Holy Blood theories were fact, then Brown couldn't be accused of stealing anything. I would argue the truth is public domain.
Your Point, Sir?
The World Wide Web has become the de facto library for the world. It certainly has replaced the information desk at the public library. The first place I go for any sort of information is the Web. What began as a fail-safe network designed for the transfer and exchange of information has become a large collection of documents or Web pages, all of which contain some sort of information. How we process and use that information will determine the ultimate utility of the Web.
The Argentine writer Jorge Luis Borges wrote a short story called The Library of Babel. He describes a seemingly endless library containing every possible book. Each book contains 410 pages, each page 40 lines, and each line 80 characters, drawn from an alphabet of 25 orthographic symbols, including punctuation and spaces. Every possible combination of those characters occurs in the library. Thus, the entire sum of language-based knowledge is contained in the library, hidden among a staggering amount of nonsense–as in an entire book consisting of nothing but the symbol M. The problem becomes finding the "real" knowledge in the midst of all the nonsense.
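Just for a sense of scale, here is a quick back-of-the-envelope calculation in Python using the dimensions Borges specifies (the arithmetic, not the code, is the point):

```python
# How many distinct books fit Borges' specifications?
import math

PAGES, LINES, CHARS = 410, 40, 80   # dimensions of every book in the library
ALPHABET = 25                       # orthographic symbols, per the story

chars_per_book = PAGES * LINES * CHARS   # 1,312,000 characters per book
digits = math.floor(chars_per_book * math.log10(ALPHABET)) + 1

print(f"Characters per book: {chars_per_book:,}")
print(f"Distinct books: 25^{chars_per_book:,}, a number {digits:,} digits long")
```

Finite, then, but unimaginably vast–it takes roughly 1.8 million digits just to write down the number of books.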
The Web is a lot like that library. Imagine exabytes (an exabyte is 2^60 bytes) of data that may or may not be useful or have any veracity. How do we get information from the Web library? Let's assume I desperately need accurate information on how the distribution for a business continuation life insurance policy is taxed. If I enter that entire phrase into Google, I find some eight million documents that may provide that information.
How do I know which information to trust? There certainly is a ton of wrong or inaccurate information on the Web regarding U.S. income tax policy. Perhaps you have subscribed to an online library service such as the National Underwriter's Tax Facts (www.taxfactsonline.com). If so, you probably will use that resource for your information. But what if you really don't know where to turn? There is no implicit guarantee any information you find on the Web is correct or accurate. At best you must believe–you must take a leap of faith and assume information published by certain resources is reliable. The question remains, though, how do you first establish that trust?
We generally are not bothered by questions of trust or accuracy as we use the Web. I can plan my entire life using the Web–charting driving routes, making airline and hotel reservations, buying auto insurance, looking up Uncle Bob's phone number. Doing things is what the Web is good at right now. The Web has replaced the telephone and the personal visit for many mundane tasks. I personally don't set foot in a shopping mall unless accompanied by grandchildren. I can purchase everything I require online–OK, I do go to the grocery store. Webvan never made it to my neck of the woods, and even if it did, it doesn't exist anymore anyway.
Doing stuff isn't interesting, though; information is. Let's not forget the whole computer thing started with "information theory," as in Claude Shannon's seminal work on the subject. An interesting initiative along these lines is the Semantic Web. Let's use the Wikipedia definition as an introduction. (This is itself an exercise in trust–Wikipedia recently was brought to task for disseminating disinformation.) "The Semantic Web is a project that intends to create a universal medium for information exchange by giving meaning (semantics), in a manner understandable by machines, to the content of documents on the Web."
The Semantic Web is another brainchild of Tim Berners-Lee, who generally is credited with inventing the Web in the first place. (Do you find it curious double last names have a sophisticated connotation, while double first names–Billy Bob–do not?) Anyway, the key here is providing semantics that can be understood by machines. That necessitates adding extra information, or metadata, to Web pages so a machine can determine an action based on that metadata. Thus, in theory, I could set a machine (a computer) on a search for my question about business continuation insurance with the stipulation the results returned meet certain predetermined criteria. Those criteria would be such that I then would "trust" the veracity of the data returned. Good idea if it works.
The minimum information a machine would need to have about a Web page to catalog it "semantically" is its location and some structured data about that page. Let's take a quick look at a couple of the current standards that will enable the Semantic Web.
Uniform Resource Identifiers
Uniform Resource Identifiers (URIs) are nothing more than names for Web resources. One form of URI we all know and love is the Uniform Resource Locator (URL). http://www.nationalunderwriter.com is a URL that both names a resource (a Web page) and provides a pointer to that Web page. Please note a URI does not have to describe how to find a resource, although it may. There is no single standard scheme for URIs as there is for URLs, but for the Semantic Web to work, URIs must have some standard format that is capable of being interpreted by a computer.
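As a small illustration of what "interpreted by a computer" means, here is how a URL (one kind of URI) breaks into machine-readable pieces using Python's standard urllib.parse module; the path and query string below are invented for the example:

```python
from urllib.parse import urlparse

# A hypothetical URL on the National Underwriter site, for illustration only.
uri = urlparse("http://www.nationalunderwriter.com/taxfacts?topic=life-insurance")

print(uri.scheme)   # 'http'                         -- how to reach the resource
print(uri.netloc)   # 'www.nationalunderwriter.com'  -- where it lives
print(uri.path)     # '/taxfacts'                    -- which resource
print(uri.query)    # 'topic=life-insurance'         -- extra parameters
```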
Resource Description Framework
Naming a resource is fine, but we need to add some meat to that name or it isn't going to do us much good. The Resource Description Framework (RDF) is metadata specifically designed to be processed by Semantic Web machines. An RDF statement is a tripartite "sentence" whose three parts normally are called the subject, the predicate, and the object, each identified by a URI (the object also may be a literal value). The actual RDF standard is XML based. The mechanics are irrelevant. What is important is in the Semantic Web you can attach an RDF statement to any Web document you choose, and the data contained in that statement then is "published" on the Semantic Web. Clearly we have a long way to go before this becomes reality.
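To make the subject-predicate-object idea concrete, here is a minimal sketch using the open-source Python library rdflib; the URI and the title are illustrative, not real published metadata:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC   # the Dublin Core vocabulary (dc:title, etc.)

g = Graph()
page = URIRef("http://www.nationalunderwriter.com/")       # subject: the resource
g.add((page, DC.title, Literal("National Underwriter")))   # predicate and object

# One triple -- subject, predicate, object -- serialized as RDF/XML.
print(g.serialize(format="xml"))
```

Attach that serialized statement to a Web document, and a Semantic Web machine has something it can act on.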
What Are You Looking For?
Search engines are "machine-based" technology and obviously have many limitations, one of which is the simple fact that a machine cannot and does not "think" like a human. More often than not, machine search results are manipulated by the human touch one way or another.
We sell and serve up about 20 gigabytes of data at Highline Media (parent company of Tech Decisions). We have more discussions about search than about anything else. Often the discussion amounts to: Why isn't your search more like Google's? In fact, we do use Google search for most of our public Web pages. But what are users really saying when they want a "Google-like search"? Companies spend huge sums of money and resources on search engine optimization. Others spend their money on paid listings. If you see something above the fold in a search results page, you probably can be assured you have been spoon-fed that listing, either through optimization or payola. It doesn't necessarily mean you found the best possible results for your search.
Machines are stupid. If I type the term penguin books into a search engine, that tool does not really know what information I may want to find. If all I am looking for are some interesting books about penguins, then I may want to be directed to http://www.penguinbooks.com. On the other hand, if I want to check out the venerable publishing house, I need http://www.penguin.com.
A good search result is determined by users finding exactly what they were looking for in the first place . . . which probably means they had a preconceived notion of what that correct result would be . . . which in turn implies a human, who thinks like a human, is structuring results. As for the 20 gigabytes of data we serve up through subscription services, we certainly have the ability to return above-the-fold results based upon criteria we set. By manipulating metadata and search weighting, I can force any particular document to the top for a particular search term. If a user types in my name, I can move an article I am particularly fond of to the top of the list. Isn't that the end result of what the big search engines do anyway?
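Here is a toy sketch of that kind of manipulation, with invented titles and numbers: each document carries a machine-computed relevance score plus a hand-set editorial boost stored in its metadata, and the boost decides what lands above the fold:

```python
# All titles, scores, and boosts below are made up for illustration.
documents = [
    {"title": "Tax Facts on Insurance",  "relevance": 0.72, "boost": 1.0},
    {"title": "An Article I Am Fond Of", "relevance": 0.35, "boost": 3.0},
    {"title": "Industry News Roundup",   "relevance": 0.58, "boost": 1.0},
]

def score(doc):
    # The editorial boost multiplies -- and can override -- machine relevance.
    return doc["relevance"] * doc["boost"]

for doc in sorted(documents, key=score, reverse=True):
    print(f"{score(doc):.2f}  {doc['title']}")
```

The second document "wins" despite having the lowest machine relevance–which is exactly the point.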
Metadata Schmetadata
There are a fairly limited number of ways to get users to use your data, and most of them require human intervention. You must knowledgeably create metadata or otherwise enhance your data to get your Web page "noticed." You may pay for listings or keyword referrals. You may spend all your money on marketing to drive users to your Web sites. At the end of the day, though, it is the perception users have of your company or its content that is going to drive them to your site. My list of "favorites" is pretty short, but those links are all to sites I have grown to trust–using my judgment, not some machine's.