Nearly two decades ago, on Christmas Day 1990, Tim Berners-Lee "invented" the World Wide Web when he connected two computers using HTTP via the Internet. I am not quite sure where Al Gore was that day, but at that point, the Internet was in place. Now, 18 years later, the World Wide Web consists of some 15 billion pages. Considering there are only something like 6.8 billion people on the planet–most of whom have no access to the World Wide Web–that is a truly astonishing number.
A couple of years ago, Sir Tim Berners-Lee joined forces with Nigel Shadbolt, professor of artificial intelligence at the University of Southampton, to try to understand the phenomenal growth of the Web and find meaningful ways to control and use it. Together, they and their respective universities created a new discipline they call Web science. "The Web Science Research Initiative will allow researchers to take the Web seriously as an object of scientific inquiry, with the goal of helping to foster the Web's growth and fulfill its great potential as a powerful tool for humanity," according to an MIT press release dated Nov. 2, 2006.
The notion that the Web can be studied as an object of scientific inquiry is not new. Google and other search providers have been doing that for years, and the amount of time and effort spent developing search and relevancy algorithms is phenomenal. What is new is the idea that the Web can or should be controlled and directed to "fulfill its potential." The seemingly random growth of the Web is what makes it so interesting and attractive. The very randomness of content displayed on the Web is what necessitated the development of search and ranking engines in the first place. In fact, one could argue search engines have made the Web what it is today.
If a Site Has No Links, Does It Exist?
Imagine a Web site created by a hermitlike misanthrope. The site itself has a complicated, obfuscated domain name. The site links to no other sites, and the site creator has no friends or colleagues with whom he will share the URL. The content of the site is irrelevant–it could be anything from drug-addled, demented ramblings to the secret transcripts of the Warren Commission. If no other sites link to it, does the site even exist? Of course, it exists in the same sense a tree falling in the deserted forest creates a sound, but in the context of the World Wide Web, it does not exist if it is not accessed.
However, even without links in or links out, one of the search robots will find that site and index it. Then other search engines will crawl it and index it, and soon our stand-alone site becomes part of the Web. If it really does contain the lost transcripts from the Warren Commission, it may even rise to the top of searches for things such as "conspiracy theory" or "JFK assassination." If it is nonsense, then it will remain obscure and unused. The point here is without search engines and Web bots, many pages on the Web would exist solely for the use of their owner–kind of like a secret diary hidden under my bed.
One of the real issues we run into when we start trying to "study" the Web is that there is no single type of Web page. In fact, the only thread that runs through the whole of it is that HTML (and other content) is transported over HTTP. The difference between a business site and a personal blog is so great it makes little sense to speak of them as part of a unified "Web." The Web is defined by where it exists and how it is accessed, not by what it does. A bank in a strip mall provides the same services as an online banking site. A conversation among friends over a cup of coffee provides the same information as a discussion group. Amazon is certainly one of the great "success" stories in Web history, but it offers no real difference in functionality from a brick-and-mortar bookstore. So here, we have one fact about the Web:
Large parts of the World Wide Web serve only as an alternative delivery mechanism for processes or functions that already exist.
Online shopping and business sites are just a small part of the Web. They generate the most revenue, and they certainly have made it easier to do things, but beyond that, they have not really done anything unique. People who used to catalog shop now shop online. People who used to be visited by an insurance agent to service their policy now do it online, or by phone. The interesting thing is that while the use of online goods and services sites grows, brick-and-mortar businesses also continue to grow. In most cases, business sites are an adjunct to a real business. There are, of course, questionable, quasi-legal, and unpleasant businesses that exist only because of the Web. While they may merit study in Web science, I am not going to discuss them here.
It's Me!
Social networking is all the rage these days. E-mail and instant messaging have morphed into phenomena such as Facebook and YouTube. Blogging has reached epic proportions. At conferences, instead of receiving a handout with reference material from the speakers, attendees are encouraged to visit the speakers' blogs for more information. Hundreds of millions of blogs exist, and they are all crawled and linked and rated. To what end? Humankind seems to have some innate need to speak out and say, "Here I am." Services such as Twitter allow individuals to share the most boring and mundane moments of their lives with us.
Technology has provided us with incredibly easy ways to communicate, and for some reason we find a need to communicate more and more. Consider the cell phone. Why are people always talking or texting on their phone? Are they exchanging useful information, or are they just talking to hear themselves talk? I do not believe the constant chatting that surrounds us is all that useful. Moreover, I do not believe enabling technologies such as e-mail ultimately make us more productive. The value of every useful e-mail I receive is diminished by the need to filter through all the junk. Even in the workplace, where e-mail is supposed to be used for work-related purposes, we have to set up inbox rules so we look only at messages on which we are the "To:" addressee, and maybe find time to wade through the rest on the weekend.
Social networking, while extremely popular, provides little real value.
What about all those "other" Web sites out there, the sites that just provide information? There are tens of thousands of sites on how to tie a fly, how to make an omelet, or how to get the best seat at the best price on an airliner. Those informational sites are the heart of the semantic Web, and they are also the core of the Web's real value.
Take away the business use of the Web, take away the ability to communicate using the Web, and what we are left with is billions of pages of information, which is unsorted, unclassified, and unverified. Some of that information is valuable. Software development would slow to a snail's pace overnight if all the online code samples and information were to disappear. Some of the information is worthless. There exist Web sites that expound every baseless and senseless idea imaginable. That is the first problem with information gathered on the Web. How do you know the information you are viewing is correct? What constitutes an authoritative source on the Web? Certainly not Wikipedia. While it is very useful and the very epitome of what a Wiki can be, it certainly is not authoritative.
There currently is no common standard for good information on the Web. If there were such a standard, who would enforce it? The proliferation of available information has made it increasingly difficult to separate the wheat from the chaff. The Internet provides so much raw information that users are unable to distinguish information from knowledge.
I fear the Web is producing a population of undiscerning consumers of information. I recently had a conversation with an individual who was espousing a certain pop philosophy that in essence says we control our destiny by thinking positive thoughts. She asked me if I understood quantum physics. I stated I had a reasonable understanding of quantum mechanics. Her reply was, "Good. So now you understand what I am talking about." The fact her belief had absolutely nothing to do with quantum mechanics totally escaped her. She was firmly convinced quantum mechanics had something to do with mind control. She read it on the Internet. Pseudo-science and useless information on the Web are both rampant and dangerous.
We need a way to "rate" the value of information provided on the Web.
I spend a lot of time helping organizations build company intranet sites. One of the things I encourage is the use of metadata, which is information attached to other information that can help to classify and identify that data. Properly applied metadata makes it very easy to "find" what you want your users to find. Notice how that was phrased: the metadata is applied with the goal of making information readily available to the user. The owner of the data provides the ancillary information to classify that data.
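To make the idea concrete, here is a minimal Python sketch of owner-applied metadata on intranet documents. It is an illustration only. The document titles, the metadata keys, and the `find_by_tag` helper are hypothetical and do not come from any particular intranet product. The sketch shows that once the owner has classified the content, finding it becomes an exact lookup rather than a guess.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A piece of content plus the metadata its owner attached to it."""
    title: str
    body: str
    # Owner-supplied metadata; these keys and values are hypothetical examples.
    metadata: dict = field(default_factory=dict)


def find_by_tag(docs, key, value):
    """Return the titles of documents whose metadata has key == value.

    Because each owner classified their document up front, this is a
    simple exact-match filter rather than a full-text search.
    """
    return [d.title for d in docs if d.metadata.get(key) == value]


# Hypothetical intranet content, tagged by the people who own it.
intranet = [
    Document("Claims Procedures Manual", "...", {"department": "claims", "type": "manual"}),
    Document("Holiday Schedule", "...", {"department": "hr", "type": "notice"}),
    Document("Subrogation FAQ", "...", {"department": "claims", "type": "faq"}),
]

print(find_by_tag(intranet, "department", "claims"))
# ['Claims Procedures Manual', 'Subrogation FAQ']
```

The design choice mirrors the point above. The person who knows the content best, its owner, does the classifying, and the reader benefits from it.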
Right now, the World Wide Web is like the wild, wild West. Millions of terabytes of data exist on the Web with no classification, no organization, and no way to get to that data except through search engines. We are desperately in need of some easy-to-use, extensible system to classify data on the Web so that it can be located using some sort of structured query. One such system exists. Part of one possible semantic Web framework is the Resource Description Framework (RDF), a data model that can be published alongside ordinary HTML pages or embedded in them, and that expresses information as triples containing a subject, a predicate, and an object. The concept is very simple. In the statement "Babe Ruth is a baseball player," "Babe Ruth" is the subject, "is a" is the predicate, and "baseball player" is the object. Query languages such as SPARQL already allow searches over data expressed this way.
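The triple idea can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, and it makes several assumptions:

- The `ex:` names are hypothetical.
- The graph is a plain set of tuples, not a real RDF store.
- The `match` function only loosely imitates the pattern matching that RDF query languages perform. It is not any library's API.

The one real vocabulary term is `rdf:type`, RDF's standard "is a" predicate.

```python
# A tiny in-memory "graph" of (subject, predicate, object) statements.
# The ex: names are hypothetical; rdf:type is shorthand for RDF's
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type predicate.
graph = {
    ("ex:BabeRuth", "rdf:type", "ex:BaseballPlayer"),
    ("ex:BabeRuth", "ex:playedFor", "ex:NewYorkYankees"),
    ("ex:LouGehrig", "rdf:type", "ex:BaseballPlayer"),
    ("ex:LouGehrig", "ex:playedFor", "ex:NewYorkYankees"),
    ("ex:JorgeLuisBorges", "rdf:type", "ex:Writer"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Who is a baseball player?"
players = [s for s, _, _ in match(None, "rdf:type", "ex:BaseballPlayer")]
print(players)  # ['ex:BabeRuth', 'ex:LouGehrig']

# "Which teams did those players play for?" (joins two patterns)
teams = {o for s in players for _, _, o in match(s, "ex:playedFor", None)}
print(teams)  # {'ex:NewYorkYankees'}
```

A production system would use full URIs and a proper RDF library or triple store. The point here is only that once data is expressed as triples, a question like "find every X that is a Y" becomes a structured query rather than a keyword search.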
This is a good start, but we need to devise other methods of properly classifying all that information. Imagine a library without a system for classifying and labeling books. Now, imagine the books just dumped randomly into a football stadium. Now, create an algorithm to find any particular bit of information in that stadium.
We need a way to apply meaningful, actionable metadata to the information available on the Web.
I am reminded of "The Library of Babel," a short story by Jorge Luis Borges. Borges's library holds a vast, seemingly endless number of volumes in which the letters of the alphabet appear in every possible combination and permutation. All of humankind's knowledge and literature is contained somewhere in this library. The problem is that there is no way to discover where it is, nor is there a way to separate the "good stuff" from the nonsense. This is actually what we are facing when we address the problems of the World Wide Web. It contains an immense amount of information. Our challenge is how to get useful knowledge from that wealth of information. Let's hope Web science can provide us with the solution.
© 2024 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.