The entire data processing world breaks down into two mutually exclusive camps: those who do the work and those who talk about it. The first group knows, at least for the present time, it's all about 1s and 0s–ons and offs–and the rest is window dressing. The second camp, composed of IT journalists, consultants, and some corporate IT types, knows it's all about the buzz. They are word meisters who love to use three-letter acronyms and spin yarns about new paradigms. From this group we get terminologies such as service-oriented architecture, n-tier architecture, governance, and taxonomy. They love to co-opt words and terms from other disciplines and give them new life in the IT world. They can speak for 20 minutes about absolutely nothing and still blow the socks off the board of directors. Every so often, though, one of their buzzwords actually makes some sense and deserves a second look. This month we'll take a look at taxonomies.

For me, the word taxonomy conjures up memories of a biology class where life forms were classified according to their domain, kingdom, phylum (or is it phyla?), class, order, family, genus, and species. They were given a Latin name based on the preceding classifications and then forevermore were assigned their place in the grand scheme of life.

If we loosely define a taxonomy as a hierarchical structure that can be used to classify or categorize things, then the scientific organization of organisms described above is undoubtedly a solid, well-designed taxonomy. The problem is that even though it may be well designed, it's not particularly useful unless your name is Linnaeus. Even when I was immersed in this stuff, it was all Greek to me . . . or should I say Latin?

From an IT perspective, a taxonomy should be considered a means of classifying documents and data so they are readily available for human use or consumption. There are also initiatives that address ways in which data can be optimized for use and access by other computers. The Semantic Web, popularized by Tim Berners-Lee of the World Wide Web Consortium (W3C), is the current leading paradigm on that front.

When we speak of taxonomies in this article, we are thinking about classifying the gigabytes or terabytes of data that keep the enterprise running. That data may be a claims form, an employment contract, or a marketing piece intended for use by a potential customer. Whatever it may be, it is of little or no use unless it can be accessed and delivered–and that is where taxonomies and some attendant concepts can help us.

Let's consider an early attempt to classify documents–the Dewey Decimal System. I am sure we all have paid our dues learning how to find information in the school library using the classification system created by Melvil Dewey in the late 19th century. The system enabled arranging books on library shelves in a structured manner that allowed any book to be located readily once its number was known (found in the card catalog).

It provided the added benefit of allowing subject "browsing"–I believe science was in the 500s. Once again, a very nice taxonomy but ultimately not very useful if you discount the value of browsing. I suppose in a smallish library knowing a top-level category may be sufficient for a user to find a book, but for any decent-size library, the card catalog was the way to go.

What real value did the card catalog provide? It offered a key to Dewey's essentially arcane numbering system. The call number was the key to a book's location on the shelves, but the card catalog supplied the link to that call number.

The card catalog was only as good as the librarian. If entries were made just for author, title, and subject, then you may not find what you are looking for. However, if the librarian was really good, then that book on amateur rocketry also may have been listed under "satellites," "telemetry," "solid rocket fuel," "ways to blow up your bedroom," etc. Do you see what we are talking about? Metadata–but in the case of the old school library, the metadata was created separately from the document itself and thus was not permanently attached to it.
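For the technically inclined, the card catalog described above amounts to an inverted index: subject headings on one side, call numbers on the other. What follows is only a rough sketch of that idea. The titles, call numbers, and subject headings are made up for illustration, and the function names are my own.

```python
from collections import defaultdict

# Hypothetical holdings: call number -> (title, subject headings the librarian chose).
BOOKS = {
    "629.47": ("Amateur Rocketry", ["rockets", "satellites", "telemetry", "solid rocket fuel"]),
    "523.4": ("The Planets", ["astronomy", "satellites"]),
}


def build_card_catalog(books):
    """Invert the holdings into subject heading -> sorted list of call numbers."""
    catalog = defaultdict(list)
    for call_number, (_title, subjects) in books.items():
        for subject in subjects:
            catalog[subject.lower()].append(call_number)
    return {subject: sorted(numbers) for subject, numbers in catalog.items()}


def look_up(catalog, subject):
    """Return the call numbers filed under a subject heading, or an empty list."""
    return catalog.get(subject.lower(), [])


catalog = build_card_catalog(BOOKS)
print(look_up(catalog, "Satellites"))  # ['523.4', '629.47']
print(look_up(catalog, "cooking"))     # []
```

Notice that the catalog can only answer questions about headings somebody bothered to enter. A book with three subject headings can be found three ways; a book with one can be found one way.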

Where is all this leading? Most businesses in North America are service industries; virtually all manufacturing was offshored and outsourced long ago. We no longer create goods in this culture, so we chase the holy dollar by providing services, and those services are totally dependent upon information. Ready access to that information is critical to the success of our enterprise. Providing that access efficiently is dependent upon the following:

o A system that ensures (1) a single "gold" copy of any document is maintained and (2) that specific document or a copy of it is the one that is accessible.

o A taxonomy that provides an intuitive and unambiguous way of accessing that document.

o Sufficient metadata attached to that document so it can be easily located.

The first requirement implies the existence of some sort of content or document management system. There are dozens of systems available that allow an organization to "manage its documents," although consolidation in the industry is reducing the real number of solutions. These range from essentially free server add-ons to systems whose costs easily can run into six figures and beyond.

I am not going to discuss document management systems here beyond saying some sort of system is needed. It may be as simple as a shared resource on the network, but it is critical that any organization "knows" where the real instance of any particular document resides.

Your taxonomy is a hierarchical data structure designed so that users can easily find the document they require. The taxonomy itself is based upon a controlled vocabulary. Language is by its very nature ambiguous: many words and phrases have multiple meanings and implications. A good taxonomy must be based on a carefully defined vocabulary that strives to eliminate ambiguity. This probably will necessitate a good thesaurus that points users of ambiguous or variant terms to your defined term.
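To make that concrete, here is a minimal sketch of a hierarchy built on a controlled vocabulary, with a thesaurus that redirects variant terms to the defined one. Every term, the parent/child layout, and the function names are assumptions invented for this example, not a recommended vocabulary.

```python
# Hypothetical controlled vocabulary: preferred term -> parent term (None marks the root).
TAXONOMY = {
    "documents": None,
    "claims": "documents",
    "claims form": "claims",
    "contracts": "documents",
    "employment contract": "contracts",
}

# Hypothetical thesaurus: variant or ambiguous term -> preferred term.
THESAURUS = {
    "loss report": "claims form",
    "hiring agreement": "employment contract",
}


def preferred_term(term):
    """Normalize a user's term and redirect it to the controlled vocabulary."""
    key = term.strip().lower()
    key = THESAURUS.get(key, key)
    if key not in TAXONOMY:
        raise KeyError(f"'{term}' is not in the controlled vocabulary")
    return key


def path_to(term):
    """Walk from a term up to the root and return the path, root first."""
    node = preferred_term(term)
    path = []
    while node is not None:
        path.append(node)
        node = TAXONOMY[node]
    return "/".join(reversed(path))


print(path_to("Loss Report"))  # documents/claims/claims form
```

A term outside the vocabulary raises an error here rather than guessing, which is the point of a controlled vocabulary: someone has to decide where a new term belongs.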

Taxonomies should be based upon natural ways of grouping data. We already use taxonomies even if we don't use the term. People tend to sort their inbound e-mail in different ways. Some have folders for every individual they exchange e-mail with; some sort things based upon the corporate organization chart; some may classify their e-mail based upon projects or some other criteria. So, we are organizers and classifiers by nature, but that does not imply we are efficient in our methodologies. In fact, the examples above show we can't seem to agree even on effective categorization.

Taxonomies are not and should not be written in stone. For any given organization, there should be multiple taxonomies. Documents designed for internal use and made available on a corporate intranet can be expected to conform to a rigid taxonomy driven by principles of corporate governance. Employees can be expected to conform to corporate standards and made to understand the use of the corporate vocabulary. The term "producer Web site," for example, can be expected to have a clear-cut and unambiguous meaning within the context of an insurance company. It may have an entirely different connotation in Hollywood or to the general public.

Internal taxonomies probably are best created by consensus among the various stakeholders in the business process. That does not mean we simply form committees and set about designing better camels. Information architecture should be mandated from above (CEO/CIO) and then created collaboratively while being strongly controlled by an information architect. It essentially is a matter of corporate governance and should be treated as such.

Equally important are the taxonomies we create for our outward-facing Web sites. We must remember we are playing in a different field here, though. You cannot and should not attempt to mandate vocabulary or taxonomy to your customer. It is the behavior of the customer that mandates the taxonomy for an outward-facing Web site, not the corporation–that is, if you want to retain your customer. Don't let that sound intimidating. Most Web sites have elements of a taxonomy customers understand. Consider the "About Us" tab on most Web sites. Users know when they click there, they will find things such as corporate history, maybe a mission statement, contacts, perhaps biographies, and Photoshop-enhanced images of the executive team–the World Wide Web equivalent of vanity license plates.

But there are lots of areas of a Web site that are not so straightforward. You need first to consider what you want your customer to see and experience. This is information that probably is going to come from the marketing or sales groups and not some techie or VP with a preconceived notion of what the customer should be exposed to. Then you need to devise a Web taxonomy based upon what your customers actually do when they click about your Web sites. If your goal is for your customers to ask for more information and average users slog through five pages before they find it, then you have a problem. When considering outward-facing taxonomies, think in terms of a very simple folder structure and create that folder structure in a way that makes sense to your users and gets them where you want them to be. Learn by studying actual user behavior. We have a plethora of Web analytic tools available. Use them.
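If your analytics tool can export sessions as ordered lists of pages, the "five pages before they find it" question takes only a few lines to answer. This is a sketch under that assumption; the page paths, the session data, and the goal page are all hypothetical.

```python
from statistics import mean

GOAL_PAGE = "/request-info"  # hypothetical page where a customer asks for more information

# Hypothetical export: each session is the ordered list of pages one visitor viewed.
SESSIONS = [
    ["/", "/products", "/about", "/products/auto", "/contact", "/request-info"],
    ["/", "/request-info"],
    ["/", "/products", "/about"],  # this visitor never reached the goal
]


def pages_before_goal(session, goal=GOAL_PAGE):
    """Count the pages viewed before the goal page; None if the goal was never reached."""
    return session.index(goal) if goal in session else None


def average_pages_before_goal(sessions, goal=GOAL_PAGE):
    """Average the count over the sessions that reached the goal; None if none did."""
    depths = [pages_before_goal(s, goal) for s in sessions]
    reached = [d for d in depths if d is not None]
    return mean(reached) if reached else None


print(average_pages_before_goal(SESSIONS))
```

The sessions that never reach the goal are worth counting separately, since a visitor who gives up tells you as much about your folder structure as one who slogs through.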

Which brings us to our final consideration–metadata. Remember metadata provides us with the functionality of the card catalog, only in the electronic world the card catalog is supercharged. Metadata gives us the ability to deliver useful results from search. If all you are working with is a document collection, then all a search is going to provide is a word search, and that has many limitations. For any particular search there is a limited number of preferred documents that should be returned. A brute-force word search offers no mechanism for ensuring the "correct" documents are returned by the search.

Search results can be controlled easily, but they must be controlled by human intervention. That intervention probably is a combination of entered metadata and weighted search results. I often hear business stakeholders ask for intelligent search, and I also often hear sales people tell me their tool will provide intelligent search. But remember I am one of the guys who know it truly is just 1s and 0s. Intelligent search really means nothing more than "return the results I want"–and that can be accomplished but only through human intervention.
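Here is roughly what that human intervention looks like in code: a brute-force word count, plus metadata keywords somebody entered, plus a weight an editor assigned. It is a toy sketch, not how any particular search product works. The documents, the keyword weight, and the boost values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

KEYWORD_WEIGHT = 5.0  # assumed: one metadata match counts as much as five body-text hits


@dataclass
class Document:
    title: str
    body: str
    keywords: set = field(default_factory=set)  # metadata entered by a person
    boost: float = 0.0                          # weight assigned by an editor


def score(doc, query):
    """Combine body-text hits, metadata hits, and the editor's boost into one score."""
    terms = query.lower().split()
    body_words = doc.body.lower().split()
    body_hits = sum(body_words.count(term) for term in terms)
    keyword_hits = sum(1 for term in terms if term in doc.keywords)
    if body_hits == 0 and keyword_hits == 0:
        return 0.0  # a boost alone never makes an unrelated document match
    return body_hits + KEYWORD_WEIGHT * keyword_hits + doc.boost


def search(docs, query):
    """Return the titles of matching documents, best score first."""
    scored = [(score(doc, query), doc.title) for doc in docs]
    return [title for s, title in sorted(scored, reverse=True) if s > 0]


DOCS = [
    Document("Claims newsletter", "claims claims claims update on claims volume"),
    Document("Claims form", "complete this form to report a loss", {"claims", "form"}, boost=2.0),
    Document("Holiday schedule", "office closing dates"),
]

print(search(DOCS, "claims"))  # ['Claims form', 'Claims newsletter']
```

A pure word search would rank the newsletter first because it repeats the word most often. The form wins only because a person tagged it and weighted it, and that is all the "intelligence" there is.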

So, the next time you are stuck in an endless meeting with some spin jockey bandying about a bunch of seemingly meaningless terms, take solace in the fact that you, too, can use taxonomy and governance in a way that really does make sense–and in a way that really will provide value to your organization.


© 2024 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.