FEB 17, 2012 8:51am ET


The Celestial Emporium of Benevolent Knowledge


Taxonomies are everywhere in information management, but they are hardly ever formally acknowledged and managed.

Most code tables (lookup tables) contain taxonomies - perhaps even the overwhelming majority of them. Yet we seem to have no consistent guidelines on how to design the taxonomies that will be used to populate these tables.

Also poorly understood is the difference between those who create taxonomies and those who have to actually use them. A case in point: Anyone confronted with filling in the average government form will likely come across a question where a choice must be selected from a list of alternatives that are difficult to interpret. No doubt the choices were intelligible to the designers of the form, but that is no guarantee that they can be understood by anyone obliged to complete the form.

Semantic Humor

It is unfortunate that little is taught about taxonomy, or many other aspects of traditional logic, in the educational systems of the West. However, it is possible to find some interesting articles scattered across the literature. One of them is an essay by Jorge Luis Borges entitled “The Analytical Language of John Wilkins.” In this piece Borges claims to present a taxonomy from a Chinese encyclopedia called “The Celestial Emporium of Benevolent Knowledge.” This is a hoax - no such encyclopedia ever existed. However, the taxonomy that Borges produces is rather interesting. It purports to classify animals as listed in Figure 1. (Please note that the original essay was in Spanish and several variants of the taxonomy exist in English. The table is my translation from the Spanish.)

What's Wrong with This?

Borges' taxonomy seems rather exotic and has sparked a lot of discussion in academic circles. Much of this centers on comparisons of Western and non-Western thought. For instance, Michel Foucault is quoted by Wikipedia as describing his reaction to the taxonomy “...that shattered, as I read the passage, all the familiar landmarks of thought — our thought, the thought that bears the stamp of our age and our geography...”

I disagree. If the academics who made such comments about Borges' taxonomy had bothered to look into a few of the code tables implemented in databases right here in the U.S., they would have discovered examples of taxonomies at least as bizarre. And they would have been told by the creators of these taxonomies just how logical and necessary they were. I remember one code table I reviewed that was for customer credit level and had entries for "Gold," "Silver," "Bronze," "Employee," and "Suspended." How could "Employee" and "Suspended" be in such a table? But they were, and program logic was built around them.

Even in publicly available taxonomies we can find issues that echo what we see in Borges. For instance, Figure 2 shows a classification of financial instruments taken from the venerable International Monetary Fund.

What is a Taxonomy?

If we are to criticize taxonomies - including what Borges produced - we had better know what a taxonomy is. This is difficult to find out because, as noted above, traditional logic, the subject that deals with taxonomies, is hardly taught in the West anymore. However, if we look into texts of traditional logic, we find that taxonomies are formed by breaking a generic concept (a "genus") into the more specific concepts ("species") that compose it. There are some rules about how this should be done.

For instance, "the basis of division must remain constant," and "the species must exhaust the genus." This is not the place to get into a treatise on traditional taxonomies, but rather to note that there is literature about how they should be created and governed.
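The two rules quoted above can be checked mechanically against a code table. The following is a minimal sketch in Python, not a prescribed method. It uses the customer credit level values described earlier. The "basis" label attached to each value, the function names, and the "Platinum" value in the usage note are all assumptions introduced purely for illustration.

```python
from collections import Counter

# Code values from the customer credit level table described earlier.
# The basis labels are an assumption added for illustration: each one names
# the question that the code value answers about a customer.
CREDIT_LEVEL_CODES = {
    "Gold": "credit tier",
    "Silver": "credit tier",
    "Bronze": "credit tier",
    "Employee": "relationship to company",
    "Suspended": "account status",
}


def bases_of_division(code_table):
    """Count how many code values rest on each basis of division."""
    return Counter(code_table.values())


def has_constant_basis(code_table):
    """True when every value divides the genus on the same basis."""
    return len(set(code_table.values())) <= 1


def uncovered_values(code_table, observed):
    """Return observed values, de-duplicated and sorted, that the table lacks.

    An empty list suggests the species exhaust the genus for this sample.
    """
    return sorted(set(observed) - set(code_table))
```

Here `has_constant_basis(CREDIT_LEVEL_CODES)` returns `False`, because the five values rest on three different bases. Calling `uncovered_values(CREDIT_LEVEL_CODES, ["Gold", "Platinum"])` returns `["Platinum"]`, flagging a hypothetical value in the data that the table does not cover. Assigning the basis labels is still a human judgment; the sketch only makes that judgment explicit and checkable.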

However, besides the traditional top-down method of forming taxonomies (properly called "logical division"), there is also the bottom-up process of classification. This can be used to group any number of objects according to any need we have. For instance, the 10 things I would take out of my house if it were on fire (e.g., the cat, my children's photographs, my PC, etc.) have nothing in common as such, other than my purpose of saving them from the fire. In classification we are not bound by the same rules as logical division. Rather, there is some common way in which we deal with the objects in a classification.
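One way to picture the difference is in how membership is recorded. In logical division, membership follows from an attribute of the thing itself. In a classification, members are simply listed under a stated purpose. The sketch below, in Python, models the house fire example that way. The class name, its fields, and the "the sofa" item in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """A bottom-up grouping: members are enumerated, and a purpose says why."""

    purpose: str
    members: set = field(default_factory=set)

    def add(self, item):
        """Place an item in the group; no shared attribute is required."""
        self.members.add(item)

    def includes(self, item):
        """Membership is decided by the list alone, not by any property."""
        return item in self.members


# The items named in the house fire example above.
rescue_list = Classification(purpose="save from a house fire")
for item in ("the cat", "my children's photographs", "my PC"):
    rescue_list.add(item)
```

With this structure, `rescue_list.includes("the cat")` is true and `rescue_list.includes("the sofa")` is false. Nothing about a cat or a PC explains the grouping; only the recorded purpose does, which is why a classification stored without its purpose is so hard to interpret later.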

Now, Borges' taxonomy does not make any sense from the top-down perspective. However, it might be more allowable as a classification if some purpose could be found for it. And perhaps that is what Borges is hinting at - not that the universe is so constituted that animals fall into his classification, but that some people have a reason (admittedly unknown) for grouping animals in this way. If we could find the reason, we would understand the taxonomy.

Unfortunately, in information management, taxonomies often seem to be little more than grab bags of concepts thrown together, neither dividing up a general concept, nor aggregating distinct concepts for a specific purpose. Hopefully, as semantics progresses, we will see a lot more clarity brought to bear in this area.

Malcolm Chisholm, Ph.D., has over 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules. His experience includes the financial, manufacturing, government, and pharmaceutical industries. He is the author of How to Build a Business Rules Engine, Managing Reference Data in Enterprise Databases, and Definition in Information Management. He writes numerous articles and is a frequent presenter on these topics at industry events. Chisholm runs the websites http://www.bizrulesengine.com, http://www.refdataportal.com and http://www.data-definition.com. Chisholm is the winner of the 2011 DAMA International Achievement Award.


Comments (8)
Malcolm, this article should be required reading for developers, data modellers and yes taxonomists everywhere. If you need another example of the disconnect between what, for example, a business understands and a modeller delivers look at 'types'. The carelessness and utter disregard for business semantics that people display while employing it to describe a domain of discourse is destructive at best.

Classification happens everywhere in the digital world - no exceptions. What the "classifiers" fail to understand is the semantic disconnects they create (and ambiguity, redundancy and generally higher level of cost and risk) when they haphazardly toss objects, entities and relations into a bucket without thinking.

If businesses want to know the primary contributory factor to our current digital landfills they can read this article.

Kudos!

Posted by John O | Friday, February 17 2012 at 11:37AM ET
My son is a history scholar (gaining stature: won this year's AHA award for best undergraduate paper), a student of Foucault, and was familiar with the Borges essay as well. Isn't often he and I get to discuss each other's disciplines in a common forum! Your article provided a unique opportunity to deepen the father-son bond, as well as to key him into the possibility that interesting opportunities might be available to him in IT if he can't make a living in academia (a nagging concern of his parents). Thanks...
Posted by | Friday, February 17 2012 at 11:50AM ET