Controlled vocabulary is a concept in computer science and computer programming that involves using only previously agreed-upon or approved terms when constructing relational databases, searchable metadata or other systems in which human-readable words mark information for later retrieval. Classifying information with a controlled vocabulary stands in direct contrast to using a natural language vocabulary, in which there are no agreed-upon terms and the words that are used are instead connected by weighted relations. In addition to the top-level words in a controlled vocabulary, supporting words can be defined so that synonyms or other terms strongly associated with a top-level term trigger use of that top-level word. Natural language systems and controlled vocabulary systems are mainly compared on three measures: the relevance of the results a query returns, the volume of information returned, and the overall usability of the system.
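The relationship between top-level words and supporting words can be sketched as a simple lookup table. The example below is a minimal illustration rather than a standard implementation; the vocabulary entries and the names VOCABULARY, LOOKUP and to_preferred are assumptions made up for this sketch.

```python
# Hypothetical vocabulary: each top-level (preferred) term lists its supporting terms.
VOCABULARY = {
    "cars": {"car", "automobile", "automobiles", "motorcar"},
    "aircraft": {"airplane", "airplanes", "plane", "planes"},
}

# Invert the vocabulary so that every supporting term points at its top-level term.
LOOKUP = {}
for preferred, supporting in VOCABULARY.items():
    LOOKUP[preferred] = preferred
    for term in supporting:
        LOOKUP[term] = preferred


def to_preferred(term):
    """Return the top-level term for `term`, or None if it is not in the vocabulary."""
    return LOOKUP.get(term.strip().lower())


print(to_preferred("Automobiles"))  # a supporting term resolves to "cars"
print(to_preferred("boat"))         # an unknown term resolves to None
```

Returning None for an unknown word, instead of guessing, leaves the decision about new terms to whoever maintains the vocabulary.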
There are many instances in which a collection of words or terms is used to make information that is arbitrary, constantly changing or disorganized more accessible to users. Search terms within an Internet search engine, a corporate information database, and even a digital research library are all examples of applications in which information can be categorized with metadata terms rather than with a strict hierarchical structure. The words used to describe an object in such situations build a kind of searchable index of the larger pool of information.
One example of the use of controlled vocabulary can be seen when considering a filing system for a company. Files must be categorized in such a way that they are easily and predictably retrievable. If one file deals with cars, then it could be filed under the category "cars". Without a controlled vocabulary, another person who also has a file that deals with cars might place it under the heading "automobiles", making the two files difficult to find with a single search. When the categories are controlled, all files dealing with cars are placed under a single agreed-upon heading.
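The filing example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names, the HEADINGS table and the function file_documents are hypothetical and do not come from any particular filing system.

```python
from collections import defaultdict

# Hypothetical table mapping each label a person might use to the agreed-upon heading.
HEADINGS = {
    "cars": "cars",
    "car": "cars",
    "automobiles": "cars",
    "automobile": "cars",
}


def file_documents(documents, controlled):
    """Group (filename, label) pairs under headings.

    With controlled=True, each label is replaced by its agreed-upon heading;
    a label missing from HEADINGS raises KeyError so that a person can review it.
    With controlled=False, each label is used as the heading unchanged.
    """
    index = defaultdict(list)
    for filename, label in documents:
        key = label.lower()
        heading = HEADINGS[key] if controlled else key
        index[heading].append(filename)
    return dict(index)


# Two hypothetical files about cars, labeled differently by two people.
documents = [("fleet_report.txt", "cars"), ("lease_terms.txt", "automobiles")]

uncontrolled = file_documents(documents, controlled=False)
controlled = file_documents(documents, controlled=True)

print(uncontrolled)  # the two files end up under separate headings
print(controlled)    # both files end up under the single heading "cars"
```

In the uncontrolled index, a search under "cars" finds only one of the two files; in the controlled index, the same search finds both.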
The benefit of using a controlled vocabulary is that information is strictly described in a predictable way. This means that anyone who is aware of the vocabulary will be able to search for information effectively and accurately. A complication, however, is that the descriptive terms are more difficult, if not impossible, to generate automatically and usually require some human intervention, which makes converting an existing database to a controlled vocabulary a large task. If the vocabulary is not large enough, each term covers too much material, and a single query can bring up such a large volume of information that it becomes impractical to sort without the use of another querying method.