
What is a controlled vocabulary?

A controlled vocabulary is defined as "an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing and searching." It is applicable across various industries that manage information, including academic research, libraries, corporations, and governmental organizations. Common forms of controlled vocabularies include term lists, authority files, and thesauri. Essentially, a controlled vocabulary functions as a structured database of preferred terms that consolidates all terms and phrases representing a concept, typically organized in alphabetical order. This systematic approach enhances the efficiency of information retrieval and ensures consistency in terminology usage across different contexts.

A controlled vocabulary designates preferred terms or phrases for use in surrogate records within retrieval tools, such as library catalogs, while non-preferred terms reference the chosen terms and establish relationships among them. Catalogers or indexers must select terms from this vocabulary when assigning subject headings in bibliographic records to accurately represent the subject matter. Controlled vocabularies facilitate the organization of knowledge for effective retrieval and are employed in various indexing schemes, thesauri, and taxonomies. Unlike natural language vocabularies, controlled vocabularies necessitate the use of predefined, authorized terms selected by the designers of the scheme, ensuring consistency and precision in information retrieval.

Controlled vocabularies typically include: a preferred term for each concept, synonyms, qualifiers that disambiguate homographs and homonyms, relationships among terms, and cross-references.
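As a rough illustration of these components, the sketch below models a single vocabulary entry in Python. The class name, field names, and sample terms are assumptions made for this example and do not come from any particular standard or system.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyEntry:
    """One concept in a hypothetical controlled vocabulary."""

    preferred_term: str
    qualifier: str = ""  # disambiguates homographs, e.g. "planet" vs "element"
    variants: list[str] = field(default_factory=list)   # synonyms / non-preferred terms
    broader: list[str] = field(default_factory=list)    # broader concepts
    narrower: list[str] = field(default_factory=list)   # narrower concepts
    related: list[str] = field(default_factory=list)    # associative cross-references

    @property
    def heading(self) -> str:
        """Preferred term followed by its qualifier, if one is set."""
        if self.qualifier:
            return f"{self.preferred_term} ({self.qualifier})"
        return self.preferred_term


# Two homographs kept apart by their qualifiers (illustrative data).
mercury_planet = VocabularyEntry("Mercury", qualifier="planet", broader=["Planets"])
mercury_element = VocabularyEntry(
    "Mercury",
    qualifier="element",
    variants=["Quicksilver", "Hg"],
    broader=["Metals"],
)

print(mercury_planet.heading)   # Mercury (planet)
print(mercury_element.heading)  # Mercury (element)
```

The qualifier is what lets two entries share the same word while remaining distinct headings.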

What is the purpose of a controlled vocabulary?

The primary objective of vocabulary control is to ensure consistent representation and effective searching of subject matter. This is achieved by managing synonymous terms, distinguishing between homographs and homonyms, and connecting related terms through broader, narrower, and related categories. A controlled vocabulary comprises authorized terms, commonly referred to as subject headings, descriptors, or index terms, that make resource descriptions consistent and unambiguous. Synonymous terms are grouped under a single preferred term to facilitate collocation, and relationships among terms create a structured network of concepts; for example, identifying that an astrocyte is a type of cell.

Types of controlled vocabularies

Controlled vocabularies range from simple term lists to complex machine-readable ontologies. Knowing several commonly used controlled vocabularies, along with their types and scopes, helps when selecting appropriate vocabularies for a project.

Term Lists

A term list, also referred to as a pick list, represents a fundamental type of controlled vocabulary comprising an agreed-upon collection of words or phrases utilized to identify specific characteristics of various entities. These lists exclude synonyms or related terms and are most effective when the number of terms is limited, as seen in the case of file formats or object types. Controlled vocabularies can manifest in various forms, with simple term lists typically arranged alphabetically or in a logically discernible order, without an emphasis on semantic relationships. Examples of simple term lists include alphabetical compilations of geographic areas or languages, which can be implemented as pull-down menus in cataloging systems to enhance user accessibility. It is noteworthy that simple term lists do not necessitate a hierarchical structure if they remain concise and intuitive for navigation.
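A term list of this kind can be sketched as a fixed set of allowed values plus a check that rejects anything else. The list contents and the function name `validate_term` below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical pick list of file formats; the values are illustrative.
FILE_FORMATS = ("JPEG", "PDF", "PNG", "TIFF")


def validate_term(value: str, term_list: tuple[str, ...] = FILE_FORMATS) -> str:
    """Return the authorized term matching `value`.

    Matching ignores case and surrounding whitespace. A ValueError is raised
    when the value is not on the list.
    """
    cleaned = value.strip().casefold()
    for term in term_list:
        if term.casefold() == cleaned:
            return term
    raise ValueError(
        f"{value!r} is not an authorized term; choose one of {list(term_list)}"
    )


print(validate_term(" tiff "))  # TIFF
```

Because a plain term list carries no synonyms, a value such as "JPG" is rejected here rather than mapped to "JPEG"; handling variants is the job of the richer vocabulary types.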

Authority Files

The authority file represents the next level of complexity in controlled vocabularies. It provides a consistent list of terms for resource description and adds cross-references from variant or alternate terms. These files frequently contain contextual or biographical information to assist users in disambiguation, and they present preferred terms alongside alternate versions for metadata creators. Moreover, indexing alternate and variant terms within databases significantly enhances resource discoverability, even for users who may not be familiar with the preferred terminology. Authority files are especially beneficial for establishing the preferred forms of proper names.
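The cross-referencing described above can be sketched as a lookup table that maps every variant form to its preferred form. The record layout, the function names, and the sample entry are assumptions for illustration and are not copied from any real authority file.

```python
from typing import Optional

# Illustrative authority record: preferred form -> variants plus a contextual note.
AUTHORITY_FILE = {
    "Twain, Mark, 1835-1910": {
        "variants": ["Clemens, Samuel Langhorne", "Mark Twain"],
        "note": "American author and humorist",
    },
}


def build_lookup(authority_file: dict) -> dict[str, str]:
    """Index preferred and variant forms, case-insensitively, to the preferred form."""
    lookup = {}
    for preferred, record in authority_file.items():
        lookup[preferred.casefold()] = preferred
        for variant in record["variants"]:
            lookup[variant.casefold()] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def resolve(name: str) -> Optional[str]:
    """Return the preferred form for `name`, or None if the name is unknown."""
    return LOOKUP.get(name.strip().casefold())


print(resolve("clemens, samuel langhorne"))  # Twain, Mark, 1835-1910
```

Indexing the variants is what allows a searcher who only knows one form of a name to reach records filed under the preferred form.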

Taxonomies

In recent years, the term "taxonomy" has evolved into a generic descriptor for controlled vocabulary, particularly within business contexts. In information science, a taxonomy is defined as a hierarchical classification system in which all terms belong to a single structure, characterized by parent/child or broader/narrower relationships. This system enables classification according to a predetermined framework. In contrast to authority files, which contain only preferred and variant terms, taxonomies incorporate a hierarchy that specifies both broader and narrower terms.
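A minimal sketch of such a hierarchy, assuming each term has exactly one parent, is a child-to-parent mapping that can be walked upward for broader terms or downward for narrower ones. The terms extend the astrocyte example; the function names are hypothetical.

```python
# child -> parent; every term has one parent, so the terms form a single tree.
PARENT = {
    "cell": None,
    "glial cell": "cell",
    "neuron": "cell",
    "astrocyte": "glial cell",
    "oligodendrocyte": "glial cell",
}


def broader_terms(term: str) -> list[str]:
    """Return the chain of broader terms, from the immediate parent up to the root."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def narrower_terms(term: str) -> list[str]:
    """Return every term below `term`, at any depth, in alphabetical order."""
    found = []
    for child, parent in PARENT.items():
        if parent == term:
            found.append(child)
            found.extend(narrower_terms(child))
    return sorted(found)


print(broader_terms("astrocyte"))    # ['glial cell', 'cell']
print(narrower_terms("glial cell"))  # ['astrocyte', 'oligodendrocyte']
```

With this structure, a record classified under "astrocyte" can also be retrieved by a search on "glial cell" or "cell".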

Thesauri

A thesaurus serves as a specialized dictionary that categorizes concepts within a specific domain, assigning each concept a preferred term. It encompasses preferred terms, variant terms, as well as broader and narrower terms, in addition to related terms that may not conform to the same hierarchy. A notable example is the Getty Art & Architecture Thesaurus, which provides entries such as "cellular telephones" and includes alternate terms in various languages, definitions, and hierarchical classifications. This resource is invaluable for researchers and professionals seeking precise terminology in the fields of art and architecture.
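The combination of variant, hierarchical, and associative relationships can be sketched as follows. The class and method names are hypothetical, and the relationships attached to "cellular telephones" are invented for illustration; they are not taken from the Art & Architecture Thesaurus.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus sketch with UF (used for), BT, NT and RT relationships."""

    def __init__(self):
        self.used_for = defaultdict(set)  # preferred term -> variant terms
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)

    def add_variant(self, preferred: str, variant: str) -> None:
        self.used_for[preferred].add(variant)

    def add_broader(self, term: str, broader_term: str) -> None:
        # BT and NT are reciprocal: recording one records the other.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, term: str, other: str) -> None:
        # RT is symmetric and sits outside the hierarchy.
        self.related[term].add(other)
        self.related[other].add(term)

    def entry(self, term: str) -> dict[str, list[str]]:
        """Return all relationships recorded for `term`, sorted for display."""
        return {
            "UF": sorted(self.used_for.get(term, set())),
            "BT": sorted(self.broader.get(term, set())),
            "NT": sorted(self.narrower.get(term, set())),
            "RT": sorted(self.related.get(term, set())),
        }


# Invented relationships, for illustration only.
thesaurus = Thesaurus()
thesaurus.add_variant("cellular telephones", "cell phones")
thesaurus.add_variant("cellular telephones", "mobile phones")
thesaurus.add_broader("cellular telephones", "telephones")
thesaurus.add_related("cellular telephones", "pagers")

print(thesaurus.entry("cellular telephones"))
```

Storing the reciprocal side of each relationship at insertion time keeps the network consistent, so the entry for "telephones" lists "cellular telephones" as a narrower term without a separate step.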

Why not keywords?

Searching by keyword alone can result in various semantic challenges. Synonyms scatter resources on the same topic across different search terms, while many words have multiple meanings and can serve as different parts of speech, which most search systems find difficult to differentiate. Furthermore, keywords carry no contextual relationships, so users cannot see how one term connects to another. Searchers must also think of every possible term for a concept to find all relevant materials, which requires a comprehensive understanding of the subject area.
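The synonym problem can be made concrete with a small comparison between a keyword search over titles and a search that first maps the query to a preferred heading. The documents, headings, and function names below are hypothetical.

```python
# Hypothetical collection: each record has a free-text title and assigned subject headings.
DOCUMENTS = [
    {"id": 1, "title": "Repairing vintage automobiles", "subjects": ["Automobiles"]},
    {"id": 2, "title": "A history of motor cars in Britain", "subjects": ["Automobiles"]},
    {"id": 3, "title": "Car maintenance for beginners", "subjects": ["Automobiles"]},
    {"id": 4, "title": "Railway carriages of the 1920s", "subjects": ["Railroad cars"]},
]

# Hypothetical entry vocabulary: any term a searcher might type -> preferred heading.
ENTRY_TERMS = {
    "automobiles": "Automobiles",
    "car": "Automobiles",
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "railroad cars": "Railroad cars",
    "railway carriages": "Railroad cars",
}


def keyword_search(query: str) -> list[int]:
    """Match the query against individual title words only."""
    word = query.casefold()
    return [d["id"] for d in DOCUMENTS if word in d["title"].casefold().split()]


def subject_search(query: str) -> list[int]:
    """Map the query to its preferred heading, then match assigned subjects."""
    heading = ENTRY_TERMS.get(query.strip().casefold())
    if heading is None:
        return []
    return [d["id"] for d in DOCUMENTS if heading in d["subjects"]]


print(keyword_search("cars"))  # [2]
print(subject_search("cars"))  # [1, 2, 3]
```

The keyword search finds only the one title containing the exact word, while the subject search collocates all three records under the preferred heading regardless of the wording in each title.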

What terms should be in a controlled vocabulary?

The terms included in a controlled vocabulary should be carefully selected to ensure they are relevant, precise, and useful for the intended purpose. Here are some key considerations for determining which terms to include: