CNI White Paper on Networked Information Discovery and Retrieval
Outline: Chapter 3
Description and Metadata to Support Current NIDR Processes and Goals
Part I. Traditional forms of metadata information for NIDR
- self description by extraction and its limitations
- archie, veronica, etc.
- HTML extraction (webcrawlers)
- SGML and DTDs; the use of the TEI header; linking semantics to DTDs.
- the descriptive cataloging tradition
- MARC practices; the 856 field and MARC as surrogate
- TOPNODE, GILS, OCLC Project
- issues around authority files, controlled vocabularies, thesauri, etc.; problems of mixing controlled and uncontrolled vocabularies
- simplified cataloging: RFC 1357, the "Dublin dozen" data elements
- combining multiple descriptive cataloging schemes
- attribute and data element mappings and hierarchies
- automatic indexing (IR)
- Essence project
- linkage with authority files (Gypsy); people and locations
- automatic type recognition of objects and use of heuristics
- limitations imposed by file systems without object typing
- nontextual media issues -- images, sound and video
- describing compound and aggregate objects and information spaces: how do you describe a database or other information space? Newsgroups as information spaces.
Part II. Enriching current NIDR with new types of metadata information
- usage data, citation data (links); self description by use.
- nondescriptive information from human beings: reviews, bibliographies, pathfinders, seals of approval, etc. The extent to which these are part of the "intellectual" infrastructure that supports NIDR processes, as opposed to being directly integrated with NIDR systems.
- the re-use of "management" metadata in retrieval processes: use in cache management based on access patterns; use in retrieval (expiration dates, regeneration schedules, classification and embargo). To what extent does pure management metadata exist?