Terry Smith
Identifying important research issues concerning the development of digital libraries requires a focused discussion. A useful focus is provided by defining the essential nature of a digital library and by restricting discussion to special classes of collections. A particularly useful focus emerges from a consideration of multimedia collections, since digital technology offers powerful techniques for handling queries involving heterogeneous collections of such materials.
It is helpful to recast the question "What is a Digital Library?" into a set of simpler questions. In particular, one may ask: "What is a library?"
"What is the role of a librarian in a library?" "In what ways does digital technology extend and enhance the nature of a library and the role of a librarian?"
At its heart, a library is a collection of items containing representations of information with some intended meaning. The single most important property characterizing whether a collection of informational items belongs to a library is that the collection is organized and managed in a manner that optimizes access to the information for a given class of users. In particular, the organization and management of a library's collections should facilitate the processing of the information in the items and the extraction of useful knowledge represented either explicitly or implicitly in the items.
The major role of a librarian is in organizing and managing the library's collections, and in facilitating the communication of the information between the library and its users. In a "traditional" library, such management and organization involves the creation of catalog information facilitating access to appropriate items in the collections. Cataloging information essentially provides a mapping between the items and abbreviated representations of the items and their content. The management and organization also involves a physical organization of items that accords with the cataloging procedures. It should be noted that important metadata associated with the items in a library's collections concerns the "authenticity" and "validity" of items. Such information may be implicit in the fact that a librarian has decided to add an item to a library's collections.
A digital library may be viewed as a library that has been extended and enhanced by the application of digital technology. Important aspects of a library that may be extended and enhanced include:
In discussing the extensions and enhancements for the four aspects of libraries that may be supported by digital technology, we first provide examples of key extensions, a brief overview of important issues associated with such extensions, and a list of research problems germane to these issues.
Providing digital representations of library items, with all the attendant advantages of such representations, is clearly a major enhancement. Digital technology, however, also permits the extension of library collections into new domains. We briefly discuss five examples of such domains and a few associated issues.
"User-centered" collections involve the construction of personalized collections by users and may involve, for example, the reorganization of parts of existing items into new items, as well as their extension with various annotations. A critical issue raised by this possibility relates to the procedures by which certain classes of items are "authenticated" as being part of a "core" library. In general, it appears important that such a core collection be identified in terms of certain admissibility criteria.
Multimedia collections may involve digital representations of
An important issue relating to the items of such collections is that they may require significant levels of intermediate processing or interpretation, such as image or acoustical signal processing, that are not required for collections of traditional textual materials. A second important issue relates to integration of such materials. A relatively simple example demonstrating the need both for intermediate processing and for integration is provided by the following query:
Find all quadsheets containing towns with over half a million inhabitants in the Mississippi Valley that are within 50 miles of Indian burial sites for which the library has digitized photographic records dating from the last century.
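A minimal sketch of how such a query might be decomposed once the intermediate processing has already produced structured records. Everything here is an assumption made for illustration: the record types `Town`, `BurialSite`, and `Quadsheet`, the use of simple latitude/longitude bounding boxes for quadsheets, and the great-circle distance test are hypothetical and do not describe any particular library system.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Town:  # hypothetical record extracted from map or gazetteer items
    name: str
    lat: float
    lon: float
    population: int
    region: str


@dataclass(frozen=True)
class BurialSite:  # hypothetical record for a site with photographic holdings
    name: str
    lat: float
    lon: float
    photo_years: tuple  # years of the digitized photographs held for this site


@dataclass(frozen=True)
class Quadsheet:  # hypothetical map sheet described by its bounding box
    sheet_id: str
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    p1, p2 = radians(lat1), radians(lat2)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def find_quadsheets(sheets, towns, sites, region, min_pop, max_miles, first_year, last_year):
    """Integrate map, gazetteer, and photographic metadata to answer the query."""
    # Step 1: sites with at least one digitized photograph in the requested period.
    photographed = [
        s for s in sites if any(first_year <= y <= last_year for y in s.photo_years)
    ]
    # Step 2: large towns in the region that lie close to one of those sites.
    matching_towns = [
        t for t in towns
        if t.region == region
        and t.population > min_pop
        and any(miles_between(t.lat, t.lon, s.lat, s.lon) <= max_miles for s in photographed)
    ]
    # Step 3: quadsheets whose extent contains at least one matching town.
    return [q.sheet_id for q in sheets if any(q.contains(t.lat, t.lon) for t in matching_towns)]
```

The sketch treats each clause of the query as a filter over a different kind of item, which is where the integration problem shows up: the photographic records, the population figures, and the sheet boundaries would normally come from separate collections with separate metadata.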
A third important issue arising in multimedia collections concerns the need to construct, store, access, and process multiple concrete representations of items. For example, different representations of digitized map information currently exist, with some favoring given information processing operations more than others. A fourth issue concerns metadata about the "lineage" of the information contained in some digital object, since it may embody a complex history of information processing applied to material from a variety of different sources. Lineage issues are frequently important in determining the value of information in certain applications.
"Procedural" collections involve collections of information-processing operations that may be applied by users or librarians in order to extract information from other library items. Multimedia materials represented in digital form may require the application of a large variety of procedures in order to extract the information required by users. An important issue is how libraries should support, organize, and manage procedural information. A related issue concerns "dynamic" collections, by which we mean collections that grow as information is extracted from other items already residing in a library's collections. Such information may be extracted by various procedures stored as library items and applied in some "automated" manner. Finally, we mention "knowledge bases," which we may interpret to be representations of large domains of knowledge, such as those contained in online encyclopedias. Such knowledge bases, when viewed as "ontologies," may be used both as a basis for metadata in a library's catalog and as an information source enhancing a user's access to the information in other library items. An important issue is the construction and maintenance of such knowledge bases.
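The interplay between procedural and dynamic collections can be sketched as follows. This is an illustrative assumption rather than a proposed design: the `Item` record, the `grow_collection` function, and the convention of naming derived items after their source and procedure are all hypothetical. Each derived item carries a lineage list recording which procedure was applied to which source item.

```python
from dataclasses import dataclass, field


@dataclass
class Item:  # hypothetical library item
    item_id: str
    content: str
    # Lineage as (procedure name, source item id) pairs, oldest first.
    lineage: list = field(default_factory=list)


def grow_collection(items, procedures):
    """Apply every stored procedure to every item and return the derived items.

    `procedures` maps a procedure name to a callable that takes an item's
    content and returns derived content, or None when nothing can be extracted.
    """
    derived = []
    for item in items:
        for name, procedure in procedures.items():
            result = procedure(item.content)
            if result is None:
                continue  # this procedure extracted nothing from this item
            derived.append(
                Item(
                    item_id=f"{item.item_id}/{name}",
                    content=result,
                    lineage=item.lineage + [(name, item.item_id)],
                )
            )
    return derived
```

Adding the returned items back into the collection is what would make it "dynamic" in the sense used above; whether such derived items belong to the authenticated core collection is a separate policy question.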
A few of the many important research questions that relate to the issues identified above include:
If there is one single criterion that defines a library, it concerns the generally accepted procedures and protocols for organizing and managing the collections. An important enhancement that is provided by digital technology is the possibility of presenting to the user many alternative ways of viewing any subset of the library's collection. Such dynamic "reorganizations" may be based on the different ways in which library items may be indexed in the catalog according to various criteria that include the medium and format of the item as well as many aspects of its content. In the context of digital libraries, such dynamic reorganizations of the catalog may be viewed as providing the user with a variety of different browsing contexts, as if the items had been reorganized on the "stacks." Important issues relate to the set of organizing principles for a library's collection that are supported by a library and the ways of making such multi-organizations interoperable between libraries.
A second and related enhancement involves the great variety of metadata that it is possible to extract, store, and retrieve about items and the various organizations that may be imposed on a library's collections using such metadata. It is useful to categorize metadata according to whether it is domain-dependent or domain-independent and whether, in the latter case, it is related to "low-level" content or to the origin and representational aspects of the item. Such distinctions are very important in the case of multimedia libraries. Items such as images and maps, for example, have huge numbers of interpretations concerning their content, and require significant extensions of traditional cataloging practices in order to characterize them in a manner that is useful for user access and browsing. An enhancement to traditional libraries that arises in relation to metadata involves the variety of "annotations" that may be stored in association with items. Critical issues that arise concern the classes of metadata to be extracted from library items of different types and the procedures for extracting such metadata.
At the highest level, metadata may be based on "ontologies," or organized sets of concepts concerning both the representational aspects of library items and their content. Ontologies provide the basis for the many indexing schemes that are possible. For example, ontologies that relate to the origin, lineage, format, and representational aspects of library items may be viewed as extensions of the "author catalog," and are important for representing catalog information about the many classes of non-standard items that may occur in multimedia collections and about the multiple representations of such items. Ontologies that relate to the content of library items may be viewed as extensions of the "subject catalog" of traditional libraries. As an example, an important class of ontologies concerning geographical objects provides bases for indexing schemes for library items in terms of the "spatial projection" of the objects to which the items refer. Ontologies may be multiple, overlapping, hierarchically organized, and amenable to object-oriented representations. In particular, they may be used to define "sublibraries." Important issues concern the construction and use of various ontologies and the interoperability of libraries with respect to such ontologies.
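One way to picture how a hierarchically organized ontology might define a sublibrary is sketched below. The `Ontology` class, the `sublibrary` function, and the representation of the catalog as a mapping from item identifiers to concept names are assumptions introduced for illustration only. A concept may have more than one broader concept, which is one simple reading of "overlapping."

```python
from collections import defaultdict


class Ontology:
    """A hypothetical concept hierarchy: each concept maps to its narrower concepts."""

    def __init__(self):
        self._children = defaultdict(set)

    def add_concept(self, concept, parent=None):
        self._children[concept]  # touching the key registers the concept
        if parent is not None:
            self._children[parent].add(concept)

    def descendants(self, concept):
        """Return the concept together with every narrower concept beneath it."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited via another broader concept
            seen.add(current)
            stack.extend(self._children.get(current, ()))
        return seen


def sublibrary(catalog, ontology, concept):
    """Items indexed under `concept` or under any concept narrower than it.

    `catalog` maps an item identifier to the concepts under which it is indexed.
    """
    wanted = ontology.descendants(concept)
    return sorted(item for item, concepts in catalog.items() if wanted & set(concepts))
```

Under this reading, a sublibrary is not a separate physical collection but a view computed from the catalog, so the same item can belong to several sublibraries at once.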
Important general issues concern the role of the librarian in constructing the extended metadata and catalogs that are required, as well as issues relating to standards and protocols for metadata extraction, organization, and access.
A few of the research problems relating to the issues identified above include:
The extensions and enhancements that digital technology offers in the area of library access relate to the universality and ubiquity of the access, the nature of the information accessed, the large variety of means for accessing information, and the extraction of knowledge with the use of procedures that convert information in implicit form to information in explicit form.
Accessible information includes the extended metadata discussed previously, as well as information about other libraries, their catalogs, their collections, and their items. Digital technology makes it possible to access information by any of the extensions to metadata that were discussed above, including access by medium, structure, and content.
Important issues in relation to finding digital objects in integrated, multimedia digital libraries include query languages and support for complex query construction, particularly in the case of complex queries that require synthesis and the application of transformations. Perhaps the single most important criterion is the ability to search content. Other issues include the translation of queries into domain-independent and domain-dependent metadata, the use of similarity matching in answering queries, multidimensional indexing, and the correlation of multimedia items with the use of information concerning the content of the items.
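As a small illustration of similarity matching in answering queries, the sketch below ranks items by the cosine similarity between a query vector and per-item feature vectors. The choice of cosine similarity, the function names, and the assumption that each item has already been reduced to a fixed-length numeric vector (for example by image or signal processing) are all illustrative; other similarity measures and index structures would be equally plausible.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as matching nothing


def rank_by_similarity(query_vector, item_vectors, top_k=3):
    """Return the identifiers of the `top_k` items most similar to the query.

    `item_vectors` maps an item identifier to its feature vector.
    """
    scored = [
        (cosine_similarity(query_vector, vector), item_id)
        for item_id, vector in item_vectors.items()
    ]
    # Highest similarity first; ties are broken by item identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_k]]
```

This linear scan compares the query with every item; the multidimensional indexing mentioned above is what would be needed to avoid that cost in a large collection.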
A few of the many important research problems that relate to the issues identified above include:
Communications between users and libraries and among libraries themselves are major aspects that assume critical importance in digital extensions of libraries.
Important issues that arise from the preceding discussion involve protocols and standards for data and metadata, high-level languages for users and librarians, and low-level protocols to support library interoperability.