Digital Libraries Research Agenda Report

Executive Summary

Interoperability, Scaling, and the Digital Libraries Research Agenda:

A Report on the May 18-19, 1995
IITA Digital Libraries Workshop

I. Infrastructure
II. Research Agendas and Priorities
A. Research in Digital Library Interoperability
B. Research in Describing Objects and Repositories
C. Research in Collection Management and Organization
D. Research in User Interfaces and Human-Computer Interaction
E. Economic, Social, and Legal Research Issues
III. Scaling of Digital Library Experiments

This report summarizes the results of a workshop on Digital Libraries held under the auspices of the U.S. Government's Information Infrastructure Technology and Applications (IITA) Working Group in Reston, Virginia, on May 18-19, 1995. The objective of the workshop was to refine the research agenda for digital libraries, with specific emphasis on issues of scaling and interoperability, and to identify the infrastructure developments needed to make progress on these issues.

Key Findings

I. Infrastructure

In the near term, investment is needed to support both infrastructure development and research in the following areas, with emphasis on rapid deployment of:
A. Common schemes for the naming of digital objects, and the linking of these schemes to protocols for object transmission, metadata, and object type classifications, are essential. Naming schemes that allow globally unique reference are one of the most important infrastructural components for supporting large-scale development of digital library systems, and resource sharing and interoperation among these systems.
B. A deployed public key cryptosystem infrastructure, including key servers and appropriate standards, is essential to progress in digital libraries. It is needed for authentication, privacy, rights management, and payment for the use of intellectual property. Without these services, commercial publishers will remain reluctant to participate in digital library development in the Internet environment.

II. Research Agendas and Priorities

A specific research agenda for digital libraries continues to evolve. Five key areas were identified. The "grand challenge" is interoperability at a deep semantic level, providing digital library users with a coherent view of heterogeneous, autonomously managed resources, and many research priorities relate to this long-term objective. Other key themes involve the relationship between traditional library missions and practices and digital libraries, and the ways in which traditional library functions will migrate to the digital library environment.

A. Research in Digital Library Interoperability

There is a spectrum of interoperability goals, ranging from those that can be achieved in the near term to longer-term "grand challenge" objectives. At the near-term end of the spectrum is the use of common tools and interfaces that provide a superficial uniformity for navigation and access, but rely almost entirely on human intelligence to provide any coherence of content. At the opposite end of the spectrum is deep semantic interoperability: the ability of a user to access, consistently and coherently, similar (though autonomously defined and managed) classes of digital objects and services, distributed across heterogeneous repositories, with federating or mediating software compensating for site-by-site variations. Deep semantic interoperability also extends beyond passive digital objects to the actual services offered by specific digital library systems. An intermediate position between these two extremes advocates primarily syntactic interoperability (the interchange of metadata, and the use of digital object transmission protocols and formats based on this metadata, rather than simply common navigation, query, and viewing interfaces) as a means of providing limited coherence of content, supplemented by human interpretation.

Defining levels of interoperability, and the challenges of achieving each, is itself a key research problem. More technical research questions involve protocol design that supports a broad range of interaction types, inter-repository protocols, distributed search protocols and technologies (including the ability to search across heterogeneous databases with some level of semantic consistency), and object interchange protocols. Interoperability is not simply a matter of providing coherence among passive object repositories. Digital library systems offer a range of services, and these services must be projected in an interoperable fashion as well. Existing Internet protocols (such as HTTP, the basis of the World Wide Web) are clearly inadequate for this purpose.

Research must move beyond the current base of deployed protocols and systems. This raises complex questions about how to deploy prototype systems, and about the tradeoffs between advanced capabilities and ubiquity of access. Finally, users in the networked environment will have access to personal, workgroup, organizational, and public information spaces. Digital libraries can exist at multiple levels in this hierarchy. Users will demand a coherent view across these multiple information spaces, and controlled interoperability among them.
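To make the role of "federating or mediating software" more concrete, the following is a minimal sketch, not a design proposed at the workshop, of a mediator that renames each repository's own metadata fields onto a common set of attributes before merging results. The repository names, field names, the `CROSSWALKS` table, and the `normalize` and `mediate` functions are all hypothetical, and the sketch assumes queries have already been sent to each repository; it shows only the mediation step. A field-renaming table of this kind achieves syntactic interoperability at most.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical crosswalks: each repository's own field names -> common attribute names.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "repo_a": {"ti": "title", "au": "creator", "yr": "date"},
    "repo_b": {"Title": "title", "Author": "creator", "PubYear": "date"},
}


def normalize(repo: str, record: Record) -> Record:
    """Rename site-specific fields to common attributes; unmapped fields are dropped."""
    mapping = CROSSWALKS[repo]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def mediate(results: Dict[str, List[Record]], creator: str) -> List[Record]:
    """Normalize each repository's records, then filter on a common attribute."""
    merged: List[Record] = []
    for repo, records in results.items():
        for record in records:
            common = normalize(repo, record)
            if common.get("creator") == creator:
                common["source"] = repo  # keep provenance so users can see where a hit came from
                merged.append(common)
    return merged


if __name__ == "__main__":
    sample = {
        "repo_a": [{"ti": "Scaling Study", "au": "Smith", "yr": "1995"}],
        "repo_b": [
            {"Title": "Naming Objects", "Author": "Smith", "PubYear": "1994"},
            {"Title": "Other Work", "Author": "Jones", "PubYear": "1993"},
        ],
    }
    for hit in mediate(sample, creator="Smith"):
        print(hit)
```

The mapping handles only differences in field names. If one site recorded authors as "Smith, J." and another as "J. Smith", exact matching would miss the correspondence; compensating for variations of that kind is part of what separates deep semantic interoperability from the syntactic level.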

B. Research in Describing Objects and Repositories

In order to provide a coherent view of collections of digital objects, those objects must be described in a consistent fashion that facilitates the use of mechanisms such as protocols supporting distributed search and retrieval from disparate sources. Research in the description of objects and collections of objects provides the foundation for effective interoperability. Interoperability at the level of deep semantics will require breakthroughs in description as well as in retrieval, object interchange, and object retrieval protocols. Issues here include the definition and use of metadata and its capture or computation from objects (both textual and multimedia), the use of computed descriptions of objects, federation and integration of heterogeneous repositories with disparate semantics, clustering and automatic hierarchical organization of information, and algorithms for automatic rating, ranking, and evaluation of information quality, genre, and other properties. Other key issues involve knowledge representation and interchange, and the definition and interchange of ontologies for information context. Research is also needed to understand the strengths and limitations of purely computer-based technologies for describing objects and repositories. Determining the appropriate roles of human librarians and subject experts in the digital library context, as a complement to these technology-based approaches, is also clearly a central problem.

C. Research in Collection Management and Organization

Collection management and organization research is the area where traditional library missions and practices are reinterpreted for the digital library environment. Progress in this area is essential if digital library collections are to successfully meet the needs of their user communities. Policies and methods for incorporating information resources on the network into managed collections, along with rights management, payment, and control issues, were all identified as central problems in the management of digital collections. Approaches to replication and caching of information, and their relationship to collection management in a distributed environment, need careful examination. The authority and quality of content in digital libraries are of central concern to the user community. Ensuring and identifying these attributes of content calls for research that spans both technical and organizational issues. Research is also needed to clarify the roles of librarians and institutions in defining and managing collections in the networked environment. The preservation of digital content across multiple generations of hardware and software technologies and standards is essential to the creation of effective digital libraries. This is an extraordinarily difficult research problem that has not received sufficient attention.

D. Research in User Interfaces and Human-Computer Interaction

While user interfaces and human-computer interaction constitute an extensive field of research in their own right, some specific problems are central to progress in digital libraries. Display of information, visualization and navigation of large information collections, and linkages to information manipulation and analysis tools were identified as key areas for research. The use of more sophisticated models of user behavior in long-term interactions with digital library systems is a potentially fruitful area for research. The need for a more comprehensive understanding of user needs, objectives, and behavior in employing digital library systems was stressed repeatedly as a basis for designing effective systems. Finally, digital library systems must become far more effective at adapting their user interfaces to variations in the capabilities of user workstations and network connections (bandwidth). New technologies such as personal digital assistants and nomadic computing models will make this need more pressing.

E. Economic, Social, and Legal Research Issues

Digital libraries are not simply technological constructs; they exist within a rich legal, social, and economic context, and will succeed only to the extent that they meet these broader needs. Rights management, economic models for the use of electronic information, and billing systems to support these economic models will be needed. User privacy needs to be carefully considered. There are complex policy issues related to collection development and management, and to preservation and archiving. Existing library practice may shed some light on these questions. The social context of digital documents, including authorship, ownership, the act of publication, versions, authenticity, and integrity, requires a better understanding. Research in all of these areas will also be needed if digital libraries are to be successful.

III. Scaling of Digital Library Experiments

The understanding of digital libraries requires large-scale experiments. These must be enabled by funding, infrastructure development, and software deployment strategies. Further, it is vital that support be provided to study the operation and effectiveness of large-scale experiments once they are deployed. The common vision is one of tens of thousands of repositories of digital information that are autonomously managed yet integrated into what users view as a coherent digital library system. We must move rapidly towards an infrastructure that can support and facilitate research towards this common vision. The Internet, as a context for deploying digital library systems, offers an unprecedented opportunity to move ahead with large-scale experiments: technically, by providing connectivity to an enormous potential user base, and culturally, given the Internet community's models and traditions of technology diffusion through the distribution of publicly available prototype software. We don't know how to approach scaling as a research question, other than to build upon experience with the Internet. However, attention to scaling as a research theme is essential and may help to further clarify infrastructure needs and priorities, as well as inform work in all areas of the research agenda outlined above. For example, reliability questions are poorly understood: in a sufficiently large system, some components will inevitably be out of service during the processing of any given query. There was consensus on the need to enable large-scale deployment projects (in terms of size of user community, number of objects, and number of repositories) and subsequently to fund studies of the effectiveness and use of such systems. It is clear that limited deployment of prototype systems will not suffice if we are to fully understand the research questions involved in digital libraries.
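The reliability point can be illustrated with a little arithmetic. Assuming, purely for illustration, that each repository is independently available 99.9% of the time (both the figure and the independence assumption are hypothetical), the probability that all N repositories can answer a query is 0.999^N. That is about 37% at N = 1,000 and well under 0.01% at N = 10,000, so a query spanning the tens of thousands of repositories in the common vision should expect some of them to be unreachable. The function names below are illustrative only.

```python
def prob_all_available(n_repositories: int, availability: float) -> float:
    """Probability that every repository is up, assuming independent failures
    and the same (hypothetical) per-repository availability."""
    return availability ** n_repositories


def prob_some_unavailable(n_repositories: int, availability: float) -> float:
    """Probability that at least one repository is down while a query is processed."""
    return 1.0 - prob_all_available(n_repositories, availability)


if __name__ == "__main__":
    # Illustrative availability of 99.9% per repository.
    for n in (10, 1_000, 10_000):
        print(n, round(prob_some_unavailable(n, 0.999), 4))
```

Real failures are rarely independent (shared networks and power fail together), so this sketch is only a lower bound on the complexity of the problem; it suggests why distributed query processing must be designed to return useful partial results rather than treat a missing repository as an error.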