Digital Library Issues

Introduction

  This paper is composed of two sections. Section one surveys the context or environment in which libraries are operating. The second section deals with issues related to the collections, systems, users, and organization of a digital library. The motivation for writing this paper has been to develop a framework for the coordination and integration of what I've been reading and thinking about. It has also been a way for me to try to clarify some of these issues.

As a framework, the organization of issues into the categories of environment, content, systems, user, and organization works fairly well. I'm sure my perception of what constitutes pertinent issues will change. My analysis of these issues is sometimes muddled and superficial (but you've got to start somewhere). The paper is also devoid of any citations and references necessary to document many of the ideas and statements made. This is very much a work in progress!

Main Points
  • Information technology is changing the nature of information resources and the systems that deliver them.
  • Information technology is changing our patrons' expectations with respect to information systems.

  • Information resources have expanded beyond books and journals and now include Internet publications and digital documents created for classroom support and distance education.

  • Because of the expanded nature of information resources, systems and resources will be built and managed by a coalition of campus departments and will require innovative organizational techniques.

  • We must fully employ existing technology and systems while new systems are being built which incorporate current research and distributed system technologies.

  • There is a need for more research on the information seeking behavior of our patrons.


 

Environmental Issues

The Library And Current Academic Environment   
The Current Technological Environment   
Digital Library Research Environment   
User Environment   
 

The Library And Current Academic Environment

The Library's Role In The Academic Community
The library's main roles on campus revolve around faculty and student research and in a more limited way, the direct support for departmental curricula. The library supports research by maintaining collections, providing access to research tools, and providing assistance in the use of these tools and collections. The library supports the curriculum by maintaining the reserve desk and providing reading material with a curricular focus. The increased use of information technology is impacting both of these areas.

The availability of digital resources from both traditional publishers and the Internet is challenging our concepts of collections and their use for research. The technology in the classroom is creating new types of academic resources that need to be classified and stored. The management of these academic resources shares many of the requirements necessary for the management of the resources in the digital library.

The Availability Of Digital Resources
Digital resources have the promise of overcoming the constraints of time and distance. In this age of "access" vs. "ownership" they challenge our concepts of collections. Along with new types of information resources like web pages, traditional library resources such as monographs and journals are changing in form and delivery methods if not in content. The full text of many books is available on the Internet (Project Gutenberg). Traditional monograph publishers are beginning to publish only electronic versions of certain works (Columbia University Press). Much of the STM literature is now available in full text versions (Wiley, Springer-Verlag, etc.).

On-demand Printing Of Monographs
As librarians we know that much of what we purchase, catalog and shelve never circulates. The exact percentage of this material varies with the institution. While it may be argued that the library is serving its archival role in regard to this material, much of it has been either not discovered at all or found not useful by our patrons. The technology for on-demand printing is imminent and is something that the digital library ought to investigate as one of its services.

Electronic Journals And Subscription Costs
The demand for journals by researchers and the costs associated with those journals are by far the single biggest issue in the research library. Journal costs have been rising at a rate of 10 to 15 percent a year while budgets have been rising at the rate of inflation. The question is how long the current situation can continue and what, if any, impact electronic publication will have on subscription costs.
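To put rough numbers on that question, the sketch below compounds the two growth rates. The 10 and 15 percent figures come from the paragraph above; the 3 percent inflation rate and the function names are assumptions chosen only for illustration.

```python
import math


def years_to_double(annual_rate: float) -> float:
    """Years for a cost growing at annual_rate (0.10 = 10%) to double."""
    return math.log(2) / math.log(1 + annual_rate)


def share_still_affordable(cost_rate: float, budget_rate: float, years: int) -> float:
    """Fraction of today's subscriptions a budget can still cover after `years`."""
    return ((1 + budget_rate) / (1 + cost_rate)) ** years


if __name__ == "__main__":
    ASSUMED_INFLATION = 0.03  # assumption: not a figure from this paper
    for rate in (0.10, 0.15):
        print(
            f"{rate:.0%} growth: prices double in {years_to_double(rate):.1f} years; "
            f"after 10 years the budget covers "
            f"{share_still_affordable(rate, ASSUMED_INFLATION, 10):.0%} of today's titles"
        )
```

Under these assumptions prices double in roughly five to seven years, and a budget that only tracks inflation covers about half of today's titles after a decade at the lower rate.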

Integrated indexing and full text delivery to the group of journals important to undergraduate research are available in most libraries today. Because of this group's wide market appeal, the market has remained competitive. We should continue to see an increase in the number of available titles at reasonable costs.

Integrated indexing and full text delivery of high cost, low use, medical and technical literature is another matter. Increasingly publishers are providing access to these types of publications in digital format. The coordination of indexing and full text delivery of this group of literature is something that vendors are now pursuing. Costs for accessing these digital publications have remained tied to print subscriptions. There is little evidence to suggest that the move to digital delivery of these journals will have a significant effect on lowering their costs or in bringing changes to the business model of scholarly publishing.

What is significant is that we are now beginning to see faculty members create their own low cost peer reviewed venues for publishing their research. These "mavericks" are directly attacking the business model of scholarly publishing!

Internet Resources
There has been much debate by librarians as to the quality of information available on the Internet and its usefulness in research. While the Internet is vast, there is no doubt as to the value of selected Internet resources. We must provide a more systematic approach to these resources.

A very modest proposal would be to select the top twenty sites in the top 100 disciplines. These 2000 sites should be cataloged to the Dublin Core minimum and a web crawler used to provide a full text index. The technical costs of such a proposal are insignificant compared to the human capital required to select and catalog the sites. Nonetheless I think this is a project that we should undertake.
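As a rough sketch of what one cataloged entry in such a project might hold, the record below uses a handful of the fifteen Dublin Core element names and adds a toy inverted index standing in for the crawler's full text index. The class name, the sample site, and the example.edu URL are hypothetical; a real project would use a proper crawler and indexing engine.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import re

DISCIPLINES = 100          # from the proposal above
SITES_PER_DISCIPLINE = 20  # from the proposal above


@dataclass
class SiteRecord:
    """A minimal description using a subset of Dublin Core element names."""
    identifier: str                                   # dc:identifier (the URL)
    title: str                                        # dc:title
    creator: str                                      # dc:creator
    subject: list[str] = field(default_factory=list)  # dc:subject
    description: str = ""                             # dc:description


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


record = SiteRecord(
    identifier="http://www.example.edu/chemistry/",  # hypothetical site
    title="Example Chemistry Gateway",
    creator="Example University",
    subject=["Chemistry"],
)
index = build_index({record.identifier: "Periodic table and chemistry tutorials"})
total_sites = DISCIPLINES * SITES_PER_DISCIPLINE  # number of records to catalog
```

The record is the part that requires human selection and cataloging; the index is the part a machine can build unattended, which is why the technical cost is the smaller of the two.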

Management Of New Academic Resources
On every campus in the country there is at least one administrator trying to figure out how to employ information technology to improve the "productivity" of the faculty. Distance learning programs are also being developed. Administrators are hoping to employ technology as a way of doing more with less. Often what is happening is that more is being done, but it is being accomplished with more money, more time, and more people. This is the "more with more" model rather than an increased productivity model of "more with less". Regardless of how we feel about these projects, they are creating new kinds of academic resources.

The library has always supported classroom activity through course reserve. The new information resources described above provide new challenges and opportunities for the library to support classroom activities. As these new information objects increase in number they will need to be classified and stored. The academic computing community, through EDUCOM, has created the "Instructional Management Systems Project" as a method for managing these new resources. This is a very broad and technically sophisticated proposal. It is interesting to the library not only because it suggests the use of the Dublin Core metadata standards, but because it proposes an open and extendable distributed object infrastructure. This proposal, if implemented on campus by the computing center, will have a direct impact on any digital library project. We in libraries need to understand this proposal and learn how we can contribute to building this infrastructure on campus.

Understanding what others are doing on campus is important. Library budgets and services represent the "aggregated demand" for information resources by the academic community. Aggregating demand reduces duplication of effort and creates economies of scale. It may be reasonable to expect the digital library to become the locus of expertise in publishing "protocols" and a provider of management services for electronic publications and for the storage of these new academic resources.

The Current Technological Environment

The World Wide Web
The World Wide Web has become the center of the technological universe both on and off campus. The wide acceptance of the web is phenomenal, a truly world wide resource. The web was originally designed for the delivery of static documents and fulfills this role nicely. It was not designed as an application framework or as a multimedia delivery system. While more capable technologies exist for building distributed systems, their acceptance rate is nowhere near that of the web. The web's success has whetted our appetites for distributed systems but has also exposed the weaknesses in the current protocols. We are ready for the next generation of Internet protocols. Unfortunately HTTP-NG appears to be more than two years away.

Java
The other rising technological star is Java. While Java's acceptance has been much slower than anticipated, its growth continues rapidly. Java has been touted as the way to deliver distributed applets and applications to clients. However, at the present time Java is finding greater acceptance on the server than on the client. The Enterprise JavaBeans standard will strengthen Java's position on the server.

Current digital library projects should be developed as web based applications, should consider implementing distributed application technologies such as CORBA, and should recognize the limited state of current client side architectures.

Digital Library Research Environment

NSF Funded Research
Computer scientists, not librarians, are doing most digital library research. This research is system centered, concerned with the tools and methods of connecting the user and content. There has been very little research on the production of digital content and almost no research on how the user is affected by the use of digital materials.

By definition there has to be a gap between the state of the art (research test-beds) and the state of the practice. This gap is still fairly large in regard to the research funded by the NSF. The NSF is now beginning its second wave of funding and is reconsidering both the size and scope of the suggested projects. It has been recommended that these new projects have a greater distribution on the continuum of near term and long term implementation.

European Research
The research being done by the E-Lib group in Europe has been much closer to the state of the practice. Its focus has not been so much on finding new knowledge as on developing new systems. These projects are using tools that are fairly common such as Z39.50, web robots, and WHOIS++. Because of the nature of these projects, they are a good source of information for digital library projects being implemented by libraries.

User Environment

The library's traditional role has been to collect, organize, provide access to, and facilitate the use of the graphic records created by society. Several things are impacting that mission. The amount of information being published has increased so that no single institution, no matter how great its resources, can maintain, organize and provide access to the entire corpus. How do we provide our patrons adequate access to these records?

The second major factor is that information technology has changed our patrons' preferences in their choice of information resources and their expectations as to what information systems should supply. In most cases, electronic access and delivery of a complete information resource are preferred over citations and print.

It is no longer a question of whether we need to respond to these changes but of how urgently we respond and in what manner. Libraries provide a wide array of electronic resources to their patrons. As these electronic resources proliferate, it has become clear that we must rethink how we collect, organize, provide access to, and facilitate the use of these digital documents.

Conversion Of Non-digital Material
Many of the library centered digital library initiatives around the country involve converting documents to digital format. Digitizing material can be an expensive process. Most of the costs associated with conversion projects reside in the classification and cataloging of the new material rather than in the conversion process itself. It follows that the materials selected to be converted must be unique and have potential interest to a significant audience. We must be very careful in what we choose to digitize.

A major component of our existing integrated library systems involves the creation and validation of cataloging and classification information. We must look at ways we can leverage our existing systems when creating new digital content.

The market for digital content is expanding. Many vendors are scrambling to create and make available digital content. For example, UMI is converting much of its microfilm collection to digital format. The availability of these large runs of back-files will prove valuable to many libraries.

 

Digital Library Issues

Content Centered Issues
System Centered Issues
User Centered Issues
Organizational Issues
 

Content Centered Issues

The Use Of Structured Documents
The possibilities provided by the structural encoding of documents blur the distinction between data and documents. We usually think of documents as being viewed and data as being processed. Documents can now be processed like data and data can now be exchanged and viewed like documents.
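A small sketch may make the point concrete. The XML fragment below is a hypothetical article (the element names are invented for illustration and follow no particular DTD); the same text that a browser could display is queried here as if it were a data record.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are illustrative, not from a real DTD.
ARTICLE = """
<article>
  <title>Sample Study of Reading Habits</title>
  <author>A. Researcher</author>
  <section type="methods"><p>We surveyed 120 students.</p></section>
  <section type="results"><p>Most preferred full text.</p></section>
</article>
"""


def section_text(xml_source: str, section_type: str) -> list[str]:
    """Return the paragraph text of every section with the given type."""
    root = ET.fromstring(xml_source)
    return [
        p.text or ""
        for section in root.findall(f"section[@type='{section_type}']")
        for p in section.findall("p")
    ]


# The document is treated as data: fields are extracted, not just displayed.
title = ET.fromstring(ARTICLE).findtext("title")
methods = section_text(ARTICLE, "methods")
```

A full text search for "surveyed" would find this article; the structural query can go further and return only the methods sections across a whole collection.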

Some libraries such as Virginia, Michigan, Rutgers, and Indiana have gained significant experience in the generation and use of SGML encoded documents. It is reasonable to say that the use of structured documents by the library community as a whole has not been significant. With the introduction of XML and XSL this situation will change.

There are three cost streams related to the use of these types of documents: the cost of the tagged text; the cost of storing and accessing the text; and the cost of providing the user with analytic tools and assistance in making the best use of these documents.

The market will provide generalized tools for the use of structured documents. The tools needed for manipulation and analysis of documents are very context specific. If we are to realize any gain in knowledge by the use of these documents we will have to provide a significant investment in the creation of tools and assistance to users.

XML
SGML use has suffered from complexity and the lack of appropriate tools for the creation and analysis of encoded documents. The wide acceptance of XML and XSL will provide the critical mass necessary to remedy this situation. Already there exist several public domain and commercial XML editors and parsers. XML capable browsers by Netscape and Microsoft will be available in the near future. Along with the tools for creating and viewing XML we are beginning to see tools dedicated to the persistent storage and management of XML documents.

Classes Of Structured Documents
As potential consumers of many of these structured documents it behooves us to understand and possibly influence how publishers encode their documents. This is particularly important with journal publishers. Library literature provides many studies concerning the "type of literature" produced by a discourse community. If these types can be codified and the markup standardized, it will provide better indexing than full text and provide a means for transforming the document from one type to another. The concept of "classes" of documents can also be very powerful in thinking about our own library and campus-created documents shared on the Intranet. Again, this is another area where the library can provide leadership and assistance.

The Need For Meta-data
At the present time meta-data records, which are fielded and can act as a surrogate for the original document, are in short supply for most digital information resources. While the shortcomings of full text searching as an information retrieval mechanism are well known, the lack of quality meta-data has made full text the default mode of searching. As meta-data standards become more widely implemented, richer indexing and more sophisticated searching will become common.

One of the promised advantages of digital documents is the reduction of the costs associated with classification and cataloging. It is doubtful, however, that we can rely solely on machine-created indexes. Creation of meta-data by the creators of documents and third-party cataloging entities will be necessary for a complete and accurate description. As the types and number of creators of information resources increase, protocols and standards are needed for the consistent description of these resources.

Meta-data Standards
The library community has created a rich bibliographic meta-data standard in MARC. It is known among librarians that MARC is suitable for describing more than just bibliographic resources. Other "resource description communities" are creating simpler or what they consider more appropriate standards. The Dublin Core, GILS, and the Instructional Management System are examples of these new standards.

A problem resulting from multiple description standards is how to search across collections that employ different standards. The library can take the initiative to coordinate the use of these resource description standards. To ensure "semantic" interoperability, the standards must be mapped to each other and utilities created to convert from one standard to the other.

The library must also provide assistance to information providers in the appropriate use of these standards and provide assistance in selecting tools that incorporate meta-data in the creation process.
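The mapping between standards mentioned above can be sketched as a simple crosswalk table. The handful of MARC tag to Dublin Core pairings below is a deliberately simplified illustration rather than a complete or authoritative crosswalk, and the flat dictionary used for the MARC record is an assumption made to keep the example short.

```python
# Simplified, illustrative crosswalk: MARC field + subfield -> Dublin Core element.
MARC_TO_DC = {
    "100a": "creator",
    "245a": "title",
    "260b": "publisher",
    "260c": "date",
    "520a": "description",
    "650a": "subject",
    "856u": "identifier",
}


def marc_to_dublin_core(marc_fields: dict[str, list[str]]) -> dict[str, list[str]]:
    """Convert a flattened MARC record into Dublin Core elements.

    Fields with no entry in the crosswalk are skipped, which is exactly
    where information is lost when moving to a simpler standard.
    """
    dc: dict[str, list[str]] = {}
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).extend(values)
    return dc


sample = {  # hypothetical record
    "245a": ["An Example Title"],
    "100a": ["Author, Example"],
    "650a": ["Libraries", "Metadata"],
    "300a": ["1 v."],  # physical description: no mapping in this sketch
}
dc_record = marc_to_dublin_core(sample)
```

Converting in the other direction is harder, because a simple element such as "creator" does not say which of several MARC fields it should populate; this is one reason the mappings need to be coordinated rather than improvised by each project.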

Meta-data Record Exchange
The World Wide Web Consortium has recently approved several recommendations concerning meta-data. The most significant of these is the Resource Description Framework (RDF). This recommendation is significant because it provides the XML syntax for describing schemas related to meta-data records. While this is significant to the resource description communities, it is much more important to the systems that those communities use to process, index, and exchange meta-data records. (In the MARC world this is analogous to the MARC transport record rather than the intellectual content of a MARC record.)
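As an illustration of what such an exchange record might look like, the sketch below serializes a few Dublin Core elements as RDF/XML using Python's standard library. The two namespace URIs are the ones published for RDF and for the Dublin Core element set; the described resource and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(resource_url: str, elements: dict[str, str]) -> str:
    """Serialize Dublin Core element/value pairs about one resource as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    for name, value in elements.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_record = to_rdf_xml(
    "http://www.example.edu/guides/chemistry.html",  # hypothetical resource
    {"title": "Chemistry Research Guide", "creator": "Example University Library"},
)
```

Any system with an XML parser can read the resulting record without knowing in advance which description standard produced it, since the namespaces identify the vocabulary in use.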

The ability to exchange and process these standard records will not just affect our handling of information sources used by our readers but will impact our business relations with vendors by standardizing the exchange of business documents (catalogs, invoices, etc.).

The library has traditionally provided appropriate collections for its community. In an environment of access rather than ownership the concept of collection becomes unclear. The RDF standard provides the library a basis for constructing "collections" in a highly distributed resource environment. By selectively creating, acquiring, and indexing meta-data records, the library can provide virtual collections. These collections could be of any size, exist for any length of time, and be built for any user community. This is an area of the digital library that we should vigorously pursue.
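One way to picture such a virtual collection is as a saved selection rule applied to a pool of meta-data records, wherever the underlying documents happen to live. The record fields, the sample data, and the function name below are hypothetical.

```python
from typing import Callable, Iterable

Record = dict[str, str]  # a flattened meta-data record; field names are assumptions


def virtual_collection(
    records: Iterable[Record], selector: Callable[[Record], bool]
) -> list[Record]:
    """Return the records that belong to a collection defined only by a rule."""
    return [record for record in records if selector(record)]


pool = [  # hypothetical records pointing at resources held elsewhere
    {"title": "Intro to Organic Chemistry Notes", "subject": "Chemistry",
     "identifier": "http://www.example.edu/chem101/notes"},
    {"title": "Civil War Letters", "subject": "History",
     "identifier": "http://archive.example.org/letters"},
    {"title": "Periodic Table Explorer", "subject": "Chemistry",
     "identifier": "http://www.example.com/periodic"},
]

# A collection built for one course: it exists only as long as the rule does.
chem_course = virtual_collection(pool, lambda r: r["subject"] == "Chemistry")
```

Nothing is copied or owned here; the collection is only a view over records, so it can be created for a single course and discarded at the end of the term.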

Knowledge Bases
In his 1965 classic "Libraries of the Future" J.C.R. Licklider predicted that by the year 2000 libraries would mediate the interaction between the user and the fund of knowledge itself and not just the documents that contained this knowledge. While we are nowhere near this stage of development it is an interesting idea.

While not on the scale of a knowledge base as described above, there is the more limited concept of a knowledge base as a collection of heuristics that can be applied to a domain to assist in solving classes of problems. This is often what we do in the reference process. We use certain tools and heuristics in solving bibliographic and search domain problems. Libraries have developed these types of systems with varying success, but use of knowledge bases will definitely be a feature of the digital library.

System Centered Issues

Predicting the future of information technology development and its use is always interesting. While we may not be able to predict the future we have to plan for it. It can be safely said that if we don't plan ahead we will surely be left behind. The following is an overview of my "best thinking at the time" concerning what I see as the major issues in this area.

The Digital Librarian And System Design
If the nature of information resources is to be digital, it follows that the tools necessary to manage the digital library will be digital. If the mission of the digital library is the same as that of the traditional library but with digital resources and digital tools, it poses the question as to the role the traditional library staff should/will have in building a digital library.

Gaining technological skills is not easy, and because of the pace of change in technology, new skills are needed constantly. We must support each other in learning new skills and recruit librarians with strong technical skills.

Does this mean that the librarian has to become a computer scientist, a programmer, a database guru, a cognitive scientist, or practitioner of one of the many other cultures and disciplines that are involved in building the digital library? No, but if there is to be an integrator (and I think this could very well be the digital librarian) then this integrator has to possess some level of competency in these areas.

I find the metaphor of "architect" an appropriate one. The architect's first concern is with the stated needs of the owner and intended use of the structure. However, to arrive at a suitable "design", she must be familiar with the standard practices of the trades involved and must understand the properties and limits of the materials used in building the system. So it is with the designer of an information system.

Due to both their universality and complexity, digital library projects are multi-disciplinary endeavors. Digital library initiatives rely on the skills and thinking of many disciplines. Some disciplines focus on a single aspect of the digital library while others may be involved with more than one issue. Success depends on a balanced approach that integrates the three areas of collections, systems, and users. Of all the disciplines involved in these projects the digital librarian may be the best person to provide this integrated perspective.

Adaptability And Durability
The pace of change in information technology is very rapid, but the change is not evenly distributed across all areas. While improvements in the hardware area have kept pace with Moore's Law (a doubling of performance every 18 months), this has not been the case with software development. Often information systems take months to design and are obsolete soon after they are implemented. What's the answer to this problem? Higher levels of abstraction.

Today's products and technologies are being implemented with higher levels of abstraction. This abstraction makes it possible for more people to share the burden of building these systems. Users (or their representatives) are increasingly becoming involved in the modeling of the information system. Developers are increasingly integrating resources created by other developers. The result is faster system development, more flexibility, and better usability.

The biggest change here from the traditional library environment is that rather than being a consumer of a system in a turn-key type fashion the library must now be an active partner in system development and share both the glory and blame for the quality of the resulting system.

Interoperability
As the type and number of digital information resources have grown there has been a corresponding increase in the number of systems needed to provide access to these resources. This is particularly true with search services. The number one imperative of the digital library system is to provide the ability for many heterogeneous systems to interoperate. Interoperability is the Holy Grail of system designers and as such won't be easily obtained.

The key to interoperability is for systems to be well described. As is the case with information resource description, meta-data is the answer. System level meta-data suffers from the same problems as resource description meta-data: a lack of understanding of what needs to be described, a lack of standards for describing it, and a lack of widespread acceptance of standards once they have been established. Modern computing practices and technologies such as Object Oriented Analysis and Design, Java, CORBA, and Z39.50 are providing the tools necessary to build such systems.
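A minimal sketch of the idea, assuming each search service can be wrapped behind one shared, well described interface: the two back ends below are stand-ins invented for illustration (they do not speak Z39.50 or CORBA), and all class and method names are hypothetical.

```python
from typing import Protocol


class SearchService(Protocol):
    """The shared description every wrapped system must satisfy."""
    name: str

    def search(self, query: str) -> list[str]: ...


class CatalogAdapter:
    """Stand-in for a library catalog; matches substrings of titles."""
    name = "catalog"

    def __init__(self, titles: list[str]) -> None:
        self._titles = titles

    def search(self, query: str) -> list[str]:
        return [t for t in self._titles if query.lower() in t.lower()]


class FullTextAdapter:
    """Stand-in for a full text index keyed by document identifier."""
    name = "fulltext"

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def search(self, query: str) -> list[str]:
        return [doc_id for doc_id, text in self._documents.items()
                if query.lower() in text.lower()]


def federated_search(services: list[SearchService], query: str) -> dict[str, list[str]]:
    """Send one query to every service and group the hits by service name."""
    return {service.name: service.search(query) for service in services}


results = federated_search(
    [CatalogAdapter(["Digital Libraries", "Organic Chemistry"]),
     FullTextAdapter({"doc-1": "Notes on digital preservation"})],
    "digital",
)
```

The calling code never needs to know how each system stores or matches its data; it relies only on the published description, which is the role system level meta-data plays at a larger scale.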

Multi-tiered Systems
Library systems have developed over the years from monolithic mainframe applications (NOTIS) to client/server systems (most of the current library systems) and recently to distributed multi-tiered systems (DRA's "Taos"). Considering the nature of digital libraries (many distributed information providers and many distributed information users), the current multi-tiered architecture is the appropriate architecture for digital library systems.

The good news is that there are numerous vendor and public domain solutions for assisting in building these types of systems. The bad news is that we have to pick the right one! There are many factors to be considered in this decision: the number of information resources in the system, the expected level of use, the availability of technical staff, and the shared or leveraged resources within the University. For the system to be successful, tools have to be chosen and expertise developed.

User Centered Issues

No matter how good the collections and how sophisticated the system, success has to be evaluated by how effective the system is in benefiting and assisting the user. Did the system provide the user an answer to her question? Did it assist in helping her create or develop relationships? Did it provide confirmation or assurance concerning currently assumed knowledge?

In almost every area these issues are under-researched, under-analyzed, and not well understood. Despite this state of affairs explicit statements regarding all areas of user involvement must be made in order to evaluate system performance and effectiveness.

As librarians we have a strong tradition of working with users (as compared to many of the other disciplines involved in the digital library), and we must build on our research and experience in this area.

Organizational Issues

Traits For Success
There has been much discussion about organizational issues and how they will assist or hinder the effectiveness of the library in its transition to the digital world. In my experience these have been the issues that have had the most impact on success:

  • There must exist in the organization a willingness to apply more rigor and systematic analysis to the problems. This means being aware of current research and its relation to the problem and an awareness of research methods or other methods of systematic analysis.

  • The organization must accept that solution-building is iterative and a learning process for the organization and all members involved.

  • There must exist a shared responsibility for success. Rather than relying on a strict hierarchical organization, shared responsibility allows good ideas to come from anywhere and allows information and resources to gravitate towards those who show initiative and capability.


The Possibility Of A Virtual Organization
Due to both their universality and complexity, digital library projects are multi-disciplinary endeavors and rely on the skills and thinking of many disciplines and departments on campus. It is quite possible that the digital library will be a "virtual" organization composed of members from different departments on campus and funded by many existing budgets.


 
Created 2/23/99
by Brian Kennison