Digital Library Issues

Introduction

This paper is composed of two sections. Section one surveys
the context or environment in which libraries are operating.
The second section deals with issues related to the
collections, systems, users, and organization of a digital
library. The motivation for writing this paper has been to
develop a framework for the coordination and integration of
what I've been reading and thinking about. It has also
been a way for me to try to clarify some of these issues.
As a framework, the organization of issues into the categories of environment, content, systems, user, and organization works fairly well. I'm sure my perception of what constitutes pertinent issues will change. My analysis of these issues is sometimes muddled and superficial (but you've got to start somewhere). The paper also lacks the citations and references needed to document many of its ideas and statements. This is very much a work in progress!

Environmental Issues

The Library And Current Academic Environment

The Library's Role In The Academic Community

The availability of digital resources from both traditional publishers and the Internet is challenging our concepts of collections and their use for research. Technology in the classroom is creating new types of academic resources that need to be classified and stored. The management of these academic resources shares many of the requirements of managing the resources in the digital library.

The Availability Of Digital Resources

Digital resources hold the promise of overcoming the constraints of time and distance. In this age of "access" vs. "ownership" they challenge our concepts of collections. Along with new types of information resources like web pages, traditional library resources such as monographs and journals are changing in form and delivery methods if not in content. The full text of many books is available on the Internet (Project Gutenberg). Traditional monograph publishers are beginning to publish only electronic versions of certain works (Columbia University Press). Much of the scientific, technical, and medical (STM) literature is now available in full text versions (Wiley, Springer-Verlag, etc.).

On-demand Printing Of Monographs

Electronic Journals And Subscription Costs

Integrated indexing and full text delivery for the group of journals important to undergraduate research are available in most libraries today. Because of this group's wide appeal, the market has remained competitive. We should continue to see an increase in the number of available titles at reasonable costs.

Integrated indexing and full text delivery of high cost, low use medical and technical literature is another matter. Increasingly, publishers are providing access to these types of publications in digital format. The coordination of indexing and full text delivery for this group of literature is something that vendors are now pursuing. Costs for accessing these digital publications have remained tied to print subscriptions. There is little evidence to suggest that the move to digital delivery of these journals will significantly lower their costs or change the business model of scholarly publishing. What is significant is that we are now beginning to see faculty members create their own low cost, peer reviewed venues for publishing their research. These "mavericks" are directly attacking the business model of scholarly publishing!

Internet Resources

There has been much debate among librarians as to the quality of information available on the Internet and its usefulness in research. While the Internet is vast, the value of selected Internet resources is not in doubt. We must provide a more systematic approach to these resources. A very modest proposal would be to select the top twenty sites in each of the top 100 disciplines. These 2000 sites should be cataloged to the Dublin Core minimum, and a web crawler used to provide a full text index. The technical costs of such a proposal are insignificant compared to the human capital required to select and catalog the sites. Nonetheless, I think this is a project that we should undertake.
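
The two technical pieces of this proposal, a minimal catalog record for each selected site and a full text index over the pages, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields borrow a few Dublin Core element names, the example.org URLs and page texts are made up, and the in-memory inverted index stands in for what a real web crawler and indexing engine would produce. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Arithmetic behind the proposal: 20 sites in each of 100 disciplines.
SITES_PER_DISCIPLINE = 20
DISCIPLINES = 100
TOTAL_SITES = SITES_PER_DISCIPLINE * DISCIPLINES  # 2000 sites to select and catalog

# Hypothetical catalog records using a few Dublin Core element names.
records = [
    {"identifier": "http://example.org/chem", "title": "Chemistry Gateway",
     "creator": "Example University", "subject": "Chemistry"},
    {"identifier": "http://example.org/hist", "title": "Medieval History Sources",
     "creator": "Example Society", "subject": "History"},
]

# Hypothetical page text, standing in for what a crawler would fetch.
pages = {
    "http://example.org/chem": "The periodic table and chemical reactions explained.",
    "http://example.org/hist": "Primary sources and timelines for medieval history.",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index mapping each word to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, records, query):
    """Return the titles of cataloged sites whose text contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    by_id = {r["identifier"]: r for r in records}
    return sorted(by_id[u]["title"] for u in hits if u in by_id)


index = build_index(pages)
print(TOTAL_SITES)
print(search(index, records, "periodic table"))
```

The sketch also shows why the human effort dominates: the indexing code is trivial, while every entry in `records` has to be selected and described by a person.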

Management Of New Academic Resources

On every campus in the country there is at least one administrator trying to figure out how to employ information technology to improve the "productivity" of the faculty. Distance learning programs are also being developed. Administrators are hoping to employ technology as a way of doing more with less. Often what is happening is that more is being done, but it is being accomplished with more money, more time, and more people. This is the "more with more" model rather than an increased productivity model of "more with less".

Regardless of how we feel about these projects, they are creating new kinds of academic resources. The library has always supported classroom activity through course reserve. The new information resources described above provide new challenges and opportunities for the library to support classroom activities. As these new information objects increase in number they will need to be classified and stored.

The academic computing community, through EDUCOM, has created the "Instructional Management Systems Project" as a method for managing these new resources. This is a very broad and technically sophisticated proposal. It is interesting to the library not only because it suggests the use of the Dublin Core metadata standards, but because it proposes an open and extendable distributed object infrastructure. This proposal, if implemented on campus by the computing center, will have a direct impact on any digital library project. We in libraries need to understand this proposal and learn how we can contribute to building this infrastructure on campus.

Understanding what others are doing on campus is important. Library budgets and services represent the "aggregated demand" for information resources by the academic community. Aggregating demand reduces duplication of effort and creates economies of scale.
It may be reasonable to expect the digital library to become the locus of expertise in publishing "protocols" and a provider of management services for electronic publications and for the storage of these new academic resources.

The Current Technological Environment

The World Wide Web

Java

Current digital library projects should be developed as web based applications, should consider implementing distributed application technologies such as CORBA, and should recognize the limited state of current client side architectures.

Digital Library Research Environment

NSF Funded Research

By definition there has to be a gap between the state of the art (research test-beds) and the state of the practice. This gap is still fairly large in regard to the research funded by the NSF. The NSF is now beginning its second wave of funding and is reconsidering both the size and scope of the suggested projects. It has been recommended that these new projects be spread more widely along the continuum from near term to long term implementation.

European Research

The research being done by the E-Lib group in Europe has been much closer to the state of the practice. Its focus has not been so much on finding new knowledge as on developing new systems. These projects are using tools that are fairly common, such as Z39.50, web robots, and WHOIS++. Because of the nature of these projects, they are a good source of information for digital library projects being implemented by libraries.

User Environment

The library's traditional role has been to collect, organize, provide access to, and facilitate the use of the graphic records created by society. Several things are impacting that mission. First, the amount of information being published has increased so much that no single institution, no matter how great its resources, can maintain, organize, and provide access to the entire corpus. How do we provide our patrons adequate access to these records?

The second major factor is that information technology has changed our patrons' preferences in their choice of information resources and their expectations as to what information systems should supply. In most cases, electronic access and delivery of a complete information resource are preferred over citations and print. It is no longer a question of whether we need to respond to these changes but of how urgently we respond and in what manner. Libraries provide a wide array of electronic resources to their patrons.
As these electronic resources proliferate, it has become clear that we must rethink how we collect, organize, provide access to, and facilitate the use of these digital documents.

Conversion Of Non-digital Material

Many of the library centered digital library initiatives around the country involve converting documents to digital format. Digitizing material can be an expensive process. Most of the costs associated with conversion projects reside in the classification and cataloging of the new material rather than in the conversion process itself. It follows that the materials selected for conversion must be unique and of potential interest to a significant audience. We must be very careful in what we choose to digitize.

A major component of our existing integrated library systems involves the creation and validation of cataloging and classification information. We must look at ways we can leverage our existing systems when creating new digital content.

The market for digital content is expanding. Many vendors are scrambling to create and make available digital content. For example, UMI is converting much of its microfilm collection to digital format. The availability of these large runs of back-files will prove valuable to many libraries.

Digital Library Issues

Content Centered Issues

The Use Of Structured Documents

Some libraries such as Virginia, Michigan, Rutgers, and Indiana have gained significant experience in the generation and use of SGML encoded documents. It is reasonable to say, however, that the use of structured documents by the library community as a whole has not been significant. With the introduction of XML and XSL this situation will change.

There are three cost streams related to the use of these types of documents: the cost of the tagged text; the cost of storing and accessing the text; and the cost of providing the user with analytic tools and assistance in making the best use of these documents. The market will provide generalized tools for the use of structured documents, but the tools needed for manipulation and analysis of documents are very context specific. If we are to realize any gain in knowledge from the use of these documents, we will have to make a significant investment in the creation of tools and in assistance to users.

XML

SGML use has suffered from complexity and the lack of appropriate tools for the creation and analysis of encoded documents. The wide acceptance of XML and XSL will provide the critical mass necessary to remedy this situation. Already there exist several public domain and commercial XML editors and parsers. XML capable browsers from Netscape and Microsoft will be available in the near future. Along with the tools for creating and viewing XML, we are beginning to see tools dedicated to the persistent storage and management of XML documents.

Classes Of Structured Documents

The Need For Meta-data

One of the promised advantages of digital documents is the reduction of the costs associated with classification and cataloging. It is doubtful, however, that we can rely solely on machine created indexes. Creation of meta-data by the creators of documents and by third-party cataloging entities will be necessary for a complete and accurate description. As the types and number of creators of information resources increase, protocols and standards are needed for the consistent description of these resources.

Meta-data Standards

The library community has created a rich bibliographic meta-data standard in MARC. It is known among librarians that MARC is suitable for describing more than just bibliographic resources. Other "resource description communities" are creating simpler or what they consider more appropriate standards. The Dublin Core, GILS, and the Instructional Management System are examples of these new standards. A problem resulting from multiple description standards is how to search across collections that employ different standards.

The library can take the initiative to coordinate the use of these resource description standards. To ensure "semantic" interoperability, the standards must be mapped to each other and utilities created to convert from one standard to the other. The library must also assist information providers in the appropriate use of these standards and in selecting tools that incorporate meta-data in the creation process.

Meta-data Record Exchange

The World Wide Web Consortium has recently approved several recommendations concerning meta-data. The most significant of these is the Resource Description Framework (RDF). This recommendation is significant because it provides the XML syntax for describing schemas related to meta-data records.
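
To make the idea of an XML syntax for meta-data records concrete, here is a minimal sketch that writes a few Dublin Core elements as an RDF/XML description and reads one back, using only Python's standard library. The namespace URIs are the ones commonly used today for RDF and the Dublin Core element set; the resource URL, the sample values, and the function names `to_rdf_xml` and `read_title` are hypothetical, and the sketch covers only the simplest case of plain text values.

```python
import xml.etree.ElementTree as ET

# Namespace URIs commonly used for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(resource_url, metadata):
    """Serialize a dict of Dublin Core element names and values as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    for element, value in metadata.items():
        child = ET.SubElement(description, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def read_title(rdf_xml):
    """Extract the first dc:title from an RDF/XML record, or None if absent."""
    root = ET.fromstring(rdf_xml)
    node = root.find(f".//{{{DC_NS}}}title")
    return node.text if node is not None else None


# Hypothetical record for an illustrative resource.
record = to_rdf_xml(
    "http://example.org/report",
    {"title": "Annual Report", "creator": "Example Library"},
)
print(record)
print(read_title(record))
```

Because the record is ordinary XML with declared namespaces, any system with an XML parser can process it without knowing in advance which description community produced it.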
While RDF matters to the resource description communities, it is much more important to the systems that those communities use to process, index, and exchange meta-data records. (In the MARC world this is analogous to the MARC transport record rather than the intellectual content of a MARC record.) The ability to exchange and process these standard records will not just affect our handling of information sources used by our readers but will also affect our business relations with vendors by standardizing the exchange of business documents (catalogs, invoices, etc.).

The library has traditionally provided appropriate collections for its community. In an environment of access rather than ownership the concept of collection becomes unclear. The RDF standard gives the library a basis for constructing "collections" in a highly distributed resource environment. By selectively creating, acquiring, and indexing meta-data records, the library can provide virtual collections. These collections could be of any size, exist for any length of time, and be built for any user community. This is an area of the digital library that we should vigorously pursue.

Knowledge Bases

In his 1965 classic "Libraries of the Future", J.C.R. Licklider predicted that by the year 2000 libraries would mediate the interaction between the user and the fund of knowledge itself and not just the documents that contained this knowledge. While we are nowhere near this stage of development, it is an interesting idea. Short of a knowledge base on that scale, there is the more limited concept of a knowledge base as a collection of heuristics that can be applied to a domain to assist in solving classes of problems. This is often what we do in the reference process. We use certain tools and heuristics in solving bibliographic and search domain problems.
Libraries have developed these types of systems with varying success, but the use of knowledge bases will definitely be a feature of the digital library.

System Centered Issues

Predicting the future of information technology development and its use is always interesting. While we may not be able to predict the future, we have to plan for it. It can be safely said that if we don't plan ahead we will surely be left behind. The following is an overview of my "best thinking at the time" concerning what I see as the major issues in this area.

The Digital Librarian And System Design

Gaining technological skills is not easy, and because of the pace of change in technology, new skills are needed constantly. We must support each other in learning new skills and recruit librarians with strong technical skills. Does this mean that the librarian has to become a computer scientist, a programmer, a database guru, a cognitive scientist, or a practitioner of one of the many other cultures and disciplines that are involved in building the digital library? No, but if there is to be an integrator (and I think this could very well be the digital librarian) then this integrator has to possess some level of competency in these areas.

I find the metaphor of "architect" an appropriate one. The architect's first concern is with the stated needs of the owner and the intended use of the structure. However, to arrive at a suitable "design", she must be familiar with the standard practices of the trades involved and must understand the properties and limits of the materials used in building the system. So it is with the designer of an information system.

Due to both their universality and complexity, digital library projects are multi-disciplinary endeavors. Digital library initiatives rely on the skills and thinking of many disciplines. Some disciplines focus on a single aspect of the digital library while others may be involved with more than one issue. Success depends on a balanced approach that integrates the three areas of collections, systems, and users. Of all the disciplines involved in these projects, the digital librarian may be the best placed to provide this integrated perspective.

Adaptability And Durability

The pace of change in information technology is very rapid, but the change is not evenly distributed across all areas. While improvements in the hardware area have kept pace with Moore's Law (a doubling of performance every 18 months), this has not been the case with software development.
Often information systems take months to design and are obsolete soon after they are implemented. What's the answer to this problem? Higher levels of abstraction. Today's products and technologies are being implemented with higher levels of abstraction. This abstraction makes it possible for more people to share the burden of building these systems. Users (or their representatives) are increasingly becoming involved in the modeling of the information system. Developers are increasingly integrating resources created by other developers. The result is faster system development, more flexibility, and better usability. The biggest change here from the traditional library environment is that, rather than being a consumer of a turn-key system, the library must now be an active partner in system development and share both the glory and the blame for the quality of the resulting system.

Interoperability

As the type and number of digital information resources have grown, there has been a corresponding increase in the number of systems needed to provide access to these resources. This is particularly true of search services. The number one imperative of the digital library system is to enable many heterogeneous systems to interoperate. Interoperability is the Holy Grail of system designers and as such won't be easily obtained.

The key to interoperability is for systems to be well described. As is the case with information resource description, meta-data is the answer. System level meta-data suffers from the same problems as resource description meta-data: a lack of understanding of what needs to be described, a lack of standards for describing it, and a lack of widespread acceptance of standards once they have been established. Modern computing practices and technologies such as Object Oriented Analysis and Design, Java, CORBA, and Z39.50 are providing the tools necessary to build such systems.
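
The role of system level meta-data in interoperability can be illustrated with a small sketch. Each search service publishes a description of itself (its name, its protocol, and the fields it can search), and a broker reads those descriptions to decide which services can answer a given query. Everything here is hypothetical: the class names, the description fields, and the in-memory record lists, which merely stand in for real back ends such as a Z39.50 server or a web search service. No network protocol is actually spoken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemDescription:
    """System level meta-data: what a search service is and what it can do."""
    name: str
    protocol: str                 # label only, e.g. "Z39.50" or "HTTP"
    searchable_fields: frozenset  # field names the service can search


class SearchService:
    """A search back end that carries its own description."""

    def __init__(self, description, records):
        self.description = description
        self._records = records  # list of dicts standing in for a real store

    def search(self, field_name, term):
        term = term.lower()
        return [r for r in self._records if term in r.get(field_name, "").lower()]


class Broker:
    """Routes a query only to services whose description supports the field."""

    def __init__(self, services):
        self.services = list(services)

    def search(self, field_name, term):
        results = []
        for service in self.services:
            if field_name not in service.description.searchable_fields:
                continue  # the description says this service cannot answer
            for record in service.search(field_name, term):
                results.append((service.description.name, record["title"]))
        return results


catalog = SearchService(
    SystemDescription("Catalog", "Z39.50", frozenset({"title", "author"})),
    [{"title": "Libraries of the Future", "author": "Licklider"}],
)
web_index = SearchService(
    SystemDescription("Web Index", "HTTP", frozenset({"title", "fulltext"})),
    [{"title": "Digital Library Issues", "fulltext": "libraries and interoperability"}],
)
broker = Broker([catalog, web_index])

print(broker.search("author", "licklider"))
print(broker.search("title", "librar"))
```

The broker never needs to know how a service works internally. It depends only on the published description, which is why agreeing on what to describe, and how, matters more than any particular implementation technology.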

Multi-tiered Systems

Library systems have developed over the years from monolithic mainframe applications (NOTIS) to client/server systems (most of the current library systems) and recently to distributed multi-tiered systems (DRA's "Taos"). Considering the nature of digital libraries (many distributed information providers and many distributed information users), the current multi-tiered architecture is the appropriate architecture for digital library systems. The good news is that there are numerous vendor and public domain solutions for assisting in building these types of systems. The bad news is that we have to pick the right one! There are many factors to be considered in this decision: the number of information resources in the system, the expected level of use, the availability of technical staff, and the shared or leveraged resources within the university. For the system to be successful, tools have to be chosen and expertise developed.

User Centered Issues

No matter how good the collections and how sophisticated the system, success has to be evaluated by how effective the system is in benefiting and assisting the user. Did the system provide the user an answer to her question? Did it help her create or develop relationships? Did it provide confirmation or assurance concerning currently assumed knowledge?

In almost every area these issues are under-researched, under-analyzed, and not well understood. Despite this state of affairs, explicit statements regarding all areas of user involvement must be made in order to evaluate system performance and effectiveness. Librarians have a strong tradition of working with users (as compared to many of the other disciplines involved in the digital library) and must build on our research and experience in this area.

Organizational Issues

Traits For Success

The Possibility Of A Virtual Organization

Due to both their universality and complexity, digital library projects are multi-disciplinary endeavors and rely on the skills and thinking of many disciplines and departments on campus. It is quite possible that the digital library will be a "virtual" organization composed of members from different departments on campus and funded from many existing budgets.