Network Working Group                                          W. Turner
Request for Comments: 1691                                           LTD
Category: Informational                                      August 1994

       The Document Architecture for the Cornell Digital Library

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

Abstract

   This memo defines an architecture for the storage and retrieval of
   the digital representations for books, journals, photographic images,
   etc., which are collected in a large organized digital library.

   Two unique features of this architecture are the ability to generate
   reference documents and the ability to create multiple views of a
   document.

Introduction

   In 1989, Cornell University and Xerox Corporation, with support from
   the Commission on Preservation and Access and later Sun Microsystems,
   embarked on a collaborative project to study and to prototype the
   application of digital technologies for the preservation of library
   material.  During this project, Xerox developed the College Library
   Access and Storage System (CLASS), and Cornell developed software to
   provide network access to the CLASS Digital Library.

   Xerox and Cornell University Library staff worked closely together to
   define requirements for storing both low- and high-resolution
   versions of images, so that the low-resolution images could be used
   for browsing over the network and the high-resolution images could be
   used for printing.  In addition, substantial work was done to define
   documents with internal structures that could be navigated.  Xerox
   developed the software to create and store documents, while Cornell
   developed complementary software to allow library users to browse the
   documents and request printed copies over the network.

   Cornell has defined a document architecture which builds on the
   lessons learned in the CLASS project, and is maintaining digital
   library materials in that form.

Document Architecture Overview

   Just as a conventional library contains books rather than pages, so
   the electronic library must contain documents rather than images.
   During the scanning process, images are automatically linked into
   documents by creating document structure files which order the image
   files in the same way the binding of a book orders the pages.  Thus,
   the digital book as currently configured consists of two parts: a set
   of individual pages stored as discrete bit map image files, and the
   document structure files which "bind" the image files into a
   document.  In addition, a database entry is made for each digital
   document which permits searching by author and title (i.e.,
   bibliographic information).

   Beyond the order of the pages, the arrangement of a physical book
   provides information to readers.  The title page and publication
   information come first; the table of contents usually precedes the
   text; the text is divided into sections or chapters; if there is an
   index, it follows the text.  The reader often refers to these
   components of a book when browsing the library shelves, in order to
   determine whether to read the book.

   The document structure provides direct access to the components of an
   electronic document, storing the information that would otherwise be
   lost when the book is disbound for scanning.
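
   The two-part arrangement described above (ordered page image files
   plus a structure that "binds" them and names their components) can
   be illustrated with a minimal Python sketch.  This is not the file
   format of the architecture; the class name, field names, and sample
   values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalDocument:
    """Hypothetical in-memory model of a bound digital document."""

    author: str            # bibliographic data used for searching
    title: str
    pages: list[str]       # page image file names, in reading order
    # Component name -> index into `pages` of that component's first page.
    components: dict[str, int] = field(default_factory=dict)

    def first_page_of(self, component: str) -> str:
        """Return the image file that opens the named component."""
        return self.pages[self.components[component]]


# Made-up sample data: a six-page book with four components.
book = DigitalDocument(
    author="A. Author",
    title="A Sample Book",
    pages=[f"{n:04d}.TIF" for n in range(1, 7)],
    components={"title page": 0, "table of contents": 1,
                "text": 2, "index": 5},
)

print(book.first_page_of("index"))  # prints 0006.TIF
```

   In this sketch the page list plays the role of the binding, and the
   component map gives the direct access to the title page, table of
   contents, text, and index that a reader of the physical book would
   have.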

Document Architecture Requirements

   Listed below are the requirements that were initially set down for
   the Cornell Digital Library Architecture.

   1. The architecture must be open (i.e., published and freely
      available).

   2. The architecture should be as simple as possible (to facilitate
      product development).

   3. The architecture should assume data storage in UNIX file systems.

   4. The architecture should allow for standard data usage, such as via
      FTP and Gopher servers.  That is, the pages of a document must
      exist in a single directory, and the naming convention used must
      order them in the standard collating sequence, as in the series
      "0001.TIF, 0002.TIF, ..., 0411.TIF".  (NOTE: a series such as
      "1.TIF, 2.TIF, ..., 10.TIF" would be ordered "1.TIF, 10.TIF,
      2.TIF, ...", which is not acceptable.)

   5. The architecture should provide for storing the same information
      in different formats, for example when a page of a document is
      available at several different resolutions.

   6. Low-resolution "thumbnail" images of each page must be stored to
      facilitate browsing and sharing of data.
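
   The naming rule in requirement 4 can be checked with a short Python
   sketch.  The helper name page_file_name and its default width of
   four digits are assumptions made for this example; the requirement
   itself only asks that the names sort in the standard collating
   sequence.

```python
def page_file_name(page_number: int, width: int = 4, ext: str = "TIF") -> str:
    """Return a zero-padded page file name such as '0001.TIF'.

    Hypothetical helper: the width and extension defaults mirror the
    "0001.TIF" example in requirement 4.
    """
    if page_number < 1 or page_number >= 10 ** width:
        raise ValueError(f"page number {page_number} does not fit in {width} digits")
    return f"{page_number:0{width}d}.{ext}"


# Zero-padded names keep page order under a plain string sort,
# which is how a directory listing is typically ordered.
padded = [page_file_name(n) for n in (1, 2, 10, 411)]
assert sorted(padded) == padded

# Unpadded names do not: "10.TIF" sorts between "1.TIF" and "2.TIF".
unpadded = [f"{n}.TIF" for n in (1, 2, 10)]
print(sorted(unpadded))  # prints ['1.TIF', '10.TIF', '2.TIF']
```

   A fixed width also caps the number of pages a directory can hold
   (9999 for four digits), which is why the sketch rejects page numbers
   that would overflow it.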