The Winner of the 2007 IT Division Jo Ann Clifton Student Award
Forging cultural heritage collections online: The story of An American Tale
Candidate for M.A., Information Resources & Library Science, University of Arizona, Tucson, Arizona
Windmill at Sunset - Boone County [Missouri]. Photo credit: Duane Perry, Columbia, Missouri (http://www.missouri.gov/mo/mophotos/sunsets)
Contents
Introduction
1.0 Initial goals and work undertaken
3.0 Lessons learned
Introduction
In the heartland of 19th-century The purpose of this paper is to reflect on the extensive planning and execution required to create, from the ground up, the digital repository of three migrant pathways to Like the journey of early settlers to
Project goals for the 8-week digital collection project were to: 1) digitize 30 primary and secondary sources drawn from the author's research over the past ten years; 2) create an open-access online collection of the digitized images with relevant metadata; 3) create an online guide with interpretive and educational materials on the subject; and 4) use the project as a platform for understanding the decisions involved in organizing, describing, indexing, classifying, digitizing, presenting, and retrieving items when building a digital collection. The collection's scope was thirty vintage photographs and primary and secondary records that the author had uncovered through correspondence with individuals or through on-site research at local cemeteries, public libraries, academic libraries, county courthouses, state departments of health, state historical societies, and federal archives. The record types selected were photographs, correspondence, vital records, census records, naturalization and immigration records, church records, military records, newspaper clippings, and court, land, and tax records. Three discrete, topical themes formed the intellectual boundary of the project: 1) Slaveholder from
NISO's "A Framework of Guidance for Building Good Digital Collections" guided the construction of the digital repository (http://www.niso.org/framework/Framework2.html). The three intended audiences for
An American Tale were academic
historians, graduate
students, and family historians. The project goals above dictated the requirements for its design. The content management system used as the container for the collection was the Greenstone Digital Library software (www.greenstone.org), free, open-source software required of all students enrolled in Digital Libraries, the course for which the project was completed. A pilot walk-through of sample records revealed the need for a taxonomy to uniquely identify each object.
After research and experimentation, a naming standard was created for each file, using the family name, generation number, pedigree placement, and record type. File naming followed the ISO 9660, Level 2 convention, which allows file names of up to 31 characters, using only lowercase letters a-z, digits, and the special characters period, underscore, and hyphen (http://en.wikipedia.org/wiki/ISO_9660). Spaces and all other special characters were excluded. As discussed later in this paper, settling on a sound file naming convention early in a project is critical to its later success.
The collection required a simple metadata standard with modest granularity, given the simple nature of the collection and the limited experience of its builder. The metadata standard selected was Dublin Core (DC), which is widely supported and so broadens access to, and reuse of, the collection. Using DC preserves the context of each record and provides a 'footprint' for rights status and digital provenance. Unqualified DC is also the baseline metadata format that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) requires every repository to support (http://www.openarchives.org/OAI/openarchivesprotocol.html). Full bibliographic detail for the preserved items, including structural, administrative, and descriptive metadata, is recorded in a Microsoft Excel spreadsheet that accompanies the digitized records. Following standard metadata encoding practices will make it easier to share the records with federated archives. Library of Congress Subject Heading authorities were used to standardize descriptive terms.
Of the six Creative Commons licenses available to individuals, the project used the "Attribution Non-commercial No Derivatives" license (www.creativecommons.org). Others may download works in the collection provided that they credit the source, do not alter the material in any way, and do not reuse it for commercial purposes. The original photographs and print records are available to the general public on prior written request.
The author retains custody of the high-quality digital master copies of the original records, stored on compact discs and on the author's local hard drive. FastSum Integrity Control was used to verify the integrity of the master files through backup and any future migration (www.fastsum.com).
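FastSum's own manifest format is not described here, so the following is only a minimal Python sketch of the same idea, assuming MD5 digests (the hash choice and the function names `file_digest`, `build_manifest`, and `verify` are illustrative, not FastSum's): record a digest for every master file, then recompute the digests after each backup or migration and compare.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a digest for every file directly inside `folder`, keyed by file name."""
    return {p.name: file_digest(p) for p in sorted(folder.iterdir()) if p.is_file()}


def verify(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are missing or whose contents changed."""
    problems = []
    for name, expected in manifest.items():
        p = folder / name
        if not p.is_file() or file_digest(p) != expected:
            problems.append(name)
    return problems
```

In practice the manifest would be saved alongside the masters (for example, as a text file on each CD) so that `verify` can be rerun against every copy after a backup or format migration.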
Lower-resolution digital surrogates of the master copies reside in the Greenstone Digital Library database for public viewing. Records were scanned on an A4 AcerScan 620U Prisa USB flatbed scanner, whose maximum resolution of 600 x 1,200 dots per inch (dpi) was adequate for viewing the objects.
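As a rough, hypothetical illustration of why masters and surrogates are kept separately, assume a 4 x 6 inch print scanned at 600 dpi in both directions and stored as uncompressed 24-bit color (3 bytes per pixel); none of these figures is specified by the project:

```latex
\begin{aligned}
\text{width}  &= 4\ \text{in} \times 600\ \text{dpi} = 2400\ \text{px} \\
\text{height} &= 6\ \text{in} \times 600\ \text{dpi} = 3600\ \text{px} \\
\text{size}   &= 2400 \times 3600\ \text{px} \times 3\ \tfrac{\text{bytes}}{\text{px}} = 25{,}920{,}000\ \text{bytes} \approx 25.9\ \text{MB}
\end{aligned}
```

Halving the resolution to 300 dpi halves each dimension and cuts the uncompressed size to a quarter, about 6.5 MB, before any JPEG compression is applied.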
Images were resized in Microsoft Paint for consistency. No part of any digitized record was cut, cropped, or otherwise altered in the process. The TEI-P-5 Guidelines (version 0.4.1, July 2006) for processing and creating images guided the digitization of the photographic and photocopy images created for uploading to Greenstone (http://www.tei-c.org/release/doc/tei-p5-doc/html/).
Finally, as part of managing the construction of the digital library, an 8-week timeline was set for completing the work, auxiliary personnel were identified, equipment needs were assessed, a proposed budget was assembled, and metrics for evaluating the process were defined. In summary, the 'magic formula' for creating a premier digital library collection was to state the project goals clearly, define the scope and selection policy of the collection, and identify the main audience; next, to choose the best metadata standard and identify copyright terms suited to the collection; then, to establish ownership and access conditions and refine software and hardware requirements; and finally, to set a clear timeline, hand-pick the needed personnel, formulate a flexible budget, and define plans to measure success.
The finished Greenstone library collection is a simple mock-up of 30 artifacts, representing a grander vision of what could become an extensive collection of 19th-century records of immigrants to
The home page (the About page, in Greenstone terms), shown in Exhibit 2.1, orients the user to the purpose, scope, selection process, and arrangement of the collection.
Exhibit 2.1: Collection Home Page
On the Titles a-z page, shown in Exhibit 2.2, each bookshelf icon represents a single document or photograph, sorted alphabetically by topical theme, surname, and then record title. Where did all the information in the Titles a-z index come from? The secret is in the metadata. The elegance of Greenstone lies behind the scenes, in the metadata assigned to each record. What the user does not see while navigating An American Tale are the 15 Dublin Core metadata elements that describe each record.
Take, for example, the 1863 Certificate of Disability for Discharge issued to Sergeant Philip P. Wilhelm (20th from the top of the list), who was released from the Union Army's Company E, 37th Ohio Volunteer Infantry, at
Exhibit 2.3: Greenstone Metadata Screenshot
Wherever the certificate (Exhibit 2.4) travels, through data harvesting or other means, its full Dublin Core metadata goes with it, so users can establish its provenance. Sergeant Wilhelm might not appreciate the world knowing about the indelicate disease he contracted at the Battle of Fayetteville. But the world will have accessible proof that he was at the battle, thanks to descriptive, administrative, and structural metadata that comply with generally accepted metadata harvesting standards.
Exhibit 2.4: Document Object - Certificate of Disability for Discharge for Phillip P. Wilhelm, 12 January 1863
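To make "full Dublin Core metadata" concrete, here is a minimal Python sketch that serializes a simple DC record in the `oai_dc` XML form that OAI-PMH harvesters exchange. It is an illustration under stated assumptions, not Greenstone's internal metadata format or export code: the function name `dc_record` is hypothetical, the example values are illustrative rather than the collection's actual record, and the `xsi:schemaLocation` attribute that real `oai_dc` records carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, in their usual order.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict[str, list[str]]) -> str:
    """Serialize a simple Dublin Core record as an oai_dc XML string.

    Every DC element is optional and repeatable, so each maps to a list of values.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:  # emit elements in a stable, canonical order
        for value in fields.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; not the collection's actual metadata record.
    print(dc_record({
        "title": ["Certificate of Disability for Discharge - Phillip P. Wilhelm"],
        "date": ["1863-01-12"],
        "format": ["image/jpeg"],
        "rights": ["Creative Commons Attribution-NonCommercial-NoDerivs"],
    }))
```

Rejecting unknown keys keeps the record within the 15-element set, which is what lets any harvester interpret it without a custom schema.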
The Subjects Page, shown in Exhibit 2.5, offers some of the most powerful browsing in the An American Tale website. Users may browse detailed Library of Congress Subject Heading authorities to find the specific document or photograph they are seeking. Four pages of subject headings give the user more than 70 topics and subtopics to choose from.
Exhibit 2.5: Greenstone Subjects Page
For example, a user interested in carte-de-visite photographs (pronounced cart-du-viZEET), popular during the American Civil War, will find three in that photographic medium, shown in Exhibit 2.6: one each for Henry M. Ogden, Mary Frances Turpin Ogden, and Captain John James Ogden.
Exhibit 2.6: Subject Thumbnails
The final searchable module is the Coverage Page (Exhibit 2.7), which organizes the collection into four periods of Missouri history: a) 1812-1819 Territorial Missouri, b) 1860-1877 Civil War and Reconstruction Missouri, c) 1878-1899 Outlaw and Volunteer Missouri, and d) 1900-1929 World's Fair and Lindbergh Missouri. The only period not represented is 1820-1859 Statehood Missouri, for which the present collection holds no documents or photographs.
Exhibit 2.7: Coverage Page
When a bookshelf icon is selected on the Coverage Page, for example Territorial Missouri, two thumbnail images appear for that time frame. Both are legal records associated with the 1816 marriage of Katharine Smith to Henry M. Ogdon [sic]: an Affidavit of Age of Majority (Exhibit 2.8) and a Marriage Bond, which guaranteed financial compensation to the bride's father should the groom, 24-year-old Mr. Ogdon, choose to flee the altar.
Exhibit 2.8: Affidavit of Age of Majority - Katharine Smith, November 4, 1816, Bedford County, Virginia
The final section, 3.0, reviews the project's shortcomings and successes in an effort to identify best practices for building a premier online digital collection.
3.0 Lessons learned
The reader
may take away five important
lessons from the
project: 1) planning is crucial,
Digitization requires a substantial long-term financial investment and draws on often limited organizational resources. Common sense dictates that the work be planned well and its results measured, so that the return on the initial investment is eminently clear. Tools such as a project schedule, a prospective budget, a mid-project review, and a plan for a file naming convention eased the project's execution. The reader should have a roadmap of where he or she is going.
Planning to review a day-to-day journal of activities at the project's midpoint helped enormously in planning its second half. By keeping a log to refer to at the midpoint, the author was able to pinpoint logistical problems immediately and prepare to resolve them in the second half of the project. Inadequate planning on the author's part in arriving at a file naming convention meant repeating tasks four or five times; better preparation would have avoided this and saved a great deal of time. In the example in Exhibit 3.2, the JPEG file for a Civil War pension record was assigned the number 23 because the first individual bearing the Wilhelm surname in the pedigree chart was numbered 23 (Rosa May Wilhelm). The name also encoded that Phillip P. Wilhelm was the sixth generation back, that his unique individual number on the pedigree chart was 46, that the document was a military record (record type 7), and that it was the first record of its kind.
But then the file name became too long. The project's initial research proposal specified the ISO 9660, Level 2 file naming convention, described earlier in section 1.2, which allows file names of up to 31 characters. Only lowercase letters a-z, digits, and the special characters period, underscore, and hyphen were to be used; spaces and all other special characters were excluded. The file name was therefore shortened in its third iteration.
A fourth iteration was needed because the new low-resolution surrogates of the digital master files, which were the files actually uploaded to Greenstone, needed names of their own. The suffix 'lo' was appended to the file name, which still fit within the 31-character limit. Then, after creating, building, and previewing the Greenstone collection, the author learned that the term "page" in a file name confused Greenstone and caused the library build to abort. In the fifth iteration, therefore, 'page' was removed from the file names.
Finally, the author learned that Greenstone reads a file name only up to the first period to extract the name of the file. Normally a file is named filename.jpg, filename.bmp, and so forth, with a single period before the extension. But Greenstone stopped reading the file name at the first period, which in this example came right after 23. The file was therefore recorded in Greenstone simply as "23", as were all other files prefixed '23', confusing users with multiple files titled "23". In the sixth and final iteration, periods were removed from the file names so that Greenstone could index each file's full name.
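The six naming iterations can be condensed into a small checker, sketched here in Python under loud assumptions: the component order and underscore separators in the hypothetical `make_name` are illustrative (Exhibit 3.2's exact layout is not reproduced here); the 31-character and character-set test encodes the rule as this paper states it, not the full ISO 9660 specification; and the "page" and first-period checks mirror the Greenstone behaviour the author observed, not documented Greenstone rules.

```python
import re

# The project's rule as stated in section 1.2 (not the full ISO 9660 spec):
# 1-31 characters drawn from lowercase a-z, digits, period, underscore, hyphen.
ALLOWED = re.compile(r"[a-z0-9._-]{1,31}")


def make_name(surname_no: int, generation: int, person_no: int,
              record_type: int, seq: int, ext: str = "jpg",
              low_res: bool = True) -> str:
    """Build a file name from pedigree data (hypothetical layout and separators)."""
    stem = f"{surname_no}_g{generation}_{person_no}_{record_type}_{seq}"
    if low_res:
        stem += "_lo"  # marks the low-resolution surrogate uploaded to Greenstone
    return f"{stem}.{ext}"


def greenstone_title(filename: str) -> str:
    """Mimic the observed behaviour: only the text before the first period is kept."""
    return filename.split(".", 1)[0]


def check_name(filename: str) -> list[str]:
    """Return the problems found; an empty list means the name passes every check."""
    problems = []
    if not ALLOWED.fullmatch(filename):
        problems.append("length or character set")
    if "page" in filename:
        problems.append("contains 'page'")
    # Any period before the extension makes the observed title shorter than the stem.
    if greenstone_title(filename) != filename.rsplit(".", 1)[0]:
        problems.append("period inside the base name")
    return problems


if __name__ == "__main__":
    print(make_name(23, 6, 46, 7, 1))                 # 23_g6_46_7_1_lo.jpg
    print(check_name("23.g6.46.7.1.page1.jpg"))       # ["contains 'page'", "period inside the base name"]
```

Running `check_name` over a whole folder before the first Greenstone build would have caught the "page" and period problems in one pass rather than across separate iterations.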
Time invested in planning up front yields tremendous savings later in the project. With that thought in mind, lesson 2 teaches that experience counts, or rather that inexperience can result in painful revisions and delays. Inexperience in preparing the documents and retaining their provenance added time. Creating a taxonomy for the first time, with little training, added time. Understanding new and complicated standards, such as the TEI-P-5 Guidelines, added time. Poor equipment selection added time, because some documents had to be outsourced. Inexperience with Library of Congress Subject Headings meant long and laborious work to identify appropriate headings for collection objects, which also added time. Inexperience caused several technical problems as well. Scanned images were of mediocre quality because of the age of the scanner and poor technique, and troubleshooting the Greenstone Librarian Interface while constructing a basic digital library was an ongoing battle that a more experienced builder would not have had to endure.
3.3 Choose wisely
The free, open-source Greenstone Digital Library software is a welcome solution for many collections that would otherwise never be mounted on the World Wide Web. The power of posting an item to the world together with its full bibliographic record, ready for later data harvesting, is the stuff of which librarians dream. But the difficulties surmounted and the limited support documentation made the choice one to reflect upon. Greenstone proved a very cramped space in which a beginner could build a repository. Its strength in making metadata universally usable comes at a high cost. Dated, hard-to-read user manuals meant repeatedly combing through computerese written in awkward English. Grand designs for "reading rooms", side-by-side ASCII text translations for each object in An American Tale, FAQs, a Contact Us page, a Chronological Lifeline, and a User Guide all fell flat because of Greenstone's hard-to-understand capabilities.
3.5 Keep a sense of humor
Above all else, the reader should try to maintain a sense of humor. Mounting a digital library on the World Wide Web is no small feat. The ability to stand back and laugh at one's foibles and mishaps will only help keep the project, and its creator, on track.
In Forging cultural heritage collections online: The story of An American Tale, the reader learned how to lay the foundation for an online digital collection of cultural heritage artifacts by first defining requirements and then defining the design. The reader saw what the finished product might look like on the Greenstone Digital Library platform, with its assorted features. Finally, from the 8-week effort, the reader learned five important lessons about digital library design that may save time and heartache in future projects. Like the journey of early settlers to
An
American Tale: 19th century Folkways to digital library, please go to:
©2008 INFORMATION TECHNOLOGY DIVISION/SPECIAL LIBRARIES ASSOCIATION