
Showing posts with label Digital Repositories.

Tuesday, June 13, 2017

Does an Award Winning Design Reflect the Content Within?

I am catching up on reading and Internet surfing, which means I'm finding things I should have read months ago.  One blog post I found wonders whether award-winning book covers appear on books with highly rated content.  I've copied the post's graphic below, and you're welcome to go read the original post.  However, this got me thinking about web site design, and specifically library web sites.

Most libraries have a web site.  Those sites are created in a number of different ways, using free and fee-based tools.  Some provide basic information about the library, while others are more in-depth.  I suspect that most do not provide all of the information that their users want, such as information about the staff or board of trustees, or details about borrowing privileges.  Indeed, many libraries only provide what the staff is interested in sharing, and that could be very little.

Most libraries do not have someone on staff who can give the web site a professional design.  Sites that we might consider "award winning" are likely owned by large, well-funded libraries, where a tech-savvy person, internal or external, is charged with maintaining the site.  As our computing devices have changed (e.g., the move to mobile devices), our site designers have had to create sites that will look good and function on any type of device.  This is called responsive design.  My own site is an example of one that uses responsive design so that it functions well on any type of device.
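As an aside, responsive design mostly lives in CSS media queries, but the idea can also be sketched in a few lines of browser TypeScript using the standard matchMedia API.  This is a minimal illustration only; the 600px breakpoint and the "compact" class name are invented for the example:

```typescript
// Minimal sketch of responsive behavior using the standard
// window.matchMedia API (runs in any modern browser).
// The 600px breakpoint and the "compact" class are made-up examples.
const smallScreen: MediaQueryList = window.matchMedia("(max-width: 600px)");

function applyLayout(matches: boolean): void {
  // Toggle a CSS class so the stylesheet can adapt the layout.
  document.body.classList.toggle("compact", matches);
}

applyLayout(smallScreen.matches);                              // initial layout
smallScreen.addEventListener("change", (e) => applyLayout(e.matches));
```

A CSS media query expresses the same breakpoint logic declaratively; the script form just makes the mechanism visible.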

The problem with web sites (and books) is that a great-looking site may have very little useful content.  In some cases, a great-looking site may actually contain fake content, while a site that is not designed by a professional may have extremely useful content.  Yes, judging a book (or web site) by its design can be problematic.

So what are you to do? 
  • Whether your site is for a digitization program, a specific department, or the entire library, make sure that it gives users the information that they desire about you (program, department, library).  If you are waiting until it is designed perfectly, don't.  Place the information online, then schedule time to make it better.
  • State your assumptions.  You actually have no idea who will use your web site, so don't assume that they will know specific details about you (e.g., location).  
  • Work towards a design that is compliant with Americans with Disabilities Act rules/guidelines.  If you don't know what that means, ask someone.  Yes, there are free tools, like WAVE, which you can use to assess accessibility (see the sketch after this list).  I know you might get frustrated with the errors, but try to work on fixing them.
  • Work towards functional and informative, then towards beautiful.  People will endure a less than beautiful web site, if it delivers worthwhile information.
  • When possible, hire someone - even a knowledgeable intern - who can help you with your web site.  Remember that you can contract with someone to provide this service on-demand.
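Dedicated checkers like WAVE are far more thorough, but a few DOM queries can catch some of the most common accessibility misses.  A rough sketch to run in the browser console on your own page; the three checks are my choices for illustration, not any standard's checklist:

```typescript
// Rough accessibility spot-checks using standard DOM APIs.
// Illustrative only; not a substitute for an audit with a tool like WAVE.
const problems: string[] = [];

// 1. Images without alternative text.
document.querySelectorAll("img:not([alt])").forEach((img) =>
  problems.push(`Image missing alt text: ${(img as HTMLImageElement).src}`)
);

// 2. Links with no readable text (e.g., icon-only links).
document.querySelectorAll("a").forEach((a) => {
  if (!a.textContent?.trim() && !a.getAttribute("aria-label")) {
    problems.push(`Link with no accessible name: ${a.href}`);
  }
});

// 3. Page language not declared on the <html> element.
if (!document.documentElement.lang) {
  problems.push("Missing lang attribute on <html>");
}

console.log(problems.length ? problems : "No issues found by these checks.");
```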
By the way, I did run my own web site through the WAVE tool, and I can see that I have some changes to make!  I guess I'd better do that before I look at any of the books below.



[Graphic: infographic on award-winning book covers, created by Syracuse University's School of Information Studies master of information management program.]

Monday, November 03, 2014

Article: The Internet Archive, Trying to Encompass All Creation


In a New York Times article, Brewster Kahle talks about expanding what the Internet Archive can do, if anyone (and everyone) can become a curator.  In terms of digitization, this text stood out to me:
A new book scanner was presented; Robert Miller, the archive’s director of books, literally unveiled it. This baby was only 40 inches tall and 62 pounds, versus the earlier version’s six feet and 350 pounds. In other words, it is portable, and can be taken to collections that are too fragile or cumbersome to make their own way to the archive. It’s much easier to use, too.
While that's still not quite small enough to fit into anyone's home, it is a size that would fit into many libraries.

Monday, October 20, 2014

Version 2.0 of the International Image Interoperability Framework (IIIF)

This announcement came to me in mid-September.  My apologies for not posting it sooner.  The event mentioned below is being held today (Oct. 20).


The International Image Interoperability Framework community (http://iiif.io/) is pleased to announce the release of the second major version of its specifications, intended to provide a shared layer for dynamic interactions with images and the structure of the collections and objects of which they are part. These APIs are used in production systems to enable cross-institutional integration of content, via mix-and-match of best-of-class front-end applications and servers.

This release adds functionality derived from real-world use cases needed by partners within the community, and reflects more than a year of experience with the previous versions and significant input from across the cultural heritage community. It also formalizes many of the aspects that were implicit in the initial versions and puts into place a manageable framework for sustainable future development.
Detailed change notes are available.

The specifications are available on the IIIF web site (http://iiif.io/).

Accompanying the release of the specifications is a suite of community infrastructure tools, including reference implementations of all versions of the Image API, collections of valid and intentionally invalid example Presentation API resource descriptions, plus validators for both APIs. Production-ready software is available for the full Image API stack, with server implementations in both Loris and IIP Server, and rich client support in the popular OpenSeadragon.
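For readers who have not worked with IIIF, the Image API boils down to a templated URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.  Here is a minimal sketch of building version 2.0 requests; the server base URL and image identifier below are hypothetical:

```typescript
// Build an IIIF Image API 2.0 request URL from its path segments.
// Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
// The base URL and identifier here are made-up examples.
function iiifImageUrl(
  base: string,        // scheme + server + prefix
  identifier: string,  // image id
  region = "full",     // "full", or "x,y,w,h" for a crop
  size = "full",       // "full", "w,", ",h", or "pct:n"
  rotation = "0",      // degrees clockwise
  quality = "default", // "default", "color", "gray", "bitonal"
  format = "jpg"
): string {
  return `${base}/${encodeURIComponent(identifier)}/${region}/${size}/${rotation}/${quality}.${format}`;
}

// A full-size JPEG, then a 300px-wide grayscale crop of the same image:
const base = "https://example.org/iiif";  // hypothetical server
console.log(iiifImageUrl(base, "page-001"));
console.log(iiifImageUrl(base, "page-001", "0,0,1000,1000", "300,", "0", "gray"));
```

Because every client and server agrees on this URL grammar, a viewer built against one institution's images works against another's server unchanged, which is the cross-institutional integration the announcement describes.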

There will be a rollout and dissemination event on October 20th, 2014 at the British Library to celebrate this release and engage with the wider community. Further details at http://iiif.io/event/2014/london.html, all are welcome but (free) registration is required.

Feedback, comments, and questions are welcome on the discussion list at iiif-discuss@googlegroups.com.

Friday, August 23, 2013

Rest in Peace: DialogClassic

I heard today that the classic version of Dialog® is on its last legs. All of the databases that were available in the old version of Dialog are now available in ProQuest Dialog. While I recognize that the command line version of Dialog is not what today's searchers want to use...and while I recognize that our technology has gotten better...I mourn.

Many librarians learned how to search for electronic information on the Dialog system.  We learned commands, tricks, and shortcuts.  We memorized details about specific databases and consulted documentation to double-check details so that we didn't spend extra time or money online.  We spoke in phrases peppered with file numbers and field names.  We shared stories of commands (or searches) gone wrong and the charges that they caused.  And we smiled.
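For readers who never used it, a classic Dialog session looked something like the sketch below.  This is a stylized reconstruction from memory (the search terms and set numbers are invented for illustration), not an actual transcript:

```
b 1                              begin searching in file 1 (ERIC)
ss school(w)librar? and budget?  select steps: ? truncates, (w) requires adjacency
t s3/5/1-5                       type set 3, format 5 (full record), records 1-5
logoff                           end the session (and stop the meter)
```

Every element had a cost implication, which is why we memorized file numbers and double-checked the documentation before logging on.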

In talking to someone from ProQuest today, I used the phrase "under the hood."  She immediately thought I was referring to a 2001 article by Carol Tenopir entitled "Why I still teach Dialog" (Library Journal, 126(8), 35-36), which is available through ProQuest.  In it, Tenopir writes:
For new students in an LIS program, DialogClassic helps them understand the workings of the systems they will be searching, teaching, or designing. 
This is true.  It is also true that teaching someone to search Dialog's old command line interface takes patience. The learner must be willing to try and fail, and learn from those failures as well as the successes.  It is also true that most - if not all - of our searching these days is not done with a command prompt, but through some Google-like interface or an advanced search screen.  It is difficult to teach students something that they will not use after the class is over.  Face it, no one is going to run home and search using a command prompt just for fun!

However, I believe that understanding the old ways helps you understand how we do things now.  All of us who used Dialog in the old days have a knowledge of the system that our younger counterparts will never have.  Although hard to quantify, I would argue that there is value to that knowledge.

The Power of Full-Text 

I remember when full-text records came to Dialog and the power that came with them.  I no longer had to use a document delivery service to obtain the full-text.  Not only was that a cost saving, it also meant I could get the full-text to my client faster.  Initially the full-text was not searchable.  When it became searchable, it was revolutionary!  Now we take full-text searching for granted.  And instead of full-text in ASCII, we have full-text presented in PDFs with graphics, etc., intact.  Full-text has been added through re-typing, scanning, and other methods.  Some of it has come with added errors.  All of it has been appreciated.

By the way, could we look at those companies that produced the Dialog databases as being early pioneers in digitization?  Yes, I think so.

First Dialog, then Google

We can also argue that Dialog paved the way for many services, including Google.  I remember working with programmers on DR-LINK, a product of TextWise, and calling upon my Dialog knowledge to help them make sense of the files that had to be turned into coherent databases.  Command-level searching had taught me much about file structures and expectations.  I'm sure that others who have built services have had the same revelations, not from using fancy interfaces, but from "getting their hands dirty" at the command prompt.

In Memoriam

I wonder if ProQuest will throw a virtual event when DialogClassic is finally turned off?  Or perhaps some of us "old Dialog searchers" will just find a way to gather, light candles, and tell stories of a great system that began it all...

Monday, August 12, 2013

2014 National Agenda for Digital Stewardship



The National Digital Stewardship Alliance has recently released its 2014 National Agenda for Digital Stewardship.  As the site says: 


The National Agenda for Digital Stewardship annually integrates the perspective of dozens of experts and hundreds of institutions, convened through the Library of Congress, to provide funders and executive decision‐makers insight into emerging technological trends, gaps in digital stewardship capacity, and key areas for funding, research and development to ensure that today's valuable digital content remains accessible and comprehensible in the future, supporting a thriving economy, a robust democracy, and a rich cultural heritage.

Over the coming year the NDSA will work to promote the Agenda and explore educational and collaborative opportunities with all interested parties.

 In an announcement to the CNI community, Clifford Lynch wrote:

This is a very valuable concise survey and agenda for high priority areas of digital stewardship; it's also important because it reflects the wide consultation and breadth that characterizes the important leadership and coordinating work of the Alliance.


Thursday, June 14, 2012

Should librarians be required to know another language?

Let me tell you three situations that have me thinking about this.

First, if you are trying to catalogue or create metadata for an item that is not in your native language, can you complete the task?  Would it be helpful to know another language?  We know that some languages have similarities, so could knowing one additional language actually help you navigate a few more than that?  And would it make you a more effective librarian?

Second, if you are working the reference desk in a city that has a diverse population, should you be able to serve people in their own language?  In some industries (e.g., hospitals), that diversity in language is sought and valued.  Should libraries also seek to have that type of diversity on their staff?

I have a student who is doing an internship in a public library.  He has realized that being conversant in Spanish would be a good thing.  In the U.S., a growing segment of our population speaks Spanish, so shouldn't our library staff speak Spanish?  (And if there is another language widely used in the community, shouldn't we have staff members that also speak that language?) 

Third, if you are building a service (e.g., a digital collection) that will be used by a diverse group of people, would it be helpful to have text in their own languages to help them use the site?  Would you want to outsource that work?  Would you want someone on staff to do that work, or at least know the language well enough to be able to supervise the work?

If you agree that knowing a foreign language would be useful for library and information science graduates, how do we encourage them to learn a language or maintain fluency in a language?  Should we ask existing staff to learn a language that is being used in the community and even tell them which language they need to learn?  (For example, you need to learn Mandarin, not French.)





Friday, February 24, 2012

David Smith: Inferring and Exploiting Relational Structure in Large Text Collections

This week, I heard David Smith talk about "Inferring and Exploiting Relational Structure in Large Text Collections."  Interesting that digitized books in the public domain are becoming testbeds for these research endeavors.  He is also using translated text (e.g., books that have been translated into several languages) in order to discern the words used to describe specific concepts across languages.


I am so used to thinking about the digitization effort that I rarely think about all of the ways that these now-digitized texts can be used.  That is one of the reasons why I found Smith's talk to be of interest.

Abstract: The digitization of knowledge and concerted retrospective scanning projects are making overwhelming amounts of text in diverse domains, genres, and languages available to readers and researchers. To make this data useful, our group is working on improving OCR, language modeling, syntactic analysis, information extraction, and information retrieval. I will focus in particular on problems of inferring the relational structure latent in large collections of documents, such as books, web pages, patent applications, grant proposals, and social media postings. Which books or passages quote, translate, paraphrase, and cite each other? This research requires improvements in modeling translation and other forms of similarity, as well as improvements in efficiently comparing large numbers of passages. Finally, I will discuss how passage similarity relations can be used to improve tasks such as named-entity recognition and syntactic parsing.
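The abstract's question of which passages quote or paraphrase each other is, at bottom, a passage-comparison problem.  As a toy illustration of the basic idea (a baseline I'm supplying for this post, not Smith's actual method), here is word-shingle overlap measured with Jaccard similarity:

```typescript
// Toy passage-similarity measure: Jaccard overlap of word n-grams
// ("shingles"). Real systems add translation models, hashing for scale,
// etc.; this only illustrates the underlying comparison.
function shingles(text: string, n = 3): Set<string> {
  const words = text.toLowerCase().replace(/[^a-z\s]/g, "").split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const g of a) if (b.has(g)) shared++;
  return shared / (a.size + b.size - shared || 1);
}

// Two made-up passages, one a light paraphrase of the other:
const p1 = shingles("The quick brown fox jumps over the lazy dog.");
const p2 = shingles("A quick brown fox jumps over a sleeping dog.");
console.log(jaccard(p1, p2).toFixed(2)); // higher means more likely reuse
```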

Friday, September 23, 2011

Clifford Lynch, Scholarly Works, Big Data and Libraries

[Photo: Clifford Lynch speaking at SU]
Clifford Lynch, executive director of the Coalition for Networked Information (CNI), spoke at Syracuse University yesterday.  Lynch's talk was entitled "The Changing Landscape of Scholarship: Implications for Libraries and Scholarly Communication".  He believes that you must look at the future of scholarly work first, then look at how that work will affect libraries.

A great deal of scholarly practice is becoming data- and computationally intensive, across all disciplines.  Funding agencies are increasingly requiring that data produced as part of a grant be stored, maintained, and often shared (even before the research is completed).  This has led to new areas of study, including data curation, eScience, and data science, as well as new jobs, etc.  However, it is not clear what the real need for data curation experts is, nor where they will or should reside.  The question of where matters because we don't have a national effort in the U.S. to store research data.

Lynch did make an important distinction between two types of research data.  First, there is data that can be easily recreated.  Do we need to store this data in perpetuity?  Perhaps not.  Then there is observational data, which can be difficult, if not impossible, to recreate.  This data does need to be stored and maintained.

He described two uses of research data.  The first is to support the original research, as well as natural extensions of that research.  The other use is as proxy data, and you cannot predict those uses.  For example, data about high tides could be used by the shipping industry, but also by environmentalists.  Since you can't predict its use, the whereabouts of data sets need to be known and access made available.

Lynch spent a long time talking about access.  When talking about big data, we all assume data that is digital; however, Lynch talked about the tremendous amount of data - including specimens - that is not yet digital and that is very fragile.  Who is going to create the digital surrogates?  Where will the funding come from?

Universities sit on a tremendous amount of data that might be "hidden" in various departments.  The library might not even know where it all is.  Lynch believes that at some point various university offices will get involved in how data is stored, maintained, and shared, including the office of risk management and those that audit various processes.

Near the end of the question and answer period, Clifford Lynch made the point that people outside of our research institutions cannot easily get to scholarly material anymore.  (He used the phrase "scholarly material" in a broad sense, including data, databases, etc.)  For me, that raised the question of how libraries will make scholarly information available to people who are not part of their user base.

Lynch mentioned that the UK had been a poster child for creating systems for storing data, but that funding shifts had harmed those systems.  He mentioned both AHDS and JISC.  In the U.S., there isn't a natural focal point for building similar systems.

Lynch also touched upon:
  • The unauthorized sharing of scholarly information.
  • Ethical constraints on research data.
  • "Quantitative measurements of elusive things," such as journal impact factors and other bibliometrics.
  • Managing nontraditional "publications."
  • Open access, self-publishing, and specialized web sites, which have replaced specialized encyclopedias.
  • The report from the Research Data Workforce Summit.
Lynch would like to see people studying the "science of data," with theoretical underpinnings.  He believes that archival practices are not enough to understand and handle the problems with scholarly data.  He does believe that libraries have the processes and mission that will lead them to attack the problems of data sharing and data curation.

Several of my students attended Lynch's talk.  Two mentioned being inspired by it and now being interested in knowing more about eScience.  They saw a vision of an optimistic future. One student heard in Lynch an air of pessimism.  Lynch, however, would say that he is neither an optimist nor a pessimist, but just a realist.

Friday, January 07, 2011

Notes from Clifford Lynch's short keynote at HICSS

My colleague, Kevin Crowston, is attending the Hawaii International Conference on System Sciences (HICSS).  He has given me permission to share some of his notes from the short keynote address that Clifford Lynch gave for the Digital Media track.

In speaking about "digital media", Lynch "discussed how different kinds of media were evolving as they went digital. He noted that eBooks were still basically books, down to page flips. Journals are also digital, but journal articles look nearly the same. He suggested that the most truly digital medium was the video game, but that there was a lot of resistance to considering video games as the future form of literature. He noted that business documents had really gone virtual: e.g., the shift from a paper airline ticket to an entry in a database that doesn't even necessarily get printed out. He suggested that a real shift is the prevalence of personal libraries--people can carry around basically all of their music, books, papers, photos, and it's not clear how they are managing those." (quoting Crowston's notes.  Emphasis added.)

Crowston said that Lynch "then changed gear to discuss problems of preservation of the culture record. He noted that library special collections are important as a record of how a person worked and ideas were developed. [This is] Increasingly problematic as boxes of obsolete diskettes and obsolete word processing files show up. Digital forensics increasingly is about seeing how a machine has interacted with the rest of the world, vs. finding files. Similarly, a person's personal record is now scattered across multiple services."

It is interesting that we continue to create digital versions/environments (e.g., the ebook) that mimic what we have done for decades without the use of computers.  Perhaps we haven't lived in the digital age long enough to understand how to take advantage of the technology in ways that differ from what we've done previously.  Maybe we're still too tied to the old ways to imagine how to do things differently.  Could it take several generations of digital natives before changes occur?

Wednesday, May 12, 2010

Event: Survive or thrive: making the most of your digital content Conference

Received via email.

*Survive or thrive: making the most of your digital content Conference*
8-9 June 2010
Macdonald Hotel, Manchester

*Background*

The growth of digital content and the use of content on the Web have changed rapidly over the past decade. The digital deluge provides opportunities, but how can these best be exploited? Are you making the most of your content? What are the technical and strategic approaches required to thrive in today's environment?

Questions this conference will start to address:
  • How do we exploit the value of distributed resources, linked data, geospatial tagging, metadata, etc.?
  • In terms of scale, what are the issues and barriers? What does working at web scale mean and offer? How can the crowd be exploited?
  • What are the issues and opportunities for opening up content?
  • How do we effectively and efficiently meet the needs of users while taking best advantage of the available technologies, for example personalisation?
  • How do sectors work together: education, the cultural heritage sector, business and community, and the public and private sectors? What role should strategic agencies play?

*Aim*

The aim of the conference is to bring together a community of experts to provide a focus on the above questions. This will allow us to identify the key approaches that universities, colleges, and the cultural heritage and public sectors can pursue to support education, research and the wider knowledge economy.

*Outcomes*

The conference will produce a *Position Paper*, based on the workshop discussion, that gives direction to content providers in the networked environment.

The outcomes of the workshop will also help to inform JISC programmes and service approaches, in particular the resource discovery and access strategies that JISC pursues, such as the JISC and Research Libraries UK Resource Discovery Task Force vision and JISC's Strategic Content Alliance, as appropriate.

*Audience*

This event is aimed at people who have a stake in providing content for learning, teaching and research. This includes policy makers, senior managers, information specialists and technical managers/developers, from the public and commercial sectors.

*Registration*

Your registration includes all refreshments, lunch and dinner at the event. It also includes a night's B&B accommodation. Otherwise, travel and subsistence are not included.

To book, please click on the link below as soon as possible:

http://asp.artegis.com/SurviveorThriveopeninvitation

Monday, September 07, 2009

Event: Museum Computer Network, Nov. 11-14, 2009

From an email message.

Join the Museum Computer Network for the 37th annual conference in Portland, Oregon, November 11th-14th.

Museum Information, Museum Efficiency: Doing More with Less!

PRELIMINARY PROGRAM AVAILABLE ONLINE NOW

Join MCN for four days of programming with innovative sessions: panels, papers, case studies, and workshops that illustrate how institutions are effectively functioning and planning to function during the tough times ahead.

Visit www.mcn.edu/conferences to view the preliminary program and for registration, hotel & travel information.

***

About the Museum Computer Network

Mission: The Museum Computer Network (MCN) supports the greater museum community by providing continuing opportunities to explore, implement, and disseminate new technologies and best practices in the field.

Founded in 1967, MCN is a nonprofit organization with members representing a wide range of information professionals from hundreds of museums and cultural heritage institutions in the United States and around the world. MCN helps museum information professionals and people interested in technology in the cultural heritage community seek out and share ideas and information through a wide range of activities, including an annual conference, special interest groups, a website, and other outstanding resources such as the new MCN Project Registry at MuseTechCentral (http://musetechcentral.org/).



Monday, July 27, 2009

Event: Digital Curation and Preservation Outreach and Capacity Building Event

From an email announcement....
Digital Curation and Preservation Outreach and Capacity Building Event
14-15 September 2009
Holiday Inn, Belfast, Northern Ireland
http://www.dcc.ac.uk/events/capacity-building-belfast-2009/

The Joint Information Systems Committee (JISC) and the Digital Curation Centre (DCC), in co-operation with the Strategic Content Alliance (SCA), the Public Record Office of Northern Ireland (PRONI) and the Digital Preservation Coalition (DPC), are delighted to announce that we will deliver a joint two-day workshop in Belfast to help establish more effective digital curation and preservation networks of support across the UK and between domains of public sector activity.

In particular, this workshop will explore:
  • the current capacity of small and medium-sized organisations (including universities) to effectively undertake long-term preservation of digital materials
  • how the mixture of organisations and support agencies in the area of digital preservation and curation can best work together, and how they relate to international initiatives
  • recent and emerging technical developments in the curation and preservation field
The workshop will provide a mixture of presentations, breakout sessions and practical exercises and aims to:
  • inform the ongoing refinement of the content and objectives of training and professional development courses for the widest possible audience.
  • establish requirements for curation and preservation support, advice, and guidance for various domains from both a local and UK wide perspective.
To this end, the workshop will provide half-day taster courses for both Digital Curation 101 (DC 101) and Digital Preservation Training Programme (DPTP) courses.

Benefits of participation will include:
Participation in this workshop will provide registrants with an opportunity to establish peer support networks to share their concerns, experiences, and approaches both with colleagues from Northern Ireland and the rest of the UK for digital curation and preservation activity within their institutions. The workshop will also enable participants to help inform the development of future training and professional development courses to ensure that they are fit for purpose.
Following this event, a short report will be drafted offering:
  • Recommendations for improved local and UK-wide communications and interactions between a range of support networks and public sector institutions;
  • A summary of local and UK-wide training and professional development requirements, as gathered from the workshop participants;
  • A set of recommendations from a local and UK-wide perspective to inform the future development of digital curation and preservation training courses.
The venue:

The workshop will be held at the Holiday Inn, Belfast. If required, accommodation will be provided on-site for participants for the nights of September 13th and 14th 2009.

Holiday Inn
22 Ormeau Avenue
Belfast
BT2 8HS

For directions to the venue, please see http://www.ichotelsgroup.com/h/d/hi/1/en/hotel/bfsoa/transportation?start=1.

Registration:

Registration is open to participants from the university, schools, library, cultural heritage, local government, health, and public broadcasting sectors. Preference will be given to participants from Northern Ireland, but registration will also be open to eligible registrants from the rest of the UK and from the Republic of Ireland. Registration is free and participation is limited to 40 participants.

To register for this event, please complete the online form at http://www.dcc.ac.uk/events/capacity-building-belfast-2009/register.



Wednesday, July 08, 2009

Blog post - DH2009: Digital Lives and Personal Digital Archives

Jeanne Kramer-Smyth blogged about the Digital Humanities 2009 conference, including the session "Digital Lives: How people create, manipulate and store their personal digital archives." The Digital Lives project sought to create "a better understanding of how people manage digital collections on their laptops, pdas and home computers." The research was conducted by interviewing 25 people in-depth.

Kramer-Smyth wrote a nice summary of the session (which I cannot in good faith summarize even further) and provides links to additional information.



Thursday, September 04, 2008

Report: The International Survey of Institutional Digital Repositories

The company Research and Markets Ltd. has issued a report entitled "The International Survey of Institutional Digital Repositories" (cost: EUR 96). The report description says:
The study presents data from 56 institutional digital repositories from eleven countries, including the USA, Canada, Australia, Germany, South Africa, India, Turkey and other countries. The 121-page study presents more than 200 tables of data and commentary and is based on data from higher education libraries and other institutions involved in institutional digital repository development.

In more than 300 tables and associated commentary the report describes norms and benchmarks for budgets, software use, manpower needs and deployment, financing, usage, marketing and other facets of the management of international digital repositories.

The report helps to answer questions such as: who contributes to the repositories and on what terms? Who uses the repositories? What do they contain and how fast are they growing, in terms of content and end use? What measures have repositories used to gain faculty and other researcher participation? How successful have these methods been? How has the repository been marketed and cataloged? What has been the financial impact? Data is broken out by size and type of institution for easier benchmarking.
The web site gives a table of contents and a short sample.

If you've never ordered a professional report before, your first question will be "is it worth it?"  Good question.  EUR 96 (about $138) may not be a lot of money for some organizations, and it might be what some are willing to pay for specific pieces of data.  Experience has taught me that if you are willing to pay the report's cost for a few key data points, then it's worth it.  For example, are there specific pages in the table of contents that you view as "must haves"?  If the answer is "yes" and you can bear the cost, then do it.  If you can't see any data that you need, or the cost is too much, then don't order the report.  It could be that the data is actually available elsewhere (sometimes that's true) or that you can use other, more readily available, data as a substitute.

