Monday, July 16: Fed up with clearing the clutter on your
office desk every day? How does the idea of a ‘paperless office’ sound? It has
long been believed that we are moving fast towards a paperless office, where
everything from paper records to microfiche and microfilm documents will be
digitised for use on the Web.
But this is easier to dream about than to achieve. Digitisation is a complex
process that takes careful preparation and the right tools. Scanning (making a
digital picture of a physical object) is straightforward compared with the other
gyrations the data may need to undergo. These include conversion into the most
suitable file format, loading it into a database, optical character recognition
(OCR) to convert the graphics files into editable text, and indexing to make it
searchable. There are decisions to make at every turn and no single right answer;
the choices depend on your particular needs.
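The stages just listed can be pictured as a simple pipeline. The Python sketch below is purely illustrative: the `Page` record and the `convert`, `ocr`, `index` and `digitise` functions are hypothetical names, the conversion and OCR steps are stubs rather than real image processing, and a plain dictionary stands in for the database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """One scanned page plus what later stages derive from it (hypothetical)."""
    page_id: str
    image: bytes                      # raw output of the scanner
    file_format: str = "unconverted"
    text: str = ""                    # stays empty unless OCR is run
    metadata: dict[str, str] = field(default_factory=dict)


def convert(page: Page, target_format: str) -> Page:
    # Stub: a real system would re-encode page.image into target_format here.
    page.file_format = target_format
    return page


def ocr(page: Page) -> Page:
    # Stub: a real system would hand page.image to an OCR engine here.
    page.text = f"[recognised text of {page.page_id}]"
    return page


def index(page: Page, **metadata: str) -> Page:
    # Indexing: attach the descriptors (author, date, subject...) users search on.
    page.metadata.update(metadata)
    return page


def digitise(page: Page, database: dict[str, Page], target_format: str,
             want_text: bool, **metadata: str) -> Page:
    """Run one scanned page through the remaining stages."""
    page = convert(page, target_format)
    if want_text:                     # OCR is only needed for editable, searchable text
        page = ocr(page)
    page = index(page, **metadata)
    database[page.page_id] = page     # a dict stands in for the database
    return page


db: dict[str, Page] = {}
digitise(Page("p1", b"..."), db, "PNG", want_text=True,
         author="A. Writer", date="2001-07-16")
print(db["p1"].text)                  # [recognised text of p1]
```

Each stage is a separate function so that, as the text says, a different decision can be made at each step: OCR, for instance, is skipped entirely when `want_text` is false.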
Take file formats. The format you choose for storing the scanned data
depends on several factors:
• Do the pages need to be represented as graphics or text?
• In color, grayscale, or black-and-white?
• Are small files, for quick transmission over the Internet, important?
• What computing platforms will viewers be using?
If making the data available to Internet users on a wide variety of
platforms is a priority, consider saving images in one of the Web's standard
graphics formats, such as JPG, GIF, or PNG. You can also make files available in
Portable Document Format (PDF) for viewing with Adobe Acrobat Reader.
If your primary concern is reducing download time, you might consider a
specialized format and viewer such as MrSID. These tools produce small files that
can be scrolled and scaled from within the browser. However, they work only on
Windows and require a proprietary plug-in.
If searching the full text of the data is necessary, or you want to make the
data available as HTML or XML, you'll need to use OCR to convert the graphics
files to text.
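The trade-offs above can be summed up as a small decision helper. This is a sketch, not a standard: the `Needs` fields and the `suggest_formats` function are hypothetical, and the rule that black-and-white pages suit PNG or GIF while color and grayscale pages suit JPG or PNG is our own simplification, based on GIF's 256-color limit and JPG's lossy compression of photographic images.

```python
from __future__ import annotations

from dataclasses import dataclass

COLOR_MODES = ("bw", "grayscale", "color")


@dataclass
class Needs:
    """Answers to the questions above (hypothetical structure)."""
    color_mode: str = "bw"            # "bw", "grayscale" or "color"
    mixed_platforms: bool = True      # will viewers use many different systems?
    minimise_download: bool = False   # is download time the main concern?
    full_text_search: bool = False    # must the text be searchable or published as HTML/XML?


def suggest_formats(needs: Needs) -> list[str]:
    if needs.color_mode not in COLOR_MODES:
        raise ValueError(f"color_mode must be one of {COLOR_MODES}")
    suggestions: list[str] = []
    if needs.minimise_download and not needs.mixed_platforms:
        # Small, scalable files, but only for Windows users with the plug-in.
        suggestions.append("MrSID")
    else:
        # Web-standard formats that open on any platform.
        if needs.color_mode == "bw":
            suggestions += ["PNG", "GIF"]
        else:
            suggestions += ["JPG", "PNG"]
        suggestions.append("PDF")
    if needs.full_text_search:
        # Graphics alone cannot be searched; OCR must produce text.
        suggestions.append("OCR to HTML/XML")
    return suggestions


print(suggest_formats(Needs(color_mode="color", full_text_search=True)))
# ['JPG', 'PNG', 'PDF', 'OCR to HTML/XML']
```

Note that the MrSID branch is taken only when download time matters *and* viewers are not spread across platforms, mirroring the Windows-only caveat in the text.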
But like a library without a card catalog, your digitised data is useless if
you cannot find the information you need. That's why scanning is usually
accompanied by manual indexing: entering information about the documents (for
instance, author, date, and subject) into a database. That database can then be
linked to a Web form for online searching. The Dublin Core Metadata Initiative
also offers a framework for creating a resource-description index, or
"metadata," which works especially well for information shared on the Web. The
Dublin Core includes 15 basic categories of resource-description information,
including title, creator, subject, and language. If your goal is to publish the
information on the Internet, storing metadata in a standard format such as
Dublin Core makes it easier to share the information with other sites and search
engines. Building a quality index often takes more time than scanning itself.
Experimental systems that perform automated classification and description
exist, but they are far from perfect.
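Manual indexing into a database can be sketched with Python's built-in `sqlite3` module, using the 15 Dublin Core elements as columns. The table layout, the `create_index`, `add_record` and `search` helpers, and the sample records are all hypothetical; in a Web setting, a search form could pass the user's query to something like `search`.

```python
from __future__ import annotations

import sqlite3

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def create_index(conn: sqlite3.Connection) -> None:
    # One TEXT column per Dublin Core element; every element is optional.
    columns = ", ".join(f"{name} TEXT" for name in DC_ELEMENTS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS records ({columns})")


def add_record(conn: sqlite3.Connection, **fields: str) -> None:
    # Reject anything that is not a Dublin Core element name, which also keeps
    # the column names interpolated into the SQL below safe.
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    names = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(
        f"INSERT INTO records ({names}) VALUES ({placeholders})",
        tuple(fields.values()),
    )


def search(conn: sqlite3.Connection, element: str, term: str) -> list[tuple[str, str]]:
    # Substring match; SQLite's LIKE ignores case for ASCII letters.
    if element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {element}")
    rows = conn.execute(
        f"SELECT identifier, title FROM records WHERE {element} LIKE ? "
        "ORDER BY identifier",
        (f"%{term}%",),
    )
    return rows.fetchall()


# Hypothetical sample data for illustration.
conn = sqlite3.connect(":memory:")
create_index(conn)
add_record(conn, identifier="doc-001", title="Board minutes, 1998",
           creator="Company secretary", subject="Governance",
           date="1998-03-02", language="en")
add_record(conn, identifier="doc-002", title="Staff handbook",
           creator="Human resources", subject="Policies",
           date="2000-01-15", language="en")
print(search(conn, "subject", "governance"))  # [('doc-001', 'Board minutes, 1998')]
```

Using the standard element names as columns is one simple way to keep the index in a shape that can later be exported as Dublin Core metadata for other sites.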
Scanning
Scanning can be done in-house, given the right equipment and the manpower to use
it, or outsourced to an imaging service. As for microfilm and microfiche, the
scanners are relatively complex and expensive machines: production scanners,
high-capacity machines for large-scale conversion, cost tens of thousands of
dollars. The simplest option, black-and-white graphical scans, is also the least
expensive. At Deines, for example, scanning 16-mm microfilm costs from 7 cents to
20 cents per image, while 35-mm microfilm starts at 20 cents per image. According
to a company representative, the factor that most influences cost is the amount
of indexing involved.
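The quoted per-image prices make rough budgeting straightforward. The sketch below uses only those figures; the `estimate_cost` function and the 2,500-image job are hypothetical, prices may well have changed since, and indexing, which the company says affects cost most, is not included. Because 35-mm scanning has only a starting price, its upper bound is returned as `None`.

```python
from __future__ import annotations

from typing import Optional

# Per-image scanning prices quoted above for Deines, in US cents.
# 35-mm film has only a starting price, so its upper bound is None.
PRICE_CENTS: dict[str, tuple[int, Optional[int]]] = {
    "16mm": (7, 20),
    "35mm": (20, None),
}


def estimate_cost(images: int, film: str) -> tuple[float, Optional[float]]:
    """Return the (low, high) scanning cost in US dollars, excluding indexing."""
    if images < 0:
        raise ValueError("image count cannot be negative")
    low_cents, high_cents = PRICE_CENTS[film]
    low = images * low_cents / 100
    high = None if high_cents is None else images * high_cents / 100
    return low, high


# Hypothetical job: one 16-mm roll of 2,500 images.
low, high = estimate_cost(2500, "16mm")
print(f"16-mm, 2,500 images: ${low:,.2f} to ${high:,.2f}")
# prints: 16-mm, 2,500 images: $175.00 to $500.00
```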
No doubt, transferring information to digital format can mean significant
investments of time and money, but the results can be well worth the effort. For
example, Ancestry.com recently digitised the entire 1790 and 1920 U.S. Federal
Censuses, which were formerly available only on microfilm. The site makes the
data available on a subscription basis, creating a new revenue stream while
helping genealogists.