More on Yahoo! book scanning

Here’s an interesting report on the Internet Archive-hosted book scanning party held this week in San Francisco. Apparently, Microsoft has gotten involved (anything to counteract Google). Also, if you just want to cut to the chase, here’s an excerpt on how the book scanning is apparently being handled:

“While Google has released few details of its scanning project (the search company has nondisclosure agreements with its library partners), the Internet Archive had a display of its technology at the Tuesday night event.

The Internet Archive has built a specialized scanning machine and written open-source software called Scribe for the specific purpose of digitizing books. The “machine” is an assembly of a standard PC with the Scribe software installed, two Canon EOS cameras, a pedal-operated glass and metal stand to hold and secure books at an angle, along with a table and chair. The machine looks much like a photo or voting booth, with black cloth covering a box frame and shielding the books and computer gear from ambient light.

The chair seats one person, who operates the computer program and turns book pages by hand. During the scanning process, the book sits at a 90-degree angle under glass, which protects it from the camera light and causes the least amount of damage to its pages, according to the Internet Archive. The operator pushes a pedal under the table to release the book from under the glass, and turns the page before it’s ready to take another picture.

Once a picture is taken, both pages of the book appear on a computer screen in their original form. The Scribe software then finds the center of the page and makes adjustments of the picture’s angle or ensures that it’s cropped properly. It will also clean up any poor coloring and make it uniform.

The operator enters some metadata about the book: its author, title and publication date. And once the book is scanned, it’s then saved to the system and catalogued. Scribe takes the metadata from the book and matches it with data from existing card catalogs in order to prevent duplication. The work is then added to the digital record.

It takes roughly one hour to scan two 300-page books. And it costs an estimated 10 cents a page, split among data storage, labor and equipment and administration fees, according to Brewster Kahle, the project’s leader. The cost does not take into account libraries’ fees for getting the book to the scanners.

Daniel Greenstein of the University of California’s archive project said that his group has donated $500,000 to assess the ultimate costs of scanning from the libraries’ perspective.

The Internet Archive currently has 10 scanning machines, but it is ramping up to build 10 more in the next year.”
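
For what it’s worth, the numbers in that excerpt are easy to sanity-check. Here is a rough back-of-the-envelope sketch in Python using only the quoted figures (two 300-page books per hour, 10 cents a page, 10 machines now and 10 more planned). It assumes every machine runs at the same quoted rate at the same time, which the article does not actually say, and the function names are just ours for illustration.

```python
# Figures taken from the quoted excerpt.
BOOKS_PER_HOUR = 2          # "roughly one hour to scan two 300-page books"
PAGES_PER_BOOK = 300
COST_CENTS_PER_PAGE = 10    # "an estimated 10 cents a page"
MACHINES_NOW = 10
MACHINES_PLANNED = 20       # the 10 existing machines plus 10 more


def pages_per_hour(machines: int) -> int:
    """Pages scanned per hour, ASSUMING every machine hits the quoted rate."""
    return machines * BOOKS_PER_HOUR * PAGES_PER_BOOK


def cost_per_book_dollars(pages: int = PAGES_PER_BOOK) -> float:
    """Estimated scanning cost for one book, in dollars."""
    return pages * COST_CENTS_PER_PAGE / 100


if __name__ == "__main__":
    print("Pages/hour, one machine:", pages_per_hour(1))
    print("Pages/hour, current fleet:", pages_per_hour(MACHINES_NOW))
    print("Pages/hour, planned fleet:", pages_per_hour(MACHINES_PLANNED))
    print("Cost per 300-page book: $%.2f" % cost_per_book_dollars())
```

So one machine works out to about 600 pages an hour, and a 300-page book costs about $30 to scan, before the libraries’ own costs of getting the book to the scanner.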

I guess our question is: with all the cool book scanning technology out there that we saw at AIIM, why? I guess we’ll need to call Brewster Kahle to find out.
