Tuesday, July 11, 2006

Tech: Google Books - another revolution in knowledge

Google Book Search is a project that, ideally, wants to scan all books in existence.
When I first heard about this – it launched in late 2004 – I couldn’t make sense of it. In particular, how would they get around copyright issues?

However, an absorbing article by Kevin Kelly clears up the mystery. Their solution is one that few other organisations could pull off: in partnership with a number of universities and publishers, they:
a) scan everything;
b) restrict access to books currently under copyright; and
c) fight out the resultant lawsuits.

It’s a particularly laudable concept: to digitise and make searchable all human knowledge – or rather, its proxy as represented by books. Its library partners are Stanford, Oxford, Harvard and Michigan universities, and the New York Public Library. The technology includes a rather large robot (from 4DigitalBooks) that delicately turns the pages of even rare books at the rate of 1000 pages per hour.

Kelly’s article is well worth reading, as it dishes up in-depth information and prognostications on the future of copyright. He says that there are 32,000,000 books currently catalogued, and of those, an estimated 15% are out of copyright and in the public domain. About 10% are actively in print and clearly in copyright control. But the rest – about 75%, or 24,000,000 books – are in a grey area. That is, it is not clear who the current copyright holder is: whether the publisher still exists, whether the author is still alive, or their whereabouts. That’s quite a sobering thought if one stopped to try to track down every copyright holder. Fortunately, Google didn’t.
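For anyone who wants to check the arithmetic, here is a rough sketch in Python. The total and the three percentages are the figures quoted above from Kelly's article; the category names and the helper function are my own labels, made up purely for illustration.

```python
# Figures as quoted in this post from Kelly's article.
TOTAL_BOOKS = 32_000_000

# Shares in whole percent. The category names are illustrative labels only.
SHARES_PERCENT = {
    "public_domain": 15,  # out of copyright
    "in_print": 10,       # actively in print, clearly in copyright
    "grey_area": 75,      # copyright holder unclear
}


def books_in(category: str) -> int:
    """Return the estimated number of books in the given category."""
    return TOTAL_BOOKS * SHARES_PERCENT[category] // 100


if __name__ == "__main__":
    # The three shares should account for every catalogued book.
    assert sum(SHARES_PERCENT.values()) == 100
    for name in SHARES_PERCENT:
        print(f"{name}: {books_in(name):,}")
```

On these numbers the grey area does come to 24,000,000 books, and the in-print and grey-area shares together make up 85% of the catalogue.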

Kelly’s brave new world is but one view. Which world eventuates is really a social and cultural question, which will be mediated in the political sphere. Don’t expect optimal results. For one, we cannot guarantee how much of the 85% that is not in the public domain will ever be available in digital form to the world at large. Yet I’m betting that the political outcome will be largely favourable to this project. Another stumbling block in this interconnection of knowledge is whether facts in a given book (where applicable) are both correct and not in dispute. One person’s knowledge is often another person’s tripe.

Further, we should expect inaccuracies to creep in – in the scanning process, and in the original book. If you’re reading a digital book some time in the future, and suddenly it doesn’t make sense, maybe the robot scanner turned two pages at a time. Or maybe someone was eating lunch while reading the library’s copy, and splattered the book. Or ripped out a page or two, as some nefarious people are wont to do. Etcetera.

As I found out, another issue is the relevance of Google’s indexing and search results. As an example, I tried the famous opening lines from Moby Dick. “Call me Ishmael” got over 1000 hits, and I didn’t find the book in the first few pages. “Call me ishmael. Some years ago” got 51 hits. [In fact it’s 21, but only on the third page does Google say this, and present the last entry. Strange bug.] However, Moby Dick appeared only 13th, or about two thirds of the way down. The rest was an admixture of different types of books quoting the first few sentences. And I couldn’t look at the book, although I’m pretty sure Melville’s been dead longer than 70 years. However, I give Google credit for the project, and acknowledge that it’s still in beta.


Having said all that, technology is a wonderful enabler. Kelly’s vision of all book knowledge being searchable and cross-referenced is a powerful one. We’ve already seen major changes in approaches to publishing, and we should expect to see even more upheaval – as has been happening in the music industry. Publishers will have to adapt – fairly quickly. Vanity publication (“I want this book in print, publishers turned me down, so I’m paying to print it myself”) has already become easier – you can do it on the web. This may then force the author to think more deeply about why they want to publish. Using the Google model, I can guarantee that over time that book will be read by more people than if it were on paper. But some authors want that physical artifact – many readers, too. So be it.

Envisioning the future is fraught for all sorts of reasons. But it’s worth reading Kelly’s vision, to get the imagination active.
