Tuesday, February 10, 2009

The Semantic Web (Death of the Gene, part 1)

Two recent articles - in American Scientist and New Scientist - purport to sound the death knell of our understanding of genetics. Interestingly enough, the New Scientist article is the more sensationalist, whereas American Scientist has the more meaningful one.

First, however, a diversion into computer science.

I first encountered the concept of the Semantic Web about four years ago, through a seminar presented by the World Wide Web Consortium (W3C). The Semantic Web was envisaged as a successor to the World Wide Web, something that would better enable collaboration.

Web pages, written in Hypertext Markup Language, represent a rather unstructured way to navigate information. True enough, linkages are made from one concept to another. But on the whole the effect is a rather aimless journey, with no intrinsic meaning underpinning one's meanderings.

In contrast, the Semantic Web is intended to be a network of information in which the navigational links are imbued with specifically defined relationships, such that they can be machine-read. Web pioneer Tim Berners-Lee has referred to this as a Giant Global Graph, in contrast to the World Wide Web. Descriptive relationships are facilitated by languages designed for depicting data: Resource Description Framework (RDF), Web Ontology Language (OWL), and particularly XML (Extensible Markup Language), which is already in heavy use for defining data in a very wide range of contexts.
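As a rough illustration of what a "link with a defined relationship" might look like to a machine, here is a minimal Python sketch. It models each statement as a subject-predicate-object triple, which is the basic shape of an RDF statement. The names (`ex:GeneA`, `ex:encodes` and so on) and the `match` helper are invented for this example; they are not drawn from any real vocabulary or library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject linked to an object by a named relationship."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data: every identifier here is made up for illustration.
GRAPH: List[Triple] = [
    Triple("ex:GeneA", "ex:encodes", "ex:ProteinA"),
    Triple("ex:GeneA", "ex:locatedOn", "ex:Chromosome1"),
    Triple("ex:ProteinA", "ex:participatesIn", "ex:PathwayP"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that was supplied.

    A field left as None acts as a wildcard.
    """
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Everything the graph states about the hypothetical gene.
    for triple in match(GRAPH, subject="ex:GeneA"):
        print(triple)
```

The point is only that the relationship itself ("encodes", "located on") is data a program can filter on, whereas an ordinary hyperlink says nothing about why two pages are connected.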

Why do this? That was the question that occurred to me at that seminar. The applications proposed were restricted to scientific fields such as pharmaceutics and bibliographics, which were somewhat esoteric to me.

But this set of design and representational principles is starting to make sense in fields where collaboration is necessary simply because the field is too difficult to keep track of: constantly burgeoning, updating faster than any traditional publishing method can follow, and too large for any one person or group to maintain. Hence the need for an ontology: a precise specification for a knowledge-classification system.

That could easily be a description of Wikipedia. Such an endeavour is not possible without the web, simply because it calls for such a vast community of contributors.

The same could apply to a more structured discipline, where structured relationships may be just as important as the single instance or 'article'. The ensuing structures, spread out over a large number of web sites, could then be data-mined for meaning.

There is an increasing need for this in genetics, as the concept of a gene starts to break down and a large number of relationships must be built out of a genetic code with billions of letters.
