Two recent initiatives from IBM are promising in terms of information integration and discovery.
Of course, since IBM lost to Microsoft its mantle as the most monolithic and pervasive entity in the I.T. world, it’s been working hard to re-invent itself. It has even sold its PC hardware business (to Lenovo, a Chinese company) – the very business that fostered microcomputer standardisation, allowed Microsoft to gain pre-eminence, and ate away at IBM’s traditional mainframe business. IBM’s business is currently split between software, services, and mainframe hardware. Mainframes are now a niche market, and it’s the company’s software innovation that garners attention.
Due for imminent release is Viper, software technology for IBM’s DB2 database platform which, amongst other things, allows for “native” XML databasing. The presentation I attended last week gave me the impression that it permits conventional (non-XML) data to be mixed with XML-defined data in the same database, but I’d be quite cautious about that until I could see it in action.
This is quite a dramatic initiative*, providing some enabling technology for the Semantic Web (discussed here and here). For me, the significance lies not simply in its ability to handle XML – which can be done in proof-of-concept by any number of vendors – but in its ability to do so natively, as an integral part of the DB2 product.
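To make the idea of mixing XML with non-XML data a little more concrete, here is a minimal Python sketch. It is emphatically not DB2 or Viper syntax: it uses SQLite with the XML held as plain text, and the `orders` table, its columns, and the `find_orders` helper are all invented for illustration. The point is the shape of the query: a relational filter (by region) combined with a test on the XML structure (by product). Here the XML navigation has to happen in application code; as I understand it, the promise of a "native" XML store is that the database engine does that part itself.

```python
import sqlite3
import xml.etree.ElementTree as ET


def find_orders(conn, region, product):
    """Return ids of orders in `region` whose XML detail lists `product`."""
    # Relational part: an ordinary SQL filter on a conventional column.
    rows = conn.execute(
        "SELECT id, detail FROM orders WHERE region = ? ORDER BY id", (region,)
    )
    matches = []
    for order_id, detail in rows:
        # XML part: parse the stored document and test its structure.
        # ElementTree supports a limited XPath subset, including attribute tests.
        root = ET.fromstring(detail)
        if root.find(f".//item[@product='{product}']") is not None:
            matches.append(order_id)
    return matches


# Hypothetical sample data: one relational column plus one XML document per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, detail TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "EMEA", "<order><item product='widget' qty='2'/></order>"),
        (2, "EMEA", "<order><item product='gadget' qty='1'/></order>"),
        (3, "APAC", "<order><item product='widget' qty='5'/></order>"),
    ],
)

print(find_orders(conn, "EMEA", "widget"))  # [1]
```

How "native" a product really is would then come down to whether the engine can index and query inside the XML itself, rather than handing back text for the application to parse as this sketch does.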
Also announced is IBM’s Content Discovery for Business Intelligence. Although this is a part of their WebSphere (application server) product range, in concept it permits pervasive business intelligence throughout an organisation’s structured and unstructured data, provided, I presume, that the unstructured data has been sufficiently tagged (manually or automatically). The announcement is careful not to include the term “data mining”, so I’m a bit suspicious of its “discovery” nomenclature. Business Intelligence involves specific query, analysis, and reporting functions, whereas data mining is more a discovery of trends – the difference between asking specific questions and asking what patterns are in the data.
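The distinction between a specific question and a search for patterns can be sketched in a few lines of Python. This is a toy illustration only, with made-up shopping-basket data and hypothetical function names; it has nothing to do with how IBM's product works. The first function answers a question I already know how to ask (revenue for a given region). The second asks the data what it contains (which pairs of items tend to be bought together), which is a very simple form of the pattern discovery that data mining is concerned with.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales records, invented for illustration.
baskets = [
    {"region": "north", "items": {"bread", "milk"}, "total": 5.0},
    {"region": "north", "items": {"bread", "milk", "eggs"}, "total": 8.0},
    {"region": "south", "items": {"milk", "eggs"}, "total": 4.5},
    {"region": "south", "items": {"bread", "milk"}, "total": 5.5},
]


def revenue_for(region):
    """Business-intelligence style: a specific question with a specific answer."""
    return sum(b["total"] for b in baskets if b["region"] == region)


def frequent_pairs(min_count):
    """Data-mining style: find item pairs that co-occur at least `min_count` times."""
    counts = Counter()
    for b in baskets:
        # Sort so each pair is always counted under the same key.
        counts.update(combinations(sorted(b["items"]), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(revenue_for("north"))   # 13.0
print(frequent_pairs(3))      # {('bread', 'milk'): 3}
```

In the first case the analyst supplies the question; in the second the analyst supplies only a threshold and the data supplies the answer, which is why I'd want to know which of the two "discovery" actually means here.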
We’ll find out the full story when the dust settles. Still, access to unstructured data is nothing to be sneezed at. And if Viper can’t immediately database extant web pages, be sure that that’s the direction they’re going.
*1-June: In fact, it's been said that this is not that dramatic, since Oracle has had native XML support for some time. I guess it's down to how genuine that "native" label is, and how they mix XML and non-XML. Comments welcome.
(Viper also adds range partitioning, which I can see being particularly useful in a data warehouse/business intelligence context.)
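For anyone unfamiliar with range partitioning, the idea is that rows are assigned to partitions according to which range a key value (typically a date) falls into, so a query over a date range need only touch the relevant partitions. Below is a minimal Python sketch of that routing logic. It is a conceptual illustration, not DB2's implementation or syntax, and the quarterly boundaries and function names are my own assumptions.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partition boundaries: each date is the first day of a new partition.
# Partition 0 holds everything before the first boundary.
BOUNDARIES = [date(2006, 4, 1), date(2006, 7, 1), date(2006, 10, 1)]


def partition_for(d):
    """Return the index of the partition whose range contains date `d`."""
    # bisect_right places a boundary date in the partition that starts on it.
    return bisect_right(BOUNDARIES, d)


def partitions_for_range(start, end):
    """Return the partitions a query over [start, end] would need to scan."""
    return list(range(partition_for(start), partition_for(end) + 1))


print(partition_for(date(2006, 2, 15)))                          # 0
print(partition_for(date(2006, 4, 1)))                           # 1
print(partitions_for_range(date(2006, 5, 1), date(2006, 8, 15)))  # [1, 2]
```

In a data warehouse, where most queries are bounded by time period, skipping the partitions outside the requested range is where the benefit would come from.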