
Analysis & Commentary:

XML'S REALITY CHECK: DATABASE MANAGEMENT

As Susana Schwartz reports, service providers are moving toward integrated, end-to-end managed IP and digital services, and as they do, document management and Web content management will need to come together. That convergence will increase the amount and variety of data to be manipulated, stored and accessed. Scanned images, graphics, Web content, and digital audio and video clips each bring their own requirements for storage, access and delivery.

Although there is little question that XML is poised to become the key to integrating infrastructure among service providers, content-based enterprises and network providers, a lot of hype surrounds the marriage of XML and databases for superior data management.

Despite its many attributes, XML is not a silver bullet; it is not ideal for storing, representing or transporting every type of data. Rather, XML may be the catalyst that increases the license revenues of traditional DBMS [database management system] technologies. "Delivery of XML capabilities is an integral part of the overall trend of DBMS vendors moving toward management of unstructured data," says senior research analyst Ted Friedman, author of many research notes on the future of XML and databases for the Gartner Group.

"To prevent failed projects, service providers must first realize they don't have to replace their existing RDBMS [relational database management system] technology, but look to emerging XML database technology in a targeted way for support of metadata management, content management, publishing or other B2B application areas," says Mike Galvin, vice president of operations and development for BT OpenWorld, the mass market Internet division of British Telecom.

For the past six months, the organization has been using XML for public-facing content, which it manages with Vignette software and BEA WebLogic servers. "There, XML provides us with program-level interactivity and transactional capabilities for simpler on-line applications, such as gateway applications for short messaging, registration and billing interfaces where 'handshakes' are necessary with customers' PCs," Galvin says. BT also uses XML for its business-to-business interfaces in DSL provisioning. "We use XML to exchange data between us, network providers and other suppliers about order information, appointments, activation and billing, as well as trouble tickets," he says.

When the Rubber Hits the Road

With emerging models of XML database technology, there are two primary ways of using XML. The first is a document-centric model (as with XHTML), involving semi-structured documents with free-flowing content. The second is a data-centric model (as with SOAP), in which XML data resides in a relational database or similar repository. In this case, XML is a storage or interchange format for structured data exchanged in a decentralized, distributed environment.
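
To make the two models concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The two XML fragments and the function names are hypothetical and not drawn from any product: the data-centric order flattens naturally into a row of named fields, while the document-centric paragraph only makes sense when its text and inline elements are read back in order.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric record: regular structure, every field a simple value.
ORDER_XML = """<order id="1001">
  <customer>ACME Corp</customer>
  <product>DSL line</product>
  <quantity>2</quantity>
</order>"""

# Hypothetical document-centric fragment: mixed content, where the order of
# running text and inline elements carries the meaning.
MANUAL_XML = """<para>Disconnect the <part>line card</part> before
replacing the <part>fan tray</part>.</para>"""


def order_to_row(xml_text: str) -> dict:
    """Flatten a data-centric order into a single row-like dict."""
    root = ET.fromstring(xml_text)
    row = {"id": root.get("id")}
    for child in root:
        row[child.tag] = child.text
    return row


def paragraph_text(xml_text: str) -> str:
    """Recover the reading-order text of a mixed-content paragraph."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


print(order_to_row(ORDER_XML))
print(paragraph_text(MANUAL_XML))
```

The order maps one-to-one onto columns; the paragraph does not, because splitting it into fields would discard where each inline element sits in the sentence.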

"Regardless of the model, XML must be stored in a repository that allows for more sophisticated storage and retrieval of the data," Galvin says, "particularly in cases where multiple users will have to access the XML."

For all that XML promises for representing data, it still lacks true data management. Existing DBMSs can clearly manage prodigious amounts of data, but there is doubt that they can effectively share XML data among many users. The problem is that most XML is document-centric in nature, while most DBMSs are data-centric. That mismatch could become an obstacle once service providers are forced to choose the appropriate data management and integration architecture.

"If your content is still more data-centric, and applications feature a lot of numerical reporting for sales and consumption, then a relational database is the way to go," says Philippe Vauclair, vice president of business development for XML database start-up Ixiasoft. The company plans to bank on its experience with indexing and searching massive volumes of unstructured data, experience inherited from its parent company, Cedrom-SNi, a leading Canadian provider of online news content.

"Although the XML database space remains marginal in terms of the overall database market, it is destined to grow in size and share as XML truly takes flight. XML databases are not object databases revisited," says Vauclair. He notes that the principal selection criteria for developers moving toward XML databases tend to be less development effort and fewer resources needed to complete a project, superior performance on less server hardware, and lower licensing fees. He contends that, with more than 1,000 servers deployed, the growth of XML databases reflects their suitability for content-to-knowledge applications. "XML databases are more ideal where you search, consult, comment and pass the information along to others," he says.

In Vauclair's view, Ixiasoft's XML database delivers performance superior to relational databases with regard to load capacity and server execution in Web applications. He asserts that customer benchmarking reveals an average 30 percent increase in efficiency. Despite their advantages, however, Vauclair admits that XML databases by no means replace relational database technology for everything that is relational. Instead, he says, "XML databases will be a complementary technology that can expand the role of databases in the management and access to semistructured or unstructured content."

AT&T, one of Ixiasoft's larger customers, is looking at how XML can lend design flexibility and performance to some applications where RDBMSs fall short. "AT&T had myriads of engineering maintenance documents and procedural/technical manuals that are text-, image- and graphics-oriented," explains Vauclair. Originally written in SGML, the content now exists in XML. "With an SGML format, there is a need for a fat client to access that data," he says. "Therefore, AT&T needed an SGML station on every computer, which proved impractical and expensive."

To move to a thin client and browser application for such textual content, AT&T has converted SGML to XML and moved from a relational to an XML database. "They had to deal with elements for fields, such as 'author,' but if they had two authors or anything out of the ordinary on documents, the relational database got confused, and they had to go back and create more 'author' fields. Then they were creating and allocating space to information that might or might not be there," Vauclair explains. Putting textual content into relational databases often produces "empty spaces" that cause inefficiencies, which add up to overwhelming volumes over time, he adds.
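
The 'author' problem Vauclair describes can be sketched in Python with the standard sqlite3 and xml.etree.ElementTree modules. The table layout, titles and names are hypothetical. A fixed-column schema reserves one slot per author, so a single-author manual leaves an empty (NULL) slot and a third author would force a schema change, whereas XML simply repeats the author element as often as needed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fixed-column schema: one column per author "slot".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manual (title TEXT, author1 TEXT, author2 TEXT)")

docs = [
    ("Fan tray replacement", ["J. Smith"]),
    ("Line card diagnostics", ["A. Jones", "B. Lee"]),
]
for title, authors in docs:
    # Pad to exactly two slots; an unused slot is stored as NULL.
    padded = (authors + [None, None])[:2]
    conn.execute("INSERT INTO manual VALUES (?, ?, ?)", (title, *padded))

# Count reserved-but-empty author slots.
empty_slots = conn.execute(
    "SELECT COUNT(*) FROM manual WHERE author2 IS NULL").fetchone()[0]

# A third author would first require: ALTER TABLE manual ADD COLUMN author3 TEXT.
# In XML the same information is just a repeated element; nothing is reserved.
doc = ET.fromstring(
    "<manual><title>Line card diagnostics</title>"
    "<author>A. Jones</author><author>B. Lee</author><author>C. Wu</author>"
    "</manual>")
xml_authors = [a.text for a in doc.findall("author")]

print(empty_slots, xml_authors)
```

A normalized relational design would usually move authors into a separate table instead; the sketch models only the fixed-column layout described in the quote.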

The point Vauclair and other XML database vendors make is that free-form data requires the ability to accommodate changes in content or fields that RDBMSs don't achieve efficiently. "When there is an important element, you have to create a new field within columns and rows and determine how they relate to one another," Vauclair says, and that costs time and money. "With XML, you just instruct the database structure to pick up information under the new element, and it will do so and build data on that, even retroactively. Then, you can change the database structure without modifying applications built on top of that, which allows for greater flexibility."
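
The flexibility claim can be illustrated with a small Python sketch over an in-memory list of documents, which stands in for an XML database (an assumption; no specific product's query interface is shown). Older documents lack a hypothetical reviewer element and a newer one adds it. The query simply names the element, whereas a relational table would first need a new column, for example through ALTER TABLE ADD COLUMN.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: older documents lack <reviewer>; a newer one adds it.
collection = [
    "<doc><title>Fan tray replacement</title></doc>",
    "<doc><title>Line card diagnostics</title></doc>",
    "<doc><title>Power supply test</title><reviewer>K. Patel</reviewer></doc>",
]


def values_under(element_name: str, docs: list[str]) -> dict[str, str]:
    """Map each document title to the text of element_name, where present.

    The query names the new element directly; no schema has to be
    altered before newer documents can carry it.
    """
    found = {}
    for text in docs:
        root = ET.fromstring(text)
        node = root.find(element_name)
        if node is not None:
            found[root.findtext("title")] = node.text
    return found


print(values_under("reviewer", collection))
```

Documents without the element are skipped rather than padded with empty fields, which is the contrast with the fixed-column layout in the preceding paragraphs.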

RDBMS proponents answer that today's relational databases are fast and flexible enough for existing needs. According to George Demarest, director of database marketing for Oracle, OracleNC/SQL is "plenty fast" for today's XML needs. "The XML database vendors sometimes claim that you add overhead every time you transform XML into SQL data." But, he says, "fatness" is not an issue right now. He believes service providers should be more concerned about making sure the underlying infrastructure provides good enough performance to handle XML data.

Oracle, for one, is looking at capabilities for handling XML on a more massive scale for future needs. Demarest says Oracle is working toward storing a terabyte of XML data in its 9i database. "However, service providers right now are storing XML in tens of gigabytes, so terabyte databases aren't necessary. It's still predominantly a SQL format data world." He expects the real issue will be making SQL and XML coexist. Some XML database standards, he says, are already on the drawing board, such as XML Query (XQuery) and SQLX, both aimed at easing the interaction between SQL and XML.

Demarest agrees that XML database offerings will complement relational DBMS technology, not replace it, particularly for document-centric applications with unstructured content.

"For now, you can get a short-term solution out of the XML database, but when you have serious enterprise workflows, and you have to interact with SQL data, that is not yet matured in XML databases," he says. Until service providers have an integrated environment to handle both, he adds, all database players are "providing a niche service rather than general-purpose business solution to XML storage."

Evolution of XML Database Technology

For the most part, XML databases seem to fall into two categories: XML extensions to existing DBMS technology, and products that are not really databases at all but essentially XML search engines. The former usually come from the major DBMS vendors, the latter from smaller start-ups.

According to Gartner's Friedman, XML database technology will continue to follow the course of the hype cycle, resulting in a period of significant disillusionment during the next one to two years, "followed by renewed interest and focus on true value-added applications over the subsequent two to three years, and finally reaching maturity and delivering consistent business value."

Gartner identifies three styles of databases that will actually represent XML database technology: XML-wrapped DBMSs, XML-grounded DBMSs and XML-aware repositories. All three provide persistent storage and management of data, the key characteristics of a database.

XML-Wrapped DBMS

One of the most common ways to store XML data is by mapping XML documents to a relational or object DBMS structure. "The mapping functionality is 'wrapped' around the DBMS, serving as a translation interface for XML," explains Friedman.

Some say this is not a true database technology: "In this model, XML documents must be manipulated so that they are stored according to predefined rows and columns of elements that adhere to RDBMSs' mappings and rules," according to Mike Champion, advisory R&D specialist at Software AG (which provides an XML-grounded DBMS). "That complicates things when XML documents are of unknown structure, or deviate from the mapping definitions of RDBMSs." Conversely, he notes, "data from the DBMS can only be retrieved in XML format based upon those same DBMS mappings."

Friedman agrees that the term "XML database" is a misnomer in this case, because XML documents are "shredded," meaning their elements are stored in rows and columns within the RDBMS, or objects within an object DBMS, based on predefined mappings.
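
Shredding can be sketched in Python with sqlite3 standing in for the RDBMS. The MAPPING dictionary plays the role of the predefined mapping; the element names, column names and sample order are hypothetical. Each mapped child element becomes a column value in one row, and anything the mapping does not mention, such as the note element here, is never stored.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical predefined mapping: child element name -> relational column.
MAPPING = {"customer": "cust_name", "product": "item", "quantity": "qty"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT, cust_name TEXT, item TEXT, qty INTEGER)")


def shred(xml_text: str) -> None:
    """Store one <order> document as a row, following MAPPING only."""
    root = ET.fromstring(xml_text)
    values = {"id": root.get("id")}
    for element, column in MAPPING.items():
        values[column] = root.findtext(element)
    # Elements absent from MAPPING (e.g. <note>) are silently dropped.
    conn.execute(
        "INSERT INTO orders (id, cust_name, item, qty) "
        "VALUES (:id, :cust_name, :item, :qty)", values)


shred('<order id="1001"><customer>ACME Corp</customer>'
      '<product>DSL line</product><quantity>2</quantity>'
      '<note>rush delivery</note></order>')
row = conn.execute("SELECT * FROM orders").fetchone()
print(row)
```

SQLite's INTEGER column affinity converts the text "2" to the integer 2, which is the kind of type coercion a mapping layer relies on the underlying DBMS to perform.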

Because it relies on traditional data models, this approach is used mostly for highly structured, data-centric applications where XML is just a transport for the data.

"For anything other than static XML documents, sequence of elements and additional comments could be lost, because the ability to index and search XML data is dependent on the underlying DBMS' indices and SQL queries," Friedman says.
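
The loss Friedman describes can be shown with a round trip: parse a document, keep only the mapped values as a row would, then regenerate XML from that row. This Python sketch assumes Python 3.8 or later (for TreeBuilder's insert_comments option) and a hypothetical mapping whose column order differs from the document's element order. Neither the comment nor the original sequence of elements survives.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["customer", "product", "quantity"]  # hypothetical mapping order


def parse_keeping_comments(xml_text: str) -> ET.Element:
    """Parse while retaining comments as tree nodes (Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser)


def shred(root: ET.Element) -> dict:
    """Keep only the mapped element values, as a relational row would."""
    return {col: root.findtext(col) for col in COLUMNS}


def rebuild(row: dict) -> ET.Element:
    """Regenerate XML from the row, in the mapping's fixed column order."""
    order = ET.Element("order")
    for col in COLUMNS:
        ET.SubElement(order, col).text = row[col]
    return order


original = parse_keeping_comments(
    "<order><!-- priority customer --><product>DSL line</product>"
    "<customer>ACME Corp</customer><quantity>2</quantity></order>")
rebuilt = rebuild(shred(original))

original_tags = [child.tag for child in original]  # first entry is ET.Comment
rebuilt_tags = [child.tag for child in rebuilt]
print(ET.tostring(rebuilt, encoding="unicode"))
```

Whether a real product loses this information depends on its mapping; the sketch only shows why a fixed row-and-column mapping cannot retain it.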

XML-Grounded DBMS

XML-grounded databases store XML documents in their entirety, explains Champion, via proprietary database structures in which XML "chunks" or fragments form the foundation of the database. This is more of a document-centric approach: the complete physical structure of the XML document is maintained, so there is no need to tear it down into separate constituent pieces. It allows diverse types of XML documents to be stored, and standard XML interfaces such as XPath and DOM (Document Object Model) help reduce administration and support costs.

Unlike XML-wrapped solutions, these databases maintain the complete physical structure of the XML document. "It can, therefore, be retrieved in its original state," says Friedman. "Because the document is not broken into elements and attributes, no predefined knowledge of the document structure is required. This enables the database to store XML documents of varying and dynamic formats." Because XML documents are stored in their entirety, these databases support indexing elements within the XML document structure. "This enables searching for attribute values within documents," he says. Friedman believes these databases are best reserved for document-centric applications, as they are still immature in comparison to RDBMS technology.
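
A toy version of the XML-grounded approach, in Python: documents are kept whole as text, so they can be returned in their original state, and a separate index over attribute values supports searches without any predefined schema. The DocumentStore class is hypothetical and only illustrates the idea; real XML-grounded products rely on their own proprietary storage structures.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class DocumentStore:
    """Hypothetical toy store: whole documents plus an attribute-value index."""

    def __init__(self) -> None:
        self._docs: dict[int, str] = {}
        # (element tag, attribute name, attribute value) -> document ids
        self._index: dict[tuple[str, str, str], set[int]] = defaultdict(set)

    def put(self, doc_id: int, xml_text: str) -> None:
        """Store the document unchanged and index every attribute value.

        Re-inserting an existing doc_id is not handled (stale index entries
        would remain); this is a sketch, not a storage engine.
        """
        self._docs[doc_id] = xml_text  # no schema or mapping needed
        for elem in ET.fromstring(xml_text).iter():
            for name, value in elem.attrib.items():
                self._index[(elem.tag, name, value)].add(doc_id)

    def get(self, doc_id: int) -> str:
        """Return the document exactly as it was stored."""
        return self._docs[doc_id]

    def find(self, tag: str, attr: str, value: str) -> set[int]:
        """Ids of documents containing <tag attr="value">."""
        return set(self._index.get((tag, attr, value), set()))


store = DocumentStore()
store.put(1, '<manual lang="en"><section id="s1">Fans</section></manual>')
store.put(2, '<memo><to dept="ops">Shift change</to></memo>')
print(store.find("section", "id", "s1"), store.get(1))
```

The two documents share no structure, yet both are accepted and searchable, which is the point Friedman makes about varying and dynamic formats.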

XML-Aware Repository

An XML-aware repository comes closest to the idea of an XML database, since XML documents are stored in their native format within the repository. It supports data of a dynamic or less-structured nature. "The physical structure of the XML document is maintained in the database and can be retrieved in its original state, making it fully indexed and searchable," Friedman says. "It is well suited to applications in publishing and general document management, because it is managed at a user-defined level of granularity, with extensive indexing and searches possible across a breadth of dynamic data." He adds, however, that XML repositories "lack key attributes common in DBMSs, such as transaction management and administration tools."
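
The idea of indexing at a user-defined level of granularity can be sketched as follows. The caller picks the element that constitutes one retrievable unit (here a hypothetical section element), and the function builds a simple word index over those units. This is a minimal Python illustration, not how any particular repository implements its indexing.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_chunk_index(xml_text: str, granularity: str):
    """Split a document at a user-chosen element and index words per chunk.

    granularity is the tag that defines one retrievable unit, e.g. "section".
    Returns (chunks, index): chunks maps chunk number -> serialized XML,
    index maps lowercase word -> set of chunk numbers.
    """
    root = ET.fromstring(xml_text)
    chunks: dict[int, str] = {}
    index: dict[str, set[int]] = defaultdict(set)
    for n, elem in enumerate(root.iter(granularity)):
        chunks[n] = ET.tostring(elem, encoding="unicode")
        # Join text nodes with spaces so adjacent elements don't fuse words.
        for word in re.findall(r"\w+", " ".join(elem.itertext()).lower()):
            index[word].add(n)
    return chunks, index


MANUAL = """<manual>
<section><title>Fan tray</title><p>Remove the fan tray screws.</p></section>
<section><title>Line card</title><p>Reseat the line card.</p></section>
</manual>"""

chunks, index = build_chunk_index(MANUAL, "section")
print(sorted(index["card"]), chunks[sorted(index["card"])[0]])
```

Choosing "p" instead of "section" would return smaller units for the same search, which is what a user-defined granularity buys.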

As the capabilities of XML-wrapped and XML-grounded DBMSs expand, Friedman says, "there will be pressure on traditional vendors to expand their offerings for broader applicability."

Whether the DBMS giants will eat up the XML database start-ups, as happened in the object-oriented realm not long ago, is debatable. But most agree that the two worlds must at some point merge. "The goal is high-performance access, while maintaining a very granular level of update capability," says Gartner's Friedman. "It's wiser that companies wait before making any risky investments."

Gartner predicts that by 2005, the XML-wrapped and XML-grounded worlds will combine, resulting in a single style of DBMS with both data- and document-centric strengths. To meet future needs, a true XML DBMS will evolve, inheriting the transaction management, security, tuning, tools and administration features of existing relational databases with improved XML document management.

While there is little overlap between today's XML-wrapped and XML-grounded databases, all of the major relational database vendors already are working to support retrieving and storing XML in one form or another (see "A Closer Look at DB Vendors").

However, most agree that in upcoming years, applications will demand that traditional RDBMS technologies come together with XML-grounded technology, ultimately maturing to include capabilities for integration with non-XML data sources, and thereby supporting data-centric applications.

A Closer Look at DB Vendors

IBM

Database DOM requires the user to create a template file that contains the SQL-to-XML mappings for the query to be performed. Another approach is the one taken by DB2XML, which employs a default mapping of SQL results to XML data that the user cannot alter.
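
The difference between template-driven and default mapping can be sketched in Python with sqlite3. Neither function reproduces the actual template syntax of Database DOM or the output format of DB2XML; the table, the TEMPLATE dictionary and the element names are hypothetical. The default mapping always derives element names from column names, while the template lets the user choose them.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber (sub_id INTEGER, full_name TEXT)")
conn.execute("INSERT INTO subscriber VALUES (7, 'R. Diaz')")


def _text(value):
    """Render a SQL value as element text; NULL becomes an empty element."""
    return None if value is None else str(value)


def default_mapping(cursor: sqlite3.Cursor) -> ET.Element:
    """Fixed rule: one <row> per record, one child element per column name."""
    columns = [d[0] for d in cursor.description]
    result = ET.Element("result")
    for record in cursor:
        row = ET.SubElement(result, "row")
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = _text(value)
    return result


def template_mapping(cursor: sqlite3.Cursor, template: dict) -> ET.Element:
    """User-supplied template chooses the root, record and field element names."""
    columns = [d[0] for d in cursor.description]
    result = ET.Element(template["root"])
    for record in cursor:
        item = ET.SubElement(result, template["record"])
        for name, value in zip(columns, record):
            ET.SubElement(item, template["columns"][name]).text = _text(value)
    return result


TEMPLATE = {"root": "subscribers", "record": "subscriber",
            "columns": {"sub_id": "id", "full_name": "name"}}
QUERY = "SELECT sub_id, full_name FROM subscriber"

default_xml = ET.tostring(default_mapping(conn.execute(QUERY)), encoding="unicode")
custom_xml = ET.tostring(
    template_mapping(conn.execute(QUERY), TEMPLATE), encoding="unicode")
print(default_xml)
print(custom_xml)
```

The default rule needs no setup but exposes internal column names; the template adds a file to maintain in exchange for control over the XML vocabulary.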

IBM's DB2 offers XML Extender, which provides new data types supporting storage of XML documents in DB2 databases. In the case of XML data stored in DB2, an entire XML document can be stored in a single column or can be shredded into traditional SQL data types and stored in multiple tables and columns reflecting the hierarchical structure of the document.

Microsoft

Microsoft identifies itself with XML more than just about any other company, having pushed for a view of XML that emphasized its ability to transfer any data, not just documents. Even so, critics say Office 2000 doesn't really support XML, or the idea of data anywhere, anytime, any way that is the entire foundation behind XML's integration capabilities.

In the meantime, Microsoft's SQL Server 2000 supports XML operations performed on relational data. XML data can be retrieved from relational tables using the "FOR XML" clause. For data-centric solutions, Microsoft's ADO.NET provides XML integration to such a degree that results from queries on XML documents or SQL databases can be accessed identically via the same API. In the connected model, .NET Data Providers connected to a database (such as SQL Server, Oracle or DB2) execute queries and return results. Those results can be read directly as a stream of data records. When executing a command against SQL Server, the user has the option of returning results either as a stream of data records or, using "FOR XML," a stream of XML.
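
The choice between a record stream and an XML stream for the same command can be sketched in Python. This does not use SQL Server or ADO.NET: sqlite3 stands in for the database, and the attribute-centric row output is only loosely modelled on the shape of FOR XML RAW results. The run_query function and the sample table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterator, Union


def run_query(conn: sqlite3.Connection, sql: str,
              as_xml: bool = False) -> Iterator[Union[tuple, str]]:
    """Run one query and stream results either as records or as XML.

    With as_xml=True each record becomes an attribute-centric <row/> element;
    columns whose value is NULL are omitted from that element.
    """
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    for record in cursor:
        if not as_xml:
            yield record
        else:
            attrs = {c: str(v) for c, v in zip(columns, record) if v is not None}
            yield ET.tostring(ET.Element("row", attrs), encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriber (sub_id INTEGER, full_name TEXT, plan TEXT)")
conn.execute("INSERT INTO subscriber VALUES (7, 'R. Diaz', NULL)")
QUERY = "SELECT sub_id, full_name, plan FROM subscriber"

records = list(run_query(conn, QUERY))
xml_rows = list(run_query(conn, QUERY, as_xml=True))
print(records, xml_rows)
```

Because both forms come from one generator over the same cursor, the caller picks the representation without changing the query, which mirrors the option the paragraph describes.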

Oracle

Oracle has completely integrated XML into its Oracle 9i database as well as the rest of its products. XML documents can be stored as whole documents in user-defined columns, where they can be extracted using XMLType functions such as Extract. They can also be stored as decomposed XML documents that remain in object relational form and can be reconstituted using the XML SQL Utility (XSU) or SQL functions and packages.

Software AG

Adabas DBMS has extended its capabilities into the XML arena with its Tamino offering. It is an XML-grounded system that provides broader functionality than simply XML data storage, manipulation and retrieval. Through a combination of an XML data store and a SQL engine, Tamino supports integration of data from a range of heterogeneous sources, thereby attempting to provide a DBMS suitable for a wide range of applications.

Source: Ted Friedman, Gartner Group
