Analysis & Commentary:
META DATA CONUNDRUM CARRIES ON
by Richard Adhikari
A new approach to managing meta data currently in the works will free CIOs
from having to cope with constantly changing technological standards. The
Needham, Mass.-based Object Management Group (OMG) is leading work on this
approach with the support of major software vendors, including IBM, Sun
Microsystems and Oracle. The new approach consists of a model-driven
architecture that will transcend individual technologies so it will be valid
even when underlying technologies change. To create the new architecture, its
builders have expanded the meaning of meta data. They are also leveraging the
object-oriented paradigm so that corporations can create reusable meta data
objects from data about their existing resources -- including computer
languages, file systems and software architecture -- by wrapping the data with
Extensible Markup Language (XML) descriptions.
The new architecture will be the sole enterprise meta data management standard
available because the OMG merged last year with another standards body, the
Meta Data Coalition, whose efforts were built around Microsoft's Open
Information Model (OIM), and OMG is merging OIM-based work with its own.
But XML is fragmented, and pulling together cross-industry definitions is a
Herculean task. Users do not care how elaborate the standard is; all they want
to know is whether it will help them do their jobs better.
The existing definition of meta data -- that it is data about data -- is
dated, said Sridhar Iyengar, chief software architect at Blue Bell, Pa.-based
Unisys Global Industries, one of four Unisys Fellows and a prime mover in
the meta data management arena. According to Iyengar, meta data "is
the missing link between data and meaningful information" and, as such, it can
be a wrapper around anything as long as it "describes the essence of what data
means, how it is used, how it is extracted and how it is manipulated."
Iyengar said that meta data is now defined as "any kind of information that is
used in the development, design, deployment and management of a computing
infrastructure." It also contains "descriptions of business, data, warehouses
and other things."
This expanded definition of meta data will let developers use meta data as a
tool for integrating computer systems, said Stephen Brodsky, software
architect and development manager at IBM's Application Integration and
Middleware (AIM) division in San Jose, Calif. While it does not matter which
language the meta data is expressed in, XML has become the language of choice
because it is a good means of transmitting data between different systems.
One example of leveraging XML meta data wrappers is an application created by
custom software developer Interface Technologies Inc, Raleigh, N.C., for a
Boston-based client. The client, startup firm Virtual Access Networks, wanted
an application that would help end users migrate personalization data such as
bookmarks and address books from their existing desktop systems to new ones
during upgrades. Normally, such data is not migrated during hardware or
operating system upgrades.
Interface developed a client/server application that created XML meta data
wrappers around end users' personalization data so they could upload and store
the data with its wrappers on Virtual Access Networks' servers. The end users
could then download the data once their new desktops or operating systems had
been installed.
Wrapping a data object with a description of that data in XML allowed
Interface to "make a self-describing package that can be sent back and stored,
and can be easily translated into any other format like HTML and WAP" because
the meta data description gives developers the context to know what they are
manipulating, said Interface president Kelly Campbell. So, Virtual Access
Networks can provide users access to their meta data on its servers over any
client, including cell phones. "The data itself is well represented so all
they have to worry about is the presentation layer," Campbell said.
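The wrapping idea Campbell describes can be illustrated with a minimal Python
sketch. It is not Interface's actual code: the element names (package,
metadata, kind, owner, itemCount, data, bookmark) and both function names are
assumptions made up for this example. It shows a self-describing package being
built around a user's bookmarks and later translated into HTML using only what
the package says about itself.

```python
import xml.etree.ElementTree as ET

def wrap_bookmarks(owner, bookmarks):
    """Wrap (title, url) pairs in a self-describing XML package.

    All element names here are hypothetical, chosen for illustration.
    """
    package = ET.Element("package")
    # Meta data: says what the payload is and whose it is.
    meta = ET.SubElement(package, "metadata")
    ET.SubElement(meta, "kind").text = "bookmarks"
    ET.SubElement(meta, "owner").text = owner
    ET.SubElement(meta, "itemCount").text = str(len(bookmarks))
    # Payload: the personalization data itself.
    data = ET.SubElement(package, "data")
    for title, url in bookmarks:
        ET.SubElement(data, "bookmark", {"title": title, "url": url})
    return ET.tostring(package, encoding="unicode")

def to_html(package_xml):
    """Translate a stored package into an HTML list for one presentation layer."""
    package = ET.fromstring(package_xml)
    kind = package.findtext("metadata/kind")
    if kind != "bookmarks":
        # The meta data tells the renderer whether it understands the payload.
        raise ValueError(f"unsupported package kind: {kind}")
    ul = ET.Element("ul")
    for bookmark in package.iterfind("data/bookmark"):
        li = ET.SubElement(ul, "li")
        link = ET.SubElement(li, "a", {"href": bookmark.get("url")})
        link.text = bookmark.get("title")
    return ET.tostring(ul, encoding="unicode", method="html")

# Upload side: wrap and store. Download side: render for a browser.
stored = wrap_bookmarks("alice", [("OMG", "http://www.omg.org/")])
html = to_html(stored)
```

A renderer for another client, such as a WAP phone, would read the same stored
package and differ only in the markup it emits.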
The OO paradigm reapplied
There are many kinds of meta data from various domains -- computer resources,
languages, file systems and databases, as well as software architecture like
Enterprise JavaBeans (EJBs) and messaging. To integrate systems, developers
need an architecture to consistently describe these different types of meta
data, IBM's Brodsky said.
That is where object-orientation comes in. UML is a common modeling language
for application development and it is an OMG standard. It was extended to
create an object-oriented meta data standard, the Meta Object Facility (MOF). MOF
consists of the core OO model from the Unisys repository, Urep, which was
integrated with UML. MOF lets developers build meta data for the various
domains in a consistent, object-oriented fashion, Brodsky said. IBM and other
major vendors are working with the OMG to standardize these key meta data
domains by creating models of the type of information to be obtained from the
domains.
Model-driven architecture
The OMG's new approach will take meta data
management to the next level. Called the Model-Driven Architecture (MDA), it
is being built around MOF and UML.
"Models and meta data are at the core of this new architecture, not RPC-based
architectures like Java or SOAP," said Unisys' Iyengar, who is the primary
architect of MOF. That will make MDA middleware-neutral, because "instead of
mapping middleware to small concepts like interfaces, you use UML to model
interfaces, relationships and business rules, so you can work at a higher
level of abstraction," Iyengar said.
CIOs will be insulated from changes in technology. "The problem with our
industry is that the technology changes so fast that you spend more time
changing to newer technology than getting your work done," said OMG chairman
Richard Soley. The modeling approach "lets you bridge between technologies
because, if you start from a common model, translating between technologies
gets easier and that means meta data is stored in one way for all your
applications while it may be expressed in different applications differently."
For example, Soley said, meta data could be stored as Java code in one
application and as a Sybase database schema in another.
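Soley's point about starting from a common model can be sketched in a few
lines of Python. This is an illustration only, not part of MDA or any OMG
specification: the Entity and Attribute classes, the two type tables and the
generator functions are all hypothetical. One platform-neutral model is
defined once and then expressed two ways, as Java source and as a generic SQL
table definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str
    kind: str  # platform-neutral type: "string" or "integer"

@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple

# Hypothetical mappings from the neutral model to two technologies.
JAVA_TYPES = {"string": "String", "integer": "int"}
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER"}

def to_java(entity):
    """Express the model as a Java class with one private field per attribute."""
    fields = "\n".join(
        f"    private {JAVA_TYPES[a.kind]} {a.name};" for a in entity.attributes
    )
    return f"public class {entity.name} {{\n{fields}\n}}"

def to_sql(entity):
    """Express the same model as a generic SQL CREATE TABLE statement."""
    columns = ",\n".join(
        f"    {a.name} {SQL_TYPES[a.kind]}" for a in entity.attributes
    )
    return f"CREATE TABLE {entity.name} (\n{columns}\n);"

# The model is defined once; each technology gets its own rendering.
customer = Entity(
    "Customer",
    (Attribute("name", "string"), Attribute("age", "integer")),
)
java_source = to_java(customer)
sql_ddl = to_sql(customer)
```

Supporting a newer technology would then mean adding one more mapping
function, leaving the model itself untouched.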
MDA will provide best practices and industry-standard models, standards and
meta data formats, and as new technologies emerge, the OMG will provide
mappings to them in MDA, Unisys' Iyengar said. MDA will be mapped to CORBA,
Enterprise JavaBeans, DCOM, SOAP, the OMG's Common Warehouse Metamodel (CWM),
Microsoft .NET and other standards. SOAP, which has been jointly agreed to by
major vendors, including IBM and Microsoft, is described as "the middleware in
the XML standard" by Mike Blechar, vice president, Internet and e-business
area, Gartner Inc, Stamford, Conn.
The OMG is expected to make MDA its formal architecture in the third quarter
of this year, and MDA "will be OMG's architecture for the next 10 years, and
includes its previous architecture, OMA," Iyengar said.
A unified model
Until August 2000, there were two meta data management standards: the OMG's
Common Warehouse Metamodel, and the Meta Data Coalition's Open Information
Model (OIM), which was created by Microsoft. At the end of August, the OMG and
Meta Data Coalition merged, and OIM is being subsumed by CWM.
CWM is built on UML, XML and XMI. It establishes a common meta model (a model
about models) for warehousing and also standardizes the syntax and semantics
needed for import, export and other dynamic data warehousing operations. CWM
supports data mining, transformation, OLAP, information visualization and
other end user processes. Its specifications include application programming
interfaces (APIs), interchange formats and services that support the entire
life cycle of meta data management, including extraction, transformation,
transportation, loading, integration and analysis.
Standards spaghetti
A whole slew of standards revolve around MDA. There is MOF, which lies at the
core of MDA. Then there is the Java Metadata Interface (JMI), a mapping from
MOF to Java. Because MOF is an abstract model, there are also mappings from
MOF to XML and Interface Definition Language (IDL), an OMG standard for any
CORBA environment, similar to Microsoft's and DCE's IDLs. MOF is an extension
of UML; it ties in with CWM, which includes OIM. MOF will be integrated with
J2EE, and a New York-based company called MetaMatrix is "the first company
that said they will do this integration," Iyengar said. JMI will be integrated
with J2EE.
Gartner's Blechar provides an overview: Evolving standards for communications
and interoperability such as XML and J2EE components, Microsoft's SOAP, COM
and .NET are beginning to converge "because of the need for companies doing
B2B or B2C or having partnerships in the supply chain and value chain to
communicate with their partners or suppliers."
For middleware interoperability, the de facto standard is XML, inside which is
SOAP, which "is being jointly agreed to by major vendors like IBM and
Microsoft as a means of passing objects back and forth," Blechar said. Beneath
that, there will be the Universal Description, Discovery and Integration
(UDDI) standard. This will be a sort of Yellow Pages in which companies can
list their services and contact information. The services will be defined in
Web Services Description Language (WSDL), an XML-based language that defines
Web services and describes how to access them.
Meanwhile, the market is being divided into Microsoft and non-Microsoft camps
with J2EE and XML being outside of both sides, Blechar said. Business partners
will communicate through XML and within their organizations they will have to
build components either as .NET or J2EE/CORBA components, he said.
XML: Wishful thinking or reality?
While XML is an excellent means for exchanging information between systems,
some industry experts have doubts about whether it can provide the backbone
for standardization efforts of the magnitude that OMG is eyeing. There are
"hundreds" of XML protocols, according to Razmik Abnous, chief architect and
technology vice president at content management software vendor Documentum
Corp, Pleasanton, Calif. These protocols need to converge to the point where
there is "one protocol in a vertical industry for a specific domain of
expertise" before XML can pay off as a common meta data language, Abnous said.
That standardization of business definitions and naming conventions sounds
easier than it is, according to Art D'Silva, manager, data warehouse planning
and integration at the Royal Bank Financial Group, Waterloo, Ontario, which is
Canada's largest banking institution with approximately $230 billion in
assets. "You have to figure out which businesses you are working on and how
you map the definitions to different businesses," he said.
Data definitions can differ even within a business. For example, different
groups within a bank will look at the same data, such as interest amounts,
from different perspectives and know the data by different names, D'Silva
said, so corporations have to make sure each business unit understands the
information in the proper context. Wrapping meta data around the data to give
it context is easier said than done because "you have a variety of contexts
and I'm not sure anyone has stepped up to doing that just yet," D'Silva said.
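The naming problem D'Silva describes can be made concrete with a small Python
sketch. It does not reflect any system at Royal Bank; the business units,
field names and definition text below are invented for illustration. Each unit
keeps its own name for a data element, and a shared table ties those names
back to one canonical definition.

```python
# Canonical definitions, keyed by a shared identifier (hypothetical content).
DEFINITIONS = {
    "interest_amount": "Interest accrued on an account over a period",
}

# Each business unit's local name for the same canonical element (hypothetical).
LOCAL_NAMES = {
    "retail": {"int_paid": "interest_amount"},
    "treasury": {"accrued_int": "interest_amount"},
}

def resolve(unit, local_name):
    """Return (canonical_name, definition) for a unit's local field name."""
    try:
        canonical = LOCAL_NAMES[unit][local_name]
    except KeyError:
        raise KeyError(
            f"no mapping for {local_name!r} in unit {unit!r}"
        ) from None
    return canonical, DEFINITIONS[canonical]

# Two units, two names, one shared definition.
retail_view = resolve("retail", "int_paid")
treasury_view = resolve("treasury", "accrued_int")
```

The lookup itself is trivial. The hard part, as D'Silva notes, is agreeing on
what goes into the tables for every context in which the data is used.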
For meta data capture and management, Royal Bank Financial Group uses the
ETI.Extract tool suite from Evolutionary Technologies International, Austin,
Texas. The company has the standard corporate mix of mainframes, mid-range
boxes and Windows NT boxes. Meta data captured from legacy applications
consists both of definitions of data as well as technical constructs of what
information exists. This meta data is then stored in Islandia, N.Y.-based
Computer Associates' Platinum Repository.
The Royal Bank Financial Group's legacy applications include "a lot of fairly
complex file structures that are hierarchical structures in a flat MVS
environment," said D'Silva, as well as files in an NCR Teradata environment.
ETI.Extract lets users extract, transform, consolidate and load data from
incompatible data sources, generating conversions that automate data
transformation, reorganization and integration. It also provides meta data
management capabilities that let users document, track and manage progress as
they develop repeatable processes to consolidate data.
Documentum's Abnous agreed that it will be difficult to enforce a set of
all-encompassing XML standards. Instead, he sees a Unix-type compromise where
there is one core definition in XML on which everyone agrees, with each vendor
building its own specialized flavor of XML on top of that.
There is also the issue of performance. Addressing a lot of entities within
XML does not allow for a high-performance repository, and, in the short term,
there will not be high-scale, high-performance XML repositories, Abnous said.
Instead, relational databases will continue to be a source of meta data
because "we've spent years perfecting performance of meta data coming out of
relational database engines."
Even if an all-encompassing XML standard could be established, it would have
to be dynamic because "the requirements of what's needed and the information
that would be included in meta data today may not be the same tomorrow," said
Charles Meyers, vice president of technology at Rockville, Md.-based Computer
Technology Associates Inc., a company that provides Internet solutions and IT
services to the private and public sectors nationwide. Also, the standards
must be highly extensible so users can capture new meta data without having to
revise them.
Pragmatism drives users
As a solutions provider, Computer Technology Associates employs a classic
user's viewpoint. "Our focus and approach to projects is situational -- it's
based on users' requirements," said Meyers. "You can have all things to all
users but it starts becoming overwhelming and then you have recursive layers
of management -- meta data to manage meta data to manage meta data." If a
corporation has an enterprise-level solution in mind, it can create a meta
data layer across all the corporate applications; if it is a project-oriented
solution, an application that tracks and captures statistical information for
a historical view may be all that is necessary.
"There is no silver bullet -- no ultimate meta data interface," Meyers said.
XML is one standard, but only one, for meta data. Meyers said that the OMG's
MDA will only be adopted if it provides increased efficiency to users. "Just
because the definition has been designed doesn't necessarily mean it will be
used," he said.
As the OMG and vendors move ahead with plans for XML and the MDA, they would
do well to remember that users will buy what is useful, and, especially in
these days of tight budgets, will not be blinded by technoglitter.
Nine tips for managing meta data
Christine Mandracchia, principal consultant at KPIUSA, a Flanders, N.J.-based
consultancy focusing on data administration, business rules and other areas,
co-developed a meta data framework with one of her colleagues some years ago.
She says IT managers have to bear the following points in mind as they manage
meta data:
Whatever you have to do with the data, you also have to do with the meta data.
"When you model and analyze data, you also have to model and analyze meta
data, as well as build meta data stores and figure out how you get it into and
out of the stores," Mandracchia said. Therefore, it takes as much effort to
work with meta data as it does to work with the data itself.
Prioritize the meta data as you would prioritize the data, and scope the meta
data. "You may not need meta data about every part of your system, but only
for the more mission-critical and volatile systems," she said. "Otherwise,
there's too much work involved. Figure out from which systems you can get an
IT and business benefit."
Una Kearns, XML architect at Pleasanton, Calif.-based Documentum and a member
of the board of directors of the Organization for the Advancement of
Structured Information Standards (OASIS), said corporations need to establish
correct definitions for information being managed so they can reuse it across
the organization. According to Kearns, to do this, corporations must:
Understand what type of information is being managed in the organization.
See how that information is used across the organization.
Establish a steering committee across different departments and business units
within the enterprise to help understand the type of data being managed and
how it is used.
Get together with different customers and partners in their supply chains if
they are looking at vertical standardization.
Provide effective ways of capturing the information correctly and entering it
into their systems once they have defined the data. This task includes making
provisions for automatically checking new information entered into data
dictionaries and updating data dictionaries.
Maintain multiple data dictionaries.
Richard Adhikari is a widely published high-tech writer based in Silicon
Valley. He can be reached via e-mail at radhikari@earthlink.net.