DSstar - Data Intensive Storage Solutions For The Enterprise


Features - Enterprise Data Insights:

UNDISCOVERED DATA IN ENTERPRISE VIRTUAL METADATA

Leveraging Enterprise Data Assets - Data Management with Virtual Metadata and an Enterprise Data Dictionary.

Like the Internet cloud on a network diagram, an organization's data assets form a nebula. Its staff do not know exactly where or how to find the information they need; in many cases they do not even know what data exists. Many different factors have fractured the organization's data assets and complicated how it uses and stores information in its computer systems:

  • Different Technologies: Legacy systems hosted on mainframes might have given way to modern servers with relational databases.
  • Different Data Sources: The organization probably put different types of information into different data resources.
  • Different Ownership: Different departments, or companies the organization has acquired, might each maintain their own separate systems.

To conduct its business effectively, an organization must find a way to cope with the proliferation of its data across systems. Metadata management using MetaMatrix's MetaBase provides organizations the means to describe all of their disparate data sources, from relational databases to legacy systems to Internet data feeds. These descriptions, literally data about the data, enable the organization to see the relationships between different information sources.

Managing Metadata With MetaBase

MetaMatrix offers a product, MetaBase, that helps an organization discover and map the disparate data sources in its varied computer systems and describe the way the organization's data consumers use the information. Organizations can use MetaBase to create physical metadata models, which describe how databases, data feeds, flat files, XML documents, and other data sources store information, as well as virtual metadata models, which describe how the organization's users and applications use that information.

A complete view of the information at hand offers many benefits to the organization and its members:

  • Clarity of Information

MetaBase presents a complete view of the organization's data sources. The organization's data modelers can model the different information systems using structures appropriate to each system. They can not only model the existing data sources but also capture some of the transformational logic users apply when combining more than one data source. This offers a more complete and immediate snapshot of the enterprise's information sources than other methods. This separate, virtual layer of information can present a virtual database that reflects the way the organization's data consumers would like to see the information.

  • Single Repository for Metadata Information

The MetaBase Repository offers a single storage location for metadata information about all of the organization's data assets. The organization can safely maintain the information with version control and check-out/check-in model management capabilities. Within this repository, the organization's users can search throughout the organization's information sources for particular columns, tables, tags, or other information structures based on descriptions, names, and keywords.

  • Standards-Based Solution

MetaBase takes advantage of several industry standards to ensure interoperability:

  • Meta-Object Facility (MOF) - The metamodels in MetaBase are based on the Object Management Group's Meta-Object Facility standard.
  • XML Metadata Interchange (XMI) - MetaBase uses XMI files for transmission and archiving.
  • Common Warehouse Metamodel (CWM) - MetaBase supports the collaboration standards implemented within CWM, including the use of XMI and UML.

  • Comprehensive Impact Analysis

MetaBase enables the organization to evaluate the data in its native data sources and the data's relationships with other data in other sources. With a clearer view of these relationships, the organization can identify overlap and redundancy—or find places where it lacks information.

  • Data Type and Data Dictionary Support

Using MetaBase, an organization can model its own custom data types and use them while modeling both its physical data sources and the virtual layer. These extensible data types form a cross-platform data dictionary that the organization can enforce throughout its systems. Because the same data types work in models created with different metamodels, the organization can describe information from its disparate sources using similar constructs.

  • Easy Information Integration

If the organization wants to introduce a model-driven information integration solution, the MetaMatrix Server works with MetaBase to provide a single integrated virtual database for the organization's applications.

The MetaMatrix Server can use the physical and virtual metadata models as a map to the data stored in the organization's disparate data sources. Applications issue queries against the virtual metadata models, and the server collects and combines result sets from the various physical data sources. The application receives a single result set, defined by whatever transformational logic and functions the data modelers have created.
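
To make the idea of a single combined result set concrete, the following Python sketch queries two independent sources and merges their rows in a middle tier. It is an illustration under stated assumptions only: two in-memory SQLite databases stand in for the disparate physical sources, and the table names and the `federated_customer_orders` function are hypothetical, not part of the MetaMatrix Server's interface.

```python
import sqlite3


def make_sources():
    """Create two independent in-memory databases standing in for separate physical sources."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    billing.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
    )
    return crm, billing


def federated_customer_orders(crm, billing):
    """Query each source separately, then join and total the rows in Python,
    returning one result set shaped like a hypothetical virtual 'CustomerTotals' table."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in billing.execute("SELECT customer_id, amount FROM orders"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return sorted((names[cid], total) for cid, total in totals.items() if cid in names)


crm, billing = make_sources()
print(federated_customer_orders(crm, billing))
```

Merging in Python keeps the sketch short; it says nothing about how the MetaMatrix Server actually plans or executes queries.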

Discovering Data Assets With MetaBase

The organization can use MetaBase to model the information within its systems into a unified set of models and can manage those models and share them among people or business groups using a version-controlled repository. How does MetaBase do this? The key to managing data is managing metadata.

Mapping Data with Metadata - What Is Metadata?

Metadata Describes Data Structures

Metadata means data about data. A piece of metadata, called a meta object, contains information about a specific information structure. For example, a very basic address book database would probably include a field or column for the ZIP code or postal code. For a basic United States address book database, this column or field would have the following properties:

  • Named ZIPCode
  • Five numeric characters long
  • Located in the StreetAddress table
  • Represents an identifier used by the United States Postal Service to hasten postal delivery

This definition represents metadata about the ZIP code data in the address book database. It abstracts information from the database itself and becomes useful to describe the content of the organization's enterprise information systems and to determine how a column in one enterprise information source relates to information in another database.

Metadata Models Describe Complete Data Sources

An organization can use discrete meta objects to build a complete picture of its data sources. When assembled into a complete description of a data source, the meta objects form a metadata model, which contains the full namespace path for each element within the data source. This metadata model offers many easy-to-navigate ways to browse metadata, including Unified Modeling Language (UML) diagrams, hierarchical tree views, tables, and searches.

Describing Different Data Source Types

Different types of data sources use different metamodels. The MetaBase Modeler uses metamodels to control its behavior. Metamodels define what constructs, properties, and terminology are available to describe information, so the type of information a data modeler can capture in a metadata model comes from the metamodel. For example, the metadata representation of an address database's ZIPCode column has a length associated with it because the metamodel contains a construct called a column, which has a property called length. The metamodel that describes the constructs within a relational database management system is called the Relational metamodel. MetaMatrix MetaBase supports a number of metamodels out of the box and also enables each organization to extend the metamodels, so that it can model its own custom data sources and relate them to its other sources. The OMG's Meta-Object Facility standard describes metamodels in terms of Package type, Class type, Attribute type, Key, and Associations.

This means that each metadata model created with a metamodel can have the following components, or meta objects, within it:

  • Package type - A container or namespace that can hold one or more classes, packages, or a combination of both. In the relational database world, packages include schemas and catalogs.
  • Class type - An object description that can contain one or more attributes, keys, or combination of both. These objects contain collections of related individual information elements. A relational data modeler recognizes tables as examples of this entity type.
  • Attribute type - Attribute types belong to one and only one class type and describe individual pieces of information within the information source. These atomic elements include the relational database notion of columns. Each class type can have one or more attribute types.
  • Key - Each class type can have one or more keys, which uniquely identify each record within the class type.
  • Associations - These relationships connect one class type to another class type. Associations often link attributes that contain the same information, so that data modelers can specify the equivalence and perform SQL join operations.
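
The five MOF constructs listed above can be approximated with a few Python dataclasses. This is a simplified sketch, not the MOF specification's actual model and not MetaBase's internal representation; the class names, the `qualified_names` helper, and the `Sales` target of the association are hypothetical. It builds the address book example as a package (catalog) holding a nested package (schema), a class type (table), attribute types (columns), and a key, and shows how nesting yields fully namespaced element names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributeType:
    """Atomic element such as a relational column; belongs to exactly one class type."""
    name: str
    datatype: str
    length: Optional[int] = None


@dataclass
class Key:
    """Names the attributes whose values uniquely identify each record."""
    name: str
    attribute_names: List[str]


@dataclass
class ClassType:
    """Collection of related attributes, such as a relational table."""
    name: str
    attributes: List[AttributeType] = field(default_factory=list)
    keys: List[Key] = field(default_factory=list)


@dataclass
class PackageType:
    """Namespace holding class types, nested packages, or both (e.g. catalog, schema)."""
    name: str
    classes: List[ClassType] = field(default_factory=list)
    packages: List["PackageType"] = field(default_factory=list)


@dataclass
class Association:
    """Links an attribute in one class type to an equivalent attribute in another."""
    source: str  # fully qualified attribute name
    target: str


def qualified_names(package: PackageType, prefix: str = ""):
    """Yield the fully namespaced name of every attribute in a package tree."""
    path = f"{prefix}{package.name}."
    for cls in package.classes:
        for attr in cls.attributes:
            yield f"{path}{cls.name}.{attr.name}"
    for child in package.packages:
        yield from qualified_names(child, path)


street_address = ClassType(
    "StreetAddress",
    attributes=[AttributeType("AddressID", "integer"), AttributeType("ZIPCode", "string", 5)],
    keys=[Key("PK_StreetAddress", ["AddressID"])],
)
address_book = PackageType("AddressBook", packages=[PackageType("dbo", classes=[street_address])])
# Hypothetical association to an equivalent column in another source.
zip_link = Association("AddressBook.dbo.StreetAddress.ZIPCode", "Sales.dbo.Customer.PostalCode")

print(list(qualified_names(address_book)))
```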

The MetaMatrix MetaBase Modeler supports several different metamodels that adhere to this standard, such as the Relational, Data Access, XML Document, and XML Schema metamodels. Because the MetaBase Modeler is truly metamodel-driven, an organization can extend the metamodels or create new ones and use them with the MetaBase Modeler.

Metamodels Share a Standard - Meta-Object Facility

The MetaBase Modeler uses metamodels to capture metadata according to the Object Management Group's Meta-Object Facility standard. The Meta-Object Facility standard defines a parent metamodel from which all other metamodels derive. Hence, companies can create other metamodels using the blueprint provided by the parent metamodel, called the meta-metamodel.

For more information about this standard, see the Meta-Object Facility (MOF) Specification available from the Object Management Group.

Mapping Data Sources - Importing Metadata from Relational Databases

The MetaBase Modeler lets organizations import the metadata information from a variety of sources, including JDBC-compliant relational database management systems or XML Metadata Interchange (XMI) metadata models. Importing models captures a great deal of the metadata properties each meta object needs and can faithfully create the structure of a database in meta objects. The organization's data modelers can choose to import all or some of the data source's layout. Once the data modelers have created the metadata model, they can refresh that model easily when the underlying data source changes. For example, if a production Oracle database gets updated with new tables, the data modelers can use a wizard within the MetaBase Modeler to automatically update their models with the database changes.
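
The import step can be sketched in Python with the standard `sqlite3` module, which here stands in for a JDBC-compliant source (in Java, `java.sql.DatabaseMetaData` methods such as `getTables` and `getColumns` expose similar information). The `import_metadata` and `new_tables` functions are hypothetical helpers, and the dictionary they produce is a deliberately minimal model, not MetaBase's format. The second import simulates a refresh after a new table appears in the source.

```python
import sqlite3


def import_metadata(conn):
    """Read table and column metadata into {table_name: [(column_name, declared_type), ...]}."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [(col[1], col[2]) for col in columns]
    return model


def new_tables(old_model, new_model):
    """Tables present in a refreshed import but missing from the previous model."""
    return sorted(set(new_model) - set(old_model))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StreetAddress (AddressID INTEGER PRIMARY KEY, ZIPCode CHAR(5))")
first = import_metadata(conn)

# Simulate a change in the production source, then refresh the model.
conn.execute("CREATE TABLE Phone (PhoneID INTEGER PRIMARY KEY, Number VARCHAR(20))")
refreshed = import_metadata(conn)

print(first)
print(new_tables(first, refreshed))
```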

Importing Metadata from XML Schema

The MetaBase Modeler can also import the metadata structure from an XML Schema, which provides a blueprint for XML documents. The organization's modelers can then model the enterprise's use of the XML information in its virtual layer, identifying and mapping the other, non-XML data sources into XML. Organizations that use Simple Object Access Protocol (SOAP), message buses, Web services, or similar technologies to exchange information between different systems can show how the information in their non-XML data sources maps to the XML documents used in the exchange.
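
Reading the structure of an XML Schema can be approximated with Python's `xml.etree.ElementTree`. The sketch below assumes a small, hypothetical address schema and collects only declared element names and types; a real importer would also need to handle named types, attributes, element references, and imported schemas. `import_xsd_elements` is an illustrative helper, not a MetaBase importer.

```python
import xml.etree.ElementTree as ET

# ElementTree expands namespace prefixes to {uri}localname form.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="ZIPCode" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def import_xsd_elements(schema_text):
    """Return (element_name, declared_type) for every xs:element in document order.
    Elements defined with an inline complexType have no 'type' attribute, so None appears."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


print(import_xsd_elements(SCHEMA))
```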

Importing from Other Sources

The MetaBase Modeler includes an extensible importer plug-in system that allows organizations to develop their own custom metadata importers so that their data modelers can import metadata information from their specific, custom data sources.

Modeling Data By Hand

If the organization's data modelers cannot import the information for the organization's particular data sources, they can use the MetaBase Modeler to create the meta objects that represent the data structures. The MetaBase Modeler provides many different means to create and review metadata models, including a hierarchical tree, a UML diagram, and tables. This lets each data modeler create and edit models in whichever way he or she finds most efficient.

Physical Metadata Models Map Actual Data Sources

When an organization creates or imports meta objects that describe its physical data sources, it creates physical metadata models. These physical metadata models can describe the storage-specific technical details, such as data type, scale, update characteristics, and other technical properties. The physical metadata models can also capture business user information that not only describes where the data is in the data source, but what that data means to the user.

Modeling Enterprise-Specific Data Types

The MetaBase Modeler comes with simple, common data types for common information. Data modelers can create metadata models with strings, integers, floats, and other common data types from the moment they install the MetaBase Modeler on their computers. The data modelers can also create their own custom data types to describe their organization's particular data sources. For example, if the enterprise uses a common information element, such as the ZIP code, the data modelers can create a data type specifically for use with ZIP code information. By defining a data type, the organization can create a data dictionary of its own and enforce standards in its metadata models. These custom data types help identify similar, and sometimes redundant, information within the organization's data sources. The data modelers can quickly and easily see all instances of a particular sort of information and its complementary information by reviewing these specialized data types.

Built-In Data Types

The MetaBase Modeler offers more than 40 built-in data types that correspond to the data types common data sources use. Organizations can create fully functioning models using only these built-in data types.

Why Model Custom Data Types?

Because an organization can model its information sources using only the basic, built-in data types, modeling custom data types might seem an extraneous step in creating metadata models to describe its information systems and data consumption. However, creating custom data types offers an organization many benefits.

Formalizing a Data Dictionary

When an organization creates custom data types, it can use them enterprisewide to describe information more distinctly. It can create data types that describe the nature of the information more completely than the existing data types. For example, when confronted with the ZIPCode column within the Address Book database, the organization can model this information easily as a string or an integer; however, if the organization creates a custom data type called "ZIPCodeDT," data modelers within this organization can use this new data type specifically to model ZIP codes. Data modelers can easily find all locations in their disparate data sources that use the same data type. Hence, they can easily find similar or redundant information in the data sources.
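
Finding every place a custom data type is used amounts to grouping column metadata by data type. In this Python sketch the column list is hypothetical sample metadata gathered from several models, and `usages_by_type` is an illustrative helper, not a MetaBase search function. Any data type with several entries points at candidate overlap or redundancy.

```python
from collections import defaultdict

# Hypothetical column metadata gathered from several models:
# (data source, table, column, data type)
COLUMNS = [
    ("AddressBook", "StreetAddress", "ZIPCode", "ZIPCodeDT"),
    ("Sales", "Customer", "PostalCode", "ZIPCodeDT"),
    ("Sales", "Customer", "Name", "string"),
    ("Shipping", "Shipment", "DestZIP", "ZIPCodeDT"),
]


def usages_by_type(columns):
    """Group fully qualified column names by the data type that models them."""
    index = defaultdict(list)
    for source, table, column, datatype in columns:
        index[datatype].append(f"{source}.{table}.{column}")
    return dict(index)


index = usages_by_type(COLUMNS)
print(index["ZIPCodeDT"])
```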

Describing Data Rules in Detail

By creating special custom data types, the organization can easily create rules that apply to information of that data type. Its data modelers can set allowable values for that data type in two ways:

Creating a pattern

The pattern, a rule, describes the format of the data that the data type can contain. For example, for the ZIPCodeDT, the organization could set the allowable values to include 5 digits, or 9 digits, or 5 digits followed by a hyphen and then 4 more digits.
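
The three allowed ZIPCodeDT formats translate directly into a regular expression. This Python sketch assumes values are checked as strings, and the names `ZIP_CODE_DT_PATTERN` and `is_valid_zip_code_dt` are hypothetical. As with pattern facets in XML Schema, the pattern must match the entire value, which is why the code uses `fullmatch` rather than `search`.

```python
import re

# 5 digits, 9 digits, or 5 digits + hyphen + 4 digits ([0-9] avoids matching non-ASCII digits).
ZIP_CODE_DT_PATTERN = re.compile(r"(?:[0-9]{5}|[0-9]{9}|[0-9]{5}-[0-9]{4})")


def is_valid_zip_code_dt(value):
    """True if the whole string matches one of the allowed ZIPCodeDT formats."""
    return ZIP_CODE_DT_PATTERN.fullmatch(value) is not None


for candidate in ["63141", "631410001", "63141-0001", "6314", "63141-01"]:
    print(candidate, is_valid_zip_code_dt(candidate))
```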

Enumerating actual values

The data type definition can include a list of actual values for the data type. For example, data modelers could create a data type called ZIPCodeStL to specify ZIP codes in St. Louis, Missouri, and establish that the allowable values for this data type include 63043, 63141, 63104, and any other values that instances of this data type may take.

Reusing Data Types

Once the organization has created a custom data type, it can reuse that definition throughout its metadata models and in different metamodels. For example, the organization can not only model the ZIPCode column from the Address relational database, using the Relational metamodel, but it can also use the ZIPCodeDT to model information using XML Schema and other metamodels.

Deriving Custom Data Types

When the organization models custom data types, it bases each new data type upon existing built-in data types or other custom data types. This ensures that it can use information modeled using custom data types within runtime metadata if the organization uses the MetaMatrix Server for data access.

Deriving from Built-In Data Types

The simplest custom data types derive directly from the built-in data types. For example, the ZIPCodeDT data type derives directly from the built-in integer data type. As such, it bears most of the characteristics of the integer data type but extends or limits the integer to a specific purpose or content. This new ZIPCodeDT represents an integer that matches a five-digit pattern. When the data modelers model a column as a ZIPCodeDT, it has all the characteristics of an integer, but it allows only five-digit values.

Deriving from Other Custom Data Types

Once the organization's data modelers have created custom data types, they can further extend or restrict those data types according to need. Again, the new custom data types bear the characteristics of the parent data type and ultimately the characteristics of the base data type. For example, the derived data types ZIPCodeStL and ZIPCodeKC both have the same patterns as their parent data type, ZIPCodeDT, but each restricts the allowed values, by enumeration, to certain literal values. Ultimately, both share characteristics of the built-in integer data type.
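
Derivation can be modeled as a chain in which each data type must satisfy every constraint of its parent before applying its own. The Python sketch below is a hypothetical illustration, not MetaBase's type system: values are checked in their string form, the built-in integer is approximated by a digit pattern, and only ZIPCodeStL (with the three St. Louis values listed earlier) is shown as a further restriction of ZIPCodeDT. Following the derivation described above, ZIPCodeDT here accepts only five-digit values.

```python
import re


class DataType:
    """Hypothetical data type that inherits every constraint of its parent and may add its own."""

    def __init__(self, name, parent=None, pattern=None, enumeration=None):
        self.name = name
        self.parent = parent
        self.pattern = re.compile(pattern) if pattern else None
        self.enumeration = frozenset(enumeration) if enumeration else None

    def accepts(self, value):
        # A value must first satisfy the whole parent chain, back to the built-in type.
        if self.parent is not None and not self.parent.accepts(value):
            return False
        if self.pattern is not None and not self.pattern.fullmatch(value):
            return False
        if self.enumeration is not None and value not in self.enumeration:
            return False
        return True

    def base_type(self):
        """Walk up the derivation chain to the built-in type."""
        return self if self.parent is None else self.parent.base_type()


# Built-in base type, approximated by the lexical form of an integer.
integer = DataType("integer", pattern=r"-?[0-9]+")
# Restriction by pattern.
zip_code_dt = DataType("ZIPCodeDT", parent=integer, pattern=r"[0-9]{5}")
# Further restriction by enumeration.
zip_code_stl = DataType("ZIPCodeStL", parent=zip_code_dt, enumeration={"63043", "63141", "63104"})

print(zip_code_stl.accepts("63141"))  # enumerated value
print(zip_code_stl.accepts("12345"))  # five digits, but not enumerated
print(zip_code_dt.accepts("6314"))    # fails the five-digit pattern
print(zip_code_stl.base_type().name)
```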

Using Custom Data Types

The data modelers can then use these custom data types within their metadata models just as they use the built-in types. Because the MetaBase Modeler stores these data types in metadata models of their own, the data modelers can store, export, and exchange these data type definitions much as they do other models. Custom data types let data modelers capture much more information about the data than most applications record. For example, the address book database might store the ZIP code as an integer because that data type is more efficient. However, the domain of valid ZIP code values is far narrower than the range of all integers. MetaBase helps retain this valuable information, which organizations can lose when they rely only on simple, out-of-the-box data types.

Describing Data Usage in a Virtual Layer

Describing the Application and End User Needs

Ultimately, the organization has data in its disparate sources for one reason: to use it. Each organization has one or more applications that use the information in its data sources. Sometimes users derive information by combining one or more sources in ways that are common but not necessarily codified. For example, an accountant might use information in a relational database and in a spreadsheet, and in his or her day-to-day work must join the information from each source.

A purely physical metadata model would not capture the nature of the information the accountant needs. Instead, a special type of model must describe both the physical sources of the information and the mental processes that convert the source data into the information the end user or application uses.

Virtual Metadata Models Describe Enterprise Data Uses

An organization's data modelers can create special metadata models specifically to contain the end user's view of the information. These special metadata models, called virtual metadata models, not only describe what information the end user needs but also map that information to the physical data sources from which it ultimately comes. These virtual metadata models present a logical view of the organization's data sources. This view describes the available information as data-consuming users and applications want to see it, in terms of a single virtual data source. The virtual metadata model uses a powerful transformation tool to state explicitly how the virtual metadata is derived from the physical metadata.

Transformations Use SQL To Derive Information

The transformations that lie between source metadata, typically physical data sources, and the target virtual metadata use Structured Query Language (SQL) syntax and queries to "select" and transform information. Because the transformation uses SQL, it offers the power of a robust querying language and the ease of a common industry standard. Within these transformations, data modelers can use SQL queries and functions to manipulate source meta objects and to convert them into virtual meta objects. The organization can also design its own custom functions to represent its own particular logic and then use these functions with its transformations.
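
A virtual table defined by an SQL transformation behaves much like a database view over the physical tables. The Python sketch below uses an SQLite view to stand in for the accountant's virtual model: `ledger` plays the relational database and `budget_sheet` plays the spreadsheet. All table, column, and view names are hypothetical, and in MetaBase the transformation would be defined in the metadata models rather than as a view inside one of the sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical physical sources: a ledger and a budget loaded from a spreadsheet.
    CREATE TABLE ledger (account TEXT, amount REAL);
    CREATE TABLE budget_sheet (account TEXT, budgeted REAL);
    INSERT INTO ledger VALUES ('Travel', 1200.0), ('Travel', 300.0), ('Supplies', 250.0);
    INSERT INTO budget_sheet VALUES ('Travel', 1000.0), ('Supplies', 400.0);

    -- Virtual table: the transformation is an ordinary SQL SELECT over the sources.
    CREATE VIEW BudgetVsActual AS
    SELECT b.account,
           b.budgeted,
           COALESCE(SUM(l.amount), 0) AS actual,
           COALESCE(SUM(l.amount), 0) - b.budgeted AS variance
    FROM budget_sheet AS b
    LEFT JOIN ledger AS l ON l.account = b.account
    GROUP BY b.account, b.budgeted;
""")

rows = conn.execute("SELECT * FROM BudgetVsActual ORDER BY account").fetchall()
print(rows)
```

Consumers query `BudgetVsActual` without knowing which source each column came from, which is the essence of the virtual layer the paragraph above describes.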

XML Transformations

Map Relational and Other Sources to XML

The MetaBase Modeler offers transformations that can map the physical models or virtual models to XML documents. The organization's data modelers can link columns or elements from a group or table and associate that information with the appropriate tags within the XML document. These transformations describe exactly where the organization stores the components of its needed XML documents within its disparate data sources.
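
Mapping query results onto XML tags can be sketched with Python's `sqlite3` and `xml.etree.ElementTree` modules. The `rows_to_xml` function, its `mapping` parameter (which assigns each column to an XML tag), and the table and tag names are hypothetical stand-ins for the mapping a data modeler would define in the MetaBase Modeler.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag, mapping):
    """Build an XML document from a query result.
    mapping: {column_name: xml_tag}; columns not in the mapping are skipped."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            if column in mapping:
                ET.SubElement(record, mapping[column]).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StreetAddress (Street TEXT, ZIPCode TEXT)")
conn.execute("INSERT INTO StreetAddress VALUES ('1 Main St', '63141')")

xml_text = rows_to_xml(
    conn,
    "SELECT Street, ZIPCode FROM StreetAddress",
    "Addresses",
    "Address",
    {"Street": "Line1", "ZIPCode": "PostalCode"},
)
print(xml_text)
```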

The Enterprise Data - Mapped

Describing the Enterprise's Data in Metadata Models

Once the organization has modeled its physical data sources and its virtual data uses, the organization will have a coherent and complete map to the organization's available data and current uses. The organization can navigate this data in many ways, including diagrams, tree-views, and reports, to find information it needs or it can use the robust search capabilities to locate a particular meta object. The organization can convert the models into different formats to share. The data modelers can save the models as images, print the models, or export the models into standards-compliant XMI files that others can open in other tools.

Storing the Metadata Models in MetaBase

The MetaBase Repository offers an access- and version-control mechanism in which an organization can store its physical and virtual metadata models. The MetaBase Server manages the content of this repository.

Once the organization has added models to the metadata repository, its data modelers can check the models out to make changes and check those changes back in. Users can add labels and stamp versions onto models as they want. MetaBase also includes the ability to archive groups of models together to simplify exchange between repositories.
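
The check-out/check-in cycle with version numbers and labels can be illustrated with a small in-memory class. This Python sketch is hypothetical: `ModelRepository` and its methods are not the MetaBase Repository's API, model contents are plain strings, and there is no persistence or access control beyond a single check-out lock per model.

```python
class ModelRepository:
    """Hypothetical in-memory check-out/check-in store with version numbers and labels."""

    def __init__(self):
        self._versions = {}  # model name -> list of contents (index = version - 1)
        self._locks = {}     # model name -> user who has it checked out
        self._labels = {}    # (model name, label) -> version number

    def add(self, name, content):
        self._versions[name] = [content]

    def check_out(self, name, user):
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise RuntimeError(f"{name} is checked out by {holder}")
        self._locks[name] = user
        return self._versions[name][-1]

    def check_in(self, name, user, content):
        if self._locks.get(name) != user:
            raise RuntimeError(f"{user} has not checked out {name}")
        self._versions[name].append(content)
        del self._locks[name]
        return len(self._versions[name])  # the new version number

    def label(self, name, label, version=None):
        """Attach a label to a version (the latest one by default)."""
        latest = len(self._versions[name])
        self._labels[(name, label)] = latest if version is None else version

    def get(self, name, version=None, label=None):
        if label is not None:
            version = self._labels[(name, label)]
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]


repo = ModelRepository()
repo.add("AddressBook", "v1 model")
repo.label("AddressBook", "baseline")
draft = repo.check_out("AddressBook", "alice")
new_version = repo.check_in("AddressBook", "alice", draft + " + Phone table")
print(new_version, repo.get("AddressBook"), repo.get("AddressBook", label="baseline"))
```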

Using a SearchBase to Browse Metadata

MetaBase enables an organization to maintain a separate database, called a SearchBase, which contains metadata that the organization's data modelers make available for public browsing. The SearchBase makes available the relevant properties for each meta object and offers searching capability on meta object name, description, and keywords. MetaMatrix offers a Web application, the MetaViewer, that organizations can use to browse the SearchBase in Web browsers. Organizations can also create their own applications to browse the SearchBase.
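
Searching published metadata by name, description, and keyword can be sketched as a case-insensitive substring match. In this Python sketch, `MetaObjectEntry`, `search`, and the sample entries are hypothetical; an actual SearchBase is a separate database, and its schema and query interface are not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetaObjectEntry:
    """One searchable record: a published meta object's path, name, description, keywords."""
    path: str
    name: str
    description: str
    keywords: List[str] = field(default_factory=list)


def search(entries, term):
    """Return paths of entries whose name, description, or any keyword contains the term."""
    term = term.lower()
    return [
        e.path
        for e in entries
        if term in e.name.lower()
        or term in e.description.lower()
        or any(term in k.lower() for k in e.keywords)
    ]


ENTRIES = [
    MetaObjectEntry(
        "AddressBook.StreetAddress.ZIPCode",
        "ZIPCode",
        "USPS identifier used to speed postal delivery",
        ["postal", "zip"],
    ),
    MetaObjectEntry("Sales.Customer.Name", "Name", "Customer's legal name", ["customer"]),
]

print(search(ENTRIES, "postal"))
```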

Creating Data Definition Language Scripts from Models

The MetaBase Modeler enables data modelers to actually create relational databases for many common platforms, including Microsoft SQL Server, DB2, and Oracle 9i, based on both physical and virtual metadata models. Data modelers can create metadata models to design databases and can then generate Data Definition Language (DDL) scripts. They can use these DDL scripts with the database management system to create a data source with the structure of the models used.
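
Generating DDL from a model is essentially string assembly from column metadata. The Python sketch below emits a generic CREATE TABLE statement from a hypothetical minimal column model; `generate_ddl` is an illustrative helper, and real generators for SQL Server, DB2, or Oracle must account for each platform's data type names and syntax.

```python
def generate_ddl(table, columns, primary_key=None):
    """Emit a CREATE TABLE statement.
    columns: list of (name, sql_type, nullable) tuples; primary_key: list of column names."""
    lines = []
    for name, sql_type, nullable in columns:
        lines.append(f"    {name} {sql_type}{'' if nullable else ' NOT NULL'}")
    if primary_key:
        lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


ddl = generate_ddl(
    "StreetAddress",
    [
        ("AddressID", "INTEGER", False),
        ("Street", "VARCHAR(100)", True),
        ("ZIPCode", "CHAR(5)", True),
    ],
    primary_key=["AddressID"],
)
print(ddl)
```

Running the generated statement against an in-memory SQLite database is a quick way to confirm the script is syntactically valid before adapting it to a target platform.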

Return on Investment: Knowledge

MetaBase helps the organization immediately reap an increased return on its investment in its existing data sources. An organization that does not know what data it has cannot use that information effectively, nor can it fully realize the value of its investment in that data. When an organization charts its data sources, mapping the vast nebula of information it owns, it gains a comprehensive and comprehensible view of its data assets. The organization's members can easily review that map through diagrams, trees, tables, or searches to discover sources of information. This new view can also spark the organization's creativity in using existing information in new, productive ways.

Contact MetaMatrix, Shawn Curtiss, 314-739-3190 x120, scurtiss@metamatrix.com
