
Features - Enterprise Data Insights:
WHAT SOME DATA QUALITY VENDORS WON'T TELL YOU
By Anurag Wadehra, VP of Marketing, Siperian
In the last few years, enterprises have added several tools to their IT data
integration toolbox to improve the return on their CRM investments. Yet, most
of these companies have not achieved a simple goal: to create reliable,
unified views of their customers -- aggregated across data silos -- and
deliver these to all customer-facing applications in a timely fashion.
Recently, companies have turned to three common technologies to create
solutions for customer data integration. These are data movement tools such as
Extract-Transform-Load (ETL), data query and aggregation tools such as
Enterprise Information Integration (EII), and Data Quality (DQ) tools. However,
what the tool vendors aren't telling you is that these tools are woefully
inadequate for developing a reliable Customer Data Integration (CDI)
platform.
Customer Hubs Emerging
Market research firm Gartner Inc. defines CDI as "the combination of
the technology, processes and services needed to create and maintain an
accurate, timely and complete view of the customer across multiple channels,
business lines, and enterprises, where there are multiple sources of customer
data in multiple application systems and databases." There are several
implementation styles of CDI solutions, but the most effective is one in which an
enterprise commits to building and managing a customer hub that serves as a
central repository of customer data reconciled from multiple data sources.
This hub may contain some or all of the critical customer data needed to
provide multiple customer views to downstream applications. While there are
significant differences among the various customer hubs available, such as
what type of data to persist and how much to aggregate dynamically, there is
little doubt that a large, enterprise-class CDI solution needs a central
customer hub.
Data Tools Ill-Suited
In the past decade, many companies that tried to build an in-house version of
a customer data integration hub using ETL, EII and DQ tools are now struggling
with the aftermath of a custom solution. There are several reasons for the
failure of CDI solutions built with these tools.
First, all three technologies originated for narrow purposes ill-suited to
CDI: ETL to move large volumes of data in batch mode; EII to run distributed
queries across disparate sources in real time; and DQ tools to "scrub"
incorrect names and addresses in a single source at a time. Each of these
technologies effectively supports only a single data modality: batch or
real time.
Since customer data is inextricably tied to both operational and
strategic business processes of a company, such as the order-to-cash process or
profitability segmentation analyses, it needs to be delivered in time for each
business process. Therefore, any customer data integration solution needs to
support a range of modalities of data movement: from a large-volume batch
process that loads a new source into a customer hub; to scheduled intra-day
batches; to a publish-subscribe model for immediate updates of critical data.
Tools designed for a single modality can quickly hamper the reliability and
scalability of a CDI solution.
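As a rough illustration of what supporting several modalities can mean, the
sketch below shows an in-memory hub that accepts both a bulk batch load and
single-record updates, and publishes every change to subscribers. The class
CustomerHub and its methods are hypothetical names invented for this example;
they do not describe any vendor's product, and a real hub would add
persistence, scheduling and error handling.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


class CustomerHub:
    """Hypothetical hub that accepts batch loads and immediate updates."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.subscribers: List[Callable[[str, Record], None]] = []

    def subscribe(self, callback: Callable[[str, Record], None]) -> None:
        # Publish-subscribe modality: downstream systems register interest.
        self.subscribers.append(callback)

    def update(self, customer_id: str, changes: Record) -> None:
        # Immediate modality: apply one change and publish it right away.
        record = self.records.setdefault(customer_id, {})
        record.update(changes)
        for notify in self.subscribers:
            notify(customer_id, dict(record))

    def load_batch(self, rows: Iterable[Record]) -> int:
        # Batch modality: load many rows from a source in a single pass.
        count = 0
        for row in rows:
            changes = {k: v for k, v in row.items() if k != "customer_id"}
            self.update(row["customer_id"], changes)
            count += 1
        return count


hub = CustomerHub()
events = []
hub.subscribe(lambda cid, rec: events.append((cid, rec)))

loaded = hub.load_batch([
    {"customer_id": "C1", "name": "Ann Lee", "city": "Austin"},
    {"customer_id": "C2", "name": "Bo Chen", "city": "Boston"},
])
hub.update("C1", {"city": "Dallas"})
```

Both entry points funnel through the same update path, so a subscriber sees
batch-loaded and immediate changes in the same form.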
Treat Different Data Types Separately
To build a reliable CDI solution, it is imperative to treat different types of
data separately, such as master reference data, relationship data or
transaction data. Master reference data is the foundational entity data (such
as name and address) that is critical for uniquely identifying a customer
across multiple systems and channels. Without a persistent and trustworthy hub
of customer reference or profile data that serves as the "system of record",
other types of data cannot be aggregated reliably. Ideally, such a master
store should create and maintain the best-of-breed record for each customer
culled from all relevant internal and external data sources -- at the cell or
attribute level -- along with the associated cross-reference keys. This store
then becomes the best source of truth for customer profile information for all
downstream operational and analytical applications.
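To make attribute-level consolidation concrete, here is a minimal sketch that
builds one master record by choosing each attribute from the most trusted
source that has a value, while keeping the cross-reference keys. The trust
ranking in TRUST, the source names and the sample records are all assumptions
made for illustration; the article does not prescribe how the best value is
chosen.

```python
from typing import Dict, List, Tuple

# Hypothetical trust ranking per attribute: earlier sources win.
TRUST: Dict[str, List[str]] = {
    "name": ["crm", "billing", "web"],
    "address": ["billing", "crm", "web"],
    "email": ["web", "crm", "billing"],
}


def build_master(
    source_records: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Pick each attribute from the most trusted source that has a value."""
    master: Dict[str, str] = {}
    lineage: Dict[str, str] = {}
    for attribute, ranking in TRUST.items():
        for source in ranking:
            value = source_records.get(source, {}).get(attribute)
            if value:  # skip missing and empty values
                master[attribute] = value
                lineage[attribute] = source
                break
    return master, lineage


records = {
    "crm": {"key": "CRM-0042", "name": "Ann Lee", "address": "",
            "email": "ann@old.example"},
    "billing": {"key": "BIL-9001", "name": "A. Lee",
                "address": "12 Oak St, Austin TX"},
    "web": {"key": "WEB-77", "email": "ann.lee@example.com"},
}

master, lineage = build_master(records)
# Cross-reference keys tie the master record back to each source system.
xref = {source: rec["key"] for source, rec in records.items()}
```

Here the name survives from the CRM record, the address from billing and the
email from the web source, and lineage records which source supplied each
attribute.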
The next type of data is relationship or hierarchy data. This type of data
defines the relationships among various entities (such as individual to
organization, organization to organization, or individuals within households).
Relationship data can be managed reliably across different sources only after
the underlying conflicts of master (entity) data have been resolved. Most of
the custom solutions deployed have fixed relationships among entities embedded
in the system's data model, which makes it hard for IT to manage changes in
customer relationships and affiliations.
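One way to avoid embedding relationships in the data model is to store them
as ordinary rows, so that a new affiliation is new data and not a schema
change. The sketch below assumes entities have already been resolved to
unique names; the Relationship type, the members function and the sample
entities are hypothetical.

```python
from typing import List, NamedTuple, Set


class Relationship(NamedTuple):
    parent: str
    child: str
    kind: str  # for example "owns", "employs" or "household"


def members(relationships: List[Relationship], root: str,
            kinds: Set[str]) -> Set[str]:
    """Return every entity reachable from root via the given kinds."""
    found: Set[str] = set()
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for rel in relationships:
            if (rel.parent == current and rel.kind in kinds
                    and rel.child not in found):
                found.add(rel.child)
                frontier.append(rel.child)
    return found


rels = [
    Relationship("Acme Holdings", "Acme Retail", "owns"),
    Relationship("Acme Retail", "Ann Lee", "employs"),
    Relationship("Lee Household", "Ann Lee", "household"),
    Relationship("Lee Household", "Sam Lee", "household"),
]

# A new affiliation is one more row, with no change to the model.
rels.append(Relationship("Acme Holdings", "Acme Online", "owns"))

subsidiaries = members(rels, "Acme Holdings", {"owns"})
household = members(rels, "Lee Household", {"household"})
```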
The third type of data is transaction or activity data (such as amount
withdrawn from an account). Although there are significant challenges in
managing large volumes of transaction data, there is usually little conflict
in reconciling such data since there is an unambiguous system of record for
each type of transaction. The key issue lies in attaching these transactions
correctly to the same customer across multiple CRM touch-points and then
aggregating them accurately for other applications to consume (such as the
average account balance). Note that transactions can be aggregated for the
right customer or household only after the ambiguities of the associated
master and relationship data have been removed.
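The dependency described above can be sketched as follows: transactions
arrive with source-specific keys, and they can be rolled up per customer or
per household only once a cross-reference maps those keys to a master
customer. The XREF and HOUSEHOLD tables, the source names and the amounts
(whole currency units) are invented for this example.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, local key) -> master ID.
XREF = {
    ("branch", "B-17"): "M1",
    ("online", "ann.lee"): "M1",
    ("branch", "B-22"): "M2",
}
# Hypothetical relationship data: master ID -> household ID.
HOUSEHOLD = {"M1": "H1", "M2": "H1"}

transactions = [
    {"source": "branch", "key": "B-17", "withdrawn": 100},
    {"source": "online", "key": "ann.lee", "withdrawn": 50},
    {"source": "branch", "key": "B-22", "withdrawn": 200},
    {"source": "online", "key": "unknown", "withdrawn": 75},
]


def aggregate(txs):
    """Total withdrawals per master customer and per household."""
    by_customer = defaultdict(int)
    by_household = defaultdict(int)
    unmatched = []
    for tx in txs:
        master_id = XREF.get((tx["source"], tx["key"]))
        if master_id is None:
            # Without a resolved customer the transaction cannot be attached.
            unmatched.append(tx)
            continue
        by_customer[master_id] += tx["withdrawn"]
        by_household[HOUSEHOLD[master_id]] += tx["withdrawn"]
    return dict(by_customer), dict(by_household), unmatched


by_customer, by_household, unmatched = aggregate(transactions)
```

The transaction with no cross-reference entry is set aside instead of being
guessed at, which is the point of resolving master data first.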
Essentially, without treating different data types separately and establishing
a reliable foundation of master data at the start, a trustworthy CDI platform
cannot be built. Yet none of the data tools maintains a separation of data
types. For instance, ETL tools neither recognize nor treat master data apart
from other types of data. EII tools assume that all federated data results are
clean and unambiguous; in fact, they rely on an external source to provide
correct cross-reference keys and global IDs to accurately join the results of
a federated query. DQ tools provide ad hoc cleansing of a source but neither
recognize data types nor offer ongoing management of data changes.
The Challenge Of Data Models
One of the key reasons custom solutions are inextensible is that they
instantiate a fixed data model in a physical database repository or data
warehouse. This fate is also shared by "packaged" CDI solutions offered by
application vendors (such as Siebel, Oracle and SAP). In a large enterprise,
rarely does a single vendor have access to all sources of customer data --
external and internal. Therefore, standardizing on the application vendor data
model means more, not less, work since every data source outside the vendor
application has to be transformed to feed into the vendor's customer data hub.
The best approach is to create a template-driven, logical data model
specifically for each enterprise reflecting all its specific customer data
sources that need to be integrated. Ultimately, the solution provider has to
deliver a data model and a solution framework cognizant of the needs of each
major industry vertical. None of these data tools attempt to address the
challenge of data models for a diverse set of data sources encountered in
various verticals.
Meta-Data Driven Framework Needed
The most fundamental shortcoming of the trio of data tools (ETL/EII/DQ) is
that they do not offer a meta-data framework for managing the
complete set of data management tasks required of a customer data integration
solution. Each of these tools, along with the numerous enterprise application
integration (EAI) technologies, solves only a narrow integration issue within
the IT "stack" -- integrating application to application, moving data to
a single warehouse, cleansing a single source, etc.
A comprehensive CDI framework must include the tools needed for all processes
associated with managing different data types. For example, the framework
should address the complete lifecycle of master reference data: model,
cleanse, match, merge, share, extend and manage. The solution should allow
customer and organization hierarchies across data sources to be leveraged
instead of being tied to a fixed hierarchical view of an implementation. The
solution should readily access all relevant customer activity data and
accurately unify it with other data types for a complete view (through caching
or aggregation).
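As a simple illustration of the cleanse and match steps in that lifecycle,
the sketch below normalizes names and postal codes and then groups records
whose cleansed values agree. Exact matching on a cleansed key is an
assumption chosen for brevity; production matching is usually fuzzier. The
functions cleanse and match and the sample records are hypothetical.

```python
import re
from collections import defaultdict


def cleanse(value: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", value).strip()


def match(records):
    """Group record IDs whose cleansed name and postal code agree."""
    groups = defaultdict(list)
    for rec in records:
        key = (cleanse(rec["name"]), cleanse(rec["zip"]))
        groups[key].append(rec["id"])
    return list(groups.values())


source_records = [
    {"id": "crm:1", "name": "Ann  Lee", "zip": "78701"},
    {"id": "bil:9", "name": "ann lee.", "zip": "78701 "},
    {"id": "web:3", "name": "Bo Chen", "zip": "02108"},
]

matched = match(source_records)
```

Each resulting group is a candidate set of source records to merge into one
master record.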
For the solution to manage data changes without software programming efforts,
it must be driven by meta-data that captures the data syntax, semantics and
business rules that are relevant to integrating customer data into unified
views. It is important to maintain the distinction between managing meta-data
through a generalized meta-data tool versus having a meta-data driven
framework designed for a specific purpose (such as CDI). A meta-data driven
framework captures, stores and uses highly contextual meta-data tied to a
business purpose (such as when a customer address was changed and by whom).
By separating meta-data from its business context, a generalized meta-data
tool often limits its business value.
The key advantage of using a meta-data driven CDI framework is that it renders
the solution entirely configurable, so that business and IT changes can be
implemented rapidly without writing code. Since the CDI framework is
manageable by business analysts and data stewards as well as by IT, such a
solution becomes the successful foundation for all unified customer views in
an enterprise. Additional data sources are easy to add, without additional
programming, as businesses evolve through mergers and acquisitions.
Because the custom CDI solutions built with ETL-EII-DQ tools are not meta-data
driven, they are not manageable by data stewards, are hard to configure and
are generally not extensible beyond a handful of sources.
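The difference can be sketched in a few lines. In the example below, field
mappings and cleansing rules live in a meta-data table, every change is
logged with who made it and when, and a new source is added by adding an
entry to that table, with no change to the function that applies rows. The
structure of SOURCE_METADATA, the rule names and the function
apply_source_row are assumptions made for illustration only.

```python
from datetime import datetime, timezone

# Hypothetical cleansing rules, referenced by name from the meta-data.
RULES = {
    "title": lambda v: v.strip().title(),
    "strip": lambda v: v.strip(),
}

# Hypothetical meta-data: per-source field mappings and rule choices.
SOURCE_METADATA = {
    "crm": {"fields": {"cust_name": "name", "addr1": "address"},
            "rules": {"name": "title", "address": "strip"}},
    "billing": {"fields": {"FULLNAME": "name", "BILL_ADDR": "address"},
                "rules": {"name": "title", "address": "strip"}},
}


def apply_source_row(hub, history, source, customer_id, row, changed_by):
    """Map a source row onto the hub record and log each real change."""
    meta = SOURCE_METADATA[source]
    record = hub.setdefault(customer_id, {})
    for source_field, attribute in meta["fields"].items():
        if source_field not in row:
            continue
        rule = RULES[meta["rules"].get(attribute, "strip")]
        new_value = rule(row[source_field])
        old_value = record.get(attribute)
        if new_value != old_value:
            record[attribute] = new_value
            # Contextual meta-data: what changed, when, and by whom.
            history.append({
                "customer_id": customer_id, "attribute": attribute,
                "old": old_value, "new": new_value, "source": source,
                "changed_by": changed_by,
                "changed_at": datetime.now(timezone.utc),
            })


hub, history = {}, []
apply_source_row(hub, history, "crm", "M1",
                 {"cust_name": " ann lee ", "addr1": "12 Oak St "}, "loader")
apply_source_row(hub, history, "billing", "M1",
                 {"FULLNAME": "ANN LEE", "BILL_ADDR": "98 Elm Ave"}, "jsmith")

# Adding a source (say, after an acquisition) is a meta-data entry only.
SOURCE_METADATA["acquired_co"] = {"fields": {"nm": "name"},
                                  "rules": {"name": "title"}}
apply_source_row(hub, history, "acquired_co", "M2", {"nm": "bo chen"}, "loader")
```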
Service Oriented Architecture Critical
Finally, if a customer data hub is to be the central repository of critical
customer information for other systems, it needs to have critical capabilities
to synchronize reliable data back to source systems. In addition, such a CDI
solution needs to support standards-based service-oriented architecture (SOA)
so that its underlying data services may be used by future service-oriented
applications. Typically, none of the hubs built by data tools offer these
critical capabilities, foreshadowing their quick obsolescence.
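As a minimal sketch of synchronizing back to source systems, the function
below compares the master record with each source's copy and lists the
updates each source would need. The function plan_sync, the record layout and
the sample values are hypothetical; in a service-oriented design this logic
would presumably sit behind a standards-based service interface, which is not
shown here.

```python
from typing import Dict, List


def plan_sync(master: Dict[str, str],
              source_copies: Dict[str, Dict[str, str]],
              xref: Dict[str, str]) -> List[Dict[str, str]]:
    """List the updates each source system needs to match the master."""
    updates = []
    for source, copy in source_copies.items():
        for attribute, master_value in master.items():
            # Only push attributes the source stores and currently has wrong.
            if attribute in copy and copy[attribute] != master_value:
                updates.append({
                    "source": source,
                    "key": xref[source],
                    "attribute": attribute,
                    "value": master_value,
                })
    return updates


master = {"name": "Ann Lee", "address": "98 Elm Ave",
          "email": "ann.lee@example.com"}
source_copies = {
    "crm": {"name": "Ann Lee", "address": "12 Oak St"},
    "billing": {"name": "A. Lee", "address": "98 Elm Ave"},
}
xref = {"crm": "CRM-0042", "billing": "BIL-9001"}

updates = plan_sync(master, source_copies, xref)
```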
Summary
Although necessary components of the data integration architecture, ETL, EII
and DQ tools are not designed, nor able, to build a trustworthy foundation for
customer data integration. For the same reason you wouldn't hire a plumber to
build your house, organizations should not rely primarily on these
technologies when developing a reliable customer data foundation.
Like plumbing in a house, the tools that push data through the pipes are not
representative of the overarching blueprint needed for customer data
integration (CDI) architecture. The cornerstone of the architecture is the
recognition that different types of data need to be treated separately.
Additionally, data reliability can only be maintained through a set of best
practices that first put in place the bedrock of reliable customer master
reference data. A solution that has a flexible data model supported by a
meta-data driven, configurable framework is the best way to construct such a
foundation. Once built, it should be easily manageable by data stewards and
extensible to emerging service-oriented architecture standards and therefore
to new business conditions.
Before hiring a customer data integration "plumber" to build your customer
foundation, take the time to evaluate a data architecture expert who can build
a solid foundation from which to achieve your customer data integration
goals.
About the Author
Anurag Wadehra is the vice president of marketing at Siperian Inc, a customer
data integration solution provider. The Siperian solution creates the most
trustworthy and manageable customer master reference store possible from
widely disparate internal and external data sources. It is the foundation for
delivering accurate, relevant and actionable 360-degree customer views.
Siperian's highly manageable and extensible solution enables enterprises to
cost-effectively provide trustworthy customer master data to any system or
business user, resulting in more efficient and profitable customer
relationships, reduced customer data operations costs and increased accuracy
of regulatory compliance.