Analysis & Commentary:
ENTERPRISE DATA MANAGEMENT TECHNOLOGY STRATEGY
by Tim Staub, managing editor
During The Data Warehousing Institute (TDWI) conference in San Diego, DSstar
spoke with Tho Nguyen, Director of Data Warehousing Strategy for SAS.
DSstar: What is ETLQ?
NGUYEN: Integrating Data Quality into the ETL process produces a user benefit
called ETLQ (Exponentially enhance the power of ETL with data Quality). It
matters because critical business decisions, resource allocation, price
changes, marketing campaigns, and daily operations all revolve around the
quality of information in a corporate data warehouse.
Integrating Data Quality and Data Warehousing is the latest trend identified by
analyst firms such as Gartner, IDC, and Meta. In a market that is mature but
crowded with both ETL and Data Quality vendors, SAS is leading this trend by
being the first vendor to fully integrate Data Quality into the ETL process.
DSstar: Why is Data Quality so important?
NGUYEN: The basic fact is that organizations have a limited appreciation of
the quality of data residing in their operational systems, and the majority
have no data quality processes in place at all. A survey conducted by The
Data Warehousing Institute (TDWI) found that around 44% of respondents said
their data quality was worse than they had anticipated. Additionally, 40%
admitted to costs, problems and losses directly attributable to data quality
issues.
Further studies by analysts conclude that poor data quality is the main cause
of failed Data Warehousing and Business Intelligence projects and of their
limited acceptance. Poor data quality is costing U.S. organizations billions
of dollars every year in lost sales, lower customer satisfaction rates, and a
lack of accurate information for those responsible for making
business-critical decisions.
DSstar: Clean data -- what is it worth?
NGUYEN: Companies live and die by the intelligence they can draw out of their
data. Intelligence is derived using a combination of data warehousing,
advanced analytics and business intelligence. But that intelligence is only as
good as the quality of the data itself. Because data is drawn from a variety of
platforms, formats and even physical locations, companies today need to merge
their data warehousing and data quality activities to achieve a 'rapid return on
intelligence'. Yet, most organizations do not fund programs designed to ensure
data quality in a proactive, systematic and sustained manner. According to The
Data Warehousing Institute's (TDWI) recent Data Quality Survey, half of all
companies have no plan for managing data quality.
DSstar: Why combine Data Warehousing and Data Quality?
NGUYEN: Bill Inmon, the father of data warehousing, says the purpose of the ETL
phase is to load the data warehouse with integrated and cleansed data. Data
quality is a key component in preparing data for entry into the data
warehouse. Integrating data warehousing and data quality yields ETLQ
(Exponentially enhance the power of ETL with Data quality), which provides the
ability to manage data quality on an enterprise-wide scale, solving issues for
both Data Stewards/Business Analysts and IT/Data Warehousing professionals. A
true data quality solution must address the
entire process -- IT/Data Warehousing professionals need data quality tools
that function within their ETL environment. Data Stewards/Business Analysts
need data quality tools that simplify the complex business rules governing the
algorithms and methodologies that identify true errors in the data.
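As a rough illustration of placing data quality inside the ETL flow rather than after it, here is a minimal Python sketch in which records are extracted, standardized and checked against a quality rule, and only then loaded. The field names, the two-letter state rule and every function name are hypothetical, chosen for illustration; this is not how SAS implements ETLQ.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    state: str


def extract(rows):
    """Extract: turn raw source rows (dicts) into Customer records."""
    return [Customer(r.get("id", "").strip(), r.get("name", ""), r.get("state", ""))
            for r in rows]


def cleanse(records):
    """Quality step inside the transform: standardize fields, then split out
    records that fail a (hypothetical) validation rule."""
    clean, rejected = [], []
    for rec in records:
        rec = Customer(rec.customer_id,
                       " ".join(rec.name.split()).title(),  # collapse spaces, fix case
                       rec.state.strip().upper())
        # Hypothetical rule: an ID is required and state must be a two-letter code.
        if rec.customer_id and len(rec.state) == 2 and rec.state.isalpha():
            clean.append(rec)
        else:
            rejected.append(rec)
    return clean, rejected


def load(warehouse, records):
    """Load: append only the records that passed the quality step."""
    warehouse.extend(records)


raw = [
    {"id": "C1", "name": "  ada   LOVELACE ", "state": "nc"},
    {"id": "",   "name": "No Id",            "state": "CA"},
    {"id": "C3", "name": "grace hopper",     "state": "North Carolina"},
]
warehouse = []
clean, rejected = cleanse(extract(raw))
load(warehouse, clean)
print([c.name for c in warehouse])  # ['Ada Lovelace']
print(len(rejected))                # 2
```

Rejected records are kept rather than silently dropped, so a data steward could review them and correct the problem at the source.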
DSstar: Who is the Target Audience for ETLQ?
NGUYEN: The primary audience is the information technology (IT) management
team, especially directors and managers: the individuals accountable for
delivering technologies that help drive the success of their organizations by
providing accurate, appropriate information for effective, strategic decision
making. IT managers plan, administer and review the
acquisition, development, maintenance and use of hardware and software systems
within organizations.
The business audience is a target as well: these users rely on integrated and
cleansed data to analyze and generate reports for the decision makers within
their organizations.
DSstar: Why is IT a Target?
NGUYEN: Because their responsibilities include continually analyzing the
organization's changing information technology needs; advocating for and
supporting the development of appropriate resources to meet those needs;
providing leadership in identifying, justifying and articulating plans for
information technology; establishing priorities for systems development,
maintenance and operations; and controlling the security aspects of IT
systems. They are also responsible for improving technology processes,
ensuring timely and cost-effective delivery of IT services and project
implementations, and developing long- and short-range strategic plans.
DSstar: What is IT's Data Quality Pain?
NGUYEN: IT department heads tell us that decision makers in their organization
have trouble getting the high-quality information needed to determine how best
to allocate resources, manage costs, add and retain the right customers,
attain profit targets and more. It is their responsibility to find smarter
ways to use technology to satisfy business needs and to show how IT
contributes to the organization. IT also faces the challenge of so much data
constantly coming in, with everyone wanting immediate answers to their
questions. It is hard to produce accurate information and meet everyone's
needs.
Another challenge we hear is that data comes from so many sources and is not
standardized. As one IT department put it, "We have duplicate data on
individual customers that doesn't
match up. Different reports on the same question yield different answers. We
do not have a consistent view of information. We don't know the best way to
achieve high-quality data in a low-risk manner. We can't even tell management
how much time and effort it will take to clean up the data."
And another IT department states, "We have records and fields from various
data sources, platforms and systems but we don't have the ability to extract
and transform the data into information that users can trust. We know that if
our systems don't produce accurate information, our entire organization
suffers."
DSstar: Why is the quality of data that companies collect so poor?
NGUYEN: There are a variety of reasons -- everything from the ambiguous nature
of data itself to the reliance on data entry perfection. But the simple fact
is that a company relies on so many different data sources for capturing
information.
TDWI currently estimates that data quality problems cost U.S. businesses more
than $600 billion a year. Yet most executives are oblivious to the data
quality problems that slowly bleed a company to death. The damage includes
unnecessary printing, postage, and staffing costs; the slow but steady
erosion of an organization's credibility among customers and suppliers; and
its inability to make sound decisions based on accurate information.
The problem with data is that its quality quickly degenerates over time.
Experts say 2 percent of records in a customer file become obsolete in one
month. In addition, data entry errors, systems migrations, and changes to
source systems, among other things, generate bucket loads of errors. As
organizations fragment into different divisions and units, interpretations of
data elements mutate to meet the local business needs. A data element that one
group finds valuable may be nonsense to a different group.
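A quick calculation shows why a 2 percent monthly rate matters. Assuming (our assumption, not the source's) that 2 percent of the still-valid records go stale each month and nothing is corrected in the meantime, the valid share after n months is 0.98 to the power n, so roughly a fifth of a customer file is obsolete after a year.

```python
def valid_fraction(monthly_decay: float, months: int) -> float:
    """Share of records still valid after `months`, assuming a constant
    monthly obsolescence rate applied to the remaining valid records."""
    return (1.0 - monthly_decay) ** months


after_year = valid_fraction(0.02, 12)
print(f"valid after 12 months: {after_year:.1%}")         # 78.5%
print(f"obsolete after 12 months: {1 - after_year:.1%}")  # 21.5%
```

If stale records were instead replaced or corrected as they appear, the decay would not compound this way; the sketch only shows the uncorrected case.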
DSstar: What does an ETLQ Solution do?
NGUYEN: ETLQ enables Data Quality solutions to easily integrate and leverage
existing compute and operational environments across all platforms and storage
facilities while improving data quality through the entire IT process. It
helps cleanse data before loading it into the data warehouse so that further
downstream analyses and decisions are based on reliable information. The
chosen environment must be low-risk, easy to build and manage, and flexible to
change with evolving business needs. This solution provides a foundation for
accurate information and insight that can be used to optimize an
organization's performance.
The solution automatically analyzes data from a quality perspective, providing
the ability to plan and cost-justify data cleansing expenditures. It provides
an easy-to-use way to standardize, match and verify data so that an
organization has a single version of the truth.
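One common way a tool can analyze data from a quality perspective is profiling: counting missing values and the distinct formats a field takes. The Python sketch here is a hypothetical illustration, not the SAS product's method; the `phone` field, the sample values and the helper names are made up.

```python
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its format pattern: digits -> 9, letters -> A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile(records, field):
    """Count missing/blank values and the format patterns seen for one field."""
    values = [(r.get(field) or "").strip() for r in records]
    missing = sum(1 for v in values if not v)
    patterns = Counter(shape(v) for v in values if v)
    return {"missing": missing, "patterns": dict(patterns)}


customers = [
    {"phone": "555-0100"},
    {"phone": "(555) 0101"},
    {"phone": "555-0102"},
    {"phone": ""},
    {},
]
print(profile(customers, "phone"))
# {'missing': 2, 'patterns': {'999-9999': 2, '(999) 9999': 1}}
```

Finding two phone formats and two blanks in five rows is the kind of result that helps size, plan and cost-justify a cleansing effort before any data is changed.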
DSstar: How can an organization achieve a high degree of data quality?
NGUYEN: One approach is to do nothing and let the end users find the errors.
And they will do exactly that. The problem with letting the end users find the
errors is that the end user confidence in the data warehouse erodes. A second
approach is to get a cadre of well-meaning and attentive clerical staff and to
have them pore over the data. However, this is a terribly time-consuming and
expensive approach, and it is itself prone to errors. The
most effective approach to improving the quality of the data is to use as much
automation as possible.
DSstar: Can I improve Data Quality with my current IT investments?
NGUYEN: Because SAS integrates data quality into the ETL process, IT does not
have to spend extra money and resources on quality tools. SAS works with all
data sources and platforms. SAS leverages existing hardware, software, data
and human resources to easily consolidate legacy and non-legacy data sources
in a highly flexible, readily maintainable environment.
DSstar: Can I deliver decision makers information they can trust?
NGUYEN: Data quality enables IT to retrieve and deliver consistent, accurate,
and reliable information representing a single version of the truth to the
business community -- information that they can trust. A Data Quality solution
featuring ETLQ manages data quality on an enterprise-wide scale, solving
issues for both business analysts and IT professionals. It provides business
analysts with easy-to-use tools that simplify the data auditing and analysis
processes. Data warehousing professionals get data quality administration
tools that function within their ETL environments. This approach emphasizes
not only building quality data as it is loaded into a warehouse, but also the
ongoing management of your warehouse, providing increased automation of data
transformations, integration of external information and simplified
management of complex job dependencies.
DSstar: Where should data quality start?
NGUYEN: There are many opportunities to improve data quality at the various
points of data integration. The most logical point is the source of the data. Data
sources have various formats, reside in multiple platforms and are often
widely distributed. Some data sources are more complete, while others have
missing or incorrect values. By performing corrective maintenance and
preventing data quality issues at the source, the data warehousing effort
becomes more effective.
The point of passage of data from the operational environment into the data
warehouse is a very good place to address data quality. In order to address
the completeness of data coming from multiple sources, it is necessary to
first address the issue of data quality in the source applications, and then
address the issue of compatibility of data as the data is merged. Data quality
tools provide robust matching logic to facilitate merging of disparate data
across data sources.
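To show what matching logic across sources can look like, here is a minimal Python sketch that standardizes two sources' values and then scores pairs with the standard library's `difflib.SequenceMatcher`. The abbreviation table, the 0.9 similarity threshold and the sample records are assumptions for illustration; commercial data quality tools use their own, more elaborate matching rules.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table used during standardization.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "inc": "incorporated"}


def standardize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace,
    and expand a few common abbreviations."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in value.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def is_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two values as the same entity if their standardized forms
    are at least `threshold` similar."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio() >= threshold


source_a = ["Acme Inc., 12 Main St.", "Globex Corp, 5 Elm Ave"]
source_b = ["ACME Incorporated, 12 Main Street", "Initech, 9 Oak Road"]

matches = [(a, b) for a in source_a for b in source_b if is_match(a, b)]
print(matches)
# [('Acme Inc., 12 Main St.', 'ACME Incorporated, 12 Main Street')]
```

Comparing every pair is fine for a sketch but scales poorly; larger matching jobs typically narrow the candidates first (for example, by postal code) before scoring similarity.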
DSstar: What is the message behind "ETLQ"?
NGUYEN: ETLQ is the benefit received by treating Data Quality as a proactive
approach rather than a reactive one. The mistake most organizations make is
to treat Data Quality as a cause-and-effect event. There is no final solution
to Data Quality; it should always be considered an ongoing process. The more
effort you put into it, the greater your returns will be. The power of
"Q"!
Contact SAS Institute, Wally Maczka, 919-531-5350, wally.maczka@sas.com,
www.sas.com.