Analysis & Commentary:

ENTERPRISE DATA MANAGEMENT TECHNOLOGY STRATEGY
by Tim Staub, managing editor

During The Data Warehousing Institute (TDWI) conference in San Diego, DSstar spoke with Tho Nguyen, Director of Data Warehousing Strategy for SAS.

DSstar: What is ETLQ?

NGUYEN: Integrating Data Quality into the ETL (extract, transform, load) process results in a user benefit we call ETLQ ("Exponentially enhance the power of ETL with data Quality"). It affects critical business decisions, resource allocation, price changes, marketing campaigns, and daily operations, all of which depend on the quality of information in a corporate data warehouse.

Integrating Data Quality and Data Warehousing is the latest trend identified by analyst firms such as Gartner, IDC, and Meta Group. In a market that is mature but crowded with both ETL and Data Quality vendors, SAS is leading this trend by being the first vendor to fully integrate Data Quality into the ETL process.

DSstar: Why is Data Quality so important?

NGUYEN: The basic fact is that organizations have a limited appreciation of the quality of the data residing in their operational systems, and the majority have no data quality processes in place at all. A survey conducted by The Data Warehousing Institute (TDWI) found that around 44% of respondents said their data quality was worse than they had anticipated. Additionally, 40% admitted to costs, problems, and losses directly attributable to data quality issues.

Further studies by business analysts conclude that poor data quality is the main cause of failed Data Warehousing and Business Intelligence projects, and of their limited acceptance. Poor data quality is costing U.S. organizations billions of dollars every year in lost sales, lower customer satisfaction rates, and the lack of accurate information available to those responsible for making business-critical decisions.

DSstar: Clean data -- what is it worth?

NGUYEN: Companies live and die by the intelligence they can draw out of their data. Intelligence is derived using a combination of data warehousing, advanced analytics, and business intelligence. But that intelligence is only as good as the quality of the data itself. Because data is drawn from a variety of platforms, formats, and even physical locations, companies today need to merge their data warehousing and data quality activities to achieve a 'rapid return on intelligence'. Yet most organizations do not fund programs designed to ensure data quality in a proactive, systematic, and sustained manner. According to TDWI's recent Data Quality Survey, half of all companies have no plan for managing data quality.

DSstar: Why combine Data Warehousing and Data Quality?

NGUYEN: Bill Inmon, the father of data warehousing, says the purpose of the ETL phase is to load the data warehouse with integrated and cleansed data. Data quality is a key component in preparing data for entry into the data warehouse. Integrating data warehousing and data quality produces ETLQ, which provides the ability to manage data quality on an enterprise-wide scale, solving issues for both Data Stewards/Business Analysts and IT/Data Warehousing professionals. A true data quality solution must address the entire process: IT/Data Warehousing professionals need data quality tools that function within their ETL environment, and Data Stewards/Business Analysts need data quality tools that simplify the complex business rules governing the algorithms and methodologies that identify true errors in the data.
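To make the idea of a quality step inside ETL concrete, here is a minimal Python sketch of an extract, transform, quality, load pipeline in which records that fail basic checks are held back for review instead of being loaded. The record layout, the field names, and the `etlq` function are hypothetical illustrations, not part of any SAS product.

```python
from dataclasses import dataclass

# Hypothetical record layout, for illustration only.
@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    state: str

def extract(rows):
    """Extract: turn raw source rows (dicts) into typed records."""
    return [CustomerRecord(r["id"], r["name"], r["state"]) for r in rows]

def transform(records):
    """Transform: trim IDs, collapse whitespace and title-case names, upper-case states."""
    return [
        CustomerRecord(r.customer_id.strip(), " ".join(r.name.split()).title(), r.state.strip().upper())
        for r in records
    ]

def quality(records, valid_states):
    """Quality: split records into clean ones and rejects that need review."""
    clean, rejects = [], []
    for r in records:
        if r.customer_id and r.name and r.state in valid_states:
            clean.append(r)
        else:
            rejects.append(r)
    return clean, rejects

def load(warehouse, records):
    """Load: append only the records that passed the quality step."""
    warehouse.extend(records)
    return warehouse

def etlq(rows, valid_states):
    clean, rejects = quality(transform(extract(rows)), valid_states)
    return load([], clean), rejects

rows = [
    {"id": " 001 ", "name": "jane   doe", "state": "nc"},
    {"id": "002", "name": "", "state": "CA"},           # missing name
    {"id": "003", "name": "John Smith", "state": "ZZ"},  # invalid state code
]
warehouse, rejects = etlq(rows, {"NC", "CA"})
```

The design choice worth noting is that the quality step sits between transform and load, so bad records never reach the warehouse; they are routed to a reject list where a data steward can correct them.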

DSstar: Who is the target audience for ETLQ?

NGUYEN: The most interested audience should be the information technology (IT) management team, with a primary focus on directors and managers: individuals who are accountable for delivering technologies that help drive their organizations' success by providing accurate, appropriate information for effective, strategic decision making. IT managers plan, administer, and review the acquisition, development, maintenance, and use of hardware and software systems within organizations.

The business audience is a target as well: these users rely on integrated and cleansed data to analyze and generate reports for the decision makers within their organizations.

DSstar: Why is IT a target?

NGUYEN: Because their responsibilities include continually analyzing changing information technology needs; advocating for and supporting the development of appropriate resources to meet those needs; providing leadership in identifying, justifying, and articulating plans for information technology; establishing priorities for systems development, maintenance, and operations; and controlling the security aspects of IT systems. They are also responsible for improving technology processes, ensuring timely and cost-effective delivery of IT services and project implementations, and developing long- and short-range strategic plans.

DSstar: What is IT's data quality pain?

NGUYEN: IT department heads tell us that decision makers in their organizations have trouble getting the high-quality information needed to determine how best to allocate resources, manage costs, add and retain the right customers, attain profit targets, and more. It is IT's responsibility to find smarter ways to use technology to satisfy business needs and to show how IT contributes to the organization. IT says it faces the challenge of so much data constantly coming in, with everyone wanting immediate answers to their questions. It is hard to produce accurate information and meet everyone's needs.

Another challenge we hear is that data comes from so many sources and it is not standardized. "We have duplicate data on individual customers that doesn't match up. Different reports on the same question yield different answers. We do not have a consistent view of information. We don't know the best way to achieve high-quality data in a low-risk manner. We can't even tell management how much time and effort it will take to clean up the data."

Another IT department states, "We have records and fields from various data sources, platforms and systems, but we don't have the ability to extract and transform the data into information that users can trust. We know that if our systems don't produce accurate information, our entire organization suffers."

DSstar: Why is the quality of data that companies collect so poor?

NGUYEN: There are a variety of reasons -- everything from the ambiguous nature of data itself to the reliance on perfect data entry. But the simple fact is that a company relies on a great many different data sources for capturing information.

TDWI currently estimates that data quality problems cost U.S. businesses more than $600 billion a year. Yet most executives are oblivious to the data quality problems that slowly bleed a company to death. These costs include unnecessary printing, postage, and staffing; the slow but steady erosion of an organization's credibility among customers and suppliers; and its inability to make sound decisions based on accurate information.

The problem with data is that its quality quickly degenerates over time. Experts say 2 percent of records in a customer file become obsolete in one month. In addition, data entry errors, systems migrations, and changes to source systems, among other things, generate bucket loads of errors. As organizations fragment into different divisions and units, interpretations of data elements mutate to meet the local business needs. A data element that one group finds valuable may be nonsense to a different group.
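As a rough illustration of how quickly that decay adds up, assume the 2 percent monthly rate stays constant and applies each month to the records that are still current. This is an assumption; the interview gives only the monthly figure. The fraction of a customer file still current after n months is then:

```latex
f(n) = (1 - 0.02)^{n}, \qquad f(12) = 0.98^{12} \approx 0.785
```

Under that assumption roughly 21.5 percent of records would be obsolete after one year. That is slightly less than the 24 percent a simple 12 × 2 percent estimate gives, because records that have already gone stale cannot go stale again.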

DSstar: What does an ETLQ solution do?

NGUYEN: ETLQ enables Data Quality solutions to integrate easily with, and leverage, existing computing and operational environments across all platforms and storage facilities while improving data quality throughout the entire IT process. It helps cleanse data before loading it into the data warehouse so that further downstream analyses and decisions are based on reliable information. The chosen environment must be low-risk, easy to build and manage, and flexible enough to change with evolving business needs. This solution provides a foundation for accurate information and insight that can be used to optimize an organization's performance.

The solution automatically analyzes data from a quality perspective, providing the ability to plan and cost-justify data cleansing expenditures. It provides an easy-to-use way to standardize, match and verify data so that an organization has a single version of the truth.
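The following is a minimal sketch of what analyzing data "from a quality perspective" and standardizing it can look like, assuming Python and in-memory lists of strings. Pattern profiling, which replaces digits with 9 and letters with A, is a common profiling technique and serves here only as an illustration. The `STATE_ALIASES` table and all function names are hypothetical.

```python
import re
from collections import Counter

def value_pattern(value):
    """Reduce a value to its shape: every digit becomes 9, every letter becomes A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_column(values):
    """Summarize one column: missing count, distinct values, and pattern frequencies."""
    present = [v for v in values if v is not None and v.strip() != ""]
    return {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(value_pattern(v) for v in present),
    }

# Hypothetical standardization table mapping variant spellings to one code.
STATE_ALIASES = {"north carolina": "NC", "n. carolina": "NC", "nc": "NC"}

def standardize_state(value):
    """Map known variants to a canonical code; otherwise trim and upper-case."""
    key = value.strip().lower()
    return STATE_ALIASES.get(key, value.strip().upper())

phones = ["919-555-0100", "9195550100", "", None, "(919) 555-0100"]
report = profile_column(phones)
```

Here the profile reports two missing values and three different formats for what is likely the same kind of data. That is the sort of evidence that lets a team size, and cost-justify, a cleansing effort before standardizing the formats.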

DSstar: How can an organization achieve a high degree of data quality?

NGUYEN: One approach is to do nothing and let the end users find the errors. And they will do exactly that. The problem with letting the end users find the errors is that their confidence in the data warehouse erodes. A second approach is to get a cadre of well-meaning and attentive clerical staff and have them pore over the data. However, this is a terribly time-consuming and expensive approach, and it is itself prone to errors. The most effective approach to improving the quality of the data is to use as much automation as possible.
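The automated approach can be sketched as a set of declarative rules run against every record. The output is a per-rule list of failures that a data steward can review, so clerical staff no longer have to scan every row. This is a minimal Python sketch. The three rules and the field names are hypothetical examples; real rules would come from the organization's own business requirements.

```python
import re
from typing import Callable

Record = dict[str, str]
Rule = tuple[str, Callable[[Record], bool]]

# Hypothetical rules for illustration; each returns True when the record passes.
RULES: list[Rule] = [
    ("customer_id present", lambda r: bool(r.get("customer_id", "").strip())),
    ("zip is 5 digits", lambda r: re.fullmatch(r"\d{5}", r.get("zip", "")) is not None),
    ("email has @", lambda r: "@" in r.get("email", "")),
]

def audit(records: list[Record], rules: list[Rule]) -> dict[str, list[int]]:
    """Return, for each rule, the indexes of the records that fail it."""
    failures: dict[str, list[int]] = {name: [] for name, _ in rules}
    for i, record in enumerate(records):
        for name, check in rules:
            if not check(record):
                failures[name].append(i)
    return failures

records = [
    {"customer_id": "001", "zip": "27513", "email": "a@example.com"},
    {"customer_id": "", "zip": "2751", "email": "b@example.com"},
    {"customer_id": "003", "zip": "27513", "email": "no-at-sign"},
]
report = audit(records, RULES)
```

Keeping the rules as data rather than hard-coded checks means business analysts can add or change a rule without touching the audit loop.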

DSstar: Can I improve Data Quality with my current IT investments?

NGUYEN: Because SAS integrates data quality into the ETL process, IT does not have to spend extra money and resources on quality tools. SAS works with all data sources and platforms. SAS leverages existing hardware, software, data and human resources to easily consolidate legacy and non-legacy data sources in a highly flexible, readily maintainable environment.

DSstar: Can I deliver decision makers information they can trust?

NGUYEN: Data quality enables IT to retrieve and deliver consistent, accurate, and reliable information representing a single version of the truth to the business community -- information that they can trust. A Data Quality solution featuring ETLQ manages data quality on an enterprise-wide scale, solving issues for both business analysts and IT professionals. It provides business analysts with easy-to-use tools that simplify the data auditing and analysis processes. Data warehousing professionals get data quality administration tools that function within their ETL environments. This approach emphasizes not only building quality data as it is loaded into a warehouse, but also the ongoing management of the warehouse, providing increased automation of data transformations, integration of external information, and simplified management of complex job dependencies.

DSstar: Where should data quality start?

NGUYEN: There are many points of data integration at which data quality can be improved. The most logical one is the source of the data. Data sources have various formats, reside on multiple platforms, and are often widely distributed. Some data sources are more complete, while others have missing or incorrect values. By performing corrective maintenance and preventing data quality issues at the source, the data warehousing effort becomes more effective.

The point of passage of data from the operational environment into the data warehouse is a very good place to address data quality. In order to address the completeness of data coming from multiple sources, it is necessary to first address the issue of data quality in the source applications, and then address the issue of compatibility of data as the data is merged. Data quality tools provide robust matching logic to facilitate merging of disparate data across data sources.
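Matching logic of this kind can be sketched with Python's standard-library `difflib.SequenceMatcher`, used here as a stand-in for whatever algorithms a commercial data quality tool applies. The sketch normalizes names and compares only records that share a ZIP code. This technique, called blocking, keeps the number of comparisons small. A pair is accepted when its similarity ratio clears a threshold. The field names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 of two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_records(source_a, source_b, threshold=0.85):
    """Pair each record in source_a with its best-scoring record in source_b
    that shares the same ZIP code and clears the (assumed) threshold."""
    matches = []
    for a in source_a:
        candidates = [b for b in source_b if b["zip"] == a["zip"]]  # blocking on ZIP
        best = max(candidates, key=lambda b: similarity(a["name"], b["name"]), default=None)
        if best is not None and similarity(a["name"], best["name"]) >= threshold:
            matches.append((a["id"], best["id"]))
    return matches

crm = [{"id": "C1", "name": "Jon A. Smith", "zip": "27513"},
       {"id": "C2", "name": "Mary Jones", "zip": "10001"}]
billing = [{"id": "B7", "name": "JON A SMITH", "zip": "27513"},
           {"id": "B8", "name": "Mary Jones", "zip": "94105"}]
pairs = match_records(crm, billing)
```

In this example only C1 and B7 are merged. The two "Mary Jones" records have different ZIP codes, so blocking never compares them. Whether that is correct depends on the business rules, which is why the blocking key and threshold are choices a data steward would tune.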

DSstar: What is the message behind "ETLQ"?

NGUYEN: ETLQ is the benefit received by treating Data Quality as a proactive approach rather than a reactive action. The mistake most organizations make is to treat Data Quality as a cause-and-effect event. There is no final solution to Data Quality; it should always be considered an ongoing process. The more effort you put into it, the greater your returns will be... The power of "Q"!

Contact SAS Institute, Wally Maczka, 919-531-5350, wally.maczka@sas.com, www.sas.com.
