OBSERVATIONS ON THE CURRENT STATE OF DW/DM TECHNOLOGY
by John K. Thompson


In the past week, I have been out talking to many different people and reading quite a few journals, periodicals and even a small part of a novel. (I add the part about the novel because my wife is constantly teasing me about my total immersion in technical publications. Hey! I like to be informed about what's happening in our industry, alright?)

First, the most interesting conversation: I was having a drink with an acquaintance. He asked me if I would spend some time with him to discuss the current state of technology and opportunities within the high tech marketplace. An invitation to talk about two of my favorite topics, coupled with the fact that he was buying: how could I refuse?

This fellow works for a national not-for-profit and has an interest in moving into the high technology field. I won't bore you with all of the details, but the comment that made my ears prick up was, "I want to work with a company or companies that are accessing a data warehouse using the Internet as the access mechanism." A firestorm of neurons fired in my brain, but I held it in check and started to probe for more information. My initial thought was, "How in the heck does a guy from left field decide that data warehouses coupled with the Internet are the key to entering the high tech marketplace?" I asked him if he knew what a data warehouse was, and for the most part he did. I also asked him to give me some representative examples of the value of connecting these two elements, and he gave rational examples that could be done and made economic sense. So, there you have it. The Net and data warehouses have become so mainstream that even people outside of Information Technology (IT) and high tech vendors are using the vernacular and plotting how to make money from the fusion of the two.

When I discussed the state of data mining, his response was "Sounds like great stuff, but you mean you can't access the data in the data warehouse? You've gotta move it into another place to analyze it?" You may be thinking that this fellow has a naive perspective, and he does, but my experience shows that those with a naive perspective are an excellent barometer of what the mainstream will buy in volume. As I said before I came into the data mining market, and will continue to say while in it, data mining tools and applications must be able to access data in mainstream platforms before prospects in traditional firms will consider using the technology.

The Data Warehousing Institute (TDWI) has always been a great place to obtain information on what practitioners are really doing with technology. We all know it is difficult to separate marketing hype from product features and other forms of reality, and the people at TDWI have done a good job of keeping the vendors fairly honest and reporting on what has been implemented and how it actually works.

In the spring issue of the Journal of Data Warehousing (yes, I am a bit behind in some of my reading) they report on 22 new developments in data warehousing. This is not a scientific study and the results are not statistically valid, but they are indicative of what consumers are seeing and reacting to in the marketplace.

Three of the 22 "findings" pertain to data mining. That is over 13%. Whether that is good depends on where you think data mining is in its life cycle, but I think it's quite good for now. Remember, too, that the people polled are those who are actually implementing systems, not those simply allocating budget items.

The three new developments in the data warehousing community pertaining to data mining are:

  1. Data mining functions are being embedded into end user tools. Examples include the announcement from IBM/Lotus that they are embedding Intelligent Miner functions into Notes, and word (I can't recall if it was an announcement or just a leak) that Microsoft was including mining functions in specific MS Office tools. This gives credence to the argument that others and I have put forth: much of the usefulness of data mining will be realized when data mining disappears into other technologies. The end user market for desktop oriented tools is only one market or application area in which data mining is making an impact.

  2. The use of subsets of data from the data warehouse for mining purposes. This brings us around to the sampling vs. not sampling discussion and where the real value lies in data mining, but for the most part practicality dictates that data must be extracted from the data warehouse, reformatted for ingestion by the data mining tool, and analyzed. This is simply reality for now. We expect the data mining vendors to correct this shortcoming over time.

  3. The emergence of applications that contain data mining functionality or that are enhanced with data mining capabilities. I have written and spoken about this extensively, and if you have been reading my past columns, you know that I vehemently believe that this is the only way for data mining to become a widely used technology.
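
For readers who want to picture the extract, reformat and analyze routine described in the second item, here is a minimal sketch in Python. It is an illustration only, not any vendor's method: an in-memory SQLite database stands in for the data warehouse, and the `sales` table, its columns, and the function names are all hypothetical.

```python
import csv
import io
import random
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory stand-in for a warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "east" if i % 2 else "west", 10.0 * i) for i in range(1, 101)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def extract_sample(conn, fraction, seed=0):
    """Pull a random subset of rows out of the warehouse table."""
    rows = conn.execute(
        "SELECT customer_id, region, amount FROM sales ORDER BY customer_id"
    ).fetchall()
    k = max(1, round(len(rows) * fraction))
    # A seeded generator keeps the sample repeatable from run to run.
    return random.Random(seed).sample(rows, k)


def to_flat_file(rows):
    """Reformat the extract as CSV text, the kind of flat file a mining tool ingests."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["customer_id", "region", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    subset = extract_sample(warehouse, fraction=0.1)
    print(to_flat_file(subset))
```

The point of the sketch is the extra hop: the data leaves the warehouse and is reshaped before any analysis happens, which is exactly the step my acquaintance found so surprising.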

In summary, the masses now know about data warehouses and are thinking of how to connect them to the Internet. We are in trouble now. No, all joking aside, this is a turning point for technology and technology vendors. Data mart vendors have been talking about developing software that allows data marts to be built in weeks instead of months. I think that we need to have disposable data warehouses and data marts. It should be possible to build a data mart based on an idea (one that may have been proposed or discovered through data mining), analyze the data, and dispose of the data mart. Disposable data marts should be able to be built in a day or two instead of weeks.
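
To make the build, analyze and dispose cycle concrete, here is a small Python sketch. It assumes an in-memory SQLite database as the throwaway mart; the `mart` table, the sample rows, and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def disposable_mart(source_rows):
    """Build a throwaway mart, hand it to the caller, then discard it."""
    mart = sqlite3.connect(":memory:")
    try:
        mart.execute("CREATE TABLE mart (region TEXT, amount REAL)")
        mart.executemany("INSERT INTO mart VALUES (?, ?)", source_rows)
        yield mart
    finally:
        # Closing an in-memory database discards its contents.
        mart.close()


def average_by_region(mart):
    """The 'idea' being tested: does the average amount differ by region?"""
    query = "SELECT region, AVG(amount) FROM mart GROUP BY region ORDER BY region"
    return dict(mart.execute(query).fetchall())


if __name__ == "__main__":
    rows = [("east", 100.0), ("east", 300.0), ("west", 50.0)]
    with disposable_mart(rows) as mart:
        print(average_by_region(mart))
```

The mart exists only for the length of the `with` block. A real disposable data mart would of course involve far more than this, but the life cycle is the same: load, ask the question, throw it away.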

Also, data warehousing professionals know that data mining is coming via the embedding of data mining functionality into desktop tools and end user applications, as well as vertical applications that are enhanced with data mining functionality. The data warehouse professionals are also performing extracts from the data warehouse to allow others in the firm to experiment with data mining tools and applications. Both of these are changes from where we were a year ago. Last year there was a high degree of skepticism about whether data mining would make it at all, and certainly very few data warehousing professionals were taking the time to extract data for the express purpose of having it examined by data mining tools or applications.

All of this is good news in the evolution of data mining. The more people know, the more they will be able to do.
---

John Thompson, Vice President - Marketing, Magnify, Inc.
I'd like to hear your thoughts. You can reach me at jkt@magnify.com