WILL I AS A MARKETER EVER BUILD A MODEL?
by John K. Thompson
In talking, thinking, and writing about data mining, I started to wonder, "Will I ever build a model of my own?"
I don't mean sit down and use a data mining system to build a model for the fun of it. I've done that before. While I was at IBM, the development lab in Rochester released version 1 of what was then called the Neural Network Utility (NNU). About 10 of us were selected to spend a week learning and playing with the software and giving our feedback on how useful we thought the tool was. In the end, it was a fun week. We all completed our canned exercises, and a few of us actually built models that read data, adjusted weights, and produced results. But beyond being a fun exercise for people who were fairly technical, nothing of business value came of our efforts, which admittedly weren't much, nor did anything of general market value come of NNU.
When I say build a model, I mean that I would understand the input data; be able not only to gather the complete data set but also to manipulate it into the format required by the data mining software; split the data into training and validation sets; build the model; test and validate it; and then deploy it in a system or process that I use to make decisions affecting how my area of the business operates. Will I ever be able to do that? I'd like to think so. But if we examine the steps in the process, how many of them depend on other people being good corporate citizens?
Let's see. First and foremost, how will I know and understand the corporate data resources if I don't have an easy-to-use tool that lets me browse the metadata and detail-level data elements to get a full picture of the internal and external data we as a corporation own? At the very first step we have an issue. This is a big problem in most corporations, not only in America but in most of the world. We have an Information Technology (IT) infrastructure that is set up to restrict the use of data and information rather than to promote free use, synthesis, and dissemination.
A problem, but not the end of the quest. Many corporations, through their data warehousing initiatives, are building metadata repositories that can be browsed by IT professionals, Line-of-Business (LOB) IT liaisons, and power users. Some of these efforts are quite mature and allow these constituents to access and manipulate data for further use.
Let's assume that I am in one of these forward-thinking and forward-acting organizations and I have access to a fully functioning, up-to-date catalog of internal and external data resources. I can now formulate a list of the data elements I wish to work with, but in all likelihood I won't be able to build an extract for each of the data marts, data warehouses, and transactional systems from which I want data. Thwarted again. This time the first roadblock is the IT infrastructure and culture, but more importantly, I as a nontechnical user probably do not have access to the tools that would allow me to build the extracts. OK, so I take my data element list to my LOB IT liaison or my IT department and ask for my extract. A small detour on the journey, and one I can live with if I have a fairly responsive group to work with.
Now I have my data set. Luckily for me I have a PC on steroids. I do not have to depend on IT to allocate disk space on a server to store my raw data set, or my intermediate data sets, or my final results. I can do it all on my PC. My data set is not in the terabyte range, but more like a few gigabytes.
Time to start manipulating the raw data, and splitting the transformed data into subsets for various uses. From this point on I should be living completely within my data mining tool. All of the data manipulation, data subsetting, model building, testing, and validation should be supported within an integrated environment. Within that environment, everything that is associated with model production should be available from one coherent interface.
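The splitting step mentioned above can be sketched in a few lines. This is only a minimal Python illustration, not a feature of any particular data mining tool: the function name `split_train_validation`, the 30 percent validation fraction, the fixed seed, and the made-up customer records are all assumptions chosen for the example.

```python
import random


def split_train_validation(records, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the records and return (training, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)               # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


# Hypothetical extract: 100 customer records with a response flag.
customers = [{"customer_id": i, "responded": i % 4 == 0} for i in range(100)]
training, validation = split_train_validation(customers)
print(len(training), len(validation))
```

Holding the seed constant means a monthly rerun on the same extract would produce the same two subsets, which makes results easier to compare from one run to the next.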
After performing the data management processes and the model building and validation steps, I want to have this entire process set up as a monthly activity to monitor business indicators. What I really need is some type of workflow system that will encapsulate all of the steps that are performed by corporate IT, LOB IT, and me. There are quite a few workflow management systems out there that will accomplish this goal, but very few are integrated with data mining systems.
After writing this, I realize we're a long way from enabling end users, such as marketing professionals, to build their own models. But we must ask ourselves: Is that the right goal? For now, probably not. IT infrastructures are either not mature enough or not oriented toward free-for-all access; the access and manipulation technologies are targeted at IT rather than end users; and, finally, data mining tools and applications are simply not end-user friendly yet.
So, can I build my own model now? No. Will I be able to in the future? Probably, but I'll need quite a lot of moxie, intelligence, and some help from others. Will many end users want to build their own models? How many is many? Not too many, I would say. The power users in Marketing Research and other analytically proficient people will, but that number will be limited.
---
As always, you can reach me at jkt@magnify.com.
I'll be on vacation next week. I'll be back the following week after my short hiatus.