
SOME THOUGHTS ON THE CURRENT STATE OF DATA MINING SOFTWARE APPLICATIONS: PART II
by Kurt Thearling


5) Effort Knob. Users do not necessarily understand the relationship between complex algorithm parameters and the performance that they will see. As a result, a user might naively change a tuning parameter in order to improve modeling accuracy and unknowingly increase processing time by an order of magnitude. This is not a relationship that the user can (or should) be expected to understand. A better solution is to provide an "effort knob" that allows a user to control global behavior. Set to a low value, the system should produce a model quickly, doing the best it can in the limited time available. Set to the maximum value, the system might run overnight to produce the best model possible. Because time and effort are concepts that a business user can understand, an effort knob is relevant in a way that tuning parameters are not.
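
As a rough illustration of the idea, the sketch below maps a single effort value between 0 and 1 onto several internal settings at once. The setting names (max_tree_depth, n_candidate_models, time_budget_seconds) and their ranges are assumptions chosen for the example, not parameters of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical internal tuning parameters hidden from the business user."""
    max_tree_depth: int
    n_candidate_models: int
    time_budget_seconds: float


def settings_for_effort(effort: float) -> SearchSettings:
    """Translate one user-facing knob (0.0 = fastest, 1.0 = most thorough)
    into the internal parameters. All ranges here are illustrative."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be between 0.0 and 1.0")
    return SearchSettings(
        max_tree_depth=3 + round(effort * 12),        # 3 up to 15 levels
        n_candidate_models=1 + round(effort * 99),    # 1 up to 100 models
        # Exponential scale: 1 minute at effort 0, 8 hours at effort 1.
        time_budget_seconds=60.0 * (480.0 ** effort),
    )


quick = settings_for_effort(0.0)
overnight = settings_for_effort(1.0)
```

The exponential time scale is a design choice for the sketch: each step of the knob multiplies the time budget, so the user trades time for quality in terms they can reason about while the parameter details stay hidden.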

6) Incorporate Financial Information. Data mining does not operate in a vacuum. The results of the data mining process will drive efforts in areas such as marketing, risk management, and credit scoring. Each of these areas is influenced by financial considerations that need to be incorporated into the data mining modeling process. A business user is concerned with maximizing profit, not minimizing root-mean-square (RMS) error. The information necessary to make these financial decisions (costs, expected revenue, etc.) is often available and should be provided as an input to the data mining application.
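
One way to picture this is a marketing campaign in which a model has already produced a response probability for each customer. The sketch below picks the score cutoff that maximizes expected profit rather than model accuracy. The function names, the revenue and cost figures, and the scores are all made up for illustration.

```python
def expected_profit(scores, cutoff, revenue_per_response, cost_per_contact):
    """Expected profit from contacting every customer scored at or above cutoff."""
    return sum(
        p * revenue_per_response - cost_per_contact
        for p in scores
        if p >= cutoff
    )


def best_cutoff(scores, revenue_per_response, cost_per_contact):
    """Try each distinct score as a cutoff, plus 'contact nobody' (infinity),
    and return the cutoff with the highest expected profit."""
    candidates = sorted(set(scores)) + [float("inf")]
    return max(
        candidates,
        key=lambda c: expected_profit(
            scores, c, revenue_per_response, cost_per_contact
        ),
    )


# Hypothetical predicted response probabilities for five customers.
scores = [0.01, 0.02, 0.05, 0.10, 0.30]
cutoff = best_cutoff(scores, revenue_per_response=40.0, cost_per_contact=1.0)
profit = expected_profit(scores, cutoff, 40.0, 1.0)
```

With these assumed figures, a contact pays for itself only when the response probability exceeds 1.0 / 40.0 = 0.025, so the two lowest-scoring customers are left out even though the model scored them.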

7) Computed Target Columns. In many cases the desired target variable does not exist in the database. For example, if the database includes information about customer purchases, a business user might be interested only in customers whose purchases exceeded one hundred dollars. It would be straightforward to add a new column to the database containing this information, but doing so would probably involve a database administrator and IT personnel, complicating a process that is probably complicated already. In addition, the database could become cluttered as more and more possible targets are added during an exploratory data analysis phase. The solution is to allow the user to interactively create a new target variable. Combined with an application wizard (#10), this would make it relatively simple for the user to create computed targets on the fly.
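
A minimal sketch of a computed target, assuming the purchase records are available to the application as simple rows: the target is derived when the rows are read, and the stored data is never altered. The column names (customer_id, purchase_total, big_spender) are hypothetical.

```python
def with_computed_target(rows, target_name, rule):
    """Yield a copy of each row with a derived target column added.
    The source rows are left untouched, so no database change is needed."""
    for row in rows:
        yield {**row, target_name: rule(row)}


# Hypothetical purchase records as they might come back from a query.
purchases = [
    {"customer_id": 1, "purchase_total": 42.50},
    {"customer_id": 2, "purchase_total": 310.00},
    {"customer_id": 3, "purchase_total": 100.00},
]

# The target from the text: purchases of more than one hundred dollars.
labeled = list(
    with_computed_target(
        purchases, "big_spender", lambda r: r["purchase_total"] > 100
    )
)
```

Because the rule is just a function, trying a different threshold during exploratory analysis means changing one expression rather than adding another column to the database.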

8) Time-Series Data. Much of the data that exists in data warehouses has a time-based component. A year's worth of monthly balance information is qualitatively different from twelve distinct non-time-series variables. Data mining applications need to understand that fact and use it to create better models. Knowing that a set of variables is a time series allows for calculations that make sense only for time-series data: trends, slopes, deltas, etc. Statisticians have performed these calculations manually for years, but most data mining applications cannot perform them because they treat time-series data as a set of unrelated variables.
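
The calculations mentioned above can be sketched in a few lines once the application knows the columns form an ordered series. The function name, the feature names, and the sample balances below are assumptions for illustration; the slope is an ordinary least-squares fit against the period index (0, 1, 2, ...).

```python
def time_series_features(values):
    """Derive trend features from equally spaced observations,
    for example twelve monthly balances for one customer."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")

    # Period-to-period changes (deltas).
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]

    # Least-squares slope of value against period index.
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))

    return {
        "slope": numerator / denominator,
        "net_change": values[-1] - values[0],
        "largest_rise": max(deltas),
        "largest_drop": min(deltas),
    }


# Hypothetical monthly balances for one account.
features = time_series_features([100.0, 110.0, 120.0, 130.0])
```

None of these features can be computed if the same four numbers are handed to the modeling algorithm as four unrelated columns, because the ordering is what gives them meaning.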

9) Use vs. View. Data mining models are often complex objects. A decision tree with four hundred nodes is impossible to fit on a high-resolution video display, let alone for a human viewer to understand. Unfortunately, most data mining applications do not differentiate between the model that is used to score a database and the model representation that is presented to users. This needs to change. The model that is presented visually to the user does not have to be the full model that is used to score data. A slider on the interface that visualizes a decision tree could be used to limit the display to the first few (most important) levels of the tree. Interacting with the display would have no effect on the complexity of the model, but it would simplify its representation. As a result, users would be able to adjust the display to show only as much information as they can comprehend.
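
The separation between the scoring model and its display can be sketched as a rendering function that takes a depth limit, standing in for the slider position. The Node class, the render and count_nodes helpers, and the tiny example tree are all hypothetical; the point is only that rendering reads the tree and never modifies it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a (hypothetical) binary decision tree."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def render(node, max_depth, depth=0):
    """Return indented text lines for the tree, hiding everything below
    max_depth. The tree itself is only read, never changed."""
    lines = ["  " * depth + node.label]
    children = [c for c in (node.left, node.right) if c is not None]
    if children and depth >= max_depth:
        lines.append("  " * (depth + 1) + "...")  # marks hidden subtrees
        return lines
    for child in children:
        lines.extend(render(child, max_depth, depth + 1))
    return lines


def count_nodes(node):
    """Size of the full model, independent of how it is displayed."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


tree = Node(
    "balance > 1000?",
    Node("age > 40?", Node("respond"), Node("ignore")),
    Node("ignore"),
)
summary_view = render(tree, max_depth=1)  # slider near the low end
full_view = render(tree, max_depth=10)    # slider at the high end
```

Scoring would continue to use the complete tree regardless of which view the user has selected.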

10) Wizards. Although not necessarily a must-have, application wizards can significantly improve the user's experience. Besides simplifying the process, they can help prevent human error by keeping the user on track.
