BRINGING DATA MINING HOME
by Ed Colet, Virtual Gold, Inc.
Having gone through the process of purchasing a home, I found myself reflecting on my use of technology in general, and data mining in particular, in support of this effort. To buy or not to buy: that was only part of the question, since the what and the how also had to be decided. I realized that technology (the Web in particular) played a significant role, and that data mining could also be extremely useful. The lessons learned here pertain to a home purchase, but they generalize beyond this domain.
With regard to home buying, more information is available through the Web today than ever before. Lots of information is a good thing, but better still is critical information that has been assimilated and organized so that it is useful. End users can visit several Web sites for relevant information. For example, I spent time at several sites: http://www.homeshark.com to learn about the process of buying a home, http://www.realtor.com to search for homes, and http://www.quickenmortgage.com for information about mortgages. In theory, the entire process, from the early financial assessments through hiring brokers and attorneys, choosing a particular home, and arranging financing, can be completed within a single Web site.
Whether one uses online information from a single site or from several, these interactions can be characterized as a "search and retrieve" model. For example, you can enter some information about your finances and find the prices of homes you can afford, or you can enter the properties you want in a home and retrieve a listing of homes that match your criteria. The results are based on parameter matching and/or some underlying calculations.
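To make the "search and retrieve" model concrete, here is a minimal Python sketch. It assumes a hypothetical `Listing` record and an assumed affordability rule (maximum price equals annual income times 3); neither comes from any of the sites named above, and the listings are invented sample data. The query returns exactly the homes that match the stated parameters and nothing more.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical listing record; the fields are illustrative only.
    address: str
    price: float
    bedrooms: int
    town: str


def max_affordable_price(annual_income: float, multiple: float = 3.0) -> float:
    # Assumed rule of thumb: price cap = annual income * multiple.
    return annual_income * multiple


def search_and_retrieve(listings, annual_income, min_bedrooms, town):
    # Pure parameter matching: a listing is returned only if it passes every filter.
    cap = max_affordable_price(annual_income)
    return [
        listing for listing in listings
        if listing.price <= cap
        and listing.bedrooms >= min_bedrooms
        and listing.town == town
    ]


# Invented sample data for illustration.
listings = [
    Listing("12 Elm St", 240_000, 3, "Springfield"),
    Listing("8 Oak Ave", 410_000, 4, "Springfield"),
    Listing("3 Pine Rd", 225_000, 2, "Springfield"),
    Listing("5 Maple Ct", 260_000, 3, "Shelbyville"),
]
matches = search_and_retrieve(listings, annual_income=90_000,
                              min_bedrooms=3, town="Springfield")
print([m.address for m in matches])
```

Nothing the user did not think to specify can surface in the results; the answer is only as good as the question.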
This "search and retrieve" model is essentially the way data are often used in organizations today: SQL queries are issued and answers are retrieved. This assumes that the end user knows what questions to ask, and often that is sufficient. In many cases, however, data mining's "search and discover" model is more beneficial. With data mining, one can initiate a query and get back information one did not know to ask about, but that is significant and perhaps critical. That matters here, because a poor decision on a home purchase can be a financial and emotional disaster.
To a large extent, data mining tools that help an end user buy a home are not readily available because of a difficult data integration problem. Data integration is also a limiting factor in the development and use of data mining in large organizations. When buying a home, I had to relate many diverse kinds of information to one another. I would argue that since much of this information is already online, and therefore available, the real problem is that it is not integrated, and there are no tools that integrate it well.
Many Web sites do let you access information about the local school district, the crime rate, prices of recent and comparable sales in the neighborhood, demographics, and even neighborhood traffic loads by time of day. These results, however, all come back as individual reports for the user to assimilate. Other publicly available data that are less accessible (i.e., more than a mouse click away) pertain to local economic indicators, zoning regulations, and pending development projects, all of which can affect property values and rates of appreciation. Data mining could mine this combined information and build models that find important associations and dependencies among these factors with respect to financial risk and potential appreciation. Experienced real estate agents know much of this information, yet still do not really know what makes certain properties better values than others. Buyers learn some of it only after going through the process.
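As a rough illustration of integration followed by discovery, the Python sketch below joins several per-neighborhood reports on a shared key and then ranks each factor by its Pearson correlation with past appreciation. All names and numbers are hypothetical, and simple correlation is swapped in here for a real mining model (association rules, decision trees, or regression would be more typical); with five neighborhoods, the output demonstrates the mechanics, not anything about real property values.

```python
from math import sqrt

# Hypothetical per-neighborhood reports, each standing in for a separate
# Web source. "Harbor" appears only in the school report, so the join drops it.
school_rating = {"North": 8.0, "South": 5.0, "East": 7.0, "West": 4.0,
                 "Central": 6.0, "Harbor": 9.0}
crime_rate = {"North": 2.0, "South": 6.0, "East": 3.0, "West": 7.0, "Central": 4.0}
pending_development = {"North": 1, "South": 0, "East": 1, "West": 0, "Central": 0}
appreciation_pct = {"North": 6.5, "South": 2.0, "East": 5.0, "West": 1.5, "Central": 3.5}


def integrate(sources):
    """Inner-join several {neighborhood: value} reports into one table."""
    shared = set.intersection(*(set(report) for report in sources.values()))
    return {key: {name: report[key] for name, report in sources.items()}
            for key in sorted(shared)}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_factors(table, target):
    """Rank every non-target column by |correlation| with the target column."""
    rows = list(table.values())
    factors = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in factors}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


table = integrate({
    "school_rating": school_rating,
    "crime_rate": crime_rate,
    "pending_development": pending_development,
    "appreciation_pct": appreciation_pct,
})
ranked = rank_factors(table, "appreciation_pct")
for factor, r in ranked:
    print(f"{factor:20s} r = {r:+.2f}")
```

The join step is where most of the difficulty lies in practice: real sources rarely share a clean key such as a neighborhood name, which is exactly the integration problem described above.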
The overall objective of data mining is to discover information that enables better decisions to be made. As such, data mining tools could have made my research and analyses over the past few months more efficient. In my case, I would hope that data mining would have led me to the same decision on the same home.
---
For more information, see http://www.virtualgold.com.