
DATA MINING ON THE WEB
by Angus Maclaurin and David Rickard

As the Web analytics market grows from $425 million in 2000 sales to a projected $4 billion in 2004, one Boston-area company in the education technology space, HighWired.com, illustrates the potential bottom-line return on data mining investments. The company began a concentrated business-intelligence campaign in mid-2000, and the lessons it has learned apply well to other companies that may be considering similar initiatives. To begin, it helps to understand what information sources data miners might target.

Web server logs hold a tremendous amount of information on users: what people look at, when they visit, where they come from, and how long they stay. As managers attempt to incorporate more data into their decision-making, focused attention to Web site data (or, for that matter, to other sources such as sales and inventory records or call center records) can yield major efficiencies across functions, whether in sales, marketing, finance, or personnel recruiting.
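To make those four kinds of information concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes the server writes the common NCSA "combined" log format; the article does not say which format HighWired.com used, and the sample line, host, and paths are invented for illustration.

```python
import re
from datetime import datetime

# Assumption: NCSA combined log format. The article does not name a format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # "When they visit": convert the timestamp text into a datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical sample line, not taken from any real log.
sample = ('192.0.2.10 - - [12/Oct/2000:14:05:31 -0400] '
          '"GET /school/sports.html HTTP/1.0" 200 5120 '
          '"http://partner.example/links.html" "Mozilla/4.0"')

record = parse_line(sample)
print(record["path"])      # what people look at
print(record["time"])      # when they visit
print(record["referrer"])  # where they come from
```

How long a visitor stays is not stored on any single line; it has to be estimated by grouping lines into sessions and comparing the timestamps of the first and last request.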

HighWired.com provides Web community and hosting services to high schools, allowing them to create and maintain school-specific Web sites in a turnkey fashion. With over 13,000 schools registered on the site, coming from all 50 states and 76 different countries, the site generates over 10 million page views a month. Each of the school communities has numerous individual users, who log on to enjoy content options like sports team performance tracking, online publications, and discussion groups. Starting last August, HighWired.com has run an ambitious data mining program with help from Cambridge-based technology firm Hanrick Associates. The data mining takes aggregated records of user behavior and aims at three goals:

  1. Developing user profiles to understand, on a session-by-session basis, what individuals on the Web site care about and what drives profitable interactions for HighWired.com. This is not that different from what businesses have done for years on an informal basis, but the sheer volume and comprehensiveness of Web data gives HighWired.com's management team a uniquely data-driven perspective. Server log analysis provides real-time updates on what customers actually do and want, rather than isolated sets of data on portions of the business.
  2. Integrating Web data with registration data and school information to develop better customer segmentation, which in turn aids business development efforts with potential sponsors and advertisers on HighWired.com.
  3. Determining the success of various marketing partnerships, as well as one major corporate acquisition, and building a program for continual performance improvement, and hence cost savings.

As with all major corporate initiatives, HighWired.com's foray into data mining was not simple or trouble-free. The company started its efforts using a popular server log analysis software product, but management team members became dissatisfied with the software for two main reasons: first, they were unable to receive all necessary information from the reports (for example, reports did not integrate site traffic information with HighWired.com's internal database), and second, they were devoting too many internal resources, both in terms of personnel and hardware, to maintaining the software.

Software packages typically provide a broad set of data on number of users and page views. Low-end packages such as WebTrends, Hit Box Pro, Net Tracker, and SuperStats Professional price in the range of $20/month to $1000 and provide this basic level of information. Some 91% of companies on the Web today use some form of such software, but continued dissatisfaction with these packages is precisely what underlies ongoing innovations in the data mining market. Like many companies seeking genuinely actionable business intelligence, HighWired.com needed to go further.

Frustrated by trials with pre-packaged software, its team settled on a customized solution: automated queries of the server logs, written in Structured Query Language (SQL). As the data mining partner to HighWired, Hanrick separated student and faculty user patterns by linking the log data to HighWired.com's internal database. "Hanrick's approach made a big difference," said Anne Yount, Director of Data Services at HighWired.com. "We've worked with server-log software packages before, but they couldn't answer all of our strategic questions. We needed a partner who would help us mine the underused customer data in our Web server logs, and Hanrick came in and started providing valuable analysis in a surprisingly short time."
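A rough idea of what such a query might look like is sketched below, using Python's built-in sqlite3 module so the SQL can be run as-is. The table names, column names, and rows are hypothetical; the article does not describe HighWired.com's actual schema or the queries Hanrick wrote.

```python
import sqlite3

# Hypothetical schema: "users" stands in for the internal registration
# database, "page_views" for log lines already loaded into a table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, role TEXT);
    CREATE TABLE page_views (user_id INTEGER, path TEXT);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "student"), (2, "student"), (3, "faculty")],
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [(1, "/sports"), (1, "/news"), (2, "/sports"),
     (3, "/admin"), (3, "/news"), (3, "/sports")],
)

# Link log data to the user table and split traffic by role.
rows = conn.execute("""
    SELECT u.role,
           COUNT(*)                  AS views,
           COUNT(DISTINCT v.user_id) AS active_users
    FROM page_views AS v
    JOIN users AS u ON u.user_id = v.user_id
    GROUP BY u.role
    ORDER BY u.role
""").fetchall()

for role, views, active_users in rows:
    print(role, views, active_users)
```

The join is the step that packaged log analyzers of the kind described above could not perform: the log alone says which pages were requested, while the registration table says who requested them.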

A key to initial successes in the data mining campaign was full buy-in from across the company's different functions. For example, the business development team at HighWired benefited immediately when the Data Services team was able to provide monthly statistics on users from all of HighWired.com's partners, illustrating which ones spent the most time on the site, and which users registered. This profiling information, in turn, separated good partnerships from bad ones and strengthened the negotiating leverage HighWired.com maintained relative to its allies.
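The kind of partner report described here could be assembled along the following lines. This is only a sketch under stated assumptions: the hit records, the partner names, and the "/register/done" path used to detect a registration are all invented, and the article does not say how HighWired.com defined a session or time on site.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hits: (session id, referring partner, timestamp, path).
hits = [
    ("s1", "partner-a", "2000-10-01 09:00:00", "/home"),
    ("s1", "partner-a", "2000-10-01 09:12:00", "/register/done"),
    ("s2", "partner-a", "2000-10-02 10:00:00", "/home"),
    ("s2", "partner-a", "2000-10-02 10:04:00", "/sports"),
    ("s3", "partner-b", "2000-10-03 11:00:00", "/home"),
    ("s3", "partner-b", "2000-10-03 11:01:00", "/news"),
]

# Collapse hits into sessions, tracking first and last request times.
sessions = {}
for session_id, partner, stamp, path in hits:
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    session = sessions.setdefault(
        session_id,
        {"partner": partner, "first": when, "last": when, "registered": False},
    )
    session["first"] = min(session["first"], when)
    session["last"] = max(session["last"], when)
    # Assumption: reaching this path means the visitor registered.
    session["registered"] = session["registered"] or path == "/register/done"

# Roll sessions up into per-partner totals.
totals = defaultdict(lambda: {"sessions": 0, "minutes": 0.0, "registrations": 0})
for session in sessions.values():
    entry = totals[session["partner"]]
    entry["sessions"] += 1
    entry["minutes"] += (session["last"] - session["first"]).total_seconds() / 60
    entry["registrations"] += int(session["registered"])

report = {
    partner: {
        "avg_minutes": entry["minutes"] / entry["sessions"],
        "registrations": entry["registrations"],
    }
    for partner, entry in totals.items()
}
print(report)
```

Measuring time on site as the gap between first and last request understates the true figure, since the log does not show how long a visitor spent on the final page.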

Meanwhile, HighWired's Marketing team used information from the server logs and its registration database to inform strategic decision-making. In one brief project, HighWired.com measured return on investment (ROI) for a marketing initiative run by the site. High school teachers administering sections of their high school's HighWired.com site were encouraged to increase participation on the site, receiving prizes based on their level of success. Analysis of log files determined which schools were most successful and the overall result of the campaign.

Finally, data mining helped the technical department troubleshoot various issues related to the Web site's operations. In one instance, HighWired.com had just inked a three-year deal with a new partner. Internal log analysis turned up a large discrepancy between the number of page views reported by the partner and the number seen in HighWired.com's logs (the latter was the number that resulted in paid advertising for HighWired.com). Further investigation showed that the partner was maintaining a "cached" copy of each of the pages on its own server; while this sped up download times, it prevented HighWired.com from receiving advertising revenue from those page views. Working with Hanrick, the HighWired team was able to create a new way of advertising on these pages without affecting the download times. The increase in ad revenue may have been as high as $10,000 per month.

The HighWired.com example, though brief, shows some of the potential pitfalls as well as benefits of a concentrated data-mining campaign. A few lessons:

  • Data mining exercises should typically begin from a specific need communicated by business managers, not gearheads. Rather than directing analysis in an open-ended, academic fashion, work should center on tangible, revenue- and profit-driving issues like marketing, partnerships, and content management. If you're not sure whether a question ties back closely to such issues, think about the metrics you use to measure progress and whether they relate to key needs.
  • Don't settle for generic software. By keeping queries customized and flexible, managers can get an overview of site performance while digging deeper into specific areas or changing the reporting system as business needs shift.
  • Scalable technology is important. HighWired's initial system did not anticipate a substantial increase in traffic as the site grew, and hence did not rest on robust, enterprise-class technologies.
  • Communicate aggressively to get more creative minds involved. Especially with relatively new, less understood technologies like data mining, initial successes should be communicated widely, in order to get people enthused and increase subsequent brainstorms and participation across the company.

For more info, see www.ehanrick.com
