DATA MINING ON THE WEB
by Angus Maclaurin and David Rickard
As the Web analytics market grows from $425 million in 2000 sales to a
projected $4 billion in 2004, one Boston-area company in the education
technology space, HighWired.com, illustrates the potential bottom-line return
on data mining investments. The company began a concentrated
business-intelligence campaign in mid-2000, and the lessons it has learned
apply well to other companies that may be considering similar initiatives. To
begin, it helps to understand what information sources data miners might
target.
Website server logs hold a tremendous amount of information on users: what
people look at, when they visit, where they come from, and how long they stay.
As managers across functions attempt to incorporate more data into their
decision-making, focused attention to Website data - or, for that matter,
other sources like sales and inventory records, call center records, and the
like - can yield major efficiencies across functions, whether for sales,
marketing, finance, or personnel recruiting.
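As a rough illustration of the raw material involved, the sketch below pulls the "who, what, when, and from where" fields out of server log lines and estimates how long a visitor stayed. The article does not say which log format HighWired.com's servers wrote, so the NCSA/Apache Combined Log Format is assumed here; the sample lines, the function names, and the 30-minute visit timeout are likewise illustrative assumptions.

```python
import re
from datetime import datetime

# Assumption: logs are in NCSA/Apache Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields

def visit_durations(records, timeout_seconds=1800):
    """Group hits by host into visits; return (host, seconds) for each visit.

    A gap longer than timeout_seconds between two hits from the same host
    is treated as the start of a new visit (an assumed, common heuristic).
    """
    by_host = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        by_host.setdefault(rec["host"], []).append(rec["time"])
    visits = []
    for host, times in by_host.items():
        start = previous = times[0]
        for current in times[1:]:
            if (current - previous).total_seconds() > timeout_seconds:
                visits.append((host, (previous - start).total_seconds()))
                start = current
            previous = current
        visits.append((host, (previous - start).total_seconds()))
    return visits

# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "http://example.com/start" "Mozilla/4.08"',
    '192.0.2.1 - - [10/Oct/2000:13:58:36 -0700] "GET /sports.html HTTP/1.0" '
    '200 1200 "http://example.com/index.html" "Mozilla/4.08"',
]
records = [parse_log_line(line) for line in SAMPLE_LINES]
```

Identifying visitors by host address alone is crude (many users can share one address), which is one reason linking log data to a registration database tends to give better answers.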
HighWired.com provides Web community and hosting services to high schools,
allowing them to create and maintain school-specific Web sites in a turnkey
fashion. With over 13,000 schools registered on the site, coming from all 50
states and 76 different countries, the site generates over 10 million page
views a month. Each of the school communities has numerous individual users,
who log on to enjoy content options like sports team performance tracking,
online publications, and discussion groups. Starting last August,
HighWired.com has run an ambitious data mining program with help from
Cambridge-based technology firm Hanrick Associates. The data mining takes
aggregated records of user behavior and aims at three goals:
- Developing user profiles to understand, on a session-by-session basis, what
individuals on the Web site care about and what drives profitable interactions
for HighWired.com. This is not that different from what businesses have done
for years on an informal basis, but the sheer volume and comprehensiveness of
Web data gives HighWired.com's management team a uniquely data-driven
perspective. Server log analysis provides real-time updates on what customers
actually do and want, rather than isolated sets of data on portions of the
business.
- Integrating Web data with registration data and school information to
develop better customer segmentation, which in turn aids business development
efforts with potential sponsors and advertisers on HighWired.com.
- Determining the success of various marketing partnerships, as well as one
major corporate acquisition, and building a program for continual performance
improvement, and hence cost savings.
As with all major corporate initiatives, HighWired.com's foray into data
mining was not simple or trouble-free. The company started its efforts using a
popular server log analysis software product, but management team members
became dissatisfied with the software for two main reasons: first, they were
unable to receive all necessary information from the reports (for example,
reports did not integrate site traffic information with HighWired.com's
internal database), and second, they were devoting too many internal
resources, both in terms of personnel and hardware, to maintaining the
software.
Software packages typically provide a broad set of data on number of users and
page views. Low-end packages such as WebTrends, Hit Box Pro, Net Tracker, and
SuperStats Professional price in the range of $20/month to $1000 and provide
this basic level of information. Some 91% of companies on the Web today use
some form of such software, but continued dissatisfaction with these packages
is precisely what underlies ongoing innovations in the data mining market. Like
many companies seeking genuinely actionable business intelligence,
HighWired.com needed to go further.
Frustrated by trials with pre-packaged software, its team settled on a
customized solution involving automated queries of the server logs, using
queries written in Structured Query Language (SQL). As the data mining partner
to HighWired, Hanrick separated student and faculty user patterns by linking
the log data to HighWired.com's internal database. "Hanrick's approach made a
big difference," said Anne Yount, Director of Data Services at HighWired.com.
"We've worked with server-log software packages before, but they couldn't
answer all of our strategic questions. We needed a partner who would help us
mine the underused customer data in our Web server logs, and Hanrick came in
and started providing valuable analysis in a surprisingly short time."
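The kind of query described here, joining log records to an internal user database so that student and faculty traffic can be reported separately, might look roughly like the sketch below. The article does not publish HighWired.com's schema or Hanrick's queries, so the table names (`users`, `page_views`), the columns, the sample rows, and the use of SQLite as a stand-in database are all assumptions made for illustration.

```python
import sqlite3

def traffic_by_role(conn):
    """Return (role, page_views, active_users) rows, one per user role."""
    query = """
        SELECT u.role,
               COUNT(*)                  AS page_views,
               COUNT(DISTINCT v.user_id) AS active_users
        FROM page_views AS v
        JOIN users AS u ON u.user_id = v.user_id
        GROUP BY u.role
        ORDER BY u.role
    """
    return conn.execute(query).fetchall()

# Hypothetical schema: one table loaded from the server logs, one from the
# registration database. SQLite in memory stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, role TEXT, school TEXT);
    CREATE TABLE page_views (user_id INTEGER, path TEXT, viewed_at TEXT);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "student", "Lincoln High"),
        (2, "student", "Lincoln High"),
        (3, "faculty", "Lincoln High"),
    ],
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        (1, "/sports", "2000-10-10 13:55:36"),
        (1, "/news", "2000-10-10 13:58:36"),
        (2, "/sports", "2000-10-10 14:02:10"),
        (3, "/admin", "2000-10-10 15:20:00"),
    ],
)

rows = traffic_by_role(conn)
for role, views, users in rows:
    print(f"{role}: {views} page views from {users} active users")
```

Because the report is just a query, changing what it measures (for example, grouping by school or by partner instead of by role) means editing a few lines of SQL rather than waiting on a packaged tool's fixed report set.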
A key to initial successes in the data mining campaign was full buy-in from
across the company's different functions. For example, the business
development team at HighWired benefited immediately when the Data Services
team was able to provide monthly statistics on users from all of
HighWired.com's partners, illustrating which ones spent the most time on the
site, and which users registered. This profiling information, in turn,
separated good partnerships from bad ones and strengthened the negotiating
leverage HighWired.com maintained relative to its allies.
Meanwhile, HighWired's Marketing team used information from the server logs
and its registration database to inform strategic decision-making. In one
brief project, HighWired.com measured return on investment (ROI) for a
marketing initiative run by the site. High school teachers administering
sections of their high school's HighWired.com site were encouraged to increase
participation on the site, receiving prizes based on their level of success.
Analysis of log files determined which schools were most successful and the
overall result of the campaign.
Finally, data mining helped the technical department troubleshoot various
issues related to the Web site's operations. In one instance, HighWired.com
had just inked a three-year deal with a new
partner. Internal log analysis turned up a large discrepancy between the
number of page views reported by the partner and the number seen in
HighWired.com's logs (the latter was the number that generated paid
advertising revenue for HighWired.com). Further investigation showed that the
partner was maintaining a "cached" copy of each of the pages on its own server;
while this sped up download times, it prevented HighWired.com from receiving
advertising revenue from those page views. Working with Hanrick, the HighWired
team was able to create a new way of advertising on these pages, without
affecting the download times. The increase in ad revenue may have been as high
as $10,000 per month.
The HighWired.com example, though brief, shows some of the potential pitfalls
as well as benefits of a concentrated data-mining campaign. A few lessons:
- Data mining exercises should typically begin from a specific need
communicated by business managers, not gearheads. Rather than directing
analysis in an open-ended, academic fashion, work should center on tangible,
revenue- and profit-driving issues like marketing, partnerships, and content
management. If you're not sure whether a question ties back closely to such
issues, think about the metrics you use to measure progress and whether they
relate to key needs.
- Don't settle for generic software. By keeping queries customized and
flexible, managers can get an overview of site performance while digging
deeper into specific areas or changing the reporting system as business needs
shift.
- Scalable technology is important. HighWired's initial system did not
anticipate a substantial increase in traffic as the site grew, and hence did
not rest on robust, enterprise-class technologies.
- Communicate aggressively to get more creative minds involved. Especially
with relatively new, less understood technologies like data mining, initial
successes should be communicated widely, in order to get people enthused and
increase subsequent brainstorms and participation across the company.
For more info, see www.ehanrick.com