FOLLOWING HOLIDAY DATA TRAILS
by Ed Colet


Early and ongoing trends indicate that this holiday season could be an E-Christmas for many. Plenty of purchases are being made online, and the rosy e-business and e-commerce projections of high traffic and high revenue appear to be coming to fruition. For those of us involved in DS* activities such as data mining, there is a lot to learn from the electronic data trails that online purchases leave behind. One point about this holiday season deserves emphasis, however: business-as-usual analytical techniques, if left unmodified, can produce misleading conclusions. Results have to be interpreted within an appropriate context, which in this case means knowing that it's the holiday season.

In a way, cyberspace is as crowded as the shops on Main Street USA. Popular sites such as BarnesandNoble.com, Startrekstore.com, Macys.com and eBay have reported slowdowns and shutdowns due to a holiday rush of up to 10 million hits per day. The rush to cyberspace has helped drive sky-rocketing projections for e-commerce. Reports indicate that web site traffic for the week of December 4 through 10 was up 80% from the previous week, with department store web site traffic up 137%. Projections of overall online sales revenue for the year range from $2.3 billion to $2.55 billion, up from a range of $800 million to $1.1 billion in 1997. In testimonials, customers have expressed a willingness to move from a retailer's traditional non-Internet sales channels to its web site, buying into the promises of greater convenience, richer interactivity and better personalization that retailers have been promoting.

Many of these online activities leave data trails that, if mined carefully, can lead to interesting findings. We already know fairly detailed facts and trends about the purchases made this holiday season. For example, in an online world that is typically predominantly male, most of the online purchasing has been done by women: 55% of the audience is female, aged 12 and above, and women between 18 and 34 alone make up 23% of the online audience. We even know that among gay people who shop online there are more men than women, and that these men range in age from 25 to 44, with an annual income of about $57,000. We know that consumers are visiting sites mostly to buy toys, books, music and movies, and clothing.

Transforming raw data about visitors to a site into potentially useful information is a complex task. It is typically good practice for a web site (retailer) to separate the more dynamic operational databases that handle transaction processing from the relatively static analytical databases that store aggregated data and projected estimates. The processes that move data from one kind of database to the other are non-trivial. Transaction reports, database table updates, sales reports, and other sources of information all have interdependencies that need to be managed. The complexity is compounded by the finding that, even within a single organization, data standardization is typically lacking, and an enterprise-wide language and common definitions are used less than half the time. If transformational and analytical processes are not designed well, apparently contradictory numbers can result. For example, some have asked how total e-commerce sales for 1998 can be approximately $2 billion when other indicators show that Dell alone sells $6 million a day through the Web.
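The Dell question comes down to simple arithmetic, which the short Python sketch below works through. It assumes, purely for illustration, that the quoted $6 million daily rate held on every one of 365 days; the reports cited above do not say so, and the constant and function names are made up for this example.

```python
# Figures quoted in the article.
DELL_DAILY_WEB_SALES = 6_000_000          # dollars per day, Dell alone
ESTIMATED_1998_TOTAL = 2_000_000_000      # "approximately $2 billion"
PROJECTED_RANGE = (2_300_000_000, 2_550_000_000)

def annualize(daily_sales, days=365):
    """Scale a daily figure to a year. Assumption: the rate is constant."""
    return daily_sales * days

dell_annual = annualize(DELL_DAILY_WEB_SALES)
share_of_low_projection = dell_annual / PROJECTED_RANGE[0]

print(f"Dell, annualized: ${dell_annual:,}")
print(f"Exceeds the ~$2 billion total: {dell_annual > ESTIMATED_1998_TOTAL}")
print(f"Share of the low projection: {share_of_low_projection:.0%}")
```

Under that assumption a single vendor would account for about $2.19 billion, more than the approximate total and most of the projected range. That suggests the two figures were built on different definitions (for instance, consumer retail only versus all Web sales), which is the kind of inconsistency that shared definitions are meant to prevent.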

Analytical techniques that involve seasonal data also need to be examined carefully for misleading conclusions. Online purchasing and e-commerce can potentially provide the consumer with a more personalized shopping experience. The Internet affords retailers an unprecedented opportunity for one-to-one marketing, i.e. marketing and selling specific products to specific customers, and providing personalized interactions to consumers. For example, at Lands' End, women can enter some data about their height, weight, etc. and "try on" selected clothing combinations that would fit well. CDnow, Amazon.com, and others rely on collaborative filtering to recommend CDs and books that are likely to match a specific customer's tastes and interests.
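For readers unfamiliar with collaborative filtering, here is a minimal Python sketch of one common variant: user-based filtering over purchase histories, weighted by cosine similarity. It is not a description of how CDnow or Amazon.com actually implement their recommendations; the function names, customer names and items are all invented for illustration.

```python
from collections import defaultdict
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two sets of purchased items (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(target, purchases, top_n=3):
    """Rank items the target has not bought by similarity-weighted votes
    from other customers. `purchases` maps customer -> set of items."""
    owned = purchases.get(target, set())
    scores = defaultdict(float)
    for other, items in purchases.items():
        if other == target:
            continue
        weight = cosine_similarity(owned, items)
        if weight == 0.0:
            continue  # no overlap in taste, so this customer gets no vote
        for item in items - owned:
            scores[item] += weight
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical purchase histories.
purchases = {
    "alice": {"jazz_cd", "blues_cd"},
    "bob": {"jazz_cd", "blues_cd", "swing_cd"},
    "carol": {"jazz_cd", "opera_cd"},
    "dave": {"metal_cd"},
}

print(recommend("alice", purchases))  # ['swing_cd', 'opera_cd']
```

Bob's history overlaps Alice's more than Carol's does, so his extra purchase ranks first. Dave shares nothing with anyone and gets no recommendations at all, a small-scale picture of what happens when the data is too sparse.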

A technique such as collaborative filtering requires plenty of data to be effective (otherwise the underlying data representation is too sparse). But as we all know, many of the online purchases made during this holiday season are gifts for others. For one-to-one marketing to succeed, gift purchases need to be distinguished from personal-use purchases. Even without explicitly asking the user, a site can often infer which purchases are gifts. It can check whether the shipping address differs from the billing address, whether gift wrapping was requested, or, as Amazon.com can do, whether the purchaser asked the site to email the recipient a gift notification requesting a shipping address because the purchaser didn't know it. When gift purchases are mixed in, an unmodified version of collaborative filtering cannot recommend a product that the purchaser might like; it can only recommend a product that the purchaser might like to buy as a gift for someone else (whether the recipient would actually enjoy the gift is, of course, a different matter altogether). Unless a site can adequately determine that a product is not intended for use by the actual purchaser, standard techniques (such as collaborative filtering) for recommending products to a customer will be grossly inaccurate. Some of the benefits to the consumer of buying online may then be lost, and much of the online data and the knowledge to be learned from it is essentially wasted.
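The gift signals just described can be combined into a simple screening step that runs before any recommendation technique sees the data. The Python sketch below is one hedged way to do it; the `Order` record and its field names are hypothetical, not taken from any real retailer's schema.

```python
from dataclasses import dataclass

@dataclass
class Order:
    """Hypothetical order record; field names are assumptions for illustration."""
    customer: str
    item: str
    billing_address: str
    shipping_address: str
    gift_wrap: bool = False
    address_requested_from_recipient: bool = False

def looks_like_gift(order):
    """Apply the three heuristics from the text; any one of them flags a gift."""
    different_address = (
        order.shipping_address.strip().lower()
        != order.billing_address.strip().lower()
    )
    return (
        order.gift_wrap
        or order.address_requested_from_recipient
        or different_address
    )

def split_orders(orders):
    """Return (personal, gifts), each mapping customer -> set of items."""
    personal, gifts = {}, {}
    for order in orders:
        bucket = gifts if looks_like_gift(order) else personal
        bucket.setdefault(order.customer, set()).add(order.item)
    return personal, gifts

# Hypothetical holiday orders from one customer.
orders = [
    Order("alice", "jazz_cd", "1 Elm St", "1 Elm St"),
    Order("alice", "toy_train", "1 Elm St", "9 Oak Ave"),
    Order("alice", "scarf", "1 Elm St", "1 Elm St", gift_wrap=True),
]
personal, gifts = split_orders(orders)
print(personal)  # {'alice': {'jazz_cd'}}
```

Only the personal-use profile would then feed recommendations for the purchaser; the gift profile could be kept separately to suggest gift ideas. The heuristics are imperfect: an order shipped to the purchaser's office, for example, would be flagged as a gift, so a real site would want to tune or supplement them.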

Whatever data mining or analysis is conducted must be interpreted carefully within an appropriate context or frame of reference. In the scenario discussed here, part of that frame of reference is the knowledge that many of the online purchases consumers have made in the past month are for products intended for others rather than themselves, and this has implications for one-to-one marketing. Often, having a "human in the loop" who knows that it's Christmas can be much more important than any automated analysis.

---

Ed Colet can be reached at edcolet@virtualgold.com.

