WEB MINING: CREATING, ENHANCING, MINING AND ACTING ON WEB DATA
by Jesus Mena, WebMiner
In their frenzy to be the next Amazon.com, companies of all sizes and types
are scrambling to set up their e-commerce sites. They often concentrate on the
mechanics of transaction processing, such as setting up their inventory and
shopping carts, but usually fail to plan for the vast amount of customer data
their site will generate. Most companies fail to see that in e-commerce success
will depend on how this Web data is leveraged to convert visitors into
customers. The Web data that is generated with a single sale is of more value
than the sale itself, since it can lead to a long and profitable relationship
with that customer.
Every visit to a retailing site generates important consumer behavioral
data, regardless of whether a sale is made. Every visitor action is a digital
gesture exhibiting habits, preferences and tendencies. These interactions
reveal important trends and patterns that can help a company design a Website
that effectively communicates and markets its products and services. Companies
can aggregate, enhance and mine Web data in order to learn what sells, what
works and what doesn't, who is buying and who is not.
Creating Web Data
Since every visit to your Website signals a consumer's interest in your
product or service, it is vital that you closely scrutinize every interaction.
However, Web data is diverse and voluminous. Thus, to analyze e-commerce data
you must assemble the divergent data components captured via server log files,
form databases and Emails generated by visitors into a cohesive, integrated
and comprehensive view. If you plan ahead of time how you will capture
important customer information, you can more easily integrate and mine Web
data.
By planning strategically before you implement your e-commerce Website, you
can capture important information about your visitors' preferences and online
behavior. By taking the time to consider the overall design of your site,
such as what prompts and links you position in your home page, you can map
the movements of your visitors. In addition, by prompting for a quick and
short registration at the onset of a visit, an inquiry or a purchase, you can
also capture important personal information which you can later enhance and
mine.
One key to compiling and capturing this shopper information is a unique
identifier: a visitor id number. A proven strategy is having visitors
register initially at the site by enticing them with a special service or
incentive. Offer access to a special section of your site. Have contests or
door prizes. The point is that you need them to register in order to set a
cookie, which can be used as the unique id number. From that point the unique
key can enable the retailer to track every interaction with that visitor.
This unique key will allow the site to link log files and forms databases with
the company's data warehouse and with other demographic and household
information, ad server networks or collaborative filtering engines.
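As a rough illustration of how a unique visitor id ties these sources together, the following Python sketch groups page views by id and attaches any form data recorded under the same id. The field names (visitor_id, page, zip) and the sample records are hypothetical, chosen only for this example.

```python
# Hypothetical sample records; field names are illustrative assumptions.
log_hits = [
    {"visitor_id": "v001", "page": "/home"},
    {"visitor_id": "v001", "page": "/printers"},
    {"visitor_id": "v002", "page": "/home"},
]
form_records = {
    "v001": {"zip": "94105", "gender": "F", "age": 31},
}

def build_visitor_profiles(hits, forms):
    """Group page views by visitor id and attach any form data for that id."""
    profiles = {}
    for hit in hits:
        vid = hit["visitor_id"]
        # Visitors who never registered get form=None.
        profile = profiles.setdefault(vid, {"pages": [], "form": forms.get(vid)})
        profile["pages"].append(hit["page"])
    return profiles

profiles = build_visitor_profiles(log_hits, form_records)
```

The same key could then be used to join against a data warehouse or purchased demographic data; a visitor without a registration simply carries no form attributes.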
Enhancing Web Data
Server log files provide domain types, time of access, keywords, and search
engine used by visitors and can provide some insight into how a visitor
arrived at a Website and what keywords they used to locate it. Cookies
dispensed from the server can track browser visits and pages viewed and can
provide some insight into how often this visitor has been to the site and what
sections they wander into. Forms can provide important visitor personal
information, such as gender, age, and ZIP code. This is probably the most
important customer view since it contains information that can be used to
append additional data such as that from a data warehouse. You can also append
demographic and household data to a visitor's form information, including the
visitor's probable income, the type of auto they drive and the number of
children they have.
This external information can be linked to Website data and enable
additional insight into the identity, attributes, lifestyle and behavior of
visitors. It's available from various vendors, including Acxiom, Equifax,
Experian, MetroMail, Polk and others. There is an entire industry devoted to
segmenting, classifying and reselling consumer behavior information to
companies, including of course those with Websites.
In addition, new providers of 'Webographics' have recently emerged that sell
either software or services, and sometimes both, for collaborative filtering,
relational marketing and visitor profiling. These new data providers
represent a whole new genre of Web companies seeking to capture and generate
information about Internet users' behavior and preferences. They include such
firms as DoubleClick, Engage Technologies, Firefly, Net.Perceptions and
others. These new players use a myriad of solutions to track and profile
visitors -- everything from proprietary software and databases to commingling
cookies via server networks.
All of this internal and external information can be written to an Oracle
table, or a flat file, which then can be linked or imported into a data
mining tool. These include automated tools, which have principally been used
in data warehouses to extract patterns, trends and relationships, and new
easy-to-use data mining tools with graphical interfaces that are designed
for business and marketing personnel. These data mining analyses can provide
actionable solutions in many formats, which can be shared with those
individuals responsible for the design, maintenance and marketing of an
e-commerce site.
Mining Web Data
So far most analyses of Web data have involved log traffic reports, most of
which provide cumulative accounts of server activity but do not provide any
true business insight about customer demographics and online behavior. Most of
the current traffic analysis software, including NetIntellect, Bazaar Analyzer
Pro, HitList, NetTracker, Surf Report, WebTrends, and others offer predefined
reports about server activity based on the analysis of log files. This
basically limits the scope of these tools to statistics about domain names, IP
addresses, cookies, browsers and other TCP/IP specific machine-to-machine
activity.
On the other hand, the mining of Web data for an e-commerce site yields
visitor behavior analyses and profiles, rather than server statistics. An
e-commerce site needs to know about the preferences and lifestyles of its
visitors. Data mining in this context is about addressing such business
questions as, "Who is buying what items and at what rates?" You also would
like to know what is selling so you can adjust your inventory and plan your
orders and shipping. You need to know how to sell and what incentives, offers
and ads work, and how you should design your site to optimize your profits.
Data mining is a "hot" new technology about one of the oldest processes of
human endeavor: pattern recognition. Our hairy ancestors relied on their
ability to recognize the patterns of predators, paths, prey, and the seasons
to survive. Today, sites inundated with data -- generated daily by customer
visits -- are faced with the same challenge of recognizing the patterns of
opportunity and threat to their survival. One of the common traits of firms
that have traditionally used data mining is that they have mountains of
transactional data and find themselves competing for customer loyalty and
dollars in crowded markets -- where it costs little for customers to switch.
Which, if you stop and think about it, is a good description of the evolving
electronic commerce landscape.
A fast, competitive marketplace where millions of online transactions are
being generated (and captured) in log files and registration forms every hour
of every day -- a marketplace that doubles every 100 days. A marketplace where
online shoppers browse by retailing sites with their fingers poised over their
mouses, ready to buy or move on should they not find what they are looking for
-- should the content, wording, incentive, promotion, product or service of
that site not meet their preferences. A marketplace where browsers are
attracted and retained based on how well the retailer remembers the customers'
needs and whims. Where the goal is to know and serve every customer, one at a
time, and to build long-term, mutually beneficial relationships.
Data mining is the key to customer knowledge and intimacy in this type of
competitive and crowded marketplace. In hyper-competitive markets, the
strategic use of customer information is critical to survival. As such, AI, in
the form of data mining, has become a mainstay of doing business in fast-moving
markets. In a networked electronic environment the margins and profits
go to the quick and responsive players who are able to leverage predictive
models to anticipate customer behavior and preferences. Data mining of
customer information is required in order to make decisions about which
clients are the most profitable and desirable and what their characteristics
are in order to find more customers just like them. Electronic retailers and
advertisers are beginning to expect such customer profiling and business
knowledge from the Web after years of heavy investments and marginal ROIs.
The information that a merchant gathers from its site and mines can reveal
what products have cross-selling opportunities, or what information and
incentives the merchant should provide to its visitors based on their gender,
age, demographics and lifestyle interests. The process involves capturing
important visitor attributes from server logs, cookies and forms, appending
household and demographic information to them, and then using powerful
pattern recognition technologies, such as neural networks, machine-learning
and genetic algorithms, to profile customers in order to predict their
propensity to buy.
Data mining solutions come in many types, such as association, segmentation,
clustering, classification (prediction), visualization, and optimization. For
example, using a data mining tool incorporating a machine-learning algorithm, a
Website database can be segmented into unique groups of visitors, each with
individual behavior. These same tools perform statistical tests on the data
and partition it into multiple market segments independent of the analyst or
marketer. These types of data mining tools can autonomously identify key
intervals and ranges in the data, which distinguish the good from the bad
prospect. These types of data mining tools generally output their results in
the form of graphical decision trees or IF/THEN rules. This type of 'Web'
mining allows a merchant to make some projections about the profitability
potential of its visitors in the form of business rules, which can be
extracted directly from the Web data:
IF search keyword is "PC_software" AND gender male AND age 24-29 THEN average projected sale amount is $267.26 <= Low
Or,
IF search keyword is "math_software" AND search engine YAHOO AND subdomain .AOL THEN average projected sale amount is $379.95 <= High
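To show how such extracted rules might be put to work, here is a minimal Python sketch that encodes the two rules above as a function. The dictionary keys (keyword, gender, age, search_engine, subdomain) are assumed names for this illustration, not the output format of any particular data mining tool.

```python
def projected_sale(visitor):
    """Return (average projected sale, label) for the first matching rule, else None.

    `visitor` is a dict; the key names are illustrative assumptions.
    """
    if (visitor.get("keyword") == "PC_software"
            and visitor.get("gender") == "male"
            and 24 <= visitor.get("age", 0) <= 29):
        return 267.26, "Low"
    if (visitor.get("keyword") == "math_software"
            and visitor.get("search_engine") == "YAHOO"
            and visitor.get("subdomain") == ".AOL"):
        return 379.95, "High"
    return None  # no rule covers this visitor

example = projected_sale({"keyword": "PC_software", "gender": "male", "age": 26})
```

A production system would more likely load rules from the mining tool's output than hard-code them, but the matching logic would be the same.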
On the other hand, predicting customer propensity to purchase can also be
done using a data mining tool incorporating a back-propagation neural network.
Neural networks can be used to construct customer behavior models that can
predict who will buy, or how much they are likely to buy. The ability to learn
is one of the features of neural networks. They are not programmed as much as
trained. A neural network trains on samples and can construct predictive
models for "scoring" visitors' propensity to purchase.
Typically, a neural network is "trained" on observations about data
relationships; for example, "AOL sub-domains purchase printers but not
scanners." A net can gradually learn to detect this relationship and the
features of these types of consumers. Neural networks are basically computing
memories where the operations are association and similarity. They can learn
when sets of events go together, such as when one product is sold, another is
likely to sell as well, based on patterns they have observed over time.
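The idea of training rather than programming can be sketched with a single sigmoid unit fitted by gradient descent. This is the building block that back-propagation extends to multiple layers, not a full neural network, and the four training samples are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_unit(samples, epochs=2000, rate=0.5):
    """Fit the weights and bias of one sigmoid unit by per-sample gradient descent."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias

def score(weights, bias, inputs):
    """Propensity score between 0 and 1 for one visitor."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hypothetical observations: (from_aol, viewed_printer_page) -> bought_printer
samples = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
weights, bias = train_unit(samples)
```

After training, score(weights, bias, (1, 1)) should be well above 0.5 and the other three combinations well below it; nothing in the code states that relationship explicitly, it is learned from the samples.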
Ten Steps to Mining Your Web Data
Before you start to mine your data you must define your objective and what
information you will need to capture to achieve your objective. For example,
you may need to issue visitor identification cookies when they complete
registration forms at your Website. This will enable you to match the
information captured from your forms, such as the visitor's ZIP code, with the
transaction information generated from your cookies. It will also allow you to
merge your cookie information, which will detail the locations your
visitors go to while in your Website, with specific attributes like age
and gender from your forms. Additionally, a ZIP code or visitor address will
allow you to match your cookie and form data with demographic and household
information obtained from third-party data resellers.
You will likely need to scrub and prepare the data from your Website before
you begin any sort of data mining analysis. Log files, for example, can be
fairly redundant since a single page view generates a record (a "hit") not
only for that HTML file but also for every graphic on that page. However, once a template,
script, or procedure has been developed for generating the proper recording
of a single visit, the data can be input into a database format from which
additional manipulations and refinements can take place. If you are using a
site traffic analyzer tool, this data may already be format-ready for
additional mining analysis. Keep in mind that several steps may be required
prior to undertaking your analysis, including the following ones, which are
discussed more fully in the book "Data Mining Your Website."
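As one possible form of the scrubbing script mentioned above, this Python sketch parses log lines and keeps only page requests, dropping the graphics that inflate the hit count. It assumes the server writes Common Log Format; the sample lines and the list of graphic suffixes are illustrative.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed list, extend as needed

def page_views(lines):
    """Yield one dict per page request, skipping graphics and unparseable lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        record = match.groupdict()
        path = record["path"].split("?", 1)[0].lower()
        if path.endswith(GRAPHIC_SUFFIXES):
            continue
        yield record

sample = [
    '10.0.0.1 - - [10/Oct/1999:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/1999:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/1999:13:56:02 -0700] "GET /printers.html HTTP/1.0" 200 4100',
]
views = list(page_views(sample))
```

The resulting records can then be loaded into a database table for further refinement.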
1. Plan Your Project: Identify Your Objective The mining
of your Website involves some advanced planning about what type and level of
information you intend to capture at your server and what additional data you
plan to match it with. This planning helps ensure that your data mining efforts
yield measurable business results. For example, you need to plan with
your Web team what kind of log, cookie and form information you intend to
capture at what juncture from your visitors.
Next, you need to decide with your business, sales and marketing teams what
kind of demographic and household information you need to purchase to merge
with your server data. In addition, you should consider asking your
information system team to help integrate your data mart or data warehouse and
customer database with your Web data.
2. Select Your Data: Once your business objective has
been defined, you must then select the Web server and company data for meeting
this goal. Here is a quick checklist:
- Is the data adequate to describe the phenomena the data mining analysis is
attempting to model?
- Is there a common field in your Web data being used for linking to other
databases?
- Can the data from your Web be consolidated with your data warehouse?
- Will the data being mined be the same and available after the analysis?
- What internal and external information is available for the analysis?
- How current and relevant is the data to the business objective?
- Are the data sets being merged consistent with each other?
- Who is knowledgeable about the data being gathered?
- Is there redundancy in the data sets being merged?
- What joins are needed for the various databases?
- Is there lifestyle or demographic data available?
3. Prepare The Data: Once the data has been assembled and
visually inspected, you must decide which attributes to exclude and which
attributes need to be converted into usable formats. Here is another
checklist:
- What condition is the data in, and what steps are needed to prepare it
for analysis?
- What conversions and mapping of the data are required prior to the
analysis?
- Are these processes acceptable to the users and the deliverable
solution?
- How skewed is the data? Are log and/or square transformations needed?
- Do you need to do 1-of-N conversion for categorical fields?
- How will you handle missing data and noise or outliers?
- Normalize dollar fields by dividing them by 1000?
- Convert purchase dates to continuous values?
- Convert addresses to sectors?
- Convert Yes/No field to 1/0?
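Several of the conversions in this checklist can be sketched in a few lines of Python. The record layout, field names and the reference date below are assumptions made for the example.

```python
from datetime import date

def one_of_n(value, categories):
    """1-of-N encoding: 1 at the matching category's position, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

def prepare(record, engines, reference_date):
    """Convert one raw record (hypothetical field names) into a flat numeric row."""
    row = []
    row.extend(one_of_n(record["search_engine"], engines))         # categorical -> 1-of-N
    row.append(1 if record["newsletter"] == "Yes" else 0)          # Yes/No -> 1/0
    row.append(record["sale_amount"] / 1000)                       # dollars / 1000
    row.append((record["purchase_date"] - reference_date).days)    # date -> days elapsed
    return row

raw = {
    "search_engine": "YAHOO",
    "newsletter": "Yes",
    "sale_amount": 250.0,
    "purchase_date": date(1999, 3, 1),
}
row = prepare(raw, ["ALTAVISTA", "EXCITE", "YAHOO"], date(1999, 1, 1))
print(row)  # [0, 0, 1, 1, 0.25, 59]
```

Missing values and outliers are not handled here; how to treat them is a separate decision that depends on the data and the tool.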
4. Evaluate the Data: You should evaluate your data's
structure to determine what type of data mining tools to use for your
analysis. Here is a checklist:
- What is the ratio of categorical/binary attributes in the database?
- What is the nature and structure of the database?
- What is the overall condition of the data set?
- What is the distribution of the data set?
- How skewed is the data set?
As a general rule neural networks work best on data sets with a large number
of numeric attributes. Machine-learning algorithms incorporated in most
decision tree and rule-generating data mining tools work best with data sets
with a large number of records and a large number of attributes. Empirical
studies* have shown that the structure of the data critically affects the
accuracy of a data mining tool. For example, data sets with extreme
distributions (skew > 1 and kurtosis > 7) and with many binary/categorical
attributes (> 38%) tend to favor machine-learning based data mining tools.
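The thresholds just quoted can be checked with a short Python sketch. The text does not say which kurtosis definition the studies used; this version assumes the plain fourth standardized moment (3.0 for a normal distribution) computed from population moments, and it applies the skew threshold literally. The sample amounts and field types are invented.

```python
def skew_and_kurtosis(values):
    """Return (skewness, kurtosis) from population central moments.

    Kurtosis is the plain fourth standardized moment, an assumption here.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skew is undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def categorical_share(field_types):
    """Fraction of fields that are binary or categorical."""
    flagged = sum(1 for t in field_types if t in ("binary", "categorical"))
    return flagged / len(field_types)

def favors_machine_learning(values, field_types):
    """Apply the rule of thumb: skew > 1, kurtosis > 7, categorical share > 38%."""
    skew, kurt = skew_and_kurtosis(values)
    return skew > 1 and kurt > 7 and categorical_share(field_types) > 0.38

# Hypothetical data: mostly small sales with one very large one.
amounts = [10] * 9 + [1000]
field_types = ["categorical", "binary", "numeric", "categorical", "numeric"]
verdict = favors_machine_learning(amounts, field_types)
```

Such a check is only a starting point for choosing a tool; it says nothing about which tool will actually produce the lower error rate on a given data set.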
Often, derived ratios of input fields may be required in order to capture
the impact or the true value of the inputs -- to capture the "velocity of a
client value, such as profit or propensity to buy." For example, a common
derived ratio is debt-to-income: rather than using the debt and income
attributes as separate inputs, more can be gained from their ratio than from
the individual values. In your Web analysis, the number of site visits or
the number of purchases made over time may provide a better insight into the
true value of a Web site's customers:
- # of purchases / # of visits: 7/9 = .78 Propensity to Purchase Ratio
- Amount of sales / # of visits: $39/5 = 7.8 Profit Ratio
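These two ratios are simple to derive once visits, purchases and sales are linked to the same visitor. A minimal Python sketch, with function names chosen for this example:

```python
def propensity_to_purchase(purchases, visits):
    """Purchases per visit; 0.0 for a visitor with no recorded visits."""
    return purchases / visits if visits else 0.0

def sales_per_visit(sales_amount, visits):
    """Sales dollars per visit (the "profit ratio" in the text)."""
    return sales_amount / visits if visits else 0.0

print(round(propensity_to_purchase(7, 9), 2))  # 0.78
print(sales_per_visit(39, 5))                  # 7.8
```

The derived values can then be added as new input fields alongside, or instead of, the raw counts.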
5. Format The Solution: As previously mentioned there are
a number of Web mining formats or solutions. When you evaluate your Web data
and set your business objectives you must select the format of your e-commerce
solution. Here is yet another checklist:
- What is the desired format of your solution: decision tree, rules, C code,
graph, map?
- What is the goal of the solution: classification, regression, clustering,
segmentation?
- How will you distribute the knowledge gained by the data mining process?
- What are the available format options from the data mining process?
- What does management really need, insight or sales?
- What do you need from the data mining process?
You may need to use multiple tools in order to come up with the ideal Web
mining format for your Web site. For example, you may need to extract rules
from a clustering analysis. To do so you will need to first perform the
clustering analysis using a Self-Organizing Map, or Kohonen Network. Next
you will need to run the identified clusters through a machine-learning
algorithm in order to generate the descriptive IF/THEN rules which "profile"
the extracted clusters. Conversely, you may need to first do an analysis
using a machine-learning algorithm on a data set with a large number of
attributes in order to compress it: to identify a few significant attributes.
Then run those significant attributes through a neural network for the final
classification model.
6. Select the Tools: To choose the right mining tool, you
must not only select the right technology but also consider the
characteristics and structure of your data. Here is a checklist of
data-related issues you should consider when selecting a data mining tool:
- Number of continuous value fields
- Number of dependent variables
- Number of categorical fields
- Length and type of records
- Skewness of the data set
As a rule, machine-learning algorithms perform better on skewed data sets
with a high number of categorical attributes and with a high number of fields
per record. Neural networks, on the other hand, do better with numeric data.
7. Construct the Models: It is not until this stage
that you actually begin mining your Web site files. Again, during the mining
process you search for patterns in a data set and generate classification
rules, decision trees, clustering, scores, and weights, and evaluate and
compare error rates. Here is a quick checklist of items to consider:
- What are the model error rates, and are they acceptable or can they be
improved?
- Is additional data available which could help the performance of the
models?
- Is a different methodology necessary to improve model performance?
- How many models do you require for your entire Web site?
- Train and test models using a random number seed?
- Output SQL syntax for distribution to end-users?
- Supervised learning or unsupervised learning?
- Incorporate C code into a production system?
- Integrate rules in a decision support system?
- Purge noisy and redundant data attributes?
- Classification, prediction or clustering?
- Monitor and evaluate results?
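Two items on this checklist, training and testing with a random number seed and measuring error rates, can be illustrated with a short Python sketch. The record layout, the 30% test fraction and the threshold model are assumptions for the example.

```python
import random

def split(records, test_fraction=0.3, seed=42):
    """Shuffle with a fixed seed so the same train/test split can be reproduced."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def error_rate(model, test_set):
    """Fraction of test records the model labels incorrectly."""
    wrong = sum(1 for features, label in test_set if model(features) != label)
    return wrong / len(test_set)

# Hypothetical records: ((age,), bought) where buyers are aged 30 or over.
records = [((age,), 1 if age >= 30 else 0) for age in range(20, 40, 2)]
train, test = split(records)

def threshold_model(features):
    """Stand-in for a trained model: predicts a purchase for age 32 and over."""
    return 1 if features[0] >= 32 else 0

print(error_rate(threshold_model, test))
```

Fixing the seed matters when comparing models: two tools can only be compared fairly if both are evaluated on the same held-out records.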
8. Validate the Findings: As previously mentioned, a data
mining analysis of your Web site will most likely involve individuals from
several departments, such as Information Systems, Marketing, Sales, Inventory,
etc. It most definitely will involve the administrators, designers, analysts,
managers, and engineers responsible for designing and maintaining the
day-to-day operations of your Web site.
It is important after you have completed your data mining analysis that you
share and discuss your findings with all of them. Domain experts, people who
are the specialists in their area, need to be briefed on the results of the
analysis to ensure the findings are correct and appropriate to your site's
business objectives. This is the sanity check step. You need to be objective
and focused on your initial goal for mining your Web site. If your data
mining results are faulty, whether it's due to the data, tool or methodology,
you may need to do another analysis and reconstruct a new set of models with
your domain experts' participation and input.
9. Deliver the Findings: A report should be prepared
documenting the entire Web mining process, including the steps you took in
selecting and preparing your data, the tools you used and why, the tool
settings, your findings, and an explanation of what the code that was
generated is supposed to do, etc.
As with any business process you need to establish for your Web mining
initiative both baselines and procedures. In your analysis report you need to
comment on the results of the data mining analysis, stating whether it meets
the business objective of your Web site. If for some reason it doesn't, you
should state why not. You may want to include in your report how the data
mining analysis results can be improved, such as by the addition of different
or new data. You might merge external demographic and household information
or capture better information via newly designed registration forms or
cookies.
10. Integrate the Solutions: This final step is really a
commitment to continue the process of learning from your firm's online
transactions. This process involves incorporating the findings into your
firm's business practices, marketing efforts, and strategic planning. Web
mining is a pattern recognition process involving hundreds, thousands or maybe
millions of daily transactions in your Web site. This final step of your Web
mining analysis also involves monitoring the performance of the models that
you have generated. All models will age and their performance will
deteriorate, so you must monitor the accuracy of your Web mining models. Be
prepared to re-train and test new ones. Because today's business environment,
especially the Web and the data it generates, is highly dynamic and economic
conditions change, the models you build or analyses you perform will likely
need to be readjusted or redone over time.
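Monitoring a model's accuracy can be as simple as comparing recent predictions against what visitors actually did. The following Python sketch flags a model for re-training when its accuracy falls noticeably below the level measured at deployment; the 5-point tolerance is an arbitrary assumption, not a recommended value.

```python
def needs_retraining(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Return True when recent accuracy drops below the baseline minus tolerance.

    recent_outcomes is a list of (predicted, actual) pairs.
    """
    hits = sum(1 for predicted, actual in recent_outcomes if predicted == actual)
    current_accuracy = hits / len(recent_outcomes)
    return current_accuracy < baseline_accuracy - tolerance

# Hypothetical week of scored visitors: 6 correct predictions, 4 wrong.
this_week = [(1, 1)] * 6 + [(1, 0)] * 4
print(needs_retraining(0.80, this_week))  # True
```

How often to run such a check depends on how quickly your visitors and products change.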
Clearly not all of these ten steps are required, but you should consider
them prior to starting any in-depth analysis. They certainly do not always
follow this exact sequence, but in most assignments I've undertaken these
steps represent the issues that needed to be resolved before we could complete
the project. In most of my previous data mining projects, I analyzed customer
information files, datamarts, and data warehouses from retailers, banks,
insurers, phone, and credit card companies, but they typically dealt with the
same client-centered issues or questions, mainly: Who are the customers? What
are their features? And how are they likely to behave? Electronic retailers
face the same questions today.
Acting on Your Web Mining Solutions
Most likely you will need to do your Web mining on a separate server
dedicated to analysis. After your analysis you will need to validate your
results through some sort of production system such as a marketing test Email
campaign. Note that the costs involved with email versus physical mail or
phone calls allow for a very rapid assessment of your Web mining and marketing
efforts. It is certainly a very economical way to evaluate your Web mining
project: it only costs about five cents to Email a potential customer,
compared with as much as five dollars for direct mail and eight to twelve
dollars for a phone sales call. Planning and executing a traditional marketing
campaign used to take months; today on the Web an Email campaign can take
hours. The Web has accelerated the trend toward one-to-one marketing and the
validation of Web mining results by allowing the rapid evaluation of
predictive models.
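The cost difference is easy to quantify with the per-contact figures quoted above. This Python sketch works in cents to avoid rounding surprises; the campaign size of 10,000 prospects is hypothetical, and the direct mail figure is the quoted upper bound.

```python
# Per-contact costs in cents, taken from the figures in the text.
# Direct mail uses the quoted upper bound ("as much as five dollars").
COST_CENTS = {
    "email": 5,
    "direct_mail": 500,
    "phone_low": 800,
    "phone_high": 1200,
}

def campaign_cost_dollars(channel, contacts):
    """Total cost in dollars of contacting `contacts` prospects once."""
    return COST_CENTS[channel] * contacts / 100

# Hypothetical test campaign of 10,000 prospects.
for channel in COST_CENTS:
    print(channel, campaign_cost_dollars(channel, 10_000))
```

At these rates an Email test to 10,000 prospects costs $500, against up to $50,000 for direct mail and $80,000 to $120,000 for phone calls.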
It is not difficult to assess the benefits of Web mining and its
return-on-investment (ROI). Simply consider the quantitative counts of
clickthroughs of ads or banners before and after your Web mining analysis.
Consider the
percentage of sales or requests for product information, as well as the
amounts of purchases made as a result of a Web mining analysis. Consider the
rates prior to your data mining efforts and afterwards. If you initiate a
marketing Email campaign on the basis of your data mining analysis, consider
the rate of responses by splitting your Emails between those individuals
targeted via your analysis and those excluded from the targeting. Compare the
improved rate of responses and sales from those targeted via the Web mining
analysis to those without it.
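The comparison between targeted and untargeted Emails can be expressed as a lift figure. A minimal Python sketch follows; the campaign counts are invented for illustration and are not results from any real campaign.

```python
def response_rate(responses, emails_sent):
    """Fraction of recipients who responded."""
    return responses / emails_sent

def lift(targeted_rate, control_rate):
    """How many times better the targeted group responded than the control group."""
    return targeted_rate / control_rate

# Hypothetical campaign: 2,000 Emails to each group.
targeted = response_rate(120, 2000)  # visitors selected by the mining analysis
control = response_rate(40, 2000)    # visitors excluded from the targeting
print(f"targeted {targeted:.1%}, control {control:.1%}, lift {lift(targeted, control):.1f}x")
```

A lift near 1.0 would suggest the model adds little over untargeted mailing; the same calculation applies to sales amounts or clickthroughs.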
The dynamics of your industry and marketplace will dictate how often you
should mine your Website data. The intervals for mining your data will depend
on how often the attributes of your customers change. For example, a bank may
have a cross-selling model for its call site that can be quite effective for
months. The bank's models may be rebuilt on a quarterly or monthly basis and
still be relevant to the business questions the bank is trying to answer, such
as cross-selling opportunities for its financial products like CDs, bankcards,
loans, etc. For a portal, such as a
search engine, models may need to be refreshed on a weekly basis, because the
dynamics of the content, their visitors, and their features change more
quickly than those for a bank's customers. The end products the portal is
trying to predict are also subject to change more frequently: for a bank it
is a loan, for a portal it is an ad.
For an Internet company, which exists completely on the Web, the Web mining
process represents a biofeedback system to its entire supply chain. Web
mining can identify key market segments for electronic retailers, which can
directly affect their overall Website design and inventory control systems.
As with physical retailers, by leveraging data mining Web retailers can
position the right message, product, and service in front of the right
customers at the right time in the right format.
Web mining is not an isolated process carried out in a vacuum; it must be
integrated into the entire electronic retailing and marketing process. This
is especially true with virtual storefronts because everything -- selections,
transactions, orders, customer communications -- is accelerated to "Internet
time." For a Website entirely supported by advertising, data mining is even
more critical since it can quickly discover and measure the effectiveness of
a multitude of banners and ads on its continuous stream of visitors.
Electronic retailing changes not only the distribution and marketing of
products but more importantly it also alters the process of consumption and
the related transactions of buying and selling. The data, which is an
aftermath of every product and service purchased on the Web, is the core ore
-- which can be mined to develop customized products, forecast demand, profile
customers and improve relational marketing. Because of the interactive nature
of electronic retailing, consumers not only order and buy products online,
they can also indicate in some venues (auctions) the price points they are
willing to pay.
The act of retailing on the Web is an interactive one in which the consumer
can negotiate, exchange information, specify and customize the product and
services they want from the retailer. For the electronic retailer it is of
paramount importance that they analyze what consumers are doing and saying.
Web mining can serve retailers by providing them the technology to segment,
model and predict how to sell more, learn what's working and what's not and
quickly adjust their marketing, pricing, inventory and communications.
Web Mining is a Process: Pulling it All Together
Use segmentation analysis to stratify your Email offers to prospects you have
identified via your mining
analysis. Use targeted Email to provide incentives only to those individuals
likely to be interested in your products or services. Remember that Email to
individuals who you know little about will be little more than Spam.
Automatically reply, route, manage and segment Emails so you can efficiently
and effectively respond to your customers through Email via direct marketing.
Provide prompt customer service via auto or segmented Email.
Use your Web mining analysis to discover your customers' demographics,
consumer preferences, values and lifestyles. Incorporate your knowledge about
your customers in the tone, manner and method by which you communicate with
them. Look for similar attributes of your current customers and new future
prospects. Manage your customer contacts as you interact with them online and
offline. Pull together customer and transactional data as you interact with
them through sales calls, meetings, phone and email inquiries -- as they buy
your product and services.
Track your marketing ad efforts to know what works and why. Monitor what ads
are working and which actually lead to sales. Develop profiles that include
demographics, tastes and Email addresses of your best prospects. Manage your
back-end logistics effectively via your supply chain. Close the loop between
your supply chain and your inventory: translate your knowledge of your
customers' tastes and purchases into a quick turnaround by customizing your
products and services for them.
As billions of business interactions evolve and organize themselves into
revenue streams, subtle transformations occur between consumers and retailers
in this dynamic marketplace. The mining of Website data -- with AI-based
tools, like neural networks and machine-learning and genetic algorithms,
themselves programs designed to mimic human functions -- is an attempt to
recognize, anticipate and learn the buying habits and preferences of
customers in this new evolutionary, mutating business environment.
It is of paramount importance that retailers in a networked economy such as
this be adaptive and receptive to the needs of their customers. In this
expansive, competitive, and volatile environment Web mining will be a critical
process impacting every retailer's long-term success, where failure to quickly
react, adapt, and evolve can translate into customer "churn" with the click of
a mouse. Electronic retailing represents a growing exchange of data between
consumers and retailers, evolving and changing -- much as an organism develops
a nervous system.
Data Mining Your Web Site
Excerpted from "Data Mining Your Website" by Jesus Mena. Copyright 1999 by
Jesus Mena. ISBN # 1-55558-222-2. Excerpted by
permission of Digital Press, a division of Butterworth Heinemann. All rights
reserved.
Courses by The Modeling Agency
This article was provided with permission by
The Modeling Agency (TMA). TMA provides solutions, consulting and training in
information modeling, data mining, business intelligence and decision support
systems. Background on TMA and public course listings, including courses
in data mining and personalizing Ecommerce, may be referenced on the Web at:
www.the-modeling-agency.com or
call TMA toll-free at 888-742-2454.