
LOOKING AT DATA MINING THROUGH A PORTAL
by Ed Colet


Portals are an increasingly important trend in IT today. Sales of enterprise portal software are anticipated to reach $15 billion by 2002. A portal is a browser interface that provides access to information content. Yahoo is an example of a portal for viewing content on the World Wide Web. An emerging trend is the use of portals not only to access WWW content but also to access corporate data. The concept of a corporate data portal promises to be useful for a variety of reasons: it provides dynamic and customized data access, and easier management of views into distributed and diverse data sources. But as this column shows, explicit data mining considerations can be especially relevant and important as well.

The February 22nd issue of PCWeek has an article about the development of enterprise portals to provide better access to corporate data. The overall goal of a data portal is to facilitate access to a variety of corporate data stores. These data stores could include standard databases, billing and sales data, real-time data, performance indicators, news feeds, information from transactional and financial systems, and even other data sources such as email and text documents. In the general architecture, these diverse data sources are connected to a portal server, and a browser-based portal interface residing on a client connects to this server. The client can also be connected directly to back-end data stores such as the web (where the web is considered to be a back-end data store). The objective of the architecture is to provide an effective way to publish, organize and access the data for a user to view.

Developing an architecture that makes it easier for users to publish, organize and access data will give users better access to more information than they've had before. But this brings a potential risk of information overload. That risk can be addressed via a data mining approach, which seems to have been overlooked. Part of the appeal of portals is that an individual user can easily customize a portal to show the information most useful and relevant to their needs. How a particular view is customized for an individual user differs among enterprise portal developers, but none currently involves data mining. It seems to me entirely possible that a user's customized view could also be determined through data mining. The approach would be to look at the data stores that a particular user accesses most frequently, and the links and documents that are followed most often - and then use this pattern to evolve the user's customized view into the corporate data stores.
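As a rough illustration of the approach just described, the sketch below counts how often a user touches each data store and uses the most frequent ones to order that user's portal view. It is a minimal sketch only: the log format, the store names and the function `rank_sources` are all hypothetical, and no portal product is known to work this way.

```python
from collections import Counter

def rank_sources(access_log, user, top_n=3):
    """Return the data stores a user accesses most often, most frequent first.

    access_log is assumed to be a list of (user, data_store) pairs;
    this format is hypothetical and chosen only for illustration.
    """
    counts = Counter(store for u, store in access_log if u == user)
    return [store for store, _ in counts.most_common(top_n)]

# Hypothetical access history gathered by a portal server.
access_log = [
    ("alice", "sales_db"),
    ("alice", "news_feed"),
    ("alice", "sales_db"),
    ("bob", "billing_db"),
    ("alice", "email_archive"),
    ("alice", "sales_db"),
    ("alice", "news_feed"),
]

# The two stores alice uses most would lead her customized view.
portal_view = rank_sources(access_log, "alice", top_n=2)
print(portal_view)
```

Simple frequency counting is only a starting point. A view that is meant to evolve over time would probably also weight recent accesses more heavily than old ones, and would treat followed links and documents the same way as data stores.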

The dynamic and customizable nature of the portal interface is appealing to IT departments because it lessens their development and support load - especially as increasing numbers of users gain access to data. A customized portal view is much less demanding for IT departments, since they wouldn't have to support each individual user's access to specific data stores. Portals also make it easy to give different user groups access to data. By creating a portal that can be extended to extranets, a corporation can give its external customers better access to relevant data (although security remains an open issue here).

Providing many users with better access to data is one thing; having those users make good use of the data is another. Currently, some portals come with elementary search and query capabilities. One manufacturer is working on a portal view that has access to a ROLAP engine - and is therefore able to do some analysis on large sets of aggregated data and return answers very rapidly considering the amount of data analyzed. But OLAP is not data mining, and the obvious next step is to incorporate true data mining capability via a portal. That is, rather than querying for specific answers (e.g. "what is the amount of sales by geographic region?"), one would like to see capabilities for exploratory data mining (e.g. "what is interesting about sales?"). Some manufacturers are trying to approach this kind of capability with features that alert the user to business developments based on changes in the data, and/or features that deliver scheduled data reports.
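The difference between the two kinds of question can be sketched in a few lines. The first function below answers the specific query "what is the amount of sales by geographic region?"; the second makes a very crude attempt at "what is interesting about sales?" by flagging regions whose totals lie far from the average. The sales figures, the function names and the 1.5 standard deviation threshold are all assumptions made for illustration, and a deviation test like this is a stand-in for, not an example of, a real data mining technique.

```python
from statistics import mean, pstdev

# Hypothetical sales records: (region, amount).
sales = [
    ("East", 120.0),
    ("West", 115.0),
    ("North", 118.0),
    ("South", 40.0),
    ("Central", 122.0),
]

def sales_by_region(rows):
    """Specific query: total sales for each region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def unusual_regions(totals, threshold=1.5):
    """Exploratory check: regions whose total lies more than `threshold`
    population standard deviations from the mean of all regions."""
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every region is identical, so nothing stands out
    return [r for r, v in totals.items() if abs(v - mu) / sigma > threshold]

totals = sales_by_region(sales)
print(totals)                   # the answer to the specific question
print(unusual_regions(totals))  # candidates for "what is interesting?"
```

The point of the contrast is that the first function requires the user to know which question to ask, while the second surfaces something the user did not ask for. A portal with real data mining behind it would apply far richer methods, but the interaction would have the same shape.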

To conclude, portals appear to hold promise for providing better access to data, and there is promising development work underway along this line. If coupled with the effective use of data mining technologies, portals can become even more important.

---

For more information, see http://www.virtualgold.com.

