
THE IMPORTANCE OF STANDARDS
by Ed Colet


Formulating and proposing technology standards has become a significant part of today's technical activity. Standards can affect the commercial success of developed applications. In this column, I comment on the role of standards today compared with the not-so-recent past, and also point out how data mining is influencing some of the standards in related domains such as databases.

In the pre-Internet age, standards simply evolved and succeeded based on the success of applications in the marketplace. Witness the success of VHS rather than Betamax as the de facto standard for consumer videotapes. It also used to be the case that a standard became subject to formal review and ratification by the appropriate authoritative body only after things had stabilized. Formally defining a standard at that point merely imposed order on an already ordered state of affairs.

In this highly technological Internet age, the old view of standards is rapidly changing. Gone is the notion of waiting for things to stabilize and letting de facto standards appear. Perhaps this is because in an environment of rapid development it is constant change rather than stable equilibrium that is the norm. Today, a new standard is defined, proposed, and evaluated early, sometimes well in advance of any apparent need for it. As in the past, a standard is still intended to impose order. Because things are in flux, an imposed order serves to streamline development. When development is guided along lines that are consistent with standards, it becomes easier for different applications to be compatible with each other. In the Internet age, this compatibility is critical to the successful movement of information across the web and among applications.

Timing is absolutely critical for standards. If a standard is proposed too early, it can become too restrictive. A standard is defined by a set of specifications, and these specifications obviously cannot include unforeseen technical advances. If such future advances prove beneficial but fall outside the scope of the specifications, then the standard is too restrictive, and the consequence is that it will be ignored rather than followed. If a standard is proposed too late, it can actually be counter-productive. For applications to comply with a standard proposed late, significant re-engineering may be necessary. Rather than moving forward, developers must take a step backwards.

The solution to the timing problem is to have a sound and timely process for reviewing and ratifying standards. The W3C (World Wide Web Consortium) is a global academic and industrial consortium with a charter for developing standards relevant to the web. Consortium members are organizations actively involved in technology development, and they have a say (and a stake) in refining and ratifying proposed standards. Its public services include information repositories and sample code implementations that embody and promote standards, as well as demonstrations of new technology. But even with a governing body to review standards, the development community can still move forward and build applications on promising technologies. XML is an example of a standard that was apparently adopted well before any formal ratification.

In the data mining area, there are ongoing efforts related to standards. This is a good sign because it indicates that a sufficient number of parties are actively working in the domain and that there is a recognized need to ensure that their development efforts are compatible with one another. Doing so ensures that the field moves forward in a coordinated manner. For example, PMML (Predictive Model Markup Language) is a newly proposed standard for ensuring that the data mining results from one application can be more easily used by another.
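To make the idea of exchanging results between applications concrete, here is a minimal sketch in Python. It is not PMML: the element and attribute names (LinearModel, Term, field, weight) and the function names are hypothetical, chosen only to illustrate how an agreed-upon document format lets one program write a model and another read and apply it.

```python
import xml.etree.ElementTree as ET


def export_model(intercept, coefficients):
    """Producer side: serialize a linear model to an XML string.

    The element and attribute names are hypothetical, not the PMML schema.
    """
    root = ET.Element("LinearModel", intercept=repr(intercept))
    for name, weight in coefficients.items():
        ET.SubElement(root, "Term", field=name, weight=repr(weight))
    return ET.tostring(root, encoding="unicode")


def import_model(xml_text):
    """Consumer side: rebuild the model from the shared XML format."""
    root = ET.fromstring(xml_text)
    intercept = float(root.get("intercept"))
    coefficients = {
        term.get("field"): float(term.get("weight"))
        for term in root.findall("Term")
    }
    return intercept, coefficients


def score(model, record):
    """Apply the imported model to one record (a dict of field values)."""
    intercept, coefficients = model
    return intercept + sum(w * record[f] for f, w in coefficients.items())


# One "application" exports the model; a second imports and uses it.
document = export_model(1.5, {"age": 0.25, "income": 0.5})
model = import_model(document)
prediction = score(model, {"age": 40, "income": 10})
```

The two sides share nothing except the document format, which is the point of an interchange standard: either program could be replaced by a different vendor's tool as long as both honor the same format.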

Data mining is also heavily dependent on the standards that exist in related technologies. To some extent, data mining is a driving force for new standards in these related technologies. For example, data mining is closely linked with database technologies. Database technologies have accepted standards such as SQL and ODBC, which address how to query and how to access content stored in databases. But data mining has shown that SQL can have performance limitations on large-scale databases, and this has driven efforts to develop new querying approaches, which are not yet standardized. As for accessing content, reviews are underway of an OLE DB standard for accessing and sharing database content for data mining.

Even if one is not actively involved in proposing or reviewing standards, there are good reasons to be familiar with the work that's underway. The first is that it provides an understanding of the present limitations of existing technologies. The second is that knowing which standards may be adopted can affect decisions about long-term planning and strategic direction. Today's standards are an important influence on future technologies.


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see http://www.virtualgold.com.

