
MITIGATING RISK WITH DM WHEN MIGRATING TO PROCUREMENT CARDS
by Ben A. Hitt, Ph.D.

Many large organizations are moving to a system of acquisition based on procurement cards. Purchasing agents use the cards in much the same way as credit or debit cards. Procurement card systems are implemented to streamline the procurement process and to reduce the time and paperwork associated with the purchase order system used previously. However, while the procurement card system yields significant savings in time and processing costs, it carries a greater potential for misuse.

Most procurement card systems capture transaction data effectively, and that data holds a key to mitigating the risk of misuse. The commercial credit card industry has long recognized two things. First, customers establish habitual patterns of use. Second, any substantial deviation from an established pattern suggests fraudulent use of a stolen card or stolen card information. Purchasing agents who use procurement cards ("P-Cards") appropriately are likely to form similar usage patterns, and abusive use patterns are likely to differ from them.

Applying fraud detection methods to procurement card use therefore seems appropriate. However, fraud detection systems to date have relied on historical examples of both proven fraudulent use and non-fraudulent use.

American Heuristics Corporation (AHC), with the support of The Modeling Agency (TMA), designed a prototype procurement card monitoring solution at the University of Illinois (UI). UI did not have the kind of historical database that accepted fraud detection practice relies on for misuse detection. To date, no verified abusive transactions have been captured. This rules out detection systems that require examples of fraudulent usage patterns. The situation presents both a challenge and an opportunity. Adaptive pattern recognition technology was proposed for two purposes: to detect abusive usage patterns early, and to direct audits so as to reduce inappropriate procurement card use.

The need for non-linear pattern-based modeling

As data collection capabilities have grown, the need for non-linear modeling techniques has become increasingly apparent. Traditional statistical methods rely on an assumption of linearity. However, most of the data collected concerns, or results from, human behavior, and humans rarely behave linearly. Methods that assume linear separability are therefore ultimately doomed to fail. Furthermore, data collection streams are broadening: the number of variables of concern to modelers has grown by at least an order of magnitude. Traditional methods simply were not designed to work with one hundred or more variables.
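Linear separability can be made concrete with the classic perceptron, a purely linear classifier. The minimal sketch below is our own illustration and not part of the AHC prototype; the toy AND and XOR data sets are standard textbook examples. The perceptron rule finds a separating line for logical AND. It can never classify all four XOR cases correctly, because no single straight line separates them.

```python
def predict(w, b, x):
    """Linear decision rule: output 1 when w . x + b > 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=100):
    """Classic perceptron learning rule, starting from zero weights."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            delta = y - predict(w, b, x)  # -1, 0 or +1
            if delta:
                errors += 1
                w[0] += delta * x[0]
                w[1] += delta * x[1]
                b += delta
        if errors == 0:  # a full pass with no mistakes: a separating line was found
            break
    return w, b


def accuracy(samples, w, b):
    return sum(predict(w, b, x) == y for x, y in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))  # AND reaches 1.0; XOR never can
```

In higher dimensions the same limitation applies to any model whose decision boundary is a single hyperplane.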

In answer to this, the last decade has seen neural networks emerge as a means of non-linear modeling. These models grew out of efforts by a number of cognitive scientists to mimic learning and memory in the human brain. The back-propagation neural network, in particular, has proven successful in creating useful models from large masses of complex data. The algorithm has been applied successfully in a variety of settings, including direct marketing, intelligence and fraud detection. Because it works by pattern recognition, it has proven robust to missing data and other data irregularities.

Deficiencies in back-propagation

The back-propagation neural network is not without its shortcomings. It requires large amounts of relevant data on all possible classes. Its usefulness is limited to the range within which it was trained, and new patterns are, as often as not, misclassified. The training cycle is highly iterative and time-consuming. Finally, the contribution of individual variables to the result is difficult to ascertain: the very qualities that make the algorithm robust make its results nearly impossible to explain.

Overcoming back-prop limitations

At least three algorithms address the deficiencies described above: Fuzzy Adaptive Resonance Theory (Fuzzy ART) networks, Adaptive Feature Maps and AHC's Adaptive Fuzzy Feature Map. In all three, the weights retain meaning within the real-world data space. That is, by looking at the vector of weights connected to a particular node, one can visualize the prototypical pattern that node represents. Explain functions are therefore easy to write: they simply compare the weights of two or more nodes and reveal which variables discriminate between them. It is entirely possible, for example, to de-scale the weights and subject them to a rule-generating algorithm (ID3, for example) to produce a usable knowledge base automatically.
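Because each node's weight vector is itself a prototype pattern, an explain function can be very simple. The sketch below assumes the inputs were min-max scaled to [0, 1] before training. That is an assumption, since the article does not say how AHC scaled its data. The variable names, ranges and node weights are also hypothetical. The sketch de-scales a node's prototype back into real-world units and ranks variables by how far apart two prototypes lie.

```python
# Hypothetical P-Card variables and the (min, max) range each was scaled from.
RANGES = {
    "amount_usd": (0.0, 5000.0),
    "hour_of_day": (0.0, 23.0),
    "txns_per_week": (0.0, 40.0),
}
FEATURES = list(RANGES)


def descale(weights):
    """Invert min-max scaling so a node's weights read in real-world units."""
    real = {}
    for name, w in zip(FEATURES, weights):
        lo, hi = RANGES[name]
        real[name] = lo + w * (hi - lo)
    return real


def explain(node_a, node_b, top=2):
    """Return the variables whose scaled weights differ most between two nodes.

    Comparing scaled weights keeps variables measured in different units comparable.
    """
    diffs = [(name, abs(a - b)) for name, a, b in zip(FEATURES, node_a, node_b)]
    diffs.sort(key=lambda item: item[1], reverse=True)
    return diffs[:top]


# Two hypothetical node prototypes (weights in the scaled [0, 1] space).
typical = [0.05, 0.45, 0.25]
unusual = [0.90, 0.95, 0.30]

print(descale(unusual))           # amount near 4500 USD, late evening
print(explain(typical, unusual))  # amount_usd first, then hour_of_day
```

The de-scaled prototypes are also the natural input to a rule generator such as ID3.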

The three methods are not limited by the training data available. Each contains a mechanism for recognizing novel patterns in the data stream to which it is attached, so each is capable of continual learning and retention of experience. Used properly, this feature makes training a "one pass" procedure: each pattern vector needs to be presented only once to be classified appropriately and to update a set of weights. The algorithms become more adept at their job over time, but they also produce meaningful results almost immediately.

Problems with Fuzzy ART and Feature Mapping

Both Fuzzy ART and Adaptive Feature Maps have inherent defects that limit their use. In Fuzzy ART, the problem lies in the learning function, in which the fuzzy AND operator (min(x, y)) is a major factor. Its use produces a constant trend toward minimization: the weights can never increase. Consequently, details of the pattern are lost over time and inappropriate pattern proliferation occurs. That is, pattern vectors are seen as novel when, at an earlier time, they would have been assigned to a specific class.

The primary problem with feature mapping is that it uses distance as the classification determinant: patterns that are close in a Euclidean sense are grouped together in the same class. It is easy to envision two distinct patterns that are close enough in Euclidean distance to be placed in the same class. Note that the two methods fail in different ways. Fuzzy ART's fuzzy pattern match does not misclassify patterns the way feature mapping does. Feature mapping does not degrade the prototypical pattern the way Fuzzy ART does.
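The Fuzzy ART erosion described above can be reproduced in a few lines. This minimal sketch uses the standard Fuzzy ART match criterion |I ∧ w| / |I| and the learning rule w ← β(I ∧ w) + (1 − β)w with fast learning (β = 1). For brevity it omits complement coding and the choice function. The vigilance value and the patterns are hypothetical. After absorbing two legitimate variants, the node rejects a third as novel. By then it would also reject the very pattern that created it.

```python
def fuzzy_and(x, y):
    """Fuzzy AND: the component-wise minimum min(x, y)."""
    return [min(a, b) for a, b in zip(x, y)]


def match(pattern, weights):
    """Fuzzy ART match criterion |I AND w| / |I| (L1 norms)."""
    return sum(fuzzy_and(pattern, weights)) / sum(pattern)


def learn(pattern, weights, beta=1.0):
    """Fuzzy ART update w_new = beta * (I AND w) + (1 - beta) * w.

    Because (I AND w) <= w component-wise, no weight can ever increase.
    """
    return [beta * m + (1 - beta) * w
            for m, w in zip(fuzzy_and(pattern, weights), weights)]


VIGILANCE = 0.75
prototype = [0.75, 0.75, 0.75, 0.75]
weights = list(prototype)  # node committed to its first pattern

# Legitimate variants, each dipping low on a different variable.
variants = [
    [0.25, 0.75, 0.75, 0.75],
    [0.75, 0.25, 0.75, 0.75],
    [0.75, 0.75, 0.25, 0.75],
]
accepted = []
for v in variants:
    ok = match(v, weights) >= VIGILANCE
    accepted.append(ok)
    if ok:
        weights = learn(v, weights)

print(accepted)                   # [True, True, False]
print(weights)                    # [0.25, 0.25, 0.75, 0.75]
print(match(prototype, weights))  # about 0.667, now below vigilance
```

Once a detail has been eroded, no later input can restore it, which is why proliferation gets worse the longer the network runs.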

The Adaptive Fuzzy Feature Map

Recognizing the problems associated with both Fuzzy ART and feature maps, AHC has developed an algorithm that combines elements of each so that their major problems are avoided. The Adaptive Fuzzy Feature Map (AFFM, patent pending) uses Euclidean distance both as a ranking parameter and in the learning function, and it therefore retains pattern detail over time. It uses fuzzy pattern matching to determine which node a pattern vector is associated with. During node selection, distance and fuzzy pattern matching work together as follows. First, the organized nodes are ranked by how close each is to the current pattern vector. Then, beginning with the closest node, a fuzzy match parameter is computed. If the match parameter meets or exceeds a pre-defined minimum, the node is selected and its weights are adjusted to move it slightly closer to the pattern; otherwise, the next-closest node is tested. If no node is selected, a new node is organized with weights equal to the pattern vector.
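The node selection loop described above can be sketched as follows. This is a hypothetical reconstruction, not AHC's patent-pending implementation. The article does not give the fuzzy match formula, so Fuzzy ART's |x ∧ w| / |x| is assumed. The learning step w ← w + rate·(x − w), the vigilance value and the class name are assumptions as well. Inputs are assumed to be scaled to [0, 1] and not all zero.

```python
import math


def fuzzy_match(x, w):
    """Assumed match parameter, borrowed from Fuzzy ART: |x AND w| / |x|."""
    return sum(min(a, b) for a, b in zip(x, w)) / sum(x)


class AdaptiveFuzzyFeatureMapSketch:
    """Hypothetical sketch of the AFFM node-selection loop described in the text."""

    def __init__(self, vigilance=0.9, rate=0.1):
        self.vigilance = vigilance  # pre-defined minimum match (assumed value)
        self.rate = rate            # how far a selected node moves (assumed value)
        self.nodes = []             # each node's weights are a prototype pattern

    def present(self, x):
        """Process one pattern in a single pass; return (node index, is_novel)."""
        # 1. Rank the organized nodes by Euclidean distance to the pattern.
        ranked = sorted(range(len(self.nodes)),
                        key=lambda j: math.dist(x, self.nodes[j]))
        # 2. Starting with the closest node, test the fuzzy match parameter.
        for j in ranked:
            if fuzzy_match(x, self.nodes[j]) >= self.vigilance:
                # 3. Select the node and move it slightly closer to the pattern.
                w = self.nodes[j]
                self.nodes[j] = [wi + self.rate * (xi - wi) for wi, xi in zip(w, x)]
                return j, False
        # 4. No node qualified: organize a new node equal to the pattern.
        self.nodes.append(list(x))
        return len(self.nodes) - 1, True


net = AdaptiveFuzzyFeatureMapSketch(vigilance=0.9, rate=0.1)
for x in ([0.2, 0.8, 0.5], [0.22, 0.78, 0.5], [0.9, 0.1, 0.5]):
    print(net.present(x))  # (0, True), then (0, False), then (1, True)
```

Each pattern is presented exactly once, and the learning step moves weights toward the input rather than taking a minimum. Under these assumptions, a node's prototype does not erode the way a Fuzzy ART node's does.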

AFFM retains the advantages of Fuzzy ART and feature mapping while eliminating their two problem areas. It is still a "one pass" method and as such can be used in real-time and near-real-time systems. Because the weights represent a coordinate in an N-dimensional space, they are easy to translate back to the real-world data space for analysis. Indeed, multi-dimensional displays can be derived directly from AFFM weights. Correlation with events can be added simply as a frequency counter, or rule generation algorithms can be employed.
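One way to read "correlation with events … as a frequency counter" is a pair of tallies per node: how many patterns the node has won, and how many of those were later tied to an event such as an audit finding. The class below is a hypothetical sketch of that idea. The article does not describe AHC's actual bookkeeping.

```python
from collections import Counter


class NodeEventCounter:
    """Hypothetical frequency counter kept alongside a map's nodes.

    For each node it counts how many patterns the node has won and how many
    of those were later tied to an event of interest (e.g. an audit finding).
    """

    def __init__(self):
        self.hits = Counter()    # node index -> patterns assigned to the node
        self.events = Counter()  # node index -> assigned patterns tied to an event

    def record(self, node, event=False):
        self.hits[node] += 1
        if event:
            self.events[node] += 1

    def event_rate(self, node):
        """Fraction of the node's patterns tied to the event (0.0 if unseen)."""
        hits = self.hits[node]
        return self.events[node] / hits if hits else 0.0


counter = NodeEventCounter()
for node, flagged in [(0, False), (0, False), (0, False), (1, True), (1, False)]:
    counter.record(node, flagged)
print(counter.event_rate(0), counter.event_rate(1))  # 0.0 0.5
```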

Adaptive pattern recognition applied to misuse detection

Adaptive pattern recognition also addresses a major shortcoming of supervised modeling methods: data limitation. Because supervised methods need historical cases of both normal and abusive behavior, the supporting databases are very often limiting, even where data appears abundant. While losses from abusive use are significant, abuse is rare relative to legitimate use. Commonly used modeling methods call for a ratio of normal to abnormal cases close to one to one, and experience has shown that three to one is a practical limit.

The FBI estimates that only about two percent of business transactions are fraudulent, and that only a few of those are proven. In a typical credit card setting, only one thousand fraudulent transactions might be found out of one million. Accepted modeling practice allows at most three normal cases per abnormal case, so only four thousand records could then be used for model development. If the same expectations of misuse hold for the University P-Card system, the data will be severely limiting. Adaptive modeling techniques are designed to detect rare and novel events and do not operate under the same constraints as supervised modeling methods: there is no need to balance normal and abnormal transactions.
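The arithmetic behind the four-thousand-record figure can be written out as a small helper. The function name is ours. The transaction counts and the one-to-one and three-to-one ratios are the figures from the text. All known abnormal cases are kept, and normal cases are capped at the allowed ratio.

```python
def usable_records(total, abnormal, max_ratio):
    """Training-set size when normal:abnormal may not exceed max_ratio.

    Every known abnormal case is kept; normal cases are capped at
    max_ratio times that number (or at however many normal cases exist).
    """
    normal_used = min(total - abnormal, max_ratio * abnormal)
    return abnormal + normal_used


total, fraud = 1_000_000, 1_000
for ratio in (1, 3):
    n = usable_records(total, fraud, ratio)
    print(f"{ratio}:1 -> {n} records ({n / total:.1%} of the data)")
# 1:1 -> 2000 records (0.2% of the data)
# 3:1 -> 4000 records (0.4% of the data)
```

Even at the practical three-to-one limit, 99.6% of the available transactions go unused by a supervised model.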

The second major shortcoming of supervised modeling methods is that they are static. Once trained, they are fixed and can accurately recognize only the behaviors presented to them during development. A new behavior will be classified according to the behaviors seen during training, and a misclassification is likely to result. Adaptive modeling methods are fluid. When presented with a new pattern of behavior, they spawn a new class based on it and gain experience about it as more observations are made.
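The closed-world behavior of a static model is easy to demonstrate. In this sketch, a nearest-centroid classifier stands in for supervised methods generally. That is a simplification: a back-propagation network works differently, but it likewise has a fixed set of output classes. The training data and labels are hypothetical.

```python
import math


def train_centroids(examples):
    """Nearest-centroid classifier: average the training vectors of each class."""
    sums, counts = {}, {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, x):
    """A static model can only answer with a label it saw during training."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


training = [
    ([0.1, 0.2], "normal"), ([0.2, 0.1], "normal"),
    ([0.8, 0.9], "abusive"), ([0.9, 0.8], "abusive"),
]
model = train_centroids(training)

# A behavior unlike anything in training still receives a familiar label.
print(classify(model, [0.3, 1.0]))  # abusive
```

The model has no way to answer "this is new". An adaptive method would instead spawn a new class for such a pattern and leave its interpretation to an audit.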

Adaptive pattern recognition technologies primarily sort an amorphous collection of behavioral patterns into a number of relatively homogeneous clusters. Once this has taken place, the results of auditing a small sample of instances in a cluster will in all likelihood apply to the cluster as a whole. This provides for early detection of abusive behavior. P-Card misuse patterns are likely to differ from normal patterns, and an adaptive system will note such a pattern as novel and trigger an alert. The transaction can then be investigated immediately and its legitimacy determined. As the system gains experience, auditing resources can be directed toward the usage behaviors more likely to be abusive. Violators would then face a relatively high probability of being caught, so this kind of audit targeting should ultimately deter P-Card misuse.
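One possible audit-targeting policy along these lines is sketched below. It is hypothetical: the function name, the cluster identifiers and the counts are illustrative, and the article does not specify how audits were prioritized. Clusters that have never been audited, such as one just spawned by a novel pattern, come first. The remaining clusters are ranked by the number of abusive transactions expected if each sample's rate held for the whole cluster. The estimate is a naive point estimate with no correction for small samples.

```python
def audit_priorities(cluster_sizes, audit_results):
    """Order clusters for the next round of audits (hypothetical policy).

    cluster_sizes: {cluster_id: number of transactions in the cluster}
    audit_results: {cluster_id: (abusive_found, sample_size)}, sample_size > 0
    """
    unaudited = [c for c in cluster_sizes if c not in audit_results]
    audited = [c for c in cluster_sizes if c in audit_results]

    def expected_abuse(c):
        # Assume the sampled abuse rate applies to the whole cluster.
        found, sampled = audit_results[c]
        return cluster_sizes[c] * found / sampled

    audited.sort(key=expected_abuse, reverse=True)
    return unaudited + audited


sizes = {"A": 5000, "B": 1200, "C": 40, "D": 3}
audits = {"A": (0, 50), "B": (2, 40), "C": (1, 10)}
print(audit_priorities(sizes, audits))  # ['D', 'B', 'C', 'A']
```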

Advantages of the adaptive approach

The primary advantage of using AFFM is that the University can take a proactive approach to P-Card monitoring. Supervised methods, such as back-propagation neural networks, require a substantial number of proven misuse cases for model development. Artificial misuse instances can be generated, but they are unlikely to reflect real cases accurately. New patterns of P-Card misuse are likely to occur, and because supervised methods produce static models, they are likely to miss them. For P-Card misuse to be detected, it must exhibit a pattern that is somewhat distinctive. Adaptive pattern recognition will sort out the various patterns of use and, when coupled with directed audits, be able to detect abusive behaviors as they happen.

Solution benefits

When developed further, the procurement card transaction monitoring prototype should help organizations to:

  • Migrate to procurement cards sooner because of the reduced risk
  • Raise the transaction limit for procurement card purchases
  • Detect and prevent misuse down to the transaction level
  • Report usage trends and forecast system growth and resource needs
  • Use investigative staff more efficiently through targeted audits

Adapted for newsletter by Eric A. King
