MITIGATING RISK WITH DM WHEN MIGRATING TO PROCUREMENT CARDS
by Ben A. Hitt, Ph.D.
Many large organizations are moving to a system of acquisition through the
use of procurement cards. Purchasing agents use the cards much in the same way
as credit cards or debit cards. Procurement card systems are implemented to
streamline the procurement process and reduce the time and paperwork
associated with the purchase order system used previously. However, while the
procurement card system results in significant savings in time and processing
costs, it suffers from an increased potential for misuse.
Most procurement card systems capture transaction data effectively. This
feature of the system holds a key to mitigating the misuse risk mentioned
above. The commercial credit card industry has long recognized that its
customers establish habitual patterns of use and that any substantial
deviation from such a pattern suggests fraudulent use of a stolen card or
stolen card information. It is likely that purchasing agents who use
procurement cards ("P-Cards") appropriately will also form habitual usage
patterns, and that abusive use patterns will differ from them.
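The idea of a habitual pattern and a deviation from it can be illustrated with a deliberately simple score: how many standard deviations a new purchase amount lies from a cardholder's past amounts. This is a minimal sketch under stated assumptions (a single variable, purchase amount, and a z-score as the deviation measure). It is not the credit card industry's method or AHC's, and the function name and history values are hypothetical.

```python
from statistics import mean, stdev


def deviation_score(history: list[float], amount: float) -> float:
    """Return how many standard deviations `amount` lies from past amounts.

    Illustrative only: one variable (purchase amount) and a z-score stand in
    for the much richer usage profiles a real monitoring system would keep.
    """
    if len(history) < 2:
        raise ValueError("need at least two past transactions")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Every past purchase was identical: any change is maximally unusual.
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma


# Hypothetical past purchase amounts (USD) for a single cardholder.
history = [120.0, 95.0, 130.0, 110.0, 105.0]
print(deviation_score(history, 115.0))   # small: in line with habit
print(deviation_score(history, 2500.0))  # large: a substantial deviation
```

A real system would track many variables per cardholder (vendor, time of day, item mix), which is where the multi-variable methods discussed below come in.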
Therefore, applying fraud detection methods to procurement card use seems
appropriate. However, fraud detection systems to date have relied on the
availability of historical examples of both proven fraudulent use and
non-fraudulent use.
American Heuristics Corporation (AHC), with the support of The Modeling
Agency (TMA), designed a prototype procurement card monitoring solution at
the University of Illinois (UI). UI did not have a historical database to
support misuse detection in the manner associated with accepted fraud
detection practice. To date, no verified abusive transactions have been
captured. This rules out detection systems that require examples of
fraudulent usage patterns. The situation presents both a challenge and an
opportunity. Adaptive pattern recognition technology was proposed as a means
of detecting abusive usage patterns early and directing audits so as to
reduce inappropriate procurement card use.
The need for non-linear pattern-based modeling
As means of data collection have become more capable, the need for non-linear
modeling techniques has become increasingly apparent. Traditional
statistical methods rely on an assumption of linearity. However, since most of
the data collected concerns, or is the result of, human behavior, and humans
rarely behave linearly, methods that assume linear separability are ultimately
doomed to failure. Furthermore, data collection streams are broadening: the
number of variables of concern to modelers has increased by at least an order
of magnitude, and traditional methods simply were not designed to work with
one hundred or more variables.
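To make "linear separability" concrete, consider the textbook exclusive-OR pattern (an illustration chosen here, not drawn from P-Card data). Suppose a transaction should be flagged when exactly one of two yes/no conditions x_1, x_2 holds. No single linear rule of the form w_1 x_1 + w_2 x_2 + b > 0 can make that call, as the four cases show:

```latex
\begin{aligned}
&\text{Flag } (0,1),(1,0):\quad w_2 + b > 0,\;\; w_1 + b > 0
  \;\Rightarrow\; w_1 + w_2 + 2b > 0 \\
&\text{Pass } (0,0),(1,1):\quad b \le 0,\;\; w_1 + w_2 + b \le 0
  \;\Rightarrow\; w_1 + w_2 + 2b \le 0
\end{aligned}
```

The two conclusions contradict each other, so any linear boundary must misclassify at least one of the four cases, while a non-linear model can separate them.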
In answer to this, the last decade has seen the emergence of neural networks
as a means of non-linear modeling. These models grew out of efforts by a
number of cognitive scientists to mimic learning and memory in the human
brain. The back-propagation neural network, in particular, has proven
successful in creating useful models from large masses of complex data. The
algorithm has been applied successfully in a variety of settings, including
direct marketing, intelligence and fraud detection. Because of its pattern
recognition nature, it has proven robust with respect to missing data and
other data irregularities.
Deficiencies in back-propagation
The back-propagation neural network is not without its shortcomings. It
requires large amounts of relevant data on all possible classes. Its
usefulness is limited to the range within which it was trained, and new
patterns are, as often as not, misclassified. The training cycle is highly
iterative and time consuming. Finally, the contribution of individual
variables to the result is difficult to ascertain: the very qualities that
make the algorithm robust make its results nearly impossible to explain.
Overcoming back-prop limitations
There are at least three algorithms that address the deficiencies described
above: Fuzzy Adaptive Resonance Theory (Fuzzy ART) networks, Adaptive Feature
Maps and AHC's Adaptive Fuzzy Feature Map. All three have the advantage that
their weights retain meaning within the real-world data space. That is, by
looking at the vector of weights connected to a particular node, one can
visualize the prototypical pattern that node represents. Thus, explanation
functions that simply compare the weights of two or more nodes and reveal
which variables discriminate between them are easy to write. It is entirely
possible, for example, to de-scale the weights, subject them to a
rule-generating algorithm (ID3, for example) and produce a usable knowledge
base automatically.
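An explanation function of the kind described above can be very short. The sketch below assumes both node weight vectors are in the same scaled data space and simply ranks variables by the size of the gap between the two prototypes. The function name, variable names and weight values are hypothetical.

```python
def discriminating_variables(w_a, w_b, names, top=3):
    """Rank variables by how far apart two node prototypes are on each one.

    Assumes both weight vectors live in the same scaled data space, which is
    the property the text attributes to Fuzzy ART, feature maps and AFFM.
    """
    if not (len(w_a) == len(w_b) == len(names)):
        raise ValueError("weight vectors and names must have the same length")
    gaps = [(name, abs(a - b)) for name, a, b in zip(names, w_a, w_b)]
    gaps.sort(key=lambda item: item[1], reverse=True)
    return gaps[:top]


# Hypothetical prototypes over variables scaled to [0, 1].
names = ["amount", "hour_of_day", "weekend_share", "line_items"]
typical_node = [0.10, 0.45, 0.05, 0.30]
unusual_node = [0.85, 0.90, 0.95, 0.30]
for name, gap in discriminating_variables(typical_node, unusual_node, names):
    print(f"{name}: {gap:.2f}")
```

Because the gaps are measured in the data space itself, the top-ranked variables can be read directly as the reasons the two nodes differ; a variable with a zero gap (here `line_items`) plays no part in the distinction.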
The three methods are not limited by the training data available. Each
contains a mechanism for recognizing novel patterns in the data stream to
which it is attached, so each is capable of continual learning and retention
of experience. Proper use of this feature means that training is a "one pass"
procedure: each pattern vector needs to be presented only once in order to be
classified appropriately and to update a set of weights. While the algorithms
become more adept at their job over time, they also produce meaningful
results almost immediately.
Problems with Fuzzy ART and Feature Mapping
Both Fuzzy ART and Adaptive Feature Maps have inherent defects that limit
their use. In the case of Fuzzy ART, the problem lies in the learning
function. The fuzzy AND operator (min(x,y)) is a major factor in the formula,
and its use results in a constant trend toward minimization. Consequently,
details of the pattern are lost over time and inappropriate pattern
proliferation occurs. That is, pattern vectors are seen as novel when they
would have been assigned to a specific class at an earlier time.
The primary problem with feature mapping is that it uses distance as the
classification determinant. Patterns that are close in a Euclidean sense are
grouped together in the same class, yet it is easy to envision two distinct
patterns that are close enough in Euclidean distance to be placed in the same
class. It is important to note that Fuzzy ART's fuzzy pattern match does not
produce the misclassifications seen in feature mapping, and feature mapping
does not degrade the prototypical pattern as Fuzzy ART does.
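Both defects can be seen in a few lines. The sketch below uses the standard Fuzzy ART learning rule, w_new = beta * (I AND w) + (1 - beta) * w, with the fuzzy AND taken as the component-wise minimum; because min(I, w) never exceeds w, no weight can ever grow. The second part is an illustration chosen here of the distance problem: a sharp change in one variable and a mild drift in every variable can sit at exactly the same Euclidean distance from a prototype. The beta value, function names and all vectors are hypothetical.

```python
import math


def fuzzy_and(x, y):
    """Fuzzy AND: the component-wise minimum of two vectors."""
    return [min(a, b) for a, b in zip(x, y)]


def fuzzy_art_learn(w, pattern, beta=0.5):
    """Fuzzy ART learning rule: w_new = beta * (I AND w) + (1 - beta) * w."""
    return [beta * m + (1 - beta) * wi for m, wi in zip(fuzzy_and(pattern, w), w)]


# 1. Erosion: even inputs above the prototype cannot pull its weights up.
w = [0.8, 0.6, 0.7]
for pattern in [[0.7, 0.65, 0.75], [0.85, 0.55, 0.7], [0.75, 0.6, 0.65]]:
    new_w = fuzzy_art_learn(w, pattern)
    assert all(n <= o for n, o in zip(new_w, w))  # no component ever grows
    w = new_w
print("prototype after three updates:", w)

# 2. Distance alone: a sharp change in one variable and a mild drift in all
# variables are equally far from the prototype in Euclidean terms.
prototype = [0.2, 0.2, 0.2, 0.2]
one_sharp_change = [0.2, 0.2, 0.2, 0.6]
mild_drift = [0.4, 0.4, 0.4, 0.4]
print(math.dist(prototype, one_sharp_change), math.dist(prototype, mild_drift))
```

A distance threshold that admits one of these two patterns must admit the other, even though they describe quite different behavior.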
The Adaptive Fuzzy Feature Map
Recognizing the problems associated with both Fuzzy ART and feature maps,
AHC has developed an algorithm that combines elements of Fuzzy ART and
feature mapping so that the major problems of each are avoided. The Adaptive
Fuzzy Feature Map (AFFM, patent pending) uses Euclidean distance as a ranking
parameter and in the learning function, so it retains pattern detail over
time. It employs fuzzy pattern matching to determine which node a pattern
vector is associated with. In the node selection process, distance and fuzzy
pattern matching are resonated as follows. First, each of the organized nodes
is ranked according to how close it is to the current pattern vector. Then,
beginning with the closest node, a fuzzy match parameter is computed. If the
match parameter meets or exceeds a predefined minimum, the node is selected
and its weights are adjusted to move the node slightly closer to the pattern.
If no node is selected, a new node is organized whose weights equal the
pattern vector.
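The node selection steps above can be sketched as follows. This is a minimal sketch based only on the description in this article, not AHC's patent-pending implementation. The fuzzy match parameter is assumed to be the Fuzzy ART match ratio |I AND w| / |I| (Fuzzy ART normally applies complement coding first, which is omitted here), and the names `affm_present`, `vigilance` and `rate`, together with their default values, are hypothetical.

```python
import math


def fuzzy_match(pattern, weights):
    """Assumed match parameter |I AND w| / |I|, borrowed from Fuzzy ART."""
    total = sum(pattern)
    if total == 0:
        return 1.0
    return sum(min(p, w) for p, w in zip(pattern, weights)) / total


def affm_present(nodes, pattern, vigilance=0.8, rate=0.1):
    """Present one pattern vector once; return (node index, is_new_node).

    nodes is a list of weight vectors and is modified in place.
    """
    # 1. Rank the existing nodes by Euclidean distance to the pattern.
    order = sorted(range(len(nodes)), key=lambda j: math.dist(nodes[j], pattern))
    # 2. Starting with the closest, take the first node whose match passes.
    for j in order:
        if fuzzy_match(pattern, nodes[j]) >= vigilance:
            # 3. Move the selected node slightly closer to the pattern.
            nodes[j] = [w + rate * (p - w) for w, p in zip(nodes[j], pattern)]
            return j, False
    # 4. No node qualified: organize a new node equal to the pattern.
    nodes.append(list(pattern))
    return len(nodes) - 1, True


nodes = []
for x in [[0.2, 0.3, 0.1], [0.22, 0.28, 0.12], [0.9, 0.8, 0.95]]:
    node, is_new = affm_present(nodes, x)
    print(node, "new" if is_new else "existing")
```

Each vector is presented exactly once, which is what makes the method usable as a "one pass" procedure: the second vector refines the first node, while the third, being unlike anything seen so far, spawns a node of its own.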
AFFM retains the advantages of Fuzzy ART and feature mapping while
eliminating their two problem areas. It is still a "one pass" method and as
such can be used in real-time and near-real-time systems. Because the weights
represent a coordinate in an N-dimensional space, it is easy to translate them
back to the real-world data space for analysis. Indeed, multi-dimensional
displays may be derived directly from AFFM weights. Correlation with events
can be added simply as a frequency counter, and rule generation algorithms
may be employed as well.
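Translating weights back to the data space and correlating nodes with events can be sketched as below, assuming the inputs were min-max scaled to [0, 1] before training (the article does not say how AHC scaled them). The variable ranges, node assignments and flags are hypothetical.

```python
from collections import Counter


def descale(weights, lows, highs):
    """Invert min-max scaling: map [0, 1] weights back to original units."""
    return [lo + w * (hi - lo) for w, lo, hi in zip(weights, lows, highs)]


# Hypothetical node prototype over (amount in USD, hour of day), scaled to [0, 1].
prototype = [0.5, 0.25]
print(descale(prototype, lows=[0.0, 0.0], highs=[1000.0, 24.0]))

# Frequency counter: how often each node co-occurs with a flagged event.
flag_counts = Counter()
for node, flagged in [(0, False), (1, True), (1, True), (0, False), (1, False)]:
    if flagged:
        flag_counts[node] += 1
print(flag_counts.most_common())
```

The de-scaled prototype reads as a typical transaction (here, a purchase of about 500 dollars at 6 a.m.), and the counter shows which nodes tend to accompany flagged events.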
Adaptive pattern recognition applied to misuse detection
Adaptive pattern recognition also addresses a major shortcoming of
supervised modeling methods, namely data limitation. Because supervised
modeling methods need historical cases of both normal and abusive behavior,
the supporting databases are very often limiting even where data appear
abundant. While abusive use losses are significant, abuse is rare relative to
legitimate use. Commonly used modeling methods ideally require a ratio of
normal cases to abnormal cases close to one to one; experience has shown that
three to one is a practical limit.
The FBI estimates that only about two percent of business transactions are
fraudulent, and that only a few of those are proven. In a typical credit card
setting, only one thousand fraudulent transactions might be found out of one
million transactions. Following the accepted modeling guideline of at most
three normal cases per abnormal case, only four thousand records (one
thousand fraudulent and three thousand normal) can be used for model
development. If the same expectations of misuse hold for the University
P-Card system, the data will be severely limiting. Adaptive modeling
techniques are designed to detect rare and novel events and do not operate
under the same conditions as supervised modeling methods; there is no need to
balance normal and abnormal transactions.
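The arithmetic behind the four-thousand-record limit is short enough to write out. The function name is hypothetical; the figures are the credit card example given in the text.

```python
def usable_training_records(n_abnormal: int, max_ratio: int = 3) -> int:
    """Records usable for supervised training when normal cases may
    outnumber abnormal cases by at most max_ratio to one."""
    return n_abnormal * (1 + max_ratio)


total_transactions = 1_000_000
fraud_found = 1_000  # figures from the credit card example in the text
usable = usable_training_records(fraud_found)
print(usable, f"{usable / total_transactions:.1%} of all transactions")
```

In other words, a supervised model built under the three-to-one guideline would learn from well under one percent of the available transactions.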
The second major shortcoming of supervised modeling methods is that they are
static. Once trained, they are fixed and only able to accurately recognize
behaviors presented to them during development. When presented with new
behaviors, the model will classify them according to the behaviors presented
during training. A misclassification is likely to result. Adaptive modeling
methods are fluid. If a new pattern of behavior is presented to them, they
spawn a new class based on that behavior and gain experience about it as more
observations are made.
Adaptive pattern recognition technologies primarily sort an amorphous
collection of behavioral patterns into a number of relatively homogeneous
clusters. Once this has taken place, the results of auditing a small sample
of instances in a cluster in all likelihood apply to the cluster as a whole.
This provides for early detection of abusive behavior. It is likely that
P-Card misuse patterns will differ from normal patterns; an adaptive system
will note such a pattern as novel and trigger an alert. The transaction can
then be investigated immediately and its legitimacy determined. As the system
gains experience, auditing resources can be directed toward the usage
behaviors most likely to be abusive. This kind of audit targeting should
ultimately deter P-Card misuse because violators face a relatively high
probability of being caught.
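The audit targeting described above can be sketched as a simple ordering rule, assuming each cluster is homogeneous enough that a sample audit speaks for the whole cluster. Clusters that have never been audited (for example, ones just spawned by a novel pattern) come first, and the rest follow by observed abuse rate. The function names, cluster labels and audit figures are hypothetical.

```python
def audit_order(audits, cluster_ids):
    """Order clusters for auditing.

    audits maps cluster id -> (transactions audited, found abusive).
    Never-audited clusters come first; the rest follow by observed
    abuse rate, highest first.
    """
    def priority(cid):
        audited, abusive = audits.get(cid, (0, 0))
        if audited == 0:
            return (0, 0.0)
        return (1, -abusive / audited)
    return sorted(cluster_ids, key=priority)


def estimated_abusive(cluster_size, audited, abusive):
    """Extrapolate a sample audit to the whole cluster, assuming homogeneity."""
    return abusive * cluster_size / audited


# Hypothetical clusters: sizes and audit results.
sizes = {"A": 5000, "B": 1200, "C": 15}
audits = {"A": (50, 0), "B": (40, 4)}
print(audit_order(audits, sizes))  # "C" (never audited) comes first
print(estimated_abusive(sizes["B"], *audits["B"]))
```

Ranking by rate rather than by raw count keeps a small but suspicious cluster from being buried under a large, clean one.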
Advantages of the adaptive approach
The primary advantage of using AFFM is that the University can take a
proactive approach to P-Card monitoring. Supervised methods, such as neural
networks, require a substantial number of proven misuse cases for model
development. It is possible to generate artificial misuse instances, but they
are unlikely to reflect real cases accurately. New patterns of P-Card misuse
are likely to occur, and because supervised methods produce static models,
they are likely to miss them. For P-Card misuse to be detected, it must
exhibit a somewhat distinctive pattern. Adaptive pattern recognition will
sort out the various patterns of use and, when coupled with directed audits,
will be able to detect abusive behaviors as they happen.
Solution benefits
When developed further, the procurement card transaction monitoring
prototype should help organizations to:
- Migrate to procurement cards sooner because of the reduced risk
- Raise the transaction limit for procurement card purchases
- Detect and prevent misuse down to the transaction level
- Report usage trends and forecast system growth and resource needs
- Use investigative staff more efficiently through targeted audits
Adapted for newsletter by Eric A. King