Data mining techniques for risk assessment (1998-2000)

Funding body: The Royal Mail

The project deals with the use of data mining techniques on the Royal Mail risk database. Royal Mail supplied a sample database, on which the techniques produced encouraging results. Each of the data mining techniques employed attempts to find patterns and trends in a database with greater accuracy than standard statistical techniques, and is designed to find relationships between seemingly unrelated sets of data. For each Post Office, the risk database records several attributes and the number of incidents the office has suffered in the past three years. The task of the data mining tool is to find what reasons, if any, make one office more prone to incidents than another. The error rate on the Royal Mail database differed from technique to technique.

Typically, the error was around 20-25%, depending on the technique used. The inclusion of postcode information in the South West database, late in the project, has yielded errors of less than 1%. Each technique produces a very different form of output, and collating this information is one of the difficult points of the project. Generally, the techniques output a set of rules in IF ... AND ... THEN format. These rules are understandable, but when there are many of them, applying them to new cases is tedious. Using the results therefore requires some automation, especially with rulesets of 50 rules or more. To enable this, an easy-to-use program was developed that allows users to test a new Post Office against the results from the data mining techniques. It is hoped that this flexible piece of software will accept the results of any of the techniques and test any office, regardless of how much information about it is known.
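The rule-testing step described above can be illustrated with a minimal Python sketch. It is not the program developed in the project. The Rule class, the apply_rules function, and the attribute names and values (area_type, late_opening, security_screen) are all hypothetical, and it assumes that each condition is an exact match on a single attribute. Rules whose conditions cannot be checked because an attribute is unknown are reported separately, which is one possible way of testing an office about which only partial information is available.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    """One IF ... AND ... THEN rule (hypothetical representation)."""
    conditions: Dict[str, str]  # attribute name -> required value (the IF/AND part)
    conclusion: str             # the THEN part


def apply_rules(rules: List[Rule], office: Dict[str, str]) -> Tuple[List[Rule], List[Rule]]:
    """Split rules into those that fire and those that stay undecided.

    A rule fires when every condition is known and matches. A rule is
    dropped when any known attribute contradicts it. Otherwise it is
    undecided, because at least one attribute of the office is missing.
    """
    fired: List[Rule] = []
    undecided: List[Rule] = []
    for rule in rules:
        contradicted = any(
            attr in office and office[attr] != value
            for attr, value in rule.conditions.items()
        )
        if contradicted:
            continue
        if all(attr in office for attr in rule.conditions):
            fired.append(rule)
        else:
            undecided.append(rule)
    return fired, undecided


# Hypothetical ruleset and office, for illustration only.
EXAMPLE_RULES = [
    Rule({"area_type": "urban", "late_opening": "yes"}, "higher risk"),
    Rule({"area_type": "rural"}, "lower risk"),
    Rule({"area_type": "urban", "security_screen": "no"}, "higher risk"),
]

office = {"area_type": "urban", "late_opening": "yes"}  # security_screen unknown
fired, undecided = apply_rules(EXAMPLE_RULES, office)

for rule in fired:
    print("Fired:", rule.conditions, "->", rule.conclusion)
for rule in undecided:
    print("Needs more information:", rule.conditions)
```

Keeping undecided rules separate, rather than treating a missing attribute as a mismatch, shows the user which extra facts about an office would settle the remaining rules.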
