Inferring Interpretable Pattern-Based Classification Models from Data Using Frequent Pattern Discovery Algorithms and Maximum Entropy Probability Distributions

One of the central challenges in business and medical applications is the classification problem. Classification is the development of a mapping from a set of measurable characteristics, or attributes of an object to a class label for the object. Examples are using credit card histories to identify fraudulent use, or mapping the results of a patient’s medical tests to their diagnosis. The field of machine learning has developed tools to infer such classification mappings from a collection of objects, called the training data for which both the attributes and the class label are known.

In our research, we have developed a novel combination of machine learning and statistical techniques that generate classification mappings that are straightforward to interpret and capable of classification accuracies rivaling less interpretable methods, such as neural networks and support vector machines. The classification algorithm is a combination of frequent pattern discovery tools with maximum entropy probability models. Frequent pattern discovery tools allow the discovery and enumeration of frequent multi-variate patterns in data, while maximum entropy models are used to build Bayesian probability models from these multi-variate patterns. It is our thesis that this combination of approaches allows users to generate high accuracy classification mappings which are interpretable by the user. It is also our thesis that the inference of such mappings is practicable, requiring run times comparable to other machine learning methods.

In our experiments, we found that combining frequent patterns and maximum entropy models to generate classifier mappings produces accurate classifiers on a wide variety of classification problems. The relatively simple parameterized form of the classifier mappings allowed us to immediately interpret the impact and importance of each selected pattern on the class predictions. Finally, with careful implementation, the time required to infer these mappings is comparable with other classification algorithms.