Boosting association rule mining in large datasets via Gibbs sampling.

Proceedings of the National Academy of Sciences of the United States of America

PubMedID: 27091963

Qian G, Rao CR, Sun X, Wu Y. Boosting association rule mining in large datasets via Gibbs sampling. Proc Natl Acad Sci USA. 2016;.

Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. Unless the search space is constrained, they can be computationally intractable even for a dataset containing just a few hundred transaction items. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure that randomly samples association rules from the itemset space, and we perform rule mining on the reduced transaction dataset generated by the sample. We also propose a general rule importance measure to direct the stochastic search. Because the randomly generated association rules constitute an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In a simulation study and a real genomic data example, we show how to boost association rule mining through an integrated use of the stochastic search and the Apriori algorithm.
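
The general idea of the stochastic search can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the importance measure (here simply the support of an itemset with at least two items), the target distribution proportional to exp(xi * importance), the tuning parameter xi, the toy transactions, and all function names. The brute-force enumeration at the end is a stand-in for running Apriori on the reduced dataset.

```python
import math
import random
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def importance(transactions, itemset):
    """Hypothetical importance measure (an assumption, not the paper's):
    the support of the itemset, or 0 if it is too small to form a rule."""
    return support(transactions, itemset) if len(itemset) >= 2 else 0.0

def gibbs_sample_itemsets(transactions, items, n_sweeps=200, xi=10.0, seed=0):
    """Gibbs sampler over itemsets with target proportional to
    exp(xi * importance). Each sweep resamples every item's inclusion
    indicator from its conditional distribution given the other items."""
    rng = random.Random(seed)
    current = set()
    samples = []
    for _ in range(n_sweeps):
        for item in items:
            g_with = importance(transactions, frozenset(current | {item}))
            g_without = importance(transactions, frozenset(current - {item}))
            # P(include | rest) = e^{xi*g_with} / (e^{xi*g_with} + e^{xi*g_without})
            p_include = 1.0 / (1.0 + math.exp(xi * (g_without - g_with)))
            if rng.random() < p_include:
                current.add(item)
            else:
                current.discard(item)
        samples.append(frozenset(current))
    return samples

def most_sampled_items(samples, top_k):
    """Rank items by how often they occur in the sampled itemsets."""
    counts = {}
    for s in samples:
        for item in s:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=lambda i: (-counts[i], i))[:top_k]

def frequent_itemsets(transactions, items, min_support):
    """Brute-force enumeration over the reduced item list
    (a simple stand-in for Apriori, feasible only for few items)."""
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(transactions, frozenset(combo))
            if s >= min_support:
                found[frozenset(combo)] = s
    return found

# Toy data (made up for illustration): items "a" and "b" co-occur often.
transactions = (
    [frozenset("ab")] * 12 + [frozenset("abc")] * 6
    + [frozenset("cd"), frozenset("de")]
)
items = ["a", "b", "c", "d", "e"]

samples = gibbs_sample_itemsets(transactions, items)
kept = most_sampled_items(samples, top_k=2)
# Project each transaction onto the retained items, then mine the reduced data.
reduced = [t & frozenset(kept) for t in transactions]
frequent = frequent_itemsets(reduced, kept, min_support=0.5)
```

In this sketch a larger xi concentrates the chain on high-importance itemsets, while a smaller xi lets it explore more widely. The choice of top_k and min_support is likewise arbitrary here and would need tuning on real data.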