
Discovering Frequent Episodes and Learning Hidden Markov Models: A Formal Connection

Laxman, Srivatsan and Sastry, PS and Unnikrishnan, KP (2005) Discovering Frequent Episodes and Learning Hidden Markov Models: A Formal Connection. In: IEEE Transactions on Knowledge and Data Engineering, 17 (11). pp. 1505-1517.




This paper establishes a formal connection between two common but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To establish this relationship, we define a new measure of the frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies of a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of the frequent episodes discovered, and we illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
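
The frequency measure described in the abstract, counting nonoverlapping occurrences, can be illustrated with a minimal sketch. It assumes a single serial episode (an ordered tuple of event types) and a plain sequence of events without timestamps; the function name `count_nonoverlapping` and this greedy single-episode scan are illustrative assumptions, not the paper's algorithm, which counts frequencies for a whole set of episodes.

```python
from typing import Hashable, Sequence


def count_nonoverlapping(events: Sequence[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Count nonoverlapping occurrences of a serial episode.

    Hypothetical helper for illustration: scans the sequence once,
    waits for the episode's event types in order, and starts over
    after each completed occurrence so that occurrences never share
    or interleave events.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0
    pos = 0  # index of the next event type we are waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # A full occurrence is complete; reset for the next one.
                count += 1
                pos = 0
    return count


if __name__ == "__main__":
    stream = list("ABCABC")
    print(count_nonoverlapping(stream, ("A", "B", "C")))
```

Restarting the scan only after an occurrence completes is what keeps the counted occurrences from overlapping; events that arrive out of order are simply skipped.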

Item Type: Journal Article
Publication: IEEE Transactions on Knowledge and Data Engineering
Publisher: IEEE
Additional Information: ©1990 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Keywords: Temporal data mining; sequential data; frequent episodes; hidden Markov models; statistical significance
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 02 Jan 2006
Last Modified: 19 Sep 2010 04:22
URI: http://eprints.iisc.ac.in/id/eprint/4601
