Fast mining of frequent tree structures by hashing and indexing
In: Information and Software Technology. ELSEVIER SCIENCE BV: Amsterdam. ISSN 0950-5849; e-ISSN 1873-6025
| Author keywords |
Tree mining, Hashing, Semistructured data, Association rules |
| Authors |
- Katsaros, D.
- Nanopoulos, A.
- Manolopoulos, Y.
| Abstract |
Hierarchical semistructured data arise frequently on the Web and in biological information processing applications. Semistructured objects describing the same type of information have similar but not identical structure, and usually share a common ‘schema’. Finding the common schema of a collection of semistructured objects is an important task, and because of the huge volume of such data, data mining techniques have been employed. In this paper, we study the problem of discovering frequently occurring structures in semistructured objects using the notion of association rules. We identify that discovering the frequent structures in the early phases of the mining procedure is the dominant cost, and we provide a fast algorithm addressing this issue. We present experimental results that demonstrate the superiority of the proposed algorithm and its efficiency in dramatically reducing the processing cost. |