Multicore architectures have become the norm for computing systems in recent years because they provide CPU-level support for parallelism. However, existing algorithms for processing XML streams do not take full advantage of this capability, since they were not designed to run in parallel. In this article, we propose several methods for efficiently parallelizing the finite state automata (FSA)-based XML stream processing technique. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel by exploiting index-level parallelism. Each core works only with its own query index, so no synchronization issues arise while filtering XML streams against the multiple path patterns given by users. We also present an in-memory MapReduce model that allows a large collection of twig pattern joins over XML streams to be processed simultaneously. In our approach, twig pattern joins are performed by multiple hardware threads in a shared and balanced way. Extensive experiments show that, on an 8-core CPU, our algorithm outperforms conventional algorithms by up to ten times when processing 10 million XPath expressions over XML streams.
Keywords
stream data processing; XML; query processing; parallel processing; multicore
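
The index-level parallelism summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: it assumes path queries that use only the child axis (`/a/b`), represents each query index as a plain trie rather than a full FSA, and models the XML stream as a list of `("start", tag)` and `("end", tag)` events. The names `build_index`, `filter_stream`, and `parallel_filter` are hypothetical. Each worker reads the shared event list but touches only its own index, so no locking is needed. Note that CPython threads do not run Python bytecode in parallel, so the sketch shows the partitioning scheme, not the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(queries):
    """Build a trie over child-axis-only paths (a simplified query index)."""
    root = {"children": {}, "accepts": []}
    for q in queries:
        node = root
        for step in q.strip("/").split("/"):
            node = node["children"].setdefault(
                step, {"children": {}, "accepts": []}
            )
        node["accepts"].append(q)  # this state accepts query q
    return root


def filter_stream(index, events):
    """Run one index over the event stream and return the matched queries."""
    matched = set()
    stack = [index]  # one automaton state per open element; None = dead state
    for kind, tag in events:
        if kind == "start":
            state = stack[-1]
            nxt = state["children"].get(tag) if state is not None else None
            if nxt is not None:
                matched.update(nxt["accepts"])
            stack.append(nxt)
        else:  # "end": return to the parent element's state
            stack.pop()
    return matched


def parallel_filter(queries, events, workers=4):
    """Partition queries into one index per worker and filter in parallel."""
    shards = [queries[i::workers] for i in range(workers)]  # round-robin split
    indexes = [build_index(shard) for shard in shards]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task owns exactly one index; the event list is only read.
        results = list(pool.map(filter_stream, indexes, [events] * workers))
    return set().union(*results)
```

For example, given the events for `<a><b/><c/></a>` and the queries `/a`, `/a/b`, `/a/d`, `/b`, and `/a/c`, calling `parallel_filter` with two workers returns the three queries that match: `/a`, `/a/b`, and `/a/c`.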