We have designed and implemented a new data processing framework called “Many-task computing On HAdoop”
(MOHA), which aims to effectively support fine-grained many-task applications, another type of data-intensive
workload, on the YARN-based Hadoop 2.0 platform. MOHA is developed as a Hadoop YARN application
so that it can transparently co-host existing many-task computing (MTC) applications with other data processing workflows,
such as MapReduce, in a single Hadoop cluster. In this paper, we investigate the main characteristics of two well-known open-source
message broker middleware systems (Apache ActiveMQ and Kafka) and their implications for the many-task management
scheme in our MOHA framework. Through extensive experiments with a real MTC application, we
demonstrate and discuss the trade-offs between parallelism and load balancing of data access patterns in message broker
middleware systems for Many-Task Computing on Hadoop.