Introduction to HiBench
HiBench-based data generation methods are provided so that you can generate large-scale datasets for algorithm performance tests.
HiBench is a big data benchmark suite for evaluating big data platforms in terms of throughput, computation speed, and system resource utilization. It contains a set of Hadoop, Spark, and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, K-means, NWeight, and enhanced DFSIO. It also contains streaming workloads for several streaming frameworks: Spark Streaming, Flink, Storm, and Gearpump.
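As a rough illustration of how these workloads relate to data generation, the sketch below builds the path of the per-workload data preparation script. It assumes the directory layout of the HiBench 7.x source tree (`bin/workloads/<category>/<workload>/prepare/prepare.sh`). The category mapping, the helper name `prepare_script`, and the installation path `/opt/HiBench` are illustrative assumptions, so check them against your own HiBench installation.

```python
from pathlib import PurePosixPath

# Assumed category for each workload, modeled on the HiBench 7.x source tree.
# This mapping is illustrative and covers only a subset of the workloads.
WORKLOAD_CATEGORY = {
    "sort": "micro",
    "wordcount": "micro",
    "terasort": "micro",
    "bayes": "ml",
    "kmeans": "ml",
    "pagerank": "websearch",
    "nweight": "graph",
}


def prepare_script(hibench_home: str, workload: str) -> PurePosixPath:
    """Return the assumed path of the data generation script for a workload."""
    name = workload.lower()
    if name not in WORKLOAD_CATEGORY:
        raise KeyError(f"unknown workload: {workload}")
    return (
        PurePosixPath(hibench_home)
        / "bin"
        / "workloads"
        / WORKLOAD_CATEGORY[name]
        / name
        / "prepare"
        / "prepare.sh"
    )


if __name__ == "__main__":
    # "/opt/HiBench" is a hypothetical installation directory.
    print(prepare_script("/opt/HiBench", "kmeans"))
```

In HiBench, the size of the generated dataset is typically controlled by the `hibench.scale.profile` setting in `conf/hibench.conf`. Confirm the available profile names in the configuration files shipped with your version.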