Generating Data

Prerequisites

  • Hadoop and HBase have been deployed in the cluster.
  • The BulkLoad package has been downloaded and decompressed.

Procedure

Execute the put_data_byStage.py script to generate 1 TB of data. To speed up data generation, store the data on multiple drives. This document uses 11 drives (data2 to data12 in the /srv/BigData/hadoop directory) as an example, generating 11 CSV files, one per drive.

python put_data_byStage.py 1 1073741824 \
  /srv/BigData/hadoop/data2/Bulkload_data2.csv \
  /srv/BigData/hadoop/data3/Bulkload_data3.csv \
  /srv/BigData/hadoop/data4/Bulkload_data4.csv \
  /srv/BigData/hadoop/data5/Bulkload_data5.csv \
  /srv/BigData/hadoop/data6/Bulkload_data6.csv \
  /srv/BigData/hadoop/data7/Bulkload_data7.csv \
  /srv/BigData/hadoop/data8/Bulkload_data8.csv \
  /srv/BigData/hadoop/data9/Bulkload_data9.csv \
  /srv/BigData/hadoop/data10/Bulkload_data10.csv \
  /srv/BigData/hadoop/data11/Bulkload_data11.csv \
  /srv/BigData/hadoop/data12/Bulkload_data12.csv
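The internals of put_data_byStage.py are not shown here, but the sharding idea it relies on can be sketched in a few lines: rows are written round-robin across one CSV file per drive, so the I/O load is spread evenly. The function name `generate_csv_shards` and its parameters below are illustrative assumptions, not the script's actual interface.

```python
import csv

def generate_csv_shards(total_rows, paths, row_width=100):
    """Write total_rows synthetic rows round-robin across the given CSV paths.

    Hypothetical stand-in for put_data_byStage.py: each row is a padded key
    plus a fixed-width payload, so row size (and total volume) is predictable.
    """
    payload = "x" * row_width
    files = [open(p, "w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        for i in range(total_rows):
            # One file per drive; round-robin keeps the shards evenly sized.
            writers[i % len(writers)].writerow(["row%010d" % i, payload])
    finally:
        for f in files:
            f.close()
```

With 11 paths (one per data2..data12 drive) and a row count chosen so that rows × row size approaches 1 TB, this produces the same on-disk layout the command above expects.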