Prerequisites

  • Hadoop 2.x or later cluster
  • Hive 1.1 (on Tez)
  • Impala 2.2
  • Presto 0.125
  • Drill 1.1
  • SparkSQL 1.6

  • Test environment: 3 physical servers

    • server1 : CPU total : 32 ; mem total : 126 GB ; HDFS disk : 22.4 TiB
    • server2 : CPU total : 32 ; mem total : 126 GB ; HDFS disk : 7.7 TiB
    • server3 : CPU total : 8 ; mem total : 126 GB ; HDFS disk : 500.7 GiB

Compile and package the appropriate data generator.

[whoami@apache-server gitlib]$ git clone https://github.com/hortonworks/hive-testbench.git
[whoami@apache-server gitlib]$ cd hive-testbench/

[whoami@apache-server hive-testbench]$ ./tpcds-build.sh

[whoami@apache-server gitlib]$ tar -zcvf hive-testbench.tar.gz hive-testbench

[whoami@apache-server gitlib]$ du -sh hive-testbench.tar.gz
133M    hive-testbench.tar.gz

Generate and load the data.

[hadoop@server1 tpch]$ tar -zxvf hive-testbench.tar.gz

[hadoop@server1 tpch]$ cd hive-testbench

[hadoop@server1 hive-testbench]$ cat test.sh
FORMAT=rcfile ./tpcds-setup.sh 1000

[hadoop@server1 hive-testbench]$ nohup sh test.sh > test.log &
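test.sh above pins FORMAT=rcfile. Upstream hive-testbench selects the Hive storage format through that same environment variable, so other formats (for example ORC) can be generated the same way; the script name gen_orc.sh below is just an illustration, not part of the repository.

```shell
# Hypothetical variant of test.sh: same tpcds-setup.sh interface,
# but generating ORC tables instead of RCFile. The argument (1000)
# is the target scale factor in GB.
cat > gen_orc.sh <<'EOF'
#!/bin/sh
FORMAT=orc ./tpcds-setup.sh 1000
EOF
```

Run it the same way as test.sh (e.g. under nohup), since generation at scale 1000 takes hours.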

[hadoop@server1 hive-testbench]$ sh tpc-ds-test.sh
ls: `/tmp/tpcds-generate/1000': No such file or directory
Generating data at scale factor 1000.
WARNING: Use "yarn jar" to launch YARN applications.
15/11/23 01:28:39 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/
15/11/23 01:28:40 INFO client.RMProxy: Connecting to ResourceManager at server2/192.168.111.201:8050
15/11/23 01:28:40 INFO input.FileInputFormat: Total input paths to process : 1
15/11/23 01:28:40 INFO mapreduce.JobSubmitter: number of splits:1000
15/11/23 01:28:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448155433956_0001
15/11/23 01:28:41 INFO impl.YarnClientImpl: Submitted application application_1448155433956_0001
15/11/23 01:28:41 INFO mapreduce.Job: The url to track the job: http://server2:8088/proxy/application_1448155433956_0001/
15/11/23 01:28:41 INFO mapreduce.Job: Running job: job_1448155433956_0001
15/11/23 01:28:57 INFO mapreduce.Job: Job job_1448155433956_0001 running in uber mode : false
15/11/23 01:28:57 INFO mapreduce.Job:  map 0% reduce 0%
15/11/23 01:31:53 INFO mapreduce.Job:  map 1% reduce 0%
TPC-DS text data generation complete.
Loading text data into external tables.
Optimizing table store_sales (1/24).
Optimizing table store_returns (2/24).
Optimizing table web_sales (3/24).
Optimizing table web_returns (4/24).
Optimizing table catalog_sales (5/24).
Optimizing table catalog_returns (6/24).
Optimizing table inventory (7/24).
Optimizing table date_dim (8/24).
Optimizing table time_dim (9/24).
Optimizing table item (10/24).
Optimizing table customer (11/24).
Optimizing table customer_demographics (12/24).
Optimizing table household_demographics (13/24).
Optimizing table customer_address (14/24).
Optimizing table store (15/24).
Optimizing table promotion (16/24).
Optimizing table warehouse (17/24).
Optimizing table ship_mode (18/24).
Optimizing table reason (19/24).
Optimizing table income_band (20/24).
Optimizing table call_center (21/24).
Optimizing table web_page (22/24).
Optimizing table catalog_page (23/24).
Optimizing table web_site (24/24).
Data loaded into database tpcds_bin_partitioned_rcfile_1000.
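Before moving on to the query runs, it is worth confirming the load actually completed. A minimal sanity check, assuming the Hive CLI is on the PATH and using the database name printed above:

```shell
# Hypothetical sanity-check script: list the 24 tables and count rows
# in a small dimension table. This only verifies the load; it is not
# part of hive-testbench.
cat > check_load.sh <<'EOF'
#!/bin/sh
hive -e "USE tpcds_bin_partitioned_rcfile_1000;
         SHOW TABLES;
         SELECT COUNT(*) FROM date_dim;"
EOF
```

A non-empty table list and a non-zero count indicate the external-to-optimized table conversion finished for at least the dimension tables.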

Run queries.

Hive-tpcds testing:
    Modify the runSuite.pl script to point at the Hive database you generated:
    
    [hadoop@bigdata-test-server1 hive-testbench]$ cat runSuite.pl |grep tpcds_bin_partitioned_
        'tpcds' => "tpcds_bin_partitioned_rcfile_$scale",

    [hadoop@bigdata-test-server1 hive-testbench]$ sh ./runSuite_hive.sh 

Impala-tpcds testing:
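Impala reads the same Hive-created tables, but its catalog must be refreshed first. A minimal sketch, assuming impala-shell is installed on a node running impalad and that the sample-queries-tpcds directory from hive-testbench is available; the host name server1 and the choice of query55.sql are illustrative only:

```shell
# Hypothetical Impala smoke test: INVALIDATE METADATA makes the
# Hive-created TPC-DS tables visible to Impala, then one sample
# query is run against the generated database.
cat > impala_smoke.sh <<'EOF'
#!/bin/sh
impala-shell -i server1 -q "INVALIDATE METADATA;"
impala-shell -i server1 -d tpcds_bin_partitioned_rcfile_1000 \
    -f sample-queries-tpcds/query55.sql
EOF
```

The same pattern (point the client at the shared metastore database, then run the query files) carries over to Presto, Drill, and SparkSQL with their respective CLIs.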

Original article; please credit whoami's blog when reposting.
Archive of this blog's posts: http://www.itweet.cn/archives/