Prerequisites
- Hadoop 2.x or later cluster
- Hive 1.1 (on Tez)
- Impala 2.2
- Presto 0.125
- Drill 1.1
- SparkSQL 1.6
Test environment: three physical servers
- server1 : CPU total: 32 ; mem total: 126 GB ; HDFS disk: 22.4 TiB
- server2 : CPU total: 32 ; mem total: 126 GB ; HDFS disk: 7.7 TiB
- server3 : CPU total: 8 ; mem total: 126 GB ; HDFS disk: 500.7 GiB
Compile and package the appropriate data generator.
[whoami@apache-server gitlib]$ git clone https://github.com/hortonworks/hive-testbench.git
[whoami@apache-server gitlib]$ cd hive-testbench/
[whoami@apache-server hive-testbench]$ ./tpcds-build.sh
[whoami@apache-server gitlib]$ tar -zcvf hive-testbench.tar.gz hive-testbench
[whoami@apache-server gitlib]$ du -sh hive-testbench.tar.gz
133M    hive-testbench.tar.gz
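The build above ran on a machine with internet access (apache-server), and the load steps below run on server1, so the tarball has to reach the cluster in between. A minimal sketch, assuming the three servers from the test environment and the hadoop user; the echo makes it a dry run, drop it to actually copy:

```shell
# Dry-run distribution of the packaged testbench to each node.
# Node names and the 'hadoop' user are this post's setup; adjust for your cluster.
for node in server1 server2 server3; do
  echo scp hive-testbench.tar.gz "hadoop@${node}:~/"
done
```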
Generate and load the data.
[hadoop@server1 tpch]$ tar -zxvf hive-testbench.tar.gz
[hadoop@server1 tpch]$ cd hive-testbench
[hadoop@server1 hive-testbench]$ cat test.sh
FORMAT=rcfile ./tpcds-setup.sh 1000
[hadoop@server1 hive-testbench]$ nohup sh test.sh > test.log &
[hadoop@server1 hive-testbench]$ sh tpc-ds-test.sh
ls: `/tmp/tpcds-generate/1000': No such file or directory
Generating data at scale factor 1000.
WARNING: Use "yarn jar" to launch YARN applications.
15/11/23 01:28:39 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/
15/11/23 01:28:40 INFO client.RMProxy: Connecting to ResourceManager at server2/192.168.111.201:8050
15/11/23 01:28:40 INFO input.FileInputFormat: Total input paths to process : 1
15/11/23 01:28:40 INFO mapreduce.JobSubmitter: number of splits:1000
15/11/23 01:28:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448155433956_0001
15/11/23 01:28:41 INFO impl.YarnClientImpl: Submitted application application_1448155433956_0001
15/11/23 01:28:41 INFO mapreduce.Job: The url to track the job: http://server2:8088/proxy/application_1448155433956_0001/
15/11/23 01:28:41 INFO mapreduce.Job: Running job: job_1448155433956_0001
15/11/23 01:28:57 INFO mapreduce.Job: Job job_1448155433956_0001 running in uber mode : false
15/11/23 01:28:57 INFO mapreduce.Job: map 0% reduce 0%
15/11/23 01:31:53 INFO mapreduce.Job: map 1% reduce 0%
TPC-DS text data generation complete.
Loading text data into external tables.
Optimizing table store_sales (1/24).
Optimizing table store_returns (2/24).
Optimizing table web_sales (3/24).
Optimizing table web_returns (4/24).
Optimizing table catalog_sales (5/24).
Optimizing table catalog_returns (6/24).
Optimizing table inventory (7/24).
Optimizing table date_dim (8/24).
Optimizing table time_dim (9/24).
Optimizing table item (10/24).
Optimizing table customer (11/24).
Optimizing table customer_demographics (12/24).
Optimizing table household_demographics (13/24).
Optimizing table customer_address (14/24).
Optimizing table store (15/24).
Optimizing table promotion (16/24).
Optimizing table warehouse (17/24).
Optimizing table ship_mode (18/24).
Optimizing table reason (19/24).
Optimizing table income_band (20/24).
Optimizing table call_center (21/24).
Optimizing table web_page (22/24).
Optimizing table catalog_page (23/24).
Optimizing table web_site (24/24).
Data loaded into database tpcds_bin_partitioned_rcfile_1000.
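The target database name tpcds-setup.sh reports follows from the chosen format and scale factor, which is why the rcfile run at scale 1000 above ends in tpcds_bin_partitioned_rcfile_1000. A small sketch of the convention (the show tables check needs a live cluster, so it is left commented out):

```shell
# Database name convention seen in the load above: tpcds_bin_partitioned_<format>_<scale>
FORMAT=rcfile
SCALE=1000
DB="tpcds_bin_partitioned_${FORMAT}_${SCALE}"
echo "$DB"
# Sanity-check the load on the cluster:
# hive --database "$DB" -e 'show tables;'
```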
Run queries.
Hive-tpcds testing:
Edit the runSuite.pl script so it points at the Hive database you generated:
[hadoop@bigdata-test-server1 hive-testbench]$ cat runSuite.pl |grep tpcds_bin_partitioned_
'tpcds' => "tpcds_bin_partitioned_rcfile_$scale",
[hadoop@bigdata-test-server1 hive-testbench]$ sh ./runSuite_hive.sh
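Besides running the whole suite, a single query can be timed on its own. A sketch of a simple timing wrapper, assuming query55.sql is one of the files shipped in the repo's sample-queries-tpcds directory; the real hive call is commented out so the sketch runs without a cluster, with a sleep standing in for it:

```shell
# Time one TPC-DS query against the generated database.
run_query() {
  q="$1"
  start=$(date +%s)
  # hive --database tpcds_bin_partitioned_rcfile_1000 -f "sample-queries-tpcds/${q}"
  sleep 1   # stand-in for the query so the sketch is runnable anywhere
  end=$(date +%s)
  echo "${q} took $((end - start))s"
}
run_query query55.sql
```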
Impala-tpcds testing:
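For the Impala run, the key step is that Impala caches Hive metastore state, so it must be told about the tables Hive just created before it can query them. A sketch, with server1 as the impalad host (an assumption) and a DRY_RUN switch that prints the commands instead of executing them:

```shell
# Impala side of the benchmark; DRY_RUN=1 (the default) only prints the commands.
DB=tpcds_bin_partitioned_rcfile_1000
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }
# Refresh Impala's view of the metastore so it sees the Hive-created tables:
run impala-shell -i server1 -q 'INVALIDATE METADATA;'
# Run one of the repo's sample queries against the TPC-DS database:
run impala-shell -i server1 -d "$DB" -f sample-queries-tpcds/query55.sql
```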
Original article; when reposting, please credit whoami's blog.
Archive of this blog's posts: http://www.itweet.cn/archives/