hadoop: Using multiple reducers in a pseudo-distributed environment?

Question

I am new to Hadoop. I have successfully configured a Hadoop setup in pseudo-distributed mode. I want multiple reducers, so I pass the option -D mapred.reduce.tasks=2 (using hadoop-streaming). However, there is still only one reducer.

From what I found on Google, I am fairly sure that mapred.LocalJobRunner limits the number of reducers to 1.

My Hadoop configuration files:

[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/admin/hadoop-data/tmp</value>
    </property>
</configuration>



[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/mapred-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>



[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/hdfs-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/admin/hadoop-data/name</value>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/home/admin/hadoop-data/data</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property> 
</configuration>

How I start the job:

[admin@localhost string-count-hadoop]$ cat hadoop-startjob.sh 
#!/bin/sh

~/hadoop-1.1.2/bin/hadoop jar ~/hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar \
        -D mapred.job.name=string-count \
        -D mapred.reduce.tasks=2 \
        -mapper  mapper  \
        -file    mapper  \
        -reducer reducer \
        -file    reducer \
        -input   $1      \
        -output  $2

[admin@localhost string-count-hadoop]$ ./hadoop-startjob.sh /z/programming/testdata/items_sequence /z/output
packageJobJar: [mapper, reducer] [] /tmp/streamjob837249979139287589.jar tmpDir=null
13/07/17 20:21:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/17 20:21:10 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/17 20:21:10 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/17 20:21:11 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
...
...

score 1 · Accepted Answer

Try changing this property in core-site.xml

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>

to

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000/</value>
    </property>

That is, add a / after 9000, then restart all daemons.
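A rough way to restart the daemons and check whether the job still falls back to the local runner is sketched below. It assumes the install path from the question (~/hadoop-1.1.2, overridable through HADOOP_HOME) and the stock stop-all.sh / start-all.sh scripts shipped with Hadoop 1.x. The helper names restart_hadoop and uses_local_runner are made up for illustration, and job.log is just a hypothetical file name for the captured job output.

```shell
#!/bin/sh

# Hypothetical helper: restart all Hadoop daemons.
# Assumes the install location used in the question unless HADOOP_HOME is set.
restart_hadoop() {
    hadoop_home="${HADOOP_HOME:-$HOME/hadoop-1.1.2}"
    "$hadoop_home/bin/stop-all.sh"
    "$hadoop_home/bin/start-all.sh"
}

# Hypothetical helper: succeeds (exit status 0) if a captured job log
# mentions mapred.LocalJobRunner, i.e. the job ran in local mode.
uses_local_runner() {
    grep -qF 'mapred.LocalJobRunner' "$1"
}

# Example usage (not run automatically):
#   restart_hadoop
#   jps    # look for JobTracker and TaskTracker among the listed processes
#   ./hadoop-startjob.sh /z/programming/testdata/items_sequence /z/output 2>&1 | tee job.log
#   if uses_local_runner job.log; then echo "job still ran in local mode"; fi
```

If the log still contains a mapred.LocalJobRunner line like the one in the question, the job was not submitted to the JobTracker, and the number of reducers requested with -D mapred.reduce.tasks is unlikely to be honored.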
