4

テスト HDFS クラスターの空き容量をすべて使い果たしたため、HBase がクラッシュしました。スペースをクリーンアップした後、HBase を再起動できましたが、起動後に分散ログ分割ジョブが失敗し続けます。ジョブは次のようになります。

Splitting log file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 into a temporary staging area.

Regionserver はしばらくの間、ファイルのリースを取得しようとしています:

2013-10-24 11:50:47,662 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
2013-10-24 11:50:47,671 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker host-4,60020,1382614844870 acquired task /hbase/splitlog/hdfs%3A%2F%2F192.168.249.1%3A9000%2Fhdfs%2Fhbase%2F.logs%2Fhost-3%2C60020%2C1382113928374-splitting%2Fhost-3%252C60020%252C1382113928374.1382523937002
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002, length=41274332
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering lease on dfs file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002
2013-10-24 11:50:47,673 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 1ms
2013-10-24 11:50:50,674 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=1 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 3002ms
2013-10-24 11:50:51,674 DEBUG org.apache.hadoop.hbase.util.FSHDFSUtils: isFileClosed not available
2013-10-24 11:51:51,680 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=2 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 64008ms

次に、マスターはジョブを中止します。

2013-10-24 11:55:48,685 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-10-24 11:55:48,687 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 interrupted, resigning
java.io.InterruptedIOException
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
    ... 9 more

問題は、このファイルが既に開いているため、このファイルのリースを取得できないリージョンサーバーにあるようですsudo -u hdfs hadoop fsck /hdfs/hbase/.logs/ -openforwrite

OPENFORWRITE: /hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 41274332 bytes, 1 block(s), OPENFORWRITE:
/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002:  Under replicated blk_1073337163743094520_3534698. Target Replicas is 3 but found 2 replica(s).

HBase をシャットダウンしようとしましたが、ファイルは OPENFORWRITE のままです。このフラグを削除するにはどうすればよいですか?

ps>Hadoop 1.0.1、HBase 0.94.12

4

0 に答える 0