You can get exactly the same information through WebHDFS:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
This may be easier to work with from Python.
Examples:
Status of a file/directory: submit an HTTP GET request.
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"
The client receives a response with a FileStatus JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
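Since Python was mentioned, here is a minimal sketch of the same GETFILESTATUS call using only the standard library. The helper names (`webhdfs_url`, `parse_file_status`, `get_file_status`) are made up for illustration, and the sketch assumes an unsecured cluster reachable over plain HTTP with an absolute HDFS path (starting with `/`).

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    # quote() keeps "/" intact but escapes spaces and other unsafe characters.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode({'op': op})}"


def parse_file_status(body):
    """Return the inner FileStatus dict from a GETFILESTATUS response body."""
    return json.loads(body)["FileStatus"]


def get_file_status(host, port, path, timeout=10):
    """Fetch the status of one file or directory (performs a network call)."""
    url = webhdfs_url(host, port, path, "GETFILESTATUS")
    with urlopen(url, timeout=timeout) as resp:
        return parse_file_status(resp.read().decode("utf-8"))
```

Usage would look something like `get_file_status("<HOST>", 9870, "/user/webuser")["type"]`, where the port is an assumption (9870 is the usual NameNode HTTP port on Hadoop 3, 50070 on Hadoop 2). Depending on how the cluster is configured, you may also need to pass a `user.name` query parameter or authenticate via Kerberos.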
List a directory: submit an HTTP GET request.
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
The client receives a response with a FileStatuses JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 427
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
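The LISTSTATUS response can be handled from Python in much the same way. The following is a rough sketch using only the standard library; `parse_list_status` and `list_directory` are hypothetical helper names, and it again assumes an unsecured cluster over plain HTTP and an absolute HDFS path.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def parse_list_status(body):
    """Return (name, type, length) tuples from a LISTSTATUS response body."""
    entries = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(e["pathSuffix"], e["type"], e["length"]) for e in entries]


def list_directory(host, port, path, timeout=10):
    """List the entries of an HDFS directory (performs a network call)."""
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"
    with urlopen(url, timeout=timeout) as resp:
        return parse_list_status(resp.read().decode("utf-8"))
```

Each tuple carries the entry name (`pathSuffix`), whether it is a `FILE` or a `DIRECTORY`, and its length in bytes, so filtering for files only or summing sizes becomes a one-line comprehension over the result.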