
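I'm trying to write a script that fetches Google's AJAX search results (e.g. http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=filetype:pdf) and downloads all the files. For now I'm stuck trying to convert the response into a Python dictionary so that I can navigate it easily.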

import subprocess
import ast

subprocess.call("curl -G -d 'q=filetype:pdf&v=1.0' http://ajax.googleapis.com/ajax/services/search/web  > output",stderr=subprocess.STDOUT,shell=True)
file = open('output','r')
contents = file.read()
output_dict = ast.literal_eval(contents)
print output_dict

When I run it, I get the following:

$ python script.py 
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100  2643    0  2643    0     0  15926      0 --:--:-- --:--:-- --:--:-- 26696
Traceback (most recent call last):
  File "script.py", line 7, in <module>
    output_dict = ast.literal_eval(contents)
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

The output file looks like this:

{"responseData": {"results":[{"GsearchResultClass":"GwebSearch",
                                    "unescapedUrl":"http://www.foundationdb.com/AlphaLicenseAgreement.pdf",
                                             "url":"http://www.foundationdb.com/AlphaLicenseAgreement.pdf",
                                      "visibleUrl":"www.foundationdb.com",
                                        "cacheUrl":"http://www.google.com/search?q\u003dcache:W7zhFlfbm6UJ:www.foundationdb.com",
                                           "title":"FoundationDB Alpha Software Evaluation License Agreement",
                               "titleNoFormatting":"FoundationDB Alpha Software Evaluation License Agreement",
                                         "content":"FOUNDATIONDB. ALPHA SOFTWARE EVALUATION LICENSE AGREEMENT.   PLEASE READ CAREFULLY THE TERMS OF THIS ALPHA SOFTWARE \u003cb\u003e...\u003c/b\u003e",
                                      "fileFormat":"PDF/Adobe Acrobat"
                             },
                             {"GsearchResultClass":"GwebSearch",
                                    "unescapedUrl":"https://subreg.cz/registration_agreement.pdf",
                                             "url":"https://subreg.cz/registration_agreement.pdf",
                                      "visibleUrl":"subreg.cz",
                                        "cacheUrl":"http://www.google.com/search?q\u003dcache:ODtRmQsiHD0J:subreg.cz",
                                           "title":"Registration Agreement",
                               "titleNoFormatting":"Registration Agreement",
                                         "content":"Registration Agreement. In order to complete the registration process you must   read and agree to be bound by all terms and conditions herein. TERMS AND \u003cb\u003e...\u003c/b\u003e",
                                      "fileFormat":"PDF/Adobe Acrobat"
                             },
                             {"GsearchResultClass":"GwebSearch",
                                    "unescapedUrl":"http://supportdetails.com/export.pdf",
                                             "url":"http://supportdetails.com/export.pdf",
                                      "visibleUrl":"supportdetails.com",
                                        "cacheUrl":"http://www.google.com/search?q\u003dcache:h0LvxrTTKzIJ:supportdetails.com",
                                           "title":"Export PDF - Support Details",
                               "titleNoFormatting":"Export PDF - Support Details",
                                         "content":"",
                                      "fileFormat":"PDF/Adobe Acrobat"
                             },
                             {"GsearchResultClass":"GwebSearch",
                                    "unescapedUrl":"http://www.fws.gov/le/pdf/travelpetbird.pdf",
                                             "url":"http://www.fws.gov/le/pdf/travelpetbird.pdf",
                                      "visibleUrl":"www.fws.gov",
                                        "cacheUrl":"",
                                           "title":"pet bird",
                               "titleNoFormatting":"pet bird",
                                         "content":"U.S. Fish \u0026amp; Wildlife Service. Traveling Abroad with. Your Pet Bird. The Wild Bird   Conservation Act (Act), a significant step in international conservation efforts to \u003cb\u003e...\u003c/b\u003e",
                                      "fileFormat":"PDF/Adobe Acrobat"
                             }],
                  "cursor":{"resultCount":"72,800,000",
                                  "pages":[{"start":"0","label":1},
                                           {"start":"4","label":2},
                                           {"start":"8","label":3},
                                           {"start":"12","label":4},
                                           {"start":"16","label":5},
                                           {"start":"20","label":6},
                                           {"start":"24","label":7},
                                           {"start":"28","label":8}],
                            "estimatedResultCount":"72800000",
                            "currentPageIndex":0,
                             "moreResultsUrl":"http://www.google.com/search?oe\u003dutf8\u0026ie\u003dutf8\u0026source\u003duds\u0026start\u003d0\u0026hl\u003den\u0026q\u003dfiletype:pdf","searchResultTime":"0.04"
                           }
                  },
  "responseDetails": null,
  "responseStatus": 200
 }

God, that took forever to format.


1 Answer


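Google returns JSON, so use the json module instead of the ast module you are currently using. ast.literal_eval only accepts Python literals, and the JSON value null (as in "responseDetails": null) is not one, which is why you get ValueError: malformed string.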

import json

with open('output', 'r') as f:
    output_dict = json.load(f)
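
To see the difference on something small, here is a minimal sketch. The `sample` string is a made-up, trimmed-down stand-in for your output file, and `parse_both` is a hypothetical helper; only `ast.literal_eval` and `json.loads` are real library calls.

```python
import ast
import json

# Hypothetical, trimmed-down stand-in for the contents of the 'output' file.
sample = '{"responseData": {"results": []}, "responseDetails": null, "responseStatus": 200}'


def parse_both(text):
    """Try ast.literal_eval, then parse with json.loads.

    Returns (error_from_literal_eval_or_None, dict_from_json).
    """
    error = None
    try:
        ast.literal_eval(text)
    except ValueError as exc:
        # JSON's null is not a Python literal, so literal_eval rejects it.
        error = exc
    return error, json.loads(text)


literal_eval_error, output_dict = parse_both(sample)
print("literal_eval failed: %s" % (literal_eval_error is not None))
print(output_dict["responseDetails"])
print(output_dict["responseStatus"])
```

With json, null comes back as None, so the rest of the dictionary can be navigated as usual.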

Also, instead of relying on curl, you could look into the urllib2 module to load the URL response.
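
A rough sketch of what that could look like, not a definitive implementation. The endpoint and the `unescapedUrl` key are taken from your question; `build_url`, `result_urls` and `search` are names I made up for illustration. The import fallback covers both urllib2 (Python 2) and urllib.request (its Python 3 counterpart). I'm assuming the endpoint still responds the way your sample shows, which may no longer be the case.

```python
import json

try:
    # Python 3 locations of the same functionality.
    from urllib.request import urlopen
    from urllib.parse import urlencode
except ImportError:
    # Python 2, where the module is called urllib2.
    from urllib2 import urlopen
    from urllib import urlencode

# Endpoint copied from the question; assumed to still be reachable.
API_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def build_url(query):
    """Build the request URL with the same parameters curl was sending."""
    return API_URL + "?" + urlencode({"v": "1.0", "q": query})


def result_urls(payload):
    """Return the unescapedUrl of every result in a decoded response."""
    response_data = payload.get("responseData") or {}
    return [result["unescapedUrl"] for result in response_data.get("results", [])]


def search(query):
    """Fetch one page of results and decode the JSON body into a dict."""
    response = urlopen(build_url(query))
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


# Usage (performs a network request, so it is left commented out):
# for url in result_urls(search("filetype:pdf")):
#     print(url)
```

Keeping `result_urls` separate from the network call means you can try it against the saved output file first.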

answered 2012-09-03T22:43:59.940