python - Parsing Google searches

Question

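I would like to write a script (preferably in Python, though other languages are fine) that can parse what I type into Google search. Say I search for "cats". I would then like to capture the string "cats" so that I can, for example, append it to a .txt file on my computer.

So if my searches are "cats", "dogs", and "cows", I could end up with a .txt file like this:

cats dogs cows

Does anyone know of an API that can parse the search bar and return the string that was entered? Or is there some object that I can cast to a string?

Edit: I don't want to write a Chrome extension or anything like that. I would prefer a Python (or Bash or Ruby) script that I can run from a terminal.

Thanks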

score 1 · Accepted Answer

I can offer two general solutions. 1) Google has a search engine API, https://developers.google.com/products/#google-search (limited to 100 requests per day).

Trimmed-down code:

import json
import re
import urllib.error
import urllib.parse
import urllib.request

import config  # local module holding api_key and cx (the search engine ID)

def gapi_parser(args):
    query = args.text
    max_sites = args.max_sites
    api_key = config.api_key
    cx = config.cx

    # Note: this API returns only the first 100 results.
    # https://developers.google.com/custom-search/v1/using_rest?hl=ru-RU#WorkingResults

    results = []    # domains in the order they were first seen
    domains = set() # the same domains, for fast membership checks
    errors = []
    start = 1
    while start < 100:  # the API cannot page past the first 100 results
        params = urllib.parse.urlencode(
            {'key': api_key, 'cx': cx, 'q': query, 'alt': 'json', 'start': start})
        req = 'https://www.googleapis.com/customsearch/v1?' + params
        try:
            with urllib.request.urlopen(req) as con:
                j = json.loads(con.read().decode('utf-8'))
        except urllib.error.URLError as e:  # also covers non-200 responses
            errors.append('Request failed: %s' % e)
            break
        for item in j.get('items', []):
            match = re.search(r'^(https?://)?\w(\w|\.|-)+', item['link'])
            if match:
                domain = match.group(0)
                if domain not in domains:
                    results.append(domain)
                    domains.add(domain)
            else:
                errors.append("Can't recognize domain: %s" % item['link'])
        # nextPage is missing on the last page of results
        next_page = j.get('queries', {}).get('nextPage')
        if len(domains) >= max_sites or not next_page:
            break
        start = int(next_page[0]['startIndex'])

    for error in errors:
        print(error)
    return (results, domains)
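
To see what the regular expression in the code above actually keeps from each result link, it can be tried on its own. This is only a minimal sketch: the helper name extract_domain and the sample links are made up for illustration and are not part of the answer's code. Note that the match includes the scheme (http:// or https://) when the link has one.

```python
import re

# Same pattern as in the answer's code, written as a raw string.
DOMAIN_RE = re.compile(r'^(https?://)?\w(\w|\.|-)+')

def extract_domain(link):
    """Return the leading scheme-plus-host part of link, or None if nothing matches."""
    match = DOMAIN_RE.search(link)
    return match.group(0) if match else None

if __name__ == '__main__':
    # Hypothetical sample links, not real search results.
    for link in ('https://www.example.com/cats?page=2',
                 'example.org/dogs',
                 '/relative/path'):
        print(link, '->', extract_domain(link))
```

A link that starts with something other than a scheme or a word character, such as a relative path, produces no match, which is the case the code above records in its errors list.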

2) I have written a Selenium-based script that parses the page in a real browser instance, but this solution has some limitations, for example the CAPTCHA you get if you run searches like a robot.

score 0 · Accepted Answer

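Here are some options to consider, with their pros and cons.

URL:
- Pros: As Chris mentioned, going to the URL and changing it by hand is an option. Writing a script for this should be easy. I can send you my Perl script if you like.
- Cons: I am not sure it can be done. I wrote a Perl script for this before, but it did not work because Google states that its service cannot be used outside the Google interface. You may run into the same problem.
Google's search API:
- Pros: A popular choice with good documentation. It should be a safe choice.
- Cons: Google's restrictions.
Look into other search engines:
- Pros: They may not have the same restrictions as Google. You may find a search engine that lets you play around more freely.
- Cons: The results may not be as good as Google's.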
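
For the URL option, the search term travels in the q parameter of the search URL, so it can be built and read back with the standard library alone. The sketch below sends no request to Google, so it does not touch the usage restriction mentioned above. The function names and the file name searches.txt are assumptions made for illustration, and how the script gets hold of the URLs in the first place is left open.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_search_url(query):
    """Build a Google search URL for query (no request is sent)."""
    return 'https://www.google.com/search?' + urlencode({'q': query})

def query_from_url(url):
    """Return the value of the q parameter in url, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get('q')
    return values[0] if values else None

def append_query(url, path):
    """Append the search term found in url to the text file at path."""
    query = query_from_url(url)
    if query is not None:
        with open(path, 'a', encoding='utf-8') as f:
            f.write(query + '\n')
    return query

if __name__ == '__main__':
    # Hypothetical usage: record three searches, one term per line.
    for term in ('cats', 'dogs', 'cows'):
        append_query(build_search_url(term), 'searches.txt')
```

Writing one term per line keeps the file easy to read back later. A URL without a q parameter is skipped instead of writing an empty line.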
