python-2.7 - Troubleshooting Selenium + Geckodriver

Question

I'm using Selenium in Python with the Firefox gecko driver to scrape forum post titles, and I've hit a snag that I can't figure out.

~$ geckodriver --version
geckodriver 0.19.0

The source code of this program is available from
testing/geckodriver in https://hg.mozilla.org/mozilla-central.

This program is subject to the terms of the Mozilla Public License 2.0.
You can obtain a copy of the license at https://mozilla.org/MPL/2.0/.

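I'm trying to scrape several years' worth of past post titles from a forum, and my code works fine for a while. I sat and watched it run for about 20 to 30 minutes, and it did exactly what it should. But then I started the script and went to bed, and when I woke up the next morning I found that it had processed about 22,000 posts. The site I'm currently scraping has 25 posts per page, so it got through ~880 separate URLs before crashing.

When it crashes, it throws the following error: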

WebDriverException: Message: Tried to run command without establishing a connection

Initially, my code looked like this:

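from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

FirefoxProfile = webdriver.FirefoxProfile('/home/me/jupyter-notebooks/FirefoxProfile/')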
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
driver.close()

I've also tried:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
    driver.close()

and

for url in urls:
    driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
    driver.get(url)
    ### code to process page here ###
    driver.close()

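I get the same error in all three scenarios, but only after the script has been running successfully for quite a while, and I don't know how to determine why it failed.

How can I determine why I'm getting this error after a few hundred URLs have been processed successfully? Or is there a best practice for processing this many pages with Selenium/Firefox that I'm not following?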

score 0 · Accepted Answer

All three of your code blocks were nearly perfect, but each had a minor flaw, as follows.

Your first code block is:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
driver.close()

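This code block looks pretty promising, with no real issues. In the final step, however, Best Practices call for invoking driver.quit() instead of driver.close(), which prevents dangling webdriver instances from lingering in System Memory. In short, driver.close() closes only the current browser window, while driver.quit() ends the whole session and shuts down the driver process.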
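As a minimal sketch of that advice, the loop from the first code block can be wrapped in try/finally so that driver.quit() runs even when a page raises. The names scrape_all, make_driver, process_page and make_firefox are hypothetical helpers introduced here for illustration, and make_firefox assumes that a default webdriver.Firefox() with no profile or capabilities is acceptable.

```python
def scrape_all(urls, make_driver, process_page):
    """Visit every URL with one browser session and always end it with quit()."""
    driver = make_driver()
    results = []
    try:
        for url in urls:
            driver.get(url)
            results.append(process_page(driver))
    finally:
        # quit() ends the session and stops the driver process,
        # even if get() or process_page() raised.
        driver.quit()
    return results


def make_firefox():
    # Hypothetical factory: imported here so scrape_all can be exercised
    # without Selenium installed. Pass your own profile/capabilities as needed.
    from selenium import webdriver
    return webdriver.Firefox()
```

It could be called as scrape_all(urls, make_firefox, lambda d: d.title), where the lambda stands in for the page-processing code.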

Your second code block is:

driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
    driver.close()

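This block is error prone. Once execution enters the for() loop and finishes working on a url, we close the Browser Session/Instance. So when execution starts the second iteration of the loop, the script errors out on driver.get(url) because there is no Active Browser Session.

Your third code block is: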

for url in urls:
    driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
    driver.get(url)
    ### code to process page here ###
    driver.close()

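This code block looks pretty well composed, except for the same issue as the first code block. In the final step we must invoke driver.quit() instead of driver.close(), which prevents dangling webdriver instances from lingering in System Memory. Because a new instance is created on every iteration, the dangling webdriver instances pile up and keep occupying ports, and at some point WebDriver is unable to find a free port or unable to open a new Browser Session/Connection. Hence you see the error WebDriverException: Message: Tried to run command without establishing a connection

Solution:

As per Best Practices, invoke driver.quit() instead of driver.close(), and open a new WebDriver instance and a new Web Browser Session.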
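One way to combine both halves of that advice on a long run is to use a fresh session per batch of URLs and quit() it when the batch ends, logging the URL that was being processed if something fails. This is only a sketch under assumptions: scrape_in_batches, chunks, make_driver, process_page and the batch_size default of 100 are hypothetical and not part of the original answer; make_driver is any function that returns a new driver, for example one that calls webdriver.Firefox().

```python
import logging

log = logging.getLogger("scraper")


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scrape_in_batches(urls, make_driver, process_page, batch_size=100):
    """Use a fresh browser session per batch and quit() it when the batch ends."""
    results = []
    done = 0
    for batch in chunks(list(urls), batch_size):
        driver = make_driver()
        try:
            for url in batch:
                try:
                    driver.get(url)
                    results.append(process_page(driver))
                except Exception:
                    # Record how far the run got and which URL failed, then re-raise.
                    log.exception("failed after %d pages at %s", done, url)
                    raise
                done += 1
        finally:
            # Release the session and the driver process before starting the next batch.
            driver.quit()
    return results
```

The logged page count and URL give a starting point for working out why an overnight run stopped, which the bare WebDriverException message does not.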
