Hi, I'm trying to extract all the links from the 10 result pages returned by a search for ssh.
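I can extract the first 10 links from the first page and, after the JavaScript has loaded, click through to the next page once and extract the next 10 links. But when I try to move on to the third page, I get an error.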
This is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
import re

links = []
driver = webdriver.Firefox()
driver.get("http://pastebin.com/search?q=ssh")

# wait for the search results to be loaded
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".gsc-result-info")))

for link in driver.find_elements_by_xpath("//div[@class='gs-title']/a[@class='gs-title']"):
    if link.get_attribute("href") is not None:
        print(link.get_attribute("href"))

# get all search results links
for page in driver.find_elements_by_xpath("//div[@class='gsc-cursor-page']"):
    driver.implicitly_wait(10)  # seconds
    page.click()
    for link in driver.find_elements_by_xpath("//div[@class='gs-title']/a[@class='gs-title']"):
        if link.get_attribute("href") is not None:
            print(link.get_attribute("href"))
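Both loops in the code above collect hrefs the same way, so that logic could live in one helper. The sketch below is a hypothetical refactoring (collect_hrefs is not a Selenium function); it relies only on each element having a get_attribute("href") method, the same call the original code uses, so it can be tried with stand-in objects instead of a live browser. The shared seen set is an added assumption that a link repeated on a later page should be reported only once.

```python
from typing import Iterable, List, Optional, Protocol, Set


class HasAttribute(Protocol):
    """Anything with a Selenium-style get_attribute(name) method."""

    def get_attribute(self, name: str) -> Optional[str]: ...


def collect_hrefs(elements: Iterable[HasAttribute], seen: Set[str]) -> List[str]:
    """Return hrefs in page order, skipping None and any href already in `seen`.

    `seen` is updated in place, so passing the same set for every page
    de-duplicates links across pages.
    """
    new_links = []
    for element in elements:
        href = element.get_attribute("href")
        if href is not None and href not in seen:
            seen.add(href)
            new_links.append(href)
    return new_links


if __name__ == "__main__":
    # Stand-in for a WebElement, used only for this demo.
    class FakeLink:
        def __init__(self, href):
            self.href = href

        def get_attribute(self, name):
            return self.href if name == "href" else None

    seen = set()
    page1 = [FakeLink("http://pastebin.com/u/ssh"), FakeLink(None),
             FakeLink("http://pastebin.com/gsQWBEZP")]
    print(collect_hrefs(page1, seen))
```

In the script itself it would be called once per page, e.g. collect_hrefs(driver.find_elements_by_xpath("//div[@class='gs-title']/a[@class='gs-title']"), seen), assuming the Selenium 2/3 API used in the question; recent Selenium 4 releases replace find_elements_by_xpath with find_elements(By.XPATH, ...).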
And this is the output I get, followed by the error:
python pastebinselenium.py
http://pastebin.com/u/ssh
http://pastebin.com/gsQWBEZP
http://pastebin.com/gfA12TWk
http://pastebin.com/udWMWdPR
http://pastebin.com/J55238CB
http://pastebin.com/DN2aHvRr
http://pastebin.com/f0rh66kU
http://pastebin.com/3zvY3DSm
http://pastebin.com/fqHVJGEm
http://pastebin.com/3aB7h0fm
http://pastebin.com/3uBAxXu3
http://pastebin.com/cxjRqeSh
http://pastebin.com/5nJPNr3Q
http://pastebin.com/qV0rPNfP
http://pastebin.com/zubt2Yc7
http://pastebin.com/jFrjWYpE
http://pastebin.com/DU7yqjQ1
http://pastebin.com/AFtWHmtE
http://pastebin.com/UVP5behK
http://pastebin.com/hP7XTyv1
Traceback (most recent call last):
  File "pastebinselenium.py", line 21, in <module>
    page.click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 74, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 457, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 233, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:9454)
    at Utils.getElementAt (file:///tmp/tmpzhZSEC/extensions/fxdriver@googlecode.com/components/command-processor.js:9039)
    at fxdriver.preconditions.visible (file:///tmp/tmpzhZSEC/extensions/fxdriver@googlecode.com/components/command-processor.js:10090)
    at DelayedCommand.prototype.checkPreconditions_ (file:///tmp/tmpzhZSEC/extensions/fxdriver@googlecode.com/components/command-processor.js:12644)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpzhZSEC/extensions/fxdriver@googlecode.com/components/command-processor.js:12661)
    at fxdriver.Timer.prototype.setTimeout/<.notify (file:///tmp/tmpzhZSEC/extensions/fxdriver@googlecode.com/components/command-processor.js:625)
I want to get the 10 links from each of the 10 pages (100 in total), but I can only extract 20 =(
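Reaching all 100 links means clicking the pager nine times, and the exception message hints at why the loop stops after the first click: the list of gsc-cursor-page elements is looked up once, before any click. If clicking a page number makes the search widget rebuild its pager (presumably what happens here), the remaining elements in that list no longer belong to the current DOM, so the next page.click() raises StaleElementReferenceException. A common pattern is to look the pager up again on every iteration and pick the button by its label. The sketch below shows only that loop structure; the Selenium lookups are passed in as callables (scrape_pages, find_pager_pages and find_result_links are hypothetical names), and FakeSearch is a stand-in that rebuilds its pager on each click, so the logic can be checked without Firefox.

```python
from typing import Callable, List, Sequence


def scrape_pages(
    find_pager_pages: Callable[[], Sequence],
    find_result_links: Callable[[], List[str]],
    max_pages: int = 10,
) -> List[str]:
    """Collect result links page by page, re-finding the pager before each click."""
    links = list(find_result_links())  # page 1 is already displayed
    for number in range(2, max_pages + 1):
        # Fresh lookup after the previous click: nothing from an earlier
        # render of the pager is reused, so no reference can be stale.
        target = next(
            (p for p in find_pager_pages() if p.text.strip() == str(number)),
            None,
        )
        if target is None:
            break  # the pager does not offer this page number
        target.click()
        # In a real browser, wait here for the new results to render
        # (e.g. with WebDriverWait) before reading them.
        links.extend(find_result_links())
    return links


# --- Stand-ins imitating a widget that rebuilds its pager on every click ---

class StaleError(Exception):
    """Plays the role of StaleElementReferenceException in this demo."""


class FakePageButton:
    def __init__(self, search, number, generation):
        self.search = search
        self.number = number
        self.text = str(number)
        self.generation = generation  # which render this button came from

    def click(self):
        if self.generation != self.search.generation:
            raise StaleError("button belongs to an earlier render")
        self.search.current = self.number
        self.search.generation += 1  # the pager is rebuilt after every click


class FakeSearch:
    def __init__(self, total_pages=10, per_page=10):
        self.total_pages = total_pages
        self.per_page = per_page
        self.current = 1
        self.generation = 0

    def pager(self):
        # Assumption about the real pager: only non-current pages are listed.
        return [FakePageButton(self, n, self.generation)
                for n in range(1, self.total_pages + 1) if n != self.current]

    def results(self):
        start = (self.current - 1) * self.per_page
        return [f"result-{start + i}" for i in range(self.per_page)]


if __name__ == "__main__":
    search = FakeSearch()
    print(len(scrape_pages(search.pager, search.results)))  # 100
```

With a real driver, find_pager_pages could be a lambda wrapping driver.find_elements_by_xpath("//div[@class='gsc-cursor-page']"), and find_result_links a lambda returning the hrefs from the gs-title links; both are assumptions about how the question's lookups would be plugged in.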
I also tried adding this right before page.click():
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".gsc-cursor-box")))
but it didn't help.
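The extra wait likely cannot help because wait.until(...) locates a new .gsc-cursor-box element but does nothing to the page object the loop already holds, which stays stale. WebDriverWait.until simply calls its condition repeatedly until it returns something truthy or the timeout expires, so whatever the condition looks up is fresh, while references captured before the click are not. The plain-Python sketch below mimics that polling behaviour to make the point that the lookup has to happen inside the condition; wait_until is a hypothetical helper, not Selenium's API, and the 0.5 s default poll interval matches WebDriverWait's default. In Selenium itself, EC.staleness_of(element) can be passed to WebDriverWait to wait until an old element has been detached after a click.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` expires.

    Each call re-runs whatever lookup `condition` performs, which is why
    elements should be located inside the condition, not before it.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    attempts = {"n": 0}

    def pager_ready():
        # Hypothetical lookup that succeeds on the third poll.
        attempts["n"] += 1
        return "pager" if attempts["n"] >= 3 else None

    print(wait_until(pager_ready, timeout=2, poll=0.1))
```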