Here is my code:

from pyquery import PyQuery

content = '''<td field="exceptions"><div style="white-space:normal;height:auto;" \
class="datagrid-cell datagrid-cell-c2-exceptions">Traceback (most recent call last):<br>\
  File "./crawler.py", line 381, in &lt;module&gt;<br>   \
   crawler.start()<br>  File "./crawler.py", line 153, in start<br> \
      raise RemoteTransportException(e)<br>RemoteTransportException: \
      This socket is already used by another greenlet: &lt;bound method Waiter.\
      switch of &lt;gevent.hub.Waiter object at 0x7f64d499d6e0&gt;&gt;<br></div></td>'''
pq = PyQuery(content)

for div in pq('td div'):
    print(div.text)  # prints only: Traceback (most recent call last):


for div in pq('td div'):
    for sub in div.getchildren():
        print(sub.text)


# Traceback (most recent call last):
# None
# None
# None
# None
# None
# None

I want to get all of the text inside the td div element. The expected result is:

Traceback (most recent call last):
File "./crawler.py", line 381, in <module>
crawler.start()
File "./crawler.py", line 153, in start
raise RemoteTransportException(e)
RemoteTransportException: This socket is already used by another greenlet: <bound method Waiter.switch of <gevent.hub.Waiter object at 0x7f64d499d6e0>>

However, I only get "Traceback (most recent call last):". How can I get all of the text inside td div, including the text that follows the child tags?

1 Answer

You can use BeautifulSoup instead:

import bs4
soup = bs4.BeautifulSoup(content, 'html.parser')  # name the parser explicitly
soup.find('td').find('div').text
u'Traceback (most recent call last):  File "./crawler.py", line 381, in <module>      crawler.start()  File "./crawler.py", line 153, in start       raise RemoteTransportException(e)RemoteTransportException:       This socket is already used by another greenlet: <bound method Waiter.      switch of <gevent.hub.Waiter object at 0x7f64d499d6e0>>'
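
As for why the loops in the question print only the first line and then None: PyQuery yields lxml elements, and lxml follows the ElementTree model, where an element's .text holds only the text before its first child and the text after each child is stored in that child's .tail. The sketch below illustrates this with the standard-library xml.etree.ElementTree, which shares that model. It is not the exact PyQuery call path; the snippet is a shortened copy of the question's markup, rewritten with <br/> so that the XML parser accepts it, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the markup from the question
# (assumption: <br/> instead of <br> so the stdlib XML parser accepts it).
snippet = (
    '<td><div>Traceback (most recent call last):<br/>'
    '  File "./crawler.py", line 381, in &lt;module&gt;<br/>'
    '    crawler.start()<br/></div></td>'
)

div = ET.fromstring(snippet).find('div')

# .text holds only the text before the first child element.
first_chunk = div.text

# Each <br/> has no text of its own (.text is None); the text that
# follows it is stored in its .tail attribute.
tails = [br.tail for br in div]

# itertext() yields .text and every non-empty .tail in document order.
lines = [chunk.strip() for chunk in div.itertext() if chunk.strip()]
print('\n'.join(lines))
```

lxml elements provide itertext() as well, so the same idea should carry over to the elements returned by pq('td div'). PyQuery objects also have a .text() method, for example pq('td div').text(), but how it treats whitespace around <br> may differ between versions, so check its output before relying on it.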
answered 2015-10-16T03:30:02.727