python - How do I get all YouTube comments with Python's gdata module?

Question

I'm trying to grab all the comments from a particular video, rather than going through them one page at a time.

from gdata import youtube as yt
from gdata.youtube import service as yts

client = yts.YouTubeService()
client.ClientLogin(username, pwd) #the pwd might need to be application specific fyi

comments = client.GetYouTubeVideoComments(video_id='the_id')
a_comment = comments.entry[0]

The code above gets me a single comment, probably the most recent one, but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?

Relevant docs: the YouTube API documentation for comments, the comment feed documentation, and the Python API documentation.

score 7 · Accepted Answer

The following does what you asked for, using the Python YouTube API:

from gdata.youtube import service

USERNAME = 'username@gmail.com'
PASSWORD = 'a_very_long_password'
VIDEO_ID = 'wf_IIbT8HGk'

def comments_generator(client, video_id):
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService()
client.ClientLogin(USERNAME, PASSWORD)

for comment in comments_generator(client, VIDEO_ID):
    author_name = comment.author[0].name.text
    text = comment.content.text
    print("{}: {}".format(author_name, text))

Unfortunately, the API limits the number of entries you can retrieve to 1000. This is the error I got when I tried a tweaked version that passed a hand-crafted URL to GetYouTubeVideoCommentFeed:

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle applies when retrieving entries from the API's other feeds.
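The paging loop in the generator above is not specific to comments. A minimal, library-independent sketch of the same idea, assuming a hypothetical fetch_page(ref) callable that returns an (entries, next_ref) pair with next_ref set to None on the last page; the max_items=1000 default mirrors the limit reported above. itertools.islice checks its count before pulling the next item, so once the cap is reached no further page is requested.

```python
from itertools import islice
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a page reference (e.g. a feed URL) and
# returns the entries on that page plus the next page's reference (None at the end).
FetchPage = Callable[[Any], Tuple[List[Any], Optional[Any]]]


def iter_feed(fetch_page: FetchPage, first_ref: Any) -> Iterator[Any]:
    """Yield every entry of a paged feed, fetching pages lazily."""
    ref = first_ref
    while ref is not None:
        entries, ref = fetch_page(ref)
        for entry in entries:
            yield entry


def iter_feed_capped(fetch_page: FetchPage, first_ref: Any,
                     max_items: int = 1000) -> Iterator[Any]:
    """Like iter_feed, but stop after max_items entries.

    islice stops before asking the generator for entry max_items + 1,
    so the page beyond the cap is never fetched.
    """
    return islice(iter_feed(fetch_page, first_ref), max_items)
```

With gdata, fetch_page could wrap client.GetYouTubeVideoCommentFeed called with a feed URL, returning feed.entry together with feed.GetNextLink().href, or None when there is no next link.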

If you hand-craft the URL passed to GetYouTubeVideoCommentFeed, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following limits apply: start-index <= 1000 and max-results <= 50.
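Given that URL format and those limits, every allowed page URL can be enumerated up front. A minimal sketch; the function name comment_page_urls is hypothetical, and since the v2 gdata.youtube.com feeds have since been retired, it only builds the strings the answer describes rather than fetching anything. start-index is 1-based in the GData protocol, so pages start at 1, 1 + max_results, and so on.

```python
from typing import List

# URL format and limits as given in the answer above.
FEED_URL = ('https://gdata.youtube.com/feeds/api/videos/{video_id}/comments'
            '?start-index={start_index}&max-results={max_results}')
MAX_START_INDEX = 1000
MAX_RESULTS = 50


def comment_page_urls(video_id: str, max_results: int = MAX_RESULTS) -> List[str]:
    """Return one URL per page, keeping start-index <= 1000 and max-results <= 50."""
    if not 1 <= max_results <= MAX_RESULTS:
        raise ValueError('max-results must be between 1 and %d' % MAX_RESULTS)
    # start-index is 1-based; step by the page size up to the 1000 cap.
    return [
        FEED_URL.format(video_id=video_id, start_index=start, max_results=max_results)
        for start in range(1, MAX_START_INDEX + 1, max_results)
    ]
```

With the default page size of 50 this yields 20 URLs, the last starting at index 951 and covering entries 951 to 1000.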

score 2 · Accepted Answer

This is the only solution I have so far, but it doesn't use the API and it gets slow when there are thousands of comments.

import bs4, urllib2
#grab the page source for the video
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id') #example XhFtHW4YB7M
#pull out the comments (name the parser explicitly so bs4 doesn't guess)
soup = bs4.BeautifulSoup(data, 'html.parser')
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
#do something with them, e.g. count them
print(len(cmnts))

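Note that because 'class' is a reserved Python keyword, you have to pass a dict through attrs instead of regular keyword parameters, so you can't do the usual 'startswith'-style searches via regex or lambdas. It is also fairly slow because of BeautifulSoup, but I had to use it because etree and minidom don't find any matching tags for some reason, even after running the page through bs4's prettify().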
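If the BeautifulSoup dependency and its speed are the concern, the standard library's html.parser can do the class match directly. This is a swapped-in technique, not what the answer above uses: a minimal Python 3 sketch that counts start tags whose class attribute contains both the comment and yt-tile-default tokens, in any order. Because it compares class tokens rather than the exact attribute string, a prefix test such as str.startswith could be dropped in at the same spot. The class names come from the answer and the YouTube markup has since changed, so it is shown on a static sample.

```python
from html.parser import HTMLParser
from typing import Iterable


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains all the given tokens."""

    def __init__(self, required: Iterable[str]) -> None:
        super().__init__()
        self.required = set(required)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == 'class' and value and self.required <= set(value.split()):
                self.count += 1
                break


def count_comments(html: str) -> int:
    """Count elements tagged as comments using the class names from the answer."""
    parser = ClassCounter(['comment', 'yt-tile-default'])
    parser.feed(html)
    parser.close()
    return parser.count
```

In Python 3 the page itself would come from urllib.request.urlopen(url).read().decode('utf-8') instead of urllib2.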
