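I'm currently trying to do some scraping for a project using Google CSE. This is my first time scraping. I took a Python class at school a few quarters ago, and scraping was supposed to be one of the last topics, but we never actually got to it. Anyway...
Here's what I'm trying to do:
Use Google CSE to get Google News results for "bird watching" and "bird feeding". From the query results, I want to pull the article title, the article link, and the publication date, and then write all of it to a CSV.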
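For the CSV step, one option is Python's standard csv module. This is only a sketch: `write_rows` is a hypothetical helper name, and it assumes each result has already been reduced to a (title, link, pubdate) tuple.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write (title, link, pubdate) tuples to fileobj as CSV, with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["title", "link", "pubdate"])
    writer.writerows(rows)


# Demo against an in-memory buffer; the row below is a made-up placeholder.
buf = io.StringIO()
write_rows([("Example title", "https://example.com/a", "2020-05-26T06:19:40Z")], buf)
print(buf.getvalue())

# For a real file, the csv docs recommend opening it with newline="":
# with open("birds.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(rows, f)
```

csv.writer quotes fields that contain commas (which many headlines do), so no manual escaping should be needed.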
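Here's what I've got so far (with a lot of help from https://gist.github.com/nikhilkumarsingh/5bce182ed57ae73f6cbde52fe846991b , which is great if anyone else is looking for an intro to CSE!!):
I get the query results and use a for loop to return the titles and links. For now I'm just printing them to make sure I'm getting results; I'll write them to a CSV later. My query result object is a dictionary named 'result' that looks like this (sorry for the huge amount of code I'm about to post, but since my problem has to do with nesting, I figured this was the clearest way to explain it):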
{
    'kind': 'customsearch#search',
    'url': {
        'type': 'application/json',
        'template': 'https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json'
    },
    'queries': {
        'request': [{
            'title': 'Google Custom Search - bird watching',
            'totalResults': '104000',
            'searchTerms': 'bird watching',
            'count': 10,
            'startIndex': 1,
            'inputEncoding': 'utf8',
            'outputEncoding': 'utf8',
            'safe': 'off',
            'cx': '017465438656188383295:ul7lxhkonwq'
        }],
        'nextPage': [{
            'title': 'Google Custom Search - bird watching',
            'totalResults': '104000',
            'searchTerms': 'bird watching',
            'count': 10,
            'startIndex': 11,
            'inputEncoding': 'utf8',
            'outputEncoding': 'utf8',
            'safe': 'off',
            'cx': '017465438656188383295:ul7lxhkonwq'
        }]
    },
    'context': {'title': 'google news'},
    'searchInformation': {
        'searchTime': 0.491713,
        'formattedSearchTime': '0.49',
        'totalResults': '104000',
        'formattedTotalResults': '104,000'
    },
    'items': [{
        'kind': 'customsearch#result',
        'title': 'Amy Cooper: White woman who called police on a black man in ...',
        'htmlTitle': 'Amy Cooper: White woman who called police on a black man in ...',
        'link': 'https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen',
        'displayLink': 'news.google.com',
        'snippet': 'May 26, 2020 ... White woman who called police on a black man bird-watching in Central Park \nhas been fired. By Amir Vera and Laura Ly, CNN. Updated 4:21\xa0...',
        'htmlSnippet': 'May 26, 2020 <b>...</b> White woman who called police on a black man <b>bird</b>-<b>watching</b> in Central Park <br>\nhas been fired. By Amir Vera and Laura Ly, CNN. Updated 4:21 ...',
        'formattedUrl': 'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...',
        'htmlFormattedUrl': 'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...',
        'pagemap': {
            'thumbnail': [{
                'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg'
            }],
            'metatags': [{
                'template-top': 'us,news,art-vid-vls-col,col-top-news',
                'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
                'twitter:card': 'summary_large_image',
                'og:image:width': '1100',
                'theme-color': '#000000',
                'og:site_name': 'CNN',
                'section': 'us',
                'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
                'article:content-tier': 'free',
                'og:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
                'twitter:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
                'og:pubdate': '2020-05-26T06:19:40Z',
                'lastmod': '2020-05-26T20:21:18Z',
                'pubdate': '2020-05-26T06:19:40Z',
                'twitter:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
                'og:type': 'article',
                'thumbnail': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
                'author': 'Amir Vera and Laura Ly, CNN',
                'og:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
                'og:image:height': '619',
                'fb:pages': '5550296508,18793419640',
                'referrer': 'unsafe-url',
                'fb:app_id': '80401312489',
                'viewport': 'width=device-width, initial-scale=1.0, minimum-scale=1.0',
                'twitter:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
                'og:url': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
                'article:opinion': 'false'
            }],
            'cse_image': [{
                'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
                'width': '299',
                'type': '1',
                'height': '168'
            }],
            'newsarticle': [{
                'image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
                'keywords': 'us, Amy Cooper: White woman who called police on a black man in Central Park has been fired - CNN',
                'author': 'Amir Vera and Laura Ly, CNN',
                'ispartof': 'news',
                'description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
                'datecreated': '2020-05-26T06:19:40Z',
                'url': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
                'articlebody': '(CNN)The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday."Following our internal...',
                'datemodified': '2020-05-26T20:21:18Z',
                'articlesection': 'us',
                'alternativeheadline': 'White woman who called police on a black man bird-watching in Central Park has been fired',
                'headline': 'Amy Cooper: White woman who called police on a black man in Central Park has been fired - CNN',
                'datepublished': '2020-05-26T06:19:40Z',
                'thumbnailurl': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg'
            }]
        }
    }
My code for pulling out the titles and links looks like this:
for item in result['items']:
    print(item['title'], item['link'])
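Here's where I'm stuck:
'pubdate', the key for the date the article was published, is nested inside a bunch of dictionaries and lists, and I'm having a really hard time pulling it out with a loop. Nesting, whether in loops or in data structures, is probably my biggest weakness in coding.
The key that holds all the information I'm interested in is 'items', whose value is a list of dictionaries: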
'items': [{
    'kind': 'customsearch#result',
    'title': 'Amy Cooper: White woman who called police on a black man in ...',
    'htmlTitle': 'Amy Cooper: White woman who called police on a black man in ...',
    'link': 'https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen',
    'displayLink': 'news.google.com',
    'snippet': 'May 26, 2020 ... White woman who called police on a black man bird-watching in Central Park \nhas been fired. By Amir Vera and Laura Ly, CNN. Updated 4:21\xa0...',
    'htmlSnippet': 'May 26, 2020 <b>...</b> White woman who called police on a black man <b>bird</b>-<b>watching</b> in Central Park <br>\nhas been fired. By Amir Vera and Laura Ly, CNN. Updated 4:21 ...',
    'formattedUrl': 'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...',
    'htmlFormattedUrl': 'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...',
    'pagemap': {
        'thumbnail': [{
            'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg'
        }],
        'metatags': [{
            'template-top': 'us,news,art-vid-vls-col,col-top-news',
            'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
            'twitter:card': 'summary_large_image',
            'og:image:width': '1100',
            'theme-color': '#000000',
            'og:site_name': 'CNN',
            'section': 'us',
            'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
            'article:content-tier': 'free',
            'og:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
            'twitter:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
            'og:pubdate': '2020-05-26T06:19:40Z',
            'lastmod': '2020-05-26T20:21:18Z',
            'pubdate': '2020-05-26T06:19:40Z',
            'twitter:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
            'og:type': 'article',
            'thumbnail': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
            'author': 'Amir Vera and Laura Ly, CNN',
            'og:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
            'og:image:height': '619',
            'fb:pages': '5550296508,18793419640',
            'referrer': 'unsafe-url',
            'fb:app_id': '80401312489',
            'viewport': 'width=device-width, initial-scale=1.0, minimum-scale=1.0',
            'twitter:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
            'og:url': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
            'article:opinion': 'false'
        }]
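In the first dictionary of this list (aka result['items'][0]) there is a key 'pagemap'. Its value is another dictionary with a key 'metatags', whose value is a list of dictionaries. The first element of that list (index 0) is a dictionary that contains the key 'pubdate', whose value is the one I'm looking for (I've added some blank lines in the code block so this value is easy to find):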
'metatags': [{
    'template-top': 'us,news,art-vid-vls-col,col-top-news',
    'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
    'twitter:card': 'summary_large_image',
    'og:image:width': '1100',
    'theme-color': '#000000',
    'og:site_name': 'CNN',
    'section': 'us',
    'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
    'article:content-tier': 'free',
    'og:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
    'twitter:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
    'og:pubdate': '2020-05-26T06:19:40Z',
    'lastmod': '2020-05-26T20:21:18Z',

    'pubdate': '2020-05-26T06:19:40Z',

    'twitter:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
    'og:type': 'article',
    'thumbnail': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
    'author': 'Amir Vera and Laura Ly, CNN',
    'og:title': 'White woman who called police on a black man bird-watching in Central Park has been fired',
    'og:image:height': '619',
    'fb:pages': '5550296508,18793419640',
    'referrer': 'unsafe-url',
    'fb:app_id': '80401312489',
    'viewport': 'width=device-width, initial-scale=1.0, minimum-scale=1.0',
    'twitter:description': 'The white woman who called police on a black man in Central Park during an encounter involving her unleashed dog has been fired from her job, her employer said Tuesday.',
    'og:url': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-american-trnd/index.html',
    'article:opinion': 'false'
}]
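Walking that path one level at a time can make it easier to see which bracket belongs to which container. This sketch uses a trimmed copy of the response above that keeps only the keys on the way to 'pubdate'; the intermediate variable names are just for illustration.

```python
# Trimmed copy of the posted response: only the keys on the path to 'pubdate'.
result = {
    "items": [
        {
            "title": "Amy Cooper: White woman who called police on a black man in ...",
            "link": "https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen",
            "pagemap": {
                "metatags": [
                    {"pubdate": "2020-05-26T06:19:40Z"},
                ],
            },
        },
    ],
}

# One level at a time; each comment names the type found at that step.
items = result["items"]          # list: one dict per search result
first = items[0]                 # dict: the first search result
pagemap = first["pagemap"]       # dict
metatags = pagemap["metatags"]   # list of dicts
tags = metatags[0]               # dict: meta tag name -> value
pubdate = tags["pubdate"]        # str

# The same lookup written as one chained expression.
assert pubdate == result["items"][0]["pagemap"]["metatags"][0]["pubdate"]
print(pubdate)  # 2020-05-26T06:19:40Z
```

Each `[...]` peels off exactly one layer: a string key for a dict, an integer index for a list.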
Hopefully you can follow me through this rather treacherous nesting...
Ideally, what I'm looking for is a loop that returns:
Amy Cooper: White woman who called police on a black man in ... https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
and the same for the next story in the query results, and so on.
The closest I've gotten is:
for item in result['items']:
    print(item['title'], item['link'])
    for date in result['items'][0]['pagemap']['metatags']:
        print(date['pubdate'])
This is close, but it only ever returns the first story's date, even as the loop moves on to the next story:
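A likely cause: inside the loop, `result['items'][0]` always points at the first item, no matter which `item` the outer loop is currently on. A minimal sketch of the same loop indexing through `item` instead (the two entries in `result` are made-up placeholders with the same nesting as the real response):

```python
# Two made-up placeholder results with the same nesting as the real response.
result = {
    "items": [
        {"title": "Story A", "link": "https://example.com/a",
         "pagemap": {"metatags": [{"pubdate": "2020-05-26T06:19:40Z"}]}},
        {"title": "Story B", "link": "https://example.com/b",
         "pagemap": {"metatags": [{"pubdate": "2020-05-27T12:00:00Z"}]}},
    ]
}

dates = []  # collected only so the output can be checked
for item in result["items"]:
    print(item["title"], item["link"])
    # Index through the current `item`, not result["items"][0]
    for tags in item["pagemap"]["metatags"]:
        print(tags["pubdate"])
        dates.append(tags["pubdate"])
```

If 'metatags' ever holds more than one dict, the inner loop prints one date per dict; `item['pagemap']['metatags'][0]['pubdate']` would take only the first.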
Amy Cooper: White woman who called police on a black man in ... https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
Christian Cooper shouldn't need a Harvard degree to survive birding ... https://news.google.com/articles/CAIiEOCKmxd9S5s5cwM5xs0AivoqGAgEKg8IACoHCAowjtSUCjC30XQwzqe5AQ?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
People called police on this black birdwatcher so many times that he ... https://news.google.com/articles/CAIiEOkNNX95htD_KKDYihI5JcoqGAgEKg8IACoHCAowjtSUCjC30XQwzqe5AQ?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
A black man bird-watching in Central Park asked a white woman to ... https://news.google.com/articles/CAIiENZfU5G5gfmzo2CysHOaY0sqFQgEKg0IACoGCAowuLUIMNFnMLnhAg?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
What's a Tough Call in Bird Watching? Identifying a Gull - WSJ https://news.google.com/articles/CAIiEMKd4gQ1olRNd5T2Ndlpiu8qGAgEKg8IACoHCAow1tzJATDnyxUwuK20AQ
2020-05-26T06:19:40Z
Any advice, tips, help, or words of nested for loop wisdom would be greatly appreciated!!!!
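Not every result necessarily carries a 'pagemap', a 'metatags' list, or a 'pubdate' key, so it may be safer not to assume the whole path exists. Below is a sketch with a hypothetical `dig` helper that returns a default when any step is missing, falling back to the 'newsarticle' / 'datepublished' value that also appears in the response above. The three items are made-up placeholders.

```python
from typing import Any


def dig(obj: Any, *path: Any, default: Any = None) -> Any:
    """Follow dict keys / list indices in `path`; return `default` if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Made-up placeholders: a full path, no 'pagemap' at all,
# and metatags without 'pubdate' but with a newsarticle date.
result = {
    "items": [
        {"title": "Has a pubdate", "link": "https://example.com/a",
         "pagemap": {"metatags": [{"pubdate": "2020-05-26T06:19:40Z"}]}},
        {"title": "No pagemap at all", "link": "https://example.com/b"},
        {"title": "Only a newsarticle date", "link": "https://example.com/c",
         "pagemap": {"metatags": [{"og:type": "article"}],
                     "newsarticle": [{"datepublished": "2020-05-27T12:00:00Z"}]}},
    ]
}

rows = []
for item in result.get("items", []):
    # Prefer metatags -> pubdate, then newsarticle -> datepublished, else "".
    pubdate = (dig(item, "pagemap", "metatags", 0, "pubdate")
               or dig(item, "pagemap", "newsarticle", 0, "datepublished")
               or "")
    rows.append((item.get("title", ""), item.get("link", ""), pubdate))

for row in rows:
    print(*row)
```

The resulting `rows` list already has the (title, link, pubdate) shape that `csv.writer.writerows` accepts.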