
Given this URL ( https://edition.cnn.com/search/?q=%20news&size=10&from=5540&page=555 ),

my goal is to get the complete list of news articles.

The page's HTML (it contains the news URLs):

 <div class="cnn-search__result-thumbnail">
   <a href="https://www.cnn.com/2018/03/27/asia/north-korea-kim-jong-un-china-visit/index.html">
     <img src="./Search CNN - Videos, Pictures, and News - CNN.com_files/180328104116china-xi-kim-story-body.jpg">
   </a>
 </div>
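
For reference, a minimal sketch of pulling the article links out of markup shaped like the snippet above, using only the standard library. The class name `ThumbnailLinkParser` and the `sample` string are hypothetical, and the `href` is written without the stray space. It assumes the HTML being parsed already contains the result blocks (for example a page saved from the browser); whether the raw response from the search URL includes them is not confirmed here.

```python
from html.parser import HTMLParser


class ThumbnailLinkParser(HTMLParser):
    """Collect href values of <a> tags inside cnn-search__result-thumbnail divs."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while we are inside a thumbnail div
        self.links = []   # article URLs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Track nesting so inner divs do not end the block early.
            if self._depth or "cnn-search__result-thumbnail" in classes:
                self._depth += 1
        elif tag == "a" and self._depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


# Hypothetical sample modelled on the snippet above.
sample = """
<div class="cnn-search__result-thumbnail">
  <a href="https://www.cnn.com/2018/03/27/asia/north-korea-kim-jong-un-china-visit/index.html">
    <img src="180328104116china-xi-kim-story-body.jpg">
  </a>
</div>
"""

parser = ThumbnailLinkParser()
parser.feed(sample)
print(parser.links)
```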

I can't get the news list from the URL.

The links returned for https://edition.cnn.com/search/?q=%20news&size=10&from=5550&page=556

and for https://edition.cnn.com/search/?q=%20news&size=10&from=5560&page=557 are the same.
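
The page URLs above differ only in the `from` and `page` parameters. A small sketch of how they relate, assuming `from` is always `(page - 1) * size` as in the three URLs shown; the helper name `search_page_url` is hypothetical and the parameter semantics are inferred from those URLs, not from any CNN documentation.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://edition.cnn.com/search/"


def search_page_url(page, size=10, query=" news"):
    """Build the search URL for a 1-based page number.

    Assumes `from` is the zero-based offset of the first result,
    i.e. (page - 1) * size, which matches the URLs in this post.
    """
    params = {
        "q": query,
        "size": size,
        "from": (page - 1) * size,
        "page": page,
    }
    # quote_via=quote encodes the space as %20 instead of '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(search_page_url(556))
print(search_page_url(557))
```

Building the URL in one place makes it easier to check that two consecutive pages really request different offsets before comparing the links they return.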

My source code:

import multiprocessing
import sys

import newspaper


def freeze_support():
    '''
    Check whether this is a fake forked process in a frozen executable.
    If so then run code specified by commandline and exit.
    '''
    if sys.platform == 'win32' and getattr(sys, 'frozen', False):
        # multiprocessing.forking exists only on Python 2;
        # on Python 3 the function lives in multiprocessing itself.
        multiprocessing.freeze_support()


if __name__ == '__main__':
    freeze_support()
    url_list = []  # unique article URLs collected across all pages
    for x in range(1, 6000):
        # "from" is the result offset, "page" is the 1-based page number
        url = "https://edition.cnn.com/search/?q=%20news&size=10&from=" + str(x * 10) + "&page=" + str(x + 1)
        cnn_paper = newspaper.build(url, memoize_articles=False)  # ~15 seconds
        print(len(cnn_paper.articles))
        for article in cnn_paper.articles:
            # keep each URL only once
            if article.url not in url_list:
                url_list.append(article.url)