I want to parse the page (https://www.google.com/flights/#search;f=JFK;t=SFO;d=2014-12-22;r=2014-12-30) on the backend using needle (https://github.com/tomas/needle) and Cheerio (https://github.com/cheeriojs/cheerio).

When I request the Google Flights page at that URL, this is what I get back:

> <!DOCTYPE html> <html><head><meta http-equiv="content-type"
> content="text/html; charset=UTF-8"><meta name="flights::gwt:property"
> content="baseUrl=/flights/static/"><title>Flights - Google
> Search</title> <meta name="description" content="Choose your flight
> from a simple list of results, explore destinations on a map, and find
> travel dates with the lowest fare with Flight Search."><script
> language="javascript" type="text/javascript"> var __JS_ILT__ = new
> Date(); </script> <style
> type="text/css">#gbar,#guser{font-size:13px;padding-top:1px
> !important;}#gbar{height:22px}#guser{padding-bottom:7px
> !important;text-align:right}.gbh,.gbd{border-top:1px solid
> #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media
> all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline
> !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27
> !important}.gbf .gb4{color:#900 !important}</style><script
> language="javascript" type="text/javascript"> var __JS_INI__ =
> "[,[,[,[1275,1105,1266,1174,1144,1203,15,16,10]],[,\42USD\42,\42$\42,\42\\u00a4#,##0\42,\42\\u00a4#,##0.00\42,\42#,##0\42,1],[,2,1609.344,\42#,##0\42,\42#,##0.0\42],[,2,2.54,\42#,##0\42],\42google-travel\42,\42typeId:72275\42,\42ADS25WNJCodY_Q5E8phMiFpRv5Mjahv3VPknsbbucoHBj4CEkSDnkC31pFues4Idsta-tlsZHS3HRUa2nBAqRARRiSySmvbRZymx5Q\42,\42.com\42,\42en\42,\42US\42,1,,,,,,\42https://accounts.google.com/ServiceLogin?continue\\u003d_CONTINUE_\42],[,\42BOS\42,\42Boston\42,42.3644444,-71.005278,[\42BOS\42,\42PSM\42],[\42BOS\42],\42United
> States\42,\42MA\42,\42US\42,\42Massachusetts\42,,42.3584308,-71.0597732],[[,\42BOS\42,\42Boston
> Logan International\42,\42Boston\42,\42BOS\42,\42United
> States\42,42.3644444,-71.005278,\42MA\42,\42US\42]],[[,\42ONEWORLD\42,\42Oneworld\42],[,\42SKYTEAM\42,\42SkyTeam\42],[,\42STAR_ALLIANCE\42,\42Star
> Alliance\42]],,[[,\42_fli\42,\42Flights\42],[,\42_mor\42,\42More\42],[,\42_web\42,\42Web\42],[,\42#map\42,\42Maps\42],[,\42nws\42,\42News\42],[,\42shop\42,\42Shopping\42],[,\42isch\42,\42Images\42],[,\42vid\42,\42Videos\42],[,\42bks\42,\42Books\42],[,\42app\42,\42Apps\42]],1,,\42US
> Dollar\42,0]"; </script> <script>(function(){var
> gs=document.createElement('script');var wmm=window.matchMedia;var
> hires=!!wmm && !wmm('(-webkit-device-pixel-ratio:1.0)').matches &&
> !wmm('(-moz-device-pixel-ratio:1.0)').matches;gs.src=!hires?'static/D4C5482E0AEB161B751BBFC9C57F4C0D.cache.js':'static/7521F8CC51FAFDF1C1478664FF1CF4BD.cache.js';gs.type='text/javascript';gs.async=true;document.getElementsByTagName('head')[0].appendChild(gs);})();</script><script
> language="javascript" type="text/javascript"></script> <link
> rel="shortcut icon" href="/favicon.ico"/><meta name="google"
> value="notranslate"></head><body> <script language="javascript"
> type="text/javascript">(function() {  var script =
> document.createElement('script');  script.type = 'text/javascript'; 
> script.src = 'https://ssl.gstatic.com/feedback/api.js'; 
> document.body.appendChild(script);})();</script> <div id=gbar><nobr><a
> class=gb1 href="https://www.google.com/search">Search</a> <a class=gb1
> href="https://www.google.com/search?hl=en&tbm=isch&source=og">Images</a>
> <a class=gb1 href="https://maps.google.com/maps?hl=en">Maps</a> <a
> class=gb1 href="https://play.google.com/?hl=en">Play</a> <a class=gb1
> href="https://www.youtube.com/results">YouTube</a> <a class=gb1
> href="https://news.google.com/nwshp?hl=en">News</a> <a class=gb1
> href="https://mail.google.com/mail/">Gmail</a> <a class=gb1
> href="https://drive.google.com/">Drive</a> <a class=gb1
> style="text-decoration:none"
> href="http://www.google.com/intl/en/options/"><u>More</u>
> &raquo;</a></nobr></div><div class=gbh style=left:0></div><div
> class=gbh style=right:0></div><div id="_BrowserWarning_"
> style="border:1px solid
> #FFE475;background-color:#FEF7CB;padding:4px;text-align: center;color: #222;">Google Flight Search has not been optimized for your browser. For best results, please try Chrome, Firefox 3.5+, Internet Explorer
> 8+, Safari 4+.<a href="#" style="color: #222;"
> onclick="document.getElementById('_BrowserWarning_').style.display='none';
> return false;"> Close</a></div><div id="root"></div></body></html>

This is not the page I want. I suspect that, for performance reasons, Google returns only a very small HTML shell for the initial request. That HTML contains JavaScript that has to run after the DOM loads, and those scripts then fetch and render the fares. Please correct me if I am wrong.
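To make that suspicion concrete, here is a minimal sketch that uses only Node's built-in `URL` class. It shows two things: the part of the URL after `#` (where the search parameters live) is a fragment that an HTTP client never sends to the server, and the returned markup contains an empty `<div id="root">` that client-side scripts would presumably fill in. The helper names `splitRequestUrl` and `hasEmptyRoot` are hypothetical, and the regex check is only a rough stand-in for a proper Cheerio query such as `$('#root').children().length === 0`.

```javascript
// Split a URL into the part an HTTP client actually requests
// and the fragment that stays on the client side.
function splitRequestUrl(rawUrl) {
  const url = new URL(rawUrl);
  return {
    sentToServer: url.origin + url.pathname + url.search,
    clientOnlyFragment: url.hash,
  };
}

// Rough check (assumption: the app mounts into <div id="root">, as in the
// response quoted above) for whether the static HTML is just an empty shell.
function hasEmptyRoot(html) {
  return /<div id="root">\s*<\/div>/.test(html);
}

const parts = splitRequestUrl(
  'https://www.google.com/flights/#search;f=JFK;t=SFO;d=2014-12-22;r=2014-12-30'
);

console.log(parts.sentToServer);       // https://www.google.com/flights/
console.log(parts.clientOnlyFragment); // #search;f=JFK;t=SFO;d=2014-12-22;r=2014-12-30
console.log(hasEmptyRoot('<body><div id="root"></div></body>')); // true
```

If this reading is right, a plain HTTP request can only ever receive the shell, because the server never sees the search parameters and the fares are added to the DOM afterwards by script.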

So how can I parse the fully rendered page from a Google Flights URL? Thanks.
