html - Downloaded page source is different from the rendered page source

Question

I'm planning to get data from this website:

http://www.gpw.pl/akcje_i_pda_notowania_ciagle

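(It is the site of the main stock market in Poland.)

I have a program written in C++ that downloads the site's source to a file. The problem is that the downloaded source does not contain what I'm interested in (the stock values, of course).

If I compare that source with what the "Inspect element" option shows (right-click -> Inspect element), I can see that "Inspect element" does contain the stock values: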

<td>75.6</td>
<tr class="even red">

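and so on...

The downloaded source of the site does not contain this information.

So I have two questions:

1) Why is the site's source different from what "Inspect element" shows?

2) How can I change my program so that it downloads the correct code?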

    #include <string>
    #include <iostream>  
    #include "curl/curl.h"
    #include <cstdlib>

    using namespace std;  

    // Write any errors in here  
    static char errorBuffer[CURL_ERROR_SIZE];  

    // Write all expected data in here  
    static string buffer;  

    // This is the writer call back function used by curl  
    static size_t writer(char *data, size_t size, size_t nmemb,
                         void *userdata)
    {
      // libcurl hands back the CURLOPT_WRITEDATA pointer as void*
      string *out = static_cast<string *>(userdata);

      // Nothing to write into: returning 0 makes curl report a write error
      if (out == NULL)
        return 0;

      // Append the data to the buffer
      out->append(data, size * nmemb);

      // Tell curl how many bytes we consumed
      return size * nmemb;
    }

    // You know what this does..  
    void usage()  
    {  
      cout <<"curltest: \n" << endl;  
      cout << "Usage:  curltest url\n" << endl;  
    }   

    /* 
     * The old favorite 
     */  
    int main(int argc, char* argv[])  
    {  
      if (argc > 1)  
      {  
        string url(argv[1]);  

        cout<<"Retrieving "<< url << endl;  

        // Our curl objects  
        CURL *curl;  
        CURLcode result;  

        // Create our curl handle  
        curl = curl_easy_init();  

        if (curl)  
        {  
          // Now set up all of the curl options  
          curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, errorBuffer);  
          curl_easy_setopt(curl, CURLOPT_URL, argv[1]);  
          // Numeric options must be passed as long
          curl_easy_setopt(curl, CURLOPT_HEADER, 0L);
          curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
          curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writer);  
          curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);  

          // Attempt to retrieve the remote page  
          result = curl_easy_perform(curl);  

          // Always cleanup  
          curl_easy_cleanup(curl);  

          // Did we succeed?  
          if (result == CURLE_OK)  
          {  
            cout << buffer << "\n";  
            exit(0);  
          }  
          else  
          {  
            cout << "Error: [" << result << "] - " << errorBuffer;  
            exit(-1);  
          }  
        }  
      }
      else
      {
        // No URL given: explain how to call the program
        usage();
      }
      return 0;
    }

score 0 · Accepted Answer

When I saved the page as an HTML file (File / Save As), I got a file that contains all the data displayed in the browser, data that I could not find in the page source (I'm using Chrome).

So I suggest adding one step to your code:

1) Download the page through a JavaScript-capable browser that supports a command line or some kind of API (if curl can't do it, wget or lynx/links/links2/elinks on Linux might help).
2) Parse the data.
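
The two steps above could be sketched roughly as follows. This is only an illustration under several assumptions: a POSIX system (for `popen`), a Chrome or Chromium binary whose `--headless --dump-dom` switches print the DOM after scripts have run, and quote values that sit in plain `<td>` elements as in the snippet from the question. The helper names `shellQuote`, `renderCommand`, `runAndCapture` and `extractCells` are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Wrap a string in single quotes so a POSIX shell passes it through verbatim.
std::string shellQuote(const std::string &s)
{
  std::string out = "'";
  for (char c : s)
  {
    if (c == '\'')
      out += "'\\''"; // close the quote, add an escaped quote, reopen
    else
      out += c;
  }
  out += "'";
  return out;
}

// Step 1a: build the command line. The browser binary name is an assumption
// and depends on what is installed.
std::string renderCommand(const std::string &browser, const std::string &url)
{
  return shellQuote(browser) + " --headless --dump-dom " + shellQuote(url);
}

// Step 1b: run a command through the shell and return what it wrote to stdout.
std::string runAndCapture(const std::string &command)
{
  std::unique_ptr<FILE, int (*)(FILE *)> pipe(popen(command.c_str(), "r"), pclose);
  if (!pipe)
    throw std::runtime_error("popen failed for: " + command);

  std::string output;
  char chunk[4096];
  std::size_t n;
  while ((n = fread(chunk, 1, sizeof chunk, pipe.get())) > 0)
    output.append(chunk, n);
  return output;
}

// Step 2: collect the text between each "<td>" and "</td>".
// Naive on purpose: cells with attributes or nested tags are not handled.
std::vector<std::string> extractCells(const std::string &html)
{
  const std::string open = "<td>", close = "</td>";
  std::vector<std::string> cells;
  std::size_t pos = 0;
  while ((pos = html.find(open, pos)) != std::string::npos)
  {
    std::size_t start = pos + open.size();
    std::size_t end = html.find(close, start);
    if (end == std::string::npos)
      break;
    cells.push_back(html.substr(start, end - start));
    pos = end + close.size();
  }
  return cells;
}
```

With a browser installed, something like `extractCells(runAndCapture(renderCommand("chromium", url)))` would yield the cell texts. For anything beyond a quick experiment, a real HTML parser is likely a better choice than string searching.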

score 0 · Accepted Answer

Because the values are filled in using JavaScript.

"View source" shows the raw source of the page, while "Inspect element" shows the current state of the document tree.
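
One quick way to check whether a downloaded page falls into this category is to look at the raw markup itself. The sketch below is a rough heuristic, not a reliable detector: it assumes that a page which loads scripts but contains no table cells is filled in after loading. The function names `countOccurrences` and `looksScriptFilled` are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of needle in haystack.
std::size_t countOccurrences(const std::string &haystack, const std::string &needle)
{
  if (needle.empty())
    return 0;
  std::size_t count = 0;
  for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
       pos = haystack.find(needle, pos + needle.size()))
    ++count;
  return count;
}

// Heuristic: raw markup that references scripts but has no <td> cells was
// probably meant to be populated by JavaScript in the browser.
bool looksScriptFilled(const std::string &rawHtml)
{
  return countOccurrences(rawHtml, "<script") > 0 &&
         countOccurrences(rawHtml, "<td") == 0;
}
```

Passing the `buffer` that the curl program fills to `looksScriptFilled` would give a first hint before reaching for a browser.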

There is no simple way to fix this, because you would have to either execute the JavaScript or port it to C++ (and doing so might make you unpopular with the exchange).
