I'm trying to fetch a page's HTML in C# by screen scraping with HttpWebRequest.
Scraping an ordinary page works fine, but I run into a problem when the page's content is loaded through an AJAX request. These are the two requests the browser sends when it loads that page.
The normal request that gets past the login:
POST (http)://example/user/login?destination=/events/Sports HTTP/1.1
Host: example
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: (http)://example/user/login?destination=/events/Sports
Cookie: has_js=1; SESSee201d4242c83ea2671330cdceee4623=qdco8gukcm2pk9offdof1uv3a0
Content-Type: application/x-www-form-urlencoded
Content-Length: 121
name=Username&pass=Password&remember_me=1&form_build_id=form11cb87efa605eb9fb384eb9d2a2c686e&form_id=user_login&op=Go
The AJAX request that fetches the data:
GET (http)://example/views/ajax?name=Sports&view_name=Events&view_display_id=page_1&view_args=Sports&view_path=events%2FSports&view_base_path=events&view_dom_id=1&pager_element=0 HTTP/1.1
Host: example
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: application/json, text/javascript, */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
X-Requested-With: XMLHttpRequest
Referer: (http)://example//events/Sports
Cookie: has_js=1; SESSee201d4242c83ea2671330cdceee4623=vd36esbpe8065snbfo39ubhmk3
If-Modified-Since: Wed, 23 May 2012 08:13:51 GMT
I tried writing the code this way, but it doesn't work:
string sid = String.Empty;
string uri = "http://example/user/login?destination=/events/Sports";
string postData = string.Format("name=UserName&pass=Password&remember_me=1&form_build_id=form-11cb87efa605eb9fb384eb9d2a2c686e&form_id=user_login&op=Go");
byte[] postBytes = Encoding.UTF8.GetBytes(postData);
//web request
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(uri);
req.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0";
req.KeepAlive = true;
////set the cookie
Cookie cookie = new Cookie();
cookie.Name = "Sports";
cookie.Domain = "SESSee201d4242c83ea2671330cdceee4623";
req.CookieContainer = new CookieContainer();
req.CookieContainer.Add(cookie);
req.Headers.Add("Accept-Encoding", "gzip, deflate");
req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
req.Method = "POST";
req.Host = "example";
req.Referer = "http://example/user/login?destination=/events/Sports";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = postBytes.Length;
//getting the request stream and posting data
StreamWriter requestwriter = new StreamWriter(req.GetRequestStream(), System.Text.Encoding.ASCII);
requestwriter.Write(postData);
requestwriter.Close();
string url = "http://example/views/ajax?name=Sports&view_name=Events&view_display_id=page_1&view_args=Sports&view_path=events%2FSports&view_base_path=events&view_dom_id=1&pager_element=0";
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0";
request.KeepAlive = true;
request.Headers.Add("X-Requested-With", "XMLHttpRequest");
request.Headers.Add("Accept-Encoding", "gzip, deflate");
request.Headers.Add("Accept-Language", "en-us,en;q=0.5");
request.Host = "example";
request.Method = "GET";
request.Referer = "http://example//events/Sports";
request.Accept = "application/json, text/javascript, */*";
request.CookieContainer.Add(cookie);
request.ContentType = "text/javascript; charset=utf-8";
try
{
    HttpWebResponse res = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(res.GetResponseStream());
    sid = sr.ReadToEnd().Trim();
}
catch {}
Instead of the data, the string sid contains { "status": false, "display": "", "messages": "" }
When it works, status should be true and display should contain some values.
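One detail in the two traces above is that the SESS cookie value differs between the login POST and the AJAX GET, which suggests (but does not prove) that the server issues a new session cookie when the login succeeds. The following is a minimal sketch under that assumption: it reads the login response to completion and shares a single CookieContainer between both requests, so whatever cookie the server sets is sent with the AJAX call. The class name AjaxScrapeSketch and the helpers CreateRequest and ReadBody are hypothetical names introduced here, and the form values are copied from the question as-is; a real server may expect a fresh form_build_id.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class AjaxScrapeSketch
{
    const string UserAgent =
        "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0";

    // Hypothetical helper: every request created here uses the same cookie jar.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = UserAgent;
        req.CookieContainer = cookies;
        // Let the framework request and decode gzip/deflate instead of
        // adding an Accept-Encoding header by hand.
        req.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return req;
    }

    // Hypothetical helper: sends the request and returns the response body.
    public static string ReadBody(HttpWebRequest req)
    {
        using (var res = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(res.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        var cookies = new CookieContainer();
        try
        {
            // Step 1: log in. Cookies set by the response land in the jar.
            var login = CreateRequest(
                "http://example/user/login?destination=/events/Sports", cookies);
            login.Method = "POST";
            login.ContentType = "application/x-www-form-urlencoded";
            // Values copied from the question; form_build_id may need to be
            // read from the login page first (assumption).
            byte[] body = Encoding.UTF8.GetBytes(
                "name=UserName&pass=Password&remember_me=1" +
                "&form_build_id=form-11cb87efa605eb9fb384eb9d2a2c686e" +
                "&form_id=user_login&op=Go");
            login.ContentLength = body.Length;
            using (Stream stream = login.GetRequestStream())
            {
                stream.Write(body, 0, body.Length);
            }
            ReadBody(login); // the POST is only sent once the response is requested

            // Step 2: the AJAX call, reusing the same cookie jar.
            var ajax = CreateRequest(
                "http://example/views/ajax?name=Sports&view_name=Events" +
                "&view_display_id=page_1&view_args=Sports" +
                "&view_path=events%2FSports&view_base_path=events" +
                "&view_dom_id=1&pager_element=0", cookies);
            ajax.Accept = "application/json, text/javascript, */*";
            ajax.Referer = "http://example//events/Sports";
            ajax.Headers.Add("X-Requested-With", "XMLHttpRequest");
            Console.WriteLine(ReadBody(ajax).Trim());
        }
        catch (WebException ex)
        {
            // Report the failure instead of swallowing it.
            Console.Error.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

Compared with the code in the question, this sketch sends no hand-built cookie, calls GetResponse on the login request, and sets no ContentType on the GET. Whether that is enough to make status come back true depends on the server, which cannot be confirmed from the traces alone.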