I need to extract the content from the document type declaration through to the end. This is what I have:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 19 Sep 2012 07:52:41 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Keep-Alive: timeout=20
Status: 200 OK
X-Runtime: 736
ETag: "66644f063945c4d3f6e5471723306c2c"
Cache-Control: no-cache
Set-Cookie: vrid=e93aae30-e45c-012f-9786-001f29cc11ee; domain=.yellowpages.com; path=/; expires=Tue, 19-Sep-2017 07:52:40 GMT
Set-Cookie: parity_analytics=---+%0A%3Avisit_id%3A+u1ncewtmt44s23myff9s5t1tbcd5h%0A%3Avisit_start_time%3A+2012-09-19+07%3A52%3A40.711499+%2B00%3A00%0A%3Alast_page_load%3A+2012-09-19+07%3A52%3A40.711501+%2B00%3A00%0A; path=/; expires=Sat, 19-Sep-2037 07:52:40 GMT
Set-Cookie: _parity_session=BAh7CDoPc2Vzc2lvbl9pZCIlMDIxNjZiMDVkZmMxNWFmMzQ5OGVlNTk3Njg0MTM2NmY6EF9jc3JmX3Rva2VuSSIxbVhHMGNmM1U1K3E1OFo2NTQwVHltTFdZaHREa1lMMnRCVnE1eVFJNFpHQT0GOgZFRjoTZGV4X3Nlc3Npb25faWRJIillOTg5ZjNlMC1lNDVjLTAxMmYtOTc4Yy0wMDFmMjljYzExZWUGOwdG--08b9db1ba698882287f47a60e34c0c1e227d440a; path=/; HttpOnly
X-Rid: vendetta-ac8a22f2-3a5c-4da8-a1e8-25d4a550bf32
Expires: Wed, 19 Sep 2012 07:52:40 GMT
<!DOCTYPE html><head></head><body></body>...
Can this be done with a regular expression?
Any help is much appreciated.
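
For illustration, here is a minimal sketch of one way a regular expression could do this, assuming the whole response (headers plus body) is already in a single string. The function name `extract_from_doctype` and the shortened sample response are hypothetical, made up for this example. The pattern matches `<!DOCTYPE` case-insensitively and, because of `re.DOTALL`, lets `.` run across newlines to the end of the string.

```python
import re
from typing import Optional


def extract_from_doctype(raw: str) -> Optional[str]:
    """Return everything from the first <!DOCTYPE to the end of raw.

    Hypothetical helper: returns None when no doctype is found.
    """
    # IGNORECASE accepts "<!doctype"; DOTALL lets "." match newlines,
    # so the greedy ".*" consumes the rest of the string.
    match = re.search(r"<!DOCTYPE\b.*", raw, flags=re.IGNORECASE | re.DOTALL)
    return match.group(0) if match else None


# Shortened, made-up stand-in for the response shown above.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<!DOCTYPE html><head></head><body></body>"
)

body = extract_from_doctype(raw_response)
print(body)
```

If the string is the raw bytes off the socket, note that with `Transfer-Encoding: chunked` the body may still contain chunk-size lines, so letting an HTTP client library decode the response first is probably the safer route.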