python - How can I keep mechanize from failing on this page's forms?

Question

import mechanize

url = 'http://steamcommunity.com'

br=mechanize.Browser(factory=mechanize.RobustFactory())

br.open(url)
print br.request
print br.form
for each in br.forms():
    print each
    print

The code above produces the following:

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

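My specific goal is to use the login form, but I can't even get mechanize to recognize that the page has any forms. Even what I believe is the most basic way of selecting a form, br.select_form(nr=0), produces the same traceback. In case it makes a difference, the form's enctype is multipart/form-data.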

I think this boils down to a two-part question: how can I get mechanize to work with this page? Or, if that isn't possible, is there another approach I can use that still maintains cookies?

Edit: As discussed below, this redirects to 'https://steamcommunity.com'.

As the following code shows, mechanize can retrieve the HTML successfully:

url = 'https://steamcommunity.com'

hh = mechanize.HTTPSHandler()  # you might want HTTPSHandler, too
hh.set_http_debuglevel(1)
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print contents

score 2 · Accepted Answer

Use this little-known option; I'm confident it will work for you ;)

br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
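To check which forms the page actually contains, independently of mechanize's parser, a lenient parser can list them. The following is only a minimal sketch, assuming Python 3 and the standard library's html.parser; the FormLister class, the list_forms helper, and the sample markup are hypothetical illustrations, not part of mechanize or of the Steam page.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form>'s attributes and the names of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"attrs": attrs, "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return a list of form descriptions found in the given HTML text."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical markup (not the real page): it mixes XHTML-style and
# HTML-style tags and leaves a <p> unclosed.
SAMPLE = """
<form action="/login" method="post" enctype="multipart/form-data">
  <input type="text" name="username" />
  <input type="password" name="password">
  <p>unclosed paragraph
</form>
"""

for form in list_forms(SAMPLE):
    print(form["attrs"].get("action"), form["fields"])
```

If the login form shows up here but not in br.forms(), the problem is more likely in how the page is parsed than in how it is fetched.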

score 2 · Accepted Answer

Did you mention that the website redirects to an HTTPS (SSL) server?

Well, try setting up a new HTTPS handler like this:

mechanize.HTTPSHandler()
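For the second part of the question (another approach that still maintains cookies), one possible fallback is an opener backed by a cookie jar. This is a rough sketch, assuming Python 3 and only the standard library rather than mechanize; build_cookie_opener is a hypothetical helper name, and no request is sent when the snippet runs.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = build_cookie_opener()

# Nothing is fetched here. To request the page from the question, one
# would call something like:
#   response = opener.open('https://steamcommunity.com')
#   html = response.read()
# Cookies set by the server would then be kept in `jar` and sent back
# on later requests made through the same opener.
print(len(jar))
```

Reusing the same opener for the login POST and for later requests is what keeps the session cookies attached.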
