python - Django WSGI のマルチスレッド化と接続の問題

Question

mod_wsgi: http://blog.dscpl.com.au/2010/03/improved-wsgi-script-for-use-with.htmlを使用して Apache を Django と統合する Graham の指示に従いましたが、まだ接続の問題が発生しています。そして応答時間。これはランダムに発生し、Apache ログファイルにエラーがないため、何が起こっているのかを理解するのは困難です。

私の Apache は pre-fork で構築されており、次のように構成されています。

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild 0
</IfModule>

WSGI 関連の設定:

LogLevel info
LoadModule wsgi_module modules/mod_wsgi.so
WSGISocketPrefix run/wsgix
WSGIDaemonProcess somename user=apache group=apache threads=25 processes=1
WSGIScriptAlias / /wsgi-dir/script.wsgi
WSGIImportScript /wsgi-dir/script.wsgi process-group=somename application-group=%{GLOBAL}

<Directory /wsgi-dir/script.wsgi>
   Order deny,allow
   Allow from all
   WSGIProcessGroup  somename
</Directory>

リクエストプロセッサでは、以下を使用してアクティブなスレッドを監視します。

logger.info("Active threads: {0}".format(threading.active_count()))

構成に最大 25 のスレッドがあるにもかかわらず、アクティブなスレッド数が 4 を超えることはないことに気付きました。同時に、一部のクライアントは、要求の処理中に新しい接続を 1 分以上待つことができます。時間は約2秒。

リクエストがサーバーに到達すると、高速に処理されますが、Apache の制限により、クライアントが接続を待機するだけでタイムアウトになる場合があります (100 リクエストあたり約 1 件)。

Timeout 60

この種の動作は、100 分の 1 のリクエストが重要な役割を果たさない (ユーザーがページを再ロードするだけである) Web アプリの世界では簡単に気付かれない可能性があると思いますが、サービスの世界ではそれは本当に問題です。

私はこれを理解できません.すべてのスレッドが他のクライアントにサービスを提供するのに忙しいのなら、なぜ Django はさらに別のスレッドを生成しないのですか? それがスレッドに関するものではない場合、それは何である可能性がありますか? グラハムが書いていたアプリのリロードの問題?

ここに私のバージョンがあります：

python26-2.6.8-3.30.amzn1.x86_64
Django-1.4.3
mod_wsgi-3.2-1.6.amzn1.x86_64
Server version: Apache/2.2.23 (Unix)
Server loaded:  APR 1.4.6, APR-Util 1.4.1
Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture:   64-bit
Server MPM:     Prefork
  threaded:     no
    forked:     yes (variable process count)

================== グラハムの提案を実装した最初の更新 ========================== ======

グラハムら、

コメントと提案をありがとう。mod_wsgi のバージョンを確認したところ、3.2 でした (上記参照)。私のWSGI構成は次のようになります。

LogLevel info
LoadModule wsgi_module modules/mod_wsgi.so
WSGISocketPrefix run/wsgix
WSGIDaemonProcess somename user=apache group=apache threads=25
WSGIScriptAlias /  /wsgi-dir/script.wsgi process-group=somename application-group=%{GLOBAL}

<Directory /wsgi-dir>
   Order deny,allow
   Allow from all
</Directory>

50 の EC2 クライアントを起動し、それぞれが起動時にサービスにいくつかのメッセージを送信するだけで十分でした。1 つのクライアントで 49 秒の遅延が見られましたが、他のすべてのクライアントの平均応答は 2.2 秒で、最大7秒

アプリのログファイルを確認したところ、クライアントの観点からは 49 秒の遅延であるのに対し、遅延した要求では「要求の受信」と「応答の送信」の間の差分が 0.16 秒であることがわかりました。

2 つの可能性があります。

クライアントは約 49 秒間接続を確立できませんでした
接続は確立されましたが、サーバー (実際には Django/WSGI 内部) はリクエストを高速に読み取ることができませんでした。

クライアントで Python の「リクエスト」モジュールを使用してサービスに接続しているため、1 番か 2 番かを判断するのは困難です。ただし、遅延が 64 ～ 65 秒より少し長くなると、Apache の送受信タイムアウトが発生し、Apache のログファイルで確認できるため、これは #2 だと思います。

これをさらに明確にするために私がやろうとしていることは次のとおりです。

以下のような単純なコントローラーを作成します。

デフリスナー(リクエスト):

logger.info("Started, active threads: {0}".format(threading.active_count()))    
time.sleep(3)
logger.info("Finished, active threads: {0}".format(threading.active_count()))    
return HttpResponse('OK')

注: ロガーは時間も記録します。

シンプルな統計インターフェイスを作成します (すべてのクライアントの EC2 のログを分析したくありません)。

デフログ (リクエスト):

id = request.REQUEST['id']
time = request.REQUEST['time']
res = request.REQUEST['res']

if (id and time):
    logger.info("STAT Instance: {0}, Processing time: {1}, Response: {2}".format(id,time,res))

return HttpResponse('OK')

クライアントは次のように動作します。

「リスナー」URL にいくつかのリクエストを送信し、クライアントでの処理時間を計算します
処理時間を EC2 インスタンス ID とともに「ログ」URL に送信します

この単純なアプローチで問題を再現できれば、再現可能になり、Django チームがそこから問題を解決できることを願っています。

他の提案も同様に高く評価されます。回答してくれたすべての人に感謝します。

================== 推奨テストに関する 2 回目の更新 ========================== =====

私は提案されたリスナーを実装し、問題を再現できました。他のすべての人が同じことを実行できることを願っています。大量の EC2 クライアントを起動するには、AWS アカウントが必要です。通常は 50 で十分です。しかし、遅延を確認するために 100 に行く必要があることもありました。

興味深いことに、このテストでは、アクティブなスレッドの数が 1 から 8 に徐々に増加しました。これは、サーバーでの平均的な処理時間が増加したためか、動作しますが、遅延を防ぐには十分ではありません。

クライアントのスクリプトを EC2 のユーザーデータに配置すると、以下のようになりました。これらすべてのクライアントで自動 sclaing グループを作成する方法について簡単な説明が必要な場合は、お知らせください。

#!/bin/bash

do_send() {

        d1=`date +%s`
        res=`python ~ec2-user/client/fetch.py ${URL_ROOT}/init/`
        res=`echo $res | tr '\n' ' ' | tr ' ' +`
        d2=`date +%s`
        delta=`expr $d2 - $d1`
        echo $ami $ins $res $delta >>$LOG
        curl -s  "${URL_ROOT}/stat/?id=$ami&time=$delta&res=$ins:$res" >/dev/null 2>&1
}

URL_ROOT=<SERVICE-ROOT_URL>
LOG=~ec2-user/log.txt

ins=`curl -s http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null`
ami=`curl -s http://169.254.169.254/latest/meta-data/ami-id 2>/dev/null`
echo "Instance=[$ins]" >$LOG

# First request
do_send

# Second request
do_send

fetch.py クライアントは次のようになります。

@author: ogryb
'''
import requests
import datetime
import socket

from optparse import OptionParser
usage = "usage: %prog [options] init_url\n   init_url - http://<host>/init/ server's address"
parser = OptionParser(usage=usage)
parser.add_option("-i", "--id", dest="id",
                  help="instance ID", metavar="STRING")
parser.add_option("-p", "--phost", dest="phost",
                  help="public hostname", metavar="STRING")
parser.add_option("-l", "--lhost", dest="lhost",
                  help="local hostname", metavar="STRING")
parser.add_option("-t", "--type", dest="type",
                  help="instance type", metavar="STRING")
parser.add_option("-q", "--quiet",
                  action="store_true", dest="quiet", default=False,
                  help="Quiet mode")
(opt, args) = parser.parse_args()
ip = socket.gethostbyname(socket.gethostname())
if (not opt.quiet):
    print ("=== Getting metadata:\t{0} {1}".format(datetime.datetime.utcnow(), ip))
if not opt.id:
    r = requests.get(url='http://169.254.169.254/latest/meta-data/instance-id')
    opt.id = r.text
if not opt.phost:
    r = requests.get(url='http://169.254.169.254/latest/meta-data/public-hostname')
    opt.phost = r.text
if not opt.lhost:
    r = requests.get(url='http://169.254.169.254/latest/meta-data/local-hostname')
    opt.lhost = r.text
if not opt.type:
    r = requests.get(url='http://169.254.169.254/latest/meta-data/instance-type')
    opt.type = r.text
body = "id={0}&phost={1}&lhost={2}&type={3}".format(opt.id, opt.phost, opt.lhost, opt.type)
if (not opt.quiet):
    print ("=== Start sending:\t{0} {1} {2}".format(datetime.datetime.utcnow(), ip, opt.id))
r = requests.post(url=args[0], data=body, verify=False)
if (not opt.quiet):
    print ("=== End sending:\t{0} {1} {2}".format(datetime.datetime.utcnow(), ip, opt.id))
print r.text
if (not opt.quiet):
    print "Request Body={0} url={1}".format(body,args[0])
    print "Response: {0}\n{1}".format(r.status_code, r.text)

============ 03/19/13 - 23:45 エラーログからの追加情報 ===

Apache のログレベルをデバッグに変更したところ、Apache の error_log で次のことがわかりました。それが遅延の理由である可能性があるかどうか、またこれについて何ができるか教えてください。「KeyError」は無害だとどこかで読んだことがありますが、わかりません。

クライアントは 6:37:28 で 41 秒間遅延しました。エラーログの最も近いイベントは @ 06:37:15 に発生しました。

Wed Mar 20 06:37:15 2013] [info] mod_wsgi (pid=27005): Initializing Python.
[Wed Mar 20 06:37:15 2013] [info] mod_wsgi (pid=27005): Attach interpreter '

完全なエラーログは次のとおりです。

Wed Mar 20 06:29:45 2013] [info] Server built: Oct 21 2012 20:35:32
[Wed Mar 20 06:29:45 2013] [debug] prefork.c(1023): AcceptMutex: sysvsem (default: sysvsem)
[Wed Mar 20 06:29:45 2013] [info] mod_wsgi (pid=26891): Attach interpreter ''.
[Wed Mar 20 06:29:45 2013] [info] mod_wsgi (pid=26892): Attach interpreter ''.
[Wed Mar 20 06:29:45 2013] [info] mod_wsgi (pid=26893): Attach interpreter ''.
[Wed Mar 20 06:29:45 2013] [info] mod_wsgi (pid=26895): Attach interpreter ''.
[Wed Mar 20 06:29:45 2013] [info] mod_wsgi (pid=26894): Attach interpreter ''.
[Wed Mar 20 06:37:15 2013] [debug] proxy_util.c(1820): proxy: grabbed scoreboard slot 0 in child 27005 for worker proxy:reverse
[Wed Mar 20 06:37:15 2013] [debug] proxy_util.c(1839): proxy: worker proxy:reverse already initialized
[Wed Mar 20 06:37:15 2013] [debug] proxy_util.c(1936): proxy: initialized single connection worker 0 in child 27005 for (*)
[Wed Mar 20 06:37:15 2013] [info] mod_wsgi (pid=27005): Initializing Python.
[Wed Mar 20 06:37:15 2013] [info] mod_wsgi (pid=27005): Attach interpreter ''.
[Wed Mar 20 06:38:10 2013] [debug] proxy_util.c(1820): proxy: grabbed scoreboard slot 0 in child 27006 for worker proxy:reverse
[Wed Mar 20 06:38:10 2013] [debug] proxy_util.c(1839): proxy: worker proxy:reverse already initialized
[Wed Mar 20 06:38:10 2013] [debug] proxy_util.c(1936): proxy: initialized single connection worker 0 in child 27006 for (*)
[Wed Mar 20 06:38:10 2013] [info] mod_wsgi (pid=27006): Initializing Python.
[Wed Mar 20 06:38:10 2013] [info] mod_wsgi (pid=27006): Attach interpreter ''.
[Wed Mar 20 06:38:11 2013] [info] mod_wsgi (pid=26874): Destroying interpreters.
[Wed Mar 20 06:38:11 2013] [info] mod_wsgi (pid=26874): Cleanup interpreter ''.
[Wed Mar 20 06:38:11 2013] [info] mod_wsgi (pid=26874): Terminating Python.
[Wed Mar 20 06:38:11 2013] [error] Exception KeyError: KeyError(140627014572000,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Wed Mar 20 06:38:11 2013] [info] mod_wsgi (pid=26874): Python has shutdown.
[Wed Mar 20 06:38:44 2013] [debug] proxy_util.c(1820): proxy: grabbed scoreboard slot 0 in child 27007 for worker proxy:reverse
[Wed Mar 20 06:38:44 2013] [debug] proxy_util.c(1839): proxy: worker proxy:reverse already initialized
[Wed Mar 20 06:38:44 2013] [debug] proxy_util.c(1936): proxy: initialized single connection worker 0 in child 27007 for (*)
[Wed Mar 20 06:38:44 2013] [info] mod_wsgi (pid=27007): Initializing Python.
[Wed Mar 20 06:38:44 2013] [info] mod_wsgi (pid=27007): Attach interpreter ''.
[Wed Mar 20 06:38:45 2013] [info] mod_wsgi (pid=26880): Destroying interpreters.
[Wed Mar 20 06:38:45 2013] [info] mod_wsgi (pid=26880): Cleanup interpreter ''.
[Wed Mar 20 06:38:45 2013] [info] mod_wsgi (pid=26880): Terminating Python.
[Wed Mar 20 06:38:45 2013] [error] Exception KeyError: KeyError(140627014572000,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Wed Mar 20 06:38:45 2013] [info] mod_wsgi (pid=26880): Python has shutdown.

score 0 · Accepted Answer

構成:

<Directory /wsgi-dir/script.wsgi>
   Order deny,allow
   Allow from all
   WSGIProcessGroup  somename
</Directory>

次のようにする必要があります。

<Directory /wsgi-dir>
   Order deny,allow
   Allow from all
   WSGIProcessGroup  somename
   WSGIApplicationGroup %{GLOBAL}
</Directory>

始めに。

あなたが持っていたものを備えたWSGIアプリケーションは、リクエストを処理するときにデーモンモードではなく組み込みモードで実行されます。これは、Directory ディレクティブへのパスが間違っていたためであり、本来のディレクトリではありませんでした。以前のようにファイルパスを使用することはできません。次を使用して、実行されている場所を検証する必要があります。

http://code.google.com/p/modwsgi/wiki/CheckingYourInstallation#Embedded_Or_Daemon_Mode

同時に、WSGIImportScript によって使用されることのないデーモンモードプロセスに冗長コピーをロードすることになります。

ディレクティブパスを修正した後でも、プリロードは、リクエストが処理される場所とは異なるサブインタープリターにありました。それらが同じサブインタープリター (アプリケーショングループ) にあることを確認するには、上記の WSGIApplicationGroup が必要です。

mod_wsgi 3.0 以降を使用している場合は、WSGIImportScript ディレクティブを削除し、代わりに次を使用することをお勧めします。

WSGIScriptAlias / /wsgi-dir/script.wsgi process-group=somename application-group=%{GLOBAL}

<Directory /wsgi-dir>
   Order deny,allow
   Allow from all
</Directory>

WSGIScriptAlias で process-group と application-group を同時に指定すると、WSGIProcessGroup と WSGIApplicationGroup を個別に指定する必要がなくなります。両方を指定すると、スクリプトをプリロードするという副作用もあるため、WSGIImportScript が行っていたことが置き換えられ、WSGIImportScript を削除できる理由がわかりました。

リクエストに時間がかかる理由のパフォーマンスの問題については、New Relic をインストールして試してみて、問題がどこにあるかを掘り下げることができます。

一般的な Apache 構成については、私が行った PyCon US トークからの私のスライドも読むことをお勧めします。

http://www.slideshare.net/GrahamDumpleton/pycon-us-2013-making-apache-suck-less-for-hosting-python-web-applications

ビデオは週末までにアップする必要があります。

昨年の私の PyCon トークも興味深いものになるはずです。

http://lanyrd.com/2012/pycon/spcdg/

python - Django WSGI のマルチスレッド化と接続の問題

2 に答える 2

Related

Reference