python - How do I embed Python modules for remote sandbox execution?

Question

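I am trying to dynamically add Python code to a sandbox module for execution on a remote machine. I am running into a problem with how to handle imported methods. For example, it is common to see scripts written like this: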

 from test_module import g
 import other_module

 def f():
     g()
     other_module.z()

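I know that I can pickle f together with g and potentially z, but how do I preserve the "other_module" scope for z? If I put both f and g into the sandbox, z will not be resolved properly when f is called. Is it possible to use some kind of embedded module (e.g. sandbox.other_module) so that z resolves correctly?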

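My purpose in loading remote code into a sandbox is to avoid polluting the global namespace. For instance, if another remote method is invoked with its own dependency graph, it should not interfere with another set of remote code. Is it realistic to expect Python to remain stable as sandbox modules come in and out of use? I ask because of this post: How do I unload (reload) a Python module? It makes me think that removing modules, such as the different sandboxes in this case, can be problematic.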

score 1 · Accepted Answer

Other modules can be imported into the sandbox (by which you mean a module created dynamically at runtime) with:

    sandbox.other_module = __import__('other_module')

or:

    exec 'import other_module' in sandbox.__dict__
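
For readers on Python 3, the two statements above might look like the sketch below. It is an illustration under assumptions, not the answerer's code: `types.ModuleType` creates the sandbox, and the standard-library modules `json` and `math` merely stand in for `other_module`.

```python
import importlib
import sys
import types

# Create the sandbox module at runtime; it is not registered in sys.modules here.
sandbox = types.ModuleType("sandbox")

# Counterpart of: sandbox.other_module = __import__('other_module')
# (json is only a stand-in for other_module)
sandbox.json = importlib.import_module("json")

# Counterpart of the Python 2 statement: exec 'import other_module' in sandbox.__dict__
exec("import math", sandbox.__dict__)

# Code executed in the sandbox namespace resolves both names through the sandbox.
exec("def f():\n    return json.dumps({'root': math.sqrt(16)})", sandbox.__dict__)

result = sandbox.f()
print(result)
```

Both forms bind the imported module as an attribute of the sandbox, so functions whose globals are `sandbox.__dict__` can refer to it by name.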

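If you call "sandbox" modules from other modules or from other sandbox modules and want to reload new code later, it is easier to import only the module and call "sandbox.f" rather than importing names from it with "from sandbox import f" and calling "f". Reloading is then easy. (Naturally, though, the reload command is of no use here, since a dynamically created module has no source file to reload from.)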
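
The difference between the two import styles can be shown with a small Python 3 sketch. The assignment `f = sandbox.f` is used here as a stand-in for `from sandbox import f`, on the assumption that the sandbox is not importable by name; the function bodies are invented for the illustration.

```python
import types

sandbox = types.ModuleType("sandbox")
exec("def f():\n    return 'old'", sandbox.__dict__)

# Like "from sandbox import f": the caller keeps its own reference to the
# function object that existed at this moment.
f = sandbox.f

# Simulate loading new code into the sandbox.
exec("def f():\n    return 'new'", sandbox.__dict__)

stale = f()          # the caller's name still points at the old function
fresh = sandbox.f()  # attribute lookup on the module happens at call time
print(stale, fresh)
```

Going through the module attribute means every call site picks up the replacement without having to be rebound.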

Classes

>>> class A(object): pass
... 
>>> a = A()
>>> A.f = lambda self, x: 2 * x  # or a pickled function
>>> a.f(1)
2
>>> A.f = lambda self, x: 3 * x
>>> a.f(1)
3

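Reloading methods seems easy. I remember that reloading classes defined in modified source code can be complicated, because the old class code may still be held by some instance. In the worst case, the instances can (or must) be updated individually: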

    some_instance.__class__ = sandbox.SomeClass  # that means the same reloaded class

I used the latter in a Python service accessed through win32com automation, and reloading the class code succeeded without losing instance data.
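
A minimal Python 3 sketch of that `__class__` reassignment follows. The helper `load_sandbox` and the class bodies are hypothetical, made up only to simulate two versions of the same class coming from reloaded code.

```python
import types

def load_sandbox(source):
    """Build a fresh module object from source text (hypothetical helper)."""
    module = types.ModuleType("sandbox")
    exec(source, module.__dict__)
    return module

old = load_sandbox(
    "class SomeClass(object):\n"
    "    def value(self):\n"
    "        return self.x * 2\n"
)
some_instance = old.SomeClass()
some_instance.x = 21  # instance data that must survive the reload

# "Reload": a second module object with a changed class body.
new = load_sandbox(
    "class SomeClass(object):\n"
    "    def value(self):\n"
    "        return self.x * 3\n"
)

before = some_instance.value()            # still runs the old class code
some_instance.__class__ = new.SomeClass   # repoint the instance at the new class
after = some_instance.value()             # now runs the new class code
print(before, after, some_instance.x)
```

The instance dictionary is untouched by the reassignment, which is why the data is retained. This works for ordinary classes with compatible layouts; it is not guaranteed for every type.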

score 1 · Accepted Answer

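My current approach enables bundling of dependencies for both "import x" and "from x import y". One drawback of the current implementation is that it creates a copy of each method in every module that uses it. This is in contrast to the original code, where each use is just a reference to the same method in memory (although I have conflicting results here; see the section after the code).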

/// analysis_script.py /// (dependencies omitted for brevity)

import test_module
from third_level_module import z

def f():
    for i in range(1,5):
        test_module.g('blah string used by g')
        z()

/// driver.py ///

import modutil
import analysis_script

modutil.serialize_module_with_dependencies(analysis_script)

/// modutil.py ///

import sys
import modulefinder
import os
import inspect
import marshal

def dump_module(funcfile, name, module):
    functions_list = [o for o in inspect.getmembers(module) if inspect.isfunction(o[1])]
    print 'module name:' + name
    marshal.dump(name, funcfile)
    for func in functions_list:
        print func
        marshal.dump(func[1].func_code, funcfile)

def serialize_module_with_dependencies(module):

    python_path = os.environ['PYTHONPATH'].split(os.pathsep)
    module_path = os.path.dirname(module.__file__)

    #planning to search for modules only on this python path and under the current scripts working directory
    #standard libraries should be expected to be installed on the target platform
    # python_path is already a list, so concatenate to get a flat list of directories
    search_dir = python_path + [module_path]

    mf = modulefinder.ModuleFinder(search_dir)

    #__file__ returns the pyc after first run
    #in this case we strip the trailing 'c' to get the py file since we need that for our call to mf.run_script
    src_file = module.__file__
    if src_file.endswith('.pyc'):
        src_file = src_file[:-1]

    mf.run_script(src_file)

    funcfile = open("functions.pickle", "wb")

    dump_module(funcfile, 'sandbox', module)

    for name, mod in mf.modules.iteritems():
        #the sys module is included by default but has no file and we don't want it anyway, i.e. it should
        #be on the remote system's path. __main__ we also don't want since it should be virtually empty and
        #just used to invoke this function.
        if not name == 'sys' and not name == '__main__':
            dump_module(funcfile, name, sys.modules[name])

    funcfile.close()

/// sandbox_reader.py ///

import marshal
import types
import imp

sandbox_module = imp.new_module('sandbox')

dynamic_modules = {}
current_module = ''
with open("functions.pickle", "rb") as funcfile:
    while True:
        try:
            code = marshal.load(funcfile)
        except EOFError:
            break

        if isinstance(code,types.StringType):
            print "module name:" + code
            if code == 'sandbox':
                current_module = "sandbox"
            else:
                current_module = imp.new_module(code)
                dynamic_modules[code] = current_module
                exec 'import '+code in sandbox_module.__dict__
        elif isinstance(code,types.CodeType):
            print "func"
            if current_module == "sandbox":
                func = types.FunctionType(code, sandbox_module.__dict__, code.co_name)
                setattr(sandbox_module, code.co_name, func)
            else:
                func = types.FunctionType(code, current_module.__dict__, code.co_name)
                setattr(current_module, code.co_name, func)
        else:
            raise Exception("unknown type received")

#yaa! actually invoke the method
sandbox_module.f()
del sandbox_module
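
The core of the writer and reader above is a marshal round trip of code objects followed by rebuilding functions inside a new module namespace. A condensed Python 3 sketch of just that step is shown here; the functions `f` and `g` are toy stand-ins, and `__code__` is the Python 3 name for `func_code`.

```python
import marshal
import types

def g(text):
    return text.upper()

def f():
    return g("blah")

# Writer side: serialize only the code objects, as dump_module does.
payload = [marshal.dumps(fn.__code__) for fn in (f, g)]

# Reader side: rebuild each function with the sandbox dict as its globals.
sandbox = types.ModuleType("sandbox")
for blob in payload:
    code = marshal.loads(blob)
    func = types.FunctionType(code, sandbox.__dict__, code.co_name)
    setattr(sandbox, code.co_name, func)

# f now resolves g through sandbox.__dict__, not through this script's globals.
result = sandbox.f()
print(result)
```

Note that a code object carries neither default argument values nor closure cells, so functions that rely on them need extra handling, and marshal data is only meant to be read back by the same Python version.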

For instance, the function graph looks like this before serialization:

 module name:sandbox
 ('f', <function f at 0x15e07d0>)
 ('z', <function z at 0x7f47d719ade8>)
 module name:test_module
 ('g', <function g at 0x15e0758>)
 ('z', <function z at 0x7f47d719ade8>)
 module name:third_level_module
 ('z', <function z at 0x7f47d719ade8>)

Specifically, looking at the function z, we can see that all of the references point to the same address, i.e. 0x7f47d719ade8.

In the remote process, after the sandbox has been reconstructed, we have:

 print sandbox_module.z 
 <function z at 0x1a071b8>
 print sandbox_module.third_level_module.z 
 <function z at 0x1a072a8>
 print sandbox_module.test_module.z 
 <function z at 0x1a072a8>

This blows my mind! I would have expected all of the addresses here to be unique after reconstruction, but for some reason sandbox_module.test_module.z and sandbox_module.third_level_module.z have the same address?
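
One possible explanation, offered as an assumption because the remote environment is not shown: the statement `exec 'import '+code in sandbox_module.__dict__` in sandbox_reader.py binds whatever the normal import system returns, not the `imp.new_module(code)` object stored in `dynamic_modules`. If the real test_module and third_level_module are importable on the reader side, both attributes would refer to the original modules and therefore share the original z, while sandbox_module.z is a function rebuilt from marshaled code. The Python 3 sketch below reproduces that binding behaviour with a hypothetical module name registered in `sys.modules` to play the role of an importable module.

```python
import sys
import types

# Stand-in for a real, importable module (hypothetical name).
real = types.ModuleType("demo_third_level_module")
exec("def z():\n    return 'real z'", real.__dict__)
sys.modules["demo_third_level_module"] = real

sandbox_module = types.ModuleType("sandbox")
dynamic_modules = {}

# What the reader does: create a new module object, then run an import statement.
rebuilt = types.ModuleType("demo_third_level_module")
dynamic_modules["demo_third_level_module"] = rebuilt
exec("import demo_third_level_module", sandbox_module.__dict__)

# The import statement found the registered module, not the rebuilt one.
bound_to_real = sandbox_module.demo_third_level_module is real
bound_to_rebuilt = sandbox_module.demo_third_level_module is rebuilt

# Binding the rebuilt module directly bypasses the import system.
setattr(sandbox_module, "demo_third_level_module", rebuilt)
fixed = sandbox_module.demo_third_level_module is rebuilt

del sys.modules["demo_third_level_module"]  # clean up the stand-in
print(bound_to_real, bound_to_rebuilt, fixed)
```

If this is what happens, replacing the import statement with a direct `setattr` of the new module object would make the sandbox use the reconstructed functions instead of the locally installed ones.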
