I'm trying to write an XML parser in Python using BeautifulSoup4. For some reason, the document isn't being parsed correctly. My XML document is shown below.

<module id="BrainParser_1" name="Brain Parser" package="CCB" version="1" location="pipeline://cranium.loni.ucla.edu//usr/local/loniWorkflows/BrainParser/brainparser.sh" sourceCode="" icon="/9j/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAx&#xA;NDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy&#xA;MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAAUCABIAFYEASIAAhEBAxEBBCIA/8QAHwAAAQUB&#xA;AQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEG&#xA;E1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVW&#xA;V1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLD&#xA;xMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAA&#xA;AAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKR&#xA;obHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp&#xA;anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU&#xA;1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADgQBAAIRAxEEAAA/APn+iip7OzuNQuktbWIyzPna&#xA;g74GaEr7AfP9FFTWlrcX13Fa2sLzTysEjjQZLE9AKgrQt9C1e7RHt9KvplcblaO3dgw9Rgc16B4a&#xA;+HEMRabWV8+YAGOBThM57nv9OnXrXqn9qXs0cYitoItoAYfxAVr7GS1krDguZ6ENTQWlzckC3t5Z&#xA;STgCNC3P4V7b4G/Z/urp/tfi5mtoBgpaQyDe/wDvEfdH05r6B03TLLSLGKy0+2jtraJdqRxrgAV8&#xA;9ReCPEstuJho9wiE4/e4jOfoxBqZ/AHihOulP0zgSxk4+gavdLi6vLxwFRTHG3zAnlvpVdXeS7Mi&#xA;mQHaRsJxjkVzc75mmd6wtNpO7Piyy8CeK9RjaS18Pai6qcEmBl5/HFTS/DnxlDGXfw3qO0ekJP6C&#xA;vtiivnm90vUNNYre2VxbnO3MsZUE+xPXpVSvoiQSi7Ek6qdw2qAeQKwbjwV4d1DfPLb7JWbBaNiv&#xA;P0BxSVXuTLBv7LPga5tLmymaK6t5YJFJBSRCpBHXg1DX3lf6RpuqhBqFhbXQT7vnxB9v0zXm3iD4&#xA;BeFtYv3u7OW40wuOYoMGPPqAeleK0V1viXwHfaJMjWm++tpSQpjQl0PoQPbuP04zyVaJp7HHOEoO&#xA;0j5Wors/Hfw01vwHPGb3y7izmJEV1BkqT6EHoa4yiiiimSFFFFFeheAfDl/DqbXs0fljycKpHPJB&#xA;59On61j+BPDsWu6yDdjNpERkf327L/j+Fe7f2cLCAqkAU45wMZrXDTh7ZRY5QfJzBXtPwM8A6u/i&#xA;ay8UXdp5WlxRO8MjnmRiCowPxJyfSuP+FvgP/hPPE5trh3i0+1Tzbl1HLDOAgPYn+QNfXmnafa6T&#xA;p1vp9lEIbW3QRxRgk7VH1qpFGJEwo
4PJO7BBqwLRGVGk+ZsYB3GlihRY0IiCuevFXFUAYx+dduJn&#xA;Zcpyyk1sWqKKKpfZ0VsopU98d6rOmbrlN3yHI/EVoSSOr/LCSv8AeJwKzLzUbKzuA11c+UNh+6hP&#xA;ce1cKUW3cuNSv0bCiiika2idlk2srIeOTxSNa20h3SoOOcbsA0kOq6Ze5W21GNpOyyDbmpZFPCug&#xA;3Hseh+hrN04S2NoYqvSd5aoKKKKr77SSNkj4fkKM859q8+8Z+B1W2fUdMjzIgL3MYJZnJPUDPGOe&#xA;K9F6bX8pQQeMmkvY/PtpYThCw6gdawlF02elRqxxVN3RFcWtvdwmK5gjmjPVJFDD8jXzf8W/g6uh&#xA;RTeIfDyE6cCXubXOTBk/eX/Z56dvp0+laZNDFcQvDNGskTqVdHGQwPUEV840V0fjTQG0LWiFyYbg&#xA;eYhPr/EPz5/Gitk7q5584uEnFnwHRXU/EPwz/wAIl431DS0B8hX8yAnHMbcjp+X4UV6H8KbTytLg&#xA;uI7VJS5Z23yY+YMVBHHoBXq8tw723l3FkrBh9zzOn6V5L8M3lutCt4AuYIy6MUPzA7i3Pp1FelOi&#xA;6fZbXmZs9CzEkn8a5pR97Vfn/md9GHPFPoe5/s8aDDZ+DrjWt264v52Q+iohwB9c5Nex15J+z3q0&#xA;F34Ak05eJ7K6cOM9Q/zA/wAx+Fet1l392bOVfMgKpn5D5nB/SqsPie3Fy24RKAMAl/8A61WHhOrQ&#xA;lTNmDJBXPJrDvfB8P2bdbA+d1ILcEelelzVKkFzJHNUwnvc0Aooop03iS8vdTEFvLGsZPGwZJ/HF&#xA;T3Fi10+Jo95ZDy0hPcV0fhXw1pNrp0LtbpLKwBeRxyD6D0qfxDDpmmPFMpEQdSDlsAciuOnioSm4&#xA;2/r7zBxaCiiq97fWmm2j3V9cxW1un3pZXCqPqTXj82l3KXzQpEy/N8vBrtNAi1JLBkvox5an5PMb&#xA;Bq9Jf2zvsjCM4XcGyOlXre3ZYxPII2Y9Af51tzLt+YKLloixRWLY+L/DmpXy2VjrlhcXLDcIop1Z&#xA;iPbBraqk8MzrtCqCefv9P0pJ7SWTZI7bAvQ7+v6VppFJO5AAIz/DVXVpbbTIhLe3iKg5IbgDFRVb&#xA;kr8unz/zOrB01SvGUtWFFFZuta/pfh3T5b7Vb2K2gjUsS7cn2A6k/SvJ/inJvu9PQkEosgODn+7R&#xA;XM+K9ZOueILi5wvloTFFtOcoCcH8c5/GilHYxru9Rs+Zv2gYynxMZtynfZxHAOSOo5/KiuJ8Z+I5&#xA;PFfi3UdZcFVnlPlKf4Yxwo/ICirXhbxrqHhVZo7dVlhlO4oTghumQcHt/IV6/wCG7q91OC1vNYjM&#xA;d26bjEc/KO3HbPX2zivnuvVvAXjBLkRWN9Ltu4k2xs3/AC0UD19R3/P1xtRjHnu9zbD1HflbJ/B3&#xA;jvW/A95PcaRLHiddssUqbkb0OPUZr6c+EniPxF4o8JNqHiGFFYzFbeVU2GVMD5sfXIz3r4+r3z4J&#xA;/FS3tLaPwtr1yIkU4srmQ4UD/nmx7ex/D0r1WO1SOTzE2At17Zqjr2rjSrXfFbNNKRgYX5V+ppLb&#xA;V7dwCJFLMcdeallKzRyb04btXY9jrautD6DorO0/XtJ1WWSOw1K1uZInKOsUoYgjrx+NaNaPhC6m&#xA;u9CiuLlRukYtx6dq5T4xrbP4RuHcHzlCCPn/AKaJn9M12dmY7OwiijUKir0HavDvil40ttbuY9N0&#xA;ydJ7WMZkmXOC2TlRkcjgHIyDnrXz1C8q115nmy2CvJv2hkVvhzExkKlb6PC5+9w1eqXNzDZ2stzc&#xA;OI4YkLu56KoGSa+Vfi38Uo/HUtvYaZFLFpVsxfMoAaWTpnHYAdPqa85imlglWWGR45F6OjEEfiK6&#xA;+z+J
/iO2jjilnjuIl4YOgDMPTI6flXG0V6SbWxmeZRSyQSrLFI0ciHKuhwQfUGu3sPjD4608Rquu&#xA;yzLH/DcIr5+pIyfzrhaK7rUvijq93amGzQWbN1kV9xH04GP1rkbvVtSv023moXdwvpNMzj9TVOin&#xA;KTluC0PRda+NvjTWtNFkb2OzBPzy2amOR/bdnj8MVwl9qd/qciyX99c3bqMBriVpCB6ZJqrRRRRR&#xA;UgFFFFFKrMjq6MVZTkEHBBoooAKKKK1tE8RXmiah9qTE+SS6SsxBJIJPX73HU5r1Ox8f6Zqdzawo&#xA;0m52zIhQgoP5H8KKKpTlyuPc0jVlFWRp+H9dvPDeu2mr2DKLi2feoblW9QfYivqHwT8afD/i27i0&#xA;6ZJNO1GReEmI8tz6K3r7ECiio/HvxEhg06TTNMaX7VOm1pB8vlL3OeufTH1+vjNFFY06UaatEhu5&#xA;xvxt+KUfk3PhHR2Vy4AvLlW+7zny1x34GT74r59oooooorQQUUUUUUUUAFFFFFFFFABRRRX/2Q==" posX="80" posY="70" rotation="1">
    <authors>
        <author fullName="Mubeena Mirza" email="" website="" />
    </authors>
    <executableAuthors>
        <author fullName="Zhuowen Tu" email="" website="" />
        <author fullName="Bruce Liu" email="" website="" />
    </executableAuthors>
    <metadata>
        <data key="__creationDateKey" value="Tue Sep 11 10:28:28 PDT 2007" />
    </metadata>
    <input id="BrainParser_1.Structure" name="Structure" description="0: segmentation sub-cortical structures&#xA;1: sulci detection" required="false" enabled="true" order="0" prefix="-p" prefixSpaced="true" prefixAllArgs="false">
        <format type="Enumerated" cardinality="1">
            <enumeration>0</enumeration>
            <enumeration>1</enumeration>
            <enumeration>2</enumeration>
        </format>
        <values>
            <value>2</value>
        </values>
    </input>
    <input id="BrainParser_1.Testing" name="Testing" description="0: perform segmentation/detection&#xA;1: perform training&#xA;" required="false" enabled="true" order="1" prefix="-r" prefixSpaced="true" prefixAllArgs="false">
        <format type="Enumerated" cardinality="1">
            <enumeration>0</enumeration>
            <enumeration>1</enumeration>
        </format>
        <values>
            <value>0</value>
        </values>
    </input>
    <input id="BrainParser_1.SourceFile" name="Source File" description="In testing, it points to the source file in training, it points directory in which the training volumes are saved.&#xA;" required="true" enabled="true" order="2">
        <format type="File" cardinality="1">
            <fileTypes>
                <filetype name="Analyze Image" extension="img" description="Analyze Image">
                    <need>hdr</need>
                </filetype>
                <filetype name="Analyze Image" extension="img" description="Analyze Image file">
                    <need>hdr</need>
                </filetype>
            </fileTypes>
        </format>
    </input>
    <output id="BrainParser_1.TargetFile" name="Target File" description="In testing, it points to the target file in training, it points directory in which the trained classifiers are saved.&#xA;" required="true" enabled="true" order="3">
        <format type="File" cardinality="1">
            <fileTypes>
                <filetype name="Analyze Image" extension="img" description="Analyze Image">
                    <need>hdr</need>
                </filetype>
            </fileTypes>
        </format>
    </output>
    <input id="BrainParser_1.ModelsDirectory" name="Models Directory" description="Directory of trained models." required="false" enabled="true" order="4" prefix="-m" prefixSpaced="true" prefixAllArgs="false">
        <format type="Directory" cardinality="1" />
        <values>
            <value>pipeline://cranium.loni.ucla.edu//usr/local/loniWorkflows/BrainParser/56_Structure</value>
        </values>
    </input>
    <input id="BrainParser_1.NumberofStructures" name="Number of Structures" description="Only effective in training." required="false" enabled="false" order="5" prefix="-n" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
        <values>
            <value>1</value>
        </values>
    </input>
    <input id="BrainParser_1.NumberofIterations" name="Number of Iterations" required="false" enabled="false" order="6" prefix="-t" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
    </input>
    <input id="BrainParser_1.SmoothnessFactor" name="Smoothness Factor" description="Defalut=0.5, typical 0.0~2.0." required="true" enabled="true" order="7" prefix="-s" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
        <values>
            <value>2.0</value>
        </values>
    </input>
</module>

Here is the Python code I wrote:

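from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Parse test.xml and print every <input> inside the "Brain Parser" module
    with open('test.xml') as f:
        soup = BeautifulSoup(f, 'lxml')

    for e in soup.find_all('module', attrs={'name': 'Brain Parser'}):
        for i in e.find_all('input'):
            print(i.prettify())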

And this is the result:

<input description="0: segmentation sub-cortical structures 1: sulci detection" enabled="true" id="BrainParser_1.Structure" name="Structure" order="0" prefix="-p" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="0: perform segmentation/detection 1: perform training" enabled="true" id="BrainParser_1.Testing" name="Testing" order="1" prefix="-r" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="In testing, it points to the source file in training, it points directory in which the training volumes are saved. " enabled="true" id="BrainParser_1.SourceFile" name="Source File" order="2" required="true"/>

<input description="Directory of trained models." enabled="true" id="BrainParser_1.ModelsDirectory" name="Models Directory" order="4" prefix="-m" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="Only effective in training." enabled="false" id="BrainParser_1.NumberofStructures" name="Number of Structures" order="5" prefix="-n" prefixallargs="false" prefixspaced="true" required="false"/>

<input enabled="false" id="BrainParser_1.NumberofIterations" name="Number of Iterations" order="6" prefix="-t" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="Defalut=0.5, typical 0.0~2.0." enabled="true" id="BrainParser_1.SmoothnessFactor" name="Smoothness Factor" order="7" prefix="-s" prefixallargs="false" prefixspaced="true" required="true"/>

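As you can see, it thinks that input has no child elements, but it does. After looking into it a bit, it seems that the value and format elements are being parsed as children of the module element. Can anyone help me with this?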


1 Answer


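You're calling BeautifulSoup with "lxml". This tells it to use the lxml parser and to parse the input as HTML. (In HTML, the input tag is self-closing and has no children, so your string is not valid HTML. BeautifulSoup does some magic HTML fixing and decides that you meant for the input tag to close immediately. That's why the input tags show no children.)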
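To see this effect in isolation, here is a minimal sketch using a small, hypothetical fragment shaped like test.xml (not the full file). It assumes beautifulsoup4 and lxml are installed, and shows that with "lxml" the input element ends up empty while values is attached to module instead.

```python
from bs4 import BeautifulSoup  # assumes beautifulsoup4 and lxml are installed

# Hypothetical, cut-down fragment shaped like the question's test.xml
snippet = """<module id="BrainParser_1" name="Brain Parser">
    <input id="BrainParser_1.Testing" name="Testing">
        <values>
            <value>0</value>
        </values>
    </input>
</module>"""

# "lxml" selects lxml's HTML parser. In HTML, <input> is a void element,
# so it is closed as soon as it opens and the stray </input> is dropped.
html_soup = BeautifulSoup(snippet, "lxml")
html_input = html_soup.find("input")

print(html_input.find("value"))               # None: <input> has no children
print(html_soup.find("values").parent.name)   # module
```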

You want to call it with "xml" instead, which tells it that the input is an XML document.
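A minimal sketch of the question's loop with "xml" in place of "lxml". It assumes lxml is installed (BeautifulSoup's "xml" feature uses lxml's XML parser) and uses a small hypothetical fragment instead of test.xml; with the real file you would pass the open file object as in the question.

```python
from bs4 import BeautifulSoup  # "xml" also requires lxml to be installed

# Hypothetical, cut-down fragment shaped like the question's test.xml
snippet = """<module id="BrainParser_1" name="Brain Parser">
    <input id="BrainParser_1.Testing" name="Testing" prefixSpaced="true">
        <values>
            <value>0</value>
        </values>
    </input>
    <input id="BrainParser_1.NumberofIterations" name="Number of Iterations">
        <format type="Number" cardinality="1" />
    </input>
</module>"""

# "xml" selects lxml's XML parser, so <input> keeps its child elements.
soup = BeautifulSoup(snippet, "xml")

results = []
# find_all()'s first parameter is already called `name`, so the
# name="Brain Parser" attribute filter has to go through attrs.
for e in soup.find_all("module", attrs={"name": "Brain Parser"}):
    for i in e.find_all("input"):
        value = i.find("value")  # searches all descendants, not just children
        results.append((i["id"], value.text if value is not None else None))

for input_id, value in results:
    print(input_id, value)
```

This should print BrainParser_1.Testing with 0, and BrainParser_1.NumberofIterations with None since that input has no values element. Unlike the HTML parse in the question's output, attribute names such as prefixSpaced also keep their original case under the XML parser.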

answered 2012-11-22T15:32:09.920