ruby - Regex for PBXProject file

Question

I'm working on a pure Ruby implementation of a parser for the Xcode project file (PBXProject), and I need a little help with a regex.

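A PBXProject file contains many odd lines with mixed content. What I have now is the regex (.*?) = (.*?)( \/\* (.*) \*\/)?; ? which works for the simpler case (the first line below). On the second line, however, it cuts off too early (at the first ; character).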

isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;

isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };

What I want from these lines is simple name = value pairs:

isa = PBXBuildFile
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }

Is there an easy way to achieve this with a single regex?

score 1 · Accepted Answer

This regex works for the example lines:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

Note that only one level of braces is supported; the regex does not handle nested braces.

From your examples, it captures the following:

isa = PBXBuildFile
fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */
isa = PBXBuildFile
fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }
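
To check this against the two sample lines from the question, a minimal Ruby sketch using String#scan could look like the following. The constant and variable names are my own, the braces are escaped (which should be equivalent in Ruby), and splitting each match on the first = is an extra step that the answer itself does not describe.

```ruby
# The answer's pattern, with the literal braces escaped for clarity.
PAIR_RE = /[a-zA-Z0-9]*\s*?=\s*?.*?(?:\{[^}]*\}|(?=;))/

# The two sample lines from the question.
lines = [
  'isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;',
  'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
  'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'
]

# The pattern has no capture groups, so scan returns the matched strings.
matches = lines.flat_map { |line| line.scan(PAIR_RE) }

# Assumption: name and value are separated by the first "=" in each match.
pairs = matches.map { |match| match.split(/\s*=\s*/, 2) }

pairs.each { |name, value| puts "#{name} => #{value}" }
```

The trailing comment (for example /* zip.c */) stays part of the value here; stripping it would need a further step.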

Here is an explanation of the regex:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

Options: ^ and $ match at line breaks

Match a single character present in the list below «[a-zA-Z0-9]*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    A character in the range between “a” and “z” «a-z»
    A character in the range between “A” and “Z” «A-Z»
    A character in the range between “0” and “9” «0-9»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match any single character that is not a line break character «.*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below «(?:{[^}]*}|(?=;))»
    Match either the regular expression below (attempting the next alternative only if this one fails) «{[^}]*}»
        Match the character “{” literally «{»
        Match any character that is NOT a “}” «[^}]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
        Match the character “}” literally «}»
    Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?=;)»
        Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=;)»
            Match the character “;” literally «;»

score 0 · Accepted Answer

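Depending on the exact nature of the content you want to parse, this may not be possible with a single regular expression. The second line, the one giving you trouble, suggests that nested patterns may be involved, and a regular expression can only match nested patterns to a finite depth. This is one of the reasons parsing [X]HTML with regular expressions is discouraged. If you really need to handle arbitrarily deep nesting, consider something like Treetop.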
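
As an illustration of why nesting calls for more than a flat pattern, here is a minimal hand-written sketch that splits a line at top-level semicolons by counting brace depth. It is not Treetop and not a full PBXProject parser: the method name split_top_level_pairs and the nested sample line are made up for this example, and escaped quotes and /* ... */ comments containing semicolons or braces are not handled.

```ruby
# Hypothetical helper (not from the answers): split a line into top-level
# [name, value] pairs, treating ";" as a separator only outside braces
# and outside double-quoted strings.
def split_top_level_pairs(line)
  pairs = []
  current = String.new
  depth = 0
  in_string = false

  line.each_char do |ch|
    if ch == '"'
      in_string = !in_string # assumption: no escaped quotes inside strings
    elsif !in_string
      depth += 1 if ch == "{"
      depth -= 1 if ch == "}"
      if ch == ";" && depth.zero?
        # A semicolon outside all braces ends the current pair.
        pairs << current.strip unless current.strip.empty?
        current = String.new
        next
      end
    end
    current << ch
  end

  # Keep any trailing text that was not terminated by ";".
  pairs << current.strip unless current.strip.empty?
  pairs.map { |pair| pair.split(/\s*=\s*/, 2) }
end

# Made-up input with two levels of braces and a ";" inside a string.
nested = 'isa = PBXBuildFile; settings = {ATTRIBUTES = {Weak = 1; }; COMPILER_FLAGS = "-a;b"; };'
split_top_level_pairs(nested).each { |name, value| puts "#{name} => #{value}" }
```

The same method could be applied again to the inside of a braced value to walk the nesting level by level.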

If it doesn't need to be robust, you could try an expression like this:

/((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

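This first tries to match something of the form something = {anything}; if that fails, it matches something = something. In both cases the match must be followed by a ;. You should be able to use string.scan(/regex/) to find all matches in a given string. Handling the braced blocks this way avoids problems such as the match ending prematurely, and makes it easy to extract the pairs.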
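
A short sketch of that scan call on the second sample line from the question might look like this. The constant and variable names are my own, and the strip and split steps are additions: the captured text keeps the whitespace that follows the previous semicolon, and the answer does not say how to separate name from value.

```ruby
# The expression from this answer, unchanged.
PAIR_WITH_TERMINATOR = /((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
       'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# With one capture group, scan returns an array of one-element arrays,
# so flatten gives the captured texts directly.
captured = line.scan(PAIR_WITH_TERMINATOR).flatten

# Assumption: trim surrounding whitespace, then split on the first "=".
scanned_pairs = captured.map { |text| text.strip.split(/\s*=\s*/, 2) }

scanned_pairs.each { |name, value| puts "#{name} => #{value}" }
```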
