ruby - How do I handle C-style comments in Ruby using Parslet?

Question

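Taking as a starting point a code example from Parslet's own author (available at this link), I need to extend it so that it retrieves all the uncommented text from a file written in a C-like syntax.

The provided example parses C-style comments successfully and treats those regions as ordinary whitespace. However, this simple example expects only 'a' characters in the uncommented regions of the file, as in this sample input: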

         a
      // line comment
      a a a // line comment
      a /* inline comment */ a 
      /* multiline
      comment */

The rule used to detect the uncommented text is simply:

   rule(:expression) { (str('a').as(:a) >> spaces).as(:exp) }

So what I need is to generalize that rule so that it captures all the other (uncommented) text in a more general file such as:

     word0
  // line comment
   word1 // line comment
  phrase /* inline comment */ something 
  /* multiline
  comment */

I am new to parsing expression grammars, and neither of my previous attempts succeeded.

score 4 · Accepted Answer

The general idea is that everything is code (that is, non-comment) until one of the sequences // or /* appears. You can express this with a rule like this:

rule(:code) {
  (str('/*').absent? >> str('//').absent? >> any).repeat(1).as(:code)
}

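As mentioned in my comment, there is a small problem with strings. When a comment marker occurs inside a string, it is obviously part of the string. If you removed such "comments" from your code, you would change the meaning of that code. So we have to let the parser know what a string is, and that every character inside it belongs to it. Another issue is escape sequences. For example, the string "foo \" bar /*baz*/", which contains a literal double quote, would actually be parsed as "foo \", followed by code again. That of course needs to be addressed. I have written a complete parser that handles all of the above cases: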

require 'parslet'

class CommentParser < Parslet::Parser
  rule(:eof) { 
    any.absent? 
  }

  rule(:block_comment_text) {
    (str('*/').absent? >> any).repeat.as(:comment)
  }

  rule(:block_comment) {
    str('/*') >> block_comment_text >> str('*/')
  }

  rule(:line_comment_text) {
    (str("\n").absent? >> any).repeat.as(:comment)
  }

  rule(:line_comment) {
    str('//') >> line_comment_text >> (str("\n").present? | eof)
  }

  rule(:string_text) {
    (str('"').absent? >> str('\\').maybe >> any).repeat
  }

  rule(:string) {
    str('"') >> string_text >> str('"')
  }

  rule(:code_without_strings) {
    (str('"').absent? >> str('/*').absent? >> str('//').absent? >> any).repeat(1)
  }

  rule(:code) {
    (code_without_strings | string).repeat(1).as(:code)
  }

  rule(:code_with_comments) {
    (code | block_comment | line_comment).repeat
  }

  root(:code_with_comments)
end

It parses the input

     word0
  // line comment
   word1 // line comment
  phrase /* inline comment */ something 
  /* multiline
  comment */

into this AST:

[{:code=>"\n   word0\n "@0},
 {:comment=>" line comment"@13},
 {:code=>"\n  word1 "@26},
 {:comment=>" line comment"@37},
 {:code=>"\n phrase "@50},
 {:comment=>" inline comment "@61},
 {:code=>" something \n "@79},
 {:comment=>" multiline\n comment "@94},
 {:code=>"\n"@116}]

To extract everything except the comments, you can do:

input = <<-CODE
     word0
  // line comment
   word1 // line comment
  phrase /* inline comment */ something 
  /* multiline
  comment */
CODE

ast = CommentParser.new.parse(input)
puts ast.map{|node| node[:code] }.join

which produces:

   word0

  word1
 phrase  something
