c++ - Parsing and splitting a sample text file

Question

I'm trying to go through a simple text file that contains assembly instructions and looks like this:

TOP   NOP
VAL   INT 0
TAN   LA 2,1

This is just a small example to show how it works. Basically, I take the first token on each line and put it in label, then take the second token (NOP, INT, LA) and put it in opcode.

After that I take the first argument (0 and 2) and put it in arg1. This is where my problem comes in. With my current code, the output I get when I put the argument into the string is:

TOP
0
2

Obviously I only want the last two. How do I keep TOP from being thrown in there as the first argument?

#include <string>
#include <iostream>
#include <cstdlib>
#include <string.h>
#include <fstream>
#include <stdio.h>

using namespace std;

int main(int argc, char *argv[])
{
// If no extra file is provided then exit the program with error message
if (argc <= 1)
{
    cout << "Correct Usage: " << argv[0] << " <Filename>" << endl;
    exit (1);
}

// Array to hold the registers and initialize them all to zero
int registers [] = {0,0,0,0,0,0,0,0};

string memory [16000];

string Symtablelab[1000];
int Symtablepos[1000];

string line;
string label;
string opcode;
string arg1;
string arg2;

// Open the file that was input on the command line
ifstream myFile;
myFile.open(argv[1]);

if (!myFile.is_open())
{
    cerr << "Cannot open the file." << endl;
}

int counter = 0;
int i = 0;
int j = 0;

while (getline(myFile, line, '\n'))
{
    if (line[0] == '#')
    {
        continue;
    }

    if (line.length() == 0)
    {
        continue;
    }

    if (line[0] != '\t' && line[0] != ' ')
    {
        string delimeters = "\t ";

        int current;
        int next = -1;

        current = next + 1;
        next = line.find_first_of( delimeters, current);
        label = line.substr( current, next - current );

        Symtablelab[i] = label;

        current = next + 1;
        next = line.find_first_of(delimeters, current);
        opcode = line.substr(current, next - current);

        if (opcode != "WORDS" && opcode != "INT")
        {
            counter += 3;
        }

        if (opcode == "INT")
        {
            counter++;
        }

        delimeters = ", \n\t";
        current = next + 1;
        next = line.find_first_of(delimeters, current);
        arg1 = line.substr(current, next-current);

        cout << arg1<<endl;

        i++;
    }
}

score 2 · Accepted Answer

This technique has a lot of weaknesses, and you never check your results. For example, when you write:

current = next + 1;

you are assuming that there is exactly one delimiter between items. If there can be more than one, you have to skip past all of the delimiters.

next = line.find_first_of(delimeters, current);
<something> = line.substr(current, next - current)

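You should check the result of find_first_of. When nothing is found it returns std::string::npos, which shows up as -1 once stored in an int, and next - current then becomes a negative value.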
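Both points can be sketched as one small helper. This is only an illustration under my own assumptions: the name next_token and its parameters are hypothetical and do not come from the question's code. It keeps positions in std::string::size_type and compares against std::string::npos instead of relying on -1.

```cpp
#include <string>

// Hypothetical helper: returns the token that starts at or after `pos`,
// skipping any run of delimiters first, and moves `pos` just past the token.
// Returns an empty string when no token is left on the line.
std::string next_token(const std::string& line,
                       std::string::size_type& pos,
                       const std::string& delims = " \t")
{
    // Skip every leading delimiter, not just one.
    std::string::size_type start = line.find_first_not_of(delims, pos);
    if (start == std::string::npos) {
        pos = line.size();   // only delimiters (or nothing) remain
        return "";
    }

    // find_first_of returns npos when the token runs to the end of the line.
    std::string::size_type end = line.find_first_of(delims, start);
    if (end == std::string::npos) {
        end = line.size();
    }

    pos = end;
    return line.substr(start, end - start);
}
```

Calling it repeatedly with the same pos variable walks through the label, the opcode, and whatever follows. An empty return value means the line is exhausted, so a line such as TOP NOP never yields a bogus argument.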

If I had to do this job I would use a regex. With std or boost regular expressions the task is easy. Just use:

std::smatch m;
std::regex rx("\\s*(\\w+)\\s+(\\w+)(?:\\s+(\\d+)\\s*(?:,(\\d+))?)?");
if (std::regex_match(line, m, rx)) {
    // we found a match here
    string label = m.str(1);
    string opcode = m.str(2);
    // groups 3 and 4 are empty when the line has no arguments
    string arg1 = m.str(3), arg2 = m.str(4);
}
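For reference, the same idea as a self-contained function. This is a sketch rather than a drop-in replacement: the Instruction struct and the name parse_with_regex are my own hypothetical additions, and the pattern is the one above written as a raw string literal.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the pieces of one instruction line.
struct Instruction {
    std::string label;
    std::string opcode;
    std::string arg1;   // empty when the line has no arguments
    std::string arg2;   // empty when the line has fewer than two arguments
};

// Returns true and fills `out` when the whole line has the form
// "LABEL OPCODE [ARG1[,ARG2]]"; returns false otherwise.
bool parse_with_regex(const std::string& line, Instruction& out)
{
    static const std::regex rx(
        R"(\s*(\w+)\s+(\w+)(?:\s+(\d+)\s*(?:,(\d+))?)?)");

    std::smatch m;
    if (!std::regex_match(line, m, rx)) {
        return false;
    }

    out.label  = m.str(1);
    out.opcode = m.str(2);
    out.arg1   = m.str(3);   // str() gives "" for a group that did not match
    out.arg2   = m.str(4);
    return true;
}
```

Note that std::regex_match has to match the entire line, so with this pattern a line with trailing whitespace or a non-numeric argument is rejected. Whether that strictness is what you want depends on your input files.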

score 1 · Accepted Answer

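The problem is how you look for the start of each subsequent word: current = next + 1. You want to look for the first non-delimiter character, which is the start of the word, and also check whether you are already at the end of the line before looking for an argument.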

After adding some debug output, I see:

>> label: start=0 end=3 value="TOP"
>> opcode: start=4 end=4 value=""

>> label: start=0 end=3 value="VAL"
>> opcode: start=4 end=4 value=""

>> label: start=0 end=3 value="TAN"
>> opcode: start=4 end=4 value=""

This shows that every time you try to read the opcode you are finding another delimiter.

The problem is that you advance by only one character after the word, so the next line.substr() picks up a delimiter.

In every lookup after the first one, change:

current = next + 1;

to:

current = line.find_first_not_of(delimeters, next + 1);

This looks for the start of the next word after all of the delimiters.

You also want the search for the argument to be conditional on there being some line left, so wrap it in if (next > 0) { ... }.

With those changes I get the following debug output together with your original output (which is now conditional):

>> label: start=0 end=3 value="TOP"
>> opcode: start=6 end=-1 value="NOP"
>> label: start=0 end=3 value="VAL"
>> opcode: start=6 end=9 value="INT"
>> arg1: start=10 end=-1 value="0"
0
>> label: start=0 end=3 value="TAN"
>> opcode: start=6 end=8 value="LA"
>> arg1: start=9 end=10 value="2"
2
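Put together, the two changes might look like the sketch below. The Fields struct and the name split_line are hypothetical, and I compare against std::string::npos instead of testing next > 0, because positions are kept as size_type here rather than int. That also covers a line that ends in trailing blanks.

```cpp
#include <string>

// Hypothetical result type: fields that are absent stay empty.
struct Fields {
    std::string label;
    std::string opcode;
    std::string arg1;
};

Fields split_line(const std::string& line)
{
    typedef std::string::size_type size_type;
    const std::string blanks = "\t ";
    const std::string arg_delims = ", \n\t";
    Fields f;

    // Label: starts in column 0 and runs to the first blank.
    size_type next = line.find_first_of(blanks);
    f.label = line.substr(0, next);
    if (next == std::string::npos) return f;      // label only

    // Opcode: skip the whole run of blanks, not just one character.
    size_type current = line.find_first_not_of(blanks, next + 1);
    if (current == std::string::npos) return f;   // trailing blanks only
    next = line.find_first_of(blanks, current);
    f.opcode = line.substr(current, next - current);
    if (next == std::string::npos) return f;      // no arguments on this line

    // First argument: only looked for when something follows the opcode.
    current = line.find_first_not_of(arg_delims, next + 1);
    if (current == std::string::npos) return f;
    next = line.find_first_of(arg_delims, current);
    f.arg1 = line.substr(current, next - current);
    return f;
}
```

In the main loop you would then print f.arg1 only when it is not empty, which gives just 0 and 2 for the sample file.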

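I would refactor the parsing/tokenizing out of the main loop so you can focus on it. You may also want to get cppunit (or something similar) to help you test the parsing functions. If you don't have anything like that, it helps to have the parsing in one place where you can insert debug output such as: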

cout << ">> " << whatIsBeingDebugged << ": start=" << current
     << " end=" << next << " value=\"" << value << "\"" << endl;
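One way to keep that debug output in a single place is a tiny formatting helper. The name debug_line and its signature are made up for illustration. It only builds the string, so you can print it or compare it in a test.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper that formats one debug line in a single place.
// `start` and `end` are the positions passed to substr; -1 means "not found".
std::string debug_line(const std::string& what, int start, int end,
                       const std::string& value)
{
    std::ostringstream out;
    out << ">> " << what << ": start=" << start
        << " end=" << end << " value=\"" << value << "\"";
    return out.str();
}
```

Usage would be something like cout << debug_line("label", current, next, label) << endl; right after each substr call.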

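Writing robust lexical analyzers and parsers is the subject of many libraries (lex and yacc, flex and bison, and so on), can be an application of other tools such as regular expressions, and is even the subject of entire university courses. It takes work. Just be systematic and thorough, and test the pieces in isolation, for example with unit tests using cppunit (or something similar).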
