regex - Perl RegEx: parsing a block of notes on a 10-digit number

Question

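OK, so here's the problem. I have notes in text form from an old SQL server. It lumps all the notes for a record into one big blob of data. I need to take that blob of text and parse it to create one row per note entry, with separate columns for timestamp, user, and note text. The only way I can think of to do this is to use a regex to find the Unix timestamp of each note and parse on that. I know there is a split function for parsing on delimiters, but it removes the delimiter. I need to parse on \d{10} but also keep the 10-digit number. Here is some sample data.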

create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

I would like to get one record for each note, so that it looks like this.

JOB_NUMBER        DTTM    USER     NOTES_TEXT
----------    ----------  ----     ----------
12345         1234567890  USERNAME notes text notes text notes text notes text
12345         5468204562  USERNAME notes text notes text notes text notes text
12345         1025478510  USERNAME notes text notes text notes text notes text
12346         2345678901  USERNAME notes text notes text notes text notes text
12346         1523024512  USERNAME notes text notes text notes text notes text
12346         1578451236  USERNAME notes text notes text notes text notes text
12347         2345678902  USERNAME notes text notes text notes text notes text
12347         2365201214  USERNAME notes text notes text notes text notes text
12347         1202154215  USERNAME notes text notes text notes text notes text

Thanks for your help.

score 1 · Accepted Answer

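Text::ParseWords can handle the quoted string and split on the comma. You can skip the leading input with the flip-flop operator, 1 .. /values/, which is true from line 1 through the first line that matches /values/. This particular way of skipping may need tweaking for your real input.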
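To illustrate the quoted-string handling on its own, here is a minimal sketch of quotewords applied to a single row. The row contents (including the user name alice and the embedded comma) are made up for illustration, and the surrounding parentheses are assumed to have been removed already.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical row, shaped like the sample data but with a comma inside the notes.
my $row = q{12345, '1234567890 alice note one, with a comma'};

# Split on commas; keep = 0 strips the quotes. The comma inside the
# quoted string is not treated as a delimiter.
my ($job, $notes) = quotewords('\s*,\s*', 0, $row);

print "job=$job\n";      # job=12345
print "notes=$notes\n";  # notes=1234567890 alice note one, with a comma
```

Note that an unescaped apostrophe inside the notes text would end the quoted string early, so real data may need extra cleanup before this step.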

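Then it is simply a matter of parsing the string. This can be done by splitting with a lookahead assertion and then capturing the various fields in each substring. The regex in the split: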

my @entries = split /(?<!^)(?=\d{10})/, $data;

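This regex has a negative lookbehind assertion, (?<!^), to avoid matching at the start of the string, and a lookahead assertion, (?=\d{10}), to match 10 digits. Because both assertions are zero-width, split consumes no characters: it effectively splits at each number and keeps the numbers.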
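A small standalone sketch of the zero-width split described above may make the behaviour easier to see. The sample string and the user names in it are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample string, shaped like the notes column in the question.
my $data = '1234567890 alice first note 5468204562 bob second note';

# Zero-width split: no characters are consumed, so each 10-digit
# number stays at the front of its own entry.
my @entries = split /(?<!^)(?=\d{10})/, $data;

print "[$_]\n" for @entries;
# [1234567890 alice first note ]
# [5468204562 bob second note]
```

The space before the second timestamp stays at the end of the first entry, since nothing is removed. The pattern also splits at any other run of 10 or more digits, such as a phone number inside the note text, so it relies on the notes containing no such numbers.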

The DATA file handle is used here for demonstration. Simply replace <DATA> with <> to read from a file named on the command line.

use strict;
use warnings;
use Text::ParseWords;

my $format = "%-12s %-12s %-10s %s\n";              # format for printing
my @headers = qw(JOB_NUMBER DTTM USER NOTES_TEXT);  
printf $format, @headers;
printf $format, map "-" x length, @headers;         # print underline
while (<DATA>) {
    next if 1 .. /values/;                          # skip to data
    s/^\(|\)$//g;                                   # remove parentheses
    my ($job, $data) = quotewords('\s*,\s*',0, $_); # parse string
    my @entries = split /(?<!^)(?=\d{10})/, $data;  # split into entries
    for my $entry (@entries) {                      # parse each entry
        my ($dttm, $user, $notes) = $entry =~ /^(\d+)\s+(\S+)\s+(.*)/;
        printf $format, $job, $dttm, $user, $notes; # print the captured notes text
    }
}

__DATA__
create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

Output:

JOB_NUMBER   DTTM         USER       NOTES_TEXT
----------   ----         ----       ----------
12345        1234567890   username   notes text notes text notes text notes text
12345        5468204562   username   notes text notes text notes text notes text
12345        1025478510   username   notes text notes text notes text notes text
12346        2345678901   username   notes text notes text notes text notes text
12346        1523024512   username   notes text notes text notes text notes text
12346        1578451236   username   notes text notes text notes text notes text
12347        2345678902   username   notes text notes text notes text notes text
12347        2365201214   username   notes text notes text notes text notes text
12347        1202154215   username   notes text notes text notes text notes text
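As a variation that is not part of the answer's code, the three fields could also be pulled out with one global match instead of a split followed by a second match. This is only a sketch under the same assumption that 10-digit numbers appear solely as timestamps; the sample string and the @rows array are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical notes string with a trailing newline, as read from a file.
my $data = "1234567890 alice first note 5468204562 bob second note\n";

my @rows;
# Capture timestamp, user, and notes; the lazy (.*?) stops just before
# the next 10-digit number or before trailing whitespace at end of string.
while ($data =~ /(\d{10})\s+(\S+)\s+(.*?)(?=\s+\d{10}\b|\s*\z)/gs) {
    push @rows, [$1, $2, $3];
}

printf "%s | %s | %s\n", @$_ for @rows;
# 1234567890 | alice | first note
# 5468204562 | bob | second note
```

Compared with the split, this form trims the whitespace between entries as part of the match, at the cost of a denser pattern.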
