sql - sql regex parse text to add in new lines

Question

I am trying to split up a notes field that is just one big block of text. Sample data is below, written as if I were inserting it into a table.

create table test_table
(
job_number number,
notes varchar2(4000)
);

insert into test_table (job_number,notes)
values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

I need to parse it out so that there is a separate record for each notes entry (the 10-digit numbers leading each note are Unix timestamps). If I were to export the result as pipe-delimited text, it would look like this:

job_number|notes
12345|1022089483 notes notes notes notes
12345|1022094450 notes notes notes notes
12345|1022095218 notes notes notes notes

I really hope this makes sense. I appreciate any insight.

score 0 · Accepted Answer

There are a few ways to do this. First, some sample data:

SQL> insert into test_table (job_number,notes)
  2  values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

1 row created.

SQL> insert into test_table (job_number,notes)
  2  values (12346,'1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42');

1 row created.

SQL> commit;

Commit complete.

Note: I'm using [0-9]{10} as the regular expression that identifies a note (that is, any 10-digit number is treated as the start of a note).

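First, we can take the approach of calculating the maximum number of notes in any row and then doing a Cartesian join against that many generated rows. Then we cut each note out with substr. Appending ' 0000000000' to notes gives the last note a following match, so its length can be computed the same way as the others: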

SQL> with data
  2  as (select job_number, notes,
  3            (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  4        from test_table t)
  5  select job_number,
  6         substr(d.notes, regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
  7                       regexp_instr(d.notes||' 0000000000', '[0-9]{10}', 1, rn.l+1)
  8                       -regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) -1
  9               ) note
 10    from data d
 11         cross join (select rownum l
 12                      from dual
 13                    connect by level <= (select max(num_of_notes)
 14                                           from data)) rn
 15   where rn.l <= d.num_of_notes
 16   order by job_number, rn.l;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.

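This is fine as long as the rows generally have about the same number of notes. Every row is joined to max(num_of_notes) generated rows before the filter is applied, so we do a lot of wasted lookups, and the bigger the variance in note counts, the worse this scales.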
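
The building blocks of the query above can be inspected on their own. This is only a sketch, assuming the two test_table rows inserted earlier; the column aliases are illustrative, and regexp_count needs 11g or later (the length arithmetic is the equivalent that also works on 10g).

```sql
-- Count the notes per row two ways, and show where the first two notes start.
select job_number,
       (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 as num_of_notes,
       regexp_count(notes, '[0-9]{10}')                                        as num_of_notes_11g,
       regexp_instr(notes, '[0-9]{10}', 1, 1)                                  as first_start,
       regexp_instr(notes, '[0-9]{10}', 1, 2)                                  as second_start
  from test_table
 order by job_number;
-- With the sample data: job 12345 has 3 notes and job 12346 has 4;
-- in both rows the first note starts at position 1 and the second at 36.
```

The difference between two consecutive start positions, minus one for the separating space, is the length passed to substr in the main query.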

In 11g (recursive WITH arrived in Release 2), we can use recursive subquery factoring to do the same thing as above, but without the extra looping:

SQL> with data (job_number, notes, note, num_of_notes, iter)
  2  as (select job_number, notes,
  3             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, 1),
  4                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, 2)
  5                    -regexp_instr(notes, '[0-9]{10}', 1, 1) -1
  6                  ),
  7             (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes,
  8             1
  9        from test_table
 10      union all
 11     select job_number, notes,
 12             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 13                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 14                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 15                  ),
 16             num_of_notes, iter + 1
 17       from data
 18      where substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 19                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 20                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 21                  ) is not null
 22    )
 23  select job_number, note
 24    from data
 25  order by job_number, iter;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.
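
The recursive member above stops on its own: once iter+1 passes the last timestamp, regexp_instr returns 0 for both positions, the computed length becomes -1, and substr returns null for any length below 1. A minimal check of that behaviour, assuming the test_table rows inserted earlier (job 12346 has four notes, so occurrence 5 does not exist; the aliases are only illustrative):

```sql
-- Occurrence 5 does not exist in job 12346, and the ' 0000000000' sentinel adds
-- only one extra match, so occurrence 6 of the padded string is missing too.
select regexp_instr(notes, '[0-9]{10}', 1, 5)                  as start_past_end,
       regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, 6) as next_past_end,
       substr(notes, 0, 0 - 0 - 1)                             as note_past_end
  from test_table
 where job_number = 12346;
-- Both positions come back as 0 and note_past_end is null, which is what the
-- "is not null" filter in the recursive member relies on.
```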

Alternatively, in 10g and later, you can use the MODEL clause to generate the rows:

SQL> with data as (select job_number, notes,
  2                       (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  3                  from test_table)
  4  select job_number, note
  5    from data
  6  model
  7  partition by (job_number)
  8  dimension by (1 as i)
  9  measures (notes, num_of_notes, cast(null as varchar2(4000)) note)
 10  rules
 11  (
 12    note[for i from 1 to num_of_notes[1] increment 1]
 13      = substr(notes[1],
 14               regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)),
 15               regexp_instr(notes[1]||' 0000000000', '[0-9]{10}', 1, cv(i)+1)
 16               -regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)) -1
 17              )
 18  )
 19  order by job_number, i;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42
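
To get the pipe-delimited lines the question asks for, one option is to concatenate the two columns in the final select. The sketch below reuses the cross-join approach from the first query, with regexp_count (11g or later) swapped in for the length arithmetic; the alias export_line is only illustrative, and the same concatenation could be applied to the recursive or MODEL versions instead.

```sql
with data as (
  -- One row per job, plus the number of 10-digit timestamps found in its notes.
  select job_number, notes,
         regexp_count(notes, '[0-9]{10}') as num_of_notes
    from test_table
)
select d.job_number || '|' ||
       substr(d.notes,
              regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
              regexp_instr(d.notes || ' 0000000000', '[0-9]{10}', 1, rn.l + 1)
              - regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) - 1
             ) as export_line
  from data d
       -- Row generator: 1 .. the largest note count of any job.
       cross join (select rownum l
                     from dual
                  connect by level <= (select max(num_of_notes)
                                         from data)) rn
 where rn.l <= d.num_of_notes
 order by d.job_number, rn.l;
```

Each returned row is then a single string such as 12345|1022089483 notes notes notes notes, ready to be spooled to a file.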
