oracle11g - REGEXP_SUBSTR to return the first and last segments

Question

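I have a dataset in which an account number may be stored in several different variations. It may contain hyphens or spaces as segment separators, or it may be fully concatenated. My desired output is the first three and last five alphanumeric characters. I am having trouble combining the two segments into "FIRST_THREE_AND_LAST_FIVE":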

with testdata as (select '1-23-456-78-90-ABCDE' txt from dual union all
                  select '1 23 456 78 90 ABCDE' txt from dual union all
                  select '1234567890ABCDE' txt from dual union all
                  select '123ABCDE' txt from dual union all
                  select '12DE' txt from dual)
select TXT
       ,regexp_replace(txt, '[^[[:alnum:]]]*',null) NO_HYPHENS_OR_SPACES
       ,regexp_substr(regexp_replace(txt, '[^[[:alnum:]]]*',null), '([[:alnum:]]){3}',1,1) FIRST_THREE
       ,regexp_substr(txt, '([[:alnum:]]){5}$',1,1) LAST_FIVE
       ,regexp_substr(regexp_replace(txt, '[^[[:alnum:]]]*',null), '([[:alnum:]]){3}',1,1) FIRST_THREE_AND_LAST_FIVE
from  testdata;

My desired output looks like this:

FIRST_THREE_AND_LAST_FIVE
-------------------------
123ABCDE
123ABCDE
123ABCDE
123ABCDE
(null)

score 1 · Accepted Answer

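Here is my attempt. Note that regexp_replace() returns the original string when no match is found, so you cannot get a null directly. My idea was to check whether the resulting string matches the original string, but of course that does not work for the fourth row, where the result is correct and happens to equal the original string. Others have mentioned using CASE to count the length and so on, but I would be stricter and also check that the first 3 characters are numeric and the last 5 are alphabetic, since merely checking that 8 characters come back does not guarantee they are the right 8 characters! I'll leave that to the reader.

Anyway, this looks for a digit followed by an optional dash or space (per the specs), remembers the digit (three times), and also remembers the last 5 alphabetic characters. It then returns the remembered groups in that order.

I strongly recommend turning this into a function that takes a string and returns the cleaned string. It will be much easier to maintain, it encapsulates the code for reuse, and it allows better error checking in PL/SQL.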

with testdata(txt) as (
  select '1-23-456-78-90-ABCDE' from dual
  union
  select '1 23 456 78 90 ABCDE' from dual
  union
  select '1234567890ABCDE'      from dual
  union
  select '123ABCDE'             from dual
  union
  select '12DE'                 from dual
)
select
  case when length(regexp_replace(upper(txt), '^(\d)[- ]?(\d)[- ]?(\d)[- ]?.*([A-Z]{5})$', '\1\2\3\4')) < 8
       -- Needs more robust error checking here
       THEN 'NULL'  -- for readability
    else regexp_replace(upper(txt), '^(\d)[- ]?(\d)[- ]?(\d)[- ]?.*([A-Z]{5})$', '\1\2\3\4')
  end result
from testdata;

RESULT
--------------------------------------------------------------------------------
123ABCDE
123ABCDE
123ABCDE
123ABCDE
NULL
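
The suggestion to wrap this in a function could be sketched roughly as follows. The function name clean_account_no and its parameter p_txt are made up for illustration and are not part of the original answer. Instead of comparing lengths, the sketch uses REGEXP_LIKE to test whether the pattern matches at all, which also covers the stricter "3 digits, then 5 letters" check mentioned earlier. Treat it as a starting point under those assumptions, not a finished implementation.

```sql
-- Hypothetical helper: the name and signature are assumptions for illustration.
create or replace function clean_account_no(p_txt in varchar2)
  return varchar2
  deterministic
is
  -- Same pattern as in the query above: three digits, each optionally followed
  -- by a dash or space, then anything, then exactly five trailing letters.
  c_pattern constant varchar2(100) := '^(\d)[- ]?(\d)[- ]?(\d)[- ]?.*([A-Z]{5})$';
  l_txt     varchar2(4000) := upper(p_txt);
begin
  -- REGEXP_REPLACE returns its input unchanged when nothing matches,
  -- so test for a match first and return a real NULL otherwise.
  if regexp_like(l_txt, c_pattern) then
    return regexp_replace(l_txt, c_pattern, '\1\2\3\4');
  end if;

  return null;
end clean_account_no;
/
```

It could then be called as select txt, clean_account_no(txt) from testdata; and any further validation can be added inside the function without touching the calling queries.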

score 0 · Accepted Answer

You can use the fact that the replacement string of REGEXP_REPLACE() can take back-references to get a lot closer. Wrapped in a CASE expression you get what you're after:

select case when length(regexp_replace(txt, '[^[:alnum:]]')) >= 8 then
            regexp_replace( regexp_replace(txt, '[^[:alnum:]]')
                          , '^([[:alnum:]]{3}).*([[:alnum:]]{5})$'
                          , '\1\2')
       end
  from test_data

That is, where the length of the string with all non-alphanumeric characters removed is greater than or equal to 8, return the 1st and 2nd groups, which are respectively the first 3 and last 5 alphanumeric characters.
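
To see what each back-reference captures, here is a small sketch that returns the two groups separately for one already-cleaned value. The column aliases group_1 and group_2 are only illustrative names.

```sql
-- '\1' keeps only the first capture group, '\2' only the second.
-- The greedy .* backtracks just far enough for the last five characters to match.
select regexp_replace('1234567890ABCDE'
                     , '^([[:alnum:]]{3}).*([[:alnum:]]{5})$'
                     , '\1') as group_1   -- 123
     , regexp_replace('1234567890ABCDE'
                     , '^([[:alnum:]]{3}).*([[:alnum:]]{5})$'
                     , '\2') as group_2   -- ABCDE
  from dual;
```

For a string shorter than 8 characters the pattern cannot match, so REGEXP_REPLACE hands back the input unchanged, which is why the length check in the CASE is needed.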

This feels... overly complex. Once you've replaced all non-alpha-numeric characters you can just use an ordinary SUBSTR():

with test_data as (
select '1-23-456-78-90-ABCDE' txt from dual union all
select '1 23 456 78 90 ABCDE' txt from dual union all
select '1234567890ABCDE' txt from dual union all
select '123ABCDE' txt from dual union all
select '12DE' txt from dual
       )
, standardised as (
select regexp_replace(txt, '[^[:alnum:]]') as txt
  from test_data
       )
select case when length(txt) >= 8 then substr(txt, 1, 3) || substr(txt, -5) end
  from standardised
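
Since the title asks about REGEXP_SUBSTR specifically, the same idea could also be written with two REGEXP_SUBSTR calls concatenated together. This is only a sketch; the column alias original is added here so the input stays visible. The length check is still required: without it a cleaned value of 5 to 7 characters would yield overlapping pieces, and a 3 or 4 character value would return just the first three characters.

```sql
with test_data as (
select '1-23-456-78-90-ABCDE' txt from dual union all
select '1 23 456 78 90 ABCDE' txt from dual union all
select '1234567890ABCDE' txt from dual union all
select '123ABCDE' txt from dual union all
select '12DE' txt from dual
       )
, standardised as (
-- keep the input for display and strip everything that is not alphanumeric
select txt as original
     , regexp_replace(txt, '[^[:alnum:]]') as txt
  from test_data
       )
select original
     , case when length(txt) >= 8
            then regexp_substr(txt, '^[[:alnum:]]{3}')    -- first three
              || regexp_substr(txt, '[[:alnum:]]{5}$')    -- last five
       end as first_three_and_last_five
  from standardised;
```

With the sample data this gives 123ABCDE for the first four rows and null for '12DE'.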
