postgresql - PostgreSQLは再帰CTEから関数にデータを渡します

Question

次の問題があります: ソースノード ( node_s ) からターゲットノード ( node_t ) へのすべての可能なパスを検出しようとしています。

グラフエッジを含む元のテーブルの形式は単純です。ノード_x | ノード_y | 強度 | ここで、"node_x" -> "node_y" は、エッジの強度が "重み" である直接エッジです。

パスの探索中の任意の時点で、その子の中のノードがターゲットnode_tを持っていることを発見した場合、このパスを記録し、このノードからのパスの探索を停止します。それ以外の場合は、探索を続行します。

簡単な解決策は、グラフの推移閉包を構築する PostgreSQL の再帰 CTE を使用することでした。

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
  )
  SELECT * 
  FROM transitive_closure tc
  WHERE tc.target = node_t --target_node

上記のコードは、ソースノードnode_sから可能なすべてのパスを検出します。推移閉包の構築後にのみ、ソースノードからターゲットノードへの必要なパスの行を選択できます (最後の SELECT ステートメントを参照)。

例：

best_path テーブルには次のデータがあります。

 node_x | node_y | strength 
--------+--------+----------
      1 |      2 |        1
      1 |      3 |        1
      2 |      4 |        1
      2 |      5 |        1
      3 |      6 |        1
      3 |      7 |        1
      4 |      8 |        1
      4 |      9 |        1
      5 |     10 |        1
      5 |     11 |        1

クエリ:

ソースノード = 1 からターゲットノード = 4 へのパスを見つける

結果：

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      5 |        1 | 1.2.5.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.
      1 |      8 |        1 | 1.2.4.8.
      1 |      9 |        1 | 1.2.4.9.
      1 |     10 |        1 | 1.2.5.10.
      1 |     11 |        1 | 1.2.5.11.

これは私が必要とするものではありません。ノード 2 からノード 4 (ターゲット) への直接エッジが既にあるため、パス 1.2.5.、1.2.4.8.、1.2.4.9.、1.2.5.10.、1.2.5.11. は必要ありません。ノード 2 の場合、2 から 4 へのパスが検出された時点で停止する必要があります。

要約すると、ノードが既にターゲットノードへの直接エッジを持っている場合、ノードのパスを発見したくありません。これは、CTE の再帰的な用語で、次のような条件が必要であることを意味します。疑似コードは次のとおりです。

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term (as before)
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
     AND b.node_y = node_t 
     --will select only rows with direct edge to target

   UNION (second union is not allowed in CTE)

   SELECT those rows which do not have direct edge to target 
   AND which parents did not contribute to constructing the query above. 
   i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
  )

ソースノード = 1 からターゲットノード = 4 へのパスを検索するクエリの結果として、次のようにしたいと考えています。

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.

よろしくお願いします。

私はすでに多くの方法を試しました。たとえば、FROM/WHERE 句の条件、CTE を関数に渡そうとしましたが、成功しませんでした。

任意の提案をいただければ幸いです。

私は自分が望むものを達成する独自の再帰関数を持っていますが、膨大な量のデータでは非常に遅くなります。PostgreSQL の CTE は最適化されているようですので、もう少し掘り下げてみたいと思います。

score 2 · Accepted Answer

下部から開始すると、パスの検索をより効率的に行うことができます。子供たちから始めます。親から開始すると、すべての子をトラバースする必要があります。一方、子から検索した場合、親は 1 つしかないため、ソースとターゲットの間のパスを見つけるのに時間を無駄にすることはありません。

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
  select source, target, recentness, source || '.' || target
  from find_parent 
  where recentness = (select max(recentness) from find_parent)

  union

  select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
  from find_parent dd
  join construct_path cp on dd.recentness = cp.recentness - 1  
)
select source, target, path 
from construct_path
order by recentness desc

出力：

SOURCE   TARGET   PATH
1        2        1.2
2        4        1.2.4
4        9        1.2.4.9

ライブテスト: http://www.sqlfiddle.com/#!1/13e6b/1

これに似ています: SQL SERVER 2005 で子を指定して親を取得する方法

これは最適化されており、特定のもの (ソース) が既に見つかっている場合は、親への再帰をカットします。

ソース = 2

ターゲット = 9

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)

    union

    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc

出力：

SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9

ライブテスト: http://www.sqlfiddle.com/#!1/13e6b/16

score 1 · Accepted Answer

最適化されたソリューション。最終結果にWHERE句はもうありません。Postgresql固有のソリューションですが、つまり、特定の行を選択するためにDISTINCTONを使用します。

このデータを考えると：

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1),
    (5, 12, 1);

クエリ、第1段階、舞台裏を表示（ソース= 1、ターゲット= 11）：

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp

ソース=1、ターゲット= 11の出力：http ：//www.sqlfiddle.com/#！1 / db290 / 56

FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
1        1        2        1.2      0        2
1        1        3        1.3      0        3
1        2        4        1.2.4    0        4
1        2        5        1.2.5    0        5
1        3        6        1.3.6    0        6
1        3        7        1.3.7    0        7
1        4        8        1.2.4.8  0        8
1        4        9        1.2.4.9  0        9
1        5        11       1.2.5.11 1        1
1        5        10       1.2.5.10 1        1
1        5        12       1.2.5.12 1        1

ご覧のとおり、ターゲット11は5のソースの最初の行です。これはどのように発生しましたか？DISTINCTER列では、MAXでORDER BYを使用します。これは、MAXウィンドウでのORDERが意味をなすまれなケースの1つです。結果を並べ替えるために使用します。クエリの最後にORDERBYを配置しようとしましたが、データベースがCTEでORDERを使用できないと文句を言います。CTEは、ORDERBY句の配置を禁止しています。しかし、ご存知のとおり、sorting onOVER()句に影響を与えることができるため、結果はこのように並べ替えられます。上記の結果では、5のソースのうち、11がターゲットであるため、11は10と12の前にソートされます。したがって、DISTINCT ON（Postgresql固有の機能）を実行すると、Postgresはその最初の行をピックアップするため、ターゲット11をピックアップします。

これは、Postgresql固有ですが、最適化された最後のクエリです。

with recursive -- this denotes that at least one CTE is using recursion    
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
  select 
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )          
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter
  union
  select       
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )    
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
) select tp.* from traverse_path tp

ソース=1、ターゲット=11の出力http://www.sqlfiddle.com/#!1/db290/55

FILTER   SOURCE   TARGET   PATH     BINGO
1        1        2        1.2      0
1        1        3        1.3      0
1        2        4        1.2.4    0
1        2        5        1.2.5    0
1        3        6        1.3.6    0
1        3        7        1.3.7    0
1        4        8        1.2.4.8  0
1        4        9        1.2.4.9  0
1        5        11       1.2.5.11 1

ソース=1、ターゲット= 4の出力：http ：//www.sqlfiddle.com/#！1 / db290 / 53

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

ソース=2、ターゲット= 5の出力：http ：//www.sqlfiddle.com/#！1 / db290 / 54

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

BINGOアプローチの代わりに別のアプローチ。トラベサルを続行するためのロジックとしてcontinue_searchを使用します。そして、すべての集計関数を使用して、グラフのトラバースを続行する必要があるかどうかを判断します。

仕組みは次のとおりです：http ：//www.sqlfiddle.com/#！1 / db290 / 84

最終クエリ：http ：//www.sqlfiddle.com/#！1 / db290 / 85

興味深いことに、すべてが非常に英語に似ています。

これと対比してください：

,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool

EVERYを使用して：

,every(bp.node_y <> i.d_target) over(partition by bp.node_x)

どちらが読みやすいですか？

DISTINCTONを容易にするためにEVERYを使用する原理を示す出力は次のとおりです。

ソース=1、ターゲット=5。同じソースに属する他の番号の前に5が最初にソートされることに注意してください。これは、後でDISTINCTONによって選択されます。

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
1         1         2         1.2       1                   2
1         1         3         1.3       1                   3
1         2         5         1.2.5     0                   0
1         2         4         1.2.4     0                   0
1         3         6         1.3.6     1                   6
1         3         7         1.3.7     1                   7

その原則を実行するクエリは次のとおりです。

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      ,coalesce(        
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
      ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp

最終クエリ：

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
  select                      
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )    
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select   
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )  
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp

出力：

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
1         1         2         1.2       1
1         1         3         1.3       1
1         2         5         1.2.5     0
1         3         6         1.3.6     1
1         3         7         1.3.7     1

score 1 · Accepted Answer

テスト用の一時テーブル:

CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path  VALUES
 (1, 2,  1)
,(1, 3,  1)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  1)
,(3, 7,  1)
,(4, 8,  1)
,(4, 9,  1)
,(5, 10, 1)
,(5, 11, 1);

クエリ:
_{2 - 5 に関するコメントに対応するように変更されました}

WITH RECURSIVE t AS (    -- for readability and convenience:
    SELECT 1 AS node_s   -- source_node
         , 4 AS node_t   -- target_node
    )
    , x AS (
    SELECT node_x
    FROM   t, best_path
    WHERE  node_y = node_t
    )
    , a AS  ( 
    SELECT b.node_x
         , b.node_y
         , b.strength AS weight
         , b.node_x || '.' || b.node_y || '.' AS path
    FROM   t, best_path b
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  b.node_x = node_s
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node

    UNION ALL
    SELECT a.node_x
         , b.node_y
         , least(a.weight, b.strength)
         , a.path || b.node_y || '.' AS path
    FROM   t, a
    JOIN   best_path AS b ON b.node_x = a.node_y
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  a.node_y <> node_t                   -- arrived at target
    AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node
    )
TABLE a;

要求された結果を正確に生成します。

1 ステップでターゲットに到達できるため、最初のクエリにもブレーク条件を追加しました。

これは、以前の同様の質問に対する私の回答とよく似ています。すべての説明とリンクが適用されます。ここでの主な追加のトリックは、追加の CTE を含めxてノードを収集することです...

(a) ターゲットへの直接エッジを持つ。

再帰的 CTE のブレーク条件で繰り返し使用するため。キーワードに関係なく、再帰的な CTE の上に (非再帰的な) CTE を追加できることはあまり知られていませんRECURSIVE。

score 1 · Accepted Answer

OPの質問を読み直すと、次の解決策が思いつきます。

ソース

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
  select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,

    max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter

  union

  select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   

      max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   

  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp cross join inputs i    
-- if Postgresql has Oracle KEEP windowing, 
-- we don't need to use WHERE clause here     
where 
  not tp.bingo   
  or
  (
    tp.bingo and tp.target = d_target
  )

上記の WHERE 句は次のように短縮できます。

  WHERE not bingo or target = 4

出力：

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

ライブテスト: http://www.sqlfiddle.com/#!1/cdde6/55

ソース = 2、ターゲット = 5 の結果は次のとおりです。

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

データ：

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1);

postgresql - PostgreSQLは再帰CTEから関数にデータを渡します

4 に答える 4

Related

Reference