13

簡単に言えば、ツリーのさらに上の親が所有するツリーのルートのパーセンテージを計算しようとしています。SQL だけでこれを行うにはどうすればよいですか?

これが私の(サンプル)スキーマです。階層自体は非常に単純ですが、追加の があることに注意してくださいholding_id。これは、単一の親が子のさまざまな部分を「所有」できることを意味します。

create table hierarchy_test ( 
       id number -- "root" ID
     , parent_id number -- Parent of ID
     , holding_id number -- The ID can be split into multiple parts
     , percent_owned number (3, 2)
     , primary key (id, parent_id, holding_id) 
        );

そしていくつかのサンプルデータ:

insert all 
 into hierarchy_test values (1, 2, 1, 1) 
 into hierarchy_test values (2, 3, 1, 0.25)
 into hierarchy_test values (2, 4, 1, 0.25)
 into hierarchy_test values (2, 5, 1, 0.1)
 into hierarchy_test values (2, 4, 2, 0.4)
 into hierarchy_test values (4, 5, 1, 1)
 into hierarchy_test values (5, 6, 1, 0.3)
 into hierarchy_test values (5, 7, 1, 0.2)
 into hierarchy_test values (5, 8, 1, 0.5)
select * from dual;

SQL フィドル

次のクエリは、実行したい計算を返します。SYS_CONNECT_BY_PATH の性質上、私の知る限り、計算自体を実行することはできません。

 select a.*, level as lvl
      , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
   from hierarchy_test a
  start with id = 1
connect by nocycle prior parent_id = id

この例ではありませんが、データには周期的な関係があります。

現時点では、非常に単純な関数を使用して、列の文字列calcを数値に変換します

create or replace function some_sum ( P_Sum in varchar2 ) return number is
   l_result number;
begin  
   execute immediate 'select ' || P_Sum || ' from dual'
     into l_result;
     
   return l_result;   
end;
/

これはばかげた方法のように思われ、動的 SQL 1の解析にかかる追加の時間を避けたいと思います。

理論的には、MODEL 句を使用してこれを計算できるはずです。私の問題は、ツリーの非一意性が原因です。これを行うために MODEL 句を使用する私の試みの1つは次のとおりです。

select *
  from ( select a.*, level as lvl
              , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
           from hierarchy_test a
          start with id = 1
        connect by nocycle prior parent_id = id
                 )
 model
 dimension by (lvl ll, id ii)
 measures (percent_owned, parent_id )
 rules upsert all ( 
   percent_owned[any, any]
   order by ll, ii  = percent_owned[cv(ll), cv(ii)] * nvl( percent_owned[cv(ll) - 1, parent_id[cv(ll), cv(ii)]], 1)
               )

当然のことながら、これは次の場合に失敗します。

ORA-32638: MODELディメンションのアドレス指定が一意ではありません

UNIQUE SINGLE REFERENCEの使用は、同様の理由で失敗します。つまり、ORDER BY 句が一意ではないということです。

tl;dr

SQL のみを使用して、親が所有するツリーのルートのパーセンテージを計算する簡単な方法はありますか? MODEL で正しい軌道に乗っている場合、どこで間違っているのでしょうか?

1. PL/SQL SQL コンテキスト スイッチも避けたいと思います。これはほんのわずかな時間であることはわかっていますが、1 日に数分追加せずにすばやく行うのは非常に困難です。

4

2 に答える 2

6

11gでは、おそらく次のようなものです-

SELECT a.*, LEVEL AS lvl
      ,XMLQuery( substr( sys_connect_by_path( percent_owned, '*' ), 2 ) RETURNING CONTENT).getnumberval() AS calc
   FROM hierarchy_test a
  START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;

SQLフィドル

または、あなたの'1'||トリックに従って-

SELECT a.*, LEVEL AS lvl
      , XMLQuery( ('1'|| sys_connect_by_path( percent_owned, '*' )) RETURNING CONTENT).getnumberval() AS calc
   FROM hierarchy_test a
  START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;

残念ながら 10g では、XMLQuery関数を受け入れることができず、評価には常に文字列リテラルが必要です。

select XMLQuery('1*0.25' RETURNING CONTENT).getnumberval() as val 
  from dual;

動作して戻ります0.25が、

select XMLQuery(substr('*1*0.25',2) RETURNING CONTENT).getnumberval() as val
   from dual;

を与えORA-19102: XQuery string literal expectedます。

ツリーのレベル数が増えると、内部ツリーの作成XMLQuery自体のオーバーヘッドが増えるため、クエリが遅くなる可能性があります。結果を達成するための最も最適な方法は、10g と 11g の両方で動作する PL/SQL 関数です。

于 2012-12-10T20:41:07.833 に答える
4

This deserves an answer; though beware we're operating under a few special circumstances.

The first thing to mention is that the best possible way to do this is with recursive sub-query factoring/a recursive CTE as per Daniel Hilgarth and jonearles in the comments:

with temp (id, parent_id, percent_owned, calc) as (
  select a.id, a.parent_id, a.percent_owned, percent_owned as calc
    from hierarchy_test a
   where id = 1
   union all
  select a.id, a.parent_id, a.percent_owned, a.percent_owned * t.calc as calc
    from temp t
    join hierarchy_test a
      on t.parent_id = a.id
         )
select * 
  from temp

Their SQL Fiddle..

Unfortunately the complexity of the query and the size of the data we#re working with is such that this turned out to be impossible. There was no way to do it without full-scanning some overly large tables each time.

This doesn't necessarily mean that we're back to CONNECT BY. There is the opportunity to calculate hierarchies in bulk. Unfortunately, this turned out to be impossible as well; an hour in the database crashed. Thrice. We were using up almost 100GB of UNDO and the server just couldn't cope.

These are the special circumstances; we have to calculate hundreds of thousands of hierarchies in a few hours, at most. The average one is about 1.5 levels deep with maybe 5-10 leaves and 8-12 nodes in total. However, the outliers have 90k nodes, 27 levels and multiple cyclic relationships. The outliers aren't anywhere near rare enough.

So, CONNECT BY. Benchmarking Annjawn's solution against the PL/SQL EXECUTE IMMEDIATE suggested in the question indicated that for an above average tree XMLQuery() was up to 4 times slower. Excellent, had the answer; no other option; leave it at that.

Not.

Because we're calculating so many hierarchies with so many nodes we ended up getting overly long waits from library cache pin locks caused by the constant hard-parsing of hundreds of thousands of mathematical functions in EXECUTE IMMEDIATE.

No obvious response to that, so going back too Annjawn's solution it ends up 3 times quicker! The library cache pin locks completely disappear and we're back on the straight and narrow.

Not.

Unfortunately, there seems to be an Oracle bug in 11.2 that appears when you combine CONNECT BY, XMLQuery() and DBMS_SCHEDULER. In certain circumstances, normally in the larger hierarchies, it leaks huge amounts of memory. Lost the database and the server finding that one out. A report's been raised with Oracle and we're testing in 12c; though the memory leaks exhibit less, they still appear so 12c is out.

The solution? Wrap the XMLQuery() in a PL/SQL function. Memory leak solved, unfortunately this caused a large amount of contention for this function, and we started getting multi-hour Library cache: mutex x waits.. Querying x$kglob confirmed it was XMLTYPE that was getting hammered.

Andrey Nikolaev recommends either altering the system; rather not do that when everything else works fine, or using the DBMS_POOL.MARKHOT procedure to tell Oracle that you'll be accessing this object a lot. To the casual eye, this may have solved the issue, however, after about 10 minutes, and going through what seemed to be every lock that Oracle has, we ended up with 5 processes contending for CPU. Apparently there wasn't enough (54GB and 24 cores on the test box)...

We then started getting Cursor pin: s waits. Burleson recommends more hidden parameter finangling, Jonathan Lewis suggests that it's down to SGA resizing. As the DB was using automated SGA sizing we tried gradually increasing the shared pool, up to 30GB and only got back out old friend the Library cache: mutex x wait.

So, what's the solution? Who knows is the honest answer, but a Java stored procedure is working brilliantly so far, no memory leaks, no waits and significantly quicker than everything else.

I'm sure there's more out there... and I'd really like to get the MODEL clause to work if anyone has any ideas?

P.S. I can't claim credit for all of this; it's the work of about 3 people to get us to this stage...

于 2013-07-29T20:44:54.700 に答える