sas - Hash objects in SAS - Is it possible to merge the following two tables using a hash object?

Question

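Is it possible to use a hash object in SAS 9.1 to merge the two tables below, as in the example? The main difficulty seems to be creating the value variable in the result dataset. Each payment may pay for several charges, a single charge may need several payments, and both cases can occur at the same time. Does this problem have a general name? http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf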

data TABLE1;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      500
;;;;
run;


data TABLE2;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data OUT;
input ID_client     ID_commodity    ID_ofpayment    value;
datalines;
1               111111111             11    50
1               111111111             12    50
1               222222222             13    100
1               222222222             14    50
1               222222222             15    50
2               333333333             21    300
2               444444444             21    200
2               444444444             22    200
2               555555555             23    100
2               555555555             24    200
2               555555555             25    200
;;;;
run;

score 1 · Accepted Answer

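This might work for you. I have 9.2, which has some significant hash improvements, but I think I behaved myself and used only what was available in 9.1. You might also try cross-posting this to SAS-L (the SAS listserv), since I believe Paul Dorfman (i.e., The Hash Guru) still reads it.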

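I assumed you wanted the "leftovers" output as well. If that part does not behave the way you want, you will need to work on it. This is not very well tested, but it works on the sample dataset. Payments 24 and 25 are never used, so they are output as leftovers with a missing ID_commodity.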

I'm sure there are cleaner ways to do the iteration than what I do here. I use 9.2+, where the multidata option is available, and I always use that instead of a hash iterator, so I don't know the cleaner ways.

data have;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      50
;;;;
run;


data for_hash;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
  format id_client_hash paymentValue id_ofpayment BEST12.;
  declare hash h(dataset:'for_hash' , ordered: 'a');
  h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
  h.defineData('id_client_hash','id_ofpayment','paymentValue');
  call missing(id_ofpayment,paymentValue, id_client_hash);
  h.defineDone();
  declare hiter hi('h');
end;

do _t = 1 by 1 until (last.id_client);
 set have;
 by id_client;

 *Iterate through the hash and find the first record with the same ID_client;
 do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
   rc = hi.next();
 end;

 *For the current charge record, iterate through the payment (hash) until all paid up.;
 do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
   if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
     value = paymentValue; *so whole paymentValue is value;
     charge = charge - paymentValue; *charge is decremented by paymentValue;
     output; *output row;
     _id=ID_client_hash; 
     _pay=id_ofpayment;
     rc = hi.next();
    h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
   end;
   else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
     value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
     paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
     charge= 0; *charge is zero now;
     output; *output a row;
     h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
   end;
 end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
 if charge gt 0 then do;
   value=-1*charge;
   call missing(id_ofpayment);
   output; *output a row for the charge, which is not paid; 
 end;
 if last.id_client then do;  *this is cleanup, checking to see if we have any leftover payments;
   do while (rc=0); *iterate through the remaining hash;
     do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
       rc = hi.next();
     end;
     if rc=0 then do;
         call missing(id_commodity); *to make it clear this is a leftover payment;
         value=paymentValue; *update the value;
         output; *output the payment;
         _id=ID_client_hash;
         _pay=id_ofpayment;
         rc = hi.next();
         if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
     end;    
   end;
 end;
end;
keep id_client id_ofpayment id_commodity value;
run;

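Among other things, this is not terribly fast: I do a lot of iterating that may be wasted. It is relatively fast if every ID_client in the payment records also appears in the charge records. Payment records for clients that have no charges are never removed from the hash, so they are skipped over on every scan, and that can become very slow.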

I'm not convinced that a hash is the better solution here, at least before 9.2. A keyed UPDATE might be better. UPDATE is made for transactional database structures, and this seems close to one.
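
For comparison with the iteration cost discussed above, the same allocation can probably be written without a hash object at all, as a single forward pass over both tables. This is neither the hash approach above nor the keyed UPDATE idea. It is only a minimal sketch under these assumptions: have and for_hash are both sorted by client ID, payments are applied in their stored order, and Charge and paymentValue are non-missing and non-negative. The names want_sketch, next_pay, p, n_pay, remaining and done are made up for illustration. Unlike the hash version, it does not output unpaid charges or leftover payments.

```sas
/* Sketch only: two-pointer allocation with SET ... POINT= instead of a hash. */
/* Assumes have and for_hash are sorted by client ID (hypothetical setup).    */
data want_sketch (keep=ID_client ID_commodity ID_ofpayment value);
  /* next_pay: observation number of the payment currently in use.    */
  /* remaining: unused part of that payment; missing = not loaded yet. */
  retain next_pay 1 remaining .;

  set have;   /* one charge per iteration; the step stops at end of have */

  done = 0;
  do while (Charge > 0 and next_pay <= n_pay and not done);
    p = next_pay;
    set for_hash point=p nobs=n_pay;   /* direct read of payment number p */

    if ID_client_hash < ID_client then do;
      /* payment belongs to an earlier client: skip it */
      next_pay = next_pay + 1;
      remaining = .;
    end;
    else if ID_client_hash > ID_client then done = 1;  /* no payments left for this client */
    else do;
      if missing(remaining) then remaining = paymentValue;
      value     = min(Charge, remaining);  /* amount of this payment applied to this charge */
      Charge    = Charge - value;
      remaining = remaining - value;
      output;
      if remaining = 0 then do;            /* payment used up: move to the next one */
        next_pay = next_pay + 1;
        remaining = .;
      end;
    end;
  end;
run;
```

Because the pointer next_pay only moves forward, each payment is read a bounded number of times rather than being rescanned from the top for every charge. Reporting unpaid charges and leftover payments, as the hash version does, would need extra logic that is not shown here.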
