0

I have XML files like this.

Each line of the file begins and ends with a process_info tag. A file may contain many such lines, and there may be many similar files.

<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991222</OtherParty><OtherLocation>55009999999991222</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TIM+ZGNA01-99703-1211241250-D.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>29722352746</networkCallReference><switchIdentity>7274</switchIdentity><originatedCode>1</originatedCode><subscriptionType>1</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BGNA05N</incomingAssignedRoute><translatedNumber>12#222</translatedNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BGNA05N</incomingRoute><outgoingRoute>ZBSA1CO</outgoingRoute><mSCIdentification>11556281138800</mSCIdentification><exchangeIdentity>ZGNA01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>0</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>5</chargeableDuration><timeForStopOfCharge>194949</timeForStopOfCharge><timeForStartOfCharge>194944</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#222</calledPartyNumber><callingSubscriberIMEI>355921042890190</callingSubscriberIMEI><callingSubscriberIMSI>724046008971498</callingSubscriberIMSI><callingPartyNumber>11556281020633</callingPartyNumber><typeOfCallingSubscriber>10</typeOfCallingSubscriber><recordSequenceNumber>2987070</recordSequenceNumber><cal
lIdentificationNumber>1362570</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>724046213C64F8A</cellIDForLastCellCalling><cellIDFor1stCellCalling>7240400C64F8A</cellIDFor1stCellCalling><timeForTCSeizureCalling>194943</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11556281138800</MSC_ID><CallStart>20121123194944</CallStart><CallDuration>5</CallDuration><CallDuration_30_inf>30</CallDuration_30_inf><CallDuration_60_inf>60</CallDuration_60_inf><CallDuration_MC>30</CallDuration_MC><CallDuration_30_60>30</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">556281020633</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP30158</OtherZone></event_data><dupChk></dupChk><account map_type="2">556281020633</account><other_account map_type="2">55#222</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DE9C-0000DB98-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report></transaction><start>20121123194944</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF</filename><index_into_file>6</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="707">Error on CDR level; File processing continued.</result><data><file_info result="partial">CDR-Counter: (IN=16, BAD=0): (NORM_ERR=0 DUP_ERR=0, RAL_ERR=0), DUPLICATE=1, DISCARDED=0, OK=15</file_info></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename></process_info>
<process_info><module>pe_gw_a</module><result code="705">Duplicate CDR</result><data><input><event_data origin_id="asn1"><CallType>mosms</CallType><OtherParty ton="1" npi="1" int_code="55">556291860209</OtherParty><OtherLocation>55006234191860209</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeOtherParty>55</IntCodeOtherParty><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TKM_SMS+STKM01-28129-1211241251-A.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report><CDRType>5</CDRType><serviceCentreAddress>11556291860209</serviceCentreAddress><miscellaneousInformation>41</miscellaneousInformation><gSMTeleServiceCode>34</gSMTeleServiceCode><cellIDFor1stCellCalling>7240462003E0000</cellIDFor1stCellCalling><mSCIdentification>11551189848200</mSCIdentification><exchangeIdentity>STKM01</exchangeIdentity><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeForStartOfCharge>124619</timeForStartOfCharge><dateForStartOfCharge>20121124</dateForStartOfCharge><callingSubscriberIMSI>724046012529641</callingSubscriberIMSI><callingPartyNumber>11556282361092</callingPartyNumber></original_cdr><TypeOfCommunication>sms</TypeOfCommunication><CallDuration>0.9</CallDuration><CallStart>20121124124619</CallStart><MSC_ID>11551189848200</MSC_ID><ServedParty int_code="55" ton="1" npi="1">556282361092</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>Sms_SMS___TIM_TIM</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP37744</OtherZone></event_data><dupChk></dupChk><account map_type="2">556282361092</account><other_account map_type="2">556291860209</other_account><operation alternate_rating="1" type="charge"/><transaction 
id="0000000050B0DE9D-0000DBC9-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report></transaction><start>20121124124619</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename><index_into_file>15</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991144</OtherParty><OtherLocation>55009999999991144</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TMX+ZBHE01-95068-1211241251-AG.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>447382755812</networkCallReference><switchIdentity>6628</switchIdentity><originatedCode>1</originatedCode><subscriptionType>21</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BMCL01B</incomingAssignedRoute><translatedNumber>12#144</translatedNumber><originatingLocationNumber>11553191938800</originatingLocationNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BMCL01B</incomingRoute><outgoingRoute>XMCL1AO</outgoingRoute><mSCIdentification>11553191938800</mSCIdentification><exchangeIdentity>ZBHE01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>38</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>4</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>426</chargeableDuration><timeForStopOfCharge>182128</timeForStopOfCharge><timeForStartOfCharge>181421</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#144</calledPartyNumber><callingSubscriberIMEI>358855043501160</callingSubscriberIMEI><callingSubscriberIMSI>724023016557605</callingSubscriberIMSI><callingPartyNumber>11553891610047</callingPartyNumber><typeOfCallingSubscriber>10</typeO
fCallingSubscriber><recordSequenceNumber>1489944</recordSequenceNumber><callIdentificationNumber>11705419</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>7240238279ADEE5</cellIDForLastCellCalling><cellIDFor1stCellCalling>72402009ADEE5</cellIDFor1stCellCalling><timeForTCSeizureCalling>181417</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11553191938800</MSC_ID><CallStart>20121123181421</CallStart><CallDuration>426</CallDuration><CallDuration_30_inf>426</CallDuration_30_inf><CallDuration_60_inf>426</CallDuration_60_inf><CallDuration_MC>426</CallDuration_MC><CallDuration_30_60>60</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">553891610047</ServedParty><ServedLocation>7240238</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00461</ServedZone><OtherZone>ZP30411</OtherZone></event_data><dupChk></dupChk><account map_type="2">553891610047</account><other_account map_type="2">55#144</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DEA8-0000DBE8-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report></transaction><start>20121123181421</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF</filename><index_into_file>6</index_into_file></process_info>

I want to count all the different values of the result element, so that the output looks like this:

"D14 - Calls *144" count 2
"Duplicate CDR" count 1
"Error on CDR level; File processing continued." count 1

How can I do this? I think I should use XML::Twig or XML::Parser, but I cannot find a solution because there are so many start/end tags in the file.

4

5 Answers

1

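This can be done conveniently with any Perl XML module, but since you mentioned XML::Twig, that is what I have used in this solution.

You say there may be many similar XML files, but you do not say how to identify them, so all I can do is provide a solution for a single file and hope that you can extrapolate from here.

The program reads the file line by line, parses each line as a separate XML document, and extracts the text value of the first child element of the document root that has the tag result. This text value is used as a hash key to keep track of the number of occurrences of each different result.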

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new;

my %results;

open my $fh, '<', 'my.xml' or die $!;

while (<$fh>) {
  $twig->parse($_);
  my $result = $twig->root->first_child('result');
  if ($result) {
    $result = $result->trimmed_text;
    $results{$result}++;
  }
}

for (sort keys %results) {
  my $n = $results{$_};
  printf qq("%s" count %d\n), $_, $n;
}

output

"D14 - Calls *144" count 2
"Duplicate CDR" count 1
"Error on CDR level; File processing continued." count 1
answered 2012-11-24T18:12:57.830
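The single-file program above could be extended to several files along the following lines. This is only a sketch under assumptions: the file names are taken from the command line (the question does not say how the files are identified), the sub name count_results is made up for illustration, and each line is parsed with a fresh twig through XML::Twig's safe_parse so that a malformed line is skipped with a warning instead of aborting the run.

```perl
use strict;
use warnings;

use XML::Twig;

# Count <result> values across every file named in @files.
# Returns a hash reference mapping result text to its count.
# (count_results is a hypothetical name, not part of XML::Twig.)
sub count_results {
    my @files = @_;
    my %results;

    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while ( my $line = <$fh> ) {
            next unless $line =~ /\S/;    # skip blank lines

            # safe_parse returns false instead of dying on malformed input
            my $twig = XML::Twig->new->safe_parse($line);
            unless ($twig) {
                warn "Skipping unparsable line $. of $file\n";
                next;
            }

            my $result = $twig->root->first_child('result') or next;
            $results{ $result->trimmed_text }++;
        }
        close $fh;
    }
    return \%results;
}

# Assumed usage: perl count_results.pl first.xml second.xml ...
if (@ARGV) {
    my $results = count_results(@ARGV);
    printf qq("%s" count %d\n), $_, $results->{$_} for sort keys %$results;
}
```

Keeping one hash for all files gives a combined total; a separate hash per file would be needed if per-file counts are wanted instead.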
1

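To count these, you can use Mojo::DOM, the excellent DOM parser from the Mojolicious suite. It is pretty straightforward: use a hash (%count) to keep track of how often each result has been found. This is a typical Perl idiom for this kind of problem.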

#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# read all input lines at once
my $dom = Mojo::DOM->new(do {local $/; <DATA>});

# prepare count hash
my %count = ();

# iterate result elements
$dom->find('result')->each(sub {
    my $element = shift;
    $count{$element->text}++;
});

# output
say "$_: $count{$_}" for keys %count;

__DATA__
<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991222</OtherParty><OtherLocation>55009999999991222</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TIM+ZGNA01-99703-1211241250-D.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>29722352746</networkCallReference><switchIdentity>7274</switchIdentity><originatedCode>1</originatedCode><subscriptionType>1</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BGNA05N</incomingAssignedRoute><translatedNumber>12#222</translatedNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BGNA05N</incomingRoute><outgoingRoute>ZBSA1CO</outgoingRoute><mSCIdentification>11556281138800</mSCIdentification><exchangeIdentity>ZGNA01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>0</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>5</chargeableDuration><timeForStopOfCharge>194949</timeForStopOfCharge><timeForStartOfCharge>194944</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#222</calledPartyNumber><callingSubscriberIMEI>355921042890190</callingSubscriberIMEI><callingSubscriberIMSI>724046008971498</callingSubscriberIMSI><callingPartyNumber>11556281020633</callingPartyNumber><typeOfCallingSubscriber>10</typeOfCallingSubscriber><recordSequenceNumber>2987070</recordSequenceNumber><cal
lIdentificationNumber>1362570</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>724046213C64F8A</cellIDForLastCellCalling><cellIDFor1stCellCalling>7240400C64F8A</cellIDFor1stCellCalling><timeForTCSeizureCalling>194943</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11556281138800</MSC_ID><CallStart>20121123194944</CallStart><CallDuration>5</CallDuration><CallDuration_30_inf>30</CallDuration_30_inf><CallDuration_60_inf>60</CallDuration_60_inf><CallDuration_MC>30</CallDuration_MC><CallDuration_30_60>30</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">556281020633</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP30158</OtherZone></event_data><dupChk></dupChk><account map_type="2">556281020633</account><other_account map_type="2">55#222</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DE9C-0000DB98-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report></transaction><start>20121123194944</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF</filename><index_into_file>6</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="707">Error on CDR level; File processing continued.</result><data><file_info result="partial">CDR-Counter: (IN=16, BAD=0): (NORM_ERR=0 DUP_ERR=0, RAL_ERR=0), DUPLICATE=1, DISCARDED=0, OK=15</file_info></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename></process_info>
<process_info><module>pe_gw_a</module><result code="705">Duplicate CDR</result><data><input><event_data origin_id="asn1"><CallType>mosms</CallType><OtherParty ton="1" npi="1" int_code="55">556291860209</OtherParty><OtherLocation>55006234191860209</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeOtherParty>55</IntCodeOtherParty><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TKM_SMS+STKM01-28129-1211241251-A.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report><CDRType>5</CDRType><serviceCentreAddress>11556291860209</serviceCentreAddress><miscellaneousInformation>41</miscellaneousInformation><gSMTeleServiceCode>34</gSMTeleServiceCode><cellIDFor1stCellCalling>7240462003E0000</cellIDFor1stCellCalling><mSCIdentification>11551189848200</mSCIdentification><exchangeIdentity>STKM01</exchangeIdentity><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeForStartOfCharge>124619</timeForStartOfCharge><dateForStartOfCharge>20121124</dateForStartOfCharge><callingSubscriberIMSI>724046012529641</callingSubscriberIMSI><callingPartyNumber>11556282361092</callingPartyNumber></original_cdr><TypeOfCommunication>sms</TypeOfCommunication><CallDuration>0.9</CallDuration><CallStart>20121124124619</CallStart><MSC_ID>11551189848200</MSC_ID><ServedParty int_code="55" ton="1" npi="1">556282361092</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>Sms_SMS___TIM_TIM</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP37744</OtherZone></event_data><dupChk></dupChk><account map_type="2">556282361092</account><other_account map_type="2">556291860209</other_account><operation alternate_rating="1" type="charge"/><transaction 
id="0000000050B0DE9D-0000DBC9-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report></transaction><start>20121124124619</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename><index_into_file>15</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991144</OtherParty><OtherLocation>55009999999991144</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TMX+ZBHE01-95068-1211241251-AG.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>447382755812</networkCallReference><switchIdentity>6628</switchIdentity><originatedCode>1</originatedCode><subscriptionType>21</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BMCL01B</incomingAssignedRoute><translatedNumber>12#144</translatedNumber><originatingLocationNumber>11553191938800</originatingLocationNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BMCL01B</incomingRoute><outgoingRoute>XMCL1AO</outgoingRoute><mSCIdentification>11553191938800</mSCIdentification><exchangeIdentity>ZBHE01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>38</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>4</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>426</chargeableDuration><timeForStopOfCharge>182128</timeForStopOfCharge><timeForStartOfCharge>181421</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#144</calledPartyNumber><callingSubscriberIMEI>358855043501160</callingSubscriberIMEI><callingSubscriberIMSI>724023016557605</callingSubscriberIMSI><callingPartyNumber>11553891610047</callingPartyNumber><typeOfCallingSubscriber>10</typeO
fCallingSubscriber><recordSequenceNumber>1489944</recordSequenceNumber><callIdentificationNumber>11705419</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>7240238279ADEE5</cellIDForLastCellCalling><cellIDFor1stCellCalling>72402009ADEE5</cellIDFor1stCellCalling><timeForTCSeizureCalling>181417</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11553191938800</MSC_ID><CallStart>20121123181421</CallStart><CallDuration>426</CallDuration><CallDuration_30_inf>426</CallDuration_30_inf><CallDuration_60_inf>426</CallDuration_60_inf><CallDuration_MC>426</CallDuration_MC><CallDuration_30_60>60</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">553891610047</ServedParty><ServedLocation>7240238</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00461</ServedZone><OtherZone>ZP30411</OtherZone></event_data><dupChk></dupChk><account map_type="2">553891610047</account><other_account map_type="2">55#144</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DEA8-0000DBE8-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report></transaction><start>20121123181421</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF</filename><index_into_file>6</index_into_file></process_info>

Output:

Duplicate CDR: 1
Error on CDR level; File processing continued.: 1
D14 - Calls *144: 2
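
Perl does not define the order in which keys returns hash keys, so the lines above may come out in a different order on another run. A minimal sketch of a stable ordering, assuming the most frequent result should come first; keys_by_count is a made-up helper name, and the sample hash simply repeats the counts shown above.

```perl
use strict;
use warnings;

# Return the keys of a count hash, most frequent first.
# Ties are broken alphabetically so the order is the same on every run.
# (keys_by_count is a hypothetical helper, not part of Mojo::DOM.)
sub keys_by_count {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

# Sample counts copied from the output above
my %count = (
    'Duplicate CDR'                                  => 1,
    'Error on CDR level; File processing continued.' => 1,
    'D14 - Calls *144'                               => 2,
);

print "$_: $count{$_}\n" for keys_by_count( \%count );
```

With these counts, the D14 line is printed first, followed by the two single-count results in alphabetical order.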
answered 2012-11-24T17:17:28.830
0

You can use XML::SAX::PurePerl. It is very fault-tolerant and, in my experience, copes well with messy XML.

#!/usr/bin/env perl
package Result::Extractor;
use strict;
use warnings qw(all);

use base qw(XML::SAX::Base);

sub new {
    return bless {
        count   => {},
        data    => '',
    };
}

sub start_element {
    my ($self, $el) = @_;
    $self->{data} = '';
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'result') {
        ++$self->{count}{$self->{data}};
    }
}

sub characters {
    my ($self, $data) = @_;
    $self->{data} .= $data->{Data};
}

1;

package main;
use strict;
use warnings qw(all);

use Data::Printer;
use XML::SAX::PurePerl;

my $handler = Result::Extractor->new;
my $parser = XML::SAX::PurePerl->new(Handler => $handler);

$parser->parse_string(do { local $/; '<wrapper>' . <DATA> . '</wrapper>' });

p $handler->{count};

__DATA__
<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991222</OtherParty><OtherLocation>55009999999991222</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TIM+ZGNA01-99703-1211241250-D.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>29722352746</networkCallReference><switchIdentity>7274</switchIdentity><originatedCode>1</originatedCode><subscriptionType>1</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BGNA05N</incomingAssignedRoute><translatedNumber>12#222</translatedNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BGNA05N</incomingRoute><outgoingRoute>ZBSA1CO</outgoingRoute><mSCIdentification>11556281138800</mSCIdentification><exchangeIdentity>ZGNA01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>0</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>5</chargeableDuration><timeForStopOfCharge>194949</timeForStopOfCharge><timeForStartOfCharge>194944</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#222</calledPartyNumber><callingSubscriberIMEI>355921042890190</callingSubscriberIMEI><callingSubscriberIMSI>724046008971498</callingSubscriberIMSI><callingPartyNumber>11556281020633</callingPartyNumber><typeOfCallingSubscriber>10</typeOfCallingSubscriber><recordSequenceNumber>2987070</recordSequenceNumber><cal
lIdentificationNumber>1362570</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>724046213C64F8A</cellIDForLastCellCalling><cellIDFor1stCellCalling>7240400C64F8A</cellIDFor1stCellCalling><timeForTCSeizureCalling>194943</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11556281138800</MSC_ID><CallStart>20121123194944</CallStart><CallDuration>5</CallDuration><CallDuration_30_inf>30</CallDuration_30_inf><CallDuration_60_inf>60</CallDuration_60_inf><CallDuration_MC>30</CallDuration_MC><CallDuration_30_60>30</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">556281020633</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP30158</OtherZone></event_data><dupChk></dupChk><account map_type="2">556281020633</account><other_account map_type="2">55#222</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DE9C-0000DB98-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF;TICKET=6</Report></transaction><start>20121123194944</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TIM+ZGNA01-99703-1211241250-D.TTF</filename><index_into_file>6</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="707">Error on CDR level; File processing continued.</result><data><file_info result="partial">CDR-Counter: (IN=16, BAD=0): (NORM_ERR=0 DUP_ERR=0, RAL_ERR=0), DUPLICATE=1, DISCARDED=0, OK=15</file_info></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename></process_info>
<process_info><module>pe_gw_a</module><result code="705">Duplicate CDR</result><data><input><event_data origin_id="asn1"><CallType>mosms</CallType><OtherParty ton="1" npi="1" int_code="55">556291860209</OtherParty><OtherLocation>55006234191860209</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeOtherParty>55</IntCodeOtherParty><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TKM_SMS+STKM01-28129-1211241251-A.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report><CDRType>5</CDRType><serviceCentreAddress>11556291860209</serviceCentreAddress><miscellaneousInformation>41</miscellaneousInformation><gSMTeleServiceCode>34</gSMTeleServiceCode><cellIDFor1stCellCalling>7240462003E0000</cellIDFor1stCellCalling><mSCIdentification>11551189848200</mSCIdentification><exchangeIdentity>STKM01</exchangeIdentity><originForCharging>62</originForCharging><chargedParty>00</chargedParty><timeForStartOfCharge>124619</timeForStartOfCharge><dateForStartOfCharge>20121124</dateForStartOfCharge><callingSubscriberIMSI>724046012529641</callingSubscriberIMSI><callingPartyNumber>11556282361092</callingPartyNumber></original_cdr><TypeOfCommunication>sms</TypeOfCommunication><CallDuration>0.9</CallDuration><CallStart>20121124124619</CallStart><MSC_ID>11551189848200</MSC_ID><ServedParty int_code="55" ton="1" npi="1">556282361092</ServedParty><ServedLocation>7240462</ServedLocation><ScenarioName>Sms_SMS___TIM_TIM</ScenarioName><ServedZone>ZO00031</ServedZone><OtherZone>ZP37744</OtherZone></event_data><dupChk></dupChk><account map_type="2">556282361092</account><other_account map_type="2">556291860209</other_account><operation alternate_rating="1" type="charge"/><transaction 
id="0000000050B0DE9D-0000DBC9-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF;TICKET=15</Report></transaction><start>20121124124619</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TKM_SMS+STKM01-28129-1211241251-A.TTF</filename><index_into_file>15</index_into_file></process_info>
<process_info><module>pe_gw_a</module><result code="3">D14 - Calls *144</result><data><input><event_data origin_id="asn1"><CallType>moc</CallType><OtherParty ton="2" npi="1" int_code="55">55009999999991144</OtherParty><OtherLocation>55009999999991144</OtherLocation><IntCodeCallingPartyNumber>55</IntCodeCallingPartyNumber><IntCodeServedParty>55</IntCodeServedParty><TicketType>0</TicketType><original_cdr FILENAME="TMX+ZBHE01-95068-1211241251-AG.TTF"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report><CDRType>1</CDRType><networkCallReference>447382755812</networkCallReference><switchIdentity>6628</switchIdentity><originatedCode>1</originatedCode><subscriptionType>21</subscriptionType><speechCoderPreferenceList>2010005030</speechCoderPreferenceList><radioChannelProperty>30</radioChannelProperty><incomingAssignedRoute>BMCL01B</incomingAssignedRoute><translatedNumber>12#144</translatedNumber><originatingLocationNumber>11553191938800</originatingLocationNumber><miscellaneousInformation>0</miscellaneousInformation><incomingRoute>BMCL01B</incomingRoute><outgoingRoute>XMCL1AO</outgoingRoute><mSCIdentification>11553191938800</mSCIdentification><exchangeIdentity>ZBHE01</exchangeIdentity><tariffClass>0010</tariffClass><chargingCase>1</chargingCase><originForCharging>38</originForCharging><chargedParty>00</chargedParty><timeFromRegisterSeizureToStartOfCharging>4</timeFromRegisterSeizureToStartOfCharging><interruptionTime>0</interruptionTime><chargeableDuration>426</chargeableDuration><timeForStopOfCharge>182128</timeForStopOfCharge><timeForStartOfCharge>181421</timeForStartOfCharge><dateForStartOfCharge>20121123</dateForStartOfCharge><disconnectingParty>00</disconnectingParty><calledPartyNumber>12#144</calledPartyNumber><callingSubscriberIMEI>358855043501160</callingSubscriberIMEI><callingSubscriberIMSI>724023016557605</callingSubscriberIMSI><callingPartyNumber>11553891610047</callingPartyNumber><typeOfCallingSubscriber>10</typeO
fCallingSubscriber><recordSequenceNumber>1489944</recordSequenceNumber><callIdentificationNumber>11705419</callIdentificationNumber><tAC>721421</tAC><internalCauseAndLoc>3</internalCauseAndLoc><eosInfo>00</eosInfo><callPosition>30</callPosition><firstRadioChannelUsed>00</firstRadioChannelUsed><gSMTeleServiceCode>17</gSMTeleServiceCode><cellIDForLastCellCalling>7240238279ADEE5</cellIDForLastCellCalling><cellIDFor1stCellCalling>72402009ADEE5</cellIDFor1stCellCalling><timeForTCSeizureCalling>181417</timeForTCSeizureCalling></original_cdr><TypeOfCommunication>voi</TypeOfCommunication><MSC_ID>11553191938800</MSC_ID><CallStart>20121123181421</CallStart><CallDuration>426</CallDuration><CallDuration_30_inf>426</CallDuration_30_inf><CallDuration_60_inf>426</CallDuration_60_inf><CallDuration_MC>426</CallDuration_MC><CallDuration_30_60>60</CallDuration_30_60><ServedParty ton="1" npi="1" int_code="55">553891610047</ServedParty><ServedLocation>7240238</ServedLocation><ScenarioName>NA</ScenarioName><ServedZone>ZO00461</ServedZone><OtherZone>ZP30411</OtherZone></event_data><dupChk></dupChk><account map_type="2">553891610047</account><other_account map_type="2">55#144</other_account><operation alternate_rating="1" type="charge"/><transaction id="0000000050B0DEA8-0000DBE8-00002876-62F2B0C6"><Report>FILE=/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF;TICKET=6</Report></transaction><start>20121123181421</start></input></data><filename>/gold/rte/data/IncomingCDRs/ASN1/010/TMX+ZBHE01-95068-1211241251-AG.TTF</filename><index_into_file>6</index_into_file></process_info>

Result:

\ {
    'Duplicate CDR'                                    1,
    'D14 - Calls *144'                                 2,
    'Error on CDR level; File processing continued.'   1
}

You can also look at XML::SAX::Expat, XML::SAX::ExpatXS, and XML::LibXML::SAX. They are faster, but more error-prone.

answered 2012-11-26T02:21:39.380
-1

Assuming that every instance of <result>...</result> is one you are interested in, you may be able to get away with a regular expression.

use File::Slurp qw(read_file); # read_file is provided by File::Slurp

my $doc = read_file("file.xml"); # slurp in the doc
my %count;

while ($doc =~ m,<result.*?>(.*?)</result>,g) {
  $count{$1}++;
}
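
A slightly stricter variant of the same idea, offered only as a sketch: the pattern above also matches any element whose name merely starts with result (a hypothetical <results>, say), and it needs the whole file in memory. The version below reads line by line and requires the tag name to be exactly result. It assumes that no result element spans a line break and it does not decode XML entities; count_results_regex and the in-memory sample are made up for illustration.

```perl
use strict;
use warnings;

# Count the text of every <result> element read from a filehandle.
# Works line by line, so the whole file never has to fit in memory.
# Assumption: no <result> element spans a line break.
sub count_results_regex {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        # (?:\s[^>]*)? allows attributes but rejects names such as <results>
        while ( $line =~ m{<result(?:\s[^>]*)?>(.*?)</result>}g ) {
            $count{$1}++;
        }
    }
    return \%count;
}

# Illustrative usage with an in-memory filehandle standing in for file.xml
my $sample =
    qq{<process_info><result code="3">D14 - Calls *144</result></process_info>\n}
  . qq{<process_info><results>ignored</results></process_info>\n}
  . qq{<process_info><result code="705">Duplicate CDR</result></process_info>\n};

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $count = count_results_regex($fh);
close $fh;

print qq("$_" count $count->{$_}\n) for sort keys %$count;
```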

But I would use a real XML processing library for this. Adapting the XML::XPath sample program to your XML file is pretty straightforward.

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'file.xml');

my $nodeset = $xp->find('/zzz/process_info/result'); # find all results

my %count;
foreach my $node ($nodeset->get_nodelist) {
  $count{ $node->string_value } ++;
}

Note that I am using the XPath /zzz/...: the top level of an XML document must be a single element, so I wrapped your example in <zzz>...</zzz>.

This is a more robust solution, because it only looks for result elements that are children of a process_info element.
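
The wrapping step does not have to be done by hand. A minimal sketch, assuming the file has no XML declaration at the top and is small enough to read into memory; wrap_in_root is a made-up helper name and zzz is just the root name used in the XPath above.

```perl
use strict;
use warnings;

# Read a file of concatenated top-level elements and wrap its contents in a
# single synthetic root element, so that the result is one well-formed document.
# The default root name 'zzz' matches the XPath above; any unused name would do.
# (wrap_in_root is a hypothetical helper, not part of XML::XPath.)
sub wrap_in_root {
    my ( $path, $root ) = @_;
    $root = 'zzz' unless defined $root;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $body = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    $body = '' unless defined $body;      # an empty file yields undef

    return "<$root>\n$body</$root>\n";
}

# Assumed usage: print the wrapped document for a file named on the command line
print wrap_in_root( $ARGV[0] ) if @ARGV;
```

The returned string could then be passed to XML::XPath->new(xml => $wrapped) in place of the filename argument, which leaves the original file untouched.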

answered 2012-11-24T16:45:03.723
-1
perl -MXML::Twig -E'XML::Twig->new( twig_handlers => { result => sub { $count{$_->text}++ } })->parsefile( $ARGV[0]); say "$_: $count{$_}" foreach sort keys %count; ' count.xml

IF YOUR DATA WAS XML.

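It isn't: the file has many top-level process_info elements, whereas an XML document must have exactly one root element.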

answered 2012-11-24T17:48:40.993