Your XML is malformed, but Nokogiri will try to repair it.
Here's how to check for invalid XML/XHTML/HTML, and how to rewrite the sections you need.
Here's the setup:
require 'nokogiri'
doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
<SimpleField type="int" name="nid"/>
<SimpleField type="string" name="neighborhd"/>
<SimpleField type="string" name="place"/>
<SimpleField type="string" name="placecode"/>
<SimpleField type="string" name="nbr_type"/>
<SimpleField type="string" name="po_name"/>
<SimpleField type="string" name="metro"/>
<SimpleField type="string" name="country"/>
<SimpleField type="string" name="state"/>
<SimpleField type="string" name="statefips"/>
<SimpleField type="string" name="county"/>
<SimpleField type="string" name="countyfips"/>
<SimpleField type="string" name="mcd"/>
<SimpleField type="string" name="mcdfips"/>
<SimpleField type="string" name="cbsa"/>
<SimpleField type="string" name="cbsacode"/>
<SimpleField type="string" name="cbsatype"/>
<SimpleField type="double" name="cenlat"/>
<SimpleField type="double" name="cenlon"/>
<SimpleField type="int" name="color"/>
<SimpleField type="string" name="ncs_code"/>
<SimpleField type="string" name="release"/>
</Schema>
<Style id="KMLSTYLER_6">
<LabelStyle>
<scale>1.0</scale>
</LabelStyle>
<LineStyle>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>7f4080ff</color>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<name>Sample_Neighborhoods_NYC</name>
<visibility>1</visibility>
<Folder id="kml_ft_Sample_Neighborhoods_Samples">
<name>Sample_Neighborhoods_Samples</name>
<Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
<name>Sample_Neighborhoods_NYC</name>
<Placemark id="kml_1">
<name>Colgate Center</name>
<Snippet> </Snippet>
<styleUrl>#KMLSTYLER_6</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<SimpleData name="nid">7086</SimpleData>
<SimpleData name="neighborhd">Colgate Center</SimpleData>
<SimpleData name="place">Jersey City</SimpleData>
<SimpleData name="placecode">36000</SimpleData>
<SimpleData name="nbr_type">S</SimpleData>
<SimpleData name="po_name">JERSEY CITY</SimpleData>
<SimpleData name="metro">New York City, NY</SimpleData>
<SimpleData name="country">USA</SimpleData>
<SimpleData name="state">NJ</SimpleData>
<SimpleData name="statefips">34</SimpleData>
<SimpleData name="county">Hudson</SimpleData>
<SimpleData name="countyfips">34017</SimpleData>
<SimpleData name="mcd">Jersey City</SimpleData>
<SimpleData name="mcdfips">36000</SimpleData>
<SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
<SimpleData name="cbsacode">35620</SimpleData>
<SimpleData name="cbsatype">Metro</SimpleData>
<SimpleData name="cenlat">40.7145135000001</SimpleData>
<SimpleData name="cenlon">-74.0343385</SimpleData>
<SimpleData name="color">1</SimpleData>
<SimpleData name="ncs_code">40910000</SimpleData>
<SimpleData name="release">1.12.2</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0 -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark id="kml_2">
<name>Colgate Center</name>
<Snippet> </Snippet>
<ExtendedData>
EOT
Here's how to check whether there were errors. Any time doc.errors is not empty, you have a problem:
puts doc.errors
Here's one way to find the SimpleData nodes throughout the document. I prefer CSS accessors over XPath for readability. XPath is sometimes the better choice because it allows finer-grained searches, so it's worth learning both.
doc.search('ExtendedData SimpleData').each do |simple_data|
node_name = simple_data['name']
puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end
Running the code so far prints the parse errors, followed by the rewritten nodes:
Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
You're not trying to modify the DOM, but it's easy to do:
doc.search('ExtendedData SimpleData').each do |simple_data|
node_name = simple_data['name']
simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end
puts doc.to_xml
After running, the affected section looks like this:
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
</SchemaData>
</ExtendedData>