I have some legacy XML documents stored in a SQL database as blobs, and they are not well-formed XML. I'm reading them from the database and, as I am using C#/.NET, would ultimately like to load each one into an XmlDocument.

When I try to do this, I get an XmlException, as expected. Having looked at the documents, they all fail because of undeclared namespace prefixes on specific XML nodes.

I am not concerned with any of the XML nodes that carry these prefixes, so I can ignore them or throw them away. Basically, before I load the string as an XmlDocument, I would like to remove the prefix from the string, so that

<tem:GetRouteID>
    <tem:PostCode>postcode</tem:PostCode>
    <tem:Type>ItemType</tem:Type>
</tem:GetRouteID>

becomes

<GetRouteID>
    <PostCode>postcode</PostCode>
    <Type>ItemType</Type>
</GetRouteID>

and this

<wsse:Security soapenv:actor="">
    <wsse:BinarySecurityToken>token</wsse:BinarySecurityToken>
</wsse:Security>

becomes this:

<Security soapenv:actor="">
    <BinarySecurityToken>token</BinarySecurityToken>
</Security>

I have one solution, which reads the prefixes to strip from configuration and removes them from the string:

<appSettings>
  <add key="STRIP_NAMESPACES" value="wsse;tem" />
</appSettings>

if (STRIP_NAMESPACES != null)
{
    string[] namespaces = Regex.Split(STRIP_NAMESPACES, ";");

    foreach (string ns in namespaces)
    {
        str2 = str2.Replace("<" + ns + ":", "<"); // Replace opening tag
        str2 = str2.Replace("</" + ns + ":", "</"); // Replace closing tag
    }
}

Ideally, though, I would like a generic approach, so that I don't have to keep configuring the prefixes I want to remove.

How can I achieve this in C#/.NET? I am assuming that a regex is the way to go here.

UPDATE 1

Ria's Regex below works well for the requirement above. However, how would I need to change the Regex to also change this

<wsse:Security soapenv:actor="">
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>

to this?

<Security>
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>

UPDATE 2

I think I've worked out the updated version myself, based on Ria's answer:

<(/?)\w+:(\w+/?) ?(\w+:\w+.*)?>

1 Answer

UPDATE

For the new issue (prefixes on attribute names), try this more general solution. It has no effect on node values:

// In a verbatim string (@"..."), a literal double quote is written as "", not \"
Regex.Replace(originalXml,
              @"((?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?==""))",
              "${elem}");

Try this regex on my sample XML:

<wsse:Security soapenv:actor="dont match soapenv:actor attrib">
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security> 
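
As a rough sketch of how the pattern above might be applied before loading the document: the class and method names (`PrefixStripper`, `StripPrefixes`) are made up for illustration, and the quote inside the verbatim string is written as `""` so that it compiles.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class PrefixStripper
{
    // Hypothetical helper: drops the prefix from element names (right after "<" or "</")
    // and from attribute names that are immediately followed by ="
    public static string StripPrefixes(string xml)
    {
        return Regex.Replace(
            xml,
            @"((?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?==""))",
            "${elem}");
    }

    public static void Main()
    {
        string originalXml =
            "<wsse:Security soapenv:actor=\"dont match soapenv:actor attrib\">" +
            "<BinarySecurityToken>authtoken</BinarySecurityToken>" +
            "</Security>";

        string cleaned = StripPrefixes(originalXml);

        var doc = new XmlDocument();
        doc.LoadXml(cleaned); // throws XmlException if an undeclared prefix is still present
        Console.WriteLine(doc.DocumentElement.Name); // prints "Security"
    }
}
```

Because the pattern runs over the raw string, a `prefix:name="` sequence that happens to appear inside element text would be rewritten as well, so it is worth checking against real data.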

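Alternatively, try XSLT. You can apply the stylesheet directly or through the XslCompiledTransform class in .NET (the older XslTransform class is obsolete). Note that an XSLT processor has to parse the input first, so this only helps once every prefix is declared: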

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>

<xsl:template match="/|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>
</xsl:stylesheet>
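
A minimal sketch of running this stylesheet from C#, assuming it is embedded as a string and that the input already parses. The class name `XsltNamespaceStripper`, the method `Apply`, and the `http://tempuri.org/` namespace URI in the sample are placeholders, not taken from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltNamespaceStripper
{
    // The stylesheet from the answer, with quotes doubled for the verbatim string
    private const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
<xsl:output method=""xml"" indent=""no""/>
<xsl:template match=""/|comment()|processing-instruction()"">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
</xsl:template>
<xsl:template match=""*"">
    <xsl:element name=""{local-name()}"">
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:element>
</xsl:template>
<xsl:template match=""@*"">
    <xsl:attribute name=""{local-name()}"">
      <xsl:value-of select="".""/>
    </xsl:attribute>
</xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: returns the transformed XML as a string
    public static string Apply(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // The prefix must be declared here, otherwise the reader rejects the input
        string xml =
            "<tem:GetRouteID xmlns:tem=\"http://tempuri.org/\">" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "</tem:GetRouteID>";

        var doc = new XmlDocument();
        doc.LoadXml(Apply(xml));
        Console.WriteLine(doc.DocumentElement.Name); // prints "GetRouteID"
    }
}
```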

Or try this regex, which handles tags that carry no attributes:

var finalXml = Regex.Replace(originalXml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");
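
To illustrate, here is a small sketch that runs this simpler pattern over the first sample from the question and then loads the result. `SimplePrefixStripper` and `Strip` are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class SimplePrefixStripper
{
    // Hypothetical helper: removes the prefix from tags of the form <p:name>, </p:name> or <p:name/>
    public static string Strip(string xml)
    {
        return Regex.Replace(xml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");
    }

    public static void Main()
    {
        string originalXml =
            "<tem:GetRouteID>" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "<tem:Type>ItemType</tem:Type>" +
            "</tem:GetRouteID>";

        var doc = new XmlDocument();
        doc.LoadXml(Strip(originalXml));
        Console.WriteLine(doc.DocumentElement.Name); // prints "GetRouteID"
    }
}
```

The pattern requires `>` directly after the element name, so an opening tag with attributes, such as `<wsse:Security soapenv:actor="">`, is left untouched.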
answered 2012-07-31T10:05:59.880