1

プレーンテキストと、挿入する必要のある各XML要素の開始オフセットと終了オフセットからXMLドキュメントを作成する必要があります。これが私が合格したいいくつかのテストケースです:

val text = "The dog chased the cat."
val spans = Seq(
    (0, 23, <xml/>),
    (4, 22, <phrase/>),
    (4, 7, <token/>))
val expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
assert(expected === spansToXML(text, spans))

val text = "aabbccdd"
val spans = Seq(
    (0, 8, <xml x="1"/>),
    (0, 4, <ab y="foo"/>),
    (4, 8, <cd z="42>3"/>))
val expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
assert(expected === spansToXML(text, spans))

val spans = Seq(
    (0, 1, <a/>),
    (0, 0, <b/>),
    (0, 0, <c/>),
    (1, 1, <d/>),
    (1, 1, <e/>))
assert(<a><b/><c/> <d/><e/></a> === spansToXML(" ", spans))

私の部分的な解決策(以下の私の答えを参照)は、文字列の連結とによって機能しますXML.loadString。それはハッキーのようです、そして私はまた、このソリューションがすべてのコーナーケースで正しく機能するかどうか100%確信していません...

より良い解決策はありますか?(その価値については、このタスクが簡単になる場合は、anti-xmlに切り替えていただければ幸いです。)

2011年8月10日更新して、テストケースを追加し、よりクリーンな仕様を提供します。

4

5 に答える 5

3

あなたが提唱した報奨金を考慮して、私はあなたの問題をしばらく研究し、すべてのテストケースで成功する次の解決策を思いつきました. 私の回答が受け入れられることを本当に望んでいます-私の解決策に何か問題があるかどうか教えてください。

いくつかのコメント: 実行中に何が起こっているのか知りたい場合は、コメントアウトされた print ステートメントを内部に残しました。あなたの仕様に加えて、既存の子 (存在する場合) を保持します - これが行われる場所にコメントがあります。

XMLノードを手動で構築せず、渡されたものを変更します。開始タグと終了タグの分割を避けるために、アルゴリズムを大幅に変更する必要がありましたが、ソートのアイデアはあなたのソリューションにbegin及びます。-end

コードはやや高度な Scala です。特に、Orderings必要な別のものをビルドする場合はそうです。私が入手した最初のバージョンからいくらか単純化しました。

を使用して間隔を表すツリーを作成し、SortedMap抽出後に間隔をフィルタリングすることを避けました。この選択はやや最適ではありません。ただし、間隔ツリーのような入れ子になった間隔を表すための「より良い」データ構造があると聞きましたが (それらは計算幾何学で研究されています)、実装が非常に複雑であり、ここでは必要ないと思います。

/**
 * User: pgiarrusso
 * Date: 12/8/2011
 */

import collection.mutable.ArrayBuffer
import collection.SortedMap
import scala.xml._

object SpansToXmlTest {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]) = {
        val intOrdering = implicitly[Ordering[Int]] // Retrieves the standard ordering on Ints.

        // Sort spans decreasingly on begin and increasingly on end and their label - this processes spans outwards.
        // The sorting on labels matches the given examples.
        val spanOrder = Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.by((_: Elem).label))

        //Same sorting, excluding labels.
        val intervalOrder = Ordering.Tuple2(intOrdering.reverse, intOrdering)
        //Map intervals of the source string to the sequence of nodes which match them - it is a sequence because
        //multiple spans for the same interval are allowed.
        var intervalMap = SortedMap[(Int, Int), Seq[Node]]()(intervalOrder)

        for ((start, end, elem) <- spans.sorted(spanOrder)) {
            //Only nested intervals. Interval nesting is a partial order, therefore we cannot use the filter function as an ordering for intervalMap, even if it would be nice.
            val nestedIntervalsMap = intervalMap.until((start, end)).filter(_ match {
                case ((intStart, intEnd), _) => start <= intStart && intEnd <= end
            })
            //println("intervalMap: " + intervalMap)
            //println("beforeMap: " + nestedIntervalsMap)

            //We call sorted to use a standard ordering this time.
            val before = nestedIntervalsMap.keys.toSeq.sorted

            // text.slice(start, end) must be split into fragments, some of which are represented by text node, some by
            // already computed xml nodes.
            val intervals = start +: (for {
                (intStart, intEnd) <- before
                boundary <- Seq(intStart, intEnd)
            } yield boundary) :+ end

            var xmlChildren = ArrayBuffer[Node]()
            var useXmlNode = false

            for (interv <- intervals.sliding(2)) {
                val intervStart = interv(0)
                val intervEnd = interv(1)
                xmlChildren.++=(
                    if (useXmlNode)
                        intervalMap((intervStart, intervEnd)) //Precomputed nodes
                    else
                        Seq(Text(text.slice(intervStart, intervEnd))))
                useXmlNode = !useXmlNode //The next interval will be of the opposite kind.
            }
            //Remove intervals that we just processed
            intervalMap = intervalMap -- before

            // By using elem.child, you also preserve existing xml children. "elem.child ++" can be also commented out.
            var tree = elem.copy(child = elem.child ++ xmlChildren)
            intervalMap += (start, end) -> (intervalMap.getOrElse((start, end), Seq.empty) :+ tree)
            //println(tree)
        }
        intervalMap((0, text.length)).head
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }
    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
    }
}
于 2011-08-12T22:08:28.093 に答える
2

これは楽しかったです!

私はスティーブと同様のアプローチを取りました。要素の「開始タグ」と「終了タグ」を並べ替えて、どこに配置するかを計算します。

私は恥知らずにも Blaisorblade からテストを盗み、さらにコードを開発するのに役立つものをいくつか追加しました。

2011-08-14 編集

test-5 で空のタグが挿入されたのが残念でした。ただし、この配置では、テスト 3 の定式化の結果

  • 空のタグ (c,d) の場合でも、spans リスト内のスパニング タグ (a) の後に c,d-tags が a の終了タグと同じ挿入ポイントを持っていても、c,d タグは a の内側に入ります。これにより、便利な複数のタグの間に空のタグを配置することが難しくなります。

そこで、いくつかのテストを少し変更し、代替ソリューションを提供しました。

代替ソリューションでは、同じ方法で開始しますが、開始タグ、空タグ、終了タグの 3 つの個別のリストがあります。そして、ソートするだけでなく、空のタグをタグ リストに配置する 3 番目のステップがあります。

最初の解決策:

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXml {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of elements, sorted by where it should be inserted
        //  differentiate start tags ('s) and empty tags ('e)
        val startElms = spans sorted Ordering[Int].on[(Int, _, _)](_._1) map {
            case e if e._1 != e._2 => (e._1, e._3, 's)
            case e => (e._1, e._3, 'e)
        }
        //Create a Seq of closing tags ('c), sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filter(e => e._1 != e._2)
            .map(e => (e._2, e._3, 'c))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting need to be refined
        // - end tag's need to come before start tag's if the insertion point is thesame
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val txt = adjustInset(sorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(txt)

        //Convert to XML
        XML.loadString(txt)
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb<empty/></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def invalidSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }
    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
        test4()
        test5()
        test6()
        test7()
        invalidSpans()
    }
}

SpanToXml.main(Array())

代替ソリューション:

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXmlAlt {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of start tags, sorted by where it should be inserted
        // filter out all empty tags
        val startElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._1, e._3, 's))
        //Create a Seq of closing tags, sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._2, e._3, 'c))

        //Create a Seq of empty tags, sorted by where they should be inserted
        val emptyElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filter(e => e._1 == e._2)
            .map(e => (e._1, e._3, 'e))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting need to be refined
        // - end tag's need to come before start tag's if the insertion point is the same
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Insert empty tags
        val allSorted = insertEmpyt(spans, sorted, emptyElms) sorted Ordering[Int].on[(Int, _, _)](_._1)
        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val str = adjustInset(allSorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(str)
        //Convert to XML
        XML.loadString(str)
    }

    def insertEmpyt(spans: Seq[(Int, Int, Elem)],
        sorted: Seq[(Int, Elem, Symbol)],
        emptys: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {

        //Find all tags that should be before the empty tag
        @tailrec
        def afterSpan(empty: (Int, Elem, Symbol),
            spans: Seq[(Int, Int, Elem)],
            after: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
            var result = after
            spans match {
                case t :: _ if t._1 == empty._1 && t._2 == empty._1 && t._3 == empty._2 => after //break
                case t :: ts if t._1 == t._2 => afterSpan(empty, ts, after :+ (t._1, t._3, 'e))
                case t :: ts => {
                    if (t._1 <= empty._1) result = result :+ (t._1, t._3, 's)
                    if (t._2 <= empty._1) result = result :+ (t._2, t._3, 'c)
                    afterSpan(empty, ts, result)
                }
            }
        }

        //For each empty tag, insert it in the sorted list
        var result = sorted
        emptys.foreach(e => {
            val afterSpans = afterSpan(e, spans, Seq[(Int, Elem, Symbol)]())
            var emptyInserted = false
            result = result.foldLeft(Seq[(Int, Elem, Symbol)]())((res, s) => {
                if (afterSpans.contains(s) || emptyInserted) {
                    res :+ s
                } else {
                    emptyInserted = true
                    res :+ e :+ s
                }
            })
        })
        result
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3alt() =
        test(
            text = "  ",
            spans = Seq(
                (0, 2, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/> </a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5alt() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb</b></ab><empty/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5b() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 2, <empty1/>),
                        (4, 4, <empty2/>),
                        (2, 4, <b/>),
                        (2, 2, <empty3/>),
                        (4, 4, <empty4/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<empty1/><b><empty3/>bb<empty2/></b></ab><empty4/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def failedSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }

    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3alt()
        test4()
        test5alt()
        test5b()
        test6()
        test7()
        failedSpans()
    }
}

SpanToXmlAlt.main(Array())
于 2011-08-13T23:59:12.180 に答える
0

XML ノードを動的に簡単に作成できます。

scala> import scala.xml._
import scala.xml._

scala> Elem(null, "AAA",xml.Null,xml.TopScope, Array[Node]():_*)
res2: scala.xml.Elem = <AAA></AAA>

これがElem.apply署名ですdef apply (prefix: String, label: String, attributes: MetaData, scope: NamespaceBinding, child: Node*) : Elem

このアプローチで私が目にする唯一の問題は、最初に内部ノードを構築する必要があることです。

簡単にするための何か:

scala> def elem(name:String, children:Node*) = Elem(null, name ,xml.Null,xml.TopScope, children:_*); def elem(name:String):Elem=elem(name, Array[Node]():_*);

scala> elem("A",elem("B"))
res11: scala.xml.Elem = <A><B></B></A>
于 2010-11-09T18:04:13.433 に答える
0

これは、文字列連結とを使用した正しい解決策ですXML.loadString

def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
  // arrange items so that at each offset:
  //   closing tags sort before opening tags
  //   with two opening tags, the one with the later closing tag sorts first
  //   with two closing tags, the one with the later opening tag sorts first
  val items = Buffer[(Int, Int, Int, String)]()
  for ((begin, end, elem) <- spans) {
    val elemStr = elem.toString
    val splitIndex = elemStr.indexOf('>') + 1
    val beginTag = elemStr.substring(0, splitIndex)
    val endTag = elemStr.substring(splitIndex)
    items += ((begin, +1, -end, beginTag))
    items += ((end, -1, -begin, endTag))
  }
  // group tags to be inserted by index
  val inserts = Map[Int, Buffer[String]]()
  for ((index, _, _, tag) <- items.sorted) {
    inserts.getOrElseUpdate(index, Buffer[String]()) += tag
  }
  // put tags and characters into a buffer
  val result = Buffer[String]()
  for (i <- 0 until text.size + 1) {
    for (tags <- inserts.get(i); tag <- tags) {
      result += tag
    }
    result += text.slice(i, i + 1)
  }
  // create XML from the string buffer
  XML.loadString(result.mkString)
}

これは、最初の 2 つのテスト ケースの両方に合格しますが、3 番目のテスト ケースでは失敗します。

于 2010-11-09T22:05:41.723 に答える