regex - Regular expression to parse query string values into named groups

Question

I have HTML with the following content:

... some text ...
<a href="file.aspx?userId=123&section=2">link</a> ... some text ...
... some text ...
<a href="file.aspx?section=5&user=678">link</a> ... some text ...
... some text ...

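I would like to parse it so that each match exposes the values as named groups:

Match 1

Groups["user"] = 123

Groups["section"] = 2

Match 2

Groups["user"] = 678

Groups["section"] = 5

I can do this when the parameters always appear in the same order, user first and then section, but I don't know how to do it when the order varies.

Thank you!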

score 8 · Accepted Answer

In my case I had to parse the URL myself, because the HttpUtility.ParseQueryString utility is not available on WP7. So I wrote the following extension method:

public static class UriExtensions
{
    private static readonly Regex queryStringRegex;
    static UriExtensions()
    {
        queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
    }

    public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
    {
        if (uri == null)
            throw new ArgumentNullException("uri");

        var matches = queryStringRegex.Matches(uri.OriginalString);
        for (int i = 0; i < matches.Count; i++)
        {
            var match = matches[i];
            yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
        }
    }
}

After that it is just a matter of using it, for example:

        var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
        var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
        var userId = parameters["userId"];
        var section = parameters["section"];

Note: I return an IEnumerable directly rather than a dictionary because I assume parameter names may be duplicated. If there are duplicate names, building the dictionary throws an exception.

score 5 · Accepted Answer

Why use a regular expression to split it at all?

You can extract the query string first, split it on &, and then split each piece on = to build a map.
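A minimal sketch of that split-based idea, written in Python rather than C#. The function name `parse_query_pairs` is made up for illustration, and the sketch assumes the href has already been pulled out of the HTML and that no percent-decoding is needed.

```python
from urllib.parse import urlsplit

def parse_query_pairs(href):
    """Return the (name, value) pairs from the query string of href, in order."""
    query = urlsplit(href).query  # the text after '?', without any '#fragment'
    pairs = []
    for piece in query.split("&"):
        if not piece:
            continue  # skip empty pieces, e.g. from a trailing '&'
        name, _, value = piece.partition("=")  # a missing '=' yields value ''
        pairs.append((name, value))
    return pairs

# Order no longer matters once the pairs are in a dict.
params = dict(parse_query_pairs("file.aspx?section=5&user=678"))
print(params["user"], params["section"])
```

Returning a list of pairs keeps duplicate parameter names intact. Converting to a dict, as in the usage line, keeps only the last value for each name. For real input, the standard library's `urllib.parse.parse_qsl` does the same job and also handles percent-decoding.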

score 1 · Accepted Answer

You didn't specify which language you are working in, but this should do the trick in C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"... some text ...
                <a href=""file.aspx?userId=123&section=2"">link</a> ... some text ...
... some text ...
<a href=""file.aspx?section=5&user=678"">link</a> ... some text ...
... some text ...";
            // The dot in "file.aspx" is escaped so that it matches only a literal '.'.
            Regex regexObj =
               new Regex(@"<a href=""file\.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
            Match matchResults = regexObj.Match(subjectString);
            while (matchResults.Success)
            {
                string user = matchResults.Groups["user"].Value;
                string section = matchResults.Groups["section"].Value;
                Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
                matchResults = matchResults.NextMatch();
            }
            Console.ReadKey();
        }
    }
}

score 0 · Accepted Answer

Another approach is to put the capturing groups inside lookaheads:

Regex r = new Regex(@"<a href=""file\.aspx\?" +
                    @"(?=[^""<>]*?user=(?<user>\w+))" +
                    @"(?=[^""<>]*?section=(?<section>\w+))");

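With only two parameters, there is no reason to prefer this over the alternation-based approach suggested by Mike and strager. If you need to match three parameters, however, the other regexes grow to several times their current length, whereas this one only needs one more lookahead just like the two existing ones.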
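For comparison, here is a rough Python sketch of the same lookahead idea. Python's `re` module also keeps groups captured inside a lookahead. The `\b` anchors and the optional `Id` suffix are my own additions so that both sample links from the question (`userId=` and `user=`) match. They are assumptions, not part of the regex above.

```python
import re

# Each lookahead starts right after the '?' and scans forward inside the
# attribute value (it cannot cross a quote or angle bracket), so the order
# of the parameters does not matter.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?\buser(?:Id)?=(?P<user>\w+))'
    r'(?=[^"<>]*?\bsection=(?P<section>\w+))'
)

html = '''... some text ...
<a href="file.aspx?userId=123&section=2">link</a> ... some text ...
... some text ...
<a href="file.aspx?section=5&user=678">link</a> ... some text ...'''

results = [(m.group("user"), m.group("section")) for m in LINK.finditer(html)]
print(results)
```

Supporting a third parameter would only mean appending one more lookahead line of the same shape.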

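By the way, contrary to your reply to Claus, which language you are working in matters quite a lot. Features, syntax, and APIs differ considerably from one language to another.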

score 0 · Accepted Answer

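You didn't say which regex flavor you are using. Since your sample URLs link to .aspx files, I'll assume .NET. In .NET, a single regex can contain multiple named capturing groups with the same name, and .NET treats them as if they were one group. So you can use this regex: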

userId=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&user=(?<user>\d+)

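This simple regex using alternation is far more efficient than any trick with lookaround. If your requirements include matching the parameters only when they appear inside a link, it is easy to extend.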
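Outside .NET the same alternation idea still works, but flavors such as Python's `re` reject duplicate group names. A possible workaround, sketched below, gives each branch its own names and merges them afterwards. The suffixed names (`user_a`, `user_b`, and so on) and the helper `extract` are hypothetical, and the parameter names are taken from the sample links in the question.

```python
import re

# One branch per parameter order. Python's re does not allow the same group
# name twice, so each branch uses its own suffixed names.
PAIR = re.compile(
    r'userId=(?P<user_a>\d+)&section=(?P<section_a>\d+)'
    r'|section=(?P<section_b>\d+)&user=(?P<user_b>\d+)'
)

def extract(text):
    """Return one {'user': ..., 'section': ...} dict per match in text."""
    found = []
    for m in PAIR.finditer(text):
        # Only one branch participates in a match; the other branch's groups are None.
        user = m.group("user_a") or m.group("user_b")
        section = m.group("section_a") or m.group("section_b")
        found.append({"user": user, "section": section})
    return found

print(extract('href="file.aspx?userId=123&section=2" href="file.aspx?section=5&user=678"'))
```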

score 0 · Accepted Answer

A simple Python implementation that gets around the ordering problem:

In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')

In [3]: t = 'href="file.aspx?section=2&userId=123"'

In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]

In [5]: t = 'href="file.aspx?userId=123&section=2"'

In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]
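Building on the session above, the pairs returned by `findall` can be turned into a dict so that values are looked up by name regardless of order. This is only a sketch: the helper name `params_by_name`, the extra `user` alternative (the second sample link uses `user=` rather than `userId=`), and the `\b` anchor are my additions.

```python
import re

# Matches one name=value pair at a time; 'userId' is listed before 'user'
# only for readability, since the '=' that follows disambiguates them anyway.
PARAM = re.compile(r'\b(userId|user|section)=(\d+)')

def params_by_name(href):
    """Map each recognised parameter name to its value (last one wins)."""
    return dict(PARAM.findall(href))

first = params_by_name('href="file.aspx?section=2&userId=123"')
second = params_by_name('href="file.aspx?section=5&user=678"')
print(first["userId"], first["section"])
print(second["user"], second["section"])
```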

score 0 · Accepted Answer

Using a regex to find the key/value pairs first and then splitting them... that doesn't seem right.

I'm interested in a pure regex solution.

Anyone?

score 0 · Accepted Answer

Check this out:

\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>

You can then get the pairs with something like Groups["key"].Captures[i] and Groups["value"].Captures[i].

score 0 · Accepted Answer

Maybe something like this (I'm rusty with regexes, and I was never very good at them in the first place; untested):

/href="[^?]*(?:[?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/

(By the way, your XHTML is not well-formed: & must be written as &amp; inside attribute values.)
