java - JavaregExURLマッチングの問題

Question

いつものようによろしくお願いします。

正規表現に慣れようとしていますが、URLの一致に問題があります。

URLの例を次に示します。

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

これが私の正規表現の内訳です：

[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html

これ[id]は22文字の長さの文字列で、大文字または小文字、および数字を使用できます。ただし、URLからそれを抽出したくありません。明確にするだけ

ここで、このURLから2つの値を抽出する必要があります。

まず、dirsを抽出する必要があります。ただし、[dir]はオプションですが、必要な数だけ指定することもできます。言い換えると、そのパラメータはそこにないか、dir1/dir2/dir3 ..etcである可能性があります。だから、私の最初の例をやめます：

    www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

dir1/dir2/dir3ここで、dirがすべて小文字の単一の単語である文字列である場所を抽出する必要があります（つまり、sports / mlb / games）。dirには数字はなく、例としてのみ使用しています。

しかし、この有効なURLの例では、次のようになります。

www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

ない[dir]ので何も抽出しません。したがって、[dir]はオプションです

次に、上記のようにオプションである[storyTitle]場所を抽出する必要がありますが、存在する場合は1つしか存在できません。[storyTitle][dir]storyTitle

だから私の前の例から離れて

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

'title-of-some-story'ストーリーのタイトルが常に小文字であるダッシュで区切られた文字列である場所を抽出する必要がある場合に有効です。以下の例も有効です。

www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

上記の例では、[storyTitle]このようにオプションにすることはありません

最後に、念のため、a[dir]とaのないURL[storyTitle]も有効です。例：

www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

有効なURLです。どんな入力でも参考になると思います。

score 1 · Accepted Answer

これが機能する1つの例です。

public static void main(String[] args) {

    Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");

    String[] strings ={
            "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
    };
    for (int idx = 0; idx < strings.length; idx++) {
        Matcher m = p.matcher(strings[idx]);
        if (m.find()) {
            String dir = m.group(1);
            String title = m.group(2);
            if (title != null) {
                title = title.substring(1); // remove the leading /
            }
            System.out.println(idx+": Dir: "+dir+", Title: "+title);
        }
    }
}

score 0 · Accepted Answer

これはすべての正規表現ソリューションです。

編集： http://を許可します

Java ソース:

import java.util.*;
import java.lang.*;
import java.util.regex.*;

class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";

        String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";

        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);


        // Match 1st url
        System.out.println("Match 1st URL:");
        Matcher matcher = pattern.matcher(url);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 2nd url
        System.out.println("\nMatch 2nd URL:");
        matcher = pattern.matcher(url2);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 3rd url
        System.out.println("\nMatch 3rd URL:");
        matcher = pattern.matcher(url3);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
    }
}

出力：

Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story

Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE: 

Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: 
TITLE: title-of-some-story

java - JavaregExURLマッチングの問題

2 に答える 2

Related

Reference