ruby - How to sort an alphanumeric array in Ruby

Question

How can I sort array data alphanumerically in Ruby?

My array is a = [test_0_1, test_0_2, test_0_3, test_0_4, test_0_5, test_0_6, test_0_7, test_0_8, test_0_9, test_1_0, test_1_1, test_1_2, test_1_3, test_1_4, test_1_5, test_1_6, test_1_7, test_1_8, test_1_9, test_1_10, test_1_11, test_1_12, test_1_13, test_1_14, ...........test_1_121...............]

I want the output to look like this:

.
.
.
test_1_121
.
.
.
test_1_14
test_1_13
test_1_12
test_1_11
test_1_10
test_1_9
test_1_8
test_1_7
test_1_6
test_1_5
test_1_4
test_1_3
test_1_2
test_1_1
test_0_10
test_0_9
test_0_8
test_0_7
test_0_6
test_0_5
test_0_4
test_0_3
test_0_2
test_0_1

score 8 · Accepted Answer

A general algorithm for sorting strings that contain unpadded sequence numbers at arbitrary positions:

padding = 4
list.sort { |a, b|
  # Left-pad every run of digits with zeros to at least `padding` characters, then compare
  a, b = [a, b].map { |s| s.gsub(/\d+/) { |m| m.rjust(padding, "0") } }
  a <=> b
}

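Here, padding is the field length that numbers are padded to during the comparison. Any number found in a string with fewer than padding digits is zero-padded before comparing, which yields the expected sort order.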
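To see why padding fixes the order, here is a small sketch that applies the same gsub step to two of the question's strings. pad_digits is a hypothetical helper name introduced only for this illustration; it uses String#rjust, which leaves numbers longer than padding unchanged.

```ruby
# Hypothetical helper name; same gsub step as the answer's block
def pad_digits(s, padding = 4)
  # Each run of digits is left-padded with "0" to at least `padding` characters
  s.gsub(/\d+/) { |m| m.rjust(padding, "0") }
end

p pad_digits("test_1_9")    # prints "test_0001_0009"
p pad_digits("test_1_10")   # prints "test_0001_0010"

# Unpadded, "test_1_10" sorts before "test_1_9" because "1" < "9"
p("test_1_10" < "test_1_9")                          # prints true
# Padded, the comparison reaches "0010" vs "0009" and gets it right
p(pad_digits("test_1_10") < pad_digits("test_1_9"))  # prints false
```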

To produce the result user682932 asked for, just append .reverse after the sort block. This turns the natural (ascending) order into descending order.

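With a preliminary loop over the strings you can, of course, find the maximum number of digits in the list dynamically and use it instead of hard-coding an arbitrary padding length, at the cost of more processing (slower) and a little more code. For example: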

padding = list.reduce(0){|max,s| 
  x = s.scan(/\d+/).map{|m|m.size}.max
  (x||0) > max ? x : max
}
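A possible self-contained variant that combines the two ideas above, assuming a small sample list: compute padding with a pre-loop, then use sort_by (which builds each padded key once per element rather than on every comparison) followed by reverse. This is a sketch, not the answer's own code.

```ruby
# A small subset of the question's strings
list = %w[test_0_1 test_0_9 test_1_0 test_1_2 test_1_10 test_1_121]

# Pre-loop: longest run of digits in any string (0 if there are none)
padding = list.map { |s| s.scan(/\d+/).map(&:size).max || 0 }.max || 0

# sort_by builds each padded key once per element, then reverse gives descending order
sorted = list.sort_by { |s| s.gsub(/\d+/) { |m| m.rjust(padding, "0") } }.reverse
p padding  # prints 3
p sorted   # prints ["test_1_121", "test_1_10", "test_1_2", "test_1_0", "test_0_9", "test_0_1"]
```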

score 5 · Accepted Answer

If you simply sort them as strings, you won't get the correct order between, for example, "test_2" and "test_10". Do this instead:

a.sort_by { |s| s.scan(/\d+/).map(&:to_i) }.reverse
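A minimal sketch of what this key looks like, using a subset of the question's strings. Because Array#<=> compares element by element, [1, 2] sorts before [1, 10]. Note that the key keeps only the digits, so strings with different text but the same numbers (for example "abc_2" and "test_2") compare as equal, and their relative order is then not guaranteed.

```ruby
a = %w[test_0_1 test_0_9 test_1_0 test_1_2 test_1_10 test_1_121]

# Each key keeps only the numbers: "test_1_10" -> [1, 10]
keys = a.map { |s| s.scan(/\d+/).map(&:to_i) }
p keys  # prints [[0, 1], [0, 9], [1, 0], [1, 2], [1, 10], [1, 121]]

# Array#<=> compares element by element, so [1, 2] comes before [1, 10]
p([1, 2] <=> [1, 10])  # prints -1

sorted = a.sort_by { |s| s.scan(/\d+/).map(&:to_i) }.reverse
p sorted  # prints ["test_1_121", "test_1_10", "test_1_2", "test_1_0", "test_0_9", "test_0_1"]
```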

score 2 · Accepted Answer

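You can pass a block to sort to get a custom ordering. In your case the problem is that your numbers are not zero-padded, so this approach zero-pads the numeric parts before comparing them, which gives your desired sort order.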

a.sort { |a,b|
  ap = a.split('_')
  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
  bp = b.split('_')
  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
  b <=> a
}
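The block above builds the same zero-padded key for both elements on every comparison. This sketch pulls that key into a hypothetical helper, padded_key, to show what it produces. It assumes every string has exactly three "_"-separated parts with numbers of at most five digits; a missing part would make "%05d" % nil raise TypeError.

```ruby
# Hypothetical helper: the key the sort block builds for each element
def padded_key(s)
  prefix, major, minor = s.split('_')
  # "%05d" turns the digit string into an integer and left-pads it with zeros to 5 digits
  prefix + "%05d" % major + "%05d" % minor
end

p padded_key("test_1_10")  # prints "test0000100010"
p padded_key("test_1_9")   # prints "test0000100009"

# Descending order, as in the answer's block (b <=> a)
sorted = %w[test_1_9 test_1_10 test_0_2].sort { |x, y| padded_key(y) <=> padded_key(x) }
p sorted  # prints ["test_1_10", "test_1_9", "test_0_2"]
```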

score 2 · Accepted Answer

The processing time of sort routines varies widely. Variations on this kind of benchmark can quickly narrow things down to the fastest way of doing it.

#!/usr/bin/env ruby

ary = %w[
    test_0_1  test_0_2   test_0_3 test_0_4 test_0_5  test_0_6  test_0_7
    test_0_8  test_0_9   test_1_0 test_1_1 test_1_2  test_1_3  test_1_4  test_1_5
    test_1_6  test_1_7   test_1_8 test_1_9 test_1_10 test_1_11 test_1_12 test_1_13
    test_1_14 test_1_121
]

require 'ap'
ap ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse

And its output:

>> [
>>     [ 0] "test_1_121",
>>     [ 1] "test_1_14",
>>     [ 2] "test_1_13",
>>     [ 3] "test_1_12",
>>     [ 4] "test_1_11",
>>     [ 5] "test_1_10",
>>     [ 6] "test_1_9",
>>     [ 7] "test_1_8",
>>     [ 8] "test_1_7",
>>     [ 9] "test_1_6",
>>     [10] "test_1_5",
>>     [11] "test_1_4",
>>     [12] "test_1_3",
>>     [13] "test_1_2",
>>     [14] "test_1_1",
>>     [15] "test_1_0",
>>     [16] "test_0_9",
>>     [17] "test_0_8",
>>     [18] "test_0_7",
>>     [19] "test_0_6",
>>     [20] "test_0_5",
>>     [21] "test_0_4",
>>     [22] "test_0_3",
>>     [23] "test_0_2",
>>     [24] "test_0_1"
>> ]

Testing the algorithms for speed shows the following:

require 'benchmark'

n = 50_000
Benchmark.bm(8) do |x|
  x.report('sort1') { n.times { ary.sort { |a,b| b <=> a }         } }
  x.report('sort2') { n.times { ary.sort { |a,b| a <=> b }.reverse } }
  x.report('sort3') { n.times { ary.sort { |a,b|
                                  ap = a.split('_')
                                  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
                                  bp = b.split('_')
                                  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
                                  b <=> a
                                } } }

  x.report('sort_by1') { n.times { ary.sort_by { |s| s                                               }         } }
  x.report('sort_by2') { n.times { ary.sort_by { |s| s                                               }.reverse } }
  x.report('sort_by3') { n.times { ary.sort_by { |s| s.scan(/\d+/).map{ |s| s.to_i }                 }.reverse } }
  x.report('sort_by4') { n.times { ary.sort_by { |v| a = v.split(/_+/); [a[0], a[1].to_i, a[2].to_i] }.reverse } }
  x.report('sort_by5') { n.times { ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i]      }.reverse } }
end


>>               user     system      total        real
>> sort1     0.900000   0.010000   0.910000 (  0.919115)
>> sort2     0.880000   0.000000   0.880000 (  0.893920)
>> sort3    43.840000   0.070000  43.910000 ( 45.970928)
>> sort_by1  0.870000   0.010000   0.880000 (  1.077598)
>> sort_by2  0.820000   0.000000   0.820000 (  0.858309)
>> sort_by3  7.060000   0.020000   7.080000 (  7.623183)
>> sort_by4  6.800000   0.000000   6.800000 (  6.827472)
>> sort_by5  6.730000   0.000000   6.730000 (  6.762403)
>>

sort1 and sort2, and sort_by1 and sort_by2, help establish baselines for sort, sort_by and reverse.

sort3 and sort_by3 are the other two answers on this page. sort_by4 and sort_by5 are two spins on how I would do it; sort_by5 is the fastest I came up with after a few minutes of tinkering.

This shows how slight differences in the algorithm can make a large difference in run time. With more iterations or larger arrays, the differences would be even more extreme.
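The gap between sort3 and the sort_by variants is consistent with how the two methods invoke the key code: sort { } rebuilds both keys on every comparison, while sort_by computes one key per element and then sorts those. This rough sketch counts calls to a hypothetical key lambda (similar to sort_by5's key) on 20 generated strings to make that visible. The exact count for sort depends on Ruby's internal algorithm, so only a lower bound is meaningful.

```ruby
# 20 strings "test_0_0" .. "test_1_9", shuffled with a fixed seed
ary = (0..1).flat_map { |i| (0..9).map { |j| "test_#{i}_#{j}" } }.shuffle(random: Random.new(1))

calls = 0
# Hypothetical key function; counts its own invocations
key = lambda do |s|
  calls += 1
  prefix, b, c = s.split('_')
  [prefix, b.to_i, c.to_i]
end

calls = 0
ary.sort { |x, y| key.(y) <=> key.(x) }   # two key calls per comparison
sort_calls = calls

calls = 0
ary.sort_by { |s| key.(s) }.reverse       # one key call per element
sort_by_calls = calls

puts "sort: #{sort_calls} key calls, sort_by: #{sort_by_calls} key calls"
```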

score 1 · Accepted Answer

Similar to @ctcherry's answer, but faster:

a.sort_by {|s| "%s%05i%05i" % s.split('_') }.reverse
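A sketch of what this key looks like: String#% with an Array argument fills the three conversions in order, and %i, like %d, converts each digit string to an integer before zero-padding. Like the padded answers above, it assumes exactly three "_"-separated parts and numbers of at most five digits.

```ruby
# String#% with an Array fills the conversions from its elements in order;
# %i (same as %d) converts each digit string to an integer before zero-padding
key = "%s%05i%05i" % "test_1_10".split('_')
p key  # prints "test0000100010"

a = %w[test_0_9 test_1_10 test_1_2 test_1_121]
sorted = a.sort_by { |s| "%s%05i%05i" % s.split('_') }.reverse
p sorted  # prints ["test_1_121", "test_1_10", "test_1_2", "test_0_9"]
```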

Edit: my test:

require 'benchmark'
ary = []
100_000.times { ary << "test_#{rand(1000)}_#{rand(1000)}" }
ary.uniq!; puts "Size: #{ary.size}"

Benchmark.bm(5) do |x|
  x.report("sort1") do
    ary.sort_by {|e| "%s%05i%05i" % e.split('_') }.reverse
  end
  x.report("sort2") do
    ary.sort { |a,b|
      ap = a.split('_')
      a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
      bp = b.split('_')
      b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
      b <=> a
    } 
  end
  x.report("sort3") do
    ary.sort_by { |v| a, b, c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse
  end
end

Output:

Size: 95166

           user     system      total        real
sort1  3.401000   0.000000   3.401000 (  3.394194)
sort2 94.880000   0.624000  95.504000 ( 95.722475)
sort3  3.494000   0.000000   3.494000 (  3.501201)

score 1 · Accepted Answer

Here is a more general way to perform a natural decimal sort in Ruby. The following is inspired by my code for sorting "like Xcode" from https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/lib/xcodeproj/project/object/helpers/sort_helper.rb (see also https://rosettacode.org/wiki/Natural_sorting#Ruby).

Natural decimal sorting has other aspects beyond the obvious fact that "10" must come after "2", and for each of them several alternative behaviors are plausible:

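How should equality such as "001" vs "01" be handled: keep the original array order, or use fallback logic? (Below, I chose a second pass with strict ordering logic when the first pass finds them equal.)
Should consecutive spaces be ignored for sorting, or does every space character count? (Below, I chose to collapse consecutive spaces in the first pass and compare them strictly in the equality pass.)
The same question applies to other special characters. (Below, I chose to count every character that is neither a space nor a digit individually.)
Case sensitive or not: does "a" come before or after "A"? (Below, I chose to ignore case in the first pass and to put "a" before "A" in the equality pass.)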

Given these considerations:

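This means we almost certainly need scan instead of split, because there can be three kinds of substrings to compare (digits, spaces, and everything else).
It also means we almost certainly need a Comparable class with its own def <=>(other), because we can't simply map each substring to some other value that would behave differently depending on context (the first pass versus the equality pass).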

This makes the implementation a bit longer, but it handles edge cases well.

  # Wrapper for a string that performs a natural decimal sort (alphanumeric).
  # @example
  #   arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }
  class NaturalSortString
    include Comparable
    attr_reader :str_fallback, :ints_and_strings, :ints_and_strings_fallback, :str_pattern

    def initialize(str)
      # fallback pass: case is inverted
      @str_fallback = str.swapcase
      # first pass: digits are used as integers, spaces are compacted, case is ignored
      @ints_and_strings = str.scan(/\d+|\s+|[^\d\s]+/).map do |s|
        case s
        when /\d/ then Integer(s, 10)
        when /\s/ then ' '
        else s.downcase
        end
      end
      # second pass: digits are inverted, case is inverted
      @ints_and_strings_fallback = @str_fallback.scan(/\d+|\D+/).map do |s|
        case s
        when /\d/ then Integer(s.reverse, 10)
        else s
        end
      end
      # comparing patterns
      @str_pattern = @ints_and_strings.map { |el| el.is_a?(Integer) ? :i : :s }.join
    end

    def <=>(other)
      if str_pattern.start_with?(other.str_pattern) || other.str_pattern.start_with?(str_pattern)
        compare = ints_and_strings <=> other.ints_and_strings
        if compare != 0
          # we sort naturally (literal ints, spaces simplified, case ignored)
          compare
        else
          # natural equality, we use the fallback sort (int reversed, case swapped)
          ints_and_strings_fallback <=> other.ints_and_strings_fallback
        end
      else
        # type mismatch, we sort alphabetically (case swapped)
        str_fallback <=> other.str_fallback
      end
    end
  end
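The tokenizing step that the considerations above rely on can be tried on its own. This sketch uses a made-up file name to compare split and scan, then applies the same first-pass mapping as the initialize method; it illustrates only that step, not the whole class.

```ruby
s = "File  10b.txt"   # made-up sample with a double space and an embedded number

# split on digit runs throws the digits away
p s.split(/\d+/)                 # prints ["File  ", "b.txt"]

# scan with three alternatives keeps digits, whitespace and other text as separate tokens
tokens = s.scan(/\d+|\s+|[^\d\s]+/)
p tokens                         # prints ["File", "  ", "10", "b.txt"]

# Same first-pass mapping as NaturalSortString#initialize
first_pass = tokens.map do |t|
  case t
  when /\d/ then Integer(t, 10)  # digits become integers
  when /\s/ then ' '             # any run of whitespace collapses to one space
  else t.downcase                # other text is compared case-insensitively
  end
end
p first_pass                     # prints ["file", " ", 10, "b.txt"]

# "001" and "01" tie in the first pass, which is why the class needs a fallback pass
p Integer("001", 10) == Integer("01", 10)  # prints true
```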

Usage

Example 1:

arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }

Example 2:

arrayOfFilenames.sort! do |x, y|
  NaturalSortString.new(x) <=> NaturalSortString.new(y)
end

My test cases are at https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/spec/project/object/helpers/sort_helper_spec.rb. There I used this reference ordering: ['a', 'a ', ' 0.1.1', ' 0.1.01', ' 0.1.2', ' 0.1.10', ' 1', ' 01', ' 1a', ' 2', ' 2 a', ' 10 ', 'a', 'A', 'a', 'a 2', 'a1', 'A1B001', 'A01B1']

Of course, feel free to customize the sorting logic to your own needs.

score 1 · Accepted Answer

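I checked the Wikipedia page for the Unix sort command. Its GNU version has a -V flag that sorts "version strings" in general. (That means strings mixing digits and non-digits, where the numeric parts are sorted numerically and the non-numeric parts lexically.)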

The article says:

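The GNU implementation has a -V --version-sort option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, else the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.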

sawa's solution works this way, but doesn't sort on the non-numeric parts.

So it seems useful to post a solution, somewhere between Coeur's and sawa's, that works like GNU sort -V:

a.sort_by do |r|
  # Split the field into array of [<string>, nil] or [nil, <number>] pairs
  r.to_s.scan(/(\D+)|(\d+)/).map do |m|
    s,n = m
    n ? n.to_i : s.to_s # Convert number strings to integers
  end.to_a
end
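One caveat with the sort_by above: if two strings have a digit block and a non-digit block at the same position (for example "A10" and "10A"), their keys compare a String with an Integer, <=> returns nil, and sort_by raises ArgumentError. A hedged workaround is to tag each block so that only like types are ever compared. version_key is a hypothetical name, and putting digit blocks before text blocks is an arbitrary choice in this sketch, not necessarily what GNU sort -V does.

```ruby
# The answer's key: integers for digit blocks, strings for everything else
plain_key = ->(r) { r.to_s.scan(/(\D+)|(\d+)/).map { |s, n| n ? n.to_i : s } }

begin
  %w[A10 10A].sort_by(&plain_key)
rescue ArgumentError => e
  # "A" <=> 10 is nil, so comparing the two key arrays fails
  puts "plain keys: #{e.class}"
end

# Hypothetical helper: tag each block so a number is only ever compared with a number
def version_key(str)
  str.to_s.scan(/(\D+)|(\d+)/).map do |s, n|
    n ? [0, n.to_i] : [1, s]   # [0, Integer] for digit blocks, [1, String] otherwise
  end
end

sorted = %w[A10 10A A2].sort_by { |r| version_key(r) }
p sorted  # prints ["10A", "A2", "A10"]
```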

In my case I wanted to sort a TSV file by its fields, so as a bonus, here is a script for that case as well:

require 'csv'

# Sorts a tab-delimited file input on STDIN, sortin

opts = {
  headers:true,
  col_sep: "\t",
  liberal_parsing: true,
}

table = CSV.new($stdin, **opts)


# Emulate unix's sort -V: split each field into an array of string or
# numeric values, and sort by those in turn. So for example, A10
# sorts above A100.
sorted_ary = table.sort_by do |r|
  r.fields.map do |f|
    # Split the field into array of [<string>, nil] or [nil, <number>] values
    f.to_s.scan(/(\D+)|(\d+)/).map do |m|
      s,n = m
      n ? n.to_i : s.to_s # Convert number strings to integers
    end.to_a
  end
end

puts CSV::Table.new(sorted_ary).to_csv(**opts)

(As an aside, another solution here sorts using Gem::Version, but that seems to work only for well-formed Gem version strings.)
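To illustrate that aside: the question's strings are not valid version strings as they are, but a hypothetical adapter that strips the "test_" prefix and turns underscores into dots makes them parseable. This sketch assumes every string has exactly that prefix.

```ruby
require 'rubygems'  # normally already loaded; provides Gem::Version

a = %w[test_1_2 test_1_10 test_0_9 test_1_121]

# The raw strings are not well-formed version strings
p Gem::Version.correct?("test_1_10")  # falsy

# Hypothetical adapter: "test_1_10" -> "1.10", then compare as versions
sorted = a.sort_by { |s| Gem::Version.new(s.delete_prefix("test_").tr("_", ".")) }.reverse
p sorted  # prints ["test_1_121", "test_1_10", "test_1_2", "test_0_9"]
```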

score 0 · Accepted Answer

From the looks of it, you want to use sort and reverse.

ruby-1.9.2-p136 :009 > a = ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]
 => ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"] 

ruby-1.9.2-p136 :010 > a.sort
 => ["abc_1", "abc_11", "abc_2", "abc_22", "abc_3"] 
ruby-1.9.2-p136 :011 > a.sort.reverse
 => ["abc_3", "abc_22", "abc_2", "abc_11", "abc_1"]

score 0 · Accepted Answer

OK, from your output it looks like you want it reversed, so use reverse():

a.reverse
