python - Extract words from a file, then list the files and line numbers that contain those words

Question

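I have a file called Strings.h that I use to localize an app of mine. I want to search all my class files to find out whether and where each string is used, and print the class and line number for each string.

My idea is to use Python, but that may be the wrong tool for the job. I also have a basic algorithm, but I worry it will take too long to run. Could you write this script to do what I want, or suggest a better algorithm?

Strings.h looks like this: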

#import "NonLocalizedStrings.h"

#pragma mark Coordinate Behavior Strings
#define LATITUDE_WORD NSLocalizedString(@"Latitude", @"used in coordinate behaviors")
#define LONGITUDE_WORD NSLocalizedString(@"Longitude", @"used in coordinate behaviors")
#define DEGREES_WORD NSLocalizedString(@"Degrees", @"used in coordinate behaviors")
#define MINUTES_WORD NSLocalizedString(@"Minutes", @"Used in coordiante behaviors")
#define SECONDS_WORD NSLocalizedString(@"Seconds", @"Used in DMSBehavior.m")

...

The script needs to take each line that starts with #define and build a list of the words that appear right after #define (e.g. LATITUDE_WORD).

The pseudocode would look like this:

file = strings.h
for line in file:
  extract word after #define
  search_words.push(word) 

print search_words
[LATITUDE_WORD, LONGITUDE_WORD, DEGREES_WORD, MINUTES_WORD, SECONDS_WORD]

Once I have the list of words, the pseudocode would look like this:

found_words = {}
for word in search_words:
   found_words[word] = []

for file in files:
  for line in file:
    for word in search_words:
      if line contains word:
        found_words[word].push((filename, linenumber))   

print found_words

So found_words would look like this:

 {
   LATITUDE_WORD: [
                    (foo.m, 42),
                    (bar.m, 132) 
                  ],
   LONGITUDE_WORD: [
                    (baz.m, 22),
                    (bim.m, 112) 
                  ],

 }

score 3 · Accepted Answer

How about this [in bash]?

$ pattern="\\<($(grep '^#define ' Strings.h | cut -d' ' -f2 | tr '\n' '|' | sed 's/|$//'))\\>"
$ find project_dir -iname '*.m' -exec egrep -Hno "${pattern}" {} + > matches

Output:

project_dir/bar.m:132:LATITUDE_WORD
project_dir/baz.m:22:LONGITUDE_WORD
project_dir/bim.m:112:LONGITUDE_WORD
project_dir/foo.m:42:LATITUDE_WORD

Edit: I changed the code above to redirect its output to a file named matches. You can use that file to show the words that were never found:

for word in $(grep '^#define ' Strings.h | cut -d' ' -f2)
do
    if ! cut -d':' -f3 matches | grep -qxF "${word}"
    then
        echo "${word}"
    fi
done
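For readers who would rather stay in Python, the same "which words were never found" check can be sketched as below. This is only an illustration under assumptions: the function name `unused_words` and the sample data are made up, and each line of the matches file is assumed to have the `path:line:WORD` shape shown in the output above (paths that themselves contain a colon are not handled).

```python
def unused_words(defined, match_lines):
    """Return the defined words that never appear in the grep output.

    Each match line is assumed to look like 'path:line:WORD'.
    """
    used = set()
    for raw in match_lines:
        parts = raw.rstrip("\n").split(":", 2)
        if len(parts) == 3:
            used.add(parts[2])
    # Keep the original order of the definitions.
    return [w for w in defined if w not in used]


# Hypothetical sample data, for illustration only.
defined = ["LATITUDE_WORD", "LONGITUDE_WORD", "SECONDS_WORD"]
matches = [
    "project_dir/bar.m:132:LATITUDE_WORD\n",
    "project_dir/baz.m:22:LONGITUDE_WORD\n",
]
print(unused_words(defined, matches))  # prints ['SECONDS_WORD']
```

With a real matches file, passing the open file object as `match_lines` should work the same way, since the function only iterates over lines.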

score 2 · Accepted Answer

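So, it looks like you have the right idea. Here are some pros and cons of your approach.

Pros:

If you use Python, your pseudocode translates almost line for line into a script.
You get to learn a little more about Python (a great skill to have for this kind of thing).

Cons:

Python will run a bit slower than the other bash-based solutions posted here (which matters if you have many files to search).
The Python script will be a bit longer than those other solutions, but its output can also be made a bit more flexible.

Answer: Since I'm familiar with Python and it's what you originally asked for, here is some code you can use.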

#!/usr/bin/env python3

# List the files you want to search here
search_files = []

# Allows for sorted output later.
words = []

# Contains all found instances.
inst_dict = {}

# Collect the word that follows each #define.
with open('<FILE_PATH_HERE>', 'r') as word_file:
    for line in word_file:
        if line.startswith("#define"):
            tokens = line[7:].split()
            if tokens:
                words.append(tokens[0])
                inst_dict[tokens[0]] = []

for file_name in search_files:
    with open(file_name, 'r') as file_obj:
        # Count lines from 1, the way editors display them.
        for line_num, line in enumerate(file_obj, start=1):
            for w in words:
                if w in line:
                    inst_dict[w].append((file_name, line_num))

# Do whatever you want with 'words' and 'inst_dict'
words.sort()
for w in words:
    string = w + ":\n"
    for inst in inst_dict[w]:
        string += "\tFile: " + inst[0] + "\n"
        # The line number is an int, so convert it before concatenating.
        string += "\tLine: " + str(inst[1]) + "\n"
    print(string)

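I haven't tested the search part of the code, so use it "as is" at your own risk. Good luck, and feel free to ask questions or extend the code as needed. Your request is fairly simple and has many possible solutions, so I'd rather you understand how this one works.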
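One thing to watch for in the script above: the `w in line` test is a substring check, so a name such as LATITUDE_WORD would also be counted inside a longer identifier. A possible variation, sketched here with a made-up helper name `find_usages` and made-up sample data, compiles all the names into one regular expression with word boundaries, which also avoids looping over every word for every line. It is a sketch under those assumptions, not a tested replacement.

```python
import re


def find_usages(words, sources):
    """Map each word to the (file_name, line_number) pairs where it occurs.

    `sources` is an iterable of (file_name, lines) pairs; `lines` can be an
    open file or a list of strings. Line numbers start at 1.
    """
    found = {w: [] for w in words}
    if not found:
        return found
    # \b on both sides means a name only matches as a whole identifier.
    alternatives = sorted(found, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in alternatives) + r")\b"
    )
    for file_name, lines in sources:
        for line_num, line in enumerate(lines, start=1):
            # Record each word at most once per line.
            for w in {m.group(0) for m in pattern.finditer(line)}:
                found[w].append((file_name, line_num))
    return found


# Hypothetical in-memory "files", for illustration only.
sources = [
    ("foo.m", ["int x;\n", "label.text = LATITUDE_WORD;\n"]),
    ("bar.m", ["s = MY_LATITUDE_WORD_2;\n", "t = LONGITUDE_WORD;\n"]),
]
print(find_usages(["LATITUDE_WORD", "LONGITUDE_WORD"], sources))
# prints {'LATITUDE_WORD': [('foo.m', 2)], 'LONGITUDE_WORD': [('bar.m', 2)]}
```

The longer identifier in bar.m is not reported, because the underscore counts as a word character and so no boundary falls inside it.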

score 1 · Accepted Answer

This solution uses awk and globstar (the latter requires Bash 4). I'm sure it can be improved further, but consider it a draft of sorts.

shopt -s globstar

awk 'NR==FNR { if ($0 ~ /^#define/) found[$2]=""; next; } 
     {
       for (word in found){
         if ($0 ~ word) 
           found[word]=found[word] "\t" FILENAME ":" FNR "\n";
       } 
     }
     END { for (word in found) print word ":\n" found[word]}
    ' Strings.h **/*.m

Using the Strings.h snippet you posted, this is the kind of output I get (with a few test files I made up):

LATITUDE_WORD:
    lala1.m, 2
    lala3.m, 1

DEGREES_WORD:
    lala2.m, 5

SECONDS_WORD:

MINUTES_WORD:
    lala3.m, 3

LONGITUDE_WORD:
    lala3.m, 2

P.S. I haven't tested the globstar part, because the bash I'm currently using is v3 (pfff!).
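If globstar is not available, the recursive file listing that `**/*.m` provides can be approximated in Python with `pathlib`. The helper name `collect_sources` and the temporary directory layout below are assumptions for illustration; note also that, unlike a default bash glob, `rglob` does not skip hidden files.

```python
import tempfile
from pathlib import Path


def collect_sources(root, suffix=".m"):
    """Return all files under `root` whose name ends with `suffix`, sorted."""
    return sorted(str(p) for p in Path(root).rglob("*" + suffix) if p.is_file())


# Build a small throwaway tree to show the behaviour.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.m").write_text("x")
    (Path(tmp) / "sub" / "b.m").write_text("y")
    (Path(tmp) / "sub" / "c.h").write_text("z")
    print([Path(p).name for p in collect_sources(tmp)])  # prints ['a.m', 'b.m']
```

The resulting list of paths could then be fed to whichever search script you use in place of a hand-written file list.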

score 0 · Accepted Answer

Try:

grep -oP '^#define\s+\K\S+' strings.h

If your grep does not have the -P option:

perl -lne 'print $& if /^#define\s+\K\S+/' strings.h

score 0 · Accepted Answer

Here is a Python program. It can probably be trimmed down and simplified, but it works.

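import re

# filecontent holds the text of the header file.
with open("Strings.h") as f:
    filecontent = f.read()
for item in filecontent.split('\n'):
    # Capture the first token after "#define".
    m = re.match(r"#define\s+(\S+)", item)
    if m:
        print(m.group(1))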

score 0 · Accepted Answer

#!/bin/bash
# Assuming $files contains a list of your files
word_list=( $(grep '^#define' "${files[@]}" | awk '{ print $2 }') )
