
My data follows this structure:

../data/study_ID/FF_Number/Exam_Number/date,

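The data directory contains 176 participant subdirectories. The ID number identifies the participant, and each subdirectory level below it identifies the exam. I would like to create a txt file with one line per participant and the following columns: study ID, FF_number, Exam_Number and date.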

It gets a little more complicated, though, because for the next analysis I want to split the participants into chunks of roughly 15 to 20 participants each.

Any advice? Cheers.


1 Answer


Hmm, nobody?

You should redirect the output of the "find" command to a file. Consider the -type d and -maxdepth switches, and parse the result with sed, replacing "/" with spaces. Piping through "cut" and "column -t", and through "sort" and "uniq", may also be useful. Do any of the names, apart from FF and ID, contain spaces or special characters, e.g. because they are derived from participants' names?

It should be possible to get the TXT file with a one-liner and a few pipes.
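A minimal sketch of such a pipeline, run against a throwaway tree so it is safe to try. The directory names below are made up, and -mindepth is my own addition to the switches mentioned above (it is available in both GNU and BSD find), so treat this as an illustration rather than a finished solution:

```shell
#!/bin/sh
# Hypothetical sample tree standing in for ../data (all names are made up).
root=$(mktemp -d)
mkdir -p "$root/data/study_001/FF_01/Exam_01/2012_01_05" \
         "$root/data/study_002/FF_07/Exam_03/2012_01_09"

# Keep only the date-level directories (depth 4 below data), strip the
# leading "./", turn every "/" into a space, then sort and drop duplicates.
table=$(cd "$root/data" &&
  find . -mindepth 4 -maxdepth 4 -type d |
  sed -e 's|^\./||' -e 's|/| |g' |
  sort -u)

# Redirect the result into the text file and show it.
printf '%s\n' "$table" > "$root/out.txt"
cat "$root/out.txt"
```

Each output line then holds the four path components as space-separated columns; piping the result through "column -t" would align them if that tool is installed.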

You should give it a try and post the first results of your work :)

EDIT: Alright, I created a test structure for myself with several thousand directories and subdirectories, numbered by participant, by exam number, etc., which looks like this (it may not be identical to what you have, but don't worry). Studies are numbered from 5 to 150, FF from 45 to 75, and dates from 2012_01_00 to 2012_01_30, which makes a really huge number of directories in total.

/Users/pwadas/bzz/data
/Users/pwadas/bzz/data/study_005
/Users/pwadas/bzz/data/study_005/05_Num
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_00
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_01
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_02
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_03
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_04
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_05
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_06
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_07
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_08
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_09
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_10
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_11
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_12

Now, I want (quote) a "txt file with one line per participants and the following columns: study ID, FF_number, Exam_Number and date."

So I use the following one-liner:

find /Users/pwadas/bzz/data -type d | head -n 5000 |cut -d'/' -f5-7  | uniq |while read line; do echo -n "$line: " && ls -d /Users/pwadas/bzz/$line/*Exam/* | perl -0pe 's/.*2012/2012/g;s/\n/ /g' && echo ; done  > out.txt

and here is the output (the first few lines of out.txt). The lines are very long, so I truncated them to the first 90 characters:

dtpwmbp:data pwadas$ cat out.txt |cut -c1-90
data: 
data/study_005: 
data/study_005/05_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/06_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/07_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/08_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
dtpwmbp:data pwadas$ 

I hope this helps a little and that you'll be able to adapt it to your own needs and naming patterns; that seems to be all I can do :) You should analyze the one-liner, especially the "cut" command and the perl regex part, which removes the newlines and the full directory name from the "ls" output. This is probably far from optimal, but beautifying is not the point here, I guess :) So, good luck :) PS. The "head" command limits the output to the first N lines; you'll probably want to drop the "| head -n 5000 |" part.
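The one-liner does not cover the chunking mentioned in the question. One possible approach, assuming the final file has exactly one line per participant, is the standard "split" utility. The file contents and the chunk_ prefix below are made up for the sketch:

```shell
#!/bin/sh
work=$(mktemp -d)

# Stand-in for the real table: 176 lines, one per participant,
# with made-up column values.
i=1
while [ "$i" -le 176 ]; do
  printf 'study_%03d FF_01 Exam_01 2012_01_01\n' "$i"
  i=$((i + 1))
done > "$work/out.txt"

# 16 lines per file: 176 / 16 = 11 chunks of equal size,
# named chunk_aa, chunk_ab, and so on.
split -l 16 "$work/out.txt" "$work/chunk_"

# Count the chunk files that were written.
chunks=$(ls "$work"/chunk_* | wc -l | tr -d ' ')
echo "$chunks"
```

Any chunk size between 15 and 20 works the same way; with a size that does not divide 176 evenly, the last file is simply shorter than the others.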

answered 2012-09-19T17:43:11.557