python - Reading four CSV files in Python and printing rows based on a column ID

Question

Hello, I'm a Python beginner and still learning. I was wondering if someone could help me with a problem I'm facing. I have four files: routes.txt, trips.txt, stop_times.txt, and stops.txt. The files look like this (each file has thousands of rows):

routes.txt 
"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
"05","1","5",,,3,,"FFFF7C","000000"
"07","1","7",,,3,,"FFFF7C","000000"

trips.txt
"route_id","service_id","trip_id","trip_headsign","direction_id","block_id","shape_id"
"108","BUSN13-hbf13011-Weekday-02","19417636","Malden Station via Salem St.",1,"F411-75","1080037"
"94","BUSN13-hbf13011-Weekday-02","19417637","Medford Square via West Medford",0,"F94-5","940014"

stop_times.txt
"trip_id","arrival_time","departure_time","stop_id","stop_sequence","stop_headsign","pickup_type","drop_off_type"
"19417636","14:40:00","14:40:00","7412",1,,0,0
"19417636","14:41:00","14:41:00","6283",2,,0,0
"19417636","14:41:00","14:41:00","6284",3,,0,0

stops.txt
"stop_id","stop_code","stop_name","stop_desc","stop_lat","stop_lon","zone_id","stop_url","location_type","parent_station"
"place-alfcl","","Alewife Station","","42.395428","-71.142483","","",1,""
"place-alsgr","","Allston St. Station","","42.348701","-71.137955","","",1,""
"place-andrw","","Andrew Station","","42.330154","-71.057655","","",1,""

I'm trying to print rows based on a column ID. For example, when route_id = "01":

Check the ID in the routes.txt file and check whether that ID is equal to the route_id in the trips.txt file.

If they match,

take the trip_id from the trips.txt file and compare it with the trip_id in the stop_times.txt file.

If that matches,

check whether the stop_id is equal to the stop_id in the stops.txt file, and if so, print the row. Note that the stop_id can be either a number or a string.

What I'm trying to print is something like this:

route_id, trip_id, arrival_time, departure_time, stop_name
01, 19417636, 14:40:00, 14:40:00, Alewife Station

Thanks very much.
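For comparison with the answers below, the lookup chain described in the question (route_id, then trip_id, then stop_id, then stop_name) can be sketched with only the standard-library csv module. This is a minimal sketch, not the asker's code: the file contents are hypothetical miniatures held in strings, and the helper names `read_rows` and `rows_for_route` are illustrative. Reading every field as a string sidesteps the "number or string" stop_id issue, and building a set of trip_ids and a dict of stop names first avoids rescanning the files for every row.

```python
import csv
import io
import sys

# Hypothetical miniature inputs; with the real files you would pass
# open("routes.txt", newline="") and so on instead of io.StringIO(...).
ROUTES_TXT = '"route_id","route_short_name"\n"01","1"\n"04","4"\n'
TRIPS_TXT = '"route_id","trip_id"\n"01","19417636"\n"04","19417637"\n'
STOP_TIMES_TXT = (
    '"trip_id","arrival_time","departure_time","stop_id"\n'
    '"19417636","14:40:00","14:40:00","place-alfcl"\n'
    '"19417636","14:41:00","14:41:00","6283"\n'
    '"19417637","15:00:00","15:00:00","place-andrw"\n'
)
STOPS_TXT = (
    '"stop_id","stop_name"\n'
    '"place-alfcl","Alewife Station"\n'
    '"place-andrw","Andrew Station"\n'
)


def read_rows(text):
    # DictReader yields every field as a str, so "01" keeps its leading zero
    # and stop_ids such as "6283" and "place-alfcl" are compared the same way.
    return list(csv.DictReader(io.StringIO(text)))


def rows_for_route(route_id, routes, trips, stop_times, stops):
    # Step 1: the route must exist in routes.txt.
    if route_id not in {r["route_id"] for r in routes}:
        return []
    # Step 2: collect the trip_ids in trips.txt that belong to this route.
    trip_ids = {t["trip_id"] for t in trips if t["route_id"] == route_id}
    # Step 3: build a stop_id -> stop_name lookup from stops.txt.
    stop_names = {s["stop_id"]: s["stop_name"] for s in stops}
    result = []
    for st in stop_times:
        # Step 4: keep stop_times rows whose trip_id and stop_id both match.
        if st["trip_id"] in trip_ids and st["stop_id"] in stop_names:
            result.append((route_id, st["trip_id"], st["arrival_time"],
                           st["departure_time"], stop_names[st["stop_id"]]))
    return result


rows = rows_for_route("01", read_rows(ROUTES_TXT), read_rows(TRIPS_TXT),
                      read_rows(STOP_TIMES_TXT), read_rows(STOPS_TXT))
writer = csv.writer(sys.stdout)
writer.writerow(["route_id", "trip_id", "arrival_time", "departure_time", "stop_name"])
writer.writerows(rows)
```

In this sample the stop_times row with stop_id "6283" is skipped because that stop is absent from the miniature stops data, which mirrors the "only print if it matches" rule in the question.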

score 0 · Accepted Answer

What you're trying to do is called a join operation, and it can be done very easily with the pandas library.

import pandas as pd

routes = pd.read_csv('routes.txt')
trips = pd.read_csv('trips.txt')
stop_times = pd.read_csv('stop_times.txt')
stops = pd.read_csv('stops.txt')

You may need to change the read_csv options to interpret the data correctly (in particular, the leading zeros in the route_ids).
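One such option is `dtype`. A minimal sketch, assuming it is acceptable to read every column as text (the two-row routes excerpt below is hypothetical): by default pandas infers numeric types even for quoted fields, so "01" becomes the integer 1, while `dtype=str` keeps the original text. The same setting keeps stop_id as text in both stop_times.txt and stops.txt, so numeric-looking IDs like "7412" and IDs like "place-alfcl" end up with the same type on both sides of the merge.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt of routes.txt
ROUTES_TXT = '"route_id","agency_id","route_short_name"\n"01","1","1"\n"04","1","4"\n'

# Default type inference: the quoted "01" is parsed as the integer 1.
default = pd.read_csv(io.StringIO(ROUTES_TXT))

# dtype=str reads every column as text, so "01" stays "01".
# To limit this to key columns, pass e.g. dtype={"route_id": str} instead.
as_text = pd.read_csv(io.StringIO(ROUTES_TXT), dtype=str)

print(default["route_id"].tolist())  # [1, 4]
print(as_text["route_id"].tolist())  # ['01', '04']
```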

#   Please excuse the Dr. Seuss variable names
routes_trips = pd.merge(routes, trips, on=['route_id'])
routes_trips_stop_times = pd.merge(routes_trips, stop_times, on=['trip_id'])
routes_trips_stop_times_names = pd.merge(routes_trips_stop_times, stops, on=['stop_id'])

By default, pandas performs an inner join, so only rows whose route_ids, trip_ids, and stop_ids match will appear.
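Putting the pieces together for the example in the question (route_id "01"), here is a sketch under stated assumptions: the four files are replaced by hypothetical miniature strings, every column is read with `dtype=str`, and the final selection keeps only the five columns the asker wants. The stop_times row with stop_id "7412" has no match in the sample stops data, so the inner join drops it (and with it every row for route "04").

```python
import io

import pandas as pd

# Hypothetical miniature GTFS files (illustrative values, not the asker's data)
FILES = {
    "routes": '"route_id","route_short_name"\n"01","1"\n"04","4"\n',
    "trips": '"route_id","trip_id"\n"01","19417636"\n"04","19417637"\n',
    "stop_times": ('"trip_id","arrival_time","departure_time","stop_id"\n'
                   '"19417636","14:40:00","14:40:00","place-alfcl"\n'
                   '"19417636","14:41:00","14:41:00","place-andrw"\n'
                   '"19417637","15:00:00","15:00:00","7412"\n'),
    "stops": ('"stop_id","stop_name"\n'
              '"place-alfcl","Alewife Station"\n'
              '"place-andrw","Andrew Station"\n'),
}

# dtype=str keeps "01" intact and lets numeric and text stop_ids compare as strings
routes, trips, stop_times, stops = (
    pd.read_csv(io.StringIO(FILES[name]), dtype=str)
    for name in ("routes", "trips", "stop_times", "stops")
)

# Chained inner joins (how="inner" is the pandas default)
joined = (routes.merge(trips, on="route_id")
                .merge(stop_times, on="trip_id")
                .merge(stops, on="stop_id"))

wanted = ["route_id", "trip_id", "arrival_time", "departure_time", "stop_name"]
result = joined.loc[joined["route_id"] == "01", wanted]
print(result.to_csv(index=False), end="")
```

Passing `how="left"` to the last `merge` would instead keep unmatched stop_times rows, with `NaN` in `stop_name`.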

score 0 · Accepted Answer

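I think the easiest approach here is to import the data into a database and use SQL joins. You can use sqlite3, which is very simple. Depending on the amount of data and how often the script runs, even an in-memory database would work.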
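A minimal sketch of that approach with Python's built-in sqlite3 module and an in-memory database. Assumptions: the file contents are hypothetical miniatures, the helper `load_csv` is illustrative, and every column is stored as TEXT (so "01" keeps its leading zero and mixed stop_ids compare as strings). Because stop_sequence is then text, the query casts it to an integer for ordering.

```python
import csv
import io
import sqlite3

# Hypothetical miniature inputs; with real files, pass open("routes.txt", newline="")
DATA = {
    "routes": '"route_id","route_short_name"\n"01","1"\n',
    "trips": '"route_id","trip_id"\n"01","19417636"\n',
    "stop_times": ('"trip_id","arrival_time","departure_time","stop_id","stop_sequence"\n'
                   '"19417636","14:40:00","14:40:00","place-alfcl",1\n'
                   '"19417636","14:41:00","14:41:00","place-andrw",2\n'),
    "stops": ('"stop_id","stop_name"\n'
              '"place-alfcl","Alewife Station"\n'
              '"place-andrw","Andrew Station"\n'),
}


def load_csv(conn, table, fileobj):
    """Create `table` with one TEXT column per CSV header field, then insert the rows."""
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")  # in-memory database, discarded when closed
for table, text in DATA.items():
    load_csv(conn, table, io.StringIO(text))

QUERY = """
SELECT r.route_id, t.trip_id, st.arrival_time, st.departure_time, s.stop_name
FROM routes AS r
JOIN trips AS t ON t.route_id = r.route_id
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
WHERE r.route_id = ?
ORDER BY t.trip_id, CAST(st.stop_sequence AS INTEGER)
"""
rows = conn.execute(QUERY, ("01",)).fetchall()
for row in rows:
    print(", ".join(row))
```

Plain `JOIN` is an inner join, so as with the pandas version, rows without a matching trip_id or stop_id are left out.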

Be sure to create indexes on the foreign key fields; otherwise lookups can be slow.
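A sketch of what those indexes could look like for this join, assuming simplified table definitions (only the columns the query needs) and hypothetical index names. Each index covers a column that appears in a `JOIN ... ON` or `WHERE` condition; `EXPLAIN QUERY PLAN` prints how SQLite intends to run the query, so you can check whether it searches with an index rather than scanning a whole table. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id TEXT, route_short_name TEXT);
CREATE TABLE trips (route_id TEXT, trip_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, arrival_time TEXT, departure_time TEXT,
                         stop_id TEXT, stop_sequence INTEGER);
CREATE TABLE stops (stop_id TEXT, stop_name TEXT);

-- One index per column used in a JOIN ... ON or WHERE condition (names are hypothetical)
CREATE INDEX idx_routes_route_id ON routes(route_id);
CREATE INDEX idx_trips_route_id ON trips(route_id);
CREATE INDEX idx_stop_times_trip_id ON stop_times(trip_id);
CREATE INDEX idx_stops_stop_id ON stops(stop_id);
""")

# Ask SQLite how it would execute the join
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT s.stop_name
FROM routes AS r
JOIN trips AS t ON t.route_id = r.route_id
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
WHERE r.route_id = '01'
""").fetchall()
for step in plan:
    print(step[-1])  # the last column holds the human-readable detail text
```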

sqlite3 can also import data directly from CSV files: create the tables, then use the ".import" command (run sqlite3 and type .help, or see the documentation).

Sean
