First of all, I would like to point out that I am a python newbie and I am totally inexperienced at coding, so please be patient. I've already searched for an answer to my problem but with no success. I have a bunch of lines in text with names and teams in this format:
Team (year)|Surname1, Name1
e.g.
Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
and so on for several years and several teams. I would like to aggregate names of players according to team (year) combination deleting any duplicated names (it may happen that in the original database there is some redundant information). In the example, my output should be:
Yankees (1993)|Abbot, Jim, Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike, Canseco, Jose
I've written this code so far:
file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')
from collections import defaultdict
teams = defaultdict(set)
for line in file_in:
items = [line.split('|')]
team = items[0]
name = items[1]
teams[team].add(name)
I end up with a big dictionary made up by keys (the name of the team and the year) and sets of values. But I don't know exactly how to go on to aggregate things.
I would also be able to compare my final sets of values (e.g. how many players have Yankee's team of 1993 and 1994 in common?). How can I do this?
Any help is appreciated