I have a DataFrame, data, laid out like this:
Observation A_1 A_2 A_3 B_1 B_2 B_3
Obs1 yes no yes no no no
Obs2 no no no yes yes yes
Obs3 yes yes yes yes yes yes
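To make the example reproducible, the table above can be rebuilt with something like the following. This is only a sketch: it assumes that Observation is the index rather than a regular column, and the helper names columns and rows are made up for illustration.

```python
import pandas as pd

# Column labels follow the "<group>_<replicate>" pattern from the table above.
columns = ["A_1", "A_2", "A_3", "B_1", "B_2", "B_3"]
rows = {
    "Obs1": ["yes", "no", "yes", "no", "no", "no"],
    "Obs2": ["no", "no", "no", "yes", "yes", "yes"],
    "Obs3": ["yes", "yes", "yes", "yes", "yes", "yes"],
}

# Assumption: "Observation" is the index, not a data column.
data = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
data.index.name = "Observation"
print(data)
```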
The goal: calculate the frequency of all observations marked "yes" that are:
- only in "A" samples
- only in "B" samples
- in both groups
EDIT: This means that, for the first two counts, I need to exclude the observations that contain "yes" in both the A and B groups (those belong to the third count).
I thought about using groupby:
grouper = data.groupby(lambda x: x.split("_")[0], axis=1)
grouped = grouper.agg(lambda x: sum(x == "yes"))
But this gives me counts per row, which is not what I want.
What would be the best course of action here?
EDIT: As requested, more information on the output. I'd like something like
Frequency of valid [meaning "yes"] observations in group A: X
Frequency of valid observations in group B: Y
Frequency for all valid observations: Z
Where X, Y, and Z are the counts returned.
I don't need this output for each individual observation. I'm interested in the totals across all of them.
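To pin down the numbers I expect for the sample table, here is a rough sketch of the counting. It rests on one assumption that may not be the right reading: an observation counts for a group if it has at least one "yes" in that group's columns. The names is_yes, group, in_a, in_b and counts are just illustrative, and I don't know whether this is the idiomatic way to do it.

```python
import pandas as pd

# Sample data from the question, with the observation names as the index.
data = pd.DataFrame(
    [
        "yes no yes no no no".split(),
        "no no no yes yes yes".split(),
        "yes yes yes yes yes yes".split(),
    ],
    index=["Obs1", "Obs2", "Obs3"],
    columns=["A_1", "A_2", "A_3", "B_1", "B_2", "B_3"],
)

# Boolean frame: True wherever a cell is "yes".
is_yes = data == "yes"

# Group label is the part of each column name before the underscore.
group = is_yes.columns.str.split("_").str[0]

# Assumption: an observation belongs to a group if ANY of that
# group's columns is "yes" for it.
in_a = is_yes.loc[:, group == "A"].any(axis=1)
in_b = is_yes.loc[:, group == "B"].any(axis=1)

# Totals across all observations, not one value per row.
counts = {
    "A only": int((in_a & ~in_b).sum()),
    "B only": int((~in_a & in_b).sum()),
    "both": int((in_a & in_b).sum()),
}
print(counts)  # {'A only': 1, 'B only': 1, 'both': 1}
```

If membership should instead require every column of the group to be "yes", swapping any(axis=1) for all(axis=1) would be the corresponding change; for this sample, Obs1 would then no longer count as "A only".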