Drop rows if a column's values are all equal within a grouped subset

I have this example df:

import pandas as pd

info = {'name': ['Jason', 'Jason', 'Jason', 'Jason','Molly', 'Molly', 'Molly', 'Molly','Nicky', 'Nicky', 'Nicky', 'Nicky'],
'city': ['Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles'],
'Visits': [2,2,2,2,1,3,4,1,2,8,2,8]}
df = pd.DataFrame(data=info)
df

gives:

    name    city        Visits
0   Jason   Las Vegas     2
1   Jason   New York      2
2   Jason   Dallas        2
3   Jason   Los Angeles   2
4   Molly   Las Vegas     1
5   Molly   New York      3
6   Molly   Dallas        4
7   Molly   Los Angeles   1
8   Nicky   Las Vegas     2
9   Nicky   New York      8
10  Nicky   Dallas        2
11  Nicky   Los Angeles   8

I want to drop every row for a name if all of that name's values under Visits are equal, which is the case for df['name'] == 'Jason'. I tried drop_duplicates on the subset ['name', 'Visits'], but it also drops the duplicated values under the other names.

df.drop_duplicates(['name','Visits'], keep=False)

This gives:

    name    city     Visits
5   Molly   New York    3
6   Molly   Dallas      4

The output should be:

4   Molly   Las Vegas     1
5   Molly   New York      3
6   Molly   Dallas        4
7   Molly   Los Angeles   1
8   Nicky   Las Vegas     2
9   Nicky   New York      8
10  Nicky   Dallas        2
11  Nicky   Los Angeles   8

What is the best approach to achieve this?

Solution:

Use nunique to count the distinct Visits values per name, and keep only the rows whose group has more than one:

df = df[df.groupby('name')['Visits'].transform('nunique').ne(1)]

Or use groupby with filter, which keeps or drops each group as a whole:

df = df.groupby('name').filter(lambda x: x['Visits'].nunique() != 1)

Output:

>>> df
     name         city  Visits
4   Molly    Las Vegas       1
5   Molly     New York       3
6   Molly       Dallas       4
7   Molly  Los Angeles       1
8   Nicky    Las Vegas       2
9   Nicky     New York       8
10  Nicky       Dallas       2
11  Nicky  Los Angeles       8
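
To see why both one-liners give the same result, here is a minimal self-contained sketch that rebuilds the example frame and exposes the intermediate per-name counts. The variable names (distinct_per_name, mask, via_mask, via_filter) are only illustrative and do not come from the original answer.

```python
import pandas as pd

info = {
    'name': ['Jason'] * 4 + ['Molly'] * 4 + ['Nicky'] * 4,
    'city': ['Las Vegas', 'New York', 'Dallas', 'Los Angeles'] * 3,
    'Visits': [2, 2, 2, 2, 1, 3, 4, 1, 2, 8, 2, 8],
}
df = pd.DataFrame(data=info)

# Number of distinct Visits values for each name (one value per group).
distinct_per_name = df.groupby('name')['Visits'].nunique()

# transform broadcasts that per-group count back onto every row,
# so the result lines up with df and can be used as a boolean mask.
mask = df.groupby('name')['Visits'].transform('nunique').ne(1)
via_mask = df[mask]

# filter evaluates the condition once per group and keeps or drops
# the whole group at a time.
via_filter = df.groupby('name').filter(lambda g: g['Visits'].nunique() != 1)

print(distinct_per_name.to_dict())   # {'Jason': 1, 'Molly': 3, 'Nicky': 2}
print(via_mask.equals(via_filter))   # True
```

Only Jason has a single distinct value, so only Jason's rows are removed. Note that nunique ignores NaN by default, so a group whose Visits are one number plus missing values would also count as "all equal" here.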