GroupBy#

GroupBy operations follow the split-apply-combine pattern: split the data into groups based on one or more keys (such as column values or index levels), apply a function to each group independently, and combine the results into a single object.
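To make the three steps concrete, the sketch below performs them by hand and compares the outcome with the built-in call. It assumes the library is pandas; the names `pieces`, `sums`, `manual`, and `builtin` are illustrative only and do not come from the library.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})

# Split: one sub-frame per distinct value of 'A'
pieces = {key: df[df['A'] == key] for key in sorted(df['A'].unique())}

# Apply: reduce each piece independently
sums = {key: piece['B'].sum() for key, piece in pieces.items()}

# Combine: assemble the per-group results into one Series
manual = pd.Series(sums, name='B').rename_axis('A')

# The built-in groupby performs the same three steps in one call
builtin = df.groupby('A')['B'].sum()
```

Both `manual` and `builtin` hold one total per group. The hand-written version is only a teaching aid; `groupby` does not scan the frame once per key.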

Example#

import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [2.0, 4.0, 6.0, 8.0]
})

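# Group by a single column; this returns a GroupBy object, not a result
grouped = df.groupby('A')

# Aggregation: each call returns one row per group
grouped.sum()                  # column-wise sum within each group
grouped.mean()                 # column-wise mean within each group
grouped.agg(['sum', 'mean'])   # several aggregations at once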

Parameters#

Parameter	Type	Default	Description
by	label/list/mapping/function	None	Key(s) to group by; required unless level is given
axis	int	0	Split along rows (0) or columns (1)
level	int/str	None	Index level(s) to group by
as_index	bool	True	Return group labels as the index of aggregated output
sort	bool	True	Sort group keys
group_keys	bool	True	When calling apply, add group keys to the result index
observed	bool	False	For categorical keys, show only observed categories
dropna	bool	True	Drop rows whose group key is NA
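As a rough illustration of how some of these parameters change the result, the sketch below groups a small frame that contains one missing key. It assumes pandas; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', None, 'bar'],
    'B': [1, 2, 3, 4],
})

# Defaults: keys become the index, keys are sorted, the NA key is dropped
default = df.groupby('A')['B'].sum()

# as_index=False: keys stay in a regular column
flat = df.groupby('A', as_index=False)['B'].sum()

# sort=False: groups appear in order of first occurrence
unsorted = df.groupby('A', sort=False)['B'].sum()

# dropna=False: rows with a missing key form their own group
with_na = df.groupby('A', dropna=False)['B'].sum()
```

Here `default` and `unsorted` contain the same two groups in different order, `flat` is a DataFrame with columns `A` and `B`, and `with_na` has a third group for the missing key.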

Aggregation Methods#

Method	Description
count()	Count of non-NA values
sum()	Sum of values
mean()	Mean of values
median()	Median of values
std()	Sample standard deviation
var()	Sample variance
min()	Minimum
max()	Maximum
first()	First non-null value
last()	Last non-null value
prod()	Product
size()	Number of rows in each group (NA included)
sem()	Standard error of the mean
describe()	Descriptive statistics
nunique()	Number of distinct values
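The difference between `size()` and `count()` is easy to miss, so here is a small sketch with one missing value. It also shows named aggregation, where each keyword names an output column; the output names `b_min` and `b_max` are made up for this example. pandas is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1.0, float('nan'), 3.0, 4.0],
})

grouped = df.groupby('A')['B']

sizes = grouped.size()    # rows per group, missing values included
counts = grouped.count()  # non-missing values per group

# Named aggregation: output column name = (input column, aggregation)
summary = df.groupby('A').agg(
    b_min=('B', 'min'),
    b_max=('B', 'max'),
)
```

The `bar` group has two rows but only one non-missing value, so `sizes` and `counts` disagree for it.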

Transformation Methods#

Method	Description
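apply(func)	Apply function to each group and combine the results
transform(func)	Apply function per group; the result has the same index as the input
filter(func)	Keep only the groups for which the function returns True
agg(func)	Aggregate with one or more functions; one result row per group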
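The four methods differ mainly in the shape of what they return. The sketch below, assuming pandas and using illustrative variable names, runs each one on the same groups.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})

grouped = df.groupby('A')

# transform: one value per original row, aligned with df's index
centered = df['B'] - grouped['B'].transform('mean')

# filter: keep every row of the groups whose predicate is True
big = grouped.filter(lambda g: g['B'].sum() > 4)

# agg: one value per group
totals = grouped['B'].agg('sum')

# apply: arbitrary function per group; here it reduces to one value
spread = grouped['B'].apply(lambda s: s.max() - s.min())
```

`centered` has four entries like `df`, `big` keeps only the `bar` rows, and `totals` and `spread` have one entry per group.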

Iteration#

# Iterate over groups
for name, group in df.groupby('A'):
    print(name)
    print(group)
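When grouping by a single column, `name` is the group key and `group` is the sub-frame holding that group's rows. A minimal sketch, assuming pandas, that collects the row labels of each group during iteration and also fetches one group directly with `get_group`:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})

grouped = df.groupby('A')

# Collect the row labels of each group while iterating
rows_by_group = {name: list(group.index) for name, group in grouped}

# Fetch a single group without looping
foo = grouped.get_group('foo')
```

Each sub-frame keeps the original row labels, so `rows_by_group` maps every key back to positions in `df`.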