GroupBy#
GroupBy operations allow you to split data into groups based on some criteria, apply a function to each group independently, and combine the results.
Example#
import pandasCore as pd
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [2.0, 4.0, 6.0, 8.0]
})
# Group by single column
grouped = df.groupby('A')
# Aggregation
df.groupby('A').sum()
df.groupby('A').mean()
df.groupby('A').agg(['sum', 'mean'])
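To make the shape of these results concrete, here is a minimal sketch of what the three aggregation calls return. It imports the standard `pandas` package as an assumption, on the premise that `pandasCore` exposes the same interface; if it does, swapping the import should be enough.

```python
import pandas as pd  # assumption: pandasCore mirrors this API

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [2.0, 4.0, 6.0, 8.0],
})

# One row per group; the group keys ('bar', 'foo') become the index.
sums = df.groupby('A').sum()
print(sums)

# Same layout, with the per-group mean of each numeric column.
means = df.groupby('A').mean()
print(means)

# A list of functions yields two-level columns: (column, function).
both = df.groupby('A').agg(['sum', 'mean'])
print(both.columns.tolist())
# [('B', 'sum'), ('B', 'mean'), ('C', 'sum'), ('C', 'mean')]
```

Note that the group keys come back sorted (`bar` before `foo`), which follows from the default `sort=True`.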
Parameters#
| Parameter | Type | Default | Description |
|---|---|---|---|
| by | str/list | required | Column(s) to group by |
| axis | int | 0 | Split along rows (0) or columns (1) |
| level | int/str | None | Group by index level |
| as_index | bool | True | Return with group labels as index |
| sort | bool | True | Sort group keys |
| group_keys | bool | True | Add group keys to index |
| observed | bool | False | Only show observed values for categoricals |
| dropna | bool | True | Drop groups with NA values |
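The effect of `as_index`, `sort`, and `dropna` from the table above is easiest to see side by side. The following is a sketch under the assumption that `pandasCore` behaves like the standard `pandas` package, which is imported here in its place; the small frame with a missing key is made up for illustration.

```python
import pandas as pd  # assumption: pandasCore mirrors this API

# Illustrative data: one row has a missing group key.
df = pd.DataFrame({
    'A': ['foo', 'bar', None, 'bar'],
    'B': [1, 2, 3, 4],
})

# Defaults: keys sorted, keys used as the index, NA key dropped.
default = df.groupby('A').sum()
print(default.index.tolist())    # ['bar', 'foo']

# as_index=False keeps 'A' as a regular column.
flat = df.groupby('A', as_index=False).sum()
print(flat.columns.tolist())     # ['A', 'B']

# sort=False keeps groups in order of first appearance.
unsorted = df.groupby('A', sort=False).sum()
print(unsorted.index.tolist())   # ['foo', 'bar']

# dropna=False keeps the rows whose key is NA as their own group.
with_na = df.groupby('A', dropna=False).sum()
print(len(with_na))              # 3
```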
Aggregation Methods#
| Method | Description |
|---|---|
| count() | Count of values |
| sum() | Sum of values |
| mean() | Mean of values |
| median() | Median of values |
| std() | Standard deviation |
| var() | Variance |
| min() | Minimum |
| max() | Maximum |
| first() | First value |
| last() | Last value |
| prod() | Product |
| size() | Group sizes |
| sem() | Standard error of mean |
| describe() | Descriptive statistics |
| nunique() | Count unique values |
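Several of these methods look alike but differ in how they treat missing values. The sketch below contrasts `count()`, `size()`, and `nunique()`, and then combines two aggregations in one call using named aggregation. It assumes `pandasCore` follows the standard `pandas` behaviour, so `pandas` is imported directly; the output column names `total` and `largest` are illustrative.

```python
import pandas as pd  # assumption: pandasCore mirrors this API

# Illustrative data: the 'bar' group contains one missing value.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1.0, None, 3.0, 3.0],
})

g = df.groupby('A')['B']

counts = g.count()     # non-missing values per group: bar 1, foo 2
sizes = g.size()       # rows per group, missing included: bar 2, foo 2
uniques = g.nunique()  # distinct non-missing values: bar 1, foo 2

# Named aggregation: each keyword becomes an output column,
# defined as a (source column, function) pair.
summary = df.groupby('A').agg(
    total=('B', 'sum'),
    largest=('B', 'max'),
)
print(summary)
```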
Transformation Methods#
| Method | Description |
|---|---|
| apply(func) | Apply function to each group |
| transform(func) | Transform with function |
| filter(func) | Filter groups with function |
| agg(func) | Aggregate with function(s) |
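The methods above differ mainly in the shape of what they return. As a rough sketch, again assuming `pandasCore` matches the standard `pandas` package imported here: `transform` returns one value per input row, `filter` returns a subset of the original rows, and `apply` returns whatever the function produces per group.

```python
import pandas as pd  # assumption: pandasCore mirrors this API

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 6],
})
g = df.groupby('A')

# transform: result is aligned with the original rows.
group_mean = g['B'].transform('mean')
print(group_mean.tolist())   # [2.0, 4.0, 2.0, 4.0]

# filter: keep only groups for which the function returns True.
# 'foo' sums to 4 and is dropped; 'bar' sums to 8 and is kept.
big = g.filter(lambda grp: grp['B'].sum() > 4)
print(big.index.tolist())    # [1, 3]

# apply: arbitrary function per group; here one scalar per group.
spread = g['B'].apply(lambda s: s.max() - s.min())
print(spread.to_dict())      # {'bar': 4, 'foo': 2}
```

A common use of `transform` is to centre or normalise values within each group, for example `df['B'] - group_mean`.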
Iteration#
# Iterate over groups
for name, group in df.groupby('A'):
    print(name)
    print(group)
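Beyond printing, the same loop can collect per-group data, and a single group can be retrieved by key without iterating. This sketch assumes `pandasCore` offers the same iteration protocol and `get_group` method as the standard `pandas` package, which is imported here in its place.

```python
import pandas as pd  # assumption: pandasCore mirrors this API

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [2.0, 4.0, 6.0, 8.0],
})
grouped = df.groupby('A')

# Each iteration yields the group key and the sub-frame for that key.
pieces = {name: group['B'].tolist() for name, group in grouped}
print(pieces)                # {'bar': [2, 4], 'foo': [1, 3]}

# Fetch one group directly by its key.
foo_rows = grouped.get_group('foo')
print(foo_rows.index.tolist())   # [0, 2]
```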