GroupBy

GroupBy operations allow you to split data into groups based on some criteria, apply a function to each group independently, and combine the results.
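The three steps can be spelled out by hand to show what a grouped aggregation does. This is only an illustrative sketch: it assumes the API described on this page behaves like pandas (imported here as `pandas`), and the names `pieces`, `sums`, and `manual` are made up for the example.

```python
import pandas as pd  # assumption: the API on this page mirrors pandas

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})

# Split: one sub-frame per distinct value of 'A'
pieces = {key: df[df['A'] == key] for key in df['A'].unique()}

# Apply: reduce each piece independently
sums = {key: piece['B'].sum() for key, piece in pieces.items()}

# Combine: assemble the per-group results into a single object
manual = pd.Series(sums, name='B').sort_index()

# The same computation expressed with groupby
grouped = df.groupby('A')['B'].sum()
```

Both `manual` and `grouped` hold one sum per distinct value of `A`; the `groupby` form does the splitting and recombining for you.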

Example

import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [2.0, 4.0, 6.0, 8.0]
})

# Group by single column
grouped = df.groupby('A')

# Aggregation
df.groupby('A').sum()
df.groupby('A').mean()
df.groupby('A').agg(['sum', 'mean'])

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| by | str/list | required | Column(s) to group by |
| axis | int | 0 | Split along rows (0) or columns (1) |
| level | int/str | None | Index level (position or name) to group by |
| as_index | bool | True | Return the group labels as the index of the result |
| sort | bool | True | Sort group keys |
| group_keys | bool | True | Add group keys to the index |
| observed | bool | False | For categorical group keys, show only observed values |
| dropna | bool | True | Drop groups whose key is NA |
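A short sketch of how `as_index`, `sort`, and `dropna` change the result may help. It assumes pandas-compatible behavior (the code imports `pandas` directly); the frame and variable names are illustrative only.

```python
import pandas as pd  # assumption: the API on this page mirrors pandas

df = pd.DataFrame({
    'A': ['foo', 'bar', None, 'bar'],
    'B': [1, 2, 3, 4],
})

# as_index=True (default): group labels become the index of the result
by_index = df.groupby('A')['B'].sum()

# as_index=False: group labels stay as a regular column
as_column = df.groupby('A', as_index=False)['B'].sum()

# sort=False: groups appear in order of first appearance, not sorted
unsorted = df.groupby('A', sort=False)['B'].sum()

# dropna=False: the row with a missing key forms its own group
with_na = df.groupby('A', dropna=False)['B'].sum()
```

With the defaults, the row whose key is missing is left out, so `by_index` has two groups while `with_na` has three.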

Aggregation Methods

| Method | Description |
|---|---|
| count() | Count of values |
| sum() | Sum of values |
| mean() | Mean of values |
| median() | Median of values |
| std() | Standard deviation |
| var() | Variance |
| min() | Minimum |
| max() | Maximum |
| first() | First value |
| last() | Last value |
| prod() | Product |
| size() | Group sizes |
| sem() | Standard error of mean |
| describe() | Descriptive statistics |
| nunique() | Count unique values |
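`count()` and `size()` are easy to confuse, so a small comparison may be useful. The sketch assumes pandas semantics, where `count()` skips missing values and `size()` does not. The named-aggregation call at the end is a pandas feature and is shown on the same assumption; `c_mean` and `c_max` are arbitrary output names.

```python
import numpy as np
import pandas as pd  # assumption: the API on this page mirrors pandas

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [2.0, np.nan, 6.0, 8.0],
})
g = df.groupby('A')['C']

sizes = g.size()    # rows per group, missing values included
counts = g.count()  # non-missing values per group

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation) pair
summary = df.groupby('A').agg(c_mean=('C', 'mean'), c_max=('C', 'max'))
```

Here the `bar` group has two rows but only one non-missing value in `C`, so its `size()` and `count()` differ.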

Transformation Methods

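| Method | Description |
|---|---|
| apply(func) | Apply a function to each group and combine the results |
| transform(func) | Apply a function to each group and return a result aligned with the original rows |
| filter(func) | Keep only the groups for which the function returns True |
| agg(func) | Aggregate each group with one or more functions |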
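The difference between these methods is mostly in the shape of what they return. The following sketch assumes pandas-compatible behavior; the lambdas and variable names are illustrative.

```python
import pandas as pd  # assumption: the API on this page mirrors pandas

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})
g = df.groupby('A')

# transform: one output value per input row, aligned to the original index
group_mean = g['B'].transform('mean')

# filter: keep every row of the groups that pass the predicate
big_groups = g.filter(lambda grp: grp['B'].sum() > 4)

# apply: run an arbitrary function on each group and combine the results
value_range = g['B'].apply(lambda s: s.max() - s.min())
```

`group_mean` has as many entries as `df`, `big_groups` is a subset of the original rows, and `value_range` has one entry per group.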

Iteration

# Iterate over groups
for name, group in df.groupby('A'):
    print(name)
    print(group)
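Iteration yields `(name, group)` pairs, so the groups can be collected into ordinary Python structures. The sketch below assumes pandas-compatible behavior, including the `get_group` method for fetching a single group by key.

```python
import pandas as pd  # assumption: the API on this page mirrors pandas

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
})

# Collect the row count of each group while iterating
rows_per_group = {name: len(group) for name, group in df.groupby('A')}

# Fetch one group directly by its key, without looping
foo = df.groupby('A').get_group('foo')
```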