Xarray now supports grouping by multiple variables (docs). 🎉 😱 🤯 🥳. Try it out!
1. Install `xarray>=2024.08.0` and, optionally, `flox` for better performance with reductions.
2. Set up a multiple-variable groupby using Grouper objects.
```python
import numpy as np
import xarray as xr
from xarray.groupers import UniqueGrouper

da = xr.DataArray(
    np.array([1, 2, 3, 0, 2, np.nan]),
    dims="d",
    coords=dict(
        labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])),
        labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])),
    ),
)

# Group by every unique value of labels1 AND every unique value of labels2.
gb = da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
gb
```
```
<DataArrayGroupBy, grouped over 2 grouper(s), 9 groups in total:
    'labels1': 3 groups with labels 'a', 'b', 'c'
    'labels2': 3 groups with labels 'x', 'y', 'z'>
```
Reductions work as usual:
```python
gb.mean()
```
```
<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
array([[1. , nan, nan],
       [nan, 2. , nan],
       [nan, nan, 1.5]])
Coordinates:
  * labels1  (labels1) object 24B 'a' 'b' 'c'
  * labels2  (labels2) object 24B 'x' 'y' 'z'
```
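To make the semantics concrete, here is a minimal NumPy-only sketch of what the two-variable mean computes. It is an illustration under our own assumptions, not xarray's implementation (which can delegate to flox): each label array is factorized into integer codes, each pair of codes is flattened into one index over the grid of all label combinations, and `np.bincount` accumulates NaN-skipping sums and counts. Combinations that never occur get a count of zero, which is where the NaNs in the result come from. The helper `grouped_nanmean` is hypothetical.

```python
import numpy as np


def grouped_nanmean(values, labels1, labels2):
    """Hypothetical helper: NaN-skipping mean over every (labels1, labels2) pair."""
    # Factorize each label array into sorted unique labels plus integer codes.
    uniq1, codes1 = np.unique(labels1, return_inverse=True)
    uniq2, codes2 = np.unique(labels2, return_inverse=True)
    shape = (uniq1.size, uniq2.size)

    # Flatten each (code1, code2) pair into a single index into the label grid.
    flat = np.ravel_multi_index((codes1, codes2), shape)

    # Drop NaNs, mirroring xarray's default skipna=True for float data.
    valid = ~np.isnan(values)
    n = uniq1.size * uniq2.size
    sums = np.bincount(flat[valid], weights=values[valid], minlength=n)
    counts = np.bincount(flat[valid], minlength=n)

    # Label combinations with no members have count 0 and become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return uniq1, uniq2, means.reshape(shape)


values = np.array([1, 2, 3, 0, 2, np.nan])
labels1 = np.array(["a", "b", "c", "c", "b", "a"])
labels2 = np.array(["x", "y", "z", "z", "y", "x"])
uniq1, uniq2, means = grouped_nanmean(values, labels1, labels2)
print(means)  # same values as gb.mean() above
```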
So does `map`, which applies a function to every group and combines the results:
```python
# Take the first element of each (labels1, labels2) group.
gb.map(lambda x: x[0])
```
```
<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
array([[ 1., nan, nan],
       [nan,  2., nan],
       [nan, nan,  3.]])
Coordinates:
  * labels1  (labels1) object 24B 'a' 'b' 'c'
  * labels2  (labels2) object 24B 'x' 'y' 'z'
```
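`map` follows the split-apply-combine pattern: split the data by every combination of labels, apply the function to each piece, and lay the results out on the grid of labels. The plain-NumPy sketch below illustrates that pattern for the example above. `map_groups` is a hypothetical helper, not xarray's code; unlike xarray's more general `map`, it assumes the function returns a scalar and fills label combinations that never occur with NaN.

```python
import numpy as np


def map_groups(values, labels1, labels2, func):
    """Hypothetical helper: apply a scalar-valued `func` to each (labels1, labels2) group."""
    uniq1, codes1 = np.unique(labels1, return_inverse=True)
    uniq2, codes2 = np.unique(labels2, return_inverse=True)
    out = np.full((uniq1.size, uniq2.size), np.nan)
    for i in range(uniq1.size):
        for j in range(uniq2.size):
            # Split: select the elements carrying this pair of labels.
            members = values[(codes1 == i) & (codes2 == j)]
            if members.size:
                # Apply: the function sees only this group's values.
                out[i, j] = func(members)
    # Combine: one result per cell of the labels1 x labels2 grid.
    return uniq1, uniq2, out


values = np.array([1, 2, 3, 0, 2, np.nan])
labels1 = np.array(["a", "b", "c", "c", "b", "a"])
labels2 = np.array(["x", "y", "z", "z", "y", "x"])
_, _, first = map_groups(values, labels1, labels2, lambda x: x[0])
print(first)  # diagonal 1, 2, 3, like gb.map(lambda x: x[0]) above
```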
Combining different grouper types is also allowed: you can mix categorical grouping with `UniqueGrouper`, binning with `BinGrouper`, and resampling with `TimeResampler`.
```python
from xarray.groupers import BinGrouper

ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
# Bin "x" into (5, 15] and (15, 25]; group "letters" by unique value.
gb = ds.foo.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
gb
```

```
<DataArrayGroupBy, grouped over 2 grouper(s), 4 groups in total:
    'x_bins': 2 groups with labels (5,, 15], (15,, 25]
    'letters': 2 groups with labels 'a', 'b'>
```
```python
gb.mean()
```

```
<xarray.DataArray 'foo' (x_bins: 2, letters: 2, y: 3)> Size: 96B
array([[[ 0.,  1.,  2.],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 3.,  4.,  5.]]])
Coordinates:
  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
```
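The binned result can be reproduced the same way. In this sketch (our own illustration, not xarray's code), `np.digitize` with `right=True` assigns each `x` to a right-closed interval, matching the `(5, 15]` and `(15, 25]` labels above. Values beyond the last bin edge (`x=30` and `x=40`) fall in no bin and are dropped, so only the rows at `x=10` and `x=20` contribute. The helper `binned_unique_mean` is hypothetical.

```python
import numpy as np


def binned_unique_mean(data, x, bins, letters):
    """Hypothetical helper: mean of `data` rows grouped by (bin of x, letter)."""
    # With right=True, np.digitize returns i such that bins[i-1] < x <= bins[i],
    # i.e. right-closed intervals like (5, 15]; 0 and len(bins) mean "outside".
    idx = np.digitize(x, bins, right=True)
    in_range = (idx >= 1) & (idx <= len(bins) - 1)
    bin_codes = idx - 1

    uniq_letters, letter_codes = np.unique(letters, return_inverse=True)
    nbins = len(bins) - 1
    out = np.full((nbins, uniq_letters.size, data.shape[1]), np.nan)
    for b in range(nbins):
        for k in range(uniq_letters.size):
            rows = in_range & (bin_codes == b) & (letter_codes == k)
            if rows.any():
                # Reduce over the grouped dimension x, keeping y.
                out[b, k] = data[rows].mean(axis=0)
    return uniq_letters, out


foo = np.arange(12).reshape((4, 3))
x = np.array([10, 20, 30, 40])
letters = np.array(list("abba"))
uniq_letters, means = binned_unique_mean(foo, x, [5, 15, 25], letters)
print(means)  # shape (x_bins, letters, y), same values as gb.mean() above
```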