Xarray with snowtools

xarray is a well-known tool for dealing with NetCDF files in general. This page describes how snowtools interacts with xarray (for example, to allow compatibility with old SURFEX simulations).

Opening a netCDF file with xarray

The xarray_snowtools_backend module contains all elements defining the xarray entry points for SURFEX inputs/outputs in NetCDF format (the open_dataset, open_dataarray and open_mfdataset methods, responsible for reading files and returning an xarray DataArray or Dataset object) to be used within the snowtools package.

These entry points are defined in the SnowtoolsBackendEntrypoint class, which inherits from the xarray-native BackendEntrypoint class (https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html). WARNING: the xarray BackendEntrypoint class is not available in xarray versions before 0.18 (the default Meteo-France install).

These entry points are designed to meet the following requirements:

  • Ensure that the coordinates associated with the time dimension are datetime objects. By default, xarray tries to decode time variables as datetime objects based on their “units” attribute. However, some SURFEX variables, such as “SNOWAGE”, have a units attribute that cannot be interpreted by the default cftime tool, raising the following error:

    “ValueError: Failed to decode variable ‘SNOWAGE’: unable to decode time units ‘days since snowfall’ with ‘the default calendar’. Try opening your dataset with decode_times=False or installing cftime if it is not installed.”

    The solution here is to switch off the decoding of time variables when reading data (“decode_times=False”) and to decode the time dimension alone manually afterwards (see the “decode_time_dimension” method).

  • Ensure that variable and dimension names are standard names, for backward compatibility across SURFEX versions and to deal with data coming from different sources (this replaces the former prosimu functionality designed to reduce the dependency on the variable and dimension names of SURFEX outputs). This is done by mapping the possible names of a variable/dimension (when different names coexist) to a standard name through the “dimension_map” and “variables_map” dictionaries (see the “update_varname” and “update_dimname” methods).

  • Remove length-1 dimensions (this replaces the former prosimu functionality designed to remove the Number_of_patches / Number_of_tiles dimensions from SURFEX outputs). This is done by calling the native xarray “squeeze” method on the target DataArray / Dataset.

  • Ensure that the time dimension is the first one, for backward compatibility with numpy-based tools (this replaces the former prosimu functionality designed to reduce the dependency on the dimension order of SURFEX outputs). This is done by calling the native xarray “transpose” method.

  • Ensure that the “missing_value” and “_FillValue” attributes do not differ, to avoid crashes when writing data (see the “check_encoding” method for more information).
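The manual time decoding described in the first bullet can be sketched with plain xarray. This is a toy illustration, not snowtools' actual “decode_time_dimension” implementation; the variable names mimic a SURFEX output:

```python
import numpy as np
import xarray as xr

# Toy dataset mimicking a SURFEX output: a decodable "time" coordinate
# plus a "SNOWAGE" variable whose units attribute ("days since snowfall")
# would make the default time decoder fail.
ds = xr.Dataset(
    data_vars={"SNOWAGE": ("time", np.array([0.0, 1.0, 2.0]),
                           {"units": "days since snowfall"})},
    coords={"time": ("time", np.array([0, 1, 2]),
                     {"units": "hours since 2021-12-01 06:00:00"})},
)

# Step 1 (normally done at read time): decode_times=False, so everything is raw.
# Step 2: decode only the time coordinate, leaving SNOWAGE untouched.
decoded = xr.decode_cf(ds.drop_vars("SNOWAGE"))["time"]
ds = ds.assign_coords(time=decoded)
```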

To use these entry points, the native xarray methods open_dataset, open_dataarray and open_mfdataset should simply be called with the keyword argument engine='snowtools', unless you have an older xarray version (see below).

Meteo-France usage (until the next xarray update from version 0.16.0)

import xarray as xr
from snowtools.utils import xarray_snowtools

ds = xr.open_dataset('INPUT.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)

Usage with xarray >= 0.18

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset(filename, engine='snowtools')
ds = xr.open_mfdataset(list_of_files, engine='snowtools')

Use with statements!

We encourage you to use with statements whenever possible to ensure that the files are correctly closed:

from snowtools.utils import xarray_snowtools
import xarray as xr

with xr.open_dataset(filename, decode_times=False) as ds:
    ds = xarray_snowtools.preprocess(ds)
    ...  # Do what you want with the data inside of these files.
class utils.xarray_snowtools_backend.SnowtoolsBackendEntrypoint[source]

Bases: BackendEntrypoint

Xarray entry point to deal with snowtools-specific NetCDF files:
  • Ensure backward compatibility in case of a change of variable/dimension name
  • Map identical variables coming from different sources under different names
  • Deal with SURFEX-specific time dimension issues

This is done by applying a preprocess (see the “preprocess” method) to the data before returning the xarray dataset or dataarray object.

Usage:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset(filename, engine='snowtools')

External documentation :

https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html

guess_can_open()[source]

guess_can_open is used to automatically identify the proper engine to open your data file when the engine is not specified explicitly.

open_dataset(filename_or_obj, *, mapping={}, **kw)[source]

Snowtools-specific version of xarray’s “open_dataset” method, which calls the native method and preprocesses the data before returning the dataset object. If a list of paths is provided, the files are automatically opened with “open_mfdataset”. WARNING: the direct use of open_mfdataset is recommended whenever you are sure to be dealing with more than one file.

Parameters:

filename_or_obj (str, Path, file_like, DataStore or nested sequence of paths) – Path or list of paths of the file(s) to read

Mapping:

User-defined dictionary to map variable or dimension names. It can be used as a complement to the default mapping dictionary when code relies on variable/dimension names different from the standard ones (for example, code written after a SURFEX update). This is a snowtools-specific argument.
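To illustrate the idea behind such a name mapping, here is a rename-based sketch. The mapping entries below are invented for the example, not snowtools' actual dictionaries, and apply_mapping is a hypothetical helper:

```python
import numpy as np
import xarray as xr

# Hypothetical mapping from legacy names to standard ones.
mapping = {"time_counter": "time", "SNOWDEPTH": "DSN_T_ISBA"}

def apply_mapping(ds, mapping):
    """Rename only the variables/dimensions actually present in the dataset."""
    present = {old: new for old, new in mapping.items()
               if old in ds.variables or old in ds.dims}
    return ds.rename(present)

# A legacy-style dataset using the old names.
legacy = xr.Dataset({"SNOWDEPTH": ("time_counter", np.zeros(3))})
standard = apply_mapping(legacy, mapping)
```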

Functions (accessors) provided by snowtools adaptation of xarray

The xarray_snowtools_accessor module wraps and extends the xarray module for snowtools-specific usage. The wrapping of existing methods is designed to reduce the dependency on native xarray method changes (in order to centralise the required adaptations).

Following the xarray project’s recommendations, it is based on the use of accessors: https://tutorial.xarray.dev/advanced/accessors/01_accessor_examples.html

This accessor is automatically made available when you import snowtools.utils.xarray_snowtools.

Usage examples

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset('INPUT.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
  1. Select a subset of points from an S2M file in the massif geometry, based on massif number, elevation, slope and aspect:

ds.semidistributed.sel_points(massif_num=3, ZS=[900, 1800, 2700, 3600], slope=40)
  2. Project gridded data from Lambert-93 to lat/lon:

ds.distributed.proj(crs_in="EPSG:2154", crs_out="EPSG:4326")
  3. Interpret a time-like variable or dimension as datetime:

ds.surfex.decode_time_variable(varname)
  4. Compute 24-hour precipitation accumulations from 6h J to 6h J+1 from an hourly precipitation dataset:

ds.Precipitation.meteo.daily_accumulation()
  5. Example of groupby with the first-test PRO file:

from snowtools.utils import xarray_snowtools
import xarray as xr
import matplotlib.pyplot as plt
ds = xr.open_dataset('PRO_2010080106_2011080106.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
dszs = ds.semidistributed.sel_points(ZS=2400)
meanmonthgroup = dszs.groupby("time.month").mean()  # average the variables on a monthly basis
meanmonthgroup.TG1.plot() # choose one variable to plot
plt.show()
  6. Example of resample with the first-test PRO file:

from snowtools.utils import xarray_snowtools
import xarray as xr
ds = xr.open_dataset('PRO_2010080106_2011080106.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
dszs = ds.semidistributed.sel_points(ZS=2400)
dszs.resample(time='12h').mean() # time resampling to 12h timestep
  7. Use of the custom “daily_accumulation” method

Resample the Rainf variable from hourly values to daily accumulations, starting at 03:00:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds_hourly = xr.open_dataset('FORCING_test_2d.nc', decode_times=False)
ds_hourly = xarray_snowtools.preprocess(ds_hourly)
ds_daily = ds_hourly.Rainf.snowtools.daily_accumulation(start_hour=3)

Rules for integrating new features:

Any native xarray function/method NOT part of xarray’s public API can be overwritten in these accessors, in order to centralise the required adaptations in case of a change of behaviour of the native method.

Information on which xarray functions/methods are considered public API can be found in the xarray documentation.

class utils.xarray_snowtools_accessor.DistributedAccessor(xarray_obj)[source]

Bases: SnowtoolsAccessor

Additional methods in distributed geometry (e.g. EDELWEISS)

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset('INPUT.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
ds.distributed.proj("EPSG:4326", "EPSG:2154")
plot_ensemble(variable=None, vmin=None, vmax=None, cmap=None, dem=None, isolevels=None, members: str | int = 'all', projection=None)[source]

Plot field(s) from an ensemble. To control which members of the ensemble are plotted, use the “members” argument. The dataset must have a ‘member’ dimension and the spatial dimensions (‘xx’, ‘yy’). This implies that the selection of the time step must be done before calling the method.

Parameters:
  • variable (str) – Variable name to plot (Dataset only)

  • vmin (float) – Min colorbar value.

  • vmax (float) – Max colorbar value.

  • cmap (str) – Matplotlib colormap

  • dem (DataArray) – Digital Elevation Model covering the dataset area.

  • isolevels (list) – List of iso-levels to plot

  • members (str or int) – How to plot the ensemble data: ‘all’: Plot all ensemble members on the same figure; ‘mean’: Plot the mean ensemble field; int: Plot the given ensemble member only.

proj(crs_in='EPSG:4326', crs_out='EPSG:2154')[source]

Projection of an xarray Dataset or DataArray into a new CRS. This method implies a dependency on rioxarray.

Parameters:
  • ds (xarray Dataset or DataArray) – xarray object to project

  • crs_in (str) – CRS of the input object

  • crs_out (str) – CRS of the output object

class utils.xarray_snowtools_accessor.MeteoAccessor(xarray_obj)[source]

Bases: SnowtoolsAccessor

Accessor designed to deal with meteorological files.

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset('FORCING.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
ds.meteo.[...]
class utils.xarray_snowtools_accessor.SemiDistributedAccessor(xarray_obj)[source]

Bases: SnowtoolsAccessor

Additional methods in semi-distributed geometry (e.g. S2M simulations)

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset('INPUT.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
ds.semidistributed.sel_points(massif_num=3, ZS=[900, 1800, 2700, 3600], slope=40)
sel_points(massif_num=None, ZS=None, slope=None, aspect=None)[source]

Method used to select a user-defined list of points in semi-distributed geometry (SAFRAN massif geometry) from their elevation (ZS), massif number (massif_num), slope and aspect.

NB: For more advanced indexing (for example, to select all elevations above 1800 m or to use a slice as argument), use the native xarray “where” method directly.

Parameters:
  • massif_num (list, range or int) – Massif number(s) of the points to select

  • ZS (list, range or int) – Elevation(s) of the points to select

  • slope (list, range or int) – Slope(s) of the points to select

  • aspect (list, range or int) – Aspect(s) of the points to select
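The advanced selection mentioned in the note above can be written with plain xarray; for instance, keeping all points above 1800 m (toy data, with ‘ZS’ as a coordinate over an assumed ‘Number_of_points’ dimension):

```python
import numpy as np
import xarray as xr

# Toy semi-distributed dataset: one value per point, elevations as a coordinate.
ds = xr.Dataset(
    {"DSN_T_ISBA": ("Number_of_points", np.array([0.1, 0.5, 1.2, 2.0]))},
    coords={"ZS": ("Number_of_points", np.array([900, 1800, 2700, 3600]))},
)

# Keep all points strictly above 1800 m; drop=True removes the masked entries.
high = ds.where(ds["ZS"] > 1800, drop=True)
```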

class utils.xarray_snowtools_accessor.SnowtoolsAccessor(xarray_obj)[source]

Bases: object

Common snowtools-specific additional methods for xarray

backtrack_preprocess()[source]

Method to undo the changes made in the preprocess step if the dataset was opened with the “snowtools” backend. Initial variable/dimension names come from the ‘original_variable_name’, ‘original_dimension_name’ or ‘original_name’ attributes created during the preprocess step.

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr
ds = xr.open_dataset('old_PRO_20180807032000_002400.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
original = ds.snowtools.backtrack_preprocess()
daily_accumulation(start_hour=6)[source]

Compute the 24-hour accumulation starting at start_hour, with the output label set at the end of the accumulation period. This method returns the sum of all values between “start_hour” on day J and “start_hour” on day J+1, regardless of the frequency of the original dataset.

It is important to use dask when calling the resample function on huge datasets to optimise computation time. Since the operations are carried out along the time dimension in slices of 24 hours, chunk sizes are automatically set to 24 over the time dimension and to the dimension’s length over all other dimensions.

Execution times for a file over the Grandes Rousses at 25 m resolution (~1M points, total file size ~177 GB):
  • 1000 hourly time steps : < 30s

  • 2000 hourly time steps : ~ 2 minutes

  • 4000 hourly time steps (~6 months) : ~ 6 minutes

  • 1 full year (8760 hourly time steps): crash

Execution time for a file over the French Alps at 250 m resolution (~2.5M points, total file size 13 GB) over 1 month (721 hourly time steps): ~20 s

Usage example:

Compute 24-hour precipitation accumulations from 6h J to 6h J+1 from an hourly precipitation dataset, so that the new index ‘YYYY-MM-DD 06’ contains the sum of all indices between ‘YYYY-MM-{DD-1} 07’ and ‘YYYY-MM-DD 06’ (corresponding to precipitation accumulations between ‘YYYY-MM-{DD-1} 06’ and ‘YYYY-MM-DD 06’).

tmp = ds.snowtools.daily_accumulation()
out = tmp.sel(time='2021-12-05 06')

is equivalent to :

out = ds.sel({'time': slice('2021-12-04 07', '2021-12-05 06')}).sum('time')
Parameters:

start_hour (int) – Hour of the day from which the 24-hour accumulation is to be computed
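Assuming daily_accumulation behaves as described above, an equivalent computation can be sketched with native xarray resample (this is an illustration, not the accessor's actual implementation; the dates are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# 48 hourly values of 1.0, starting at 2021-12-04 07:00.
times = pd.date_range("2021-12-04 07:00", periods=48, freq="h")
da = xr.DataArray(np.ones(48), coords={"time": times}, dims="time")

# Sum over (D 06:00, D+1 06:00], labelled at the right edge: the accumulation
# from 07:00 on day D to 06:00 on day D+1 is labelled 'D+1 06:00'.
daily = da.resample(time="24h", offset="6h", closed="right", label="right").sum()
```

Each output value is the sum of 24 hourly steps, so every bin here equals 24.0.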

snow_cover_stats(snow_depth_variable='DSN_T_ISBA', snow_depth_threshold=0.2)[source]

Compute snow cover statistics from a snow depth time series. The returned statistics are:

  • LCSCD: Longest Concurrent Snow Cover Duration period

  • LCSMOD: Snow Melt Out Date of the longest concurrent snow cover period

  • LCSOD: Snow Cover Onset Date of the longest concurrent snow cover period

  • SD: Snow Duration: total number of days with snow coverage

This method calls the “lcscd” function defined in snowtools/tools/SnowCoverDuration.py, see the associated documentation for more information.

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr
ds = xr.open_dataset('PRO_WJF_2010-2016.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
stats = ds.snowtools.snow_cover_stats()
Parameters:
  • snow_depth_variable – Name of the variable containing the snow depth (default value for SURFEX outputs)

  • snow_depth_threshold – Snow depth threshold above which a given point/day is considered “snow covered”

squeeze()[source]

Drop dimensions of length 1 (e.g. Number_of_patches / Number_of_tiles)

transpose()[source]

Put the time dimension first, for data processing through numpy arrays
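The effect of these two helpers can be reproduced with native xarray calls (toy dimension names for illustration):

```python
import numpy as np
import xarray as xr

# Toy array with a length-1 tile dimension and time last.
da = xr.DataArray(np.zeros((5, 1, 3)), dims=("xx", "Number_of_tiles", "time"))

# Drop length-1 dimensions, then move time to the first position.
da = da.squeeze().transpose("time", ...)
```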

class utils.xarray_snowtools_accessor.SurfexAccessor(xarray_obj)[source]

Bases: SnowtoolsAccessor

Accessor designed to deal with SURFEX output files.

Usage example:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset('PRO.nc', decode_times=False)
ds = xarray_snowtools.preprocess(ds)
ds.surfex.decode_time_variable('time')
decode_time_variable(varname)[source]

Manually decode any time-like variable from a SURFEX output

Parameters:

varname (str) – Name of the variable to decode

drop_tile_dimension(tile=0)[source]

Select a single value of the “tile” dimension (or equivalent) and squeeze the dataset to drop the dimension.

Parameters:

tile (int) – Value of the “tile” dimension to select

Good practices

For example, to compute the maximum snow depth over time for one specific point (lon, lat), do:

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset(filename, decode_times=False)
ds = xarray_snowtools.preprocess(ds)
htn = ds.DSN_T_ISBA
subdata = htn.sel(xx=lon, yy=lat)
out = subdata.max('time')

instead of :

from snowtools.utils import xarray_snowtools
import xarray as xr

ds = xr.open_dataset(filename, decode_times=False)
ds = xarray_snowtools.preprocess(ds)
htn = ds.DSN_T_ISBA
maxhtn = htn.max('time')
out = maxhtn.sel(xx=lon, yy=lat)

In particular, avoid using the “where” method (which does not support lazy indexing) to select sub-data whenever possible.
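As a toy contrast (illustrative coordinates, not a real SURFEX grid), positional selection stays lazy while “where” evaluates a full boolean mask first:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12.0).reshape(3, 4), dims=("xx", "yy"),
    coords={"xx": [0, 1, 2], "yy": [10, 20, 30, 40]},
)

# Preferred: direct label-based selection (supports lazy indexing).
fast = da.sel(xx=1, yy=20)

# Avoid when possible: where() builds a masked copy of the whole array first.
slow = da.where((da["xx"] == 1) & (da["yy"] == 20), drop=True)
```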