evaltools package#

evaltools.evaluator module#

This module defines the classes managing the data.

class evaltools.evaluator.Evaluator(observations, simulations, color='k')#

Bases: object

Class gathering observations and simulations of the studied case.

An object of class Evaluator will be specific to one species, one series type, one period and one model. This class contains several methods to compare simulations and observations.
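
A minimal construction sketch (here obs and sim stand for Observations and Simulations instances built with the class methods documented further below, e.g. from_time_series or from_dataset; the color value is illustrative):

>>> from evaltools.evaluator import Evaluator
>>> # obs: evaltools.evaluator.Observations instance (see the Observations class below)
>>> # sim: evaltools.evaluator.Simulations instance (see the Simulations class below)
>>> ev = Evaluator(obs, sim, color='b')
>>> ev.summary()                  # print a summary of the object
>>> daily = ev.daily_mean()       # new Evaluator object working on daily means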

FDscores(score_list, forecast_days='all', threshold=0.75, output_file=None)#

Compute spatio-temporal scores.

Scores are computed using all data available for a given forecast day (values for all stations at all times are treated as a single 1D array).

Parameters:
  • score_list (list of str) – List of scores to compute.

  • forecast_days (‘all’ or list of int) – Forecast days used for computation. The returned DataFrame will contain one row per forecast day.

  • threshold (int or float) – Minimal number (if type(threshold) is int) or minimal rate (if type(threshold) is float) of data available in both obs and sim required to compute the scores.

  • output_file (str) – File where to save the result. If None, result is not saved in csv.

Returns:

pandas.DataFrame – DataFrame with one column per score and one row per forecast day.

FTscores(score_list, availability_ratio=0.75, output_file=None, coords=False)#

Compute forecast time scores for each station.

Only available for hourly time step data.

Parameters:
  • score_list (list of str) – List of scores to compute.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • output_file (str) – File where to save the result. If None, result is not saved in csv. The file name must contain {score} instead of the score name. A file is created for each score.

  • coords (bool) – If True, lat/lon coordinates are copied in the output.

Returns:

dictionary – Dictionary with one key per score, each value being a pandas.DataFrame with one row per station and one column per forecast time, except for the first two columns which contain latitude and longitude.

average_ft_scores(score_list, availability_ratio=0.75, min_nb_sta=10, output_file=None, score_type='temporal', averaging='median')#

Compute average scores for each forecast time.

Parameters:
  • score_list (list of str) – List of scores to compute.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the score if score_type is ‘temporal’, or to calculate the average of the score if score_type is ‘spatial’.

  • min_nb_sta (int) – Minimal number of stations required to compute the average of the score if score_type is ‘temporal’, or to compute the score itself if score_type is ‘spatial’.

  • output_file (str) – File where to save the result. If None, result is not saved in csv. The file name must contain {score} instead of the score name. A file is created for each score.

  • score_type (str) – Computing method selected from ‘temporal’ or ‘spatial’.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’.

Returns:

pandas.DataFrame – DataFrame with one row per forecast time and one column per score.

average_spatial_scores(score_list, min_nb_sta=10, averaging='median', availability_ratio=0.75)#

Compute average spatial scores for each forecast day.

Parameters:
  • score_list (list of str) – List of computed scores.

  • min_nb_sta (int) – Minimum required number of stations to compute the scores.

  • averaging (str) – Type of score averaging selected from ‘mean’ or ‘median’.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the average of the scores.

Returns:

pandas.DataFrame – DataFrame with one column per score and one row per forecast day (‘D0’, ‘D1’, …).

average_temporal_scores(score_list, availability_ratio=0.75, averaging='median', min_nb_sta=10)#

Compute average temporal scores for each forecast day.

Parameters:
  • score_list (list of str) – List of computed scores.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the scores.

  • averaging (str) – Type of score averaging selected from ‘mean’ or ‘median’.

  • min_nb_sta (int) – Minimum required number of stations to compute the average of the scores.

Returns:

pandas.DataFrame – DataFrame with one column per score and one row per forecast day (‘D0’, ‘D1’, …).
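
A usage sketch (assuming ev is an Evaluator instance; ‘RMSE’ is used as a score name elsewhere in this module, other score names depend on the scores implemented in evaltools):

>>> scores = ev.average_temporal_scores(
...     ['RMSE'], availability_ratio=0.75, averaging='median', min_nb_sta=10,
... )
>>> scores.loc['D0', 'RMSE']   # median over stations of the temporal RMSE for forecast day 0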

colocate_nan()#

Colocate missing values between observations and simulations.

conc_scores(score_list, conc_range, output_file=None, min_nb_val=10, based_on='obs', forecast_day=0)#

Compute scores for an interval of concentration values.

For each station, scores are computed keeping only times where the observed values (if based_on=’obs’) or the simulated values (if based_on=’sim’) fall within conc_range.

Parameters:
  • score_list (list of str) – List of computed scores.

  • conc_range (list of two scalars) – Interval of concentrations to keep to compute the scores.

  • output_file (str) – File where to save the result. The file name can contain {forecast_day} instead of the forecast day number. If None, result is not saved in csv.

  • min_nb_val (int) – Minimal number of (obs, sim) couples required for a score to be computed.

  • based_on (str) – If ‘sim’, the concentration interval is determined from simulation data. Otherwise (‘obs’), it is determined from observations.

  • forecast_day (int) – Integer corresponding to the chosen forecast day.

Returns:

  • pandas.DataFrame – Dataframe with one column per computed score.

  • scalar – Number of values kept to compute the scores.
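
A usage sketch (assuming ev holds hourly concentrations; the concentration interval and score name are illustrative):

>>> df, nb_values = ev.conc_scores(
...     ['RMSE'], conc_range=[60, 120], based_on='obs', forecast_day=0,
... )
>>> # df: one column per computed score; nb_values: number of (obs, sim) couples kept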

contingency_table(threshold, output_file)#

Contingency table.

The table is computed from daily data for each forecast day. Before computing the table, values for all stations and all days of the period are concatenated. Tables corresponding to the different forecast days are stored in the same file.

Parameters:
  • threshold (scalar) – Threshold value.

  • output_file (str) – File where to save the result.

dailyMax(availability_ratio=0.75)#

Return Evaluator object working on daily maximum.

This method computes the daily maximum of observations and simulations of the current object and returns a new Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maxima.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

dailyMean(availability_ratio=0.75)#

Build Evaluator object working on daily mean.

This method computes the daily mean of observations and simulations of the current object.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

daily_max(availability_ratio=0.75)#

Return Evaluator object working on daily maximum.

This method computes the daily maximum of observations and simulations of the current object and returns a new Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maxima.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

daily_mean(availability_ratio=0.75)#

Build Evaluator object working on daily mean.

This method computes the daily mean of observations and simulations of the current object.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

dump(output_file_path)#

Dump the evaluator.Evaluator object.

Save an evaluator.Evaluator object in binary format. Use evaltools.evaluator.load function to get the object back.

Parameters:

output_file_path (str) – Path of the output binary file.

property endDate#

Deprecated.

property end_date#

Get the ending date of the object.

fairmodeBenchmark(target_file=None, summary_file=None, output_csv=None, availability_ratio=0.75, label=None, target_title=None, summary_title=None, color=None, file_formats=['png'], forecast_day=0, mark_by=None, indicative_color=False, output_indicators=None)#

Plot FAIRMODE target and summary diagrams.

Concentration values must be in µg/m^3. Supported species are ‘o3’, ‘no2’, ‘pm10’ and ‘pm2p5’.

Parameters:
  • target_file (str or None) – File where to save the target diagram (without extension). If None, the figure is shown in a pop-up window.

  • summary_file (str or None) – File where to save the summary diagram (without extension). If None, the figure is shown in a pop-up window.

  • output_csv (str or None) – File where to save the target data.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • label (str) – Label for the legend.

  • target_title (str) – Target diagram title.

  • summary_title (str) – Summary diagram title.

  • color (str)

  • file_formats (list of str) – List of file extensions.

  • forecast_day (int) – Forecast day used to compute the two diagrams.

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of self.stations. It must be of length two. The first element is the label of the column used to define the markers. The second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • indicative_color (bool) – If True, legend labels in the target plot are green if MQI90 < 1 and Y90 < 1, and red otherwise.

  • output_indicators (str or None) – File where to save the mqi90 and MPCs.

fairmode_benchmark(target_file=None, summary_file=None, output_csv=None, availability_ratio=0.75, label=None, target_title=None, summary_title=None, color=None, file_formats=['png'], forecast_day=0, mark_by=None, indicative_color=False, output_indicators=None)#

Plot FAIRMODE target and summary diagrams.

Concentration values must be in µg/m^3. Supported species are ‘o3’, ‘no2’, ‘pm10’ and ‘pm2p5’.

Parameters:
  • target_file (str or None) – File where to save the target diagram (without extension). If None, the figure is shown in a pop-up window.

  • summary_file (str or None) – File where to save the summary diagram (without extension). If None, the figure is shown in a pop-up window.

  • output_csv (str or None) – File where to save the target data.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • label (str) – Label for the legend.

  • target_title (str) – Target diagram title.

  • summary_title (str) – Summary diagram title.

  • color (str)

  • file_formats (list of str) – List of file extensions.

  • forecast_day (int) – Forecast day used to compute the two diagrams.

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of self.stations. It must be of length two. The first element is the label of the column used to define the markers. The second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • indicative_color (bool) – If True, legend labels in the target plot are green if MQI90 < 1 and Y90 < 1, and red otherwise.

  • output_indicators (str or None) – File where to save the mqi90 and MPCs.
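
A usage sketch (file paths are hypothetical; the object is assumed to hold one of the supported species in µg/m^3, and self.stations is assumed to have an ‘area’ column for the mark_by option):

>>> ev.fairmode_benchmark(
...     target_file='./fairmode_target',     # saved with a .png extension by default
...     summary_file='./fairmode_summary',
...     label='MODEL1',
...     forecast_day=0,
...     mark_by=('area', {'urb': 's', 'rur': 'o', 'sub': '^'}),
... )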

filteredSeries(availability_ratio=0.75)#

Return Evaluator object working on the filtered series.

This method computes the filtered series of observations and simulations of the current object and returns a new Evaluator object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

filtered_series(availability_ratio=0.75)#

Return Evaluator object working on the filtered series.

This method computes the filtered series of observations and simulations of the current object and returns a new Evaluator object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

property forecastHorizon#

Deprecated.

property forecast_horizon#

Get the forecast horizon of the object.

property freq#

Get the time step of the object as a datetime.timedelta.

get_obs(forecast_day, start_end=None)#

Get observations according to the forecast day.

Parameters:
  • forecast_day (int) – Forecast day for which to get observations.

  • start_end (None or list of two datetime.date objects) – Dates between which to get the data.

Returns:

pandas.DataFrame – Observations for the time period corresponding to the forecast day.

get_sim(forecast_day, start_end=None)#

Get simulations according to the forecast day.

Parameters:
  • forecast_day (int) – Forecast day for which to get simulations.

  • start_end (None or list of two datetime.date objects) – Dates between which to get the data.

Returns:

pandas.DataFrame – Simulations corresponding to the forecast day.
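
A usage sketch for get_obs and get_sim (dates are illustrative):

>>> import datetime
>>> start_end = [datetime.date(2024, 1, 1), datetime.date(2024, 1, 31)]
>>> obs_d1 = ev.get_obs(forecast_day=1, start_end=start_end)
>>> sim_d1 = ev.get_sim(forecast_day=1, start_end=start_end)
>>> # both are pandas.DataFrame objects covering the same period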

meanTimeScores(score_list, output_file=None, min_nb_sta=10, availability_ratio=0.75)#

Compute the mean of time scores for each forecast time.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str) – File where to save the result. The file name must contain {forecast_day} instead of the forecast day number. If None, result is not saved in csv.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores (before applying mean for each forecast time).

  • availability_ratio (float) – Minimal rate of data (computed scores per time) available on the period required per forecast time to compute the mean scores.

Returns:

dictionary – Dictionary, with one key per score, that contains lists of the means for each forecast time. For example if the forecast horizon is 4, the lists will be of length 96.

mean_time_scores(score_list, output_file=None, min_nb_sta=10, availability_ratio=0.75)#

Compute the mean of time scores for each forecast time.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str) – File where to save the result. The file name must contain {forecast_day} instead of the forecast day number. If None, result is not saved in csv.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores (before applying mean for each forecast time).

  • availability_ratio (float) – Minimal rate of data (computed scores per time) available on the period required per forecast time to compute the mean scores.

Returns:

dictionary – Dictionary, with one key per score, that contains lists of the means for each forecast time. For example if the forecast horizon is 4, the lists will be of length 96.

medianStationScores(score_list, availability_ratio=0.75, min_nb_sta=10, output_file=None)#

Compute median station scores for each forecast time.

Parameters:
  • score_list (list of str) – List of scores to compute.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required to compute the median.

  • output_file (str) – File where to save the result. If None, result is not saved in csv. The file name must contain {score} instead of the score name. A file is created for each score.

Returns:

pandas.DataFrame – DataFrame with one row per forecast time and one column per score.

median_station_scores(score_list, availability_ratio=0.75, min_nb_sta=10, output_file=None)#

Compute median station scores for each forecast time.

Parameters:
  • score_list (list of str) – List of scores to compute.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required to compute the median.

  • output_file (str) – File where to save the result. If None, result is not saved in csv. The file name must contain {score} instead of the score name. A file is created for each score.

Returns:

pandas.DataFrame – DataFrame with one row per forecast time and one column per score.

property model#

Get the model name of the object.

movingAverageDailyMax(availability_ratio=0.75)#

Compute the daily maximum of the moving average.

This method computes the daily maximum of the moving average for observations and simulations of the current object and returns a new Evaluator object.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, a daily maximum of the 8-hour average can only be calculated if at least 18 8-hour averages are available, each of which requires at least 6 hourly values to be available.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.

moving_average_daily_max(availability_ratio=0.75)#

Compute the daily maximum of the moving average.

This method computes the daily maximum of the moving average for observations and simulations of the current object and returns a new Evaluator object.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, a daily maximum of the 8-hour average can only be calculated if at least 18 8-hour averages are available, each of which requires at least 6 hourly values to be available.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.
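
A usage sketch for ozone: build a new Evaluator working on the daily maximum of the 8-hour moving average, then compute daily-scale scores (the score name is illustrative):

>>> mda8 = ev.moving_average_daily_max(availability_ratio=0.75)
>>> mda8.series_type
'daily'
>>> daily_scores = mda8.temporal_scores(['RMSE'])   # dict with one key per forecast day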

mqi(threshold=0.75, forecast_day=0)#

Calculate the modelling quality indicator.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

pandas.Series – Series with index corresponding to the object stations and containing the modelling quality indicator for each station.

mqi90(threshold=0.75, forecast_day=0)#

Calculate the 90th percentile of modelling quality indicator values.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

float – 90th percentile of the modelling quality indicator.
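
A usage sketch for mqi and mqi90 (the FAIRMODE coefficients for the object’s species can be set with set_fairmode_params, documented below):

>>> per_station_mqi = ev.mqi(threshold=0.75, forecast_day=0)   # pandas.Series, one value per station
>>> overall_mqi90 = ev.mqi90(threshold=0.75, forecast_day=0)   # float, 90th percentile of the MQIs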

normalized_series()#

Return Evaluator object with normalized series.

This method normalizes each series of observations and simulations by subtracting its median and dividing by its interquartile range.

Returns:

evaluator.Evaluator – Evaluator object with series_type = ‘hourly’ and with data corresponding to the computed normalized series.

property obsDF#

Deprecated.

property obs_df#

Get the observation data of the object.

obs_exceedances(threshold, output_file)#

Look for exceedances in observed time series.

Parameters:
  • threshold (scalar) – Threshold value.

  • output_file (str) – File where to save the result.

quarterlyMedianScore(file_path, score='RMSE', forecast_day=0, score_type='temporal', averaging='median', availability_ratio=0.75, min_nb_sta=10)#

Calculate an average score.

The period of the object must correspond to a valid quarter.

Parameters:
  • file_path (str) – File where to find previous values and save the result.

  • score (str) – The score to process.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for computation.

  • score_type (str) – Computing method selected from ‘temporal’, ‘spatial’ or ‘spatiotemporal’.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’. This parameter is ignored if score_type is ‘spatiotemporal’.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the score if score_type is ‘temporal’, or to calculate the average of the score if score_type is ‘spatial’. This parameter is ignored if score_type is ‘spatiotemporal’.

  • min_nb_sta (int) – Minimal number of values required to compute the average of the score if score_type is ‘temporal’, or to compute the score itself if score_type is ‘spatial’ or ‘spatiotemporal’.

Returns:

The average of the computed score.

quarterly_median_score(file_path, score='RMSE', forecast_day=0, availability_ratio=0.75, min_nb_sta=1)#

Compute median on station scores.

Parameters:
  • file_path (str) – File where to find previous values and save the result.

  • score (str) – The score to process.

  • forecast_day (int) – Integer corresponding to the chosen forecast_day used for computation.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores.

  • min_nb_sta (int) – Minimal number of station values available required to compute the median.

Returns:

The median for the computed score.

quarterly_score(file_path, score='RMSE', forecast_day=0, score_type='temporal', averaging='median', availability_ratio=0.75, min_nb_sta=10)#

Calculate an average score.

The period of the object must correspond to a valid quarter.

Parameters:
  • file_path (str) – File where to find previous values and save the result.

  • score (str) – The score to process.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for computation.

  • score_type (str) – Computing method selected from ‘temporal’, ‘spatial’ or ‘spatiotemporal’.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’. This parameter is ignored if score_type is ‘spatiotemporal’.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the score if score_type is ‘temporal’, or to calculate the average of the score if score_type is ‘spatial’. This parameter is ignored if score_type is ‘spatiotemporal’.

  • min_nb_sta (int) – Minimal number of values required to compute the average of the score if score_type is ‘temporal’, or to compute the score itself if score_type is ‘spatial’ or ‘spatiotemporal’.

Returns:

The average of the computed score.

remove_negvalues(rpl_value='nan')#

Replace negative values.

Replace negative values in observation and simulation data with another value.

Parameters:

rpl_value (scalar or ‘nan’) – Replacement value for negative values.

replace_value(needle='nan', replace_value=-999)#

Replace a chosen value.

Replace a chosen value in observation and simulation data with another value.

Parameters:
  • needle (scalar or ‘nan’) – Value to be replaced.

  • replace_value (scalar or ‘nan’) – Replacement value for needle.

rmsu(threshold=0.75, forecast_day=0)#

Calculate the root mean square of measurement uncertainty.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

pandas.Series – Series with index corresponding to the object stations and containing root mean square of measurement uncertainty for each station.

selectCountries(countries, inplace=True)#

Only keep stations within certain countries.

This method assumes that the first characters of a station code refer to the station country.

Parameters:
  • countries (1D array of str.) – List of first letters of station codes to keep (e.g. [‘FRA’, ‘IT’, ‘AUT’, ‘ES’]).

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

select_countries(countries, inplace=True)#

Only keep stations within certain countries.

This method assumes that the first characters of a station code refer to the station country.

Parameters:
  • countries (1D array of str.) – List of first letters of station codes to keep (e.g. [‘FRA’, ‘IT’, ‘AUT’, ‘ES’]).

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

property seriesType#

Deprecated.

property series_type#

Get the series type of the object.

set_fairmode_params(availability_ratio=0.75)#

Set FAIRMODE coefficients used to calculate the measurement uncertainty.

The coefficients are:

  • threshold (scalar) – Limit concentration value fixed by air quality policies.

  • U (scalar) – $U^{95}_{95,r}$ as defined by FAIRMODE for measurement uncertainty calculation.

  • alpha (scalar) – $\alpha$ as defined by FAIRMODE for measurement uncertainty calculation.

  • RV (scalar) – Reference value as defined by FAIRMODE for measurement uncertainty calculation.

  • perc (scalar) – Selected percentile value used in the calculation of FAIRMODE’s modelling performance criteria for high percentiles.

  • Np, Nnp (scalars) – Coefficients used in FAIRMODE’s observation uncertainty calculation for annual averages.

Parameters:

availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

property simDF#

Deprecated.

property sim_df#

Get the simulation data of the object.

sim_exceedances(threshold, output_file)#

Look for exceedances in simulated time series.

Parameters:
  • threshold (scalar) – Threshold value.

  • output_file (str) – File where to save the result. The file name must contain {forecast_day} instead of the forecast day number.

spatial_scores(score_list, output_file=None, min_nb_sta=10)#

Compute spatial scores per time step.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str.) – File where to save the result. The file name must contain {forecast_day} instead of the forecast day number. If None, result is not saved in csv.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores.

Returns:

dictionary – Dictionary with one key per forecast day (‘D0’, ‘D1’, …) containing pandas.DataFrames with datetime index and one column per computed score.
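
A usage sketch (the score name is illustrative):

>>> sp = ev.spatial_scores(['RMSE'], min_nb_sta=10)
>>> sp['D0']    # DataFrame with a datetime index and one column per computed score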

spatiotemporal_scores(score_list, forecast_days='all', threshold=0.75, output_file=None)#

Compute spatio-temporal scores.

Scores are computed using all data available for a given forecast day (values for all stations at all times are treated as a single 1D array).

Parameters:
  • score_list (list of str) – List of scores to compute.

  • forecast_days (‘all’ or list of int) – Forecast days used for computation. The returned DataFrame will contain one row per forecast day.

  • threshold (int or float) – Minimal number (if type(threshold) is int) or minimal rate (if type(threshold) is float) of data available in both obs and sim required to compute the scores.

  • output_file (str) – File where to save the result. If None, result is not saved in csv.

Returns:

pandas.DataFrame – DataFrame with one column per score and one row per forecast day.
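
A usage sketch (the score name and forecast days are illustrative):

>>> st = ev.spatiotemporal_scores(['RMSE'], forecast_days=[0, 1], threshold=0.75)
>>> st    # one row per forecast day, one column per score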

property species#

Get the species of the object.

property startDate#

Deprecated.

property start_date#

Get the starting date of the object.

stationScores(score_list, output_file=None, availability_ratio=0.75)#

Compute temporal scores per station for each forecast day.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str) – File where to save the result. The file name must contain {forecastDay} instead of the forecast day number. If None, result is not saved in csv.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the scores.

Returns:

dictionary – Dictionary with one key per forecast day (‘D0’, ‘D1’, …) containing pandas.DataFrames with station names as index and one column per computed score.

stationSubList(station_list, inplace=True)#

Drop stations not contained in the given list.

Parameters:
  • station_list (1D array of str.) – List of stations to keep.

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

station_sub_list(station_list, inplace=True)#

Drop stations not contained in the given list.

Parameters:
  • station_list (1D array of str.) – List of stations to keep.

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

property stations#

Get the station list of the object.

property step#

Get the time step of the object.

subArea(min_lon, max_lon, min_lat, max_lat, inplace=True)#

Drop stations not contained within the given lat/lon boundaries.

Parameters:
  • min_lon, max_lon, min_lat, max_lat (scalars) – Lat/lon boundaries.

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

subPeriod(start_date, end_date)#

Build a new Evaluator object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Evaluator object.

sub_area(min_lon, max_lon, min_lat, max_lat, inplace=True)#

Drop stations not contained within the given lat/lon boundaries.

Parameters:
  • min_lon, max_lon, min_lat, max_lat (scalars) – Lat/lon boundaries.

  • inplace (bool) – If True, the Evaluator object is modified inplace, else, a new object is returned.

sub_period(start_date, end_date)#

Build a new Evaluator object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Evaluator object.
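
A usage sketch combining the subsetting methods documented above (country prefixes, coordinates and dates are illustrative):

>>> import datetime
>>> ev_fr = ev.select_countries(['FR'], inplace=False)         # keep stations whose codes start with 'FR'
>>> ev_box = ev.sub_area(-5., 10., 41., 52., inplace=False)    # keep stations inside a lat/lon box
>>> ev_jan = ev.sub_period(datetime.date(2024, 1, 1), datetime.date(2024, 1, 31))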

summary()#

Print a summary of the object.

temporal_ft_scores(score_list, availability_ratio=0.75, output_file=None, coords=False)#

Compute forecast time scores for each station.

Only available for hourly time step data.

Parameters:
  • score_list (list of str) – List of scores to compute.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • output_file (str) – File where to save the result. If None, result is not saved in csv. The file name must contain {score} instead of the score name. A file is created for each score.

  • coords (bool) – If True, lat/lon coordinates are copied in the output.

Returns:

dictionary – Dictionary with one key per score, each value being a pandas.DataFrame with one row per station and one column per forecast time, except for the first two columns which contain latitude and longitude.

temporal_scores(score_list, output_file=None, availability_ratio=0.75)#

Compute temporal scores per station for each forecast day.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str) – File where to save the result. The file name must contain {forecastDay} instead of the forecast day number. If None, result is not saved in csv.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the scores.

Returns:

dictionary – Dictionary with one key per forecast day (‘D0’, ‘D1’, …) containing pandas.DataFrames with station names as index and one column per computed score.
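
A usage sketch (the score name and output path are illustrative; the path must contain the {forecastDay} placeholder):

>>> ts = ev.temporal_scores(
...     ['RMSE'],
...     output_file='./station_scores_D{forecastDay}.csv',
...     availability_ratio=0.75,
... )
>>> ts['D0']    # DataFrame indexed by station names, one column per score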

timeScores(score_list, output_file=None, min_nb_sta=10)#

Compute spatial scores per time step.

Parameters:
  • score_list (list of str) – List of computed scores.

  • output_file (str.) – File where to save the result. The file name must contain {forecast_day} instead of the forecast day number. If None, result is not saved in csv.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores.

Returns:

dictionary – Dictionary with one key per forecast day (‘D0’, ‘D1’, …) containing pandas.DataFrames with datetime index and one column per computed score.

time_series_to_csv(obs_output_file, sim_output_file)#

Save timeseries dataframes as csv.

Parameters:
  • obs_output_file (str) – File path where to save observed timeseries.

  • sim_output_file (str) – File path where to save simulated timeseries. The path name must contain {forecast_day} instead of the forecast day number.

y90(availability_ratio=0.75, forecast_day=0)#

Calculate the 90th percentile of MQIs for the average of model values.

The period over which to average the data should preferably be one year.

Parameters:
  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

float – 90th percentile of the modelling quality indicator for yearly average model results.

class evaltools.evaluator.Observations(species, start_date, end_date, stations, series_type, forecast_horizon=1, step=1, path='')#

Bases: object

Class gathering observations of the studied case.

An object of class Observations will be specific to one species, one series type and one period.

check_values(threshold, drop=False, file_path=None)#

Check if observation values exceed a threshold.

If there are values above the threshold, a message is printed and these values are set to nan if drop == True.

Parameters:
  • threshold (scalar) – Threshold value.

  • drop (bool) – If True, values above the threshold are set to nan.

  • file_path (None or str) – File path where to save the names of stations that exceed the threshold.

dailyMax(availability_ratio=0.75)#

Return Observations object working on daily maximum.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

dailyMean(availability_ratio=0.75)#

Build Observations object working on daily mean.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

daily_max(availability_ratio=0.75)#

Return Observations object working on daily maximum.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

daily_mean(availability_ratio=0.75)#

Build Observations object working on daily mean.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

drop_unrepresentative_stations(availability_ratio=0.75)#

Drop stations with a certain rate of missing values.

Modify Dataset object in place.

Parameters:

availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

property endDate#

Deprecated.

property end_date#

Get the ending date of the object.

filteredSeries(availability_ratio=0.75)#

Return Observations object working on the filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Observations – Observations object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

filtered_series(availability_ratio=0.75)#

Return Observations object working on the filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Observations – Observations object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

property forecastHorizon#

Deprecated.

property freq#

Get the time step of the object as a datetime.timedelta.

fromDataset(forecast_horizon=1, correc_unit=1, listing_path=None, step=1, **kwargs)#

Initialize from an evaltools.dataset.Dataset object.

Parameters:
  • ds (evaltools.dataset.Dataset object) – Dataset object containing the observation data.

  • forecast_horizon (int) – Number of forecasted days.

  • correc_unit (float) – Factor to apply to original values.

  • listing_path (str) – Path of the station listing where to retrieve metadata variables. This listing is optional and only used to get metadata.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

Returns:

evaltools.evaluator.Observations object

fromTimeSeries(species, start, end, stations, correc_unit=1, series_type='hourly', forecast_horizon=1, availability_ratio=False, step=1)#

Class method used to construct an object from timeseries files.

Parameters:
  • generic_file_path (str) – Generic path of time series files with {year} instead of the year number and {station} instead of the station name.

  • species (str) – Species name (ex: “o3”).

  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • correc_unit (float) – multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model. For example, if the forecast horizon is 4 days, the end date of the studied observations has to be 3 days later than the end of the studied period, since the studied period corresponds to the period over which the model was run.

  • availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

evaltools.evaluator.Observations object

classmethod from_dataset(ds, forecast_horizon=1, correc_unit=1, listing_path=None, step=1, **kwargs)#

Initialize from an evaltools.dataset.Dataset object.

Parameters:
  • ds (evaltools.dataset.Dataset object) – Dataset object containing the observation data.

  • forecast_horizon (int) – Number of forecasted days.

  • correc_unit (float) – Factor to apply to original values.

  • listing_path (str) – Path of the station listing where to retrieve metadata variables. This listing is optional and only used to get metadata.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

Returns:

evaltools.evaluator.Observations object

classmethod from_nc(start, end, paths, species, series_type='hourly', forecast_horizon=1, group=None, dim_names={}, coord_var_names={}, listing_path=None, metadata_var={}, correc_unit=1, step=1, **kwargs)#

Construct an object from netcdf files.

To be handled by this method, netcdf variables must be 2-dimensional: the first dimension corresponds to time and the second one to the different measurement sites.

Parameters:
  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • paths (list of str) – List of netcdf files where to retrieve concentration values.

  • species (str) – Species name that must correspond to the name of the retrieved netcdf variable.

  • series_type (str) – It can be ‘hourly’ (values stored with an hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • forecast_horizon (int) – Number of forecasted days.

  • group (None or str) – Group to read within the netcdf file. If equal to None, the root group is read.

  • dim_names (dict) – Used to specify dimension names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • coord_var_names (dict) – Used to specify coordinate variable names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • listing_path (str) – Path of the station listing where to retrieve metadata variables. This listing is optional and only used to get metadata.

  • metadata_var (dict) – Dictionary that defines the metadata variables to get from the netcdf file. Keys of the provided dictionary are variable names as found in the file, and its values are the variable names used in the returned dataset. These metadata variables must have one dimension only, corresponding to the station codes.

  • correc_unit (float) – Factor to apply to original values.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

Returns:

evaltools.evaluator.Observations object
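
A usage sketch (the file paths, dimension names and forecast horizon are hypothetical and must match the actual netcdf files):

>>> import datetime
>>> from evaltools.evaluator import Observations
>>> obs = Observations.from_nc(
...     start=datetime.date(2024, 1, 1),
...     end=datetime.date(2024, 3, 31),
...     paths=['obs_202401.nc', 'obs_202402.nc', 'obs_202403.nc'],
...     species='o3',
...     series_type='hourly',
...     forecast_horizon=4,
...     dim_names={'time': 'time', 'station_id': 'station'},
...     coord_var_names={'time': 'time', 'station_id': 'station'},
... )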

classmethod from_sqlite(start, end, paths, species, table, time_key_name='dt', series_type='hourly', step=1, forecast_horizon=1, listing_path=None, correc_unit=1, **kwargs)#

Construct an object from sqlite files.

Parameters:
  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • paths (list of str) – List of sqlite files where to retrieve concentration values.

  • species (str) – Species name.

  • table (str) – Name of the table to read in the sqlite file.

  • time_key_name (str) – Unique id of the sqlite table corresponding to the time of the observations.

  • series_type (str) – It can be ‘hourly’ (values stored with an hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • forecast_horizon (int) – Number of forecasted days.

  • listing_path (str) – Path of the station listing where to retrieve metadata variables. This listing is optional and only used to get metadata.

  • correc_unit (float) – Factor to apply to original values.

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

Returns:

evaltools.evaluator.Observations object

classmethod from_time_series(generic_file_path, species, start, end, stations, correc_unit=1, series_type='hourly', forecast_horizon=1, availability_ratio=False, step=1)#

Class method used to construct an object from timeseries files.

Parameters:
  • generic_file_path (str) – Generic path of time series files with {year} instead of the year number and {station} instead of the station name.

  • species (str) – Species name (ex: “o3”).

  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • correc_unit (float) – multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model. For example, if the forecast horizon is 4 days, the end date of the studied observations has to be 3 days later than the end of the studied period, since the studied period corresponds to the period over which the model was run.

  • availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

evaltools.evaluator.Observations object
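
A usage sketch (the path pattern and station metadata are hypothetical):

>>> import datetime
>>> import pandas as pd
>>> from evaltools.evaluator import Observations
>>> stations = pd.DataFrame(
...     {'lat': [48.8, 45.7], 'lon': [2.3, 4.8]},
...     index=['STA1', 'STA2'],
... )
>>> obs = Observations.from_time_series(
...     generic_file_path='./obs/{year}/{station}',
...     species='o3',
...     start=datetime.date(2024, 1, 1),
...     end=datetime.date(2024, 3, 31),
...     stations=stations,
...     series_type='hourly',
...     forecast_horizon=4,
... )
>>> obs.drop_unrepresentative_stations(availability_ratio=0.75)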

movingAverageDailyMax(availability_ratio=0.75)#

Compute the daily maximum of the 8-hour moving average.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, a daily maximum of the 8-hour average can only be calculated if at least 18 8-hour averages are available, each of which requires at least 6 hourly values to be available.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.

moving_average_daily_max(availability_ratio=0.75)#

Compute the daily maximum of the 8-hour moving average.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, a daily maximum of the 8-hour average can only be calculated if at least 18 8-hour averages are available, each of which requires at least 6 hourly values to be available.

Returns:

evaluator.Observations – Observations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.

normalized_series()#

Return Observations object with normalized series.

This method normalizes each series of observations by subtracting its median and dividing by its interquartile range.

Returns:

evaluator.Observations – Observations object with data corresponding to the computed normalized series.

property obs_df#

Get the observation data of the object.

persistenceModel(color='k')#

Build an Evaluator object based on persistence model.

Parameters:

color (str) – Default color that will be used in plotting functions for the new object.

Returns:

Evaluator object.

persistence_model(color='k')#

Build an Evaluator object based on persistence model.

Parameters:

color (str) – Default color that will be used in plotting functions for the new object.

Returns:

Evaluator object.
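
A usage sketch (obs is an Observations instance and sim a Simulations instance for the same case; the colors are illustrative):

>>> from evaltools.evaluator import Evaluator
>>> persistence = obs.persistence_model(color='0.5')   # Evaluator based on the persistence model
>>> model_eval = Evaluator(obs, sim, color='b')         # Evaluator for the actual model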

property seriesType#

Deprecated.

property series_type#

Get the series type of the object.

simVSobs(grid, time, point_size=20, vmin=None, vmax=None, cmap=None, colors=None, bounds=None, output_file=None, file_formats=['png'])#

Scatter plot of observations above simulation raster.

Parameters:
  • grid (evaltools.interpolation.Grid) – Grid object that must contain data for the species of the current Observations object.

  • time (datetime.datetime or datetime.date) – Time for which to plot the observation values. This time must be contained in the current Observations object.

  • point_size (float) – Point size (as defined in matplotlib.pyplot.scatter).

  • vmin, vmax (None or scalar) – Min and max values for the legend colorbar. If None, these values are found automatically.

  • cmap (None or matplotlib.colors.Colormap object) – Colors used for plotting (default: matplotlib.cm.jet).

  • colors (None or list of str) – List of colors used in the chart if you want to discretize the values.

  • bounds (None or list of scalar) – Boundary values for each category if you want to discretize the values. Arguments vmin and vmax must not be None, and the boundary values must be contained between vmin and vmax. Ignored if colors is None.

  • output_file (None or str) – File where to save the plot (without extension). If None, the figure is shown in a pop-up window.

  • file_formats (list of str) – List of file extensions.

sim_vs_obs(grid, time, point_size=20, vmin=None, vmax=None, cmap=None, colors=None, bounds=None, output_file=None, file_formats=['png'])#

Scatter plot of observations above simulation raster.

Parameters:
  • grid (evaltools.interpolation.Grid) – Grid object that must contain data for the species of the current Observations object.

  • time (datetime.datetime or datetime.date) – Time for which to plot the observation values. This time must be contained in the current Observations object.

  • point_size (float) – Point size (as defined in matplotlib.pyplot.scatter).

  • vmin, vmax (None or scalar) – Min and max values for the legend colorbar. If None, these values are found automatically.

  • cmap (None or matplotlib.colors.Colormap object) – Colors used for plotting (default: matplotlib.cm.jet).

  • colors (None or list of str) – List of colors used in the chart if you want to discretize the values.

  • bounds (None or list of scalar) – Boundary values for each category if you want to discretize the values. Arguments vmin and vmax must not be None, and the boundary values must be contained between vmin and vmax. Ignored if colors is None.

  • output_file (None or str) – File where to save the plot (without extension). If None, the figure is shown in a pop-up window.

  • file_formats (list of str) – List of file extensions.

property species#

Get the species of the object.

property startDate#

Deprecated.

property start_date#

Get the starting date of the object.

property step#

Get the time step of the object.

subPeriod(start_date, end_date)#

Build a new Observations object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Observations object.

sub_period(start_date, end_date)#

Build a new Observations object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Observations object.

to_csv(file_path)#

Save timeseries dataframe as csv.

Parameters:

file_path (str) – Csv file path.

class evaltools.evaluator.Simulations(start_date, end_date, stations, species, model, series_type, forecast_horizon, step=1, path='')#

Bases: object

Class gathering simulations of the studied case.

An object of class Simulations will be specific to one species, one series type, one period and one model.

Parameters:
  • start_date (datetime.date) – Start day of the studied period.

  • end_date (datetime.date) – End day (included) of the studied period.

  • stations (1D array-like of str) – List of the names of studied stations.

  • species (str) – Species name (ex: “o3”).

  • model (str) – Name of the model that produced the simulated data.

  • series_type (str) – It can be ‘hourly’ or ‘daily’.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

check_values(threshold, drop=False, file_path=None)#

Check if simulation values exceed a threshold.

If there are values above the threshold, a message is printed and these values are set to nan if drop == True.

Parameters:
  • threshold (scalar) – Threshold value.

  • drop (bool) – If True, values above the threshold are set to nan.

  • file_path (None or str) – File path where to save the names of stations that exceed the threshold, the path must contain {forecast_day} instead of the forecast day number.

dailyMax(availability_ratio=0.75)#

Return Simulations object working on daily maximum.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

dailyMean(availability_ratio=0.75)#

Build Simulations object working on daily mean.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

daily_max(availability_ratio=0.75)#

Return Simulations object working on daily maximum.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum.

daily_mean(availability_ratio=0.75)#

Build Simulations object working on daily mean.

Parameters:

availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily mean.

drop_unrepresentative_stations(availability_ratio=0.75)#

Drop stations with a certain rate of missing values.

Modify the Dataset object in place. A station is dropped if the condition is not fulfilled for every forecast day.

Parameters:

availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

property endDate#

Deprecated.

filteredSeries(availability_ratio=0.75)#

Return Simulations object working on the filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

filtered_series(availability_ratio=0.75)#

Return Simulations object working on the filtered series.

Parameters:

availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘hourly’ and with data corresponding to the computed filtered series.

property forecastHorizon#

Deprecated.

property freq#

Get the time step of the object as a datetime.timedelta.

fromDataset(ds_list, stations_idx=None, correc_unit=1, step=1, path='')#

Initialize from an evaltools.dataset.Dataset object.

Stations kept in the returned object are the intersection of the stations from the Dataset objects of ds_list.

Parameters:
  • model (str) – Name of the model.

  • ds_list (list of evaltools.dataset.Dataset objects) – Dataset objects corresponding to the different forecast terms. The order in the list matters.

  • stations_idx (list of str) – List of the names of studied stations.

  • correc_unit (float) – Factor to apply to original values.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

evaltools.evaluator.Simulations object

fromTimeSeries(stations_idx, species, model, start, end, forecast_horizon=1, correc_unit=1, series_type='hourly', availability_ratio=False, step=1)#

Class method used to construct an object from timeseries files.

Parameters:
  • generic_file_path (str) – Generic path of time series files with {year} instead of the year number, {station} instead of the station name and {forecastDay} instead of the forecast day number.

  • stations_idx (list of str) – List of the names of studied stations.

  • species (str) – Species name (ex: “o3”).

  • model (str) – Name of the model that produced the simulated data.

  • start (datetime.date) – Start day of the studied period.

  • end (datetime.date) – End day (included) of the studied period.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – It can be ‘hourly’ or ‘daily’.

  • availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

classmethod from_dataset(model, ds_list, stations_idx=None, correc_unit=1, step=1, path='')#

Initialize from an evaltools.dataset.Dataset object.

Stations kept in the returned Dataset are the intersection of stations from the Dataset objects of ds_list.

Parameters:
  • model (str) – Name of the model.

  • ds_list (list of evaltools.dataset.Dataset objects) – Dataset objects corresponding to the different forecast terms. The order in the list matters.

  • stations_idx (list of str) – List of the names of studied stations.

  • correc_unit (float) – Factor to apply to original values.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

evaltools.evaluator.Simulations object

classmethod from_nc(model, start, end, paths, species, series_type='hourly', stations=None, forecast_horizon=1, group=None, dim_names={}, coord_var_names={}, correc_unit=1, step=1)#

Construct an object from netcdf files.

To be handled by this method, netcdf variables must be 2-dimensional: the first dimension corresponds to time and the second one to the different measurement sites.

Parameters:
  • model (str) – Name of the model.

  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • species (str) – Species name that must correspond to the name of the retrieved netcdf variable.

  • forecast_horizon (int) – Forecast term given as the number of forecasted days.

  • paths (list or dictionary) – If ‘paths’ is given as a dictionary, keys must be ‘D0’, ‘D1’, … corresponding to the forecast terms, and values must be lists of str corresponding to paths to netcdf files where to find concentration values. If ‘paths’ is given as a list, this list will be used for every forecast term. Thus, either forecast_horizon=1 or the netcdf group to fetch in the files must depend on the forecast term (ex: group=’D{fd}’).

  • series_type (str) – It can be ‘hourly’ (values stored with a hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • stations (None or list of str) – List of the names of studied stations. If None, all stations are kept.

  • group (None or str) – Group to read within the netcdf file. If equal to None, the root group is read. The group name you provide can contain ‘{fd}’ that will be replaced according to the number of the forecasted day (ex: group=’D{fd}’).

  • dim_names (dict) – Used to specify dimension names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • coord_var_names (dict) – Used to specify coordinate variable names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • correc_unit (float) – Factor to apply to original values.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

evaltools.evaluator.Simulations object
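
For illustration, a hedged sketch of a from_nc call; the model name and file path are placeholders, and a single file per month with one netcdf group per forecast day is only an assumed layout:

    from datetime import date

    from evaltools.evaluator import Simulations

    # A single list of files reused for every forecast term, so the group
    # name must depend on the forecast day (group='D{fd}').
    sim = Simulations.from_nc(
        model='my_model',
        start=date(2024, 1, 1),
        end=date(2024, 1, 31),
        paths=['/data/my_model/concentrations_202401.nc'],
        species='o3',
        series_type='hourly',
        forecast_horizon=2,
        group='D{fd}',
        dim_names={'time': 'time', 'station_id': 'station_id'},
    )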

classmethod from_sqlite(model, start, end, paths, species, table, time_key_name='dt', series_type='hourly', step=1, forecast_horizon=1, stations=None, correc_unit=1)#

Construct an object from sqlite files.

To be handled by this method, sqlite tables must have their unique id corresponding to the time and columns corresponding to the measurement sites.

Parameters:
  • model (str) – Name of the model.

  • start (datetime.date) – Starting day of the studied period.

  • end (datetime.date) – Ending day (included) of the studied period.

  • paths (list or dictionary) – If ‘paths’ is given as a dictionary, keys must be ‘D0’, ‘D1’, … corresponding to the forecast terms, and values must be lists of str corresponding to paths to sqlite files where to find concentration values. If ‘paths’ is given as a list, this list will be used for every forecast term. Thus, either forecast_horizon=1 or the sqlite table name must depend on the forecast term (ex: ‘D{fd}’).

  • species (str) – Species name.

  • table (str) – Name of the table to read in the sqlite file.

  • time_key_name (str) – Unique id of the sqlite table corresponding to the time of the observations.

  • series_type (str) – It can be ‘hourly’ (values stored with a hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • forecast_horizon (int) – Forecast term given as the number of forecasted days.

  • stations (None or list of str) – List of the names of studied stations. If None, all stations are kept.

  • correc_unit (float) – Factor to apply to original values.

Returns:

evaltools.evaluator.Simulations object
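
For illustration, a hedged sketch of a from_sqlite call; the model name, file paths and table name are placeholders:

    from datetime import date

    from evaltools.evaluator import Simulations

    # One sqlite file per forecast term, keyed by 'D0', 'D1', ...
    sim = Simulations.from_sqlite(
        model='my_model',
        start=date(2024, 1, 1),
        end=date(2024, 1, 31),
        paths={'D0': ['/data/my_model/D0.sqlite'],
               'D1': ['/data/my_model/D1.sqlite']},
        species='o3',
        table='o3_hourly',
        time_key_name='dt',
        forecast_horizon=2,
    )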

classmethod from_time_series(generic_file_path, stations_idx, species, model, start, end, forecast_horizon=1, correc_unit=1, series_type='hourly', availability_ratio=False, step=1)#

Class method used to construct an object from timeseries files.

Parameters:
  • generic_file_path (str) – Generic path of timeseries files with {year} instead of the year number, {station} instead of the station name and {forecastDay} instead of the forecast day number.

  • stations_idx (list of str) – List of the names of studied stations.

  • species (str) – Species name (ex: “o3”).

  • model (str) – Name of the model that produced the simulated data.

  • start (datetime.date) – Start day of the studied period.

  • end (datetime.date) – End day (included) of the studied period.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – It can be ‘hourly’ or ‘daily’.

  • availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).
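
For illustration, a hedged sketch of a from_time_series call; the template path and station codes are placeholders that follow the documented {year}, {station} and {forecastDay} substitution keys:

    from datetime import date

    from evaltools.evaluator import Simulations

    sim = Simulations.from_time_series(
        generic_file_path='/data/my_model/{year}/{station}_D{forecastDay}.txt',
        stations_idx=['FR0001', 'ES0002'],
        species='o3',
        model='my_model',
        start=date(2024, 1, 1),
        end=date(2024, 1, 31),
        forecast_horizon=2,
        series_type='hourly',
    )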

movingAverageDailyMax(availability_ratio=0.75)#

Compute the daily maximum of the 8-hour moving average.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, the daily maximum of the 8-hour moving averages can only be calculated if at least 18 of the 24 8-hour averages are available, each of which in turn requires at least 6 of its 8 hourly values.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.

moving_average_daily_max(availability_ratio=0.75)#

Compute the daily maximum of the 8-hour moving average.

Parameters:

availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, the daily maximum of the 8-hour moving averages can only be calculated if at least 18 of the 24 8-hour averages are available, each of which in turn requires at least 6 of its 8 hourly values.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘daily’ and with data corresponding to the computed daily maximum of the moving average.
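
For illustration, a minimal sketch assuming an hourly Simulations object named sim (typically ozone); the comments restate how the default availability ratio propagates:

    # With availability_ratio=0.75, each 8-hour window needs at least
    # 6 of its 8 hourly values, and each day needs at least 18 of its
    # 24 windows before the daily maximum is computed.
    sim_mda8 = sim.moving_average_daily_max(availability_ratio=0.75)
    # sim_mda8 has series_type 'daily'.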

normalized_series()#

Return Simulations object with normalized series.

This method normalizes each series of simulations by subtracting its median and dividing by its interquartile range.

Returns:

evaluator.Simulations – Simulations object with series_type = ‘hourly’ and with data corresponding to the computed normalized series.

property seriesType#

Deprecated.

property simDF#

Deprecated.

property sim_df#

Get the simulation data of the object.

property startDate#

Deprecated.

property stations#

Get the station list of the object.

property step#

Get the time step of the object.

subPeriod(start_date, end_date)#

Build a new Simulations object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Simulations object.

sub_period(start_date, end_date)#

Build a new Simulations object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Simulations object.

to_csv(file_path)#

Save timeseries dataframe as csv.

Parameters:

file_path (str) – File path with {forecast_day} instead of forecast day number.

to_obs(forecast_day=0)#

Transform Simulations object to Observations object.

Parameters:

forecast_day (int) – Forecast day for which to keep simulation data.

evaltools.evaluator.load(input_file_path)#

Load an evaluator.Evaluator object.

Load an evaluator.Evaluator object saved in binary format with evaluator.Evaluator.dump method.

Parameters:

input_file_path (str) – Path of the binary file to load.

Returns:

evaluator.Evaluator object
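
For illustration, a minimal sketch; the file path is a placeholder and the object is assumed to have been saved beforehand with the dump method mentioned above:

    from evaltools import evaluator

    ev = evaluator.load('/path/to/saved_evaluator.dump')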

evaltools.plotting module#

This module gathers scores plotting functions.

evaltools.plotting.plotting.plot_average_ft_scores(objects, score, score_name=None, min_nb_sta=10, availability_ratio=0.75, score_type='temporal', averaging='median', labels=None, colors=None, linestyles=None, markers=None, title='', xlabel=None, black_axes=False, nb_of_minor_ticks=(3, 1), outlier_thresh=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot the average score for each forecast time.

This function is based on the Evaluator.average_ft_scores method. If score_type is ‘temporal’, the score is first computed for each station at every time. Then, the spatial average of these score values is taken for each forecast time. If score_type is ‘spatial’, the score is first computed for each time and then the temporal average of these score values is taken for each forecast time.

On the plot, there will be as many lines as there are objects given in the object list.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • score_name (str | dict | None) – String to write in the title instead of the default score name. If a dictionary is passed, it must have a key corresponding to the score argument. If None, the score name is written as in the score argument.

  • min_nb_sta (int) – Minimal number of stations required to compute the average of the score if score_type is ‘temporal’, or to compute the score itself if score_type is ‘spatial’.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the score if score_type is ‘temporal’, or to calculate the average of the score if score_type is ‘spatial’.

  • score_type (str) – Computing method selected from ‘temporal’ or ‘spatial’.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plot. It can contain {score} instead of the score name.

  • xlabel (str) – Label for the x axis.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • outlier_thresh (scalar or None) – If not None, it corresponds to the threshold used in evaltools.plotting._is_outlier to determine if a model is an outlier. If outliers are detected, y boundaries do not take them into account.

  • output_csv (str or None) – File where to save the data. The file name can contain {score} instead of the score name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
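
For illustration, a hedged sketch assuming two comparable Evaluator objects named ev_a and ev_b, that the score name ‘RMSE’ is available, and that the function can be imported from evaltools.plotting:

    from evaltools import plotting

    fig, ax = plotting.plot_average_ft_scores(
        [ev_a, ev_b],
        score='RMSE',
        score_type='temporal',
        averaging='median',
        labels=['model A', 'model B'],
        title='{score} per forecast time',
        output_file='./average_score_per_ft',  # saved as PNG by default
    )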

evaltools.plotting.plotting.plot_bar_contingency_table(objects, threshold, forecast_day=0, start_end=None, title='', labels=None, ymin=None, ymax=None, bar_kwargs=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Draw a barplot from the contingency table.

For each object, draw a bar for good detections, false alarms, and missed alarms.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • threshold (scalar) – Threshold value.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • start_end (None or list of two datetime.date objects) – Boundary dates for the studied period.

  • title (str) – Title for the figure.

  • labels (None or list of str) – List of labels corresponding to each object.

  • ymin, ymax (None or scalar) – Limits of the y axis.

  • output_csv (str or None) – File where to save the data.

  • bar_kwargs (dict) – Additional keyword arguments passed to pandas.DataFrame.plot.bar.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_bar_exceedances(obj, threshold, data='obs', start_end=None, forecast_day=0, labels=None, title='', ylabel=None, ymin=None, ymax=None, subregions=None, xticking='daily', date_format='%Y-%m-%d', bar_kwargs={}, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Draw a barplot of threshold exceedances.

Draw a barplot of threshold exceedances for the period defined by start_end. If subregions are given, their respective bars are drawn one above the other.

Parameters:
  • obj (evaltools.Evaluator object) – Evaluator object used for plotting.

  • threshold (scalar) – Threshold value.

  • data (str) – Data to be used to compute exceedances. Can be “obs” or “sim”.

  • start_end (None or list of two datetime.date objects) – Boundary dates for abscissa.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • labels (None or list of str) – List of labels for the legend.

  • title (str) – Title for the plot. It may contain {total} to print the total number of obs/sim values that exceed the threshold.

  • ylabel (str) – Ordinate axis label.

  • subregions (None or list of list of str) – One set of bars per sub-region will be drawn. The sub-regions must be given like [[‘FR’], [‘ES’, ‘FR’], ‘all’, …], where ‘FR’, ‘ES’, … are the first letters of the station codes you want to keep and ‘all’ means that all stations are kept.

  • xticking (str) – Defines the method used to set x ticks. It can be either ‘daily’, ‘mondays’ or ‘bimonthly’.

  • date_format (str) – String format for dates as understood by python module datetime.

  • bar_kwargs (dict) – Additional keyword arguments passed to pandas.DataFrame.plot.bar.

  • ymin, ymax (None or scalar) – Limits of the y axis.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_bar_scores(objects, score, forecast_day=0, averaging='mean', title='', labels=None, colors=None, subregions=None, xtick_labels=None, availability_ratio=0.75, bar_kwargs={}, ref_line=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Draw a barplot of scores.

Draw a barplot with one bar per object. If there are subregions, one set of bars per region will be drawn.

The scores are first computed for each measurement site and then averaged.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’.

  • title (str) – Title for the figure.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Bar colors corresponding to each object.

  • subregions (None or list of list of str) – One set of bars per sub-region will be drawn. The sub-regions must be given like [[‘FR’], [‘ES’, ‘FR’], ‘all’, …], where ‘FR’, ‘ES’, … are the first letters of the station codes you want to keep and ‘all’ means that all stations are kept.

  • xtick_labels (None or list of str) – List of labels for the xticks. These labels correspond to the sub-regions defined by the subregions argument. Labels can contain ‘{nbStations}’ that will be replaced by the corresponding number of stations used to compute the score (warning: if several objects are displayed, the number of stations corresponds to the first one).

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute the mean scores.

  • bar_kwargs (dict) – Additional keyword arguments passed to pandas.DataFrame.plot.bar.

  • ref_line (dict of args or None) – Plot a horizontal line whose arguments are passed to matplotlib.pyplot.hline.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
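
For illustration, a hedged sketch reusing the placeholder Evaluator objects ev_a and ev_b, the ‘RMSE’ score name and the evaltools.plotting import path assumed above; the sub-region prefixes and labels are also placeholders:

    from evaltools import plotting

    fig, ax = plotting.plot_bar_scores(
        [ev_a, ev_b],
        score='RMSE',
        forecast_day=0,
        averaging='median',
        subregions=[['FR'], ['ES', 'PT'], 'all'],
        xtick_labels=['France', 'Iberia', 'all ({nbStations} stations)'],
        labels=['model A', 'model B'],
        output_file='./bar_scores_by_region',
    )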

evaltools.plotting.plotting.plot_bar_scores_conc(objects, score, conc_range, forecast_day=0, averaging='mean', title=None, labels=None, colors=None, xtick_labels=None, min_nb_val=10, based_on='obs', bar_kwargs={}, nb_vals=True, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Barplot for scores per concentration class.

Data is grouped depending on the desired concentration classes, then scores are computed for each site and averaged.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • conc_range (list of scalars) – List used to determine concentration intervals on which to compute scores. Must contain at least two values. E.g. [25, 45, 80] determines scores for concentrations between 25 and 45, and between 45 and 80.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’.

  • title (str) – Title for the figure.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Bar colors corresponding to each object.

  • xtick_labels (None or list of str) – List of labels for the xticks.

  • min_nb_val (int) – Minimal number of (obs, sim) couple required for a score to be computed.

  • based_on (str) – If ‘sim’, concentrations are determined from simulation data. Otherwise (‘obs’), they are determined from observations.

  • bar_kwargs (dict) – Additional keyword arguments passed to pandas.DataFrame.plot.bar.

  • nb_vals (boolean) – Whether the number of computed values for each bar must be displayed or not.

  • output_csv (str or None) – File where to save the data. The file name can contain {score} instead of the score name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_bb(objects_lol, forecast_day=0, averaging='mean', mean_obs=True, title='', labels_list=None, colors_list=None, groups_labels=None, output_file=None, file_formats=['png'], availability_ratio=0.75, adapt_size=0.25, ncol=2, subplots_adjust={'top': 0.85}, fig=None, ax=None)#

Synthetic plot of scores.

For each object, plot a bar for RMSE and lollipops for bias and correlation. Different groups of objects are separated by white space.

Scores are first computed for each measurement site and then averaged.

Parameters:
  • objects_lol (list of lists of evaltools.Evaluator objects) – Evaluator objects used for plotting. One list per group.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • averaging (str) – Type of score averaging chosen from ‘mean’ or ‘median’.

  • mean_obs (boolean) – Whether to represent mean obs concentration or not.

  • title (str) – Title for the figure.

  • labels_list (None or list of str) – List of labels for the legend.

  • colors_list (None or list of str) – Bar colors corresponding to each object.

  • groups_labels (None or list of str) – List of labels for the groups.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute the mean scores.

  • adapt_size (float) – Coefficient to increase or reduce subplots width.

  • ncol (int) – Number of columns in legend.

  • subplots_adjust (dict) – Keyword arguments passed to matplotlib.pyplot.subplots_adjust.

evaltools.plotting.plotting.plot_comparison_scatter_plot(score, xobject, yobject, forecast_day=0, title='', xlabel=None, ylabel=None, availability_ratio=0.75, nb_outliers=5, black_axes=False, color_by=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Scatter plot to compare two Evaluator objects.

The chosen score is calculated for the two Evaluator objects and used as x and y coordinates. Points are colored according to the density of points.

Parameters:
  • score (str) – Computed score.

  • xobject, yobject (evaltools.Evaluator object) – Objects used for plotting.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • title (str) – Title for the plots.

  • xlabel, ylabel (str) – Labels corresponding to xobject and yobject used for axis labels.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute station scores.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • color_by (None or dictionary) – Dictionary with keys corresponding to station names and values corresponding to colors.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_data_box(objects, forecast_day=0, labels=None, colors=None, obs_style={}, title='', showfliers=True, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot distribution of the data values in a boxplot.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • obs_style (dict) – Dictionary of arguments passed to pyplot.plot for observation curve. Default value is {‘color’: ‘k’}.

  • title (str) – Title for the plot.

  • showfliers (bool) – Show the outliers beyond the caps.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_data_density(objects, forecast_day=0, labels=None, colors=None, linestyles=None, title='', xmin=None, xmax=None, obs_style=None, black_axes=False, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot the probability density function.

Draw the probability density of observed (for the first object only) and simulated data. This function is based on kernel density estimation.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • title (str) – Title for the plot.

  • xmin, xmax (None or scalar) – Limits of the x axis.

  • obs_style (dict) – Dictionary of arguments passed to pyplot.plot for observation curve. Default value is {‘color’: ‘k’, ‘alpha’: 0.5}.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_diurnal_cycle(objects, station_list=None, availability_ratio=0.75, colors=None, linestyles=None, markers=None, labels=None, title='', xlabel=None, ylabel='concentration', ymin=None, ymax=None, normalize=False, plot_type=None, envelope=False, black_axes=False, obs_style=None, nb_of_minor_ticks=(3, 1), output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot the diurnal cycle of observations and simulations.

On each plot, there is one line per station and per object. Observations are taken from the first object of the list.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • station_list (None or list of str) – List of stations to display. If None, all stations of the first element of <objects> argument are processed.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • labels (None or list of str) – List of labels for the legend.

  • title (str) – Title for the plot.

  • xlabel, ylabel (str) – Label for x and y axes.

  • ymin, ymax (None or scalar) – Limits of the axes.

  • normalize (bool) – If True, values are normalized for each station by subtracting the mean and dividing by the standard deviation of the diurnal cycle values for the station.

  • plot_type (str) – If ‘median’ the median of all stations values at a forecast hour is plotted. If ‘mean’ the mean of all stations values at a forecast hour is plotted. For any other value, all station values are plotted separately.

  • envelope (bool) – Only used if plot_type is ‘median’ or ‘mean’. If True, draw quartiles one and three around the median curve (case plot_type == ‘median’) or draw +/- the standard deviation around the mean curve (case plot_type == ‘mean’).

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • obs_style (dict) – Dictionary of arguments passed to pyplot.plot for observation curve. Default value is {‘color’: ‘k’, ‘alpha’: 0.5}.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • output_csv (str or None) – File where to save the data. The file name must contain {model} instead of the model name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
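
For illustration, a hedged sketch with the same placeholder Evaluator objects and assumed import path as above, showing one median curve per object with its interquartile envelope:

    from evaltools import plotting

    fig, ax = plotting.plot_diurnal_cycle(
        [ev_a, ev_b],
        plot_type='median',  # one median curve per object
        envelope=True,       # first/third quartiles around the median
        labels=['model A', 'model B'],
        ylabel='concentration',
        output_file='./diurnal_cycle',
    )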

evaltools.plotting.plotting.plot_exceedances_scores(objects, threshold, score_list=None, forecast_day=0, title='', labels=None, colors=None, subregions=None, subregion_labels=None, file_formats=['png'], output_file=None, output_csv=None, bar_kwargs={}, start_end=None)#

Barplot for scores.

Draw a barplot showing thirteen scores, with one bar per object. If there are subregions, thirteen barplots are built with one set of bars per region.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • threshold (scalar) – Threshold value used to compute the scores.

  • score_list (list of str) – List of scores to display. If None, all available scores are displayed. Available scores are

    • ‘accuracy’ : Accuracy

    • ‘bias_score’ : Bias score

    • ‘success_ratio’ : Success ratio

    • ‘hit_rate’ : probability of detection (Hit rate)

    • ‘false_alarm_ratio’ : false alarm ratio

    • ‘prob_false_detect’ : probability of false detection

    • ‘threat_score’ : Threat Score

    • ‘equitable_ts’ : Equitable Threat Score

    • ‘peirce_ss’ : Peirce Skill Score (Hanssen and Kuipers discriminant)

    • ‘heidke_ss’ : Heidke Skill Score

    • ‘rousseau_ss’ : Rousseau Skill Score

    • ‘odds_ratio’ : Odds Ratio

    • ‘odds_ratio_ss’ : Odds Ratio Skill Score

  • title (str) – Title for the plot. If subregions is not None, it can contain ‘{score}’.

  • labels (None or list of str) – List of labels for the legend corresponding to the objects.

  • colors (None or list of str) – Bar colors corresponding to each object.

  • subregions (None or list of list of str) – One set of bars per sub-region will be drawn. The sub-regions must be given like [[‘FR’], [‘ES’, ‘FR’], ‘all’, …], where ‘FR’, ‘ES’, … are the first letters of the station codes you want to keep and ‘all’ means that all stations are kept.

  • subregion_labels (None or list of str) – List of labels for the xticks. These labels correspond to the sub-regions defined by the subregions argument. Labels can contain ‘{nbStations}’ that will be replaced by the corresponding number of stations used to compute the score (warning: if several objects are displayed, the number of stations corresponds to the first one). This argument is ignored if subregions is None.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window. If subregions is not None, it must contain ‘{score}’ instead of the score name.

  • output_csv (str or None) – File where to save the data. The file name must contain {score} instead of the score name.

  • file_formats (list of str) – List of file extensions.

  • bar_kwargs (dict) – Additional keyword arguments passed to pandas.DataFrame.plot.bar.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.

Returns:

List of couples (matplotlib.figure.Figure, matplotlib.axes._axes.Axes) – Figure and Axes objects corresponding to each plot. Note that if the plots have been shown in the user interface window, these figures and axes will not be usable again.
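
For illustration, a hedged sketch with the same placeholder Evaluator objects and assumed import path as above; the threshold value and score selection are also placeholders, and the result is a list of (figure, axes) couples:

    from evaltools import plotting

    figs_axes = plotting.plot_exceedances_scores(
        [ev_a, ev_b],
        threshold=180,
        score_list=['accuracy', 'hit_rate', 'false_alarm_ratio'],
        forecast_day=0,
        labels=['model A', 'model B'],
        output_file='./exceedances_scores',
    )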

evaltools.plotting.plotting.plot_line_exceedances(objects, threshold, start_end=None, forecast_day=0, labels=None, colors=None, linestyles=None, markers=None, title='', ylabel=None, xticking='daily', date_format=None, ymin=None, ymax=None, obs_style=None, output_csv=None, black_axes=False, nb_of_minor_ticks=(1, 2), fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot threshold exceedances over time.

This function is based on the contingency table. On each plot, there will be as many lines as there are objects in the object list, plus one line for the observations of the first object.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • threshold (scalar) – Threshold value.

  • start_end (None or list of two datetime.date objects) – Boundary dates for abscissa.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plot.

  • ylabel (str) – Ordinate axis label.

  • xticking (str) – Defines the method used to set x ticks. It can be either ‘daily’, ‘mondays’, ‘bimonthly’ or ‘auto’.

  • date_format (str) – String format for dates as understood by python module datetime.

  • ymin, ymax (None or scalar) – Limits of the y axis.

  • obs_style (dict) – Dictionary of arguments passed to pyplot.plot for observation curve. Default value is {‘color’: ‘k’, ‘alpha’: 0.5}.

  • output_csv (str or None) – File where to save the data.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_mean_time_scores(objects, score, score_name=None, min_nb_sta=10, availability_ratio=0.75, labels=None, colors=None, linestyles=None, markers=None, title='', xlabel=None, black_axes=False, nb_of_minor_ticks=(3, 1), outlier_thresh=None, output_csv=None, annotation=None, output_file=None, file_formats=['png'], fig=None, ax=None)#

Plot the temporal mean of spatial scores for each forecast time.

This function is based on the Evaluator.average_ft_scores method. The score is first computed for each time across the available stations. Then, the mean of these score values is taken for each forecast time.

On the plot, there will be as many lines as there are objects given in the object list.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • score_name (str | dict | None) – String to write in the title instead of the default score name. If a dictionary is passed, it must have a key corresponding to the score argument. If None, the score name is written as in the score argument.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores (before applying mean for each forecast time).

  • availability_ratio (float) – Minimal rate of data (computed scores per time) available on the period required per forecast time to compute the mean scores.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plot. It can contain {score} instead of the score name.

  • xlabel (str) – Label for the x axis.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • outlier_thresh (scalar or None) – If not None, it corresponds to the threshold used in evaltools.plotting._is_outlier to determine if a model is an outlier. If outliers are detected, y boundaries do not take them into account.

  • output_csv (str or None) – File where to save the data. The file name can contain {score} instead of the score name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_median_station_scores(objects, score, score_name=None, min_nb_sta=10, availability_ratio=0.75, labels=None, colors=None, linestyles=None, markers=None, title='', xlabel=None, black_axes=False, nb_of_minor_ticks=(3, 1), outlier_thresh=None, annotation=None, output_file=None, file_formats=['png'], output_csv=None, fig=None, ax=None)#

Plot the spatial median of temporal scores for each forecast time.

This function is based on the Evaluator.average_ft_scores method. The score is first computed for each station at every time. Then, the median of these score values is taken for each forecast time.

On the plot, there will be as many lines as there are objects given in the object list.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • score_name (str | dict | None) – String to write in the title instead of the default score name. If a dictionary is passed, it must have a key corresponding to the score argument. If None, the score name is written as in the score argument.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores (before applying mean for each forecast time).

  • availability_ratio (float) – Minimal rate of data (computed scores per time) available on the period required per forecast time to compute the mean scores.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plot. It can contain {score} instead of the score name.

  • xlabel (str) – Label for the x axis.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • outlier_thresh (scalar or None) – If not None, it corresponds to the threshold used in evaltools.plotting._is_outlier to determine if a model is an outlier. If outliers are detected, y boundaries do not take them into account.

  • output_csv (str or None) – File where to save the data. The file name can contain {score} instead of the score name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_performance_diagram(objects, threshold, forecast_day=0, labels=None, colors=None, markers=None, title='Performance Diagram', start_end=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot a performance diagram.

This diagram is created by plotting the probability of detection (true positive rate) against the success ratio (positive predictive value) relative to the chosen detection threshold. It also displays the critical success index (threat score) and the frequency bias.

References

Roebber, P.J., 2009: Visualizing multiple measures of forecast quality.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • threshold (scalar) – Threshold value.

  • forecast_day (int) – Forecast day for which to use data to compute the diagram.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Marker colors corresponding to each object.

  • markers (None or list of str) – Marker shapes corresponding to each object.

  • title (str) – Diagram title.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_quarterly_score(files, labels, score, first_quarter=None, last_quarter=None, colors=None, linestyles=None, markers=None, title=None, thres=None, ylabel=None, origin_zero=False, black_axes=False, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot quarterly score values saved in some files.

This function is based on the Evaluator.quarterlyMedianScore method. On each plot, there will be as many lines as there are file paths in the <files> argument.

Parameters:
  • files (list of str) – Paths of files used for plotting (one file corresponds to one line on the plot).

  • labels (list of str) – List of labels for the legend (length of the label list must be equal to the length of the file list).

  • score (str) – The score that should correspond to data stored in the files.

  • first_quarter (str) – String corresponding to the oldest plotted quarter.

  • last_quarter (str) – String corresponding to the latest plotted quarter.

  • colors (list of str) – Line colors corresponding to each file.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plots.

  • thres (None or float) – If not None, a horizontal yellow line is plotted with equation y = thres.

  • ylabel (str) – Ordinate axis label.

  • origin_zero (bool) – If True, minimal value of y axis is set to zero.

  • black_axes (bool) – If true, y=0 and x=0 lines are painted in black.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_roc_curve(objects, thresholds, forecast_day=0, labels=None, colors=None, markers=None, title='ROC diagram', xlabel=None, ylabel=None, start_end=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Draw the ROC curve for each object.

A ROC curve is created by plotting the true positive rate against the false positive rate at various threshold values. Thus, it can be useful to assess model performance regarding threshold exceedances. In the legend, SSr (Skill Score Ratio) corresponds to the Gini coefficient which is the area between the ROC curve and the no-discrimination line multiplied by two.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • thresholds (list of scalars) – Threshold values.

  • forecast_day (int) – Forecast day for which to use data to compute the ROC diagram.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Marker colors corresponding to each object.

  • markers (None or list of str) – Marker shapes corresponding to each object.

  • title (str) – Chart title.

  • xlabel (str) – Label for the x axis.

  • ylabel (str) – Label for the y axis.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
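
For illustration, a hedged sketch with the same placeholder Evaluator objects and assumed import path as above; the threshold values are placeholders:

    from evaltools import plotting

    fig, ax = plotting.plot_roc_curve(
        [ev_a, ev_b],
        thresholds=[40, 60, 80, 100, 120],
        forecast_day=0,
        labels=['model A', 'model B'],
        output_file='./roc_curve',
    )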

evaltools.plotting.plotting.plot_score_box(objects, score, forecast_day=0, score_type='temporal', availability_ratio=0.75, min_nb_sta=10, labels=None, colors=None, title='', nb_stations=False, showfliers=True, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot distribution of the score values in a boxplot.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • score_type (str) – Computing method selected from ‘temporal’ or ‘spatial’.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute a temporal score (ignored if score_type = ‘spatial’).

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute a spatial score (ignored if score_type = ‘temporal’).

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • title (str) – Title for the plot.

  • nb_stations (bool) – If True, the number of stations used to draw the boxes is displayed in the legend (ignored if score_type = ‘spatial’).

  • showfliers (bool) – Show the outliers beyond the caps.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_score_density(objects, score, forecast_day=0, score_type='temporal', availability_ratio=0.75, min_nb_sta=10, labels=None, colors=None, linestyles=None, title='', nb_stations=False, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot the probability density function of score values.

This function is based on kernel density estimation.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • score_type (str) – Computing method selected from ‘temporal’ or ‘spatial’.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute a temporal score (ignored if score_type = ‘spatial’).

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute a spatial score (ignored if score_type = ‘temporal’).

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • title (str) – Title for the plot.

  • nb_stations (bool) – If True, write the number of stations in the legend (ignored if score_type = ‘spatial’).

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
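
A minimal usage sketch; obj_a and obj_b denote Evaluator objects built beforehand, and the score name, labels and output path are assumptions used only for illustration:

from evaltools.plotting.plotting import plot_score_density

# obj_a and obj_b are assumed to be evaltools.Evaluator objects built
# for the same species, period and set of stations.
fig, ax = plot_score_density(
    [obj_a, obj_b],
    score='RMSE',
    forecast_day=0,
    score_type='temporal',
    labels=['model A', 'model B'],
    title='RMSE density, forecast day 0',
    output_file='./rmse_density',   # saved as ./rmse_density.png
)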

evaltools.plotting.plotting.plot_score_quartiles(objects, xscore, yscore, score_type='temporal', forecast_day=0, availability_ratio=0.75, min_nb_sta=10, title='', colors=None, labels=None, invert_xaxis=False, invert_yaxis=False, black_axes=False, xmin=None, xmax=None, ymin=None, ymax=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Scatter plot of median station scores.

Plot the median of the station scores surrounded by a rectangle corresponding to the first and third quartiles. This chart is based on the Evaluator.stationScores method.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • xscore (str) – Score used for the x axis.

  • yscore (str) – Score used for the y axis.

  • score_type (str) – Computing method selected from ‘temporal’ or ‘spatial’.

  • forecast_day (int) – Forecast day used for score computing.

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the scores (or the quartiles of the scores if score_type=’spatial’).

  • min_nb_sta (int) – Minimal number of stations required to compute the quartiles of the scores (or the scores themselves if score_type=’spatial’).

  • title (str) – Chart title.

  • colors (None or list of str) – Marker colors corresponding to each object.

  • labels (list of str) – List of objects labels for the legend.

  • invert_xaxis, invert_yaxis (bool) – If True, the corresponding axis is inverted.

  • black_axes (bool) – If True, y=0 and x=0 lines are painted in black.

  • xmin, xmax, ymin, ymax (None or scalar) – Limits of the axes.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_significant_differences(score_list, former_objects, later_objects, score_type='temporal', forecast_day=0, title='', xlabels=None, ylabels=None, availability_ratio=0.75, min_nb_sta=10, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Chart showing the significance of differences between simulations.

Statistical tests are applied to temporal or spatial score values. Differences between the two simulations are considered significant if both tests are significant (ttest_ind: H0 = the two samples have the same mean; mannwhitneyu: H0 = the two samples have the same distribution). If one of the tests is not significant, the cell is yellow (value = 0). Otherwise, the value is computed from the 9 percentiles (10, …, 90) of each sample:

value = sum_i(sign(p_i - q_i)) for PearsonR, SpearmanR or FactOf2

value = sum_i(sign(abs(p_i) - abs(q_i))) for other scores

where p_i and q_i are the percentiles of the two samples.

Parameters:
  • score_list (list of str) – List of scores for which investigating differences.

  • former_objects, later_objects (list of evaltools.Evaluator objects) – Two lists of objects to compare.

  • score_type (str) – Score computing method selected from ‘temporal’ or ‘spatial’.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • title (str) – Title for the plots. It must contain {score} instead of the score name.

  • xlabels, ylabels (None or list of str) – Labels for the x axis (equal to score_list if None) and labels for the y axis (one per compared couple of objects; if None, the model name of the first object of each couple is used).

  • availability_ratio (float) – Minimum required rate of available data over the period to calculate the scores (only used if score_type is ‘temporal’).

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores (only used if score_type is ‘spatial’).

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
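
A sketch of a pairwise comparison between a former and a later run (all Evaluator objects, score names and the output path below are assumptions; couple i compares former_objects[i] with later_objects[i]):

from evaltools.plotting.plotting import plot_significant_differences

# old_run_a/old_run_b and new_run_a/new_run_b are assumed to be
# evaltools.Evaluator objects built beforehand.
fig, ax = plot_significant_differences(
    score_list=['RMSE', 'MeanBias', 'PearsonR'],
    former_objects=[old_run_a, old_run_b],
    later_objects=[new_run_a, new_run_b],
    score_type='temporal',
    forecast_day=0,
    title='Significance of differences for {score}',
    output_file='./significant_diff',
)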

evaltools.plotting.plotting.plot_station_score_box(objects, score, forecast_day=0, availability_ratio=0.75, labels=None, colors=None, title='', nb_stations=False, showfliers=True, annotation=None, output_file=None, file_formats=['png'], fig=None, ax=None)#

Plot distribution of the score values in a boxplot.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute a temporal score.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • title (str) – Title for the plot.

  • nb_stations (bool) – If True, the number of stations used to draw the boxes is displayed in the legend.

  • showfliers (bool) – Show the outliers beyond the caps.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_station_score_density(objects, score, forecast_day=0, availability_ratio=0.75, labels=None, colors=None, linestyles=None, title='', nb_stations=False, annotation=None, output_file=None, file_formats=['png'], fig=None, ax=None)#

Plot the probability density function of score values.

This function is based on kernel density estimation.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute a temporal score.

  • labels (list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • title (str) – Title for the plot.

  • nb_stations (bool) – If True, write the number of stations in the legend.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.plotting.plotting.plot_station_scores(obj, score, ref=None, forecast_day=0, output_file=None, title='', bbox=None, file_formats=['png'], point_size=5, higher_above=True, order_by=None, availability_ratio=0.75, vmin=None, vmax=None, vcenter=None, cmap=None, norm=None, rivers=False, output_csv=None, interp2d=False, sea_mask=False, land_mask=False, bnd_resolution='50m', cmaplabel='', extend='neither', land_color='none', sea_color='none', marker='o', mark_by=None, boundaries_above=False, bbox_inches='tight', grid_resolution=None, ne_features=None, fig=None, ax=None)#

Plot scores per station on a map.

This function is based on Evaluator.stationScores.

Each station is drawn as a circle colored according to its score value.

Parameters:
  • obj (evaltools.Evaluator object) – Object used for plotting.

  • score (str) – Computed score.

  • ref (None or evaltools.Evaluator object) – Reference object used for comparison. If provided, the values plotted on the map are the score differences between the main object and the reference object. For scores whose optimal value is zero (like the mean bias), the difference is calculated with absolute values.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • title (str) – Title for the plot.

  • bbox (list of floats) – Bounding box of plotting area [min_lon, max_lon, min_lat, max_lat].

  • file_formats (list of str) – List of file extensions.

  • point_size (float) – Point size (as defined in matplotlib.pyplot.scatter).

  • higher_above (bool) – If True, stations with higher score are plotted above. If False, stations with lower score are plotted above.

  • order_by (str) – Vertically order points according to this argument. It must be a column of the stations attribute of the Evaluator object.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute the mean scores.

  • vmin, vmax (None or scalar) – Min and max values for the legend colorbar. If None, the respective min and max of the score values are used.

  • vcenter (None or scalar) – If not None, matplotlib.colors.TwoSlopeNorm(vcenter, vmin, vmax) is used as the colorbar norm.

  • cmap (matplotlib.colors.Colormap object) – Colors used for plotting (default: matplotlib.cm.jet).

  • norm (matplotlib.colors.Normalize) – The Normalize instance scales the data values to the canonical colormap range [0, 1] for mapping to colors.

  • rivers (bool) – If True, rivers and lakes are drawn.

  • output_csv (str or None) – File where to save the data. The file name can contain {score} instead of the score name.

  • interp2d (bool) – If True, a 2D linear interpolation is performed on the score values.

  • sea_mask (bool) – If True, scores that would be drawn over sea are masked.

  • land_mask (bool) – If True, scores that would be drawn over land are masked.

  • bnd_resolution (str) – Resolution of coastlines and boundary lines. It can be ‘10m’, ‘50m’ or ‘110m’.

  • cmaplabel (str) – Label for the colormap.

  • extend (str) – Chosen among ‘neither’, ‘both’, ‘min’ or ‘max’. If not ‘neither’, make pointed end(s) to colorbar for out-of-range values.

  • land_color, sea_color (str) – Land/sea colors.

  • marker (str) – The marker style (ignored if you pass a mark_by instance).

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of self.stations. It must be of length two: the first element is the label of the column used to define the markers, and the second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • boundaries_above (bool) – If True, boundaries and coast lines are drawn above score data.

  • bbox_inches (str or matplotlib.transforms.Bbox) – Bounding box in inches: only the given portion of the figure is saved. If ‘tight’, try to figure out the tight bbox of the figure.

  • grid_resolution (couple of scalars) – Couple of scalars corresponding to the spacing of meridians and parallels in degrees.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or cartopy.mpl.geoaxes.GeoAxes) – Axis to use for the plot. If None, a new axis is created.

  • ne_features (list of dicts) – Each dictionary contains arguments to instantiate a cartopy.feature.NaturalEarthFeature(…). E.g. [dict(category=’cultural’, name=’admin_1_states_provinces’, facecolor=’none’, linestyle=’:’),] will add states/departments/provinces. If this argument is provided, the rivers, land_color and sea_color arguments are not taken into account.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • cartopy.mpl.geoaxes.GeoAxes – Axes object of the produced plot.
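
A sketch of a map of station scores; the Evaluator object obj, the bounding box and the colorbar settings are assumptions for illustration, and the mark_by value reuses the example given above (it requires an 'area' column in the station metadata):

from evaltools.plotting.plotting import plot_station_scores

# obj is assumed to be an evaltools.Evaluator object whose stations
# metadata contains an 'area' column.
fig, ax = plot_station_scores(
    obj,
    score='RMSE',
    forecast_day=0,
    bbox=[-10., 10., 40., 55.],   # [min_lon, max_lon, min_lat, max_lat]
    vmin=0., vmax=30.,
    cmaplabel='RMSE',
    extend='max',
    mark_by=('area', {'urb': 's', 'rur': 'o', 'sub': '^'}),
    output_file='./rmse_map',
)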

evaltools.plotting.plotting.plot_summary_bar_chart(objects_lol, forecast_day=0, averaging='mean', mean_obs=True, title='', labels_list=None, colors_list=None, groups_labels=None, output_file=None, file_formats=['png'], availability_ratio=0.75, adapt_size=0.25, ncol=2, subplots_adjust={'top': 0.85}, fig=None, ax=None)#

Synthetic plot of scores.

For each object, a bar is plotted for RMSE and lollipops for bias and correlation. Different groups of objects are separated by white space.

Scores are first computed for each measurement site and then averaged.

Parameters:
  • objects_lol (list of lists of evaltools.Evaluator objects) – Evaluator objects used for plotting. One list per group.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • averaging (str) – Type of score averaging chosen among ‘mean’ or ‘median’.

  • mean_obs (bool) – Whether to display the mean observed concentration.

  • title (str) – Title for the figure.

  • labels_list (None or list of str) – List of labels for the legend.

  • colors_list (None or list of str) – Bar colors corresponding to each object.

  • groups_labels (None or list of str) – List of labels for the groups.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast term to compute the mean scores.

  • adapt_size (float) – Coefficient to increase or reduce subplots width.

  • ncol (int) – Number of columns in legend.

  • subplots_adjust (dict) – Keyword arguments passed to matplotlib.pyplot.subplots_adjust.

evaltools.plotting.plotting.plot_taylor_diagram(objects, forecast_day=0, norm=True, colors=None, markers=None, point_size=100, labels=None, title='', threshold=0.75, frame=False, crmse_levels=10, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Taylor diagram.

This function is based on the Evaluator.spatiotemporal_scores method. Pearson correlation and variance ratio are first computed from all data of a chosen forecast day (values for all stations at all times are considered as a simple 1D array).

References

Karl E. Taylor (2001), “Summarizing multiple aspects of model performance in a single diagram”, Journal of Geophysical Research, Vol. 106.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • forecast_day (int) – Forecast day for which to use data to compute the Taylor diagram.

  • norm (bool) – If True, standard deviation and CRMSE are divided by the standard deviation of observations.

  • colors (None or list of str) – Marker colors corresponding to each object.

  • markers (None or list of str) – Marker shapes corresponding to each object.

  • point_size (float) – Point size (as defined in matplotlib.pyplot.scatter).

  • labels (list of str) – List of labels for the legend.

  • title (str) – Diagram title.

  • threshold (int or float) – Minimal number (if type(threshold) is int) or minimal rate (if type(threshold) is float) of data available in both obs and sim required to compute the scores.

  • frame (bool) – If False, top and right figure boundaries are not drawn.

  • crmse_levels (int) – Number of CRMSE arcs of circle.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
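
A sketch of a normalised Taylor diagram comparing two Evaluator objects (obj_a, obj_b, the labels and the output paths are assumptions):

from evaltools.plotting.plotting import plot_taylor_diagram

# obj_a and obj_b are assumed to be evaltools.Evaluator objects.
fig, ax = plot_taylor_diagram(
    [obj_a, obj_b],
    forecast_day=0,
    norm=True,                      # normalise by the observed standard deviation
    labels=['model A', 'model B'],
    markers=['o', '^'],
    output_csv='./taylor_data.csv',
    output_file='./taylor_diagram',
)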

evaltools.plotting.plotting.plot_time_scores(objects, score, term, hourly_timeseries=False, min_nb_sta=10, labels=None, colors=None, linestyles=None, markers=None, start_end=None, title='', score_name=None, black_axes=False, nb_of_minor_ticks=(2, 2), xticking='auto', date_format=None, outlier_thresh=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot time scores at a chosen time for each forecast day.

This function is based on the Evaluator.timeScores method. On each plot, there are as many lines as there are objects in the object list.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for plotting.

  • score (str) – Computed score.

  • term (int) – Integer corresponding to the chosen term used for plotting. If the series type of the objects is ‘hourly’, it refers to the forecast time (for example between 0 and 95 if the forecast horizon is 4 days). If the series type of the objects is ‘daily’, it refers to the forecast day (for example between 0 and 3 if the forecast horizon is 4 days).

  • hourly_timeseries (bool) – If True, every time step is plotted. In this case, the term argument refers to the forecast day. This argument is ignored if the series type of the objects is ‘daily’.

  • min_nb_sta (int) – Minimal number of station values available in both obs and sim required per datetime to compute the scores.

  • labels (None or list of str) – List of labels for the legend.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • title (str) – Title for the plot. It can contain {score} instead of the score name.

  • score_name (str | dict | None) – String to write in the title instead of the default score name. If a dictionary is passed, it must have a key corresponding to the score argument. If None, the score name is written as in the score argument.

  • xticking (str) – Defines the method used to set x ticks. It can be ‘auto’ (automatic), ‘mondays’ (a tick every Monday) or ‘daily’ (a tick every day).

  • date_format (str) – String format for dates as understood by python module datetime.

  • black_axes (bool) – If True, y=0 and x=0 lines are painted in black.

  • outlier_thresh (scalar or None) – If not None, it corresponds to the threshold used in evaltools.plotting._is_outlier to determine whether a model is an outlier. If outliers are detected, the y-axis boundaries do not take them into account.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • start_end (None or list of two datetime.date objects) – Boundary dates for abscissa.

  • output_csv (str or None) – File where to save the data. The file name must contain {score} instead of the score name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
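
A sketch for hourly objects with a four-day forecast horizon (obj_a, obj_b, the score name and the chosen term are assumptions; term=32 would correspond to 08:00 of the second forecast day):

from evaltools.plotting.plotting import plot_time_scores

fig, ax = plot_time_scores(
    [obj_a, obj_b],
    score='RMSE',
    term=32,
    min_nb_sta=10,
    labels=['model A', 'model B'],
    title='{score} at forecast time 32',
    xticking='mondays',
    output_csv='./time_scores_{score}.csv',
    output_file='./time_scores',
)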

evaltools.plotting.plotting.plot_time_series(objects, station_list=None, start_end=None, forecast_day=0, plot_type=None, envelope=False, min_nb_sta=1, colors=None, linestyles=None, markers=None, labels=None, obs_style=None, title='', ylabel='concentration', xticking='auto', date_format='%Y-%m-%d', ymin=None, ymax=None, black_axes=False, nb_of_minor_ticks=(2, 2), thresh=None, thresh_kw={}, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot time series of obs and sim at a chosen forecast day.

By default, there is one line per station and per object. Observations are taken from the first object of the list (if of type Evaluator).

Parameters:
  • objects (list of evaltools.Evaluator or evaltools.Simulations objects) – Evaluator or Simulations objects used for plotting.

  • station_list (None or list of str) – List of stations to display. If None, all stations of the first element of <objects> argument are processed.

  • start_end (None or list of two datetime.date objects) – Boundary dates for abscissa.

  • forecast_day (int) – Forecast day for which to plot the data.

  • plot_type (str) – If ‘median’ the median of all stations values for a given time is plotted. If ‘mean’ the mean of all stations values for a given time is plotted. For any other value, all station values are plotted separately.

  • envelope (bool) – Only used if plot_type is ‘median’ or ‘mean’. If True, draw the first and third quartiles around the median curve (case plot_type == ‘median’) or +/- the standard deviation around the mean curve (case plot_type == ‘mean’).

  • min_nb_sta (int) – Minimal number of values required to compute the median or mean of all stations. Ignored if plot_type == None.

  • colors (None or list of str) – Line colors corresponding to each object.

  • linestyles (None or list of str) – Line styles corresponding to each object.

  • markers (None or list of str) – Line markers corresponding to each object.

  • labels (None or list of str) – List of labels for the legend.

  • obs_style (dict) – Dictionary of arguments passed to pyplot.plot for observation curve. Default value is {‘color’: ‘k’, ‘alpha’: 0.5}.

  • title (str) – Title for the plot.

  • ylabel (str) – Label for the y axis.

  • xticking (str) – Defines the method used to set x ticks. It can be ‘auto’ (automatic), ‘mondays’ (a tick every Monday) or ‘daily’ (a tick every day).

  • date_format (str) – String format for dates as understood by python module datetime.

  • ymin, ymax (None or scalar) – Limits of the axes.

  • black_axes (bool) – If True, y=0 and x=0 lines are painted in black.

  • nb_of_minor_ticks (tuple of int) – Number of minor ticks for x and y axes.

  • thresh (None or float) – If not None, a horizontal line y = thresh is drawn (only if the line would appear inside the y axis limits).

  • thresh_kw (dict) – Additional keyword arguments passed to matplotlib when drawing the threshold line (only used if thresh argument is not None) e.g. {‘color’: ‘#FFA600’}.

  • output_csv (str or None) – File where to save the data. The file name must contain {model} instead of the model name.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
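
A sketch plotting the median time series over all stations with its interquartile envelope (obj_a, obj_b, the y label and the threshold value are assumptions):

from evaltools.plotting.plotting import plot_time_series

fig, ax = plot_time_series(
    [obj_a, obj_b],
    forecast_day=0,
    plot_type='median',             # median over all stations at each time
    envelope=True,                  # first/third quartile envelope
    labels=['model A', 'model B'],
    ylabel='concentration',
    thresh=120.,                    # horizontal reference line
    thresh_kw={'color': '#FFA600'},
    output_file='./time_series',
)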

evaltools.plotting.plotting.plot_values_scatter_plot(obj, station_list=None, start_end=None, forecast_day=0, title='', xlabel='observations', ylabel=None, black_axes=False, color_by=None, group_by=None, xmin=None, xmax=None, ymin=None, ymax=None, output_csv=None, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Scatter plot to compare directly observations and simulations.

By default, points are colored according to the density of points.

Parameters:
  • obj (evaltools.Evaluator object) – Object used for plotting.

  • station_list (None or list of str) – List of stations to display. If None, all stations of <obj> are processed.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.

  • forecast_day (int) – Integer corresponding to the chosen forecast day used for plotting.

  • title (str) – Title for the plots. It must contain {score} instead of the score name.

  • xlabel, ylabel (str) – Labels for x and y axes.

  • black_axes (bool) – If True, y=0 and x=0 lines are painted in black.

  • color_by (None or dictionary) – Dictionary with keys corresponding to station names and values corresponding to colors.

  • group_by (None or str) – If equal to ‘time’, the median of all stations is displayed for each time. If equal to ‘station’, the median of all times is displayed for each station. Otherwise, one point is plotted for each time of each station.

  • xmin, xmax, ymin, ymax (None or scalar) – Limits of the axes.

  • output_csv (str or None) – File where to save the data.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a popping window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.
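
A sketch of an obs/sim scatter plot with one point per station (obj and the axis limits are assumptions):

from evaltools.plotting.plotting import plot_values_scatter_plot

# obj is assumed to be an evaltools.Evaluator object.
fig, ax = plot_values_scatter_plot(
    obj,
    forecast_day=0,
    group_by='station',             # one point per station (median over times)
    xlabel='observations',
    ylabel='simulations',
    xmin=0., xmax=150., ymin=0., ymax=150.,
    output_file='./scatter_obs_sim',
)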

evaltools.tables module#

This module gathers functions designed to compute tables of scores.

evaltools.tables.average_scores(objects, forecast_day, score_list, score_type='temporal', averaging='median', title=None, labels=None, availability_ratio=0.75, min_nb_sta=10, output_file=None, output_latex=None, float_format=None)#

Build a table with average values of temporal or spatial scores.

This function is based on Evaluator.temporal_scores and Evaluator.spatial_scores methods. Scores are first computed for each station (score_type = ‘temporal’) or for each time (score_type = ‘spatial’). Then, the median or mean of these scores is taken.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for computation.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

  • score_list (list of str) – List of computed scores.

  • score_type (str) – Computing method selected among ‘temporal’ or ‘spatial’.

  • averaging (str) – Type of score averaging selected from ‘mean’ or ‘median’.

  • title (str or None) – Title to add in the output file.

  • labels (None or list of str) – Labels of the objects.

  • availability_ratio (float) – Minimal rate of data available on the period required to compute the temporal scores (if score_type = ‘temporal’) or to compute the temporal average (if score_type = ‘spatial’).

  • min_nb_sta (int) – Minimal number of stations required to compute the spatial average (if score_type = ‘temporal’) or to compute the spatial score (if score_type = ‘spatial’).

  • output_file (str) – File where to save the table. If None, the table is printed.

  • output_latex (str or None) – If not None, file where to save the table in a LaTeX layout.

  • float_format (str) – String format for table values.
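
A sketch building a table of median temporal scores for two experiments (obj_a, obj_b, the score names and the output path are assumptions):

from evaltools.tables import average_scores

average_scores(
    [obj_a, obj_b],
    forecast_day=0,
    score_list=['MeanBias', 'RMSE', 'PearsonR'],
    score_type='temporal',
    averaging='median',
    labels=['model A', 'model B'],
    output_file='./median_scores.txt',
)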

evaltools.tables.exceedancesScores(objects, forecast_day, thresholds, score_list=None, title=None, output_file=None, output_latex=None, labels=None, float_format=None, start_end=None)#

Contingency table.

Tables corresponding to the different thresholds are stored in the same file. Table values are computed with evaltools.scores.exceedances_scores function.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for computation.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

  • thresholds (list of scalar) – Threshold values.

  • score_list (list of str) – List of scores to display. If None, all available scores are displayed. Available scores are

    • ‘accuracy’ : Accuracy

    • ‘bias_score’ : Bias score

    • ‘success_ratio’ : Success ratio

    • ‘hit_rate’ : probability of detection (Hit rate)

    • ‘false_alarm_ratio’ : false alarm ratio

    • ‘prob_false_detect’ : probability of false detection

    • ‘threat_score’ : Threat Score

    • ‘equitable_ts’ : Equitable Threat Score

    • ‘peirce_ss’ : Peirce Skill Score (Hanssen and Kuipers discriminant)

    • ‘heidke_ss’ : Heidke Skill Score

    • ‘rousseau_ss’ : Rousseau Skill Score

    • ‘odds_ratio’ : Odds Ratio

    • ‘odds_ratio_ss’ : Odds Ratio Skill Score

  • title (str or None) – Title to add in the output file.

  • output_file (str or None) – File where to save the tables. If None, the tables are printed.

  • output_latex (str or None) – If not None, file where to save the tables in a LaTeX layout.

  • labels (None or list of str) – Labels of the objects.

  • float_format (str) – String format for table values.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.

evaltools.tables.exceedances_scores(objects, forecast_day, thresholds, score_list=None, title=None, output_file=None, output_latex=None, labels=None, float_format=None, start_end=None)#

Contingency table.

Tables corresponding to the different thresholds are stored in the same file. Table values are computed with evaltools.scores.exceedances_scores function.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for computation.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

  • thresholds (list of scalar) – Threshold values.

  • score_list (list of str) – List of scores to display. If None, all available scores are displayed. Available scores are

    • ‘accuracy’ : Accuracy

    • ‘bias_score’ : Bias score

    • ‘success_ratio’ : Success ratio

    • ‘hit_rate’ : probability of detection (Hit rate)

    • ‘false_alarm_ratio’ : false alarm ratio

    • ‘prob_false_detect’ : probability of false detection

    • ‘threat_score’ : Threat Score

    • ‘equitable_ts’ : Equitable Threat Score

    • ‘peirce_ss’ : Peirce Skill Score (Hanssen and Kuipers discriminant)

    • ‘heidke_ss’ : Heidke Skill Score

    • ‘rousseau_ss’ : Rousseau Skill Score

    • ‘odds_ratio’ : Odds Ratio

    • ‘odds_ratio_ss’ : Odds Ratio Skill Score

  • title (str or None) – Title to add in the output file.

  • output_file (str or None) – File where to save the tables. If None, the tables are printed.

  • output_latex (str or None) – If not None, file where to save the tables in a LaTeX layout.

  • labels (None or list of str) – Labels of the objects.

  • float_format (str) – String format for table values.

  • start_end (None or list of two datetime.date objects) – Boundary dates used to select only data for a sub-period.
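
A sketch producing contingency tables for two thresholds (obj_a, obj_b and the threshold values are assumptions; the score names come from the list above):

from evaltools.tables import exceedances_scores

exceedances_scores(
    [obj_a, obj_b],
    forecast_day=0,
    thresholds=[120., 180.],
    score_list=['accuracy', 'hit_rate', 'false_alarm_ratio'],
    labels=['model A', 'model B'],
    output_file='./exceedances.txt',
)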

evaltools.tables.medianStationScores(objects, forecast_day, score_list, title=None, output_file=None, output_latex=None, labels=None, availability_ratio=0.75, min_nb_sta=10, float_format=None)#

Build a table with median values of station scores.

This function is based on the Evaluator.stationScores method. Scores are first computed for each station for the chosen forecast day. Then, the median of these scores is taken.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for computation.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

  • score_list (list of str) – List of computed scores.

  • title (str or None) – Title to add in the output file.

  • output_file (str) – File where to save the table. If None, the table is printed.

  • output_latex (str or None) – If not None, file where to save the table in a LaTeX layout.

  • labels (None or list of str) – Labels of the objects.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • min_nb_sta (int) – Minimal number of stations required to compute the median of the scores.

  • float_format (str) – String format for table values.

evaltools.tables.median_station_scores(objects, forecast_day, score_list, title=None, output_file=None, output_latex=None, labels=None, availability_ratio=0.75, min_nb_sta=10, float_format=None)#

Build a table with median values of station scores.

This function is based on the Evaluator.stationScores method. Scores are first computed for each station for the chosen forecast day. Then, the median of these scores is taken.

Parameters:
  • objects (list of evaltools.Evaluator objects) – Evaluator objects used for computation.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

  • score_list (list of str) – List of computed scores.

  • title (str or None) – Title to add in the output file.

  • output_file (str) – File where to save the table. If None, the table is printed.

  • output_latex (str or None) – If not None, file where to save the table in a LaTeX layout.

  • labels (None or list of str) – Labels of the objects.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast time to compute the scores for each station.

  • min_nb_sta (int) – Minimal number of stations required to compute the median of the scores.

  • float_format (str) – String format for table values.

evaltools.quarter module#

This module defines the Quarter class.

class evaltools.quarter.Quarter(start_date, end_date)#

Bases: object

Year quarter.

Class of objects defining year quarters (used in Evaluator.quarterlyMedianScore and plotting.plot_quarterlyMedianScore). Quarters can be cut in the classic way (the first quarter begins in January) or in the climatological way (the first quarter begins in March).

property endDate#

Deprecated.

classmethod from_string(string)#

Construct a Quarter from its string representation.

Parameters:

string (str) – String representation of the Quarter to construct.

range(start_quarter)#

Consecutive quarters list.

Parameters:

start_quarter (Quarter object) – First quarter to insert in the list.

Returns:

list of quarter.Quarter – List of consecutive quarters ranging from start_quarter to the current object.

property startDate#

Deprecated.
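
A sketch of building quarters and listing the consecutive quarters between them (the boundary dates are an illustration; how they must align with the chosen quarter cut is an assumption):

import datetime

from evaltools.quarter import Quarter

q_first = Quarter(datetime.date(2023, 1, 1), datetime.date(2023, 3, 31))
q_last = Quarter(datetime.date(2023, 10, 1), datetime.date(2023, 12, 31))
quarters = q_last.range(q_first)   # [q_first, ..., q_last]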

evaltools.timeseries module#

This module gathers time series processing functions.

evaltools.timeseries.check_timeseries_equality(file1, file2, start_date=None, end_date=None, significant=5, verbose=False)#

Check the equality of two timeseries files.

Parameters:
  • file1 (str) – Path of the first file to read.

  • file2 (str) – Path of the second file to read.

  • start_date (datetime object) – The date from which to start comparing the timeseries. If None, the later of the two timeseries starting dates is taken.

  • end_date (datetime object) – The date at which to end comparing the timeseries. If None, the later of the two timeseries ending dates is taken.

  • significant (int) – Number of significant digits to take into account when comparing two values.

  • verbose (bool) – If set to True, info is printed when a difference is found.

Returns:

bool – Boolean set to True if the timeseries are equal or set to False if not.

evaltools.timeseries.check_timeseries_integrity(file_path, verbose=False, correction=False)#

Check integrity of a timeseries file.

Parameters:
  • file_path (str) – Path of the file to read.

  • verbose (bool) – If set to True, info is printed about the ts.

  • correction (bool) – If set to True, the ts is corrected and the original file is overwritten.

Returns:

  • ok (bool) – Boolean set to True if the timeseries is in a correct format or set to False if not.

  • f (pandas.DataFrame) – Corrected timeseries

evaltools.timeseries.daily_max(df, availability_ratio=0.75)#

Compute daily maximum.

Parameters:
  • df (pandas.DataFrame) – DataFrame with columns corresponding to stations and with a datetime index such that there are 24 rows for each day.

  • availability_ratio (float) – Minimal rate of data available in a day required to compute the daily maximum.

Returns:

pandas.DataFrame – DataFrame with columns corresponding to stations and with date index.

evaltools.timeseries.daily_mean(df, availability_ratio=0.75)#

Compute daily mean.

Parameters:
  • df (pandas.DataFrame) – DataFrame with columns corresponding to stations and with a datetime index such that there are 24 rows for each day.

  • availability_ratio (float) – Minimal rate of data available in a day required to compute the daily mean.

Returns:

pandas.DataFrame – DataFrame with columns corresponding to stations and with date index.

evaltools.timeseries.filtered_series(df, availability_ratio=0.75)#

Subtract the daily cycle from the original series.

The daily cycle is defined as the mean over all days for a given hour.

Parameters:
  • df (pandas.DataFrame) – DataFrame with columns corresponding to stations and with a datetime index such that there are 24 rows for each day.

  • availability_ratio (float) – Minimal rate of values available for a given hour to compute the daily filtered series for this hour in all days.

Returns:

pandas.DataFrame with same shape as df.

evaltools.timeseries.get_DF(stations_idx, generic_file_path, start_date, end_date, lag, correc_unit=1, series_type='hourly', keep='last', step=1)#

Collect data from timeseries files for specified stations.

Parameters:
  • stations_idx (list of str) – List containing the names of studied stations.

  • generic_file_path (str) – Generic path of timeseries files with {year} instead of the year number and {station} instead of the station name.

  • start_date (datetime object) – The date from which data is collected.

  • end_date (datetime object) – The date until which data is collected.

  • lag (int) – Number of days with which the period start_date->end_date is shifted.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • keep ({‘first’, ‘last’, False}) – Adopted behavior when duplicated times are found in a file: ‘first’ keeps the first occurrence only, ‘last’ keeps the last occurrence only, and False drops every occurrence.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

pandas.DataFrame – DataFrame with datetime index and one column per station (stations whose corresponding file is not found are dropped).

evaltools.timeseries.get_df(stations_idx, generic_file_path, start_date, end_date, lag, correc_unit=1, series_type='hourly', keep='last', step=1)#

Collect data from timeseries files for specified stations.

Parameters:
  • stations_idx (list of str) – List containing the names of studied stations.

  • generic_file_path (str) – Generic path of timeseries files with {year} instead of the year number and {station} instead of the station name.

  • start_date (datetime object) – The date from which data is collected.

  • end_date (datetime object) – The date until which data is collected.

  • lag (int) – Number of days with which the period start_date->end_date is shifted.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • keep ({‘first’, ‘last’, False}) – Adopted behavior when duplicated times are found in a file: ‘first’ keeps the first occurrence only, ‘last’ keeps the last occurrence only, and False drops every occurrence.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

pandas.DataFrame – DataFrame with datetime index and one column per station (stations whose corresponding file is not found are dropped).

evaltools.timeseries.moving_average(x, n, availability_ratio=0.75)#

Compute the moving average with window size n on a vector x.

Parameters:
  • x (1D numpy.ndarray) – Vector used for computing.

  • n (int) – Window size of the moving average.

  • availability_ratio (float) – Minimal rate of values available to compute the average for a given window.

Returns:

numpy.ndarray of length len(x)-n+1.
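
A self-contained sketch with synthetic data (the exact handling of windows below the availability threshold is an assumption):

import numpy as np

from evaltools.timeseries import moving_average

x = np.array([1., 2., np.nan, 4., 5., 6., 7., 8., 9., 10.])
# Window of n=8 values; with availability_ratio=0.75, at least 6 of the
# 8 values must be available to compute a window average.
res = moving_average(x, 8, availability_ratio=0.75)
print(res.shape)   # (3,) since len(x) - n + 1 = 3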

evaltools.timeseries.moving_average_daily_max(df, availability_ratio=0.75)#

Compute the daily maximum eight hourly average.

The maximum daily 8-hour mean concentration is selected by examining 8-hour running averages, calculated from hourly data and updated each hour. Each 8-hour average so calculated is assigned to the day on which it ends; that is, the first calculation period for any one day is the period from 17:00 on the previous day to 01:00 on that day, and the last calculation period for any one day is the period from 16:00 to 24:00 on that day.

As we need data from the day before to compute the daily maximum eight hourly average, the resulting DataFrame will contain one day less than the input one.

Parameters:
  • df (pandas.DataFrame) – DataFrame with columns corresponding to stations and with a datetime index such that there are 24 rows for each day.

  • availability_ratio (float) – Minimal rate of values available to compute the average for a given 8-hour window, and also minimal rate of values available for a given day to compute the daily maximum of the moving average. For example, with availability_ratio=0.75, a daily maximum eight-hour average can only be calculated if 18 eight-hour averages are available, each of which requires 6 hourly values to be available.

Returns:

pandas.DataFrame – DataFrame with columns corresponding to stations and with date index.
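
A self-contained sketch with two days of synthetic hourly data (the random values are only an illustration):

import numpy as np
import pandas as pd

from evaltools.timeseries import moving_average_daily_max

rng = np.random.default_rng(0)
idx = pd.date_range('2023-07-01 00:00', periods=48, freq='h')
df = pd.DataFrame(rng.uniform(0., 120., size=(48, 2)),
                  index=idx, columns=['station_A', 'station_B'])
# The first day only feeds the 17:00 -> 01:00 windows of the second
# day, so the result contains one day less than the input.
mda8 = moving_average_daily_max(df, availability_ratio=0.75)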

evaltools.timeseries.normalized_series(df)#

Normalize series by subtracting the median and dividing by Q3-Q1.

Parameters:

df (pandas.DataFrame) – DataFrame with columns corresponding to stations and with datetime index.

Returns:

pandas.DataFrame with same shape as df.

evaltools.timeseries.readbigtimeseries(start_date, end_date, lag, fichier, series_type, stations, correc_unit=1)#

Collect data from big timeseries files.

Big timeseries files are text files where each line contains a station code, a time (yyyy-mm-dd_hh) and a value, separated by a space.

Parameters:
  • start_date (datetime object) – The date from which data is collected.

  • end_date (datetime object) – The date until which data is collected.

  • lag (int) – Number of days with which the period start_date->end_date is shifted.

  • fichier (str) – Path to the file to read.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • stations (list of str) – Stations to keep.

Returns:

pandas.DataFrame – DataFrame with datetime index and one column containing read values.

evaltools.timeseries.readtimeseries(start_date, end_date, lag, lfiles, correc_unit=1, series_type='hourly', keep='last', step=1)#

Collect data from timeseries files.

Timeseries files are text files where each line contains a time (yyyymmddhh) or a date (yyyymmdd) and a value, separated by spaces.

Parameters:
  • start_date (datetime object) – The date from which data is collected.

  • end_date (datetime object) – The date until which data is collected.

  • lag (int) – Number of days with which the period start_date->end_date is shifted.

  • lfiles (list of str) – List containing the paths of files to read.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • keep ({‘first’, ‘last’, False}) – Adopted behavior when duplicated times are found in a file: ‘first’ keeps the first occurrence only, ‘last’ keeps the last occurrence only, and False drops every occurrence.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

pandas.DataFrame – DataFrame with datetime index and one column containing read values.

evaltools.utils module#

This module gathers ancillary functions.

evaltools.utils.get_params()#

Get console arguments and read configuration file.

One of the console arguments must be: cfg=<configuration file path>. Configuration file must be structured like:

[cat1]
var1=...
var2=...
...
[cat2]
...

The function then returns the dictionary:

{'console': {'cfg': <configuration file path>, 'arg1': ..., ...},
 'cat1': {'var1': ..., 'var2': ..., ...},
 'cat2': {'var1': ..., 'var2': ..., ...},
 ...}
evaltools.utils.read_listing(listing_path, classes='all', species=None, types='all', area_coord=None, sub_list=None, sep='\\s+', keep_all_cols=True, decimal='.', filters=None)#

Read station list file.

The input file must be a text file with one or more spaces as data separator. The first row is interpreted as the header and must contain ‘code’, ‘lat’ and ‘lon’ (plus ‘type’ and ‘area’ if a screening is performed on the station type, and ‘class’ or the species name if a screening is performed on the class). The first column must contain station codes.

Parameters:
  • listing_path (str) – Listing directory.

  • classes (str) – If not ‘all’, stations are filtered by their class number. For example, specify classes=”1-2-4” if you want to keep stations classed as 1, 2 or 4 only.

  • species (str) – Only used if classes != ‘all’, it corresponds to the class column name in the listing.

  • types (‘all’ or list of tuple types) – For example types=[(‘bac’,’urb’), (‘ind’,’urb’), (‘tra’,’urb’)]. The first element of a tuple corresponds to the type column of the listing and the second one corresponds to the area column.

  • area_coord (list) – List of the form [min_longitude, max_longitude, min_latitude, max_latitude] corresponding to the bounding box of the studied area. A station located exactly on the boundary will be accepted.

  • sub_list (None or 1D array of str) – List of stations to keep.

  • sep (str) – Separator used in the listing file.

  • keep_all_cols (bool) – If True, all columns of the listing files are kept in the returned dataframe. If False, only columns used for screening are kept.

  • decimal (str) – Character to recognize as decimal point.

  • filters (dict) – Dictionary with keys corresponding to one or several columns of the listing. The values of the dictionary are Boolean functions applied to the series corresponding to their key; rows with a False result are discarded. The values can also be lists, in which case the Boolean function is x -> True if x is in the list else False.

Returns:

pandas.DataFrame – DataFrame containing ‘code’, ‘lat’ and ‘lon’ of the filtered stations; ‘type’, ‘area’ if a screening is performed on the station type; and also their class if classes is not None.
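
A sketch reading a hypothetical listing file 'stations.lst' (with at least the columns 'code', 'lat', 'lon', 'type' and 'area'); the type/area filter reuses the example given above:

from evaltools.utils import read_listing

stations = read_listing(
    'stations.lst',
    types=[('bac', 'urb'), ('ind', 'urb'), ('tra', 'urb')],
    area_coord=[-10., 10., 40., 55.],   # [min_lon, max_lon, min_lat, max_lat]
)
print(stations[['lat', 'lon']].head())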

evaltools.dataset module#

This module defines Dataset and Store classes.

Dataset is designed to process input data from several formats. Store is designed to store time series values in netcdf format.

class evaltools.dataset.Dataset(stations, start_date, end_date, species='', series_type='hourly', step=1)#

Bases: object

Dataset class for evaltools input data.

This class is based on pandas.DataFrame class. The main attribute (data) is a pandas DataFrame with datetime index and a list of stations as columns.

addNewStations(stations)#

Add new stations to the Dataset.

Add new stations to the Dataset if they are not already present. Modify Dataset object in place.

Parameters:

stations (1D array-like) – List of stations to add.

add_new_stations(stations)#

Add new stations to the Dataset.

Add new stations to the Dataset if they are not already present. Modify Dataset object in place.

Parameters:

stations (1D array-like) – List of stations to add.

add_stations_metadata(listing_path, **kwargs)#

Read station metadata.

The first column of the listing file must contain the station codes. Metadata is saved in the attribute self.stations.

Parameters:
  • listing_path (str) – Path to the listing file containing metadata.

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

check_threshold(threshold, drop=False, file_path=None)#

Check if values exceed a threshold.

If there are values above the threshold, a message is printed and these values are set to nan if drop == True.

Parameters:
  • threshold (scalar) – Threshold value.

  • drop (bool) – If True, values above the threshold are set to nan.

  • file_path (None or str) – File path where to save the names of stations that exceed the threshold.

property date_format#

Get the date format according the series type of the data.

drop_unrepresentative_stations(availability_ratio=0.75, drop=True)#

List stations with a certain rate of missing values.

Parameters:
  • availability_ratio (float or None) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • drop (bool) – If True, the dataset is modify inplace by dropping unrepresentative stations.

Returns:

List of stations that do not fulfill the condition.

property endDate#

Deprecated.

property metadata#

Get the metadata.

nan_rate()#

Compute rate of missing values in the dataframe.

nb_values_timeseries()#

Plot number of not nan values.

property seriesType#

Deprecated.

property startDate#

Deprecated.

subPeriod(start_date, end_date)#

Build a new Dataset object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Dataset object.

sub_period(start_date, end_date)#

Build a new Dataset object defined on a shorter period.

Parameters:
  • start_date (datetime.date) – Starting date of the new object.

  • end_date (datetime.date) – Ending date of the new object.

Returns:

Dataset object.

summary()#

Print summary statistics on the data.

Nan rate, minimum and maximum are computed for each station. Then, the minimum, maximum and median of each of these statistics are displayed.

to_netcdf(file_path, var_name=None, group=None, dim_names={}, coord_var_names={}, metadata_variables=[], **kwargs)#

Write data in netcdf format.

Parameters:
  • file_path (str) – Path of the output file. If the file does not already exist, it is created.

  • var_name (str) – Name of the variable to create.

  • group (None or str) – Netcdf group where to store the data within the netcdf file. If None, the root group is used.

  • dim_names (dict) – Use to specify dimension names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • coord_var_names (dict) – Use to specify coordinate variable names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • metadata_variables (list of str) – List of metadata variables of self.metadata to add in the netcdf.

  • kwargs (dict) – Additional keyword arguments passed to Store.new_concentration_var().

to_txt(output_path)#

Write one txt file per station.

Parameters:

output_path (str) – Path of the output file. The Path must contain {station} instead of the station code.

updateFromDataset(dataset, correc_unit=1)#

Update the Dataset with values from another one.

Update nan values with values from another Dataset or from a pandas.DataFrame.

Parameters:
  • dataset (evaltools.dataset.Dataset or pandas.DataFrame) – Dataset where to take data.

  • correc_unit (float) – Multiplicative factor applied to original values.

updateFromTimeSeries(generic_file_path, correc_unit=1)#

Update nan values of the current object with timeseries files.

Timeseries files are text files where each line contains a time (yyyymmddhh) or a date (yyyymmdd) and a value, separated by spaces.

Parameters:
  • generic_file_path (str) – Generic path of timeseries files with {year} instead of the year number and {station} instead of the station name.

  • correc_unit (float) – Multiplicative factor applied to original values.

update_from_dataset(dataset, correc_unit=1)#

Update the Dataset with values from another one.

Update nan values with values from another Dataset or from a pandas.DataFrame.

Parameters:
  • dataset (evaltools.dataset.Dataset or pandas.DataFrame) – Dataset where to take data.

  • correc_unit (float) – Multiplicative factor applied to original values.

update_from_time_series(generic_file_path, correc_unit=1)#

Update nan values of the current object with timeseries files.

Timeseries files are text files where each line contains a time (yyyymmddhh) or a date (yyyymmdd) and a value, separated by spaces.

Parameters:
  • generic_file_path (str) – Generic path of timeseries files with {year} instead of the year number and {station} instead of the station name.

  • correc_unit (float) – Multiplicative factor applied to original values.
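
A sketch of a typical Dataset workflow (the station codes, file paths and species name are assumptions used only for illustration):

import datetime

from evaltools.dataset import Dataset

ds = Dataset(
    ['FR0001', 'FR0002', 'FR0003'],
    start_date=datetime.date(2023, 1, 1),
    end_date=datetime.date(2023, 1, 31),
    species='o3',
    series_type='hourly',
)
# Fill nan values from per-station text files, attach metadata, drop
# stations with too many missing values and store the result as netcdf.
ds.update_from_time_series('/data/obs/{year}/{station}.txt')
ds.add_stations_metadata('stations.lst')
ds.drop_unrepresentative_stations(availability_ratio=0.75)
ds.to_netcdf('obs_o3_202301.nc', var_name='o3')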

class evaltools.dataset.Store(file_path, group=None, read_only=False, dim_names={}, coord_var_names={}, series_type='hourly')#

Bases: object

Tool designed for storing time series data in netcdf format.

To be handled by this class, netcdf variables must be 2-dimensional: the first dimension corresponding to time and the second one to the different measurement sites.

add_metadata_var(name, values, attrs={})#

Add a new metadata variable to the netcdf group.

Parameters:
  • name (str) – Name of the variable to create.

  • values (1D array like) – Values of the metadata variable as a vector of size the number of stations stored in the netcdf group.

  • attrs (dict) – Dictionary of attributes for the new variable, (keys corresponding to name of the attributes and values to values of the attributes).

add_stations(new_stations)#

Add stations to the netcdf group.

New station codes must not already be present.

Parameters:

new_stations (list of str) – List of the codes of the new stations.

default_coord_var_names = {'station_id': 'station_id', 'time': 'time'}#

default_dim_names = {'station_id': 'station_id', 'time': 'time'}#

property endDate#

Deprecated.

get_dataset(name, dataset_name=None, start_date=None, end_date=None, stations=None, metadata_var={}, series_type='hourly', step=1, keep_dup_sta='first')#

Get data contained within a variable of the netcdf group.

Requested variable must be 2-dimensional: the first dimension corresponding to time and the second one to the different measurement sites.

Parameters:
  • name (str) – Name of the variable to retrieve.

  • dataset_name (str) – Species name given to the returned dataset.

  • start_date (datetime.date object) – The date from which data is collected.

  • end_date (datetime.date object) – The date until which data is collected.

  • stations (None or list of str) – List of stations to keep in the returned dataset.

  • metadata_var (dict) – Dictionary that defines the metadata variables to get from the netcdf file. Keys of the provided dictionary are variable names as found in the file, and values are the variable names used for the returned dataset. These metadata variables must have one dimension only, corresponding to the station codes.

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

  • drop_duplicates (bool) – If True, stations with several entries are dropped.

  • keep_dup_sta ({‘first’, ‘last’, False}, default ‘first’) –

    Method to handle dropping duplicated stations:

    • ‘first’ : drop duplicates except for the first occurrence.

    • ‘last’ : drop duplicates except for the last occurrence.

    • False : drop all duplicates.

Returns:

evaltools.dataset.Dataset
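
A minimal read sketch with placeholder file, variable and station names (the evt alias for evaltools follows the convention used elsewhere in this documentation):

    from datetime import date

    import evaltools as evt

    store = evt.dataset.Store("./obs_o3.nc", read_only=True)
    ds = store.get_dataset(
        "o3",
        dataset_name="o3",
        start_date=date(2021, 1, 1),
        end_date=date(2021, 1, 31),
        stations=["FR12345", "FR54321"],
        metadata_var={"lat": "lat", "lon": "lon"},
        series_type="hourly",
    )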

get_station_ids()#

Return station codes found in the netcdf group.

get_times()#

Return time steps found in the netcdf group.

property latitudes#

Get the sequence of latitudes in the netcdf file.

property longitudes#

Get the sequence of longitudes in the netcdf file.

new_concentration_var(name, attrs={}, zlib=False, least_significant_digit=None, complevel=4)#

Add a new concentration variable to the netcdf group.

Parameters:
  • name (str) – Name of the variable to create.

  • attrs (dict) – Dictionary of attributes for the new variable (keys are the attribute names and values are the attribute values).

  • zlib (bool) – If True, data assigned to the Variable instance is compressed on disk.

  • least_significant_digit (int) – If specified, variable data will be truncated (quantized). In conjunction with zlib=True this produces ‘lossy’, but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4).

  • complevel (int) – Level of zlib compression to use (1 is the fastest, but poorest compression, 9 is the slowest but best compression).

classmethod new_file(file_path, stations, start_date, end_date, group=None, dim_names={}, coord_var_names={}, series_type='hourly', step=1)#

Build a Store object by writing a new netcdf file.

If the file already exists, it is truncated.

Parameters:
  • file_path (str) – Path of the file to create.

  • stations (1D array of str) – List of stations in the returned Store object.

  • start_date (datetime.date) – Starting date of the file data.

  • end_date (datetime.date) – Ending date of the file data.

  • group (None or str) – Netcdf group, within the netcdf file, where the data is stored. If None, the root group is used.

  • dim_names (dict) – Used to specify the dimension names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • coord_var_names (dict) – Used to specify the coordinate variable names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • series_type (str) – It can be ‘hourly’ (values stored with an hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

New Store object.

classmethod new_group(file_path, stations, start_date, end_date, group, dim_names={}, coord_var_names={}, series_type='hourly', step=1)#

Create a new group inside a netcdf file.

Parameters:
  • file_path (str) – Path of the file to create.

  • stations (1D array of str) – List of stations in the returned Store object.

  • start_date (datetime.date) – Starting date of the file data.

  • end_date (datetime.date) – Ending date of the file data.

  • group (None or str) – Netcdf group, within the netcdf file, where the data is stored. If None, the root group is used.

  • dim_names (dict) – Used to specify the dimension names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • coord_var_names (dict) – Used to specify the coordinate variable names of the netcdf file. Default names are {‘time’: ‘time’, ‘station_id’: ‘station_id’}.

  • series_type (str) – It can be ‘hourly’ (values stored with an hourly timestep) or ‘daily’ (values stored with a daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

New Store object.

property nsta#

Get the number of stations in the netcdf file.

property ntimes#

Get the number of times in the netcdf file.

property startDate#

Deprecated.

property stations#

Get the station list in the netcdf file.

property times#

Get the sequence of times in the netcdf file.

update(name, dataset, add_new_stations=False)#

Update a variable of the netcdf file with a Dataset object.

Modify a variable of the netcdf file using non-NA values from the passed Dataset object.

Parameters:
  • name (str) – Name of the variable to update.

  • dataset (evaltools.dataset.Dataset object) – Dataset object containing new values to add to the file.

  • add_new_stations (bool) – If True, stations only present in the dataset are also added to the netcdf file.

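A typical write workflow for the Store class, sketched with placeholder paths and station codes; `ds` is assumed to be an existing evaltools.dataset.Dataset covering the same period and stations:

    from datetime import date

    import evaltools as evt

    store = evt.dataset.Store.new_file(
        "./sim_o3.nc",
        stations=["FR12345", "FR54321"],
        start_date=date(2021, 1, 1),
        end_date=date(2021, 12, 31),
        series_type="hourly",
        step=1,
    )
    store.new_concentration_var("o3", attrs={"units": "ug/m3"}, zlib=True)
    store.update("o3", ds, add_new_stations=False)
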
evaltools.dataset.delta_tool_formatting(dataset_dict, output_dir)#

Write csv compatible with the JRC-DeltaTool.

Parameters:
  • dataset_dict (dictionary of evt.dataset.Dataset) – Keys of the dictionary must be species known by DeltaTool.

  • output_dir (str) – Directory where to write the files (one file per measurement site).

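A minimal sketch, assuming `obs_o3` and `obs_no2` are existing evaltools.dataset.Dataset objects and that the dictionary keys used here match species names known by DeltaTool:

    import evaltools as evt

    evt.dataset.delta_tool_formatting(
        {"O3": obs_o3, "NO2": obs_no2},
        output_dir="./deltatool_csv",  # one csv file per measurement site
    )
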
evaltools.dataset.timeRange(start_date, end_date, series_type, step=1)#

Create a time range.

Parameters:
  • start_date (datetime.date object) – The date from which the range starts.

  • end_date (datetime.date object) – Ending date (included) of the range.

  • series_type (str) – It can be ‘hourly’ (hourly timestep) or ‘daily’ (daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

List of datetime.date or datetime.datetime

evaltools.dataset.time_range(start_date, end_date, series_type, step=1)#

Create a time range.

Parameters:
  • start_date (datetime.date object) – The date from which the range starts.

  • end_date (datetime.date object) – Ending date (included) of the range.

  • series_type (str) – It can be ‘hourly’ (hourly timestep) or ‘daily’ (daily timestep).

  • step (int) – Time step in hours (ignored if series_type == ‘daily’).

Returns:

List of datetime.date or datetime.datetime
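
For instance, a minimal sketch building hourly and daily ranges over the first two days of 2021 (end date included):

    from datetime import date

    import evaltools as evt

    hourly = evt.dataset.time_range(
        date(2021, 1, 1), date(2021, 1, 2), series_type="hourly", step=1
    )  # hourly datetimes covering both days
    daily = evt.dataset.time_range(
        date(2021, 1, 1), date(2021, 1, 2), series_type="daily"
    )  # the two dates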

evaltools.netcdf module#

This module gathers netcdf processing functions.

evaltools.netcdf.Simulations_fromNetCDF(generic_file_path, stations, species, model, start, end, forecast_horizon=1, correc_unit=1, series_type='hourly', date_format='%Y%m%d.%f', level=0, availability_ratio=0.25, fill_value=None, times_name='Times', lon_name='lon', lat_name='lat', common_grid=True, nb_ignore=0, nb_keep=0, time_delta=None)#

Construct a Simulations object from netcdf files.

Multiple datetimes across files are overwritten.

Parameters:
  • generic_file_path (str) – Generic path of the netCDF files with {forecastDay} instead of the forecast day number.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • species (str) – Species name (ex: “o3”).

  • model (str) – Name of the model that produced the simulated data.

  • start (datetime.date) – Start day of the studied period.

  • end (datetime.date) – End day (included) of the studied period.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – It can be ‘hourly’ or ‘daily’.

  • date_format (str) – Format of dates contained in your netCDF files (strftime() reference). Special case : use ‘%Y%m%d.%f’ for floats of type 20160925.25, and ‘%H’ for times of type “hours/days since yyyy-mm-dd” OR if units and calendar attributes are set (automatic conversion).

  • level (int or False) – Indicates which level (altitude) has to be taken. Use False if there are no levels in the files.

  • availability_ratio (float or None or False) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • fill_value (scalar or any valid FillValue from netCDF) – Indicates which value is used as NaN in netCDF files. Often equals -999.

  • times_name, lon_name, lat_name (str) – Names of the variables corresponding to dates/times, longitude and latitude.

  • common_grid (boolean) – Indicates if all provided files share a common grid or not. If True, processing will be slightly faster.

  • nb_ignore (int) – For each set of datetimes in files, number of hours/dates to ignore before fetching data. For example, if your files contain 96h of data but you only want values starting from the second day (i.e. from the 25th hour), nb_ignore should be 24.

  • nb_keep (int) – If nb_ignore is different from 0, nb_keep is used to indicate how many hours/dates of data should be kept, starting from nb_ignore. Any value <= 0 means all data from nb_ignore to the end is kept.

  • time_delta (dict) – Dict of arguments passed to the datetime.timedelta function, used to shift the time values of the netcdf file. Sometimes “cdo -showtimestamp” gives a different (and more accurate) list of times than what may be computed from “ncdump -v times”.

Returns:

evaltools.evaluator.Simulations object.

evaltools.netcdf.get_df(species, stations, generic_file_path, start_date, end_date, lag, correc_unit=1, series_type='hourly', date_format='%Y%m%d.%f', level=0, times_name='Times', lon_name='lon', lat_name='lat', common_grid=True, ign_keep=(0, 0), time_delta=None)#

Collect data from netCDF files for specified stations.

Parameters:
  • species (str) – Name of the variable to read in netCDF files.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • generic_file_path (str) – Generic path of netCDF files. * can be used as a joker character.

  • start_date (datetime object) – The date from which data is collected.

  • end_date (datetime object) – The date until which data is collected.

  • lag (int) – Number of days with which the period start_date->end_date is shifted.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – Must be equal to ‘hourly’ if time series files contain one row per hour, or equal to ‘daily’ if they contain one row per day.

  • date_format (str) – Format of dates contained in your netCDF files (strftime() reference). Special cases : use ‘%Y%m%d.%f’ for floats of type 20160925.25, and ‘%H’ for times of type “hours/days since yyyy-mm-dd” OR if units and calendar attributes are set (automatic conversion).

  • level (int or False) – Indicates which level (altitude) has to be taken. Use False if there are no levels in the files.

  • times_name, lon_name, lat_name (str) – Names of the variables corresponding to dates/times, longitude and latitude.

  • common_grid (boolean) – Indicates if all provided files share a common grid or not. If True, processing will be slightly faster.

  • ign_keep (tuple of int) – For each set of datetimes in files, the first int is the number of hours/dates to ignore before fetching data, and the second int is the number of hours/dates of data to keep after that. For example, if your files contain 96h of data but you only want values from the second and third days (i.e. 48 hours, from the 25th to the 72nd hour), ign_keep should be (24, 48).

  • time_delta (dict) – Dict of arguments passed to the datetime.timedelta function, used to shift the time values of the netcdf file. Sometimes “cdo -showtimestamp” gives a different (and more accurate) list of times than what may be computed from “ncdump -v times”.

Returns:

pandas.DataFrame – DataFrame with datetime index and one column per station (stations whose corresponding file is not found are dropped).

evaltools.netcdf.simulations_from_netCDF(generic_file_path, stations, species, model, start, end, forecast_horizon=1, correc_unit=1, series_type='hourly', date_format='%Y%m%d.%f', level=0, availability_ratio=0.25, fill_value=None, times_name='Times', lon_name='lon', lat_name='lat', common_grid=True, nb_ignore=0, nb_keep=0, time_delta=None)#

Construct a Simulations object from netcdf files.

Multiple datetimes across files are overwritten.

Parameters:
  • generic_file_path (str) – Generic path of the netCDF files with {forecastDay} instead of the forecast day number.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • species (str) – Species name (ex: “o3”).

  • model (str) – Name of the model that produced the simulated data.

  • start (datetime.date) – Start day of the studied period.

  • end (datetime.date) – End day (included) of the studied period.

  • forecast_horizon (int) – Number of days corresponding to the forecast horizon of the model.

  • correc_unit (float) – Multiplicative factor applied to original values.

  • series_type (str) – It can be ‘hourly’ or ‘daily’.

  • date_format (str) – Format of dates contained in your netCDF files (strftime() reference). Special case : use ‘%Y%m%d.%f’ for floats of type 20160925.25, and ‘%H’ for times of type “hours/days since yyyy-mm-dd” OR if units and calendar attributes are set (automatic conversion).

  • level (int or False) – Indicates which level (altitude) has to be taken. Use False if there are no levels in the files.

  • availability_ratio (float or None or False) – Minimal rate of data available on the period required to keep a station. If None, stations with only nan values are dropped.

  • fill_value (scalar or any valid FillValue from netCDF) – Indicates which value is used as NaN in netCDF files. Often equals -999.

  • times_name, lon_name, lat_name (str) – Names of the variables corresponding to dates/times, longitude and latitude.

  • common_grid (boolean) – Indicates if all provided files share a common grid or not. If True, processing will be slightly faster.

  • nb_ignore (int) – For each set of datetimes in files, number of hours/dates to ignore before fetching data. For example, if your files contain 96h of data but you only want values starting from the second day (i.e. from the 25th hour), nb_ignore should be 24.

  • nb_keep (int) – If nb_ignore is different from 0, nb_keep is used to indicate how many hours/dates of data should be kept, starting from nb_ignore. Any value <= 0 means all data from nb_ignore to the end is kept.

  • time_delta (dict) – Dict of arguments passed to the datetime.timedelta function, used to shift the time values of the netcdf file. Sometimes “cdo -showtimestamp” gives a different (and more accurate) list of times than what may be computed from “ncdump -v times”.

Returns:

evaltools.evaluator.Simulations object.
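
A usage sketch with placeholder paths, station codes and metadata columns; the column names in the stations DataFrame are assumptions:

    from datetime import date

    import pandas as pd

    import evaltools as evt

    stations = pd.DataFrame(
        {"lat": [48.85, 43.60], "lon": [2.35, 1.44]},
        index=["FR12345", "FR54321"],
    )
    sim = evt.netcdf.simulations_from_netCDF(
        "./raw/model_D{forecastDay}.nc",  # {forecastDay} replaced by 0, 1, ...
        stations,
        species="o3",
        model="my_model",
        start=date(2021, 1, 1),
        end=date(2021, 1, 31),
        forecast_horizon=2,
        series_type="hourly",
    )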

evaltools.interpolation module#

This module defines Grid.

Grid is a class designed to interpolate values from a grid of values.

class evaltools.interpolation.Grid(min_lat, min_lon, d_lon, d_lat, nb_lon, nb_lat)#

Bases: object

Class designed to interpolate data from a lat/lon grid.

The grid must have equidistant latitudes/longitudes.

interpolate(use_same_listing=False)#

Interpolate values for all loaded grids.

Parameters:

use_same_listing (bool) – If False, for each key of self.grids, a list of stations for interpolation must have been submitted with the Grid.load_station_lists method. If True, interpolation is done at every station found in the listing file fed to Grid.load_stations_metadata.

Returns:

dictionary – Dictionary with one key per grid, containing pandas.Series of interpolated values.

load_station_lists(station_lists)#

Import station lists used for interpolation.

Parameters:

station_lists (dict of 1D arrays) – Dictionary whose keys are names for the different grids (species, hours, …) and whose values are 1D arrays of station codes.

load_stations_metadata(listing_path, **kwargs)#

Load stations coordinate and find nearest grid points.

Retrieve a list of stations with their coordinates and find the four nearest grid points of each of them, for later use in the interpolation method.

The listing file is read with evaltools.utils.read_listing and therefore it must be in the correct format.

Parameters:
  • listing_path (str) – Path of the listing file.

  • **kwargs – These parameters (like ‘sep’, ‘sub_list’, …) will be passed to evaltools.utils.read_listing().

set_grids(grids)#

Set grids values from a dictionary of 2D arrays.

Parameters:

grids (dict of 2D arrays) – Dictionary whose keys are names for the different grids (species, hours, …) and whose values are 2D arrays of floats with shape (self.nb_lat, self.nb_lon).

view()#

Visualize grid values.
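
A workflow sketch for the Grid class with placeholder grid geometry, listing path and station codes; `o3_field` is assumed to be a 2D array of shape (nb_lat, nb_lon):

    import evaltools as evt

    grid = evt.interpolation.Grid(
        min_lat=40.0, min_lon=-5.0, d_lon=0.1, d_lat=0.1, nb_lon=200, nb_lat=150
    )
    grid.load_stations_metadata("./stations.lst")  # read with evaltools.utils.read_listing
    grid.set_grids({"o3": o3_field})               # one 2D array per key
    grid.load_station_lists({"o3": ["FR12345", "FR54321"]})
    res = grid.interpolate()                       # dict of pandas.Series, one per key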

class evaltools.interpolation.Interpolator(lon, lat, coord_type)#

Bases: object

Class designed to interpolate data from any kind of lat/lon grid.

filter_stations(stations)#

Drop stations not contained in domain.

Check if the nearest grid point of each station lies on the domain’s border and, if so, drop the station.

Parameters:

stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

find_four_nearest(station)#

Find the four nearest grid points of a station.

Parameters:

station (str) – Station name. Must be in self.nearest_point.

find_nearest(point)#

Find nearest grid point.

Parameters:

point (tuple of two floats) – Coordinates (lon, lat) of the input point.

interpolate(values)#

Interpolate values from the lon/lat grid at the provided station locations.

Parameters:

values (2D array) – Grid of values, corresponding to lon/lat grid.

Returns:

pandas.Series – pandas.Series of interpolated values.

load_stations_metadata(listing_path=None, stations=None, **kwargs)#

Load stations coordinate and find nearest grid points.

Retrieve a list of stations with their coordinates and find the four nearest grid points of each of them, for later use in the interpolation method.

The listing file is read with evaltools.utils.read_listing and therefore it must be in the correct format.

Parameters:
  • listing_path (str) – Path of the listing file.

  • stations (pandas.DataFrame) – DataFrame with station names as index, and metadata variables as columns.

  • **kwargs – Additional parameters (like ‘sep’, ‘sub_list’, …) to be passed to evaltools.utils.read_listing().

summary()#

Print a summary of the object.
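
A sketch for the Interpolator class with synthetic data; the 2D lon/lat arrays, the station table and the value passed as coord_type are placeholders (the accepted coord_type values are not documented here):

    import numpy as np
    import pandas as pd

    import evaltools as evt

    lon2d, lat2d = np.meshgrid(np.linspace(-5, 10, 100), np.linspace(40, 55, 80))
    field = np.random.rand(*lon2d.shape)          # 2D values matching the lon/lat grid
    stations = pd.DataFrame({"lat": [48.85], "lon": [2.35]}, index=["FR12345"])

    interp = evt.interpolation.Interpolator(lon2d, lat2d, coord_type="latlon")
    interp.load_stations_metadata(stations=stations)
    interp.filter_stations(stations)    # drop stations falling outside the domain
    values = interp.interpolate(field)  # pandas.Series of interpolated values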

evaltools.interpolation.area_triangle(p0, p1, p2)#

Compute the area between three points.

evaltools.interpolation.bilinear_interpolation(x, y, points, sort=True)#

Interpolate (x,y) from values associated with four points.

Compute a bilinear interpolation for an (x, y) coordinate located inside a rectangle defined by four points.

Parameters:
  • x, y (float) – Coordinates of the input point.

  • points (list of four float triplets) – Four points (x, y, value) forming a rectangle. The four points can be in any order.
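
For orientation, a standalone re-implementation of the standard bilinear formula on an axis-aligned rectangle (an illustrative sketch, not necessarily the library’s exact implementation):

    def bilinear(x, y, points):
        """Interpolate at (x, y) from four (x, y, value) corner triplets."""
        # After sorting, (x1, y1) is the lower-left corner and (x2, y2) the upper-right one.
        (x1, y1, q11), (_x1, y2, q12), (x2, _y1, q21), (_x2, _y2, q22) = sorted(points)
        return (
            q11 * (x2 - x) * (y2 - y)
            + q21 * (x - x1) * (y2 - y)
            + q12 * (x2 - x) * (y - y1)
            + q22 * (x - x1) * (y - y1)
        ) / ((x2 - x1) * (y2 - y1))

    bilinear(0.5, 0.5, [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])  # -> 2.5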

evaltools.interpolation.find_nearest(obs_lat, obs_lon, max_lat, min_lat, max_lon, min_lon, d_lon, d_lat, nb_lon, invert_lat=True)#

Find the 4 nearest grid points for a given lat/lon.

The grid must have equidistant latitudes/longitudes.

The returned indices correspond to grid data stored in a 1D array (i.e. the grid is flattened in row-major (C-style) order).

Parameters:
  • obs_lon (float) – Longitude of the input observation.

  • obs_lat (float) – Latitude of the input observation.

  • max_lat (float) – Maximal latitude of the grid.

  • min_lon (float) – Minimal longitude of the grid.

  • d_lat (float) – Latitude step width of the grid.

  • d_lon (float) – Longitude step width of the grid.

  • nb_lon (int) – Number of longitude steps in the grid (shape[1] of the grid).

  • invert_lat (bool) – Must be set to True if latitudes are stored in decreasing order in the grid.

Returns:

dictionary – Dictionary whose keys are ‘latitudes’, ‘longitudes’ and ‘indices’, and whose values are lists of four floats.

evaltools.interpolation.interpolation(x, y, points)#

Choose how to interpolate (x,y) according to grid type.

evaltools.interpolation.squared_dist(p1, p2)#

Compute the squared distance between two points.

evaltools.interpolation.triangular_interpolation(x, y, points)#

Interpolate (x,y) from values associated with four points.

Compute an interpolation based on triangles for an (x, y) coordinate located inside a quadrilateral defined by four points.

Parameters:
  • x, y (float) – Coordinates of the input point.

  • points (list of four float triplets) – Four points (x, y, value) forming a quadrilateral. First point must be the nearest neighbour.

evaltools.fairmode module#

This module is designed to compute some Fairmode metrics.

Documentation on the different metrics can be found in the “FAIRMODE guidance document on modelling quality objectives and benchmarking”.

https://fairmode.jrc.ec.europa.eu/activity/ct2

evaltools.fairmode.fairmode_benchmark(self, target_file=None, summary_file=None, output_csv=None, availability_ratio=0.75, label=None, target_title=None, summary_title=None, color=None, file_formats=['png'], forecast_day=0, mark_by=None, indicative_color=False, output_indicators=None)#

Plot FAIRMODE target and summary diagrams.

Concentration values must be in µg/m^3. Supported species are ‘o3’, ‘no2’, ‘pm10’ and ‘pm2p5’.

Parameters:
  • target_file (str or None) – File where to save the target diagram (without extension). If None, the figure is shown in a pop-up window.

  • summary_file (str or None) – File where to save the summary diagram (without extension). If None, the figure is shown in a pop-up window.

  • output_csv (str or None) – File where to save the target data.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • label (str) – Label for the legend.

  • target_title (str) – Target diagram title.

  • summary_title (str) – Summary diagram title.

  • color (str)

  • file_formats (list of str) – List of file extensions.

  • forecast_day (int) – Forecast day used to compute the two diagrams.

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of self.stations. It must be of length two. The first element is the label of the column used to define the markers. The second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • indicative_color (bool) – If True, legend labels in the target plot are green if MQI90 < 1 and Y90 < 1, and red otherwise.

  • output_indicators (str or None) – File where to save the mqi90 and MPCs.
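
A usage sketch, assuming `evl` is an existing evaltools.Evaluator object holding hourly no2 concentrations in µg/m^3 (file names and label are placeholders):

    import evaltools as evt

    evt.fairmode.fairmode_benchmark(
        evl,
        target_file="./fairmode_target",    # saved as ./fairmode_target.png
        summary_file="./fairmode_summary",
        availability_ratio=0.75,
        forecast_day=0,
        label="my_model",
        file_formats=["png"],
    )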

evaltools.fairmode.mqi(self, threshold=0.75, forecast_day=0)#

Calculate the modelling quality indicator.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

pandas.Series – Series with index corresponding to the object stations and containing the modelling quality indicator for each station.

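For orientation, the FAIRMODE guidance document defines the per-station modelling quality indicator from the RMSE between model values $M_i$ and observations $O_i$ and the root mean square of the measurement uncertainty $U(O_i)$, with $\beta = 2$ (refer to the guidance document linked above for the authoritative definition):

$MQI = \dfrac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(M_i - O_i)^2}}{\beta\,\sqrt{\frac{1}{N}\sum_{i=1}^{N}U^2(O_i)}}, \qquad \beta = 2$
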
evaltools.fairmode.mqi90(self, threshold=0.75, forecast_day=0)#

Calculate the 90th percentile of modelling quality indicator values.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

float – 90th percentile of the modelling quality indicator.

evaltools.fairmode.plot_fairmode_summary(self, availability_ratio=0.75, forecast_day=0, title=None, label=None, return_mpc=False, write_categories=True, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Summary statistics diagram.

Assessment summary diagram as described in the FAIRMODE guidance document on modelling quality objectives and benchmarking.

Parameters:
  • self (evaltools.Evaluator object) – Object used for plotting.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day used in the diagram.

  • title (str) – Diagram title.

  • label (str) – Label for the default title.

  • write_categories (bool) – If True, write “observations”, “time” and “space” on the left of the plot.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a pop-up window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.fairmode.plot_forecast_target_diagram(obj, thr=None, availability_ratio=0.75, forecast_day=0, label=None, color=None, title=None, output_csv=None, list_stations_ouside_target=True, mark_by=None, indicative_color=False, return_mqi=False, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

FAIRMODE forecast target diagram.

Forecast target diagram as described in the FAIRMODE guidance document on modelling quality objectives and benchmarking.

Parameters:
  • obj (evaltools.Evaluator object) – Object used for plotting.

  • thr (scalar) – Threshold used to compute False Alarm (FA) and Missed Alarm (MA). If the FA/MA ratio is < 1 then the station point is in the negative portion of the x axis, and if FA/MA >= 1 the station point is in the positive portion.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day used in the diagram.

  • label (str) – Label for the legend.

  • color (None or str) – Point color.

  • title (str) – Diagram title.

  • output_csv (str or None) – File where to save the data. The file name must contain {model} instead of the model name (so that one file is written for each object shown on the graph).

  • list_stations_ouside_target (bool) – If True, codes of stations outside the target are written on the side of the graph. The option works when only one object is used for computation.

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of self.stations. It must be of length two. The first element is the label of the column used to define the markers. The second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • indicative_color (bool) – If True, legend labels are green if MQI90 < 1 and Y90 < 1, and red otherwise.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a pop-up window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.fairmode.plot_target_diagram(obj, availability_ratio=0.75, forecast_day=0, label=None, color=None, title=None, output_csv=None, list_stations_ouside_target=True, mark_by=None, indicative_color=False, return_mqi=False, fig=None, ax=None, annotation=None, output_file=None, file_formats=['png'])#

Plot the assessment target diagram.

Assessment target diagram as described in the FAIRMODE guidance document on modelling quality objectives and benchmarking.

Parameters:
  • obj (evaltools.Evaluator object) – Object used for plotting.

  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day used in the diagram.

  • label (str) – Label for the legend.

  • color (None or str) – Point color.

  • title (str) – Diagram title.

  • output_csv (str or None) – File where to save the data. The file name must contain {model} instead of the model name (so that one file is written for each object shown on the graph).

  • list_stations_ouside_target (bool) – If True, codes of stations outside the target are written on the side of the graph. The option works when only one object is used for computation.

  • mark_by (1D array-like) – This argument allows choosing different markers for different station groups according to a variable of obj.stations. It must be of length two. The first element is the label of the column used to define the markers. The second element is a dictionary defining which marker to use for each possible value. Ex: (‘area’, {‘urb’: ‘s’, ‘rur’: ‘o’, ‘sub’: ‘^’})

  • indicative_color (bool) – If True, legend labels are green if MQI90 < 1 and Y90 < 1, and red otherwise.

  • annotation (str or None) – Additional information to write in the upper left corner of the plot.

  • output_file (str or None) – File where to save the plots (without extension). If None, the figure is shown in a pop-up window.

  • file_formats (list of str) – List of file extensions.

  • fig (None or matplotlib.figure.Figure) – Figure to use for the plot. If None, a new figure is created.

  • ax (None or matplotlib.axes._axes.Axes) – Axis to use for the plot. If None, a new axis is created.

Returns:

  • matplotlib.figure.Figure – Figure object of the produced plot. Note that if the plot has been shown in the user interface window, the figure and the axis will not be usable again.

  • matplotlib.axes._axes.Axes – Axes object of the produced plot.

evaltools.fairmode.rmsu(self, threshold=0.75, forecast_day=0)#

Calculate the root mean square of measurement uncertainty.

Parameters:
  • threshold (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

pandas.Series – Series with index corresponding to the object stations and containing the root mean square of the measurement uncertainty for each station.

evaltools.fairmode.set_fairmode_params(self, availability_ratio=0.75)#

Set Fairmode coefficients used to calculate the measurement uncertainty.

The coefficients are:

threshold : scalar

Limit concentration value fixed by air quality policies.

U : scalar

$U^{95}_{95,r}$ as defined by FAIRMODE for the measurement uncertainty calculation.

alpha : scalar

$\alpha$ as defined by FAIRMODE for the measurement uncertainty calculation.

RV : scalar

Reference value as defined by FAIRMODE for the measurement uncertainty calculation.

perc : scalar

Selected percentile value used in the calculation of FAIRMODE’s modelling performance criteria for high percentiles.

Np, Nnp : scalars

Coefficients used to compute FAIRMODE’s observation uncertainty for annual averages.

Parameters:

availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.
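
For orientation, with the coefficients above, the FAIRMODE expanded measurement uncertainty of an observed concentration $O_i$ is commonly written as follows (refer to the guidance document for the authoritative form):

$U(O_i) = U \cdot \sqrt{(1 - \alpha^2)\,O_i^2 + \alpha^2\,RV^2}$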

evaltools.fairmode.y90(self, availability_ratio=0.75, forecast_day=0)#

Calculate the 90th percentile of MQIs for the average of model values.

The period over which to average the data should preferably be one year.

Parameters:
  • availability_ratio (float) – Minimal rate of data available on the period required per forecast day to compute the scores for each station.

  • forecast_day (int) – Forecast day corresponding to the data used in the calculation.

Returns:

float – 90th percentile of the modelling quality indicator for yearly averaged model results.