Tips

Saving Evaluator objects

Evaltools includes a convenient way to save Evaluator objects for later use. Let us create such an object, as seen in the tutorial:

In [1]: import evaltools as evt

In [2]: from datetime import date

# import stations with module utils
In [3]: stations = evt.utils.read_listing("./sample_data/listing")

In [4]: start_date = date(2017, 6, 1)

In [5]: end_date = date(2017, 6, 6)

# create an object of class Observations with module evaluator
In [6]: obs = evt.Observations.from_time_series(
   ...:     generic_file_path="./sample_data/observations/{year}_co_{station}",
   ...:     correc_unit=1e9,
   ...:     species='co',
   ...:     start=start_date,
   ...:     end=end_date,
   ...:     stations=stations,
   ...:     forecast_horizon=2,
   ...: )
   ...: 

# create an object of class Simulations with module evaluator
In [7]: sim = evt.Simulations.from_time_series(
   ...:     generic_file_path=(
   ...:         "./sample_data/ENSforecast/J{forecastDay}/{year}_co_{station}"
   ...:     ),
   ...:     stations_idx=stations.index,
   ...:     species='co',
   ...:     model='ENS',
   ...:     start=start_date,
   ...:     end=end_date,
   ...:     forecast_horizon=2,
   ...: )
   ...: 

# create an object of class Evaluator with module evaluator
In [8]: obj = evt.Evaluator(obs, sim)

In [9]: obj.summary()
Model: ENS
Species: co
Time step: 1 hour
Period: 20170601 - 20170606
Forecast horizon: 2
Color: k
Paths :
- Sim : ./sample_data/ENSforecast/J{forecastDay}/{year}_co_{station}
- Obs : ./sample_data/observations/{year}_co_{station}

Once we have an Evaluator object, we can save it with the evaluator.Evaluator.dump method.

In [10]: obj.dump('./sample_data/evaluatorObj.dump')

Once the file is created, it can be loaded at any time with the evaluator.load function.

In [11]: obj2 = evt.load('./sample_data/evaluatorObj.dump')

In [12]: obj2.summary()
Model: ENS
Species: co
Time step: 1 hour
Period: 20170601 - 20170606
Forecast horizon: 2
Color: k
Paths :
- Sim : ./sample_data/ENSforecast/J{forecastDay}/{year}_co_{station}
- Obs : ./sample_data/observations/{year}_co_{station}

# the objects do not have the same address, so they are considered different
In [13]: obj == obj2
Out[13]: False

# but their attributes and data have the same values
In [14]: obj.stations.equals(obj2.stations)
Out[14]: True

In [15]: obj.obs_df.equals(obj2.obs_df)
Out[15]: True

In [16]: obj.sim_df[0].equals(obj2.sim_df[0])
Out[16]: True

In [17]: obj.sim_df[1].equals(obj2.sim_df[1])
Out[17]: True
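The behaviour above (a reloaded object that is not `==` to the original, although its data match) is what Python does by default for a class that defines no `__eq__`: `==` then falls back to identity. The sketch below illustrates this with a save/load round trip through the standard `pickle` module. It is only an illustration under stated assumptions: `MiniEvaluator` and `load` are hypothetical stand-ins, not the evaltools API, and the actual on-disk format used by evaltools may differ.

```python
import os
import pickle
import tempfile


class MiniEvaluator:
    """Hypothetical stand-in for an Evaluator; like it, defines no __eq__."""

    def __init__(self, model, species, values):
        self.model = model
        self.species = species
        self.values = values

    def dump(self, path):
        # Serialize the object and all its attributes to a binary file.
        with open(path, "wb") as f:
            pickle.dump(self, f)


def load(path):
    # Build a brand-new object from a file written by MiniEvaluator.dump().
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "evaluatorObj.dump")
obj = MiniEvaluator("ENS", "co", [52.42, 51.94])
obj.dump(path)
obj2 = load(path)

print(obj == obj2)                # False: default == compares identity
print(obj.values == obj2.values)  # True: the attribute values match
```

This is why the comparisons above check attributes and DataFrames individually (with DataFrame.equals) rather than comparing the objects themselves.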

Object attributes

The Evaluator class relies on the Observations and Simulations classes. Both of them use the Dataset class, which is mostly built on pandas DataFrames. Here is an overview of the attributes of these classes:

Attributes of Observations objects:

In [18]: obs.species
Out[18]: 'co'

In [19]: obs.start_date
Out[19]: datetime.date(2017, 6, 1)

In [20]: obs.end_date
Out[20]: datetime.date(2017, 6, 6)

In [21]: obs.forecast_horizon
Out[21]: 2

In [22]: obs.series_type
Out[22]: 'hourly'

In [23]: obs.stations
Out[23]: 
        site area       lat       lon
code                                 
AD0942A  bac  urb  42.50969   1.53914
AT0VOR1  bac  rur  46.67970  12.97190
AT10001  bac  sub  47.84000  16.52670
AT31401  bac  sub  48.08610  16.30220
AT31402  tra  sub  48.12500  16.33170
CH0002R  bac  rur  46.81310   6.94447
CH0005A  bac  sub  47.40290   8.61341
CH0005R  bac  rur  47.06740   8.46334
CH0010A  bac  urb  47.37760   8.53042
CZ0ALIB  bac  sub  50.00730  14.44590
CZ0HHKB  tra  urb  50.19540  15.84640
CZ0JKOS  bac  rur  49.57340  15.08030
CZ0PPLA  tra  urb  49.73240  13.40230
CZ0TOPR  ind  urb  49.85630  18.26970

In [24]: obs.dataset
Out[24]: <evaltools.dataset.Dataset at 0x7fef69a2b320>

Attributes of Simulations objects:

In [25]: sim.species
Out[25]: 'co'

In [26]: sim.start_date
Out[26]: datetime.date(2017, 6, 1)

In [27]: sim.end_date
Out[27]: datetime.date(2017, 6, 6)

In [28]: sim.forecast_horizon
Out[28]: 2

In [29]: sim.series_type
Out[29]: 'hourly'

In [30]: sim.stations
Out[30]: 
Index(['AD0942A', 'AT0VOR1', 'AT10001', 'AT31401', 'AT31402', 'CH0002R',
       'CH0005A', 'CH0005R', 'CH0010A', 'CZ0ALIB', 'CZ0HHKB', 'CZ0JKOS',
       'CZ0PPLA', 'CZ0TOPR'],
      dtype='object', name='code')

In [31]: sim.model
Out[31]: 'ENS'

In [32]: sim.datasets
Out[32]: 
[<evaltools.dataset.Dataset at 0x7fef69a2b080>,
 <evaltools.dataset.Dataset at 0x7fef69b41fd0>]

Note

sim.datasets is a list of Dataset objects, one for each forecast day.
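In the sim_df outputs of this section, the data for forecast day 0 start on 2017-06-01 and those for forecast day 1 start on 2017-06-02, each with 144 hourly rows. The sketch below reproduces these windows under the assumption that forecast day fd covers the evaluation period shifted forward by fd days; `forecast_day_window` is a hypothetical helper, not part of evaltools.

```python
from datetime import date, datetime, timedelta


def forecast_day_window(start, end, fd):
    """Hypothetical helper: first and last hourly timestamps for forecast day fd.

    Assumes the series for forecast day fd is the [start, end] period
    shifted forward by fd days, with one value per hour.
    """
    first = datetime.combine(start, datetime.min.time()) + timedelta(days=fd)
    last = datetime.combine(end, datetime.min.time()) + timedelta(days=fd, hours=23)
    return first, last


start_date, end_date, forecast_horizon = date(2017, 6, 1), date(2017, 6, 6), 2

for fd in range(forecast_horizon):
    first, last = forecast_day_window(start_date, end_date, fd)
    # Number of hourly values between first and last, both included.
    n_rows = int((last - first) / timedelta(hours=1)) + 1
    print(fd, first, last, n_rows)
# 0 2017-06-01 00:00:00 2017-06-06 23:00:00 144
# 1 2017-06-02 00:00:00 2017-06-07 23:00:00 144
```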

Attributes of Dataset objects:

In [33]: dt = obs.dataset

In [34]: dt.species
Out[34]: 'co'

In [35]: dt.start_date
Out[35]: datetime.date(2017, 6, 1)

In [36]: dt.end_date
Out[36]: datetime.date(2017, 6, 7)

In [37]: dt.nb_days
Out[37]: 7

In [38]: dt.series_type
Out[38]: 'hourly'

In [39]: dt.date_format
Out[39]: '%Y%m%d%H'

In [40]: type(dt.data)
Out[40]: pandas.core.frame.DataFrame

In [41]: dt.data
Out[41]: 
code                 AD0942A  AT0VOR1  AT10001  ...  CZ0JKOS  CZ0PPLA  CZ0TOPR
2017-06-01 00:00:00      NaN    52.42    51.67  ...    209.0    260.0    150.0
2017-06-01 01:00:00      NaN    51.94    66.99  ...      NaN      NaN      NaN
2017-06-01 02:00:00      NaN    52.91    48.73  ...    205.0    225.0    137.0
2017-06-01 03:00:00      NaN    51.51    33.35  ...    203.0    221.0      NaN
2017-06-01 04:00:00      NaN    50.15    33.40  ...    200.0    616.0    175.0
...                      ...      ...      ...  ...      ...      ...      ...
2017-06-07 19:00:00      NaN    63.87   255.80  ...     47.0    226.0     58.0
2017-06-07 20:00:00      NaN    64.79   241.32  ...     47.0    247.0      NaN
2017-06-07 21:00:00      NaN    63.82   229.71  ...    104.0    211.0     58.0
2017-06-07 22:00:00      NaN    60.28   201.00  ...     97.0    234.0    130.0
2017-06-07 23:00:00      NaN    61.20   195.26  ...     97.0    210.0    129.0

[168 rows x 14 columns]
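The Dataset held by obs ends on 2017-06-07, one day after obs.end_date. This is consistent with the observation dataset being extended by forecast_horizon - 1 days so that it also covers the last forecast day of the simulations. The arithmetic below reproduces nb_days and the 168 hourly rows under that assumption, and shows what the '%Y%m%d%H' date_format produces for a timestamp.

```python
from datetime import date, datetime, timedelta

start_date, end_date, forecast_horizon = date(2017, 6, 1), date(2017, 6, 6), 2

# Assumption: the observation dataset is extended by (forecast_horizon - 1) days.
dataset_end = end_date + timedelta(days=forecast_horizon - 1)
nb_days = (dataset_end - start_date).days + 1  # both bounds included
n_hourly_rows = nb_days * 24

print(dataset_end, nb_days, n_hourly_rows)  # 2017-06-07 7 168

# date_format '%Y%m%d%H' encodes a timestamp down to the hour.
print(datetime(2017, 6, 1, 0).strftime("%Y%m%d%H"))  # 2017060100
```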

Attributes of Evaluator objects:

In [42]: obj.species
Out[42]: 'co'

In [43]: obj.start_date
Out[43]: datetime.date(2017, 6, 1)

In [44]: obj.end_date
Out[44]: datetime.date(2017, 6, 6)

In [45]: obj.forecast_horizon
Out[45]: 2

In [46]: obj.series_type
Out[46]: 'hourly'

In [47]: obj.model
Out[47]: 'ENS'

In [48]: obj.stations
Out[48]: 
        site area       lat       lon
code                                 
AD0942A  bac  urb  42.50969   1.53914
AT0VOR1  bac  rur  46.67970  12.97190
AT10001  bac  sub  47.84000  16.52670
AT31401  bac  sub  48.08610  16.30220
AT31402  tra  sub  48.12500  16.33170
CH0002R  bac  rur  46.81310   6.94447
CH0005A  bac  sub  47.40290   8.61341
CH0005R  bac  rur  47.06740   8.46334
CH0010A  bac  urb  47.37760   8.53042
CZ0ALIB  bac  sub  50.00730  14.44590
CZ0HHKB  tra  urb  50.19540  15.84640
CZ0JKOS  bac  rur  49.57340  15.08030
CZ0PPLA  tra  urb  49.73240  13.40230
CZ0TOPR  ind  urb  49.85630  18.26970

In [49]: type(obj.obs_df)
Out[49]: pandas.core.frame.DataFrame

In [50]: obj.obs_df
Out[50]: 
code                 AD0942A  AT0VOR1  AT10001  ...  CZ0JKOS  CZ0PPLA  CZ0TOPR
2017-06-01 00:00:00      NaN    52.42    51.67  ...    209.0    260.0    150.0
2017-06-01 01:00:00      NaN    51.94    66.99  ...      NaN      NaN      NaN
2017-06-01 02:00:00      NaN    52.91    48.73  ...    205.0    225.0    137.0
2017-06-01 03:00:00      NaN    51.51    33.35  ...    203.0    221.0      NaN
2017-06-01 04:00:00      NaN    50.15    33.40  ...    200.0    616.0    175.0
...                      ...      ...      ...  ...      ...      ...      ...
2017-06-07 19:00:00      NaN    63.87   255.80  ...     47.0    226.0     58.0
2017-06-07 20:00:00      NaN    64.79   241.32  ...     47.0    247.0      NaN
2017-06-07 21:00:00      NaN    63.82   229.71  ...    104.0    211.0     58.0
2017-06-07 22:00:00      NaN    60.28   201.00  ...     97.0    234.0    130.0
2017-06-07 23:00:00      NaN    61.20   195.26  ...     97.0    210.0    129.0

[168 rows x 14 columns]

In [51]: type(obj.sim_df)
Out[51]: list

In [52]: type(obj.sim_df[0])
Out[52]: pandas.core.frame.DataFrame

In [53]: obj.sim_df
Out[53]: 
[                     AD0942A  AT0VOR1  AT10001  ...  CZ0JKOS  CZ0PPLA  CZ0TOPR
 2017-06-01 00:00:00      NaN  125.615  195.743  ...  118.072  120.903  200.016
 2017-06-01 01:00:00      NaN  124.696  187.501  ...  118.872  121.781  188.149
 2017-06-01 02:00:00      NaN  124.414  180.656  ...  120.605  122.917  183.144
 2017-06-01 03:00:00      NaN  122.835  178.975  ...  123.277  123.259  187.981
 2017-06-01 04:00:00      NaN  122.949  165.245  ...  128.243  125.293  185.981
 ...                      ...      ...      ...  ...      ...      ...      ...
 2017-06-06 19:00:00      NaN  103.669  117.103  ...  122.876  112.820  157.919
 2017-06-06 20:00:00      NaN  104.671  122.962  ...  114.701  111.285  161.868
 2017-06-06 21:00:00      NaN  106.023  114.108  ...  111.590  111.109  166.062
 2017-06-06 22:00:00      NaN  104.786  109.945  ...  109.645  110.133  150.973
 2017-06-06 23:00:00      NaN  104.190  111.574  ...  108.634  109.217  143.948
 
 [144 rows x 14 columns],
                      AD0942A  AT0VOR1  AT10001  ...  CZ0JKOS  CZ0PPLA  CZ0TOPR
 2017-06-02 00:00:00      NaN  130.632  187.774  ...  115.595  123.772  346.895
 2017-06-02 01:00:00      NaN  128.314  191.837  ...  115.705  124.764  307.852
 2017-06-02 02:00:00      NaN  128.497  192.607  ...  115.811  124.966  279.956
 2017-06-02 03:00:00      NaN  128.094  196.502  ...  114.913  125.033  280.241
 2017-06-02 04:00:00      NaN  128.403  191.039  ...  114.898  125.935  303.427
 ...                      ...      ...      ...  ...      ...      ...      ...
 2017-06-07 19:00:00      NaN  100.061  109.947  ...  103.466  105.743  163.747
 2017-06-07 20:00:00      NaN   98.077  109.488  ...  103.569  106.134  188.676
 2017-06-07 21:00:00      NaN   98.452  110.656  ...  103.568  108.130  165.174
 2017-06-07 22:00:00      NaN   97.151  117.971  ...  103.430  109.123  163.276
 2017-06-07 23:00:00      NaN   97.184  118.467  ...  103.774  109.683  171.150
 
 [144 rows x 14 columns]]

Note

obj.obs_df is equivalent to obj.observations.dataset.data, and obj.sim_df[fd] is equivalent to obj.simulations.datasets[fd].data (where fd is one of the forecast days).
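The relationships described in this section (an Evaluator built on Observations and Simulations, each built on Dataset objects) can be sketched with plain dataclasses. This is a minimal illustration, not the evaltools implementation: the classes below are hypothetical stand-ins, a dict replaces the pandas DataFrame, and obs_df and sim_df are written as properties only to mirror the equivalences stated in the note above.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Dataset:
    """Hypothetical stand-in: in evaltools, data is a pandas DataFrame."""
    species: str
    data: Any


@dataclass
class Observations:
    species: str
    dataset: Dataset


@dataclass
class Simulations:
    species: str
    model: str
    datasets: List[Dataset] = field(default_factory=list)  # one per forecast day


@dataclass
class Evaluator:
    observations: Observations
    simulations: Simulations

    @property
    def obs_df(self):
        # Shortcut to the observation data.
        return self.observations.dataset.data

    @property
    def sim_df(self):
        # One data table per forecast day, in forecast-day order.
        return [ds.data for ds in self.simulations.datasets]


obs = Observations("co", Dataset("co", {"AT0VOR1": [52.42, 51.94]}))
sim = Simulations("co", "ENS", [Dataset("co", {"AT0VOR1": [125.615]}),
                                Dataset("co", {"AT0VOR1": [130.632]})])
ev = Evaluator(obs, sim)
print(ev.obs_df is ev.observations.dataset.data)        # True
print(ev.sim_df[1] is ev.simulations.datasets[1].data)  # True
```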

How to handle data with a time step different from 1h

Since version 1.0.4, you can work with data at a 1h, 2h, 3h, 4h, 6h or 12h time step. The following methods

have a step argument, which corresponds to the time step in hours. This argument is ignored when the series_type argument is 'daily'.
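To picture what the step argument means, the sketch below lists the timestamps of one day for a given step. Each of the steps listed in this section divides 24, so a day always contains a whole number of values (for example 8 values at a 3h step). `day_timestamps` is a hypothetical helper for illustration, not an evaltools function.

```python
from datetime import datetime, timedelta

# Time steps (in hours) listed in this section as supported since version 1.0.4.
SUPPORTED_STEPS = (1, 2, 3, 4, 6, 12)


def day_timestamps(day, step):
    """Hypothetical helper: timestamps of one day at a given time step (hours)."""
    if step not in SUPPORTED_STEPS:
        raise ValueError(f"unsupported step: {step}")
    # Every supported step divides 24, so the day holds a whole number of values.
    return [day + timedelta(hours=h) for h in range(0, 24, step)]


stamps = day_timestamps(datetime(2017, 6, 1), 3)
print(len(stamps), stamps[0].hour, stamps[-1].hour)  # 8 0 21
```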

Plotting with translated annotations

If you want the annotations on your charts to be translated into French, you can set evaltools.plotting.lang = 'FR' in your script.