time_utils

Hydrological_model_validator.Processing.time_utils.Timer(description)

Hydrological_model_validator.Processing.time_utils.ensure_datetime_index(series: Series, label: str) → Series

Ensure that a pandas Series has a DatetimeIndex. If not, prompt the user to create one.

This function checks whether the index of the provided pandas Series is a DatetimeIndex. If the index is not datetime-based, it asks the user to input a start date and frequency, then generates and assigns a new DatetimeIndex to the Series accordingly.

Parameters:
  • series (pd.Series) – The pandas Series whose index is to be checked and possibly converted.

  • label (str) – A descriptive name for the series, used in prompts and messages.

Returns:

The original Series if it already had a DatetimeIndex, or the Series with a newly created DatetimeIndex based on user input.

Return type:

pd.Series

Example

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s = ensure_datetime_index(s, "Sample Series")
Please enter the start date for Sample Series (YYYY-MM-DD): 2020-01-01
Please enter the data frequency (e.g. 'D') for Sample Series: D
DatetimeIndex created for Sample Series from 2020-01-01 with frequency 'D'.
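A minimal sketch of the check-then-prompt flow described above, assuming the new index has one timestamp per existing value. The name `ensure_datetime_index_sketch` and the `ask` parameter (defaulting to the built-in `input`) are hypothetical additions that let the prompts be scripted; the module's own implementation may differ.

```python
from typing import Callable

import pandas as pd


def ensure_datetime_index_sketch(
    series: pd.Series,
    label: str,
    ask: Callable[[str], str] = input,  # hypothetical hook; input() prompts interactively
) -> pd.Series:
    """Return `series` unchanged if it has a DatetimeIndex, otherwise build one."""
    if isinstance(series.index, pd.DatetimeIndex):
        return series  # already datetime-indexed: nothing to do

    start = ask(f"Please enter the start date for {label} (YYYY-MM-DD): ")
    freq = ask(f"Please enter the data frequency (e.g. 'D') for {label}: ")

    # Assumption: work on a copy rather than mutating the caller's Series.
    result = series.copy()
    # One timestamp per existing value, beginning at the user-supplied date.
    result.index = pd.date_range(start=start, periods=len(series), freq=freq)
    print(f"DatetimeIndex created for {label} from {start} with frequency '{freq}'.")
    return result


# Scripted answers stand in for the interactive prompts.
answers = iter(["2020-01-01", "D"])
s = ensure_datetime_index_sketch(
    pd.Series([1, 2, 3]), "Sample Series", ask=lambda _prompt: next(answers)
)
```

Returning a copy is a design choice of this sketch; the documented function may instead assign the index to the Series it receives.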

Hydrological_model_validator.Processing.time_utils.get_common_years(data_dict: Dict[str, Dict[int | str, Any]], mod_key: str, sat_key: str) → List[str | int]

Get sorted years present in both model and satellite datasets.

Parameters:
  • data_dict (dict) – Dictionary where keys are dataset names and values are dictionaries keyed by year.

  • mod_key (str) – Key for the model dataset in data_dict.

  • sat_key (str) – Key for the satellite dataset in data_dict.

Returns:

Sorted list of years present in both model and satellite datasets.

Return type:

List[int or str]

Raises:

ValueError – If mod_key or sat_key is not present in data_dict, or if its value is not a dictionary.

Examples

>>> data = {
...     'model': {2020: 'data1', 2021: 'data2'},
...     'satellite': {2021: 'dataA', 2022: 'dataB'}
... }
>>> get_common_years(data, 'model', 'satellite')
[2021]
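The behaviour described above amounts to intersecting the year keys of the two sub-dictionaries. A minimal sketch, assuming the validation listed under Raises; the name `get_common_years_sketch` is hypothetical.

```python
from typing import Any, Dict, List, Union

Year = Union[int, str]


def get_common_years_sketch(
    data_dict: Dict[str, Dict[Year, Any]], mod_key: str, sat_key: str
) -> List[Year]:
    """Sorted list of the year keys shared by the model and satellite entries."""
    for key in (mod_key, sat_key):
        if key not in data_dict or not isinstance(data_dict[key], dict):
            raise ValueError(f"{key!r} must be a key of data_dict whose value is a dict")
    # dict key views support set intersection directly.
    return sorted(data_dict[mod_key].keys() & data_dict[sat_key].keys())


data = {
    "model": {2020: "data1", 2021: "data2"},
    "satellite": {2021: "dataA", 2022: "dataB"},
}
common = get_common_years_sketch(data, "model", "satellite")
```

Because the match is on dictionary keys, 2020 and "2020" count as different years; both datasets need to use the same key type.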

Hydrological_model_validator.Processing.time_utils.get_season_mask(dates: DatetimeIndex | Series, season_name: str) → ndarray

Generate a boolean mask for the given season on a datetime index or series.

Parameters:
  • dates (pd.DatetimeIndex or pd.Series) – Datetime-like index or pandas Series with a datetime index.

  • season_name (str) – Season name. Must be one of {'DJF', 'MAM', 'JJA', 'SON'}.

Returns:

Boolean mask array indicating whether each date falls in the specified season.

Return type:

np.ndarray

Raises:
  • ValueError – If season_name is not one of the expected values.

  • TypeError – If dates is not a pandas DatetimeIndex or a Series with DatetimeIndex.

Examples

>>> import pandas as pd
>>> dates = pd.date_range('2023-01-01', periods=12, freq='M')
>>> get_season_mask(dates, 'DJF')
array([ True,  True, False, False, False, False, False, False, False, False, False,  True])
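One way to build such a mask is to map each season code to its three months and test month membership. This is a sketch, assuming the standard meteorological seasons (DJF = December, January, February, and so on); `get_season_mask_sketch` and `SEASON_MONTHS` are hypothetical names. The demo uses month-start dates (`freq="MS"`), which fall in the same months as the month-end dates above; pandas 2.2 deprecates the `"M"` alias in favour of `"ME"`.

```python
import numpy as np
import pandas as pd

# Assumed meteorological season definitions, as month numbers.
SEASON_MONTHS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def get_season_mask_sketch(dates, season_name: str) -> np.ndarray:
    """Boolean mask that is True where a date's month belongs to `season_name`."""
    if season_name not in SEASON_MONTHS:
        raise ValueError(f"season_name must be one of {sorted(SEASON_MONTHS)}")
    if isinstance(dates, pd.Series):
        dates = dates.index  # a Series contributes its datetime index
    if not isinstance(dates, pd.DatetimeIndex):
        raise TypeError("dates must be a DatetimeIndex or a Series with a DatetimeIndex")
    return np.asarray(dates.month.isin(SEASON_MONTHS[season_name]))


dates = pd.date_range("2023-01-01", periods=12, freq="MS")
mask = get_season_mask_sketch(dates, "DJF")
```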

Hydrological_model_validator.Processing.time_utils.is_invalid_time_index(time_index: Index | ndarray) → bool

Check whether a given time index is invalid based on dtype and value range.

This function checks whether the input time_index is a valid datetime index. It considers the time index invalid if:

  • The dtype is not a datetime64 type, or

  • All timestamps fall within a very narrow range starting from the Unix epoch (1970-01-01) and the differences between consecutive timestamps are extremely small (less than 1 millisecond).

Such a time index might indicate corrupted or placeholder data.

Parameters:

time_index (array-like) – An array or pandas Index representing time values, expected to be datetime-like.

Returns:

True if the time index is considered invalid, False otherwise.

Return type:

bool

Example

>>> import numpy as np
>>> import pandas as pd
>>> # Valid datetime index
>>> idx = pd.date_range("2023-01-01", periods=3)
>>> is_invalid_time_index(idx)
False
>>> # Invalid: non-datetime dtype
>>> is_invalid_time_index(np.array([1, 2, 3]))
True
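The two rules above can be sketched as follows. The 1 ms step limit comes from the description; the width of the "very narrow range" after the epoch is not specified, so `EPOCH_WINDOW` (1 second here) is a hypothetical threshold, as is the treatment of an empty index. The name `is_invalid_time_index_sketch` is also hypothetical.

```python
import numpy as np
import pandas as pd

EPOCH_WINDOW = np.timedelta64(1, "s")  # hypothetical width of the "narrow range"
MAX_STEP = np.timedelta64(1, "ms")     # "less than 1 millisecond" from the description


def is_invalid_time_index_sketch(time_index) -> bool:
    values = np.asarray(time_index)
    # Rule 1: any non-datetime64 dtype is rejected outright.
    if not np.issubdtype(values.dtype, np.datetime64):
        return True
    if values.size == 0:
        return False  # assumption: an empty index is not flagged
    values = values.astype("datetime64[ns]")
    epoch = np.datetime64("1970-01-01", "ns")
    # Rule 2: placeholder-like data sits just after the epoch with sub-millisecond steps.
    near_epoch = bool(np.all((values >= epoch) & (values - epoch < EPOCH_WINDOW)))
    tiny_steps = bool(np.all(np.abs(np.diff(values)) < MAX_STEP))
    return near_epoch and tiny_steps


# Integers mis-decoded as nanoseconds since the epoch: the typical placeholder case.
suspicious = pd.to_datetime([0, 1, 2], unit="ns")
```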

Hydrological_model_validator.Processing.time_utils.leapyear(year: int) → int

Check if a given year is a leap year.

Parameters:

year (int) – Year as a positive integer.

Returns:

Returns 1 if the year is a leap year, 0 otherwise.

Return type:

int

Raises:

ValueError – If year is not a positive integer.

Examples

>>> leapyear(2020)
1
>>> leapyear(1900)
0
>>> leapyear(2000)
1
>>> leapyear(2023)
0
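The documented outputs follow the Gregorian rule: a year is a leap year if it is divisible by 4, except century years, which must also be divisible by 400. A minimal sketch under that assumption; `leapyear_sketch` is a hypothetical name, and rejecting `bool` input is an assumption about the original validation.

```python
def leapyear_sketch(year: int) -> int:
    """Return 1 for a Gregorian leap year, 0 otherwise."""
    # bool is a subclass of int, so reject it explicitly (assumed behaviour).
    if not isinstance(year, int) or isinstance(year, bool) or year <= 0:
        raise ValueError("year must be a positive integer")
    # Divisible by 4, except century years unless also divisible by 400.
    return int(year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))
```

Returning an int rather than a bool makes the result directly usable in day counts such as `365 + leapyear(y)`.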

Hydrological_model_validator.Processing.time_utils.prompt_for_datetime_index(length: int) → DatetimeIndex

Prompt the user to manually enter a valid datetime index for a time series.

When an invalid or missing time index is detected, this function interacts with the user to obtain a valid start date and frequency. It then generates a pandas DatetimeIndex of the specified length with the given frequency.

Parameters:

length (int) – The desired length of the datetime index to generate.

Returns:

A pandas DatetimeIndex object starting from the user-provided date, with the specified frequency and length.

Return type:

pd.DatetimeIndex

Example

>>> idx = prompt_for_datetime_index(10)
Enter the start date for the time series (e.g. 2000-01-01): 2020-01-01
Enter the frequency (e.g. 'D' for daily, 'H' for hourly): D
Generated datetime index from 2020-01-01 00:00:00 with frequency 'D'.
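Obtaining a *valid* start date and frequency suggests re-prompting until `pd.date_range` accepts the answers. A sketch under that assumption: `prompt_for_datetime_index_sketch` and its `ask` parameter (defaulting to the built-in `input`) are hypothetical, and the retry loop is an illustration rather than the module's confirmed behaviour.

```python
from typing import Callable

import pandas as pd


def prompt_for_datetime_index_sketch(
    length: int,
    ask: Callable[[str], str] = input,  # hypothetical hook for scripted answers
) -> pd.DatetimeIndex:
    """Keep asking until the start date and frequency produce a valid index."""
    while True:
        start = ask("Enter the start date for the time series (e.g. 2000-01-01): ")
        freq = ask("Enter the frequency (e.g. 'D' for daily, 'H' for hourly): ")
        try:
            index = pd.date_range(start=start, periods=length, freq=freq)
        except ValueError as err:  # unparsable date or unknown frequency alias
            print(f"Invalid input ({err}); please try again.")
            continue
        print(f"Generated datetime index from {index[0]} with frequency '{freq}'.")
        return index


# The first frequency is rejected, so the function asks a second time.
answers = iter(["2020-01-01", "XYZ", "2020-01-01", "D"])
idx = prompt_for_datetime_index_sketch(10, ask=lambda _prompt: next(answers))
```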

Hydrological_model_validator.Processing.time_utils.resample_and_compute(model_sst_chunked: DataArray | Dataset, sat_sst_chunked: DataArray | Dataset) → Tuple[DataArray | Dataset, DataArray | Dataset]

Resample the input chunked SST datasets to monthly means and compute them concurrently.

Parameters:
  • model_sst_chunked (xarray.DataArray or Dataset) – The model SST dataset chunked for dask processing.

  • sat_sst_chunked (xarray.DataArray or Dataset) – The satellite SST dataset chunked for dask processing.

Returns:

  • model_sst_monthly (xarray.DataArray or Dataset) – The computed monthly mean resampled model SST.

  • sat_sst_monthly (xarray.DataArray or Dataset) – The computed monthly mean resampled satellite SST.
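A possible shape for this function: build both monthly-mean resamples lazily, then evaluate them together with a single `dask.compute` call so dask can schedule the two graphs concurrently. This is a sketch, assuming the time dimension is named `time` and that month-start labels (`"MS"`) are acceptable; `resample_and_compute_sketch` is a hypothetical name.

```python
import dask
import numpy as np
import pandas as pd
import xarray as xr


def resample_and_compute_sketch(model_sst, sat_sst):
    """Monthly means of both inputs, evaluated in one dask.compute call."""
    # Lazy for dask-backed data; "MS" labels each month by its first day (assumption).
    model_monthly = model_sst.resample(time="MS").mean()
    sat_monthly = sat_sst.resample(time="MS").mean()
    # A single compute call lets dask evaluate both task graphs together.
    model_monthly, sat_monthly = dask.compute(model_monthly, sat_monthly)
    return model_monthly, sat_monthly


# Toy daily data for January and February 2020, chunked along time.
time = pd.date_range("2020-01-01", "2020-02-29", freq="D")
model = xr.DataArray(np.ones(len(time)), coords={"time": time}, dims="time").chunk({"time": 10})
sat = xr.DataArray(np.full(len(time), 2.0), coords={"time": time}, dims="time").chunk({"time": 10})
model_monthly, sat_monthly = resample_and_compute_sketch(model, sat)
```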

Hydrological_model_validator.Processing.time_utils.split_to_monthly(yearly_data: Dict[int, Series | DataFrame]) → Dict[int, List[Series | DataFrame]]

Split yearly pandas Series or DataFrames with datetime index into monthly segments.

Parameters:

yearly_data (dict[int, pd.Series or pd.DataFrame]) – Dictionary keyed by year, with values being pandas Series or DataFrames indexed by datetime.

Returns:

Dictionary keyed by year, each containing a list of 12 elements corresponding to monthly slices of the data (January to December). Months with no data will have empty Series or DataFrames of the same type.

Return type:

dict[int, list[pd.Series or pd.DataFrame]]

Raises:

ValueError – If yearly_data is not a dictionary or values are not pandas Series/DataFrames with datetime-like indexes.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range('2020-01-01', '2020-12-31')
>>> data = pd.Series(np.random.rand(len(dates)), index=dates)
>>> yearly = {2020: data}
>>> monthly = split_to_monthly(yearly)
>>> len(monthly[2020])
12
>>> monthly[2020][0].index.month.unique().tolist()
[1]
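Boolean indexing on the month number gives exactly this behaviour: each slice keeps the input type, and a month without rows yields an empty object of the same type. A minimal sketch, assuming the validation listed under Raises; `split_to_monthly_sketch` is a hypothetical name. The demo covers only January to March, so the remaining nine slices are empty.

```python
from typing import Dict, List, Union

import pandas as pd

PandasObj = Union[pd.Series, pd.DataFrame]


def split_to_monthly_sketch(yearly_data: Dict[int, PandasObj]) -> Dict[int, List[PandasObj]]:
    if not isinstance(yearly_data, dict):
        raise ValueError("yearly_data must be a dict keyed by year")
    monthly: Dict[int, List[PandasObj]] = {}
    for year, data in yearly_data.items():
        if not isinstance(data, (pd.Series, pd.DataFrame)) or not isinstance(
            data.index, pd.DatetimeIndex
        ):
            raise ValueError(f"value for {year} must be a Series/DataFrame with a DatetimeIndex")
        # Row selection by month; months without data give empty objects of the same type.
        monthly[year] = [data[data.index.month == m] for m in range(1, 13)]
    return monthly


dates = pd.date_range("2020-01-01", "2020-03-31")
data = pd.Series(range(len(dates)), index=dates)
monthly = split_to_monthly_sketch({2020: data})
```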

Hydrological_model_validator.Processing.time_utils.split_to_yearly(series: Series, unique_years: List[str | int]) → Dict[int | str, Series]

Split a pandas Series with a datetime index into a dictionary keyed by year.

Parameters:
  • series (pd.Series) – Time-indexed pandas Series with a datetime index.

  • unique_years (list of int or str) – List of years to split the series into. Years can be int or string representations.

Returns:

Dictionary keyed by year containing the Series filtered for that year.

Return type:

dict of year (int or str) to pd.Series

Raises:

ValueError – If the series does not have a DatetimeIndex or unique_years contains invalid types.

Examples

>>> import pandas as pd
>>> dates = pd.date_range('2020-01-01', periods=365)
>>> s = pd.Series(range(365), index=dates)
>>> split_yearly = split_to_yearly(s, [2020])
>>> list(split_yearly.keys())
[2020]
>>> split_yearly[2020].index.year.unique().tolist()
[2020]
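A minimal sketch of the split, assuming each requested year is used as given for the dictionary key while its integer value drives the filter, so `2020` and `"2020"` select the same rows. `split_to_yearly_sketch` is a hypothetical name, and rejecting `bool` years is an assumption about the original validation.

```python
from typing import Dict, List, Union

import pandas as pd

Year = Union[int, str]


def split_to_yearly_sketch(series: pd.Series, unique_years: List[Year]) -> Dict[Year, pd.Series]:
    if not isinstance(series.index, pd.DatetimeIndex):
        raise ValueError("series must have a DatetimeIndex")
    result: Dict[Year, pd.Series] = {}
    for year in unique_years:
        # bool is an int subclass; rejecting it is an assumed validation rule.
        if isinstance(year, bool) or not isinstance(year, (int, str)):
            raise ValueError(f"invalid year {year!r}")
        # Key keeps the caller's spelling; int() also raises ValueError for "abc".
        result[year] = series[series.index.year == int(year)]
    return result


s = pd.Series(range(400), index=pd.date_range("2020-01-01", periods=400))
split = split_to_yearly_sketch(s, [2020, "2021"])
```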

Hydrological_model_validator.Processing.time_utils.true_time_series_length(chlfstart: List[int], chlfend: List[int], DinY: int) → int

Calculate the true time series length in days over multiple files, accounting for leap years.

Parameters:
  • chlfstart (list[int]) – List of start years per file.

  • chlfend (list[int]) – List of end years per file.

  • DinY (int) – Number of days in a normal year (expected 365).

Returns:

Total number of days including leap years.

Return type:

int

Raises:

ValueError – If input types or values do not meet expectations.

Examples

>>> true_time_series_length([2000], [2001], 365)
731
>>> true_time_series_length([1999, 2001], [2000, 2002], 365)
1461
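A sketch of the day count, assuming each file covers its start and end years inclusively, which is what the first example implies (2000 is a leap year, so 366 + 365 = 731). The name `true_time_series_length_sketch` and the specific validation checks are assumptions.

```python
from typing import List


def true_time_series_length_sketch(chlfstart: List[int], chlfend: List[int], DinY: int) -> int:
    """Total days covered by all files, adding one day for every leap year."""
    if len(chlfstart) != len(chlfend):
        raise ValueError("chlfstart and chlfend must have the same length")
    total = 0
    for start, end in zip(chlfstart, chlfend):
        if end < start:
            raise ValueError(f"end year {end} precedes start year {start}")
        for year in range(start, end + 1):  # both end years are counted
            # Gregorian rule: every 4th year, except centuries not divisible by 400.
            is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
            total += DinY + int(is_leap)
    return total
```

Looping year by year keeps the leap-day bookkeeping explicit; an equivalent closed form is `n_years * DinY + n_leap_years`.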