time_utils
- Hydrological_model_validator.Processing.time_utils.ensure_datetime_index(series: Series, label: str) → Series
Ensure that a pandas Series has a DatetimeIndex. If not, prompt the user to create one.
This function checks whether the index of the provided pandas Series is a DatetimeIndex. If the index is not datetime-based, it asks the user to input a start date and frequency, then generates and assigns a new DatetimeIndex to the Series accordingly.
- Parameters:
series (pd.Series) – The pandas Series whose index is to be checked and possibly converted.
label (str) – A descriptive name for the series, used in prompts and messages.
- Returns:
The original Series if it already had a DatetimeIndex, or the Series with a newly created DatetimeIndex based on user input.
- Return type:
pd.Series
Example
>>> s = pd.Series([1, 2, 3])
>>> s = ensure_datetime_index(s, "Sample Series")
Please enter the start date for Sample Series (YYYY-MM-DD): 2020-01-01
Please enter the data frequency (e.g. 'D') for Sample Series: D
DatetimeIndex created for Sample Series from 2020-01-01 with frequency 'D'.
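Because the function reads the start date and frequency interactively, it cannot run unattended. A minimal non-interactive sketch of the same check-then-assign logic could take those two values as arguments instead. The name `ensure_datetime_index_sketch` and its `start`/`freq` parameters are hypothetical and not part of the library; the sketch also returns a copy rather than modifying the input Series.

```python
import pandas as pd


def ensure_datetime_index_sketch(series: pd.Series, label: str, start: str, freq: str) -> pd.Series:
    """Hypothetical sketch: attach a DatetimeIndex built from start/freq if one is missing."""
    # Already datetime-indexed: hand back the same object untouched.
    if isinstance(series.index, pd.DatetimeIndex):
        return series
    # One timestamp per value, beginning at `start` and stepping by `freq`.
    new_index = pd.date_range(start=start, periods=len(series), freq=freq)
    result = series.copy()
    result.index = new_index
    print(f"DatetimeIndex created for {label} from {start} with frequency '{freq}'.")
    return result
```

Passing `start` and `freq` explicitly makes the behaviour reproducible in scripts and tests, at the cost of the guided prompting the interactive version offers.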
- Hydrological_model_validator.Processing.time_utils.get_common_years(data_dict: Dict[str, Dict[int | str, Any]], mod_key: str, sat_key: str) → List[str | int]
Get sorted years present in both model and satellite datasets.
- Parameters:
data_dict (dict) – Dictionary where keys are dataset names and values are dictionaries keyed by year.
mod_key (str) – Key for the model dataset in data_dict.
sat_key (str) – Key for the satellite dataset in data_dict.
- Returns:
Sorted list of years present in both model and satellite datasets.
- Return type:
List[int or str]
- Raises:
ValueError – If mod_key or sat_key are not present in data_dict or their values are not dictionaries.
Examples
>>> data = {
...     'model': {2020: 'data1', 2021: 'data2'},
...     'satellite': {2021: 'dataA', 2022: 'dataB'}
... }
>>> get_common_years(data, 'model', 'satellite')
[2021]
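The core of this operation is a set intersection of the two inner dictionaries' keys, followed by a sort. A minimal sketch, assuming all year keys in a dataset share one type (sorting a mix of `int` and `str` would raise `TypeError`); `get_common_years_sketch` is a hypothetical name, not the library function:

```python
from typing import Any, Dict, List, Union

Year = Union[int, str]


def get_common_years_sketch(data_dict: Dict[str, Dict[Year, Any]], mod_key: str, sat_key: str) -> List[Year]:
    """Hypothetical sketch: sorted years that appear in both datasets."""
    # Validate both keys exist and map to dicts, mirroring the documented ValueError.
    for key in (mod_key, sat_key):
        if key not in data_dict or not isinstance(data_dict[key], dict):
            raise ValueError(f"{key!r} must be a key of data_dict whose value is a dict")
    # Iterating a dict yields its keys, so the sets below hold the years.
    common = set(data_dict[mod_key]) & set(data_dict[sat_key])
    return sorted(common)
```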
- Hydrological_model_validator.Processing.time_utils.get_season_mask(dates: DatetimeIndex | Series, season_name: str) → ndarray
Generate a boolean mask for the given season on a datetime index or series.
- Parameters:
dates (pd.DatetimeIndex or pd.Series) – Datetime-like index or pandas Series with a datetime index.
season_name (str) – Season name. Must be one of {‘DJF’, ‘MAM’, ‘JJA’, ‘SON’}.
- Returns:
Boolean mask array indicating whether each date falls in the specified season.
- Return type:
np.ndarray
- Raises:
ValueError – If season_name is not one of the expected values.
TypeError – If dates is not a pandas DatetimeIndex or a Series with DatetimeIndex.
Examples
>>> import pandas as pd
>>> dates = pd.date_range('2023-01-01', periods=12, freq='MS')
>>> get_season_mask(dates, 'DJF')
array([ True,  True, False, False, False, False, False, False, False, False, False,  True])
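A season mask reduces to a month-membership test, with DJF being the one season that spans a year boundary (December of one year plus January and February of the next). A minimal sketch, assuming the standard meteorological month groupings and that a Series contributes its index; `get_season_mask_sketch` and `SEASON_MONTHS` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Assumed standard meteorological seasons; DJF wraps across the year boundary.
SEASON_MONTHS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def get_season_mask_sketch(dates, season_name: str) -> np.ndarray:
    """Hypothetical sketch: True where a date's month belongs to season_name."""
    if season_name not in SEASON_MONTHS:
        raise ValueError(f"season_name must be one of {sorted(SEASON_MONTHS)}, got {season_name!r}")
    # Accept a DatetimeIndex directly, or use the index of a Series.
    if isinstance(dates, pd.Series):
        dates = dates.index
    if not isinstance(dates, pd.DatetimeIndex):
        raise TypeError("dates must be a DatetimeIndex or a Series with a DatetimeIndex")
    # Index.isin returns a boolean NumPy array aligned with the dates.
    return np.asarray(dates.month.isin(SEASON_MONTHS[season_name]))
```

Because only the month is inspected, the mask does not distinguish, say, December 2023 from December 2022; grouping a DJF season by year needs an extra step.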
- Hydrological_model_validator.Processing.time_utils.is_invalid_time_index(time_index: Index | ndarray) → bool
Check whether a given time index is invalid based on dtype and value range.
This function checks whether the input time_index is a valid datetime index. It considers the time index invalid if either of the following holds:
- The dtype is not a datetime64 type.
- All timestamps fall within a very narrow range starting from the Unix epoch (1970-01-01) and the differences between consecutive timestamps are extremely small (less than 1 millisecond).
Such a time index might indicate corrupted or placeholder data.
- Parameters:
time_index (array-like) – An array or pandas Index representing time values, expected to be datetime-like.
- Returns:
True if the time index is considered invalid, False otherwise.
- Return type:
bool
Example
>>> import numpy as np
>>> import pandas as pd
>>> # Valid datetime index
>>> idx = pd.date_range("2023-01-01", periods=3)
>>> is_invalid_time_index(idx)
False
>>> # Invalid: non-datetime dtype
>>> is_invalid_time_index(np.array([1, 2, 3]))
True
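The two checks above can be sketched with NumPy alone. The text does not say how wide the "very narrow range" after the epoch is, so the one-day `EPOCH_WINDOW` below is an assumption, as are the name `is_invalid_time_index_sketch` and the constant names. A typical trigger for the second check is raw integers (for example day counts) decoded as nanoseconds since 1970-01-01.

```python
import numpy as np

EPOCH = np.datetime64("1970-01-01T00:00:00", "ns")
# Assumed width of the "very narrow range" after the epoch (not specified in the text).
EPOCH_WINDOW = np.timedelta64(1, "D")
# Steps smaller than this count as "extremely small" (1 ms, per the description).
TINY_STEP = np.timedelta64(1, "ms")


def is_invalid_time_index_sketch(time_index) -> bool:
    """Hypothetical sketch of the two invalidity checks described above."""
    values = np.asarray(time_index)
    # Check 1: any non-datetime64 dtype is rejected outright.
    if not np.issubdtype(values.dtype, np.datetime64):
        return True
    values = values.astype("datetime64[ns]")
    # Check 2: every timestamp lies just after the epoch AND every step is tiny.
    near_epoch = bool(np.all((values >= EPOCH) & (values - EPOCH < EPOCH_WINDOW)))
    tiny_steps = bool(np.all(np.abs(np.diff(values)) < TINY_STEP))
    return near_epoch and tiny_steps
```

Timezone-aware pandas indexes convert to an object array under `np.asarray`, so this sketch would flag them as invalid; it assumes timezone-naive input.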
- Hydrological_model_validator.Processing.time_utils.leapyear(year: int) → int
Check if a given year is a leap year.
- Parameters:
year (int) – Year as a positive integer.
- Returns:
Returns 1 if the year is a leap year, 0 otherwise.
- Return type:
int
- Raises:
ValueError – If year is not a positive integer.
Examples
>>> leapyear(2020)
1
>>> leapyear(1900)
0
>>> leapyear(2000)
1
>>> leapyear(2023)
0
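The examples follow the Gregorian rule: a year is a leap year if it is divisible by 4, except century years, which must also be divisible by 400 (so 1900 is not a leap year but 2000 is). A minimal sketch returning 1/0 as documented; `leapyear_sketch` is a hypothetical name:

```python
def leapyear_sketch(year: int) -> int:
    """Hypothetical sketch: 1 for a Gregorian leap year, 0 otherwise."""
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(year, int) or isinstance(year, bool) or year <= 0:
        raise ValueError("year must be a positive integer")
    # Divisible by 4, and either not a century year or divisible by 400.
    return int(year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))
```

Returning an integer rather than a bool lets callers add it straight to a day count, e.g. `365 + leapyear_sketch(y)`.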
- Hydrological_model_validator.Processing.time_utils.prompt_for_datetime_index(length: int) → DatetimeIndex
Prompt the user to manually enter a valid datetime index for a time series.
When an invalid or missing time index is detected, this function interacts with the user to obtain a valid start date and frequency. It then generates a pandas DatetimeIndex of the specified length with the given frequency.
- Parameters:
length (int) – The desired length of the datetime index to generate.
- Returns:
A pandas DatetimeIndex object starting from the user-provided date, with the specified frequency and length.
- Return type:
pd.DatetimeIndex
Example
>>> idx = prompt_for_datetime_index(10)
Enter the start date for the time series (e.g. 2000-01-01): 2020-01-01
Enter the frequency (e.g. 'D' for daily, 'H' for hourly): D
Generated datetime index from 2020-01-01 00:00:00 with frequency 'D'.
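One way to "obtain a valid start date and frequency" is to re-prompt until each answer parses. The sketch below assumes that design and validates with `pd.Timestamp` and `pandas.tseries.frequencies.to_offset`, both of which raise `ValueError` on unparseable input. It also takes an injectable `input_fn` so it can be driven without a keyboard; the function name and that parameter are hypothetical, and the library's actual validation may differ.

```python
from typing import Callable

import pandas as pd
from pandas.tseries.frequencies import to_offset


def prompt_for_datetime_index_sketch(length: int, input_fn: Callable[[str], str] = input) -> pd.DatetimeIndex:
    """Hypothetical sketch: ask until the start date and frequency both parse."""
    while True:
        raw_start = input_fn("Enter the start date for the time series (e.g. 2000-01-01): ").strip()
        try:
            start = pd.Timestamp(raw_start)
            if pd.isna(start):  # an empty string parses to NaT rather than raising
                raise ValueError("empty date")
            break
        except ValueError:
            print(f"Could not parse {raw_start!r} as a date, please try again.")
    while True:
        raw_freq = input_fn("Enter the frequency (e.g. 'D' for daily, 'H' for hourly): ").strip()
        try:
            offset = to_offset(raw_freq)  # raises ValueError for unknown aliases
            break
        except ValueError:
            print(f"Unrecognised frequency {raw_freq!r}, please try again.")
    index = pd.date_range(start=start, periods=length, freq=offset)
    print(f"Generated datetime index from {start} with frequency '{raw_freq}'.")
    return index
```

For example, `answers = iter(["2020-01-01", "D"])` followed by `prompt_for_datetime_index_sketch(10, input_fn=lambda _: next(answers))` builds ten daily timestamps without any console interaction.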
- Hydrological_model_validator.Processing.time_utils.resample_and_compute(model_sst_chunked: DataArray | Dataset, sat_sst_chunked: DataArray | Dataset) → Tuple[DataArray | Dataset, DataArray | Dataset]
Resample the input chunked SST datasets to monthly means and compute them concurrently.
- Parameters:
model_sst_chunked (xarray.DataArray or Dataset) – The model SST dataset chunked for dask processing.
sat_sst_chunked (xarray.DataArray or Dataset) – The satellite SST dataset chunked for dask processing.
- Returns:
model_sst_monthly (xarray.DataArray or Dataset) – The computed monthly mean resampled model SST.
sat_sst_monthly (xarray.DataArray or Dataset) – The computed monthly mean resampled satellite SST.
- Hydrological_model_validator.Processing.time_utils.split_to_monthly(yearly_data: Dict[int, Series | DataFrame]) → Dict[int, List[Series | DataFrame]]
Split yearly pandas Series or DataFrames with datetime index into monthly segments.
- Parameters:
yearly_data (dict[int, pd.Series or pd.DataFrame]) – Dictionary keyed by year, with values being pandas Series or DataFrames indexed by datetime.
- Returns:
Dictionary keyed by year, each containing a list of 12 elements corresponding to monthly slices of the data (January to December). Months with no data will have empty Series or DataFrames of the same type.
- Return type:
dict[int, list[pd.Series or pd.DataFrame]]
- Raises:
ValueError – If yearly_data is not a dictionary or values are not pandas Series/DataFrames with datetime-like indexes.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range('2020-01-01', '2020-12-31')
>>> data = pd.Series(np.random.rand(len(dates)), index=dates)
>>> yearly = {2020: data}
>>> monthly = split_to_monthly(yearly)
>>> len(monthly[2020])
12
>>> monthly[2020][0].index.month.unique().tolist()
[1]
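Selecting rows with a boolean month mask naturally satisfies the documented contract: the slice keeps the input's type (Series or DataFrame), and a month with no rows yields an empty object of that same type. A minimal sketch under that assumption; `split_to_monthly_sketch` is a hypothetical name:

```python
from typing import Dict, List, Union

import pandas as pd

PandasObj = Union[pd.Series, pd.DataFrame]


def split_to_monthly_sketch(yearly_data: Dict[int, PandasObj]) -> Dict[int, List[PandasObj]]:
    """Hypothetical sketch: twelve month slices (Jan..Dec) per year."""
    if not isinstance(yearly_data, dict):
        raise ValueError("yearly_data must be a dict keyed by year")
    monthly: Dict[int, List[PandasObj]] = {}
    for year, obj in yearly_data.items():
        if not isinstance(obj, (pd.Series, pd.DataFrame)) or not isinstance(obj.index, pd.DatetimeIndex):
            raise ValueError(f"value for {year!r} must be a Series/DataFrame with a DatetimeIndex")
        # .loc with a boolean mask works for both Series and DataFrame rows;
        # months without data produce empty objects of the same type.
        monthly[year] = [obj.loc[obj.index.month == m] for m in range(1, 13)]
    return monthly
```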
- Hydrological_model_validator.Processing.time_utils.split_to_yearly(series: Series, unique_years: List[str | int]) → Dict[int | str, Series]
Split a pandas Series with a datetime index into a dictionary keyed by year.
- Parameters:
series (pd.Series) – Time-indexed pandas Series with a datetime index.
unique_years (list of int or str) – List of years to split the series into. Years can be int or string representations.
- Returns:
Dictionary keyed by year containing the Series filtered for that year.
- Return type:
dict of year (int or str) to pd.Series
- Raises:
ValueError – If the series does not have a DatetimeIndex or unique_years contains invalid types.
Examples
>>> import pandas as pd
>>> dates = pd.date_range('2020-01-01', periods=365)
>>> s = pd.Series(range(365), index=dates)
>>> split_yearly = split_to_yearly(s, [2020])
>>> list(split_yearly.keys())
[2020]
>>> split_yearly[2020].index.year.unique().tolist()
[2020]
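Since `unique_years` may mix integers and their string forms, a natural approach is to convert each entry to `int` for the comparison while keeping the caller's original value as the dictionary key. A minimal sketch under that assumption; `split_to_yearly_sketch` is a hypothetical name:

```python
from typing import Dict, List, Union

import pandas as pd

Year = Union[int, str]


def split_to_yearly_sketch(series: pd.Series, unique_years: List[Year]) -> Dict[Year, pd.Series]:
    """Hypothetical sketch: one filtered Series per requested year."""
    if not isinstance(series.index, pd.DatetimeIndex):
        raise ValueError("series must have a DatetimeIndex")
    result: Dict[Year, pd.Series] = {}
    for year in unique_years:
        # Reject bools and anything that is neither int nor a numeric string.
        if isinstance(year, bool) or not isinstance(year, (int, str)):
            raise ValueError(f"invalid year {year!r}")
        try:
            year_int = int(year)
        except ValueError:
            raise ValueError(f"invalid year {year!r}") from None
        # Key by the value as given, filter by its integer form.
        result[year] = series.loc[series.index.year == year_int]
    return result
```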
- Hydrological_model_validator.Processing.time_utils.true_time_series_length(chlfstart: List[int], chlfend: List[int], DinY: int) → int
Calculate the true time series length in days over multiple files, accounting for leap years.
- Parameters:
chlfstart (list[int]) – List of start years per file.
chlfend (list[int]) – List of end years per file.
DinY (int) – Number of days in a normal year (expected 365).
- Returns:
Total number of days including leap years.
- Return type:
int
- Raises:
ValueError – If input types or values do not meet expectations.
Examples
>>> true_time_series_length([2000], [2001], 365)
731
>>> true_time_series_length([1999, 2001], [2000, 2002], 365)
1096
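The leap-year adjustment amounts to adding one extra day for every leap year covered. The sketch below assumes each file contributes every year from its start year to its end year inclusive and that the per-file totals are summed, which matches the first example above (2000 and 2001 give 366 + 365 = 731). How the library combines several files is not spelled out here, so this should not be read as its implementation; `true_time_series_length_sketch` and `_is_leap` are hypothetical names.

```python
from typing import List


def _is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def true_time_series_length_sketch(chlfstart: List[int], chlfend: List[int], DinY: int = 365) -> int:
    """Hypothetical sketch: total days over inclusive [start, end] year spans."""
    if len(chlfstart) != len(chlfend):
        raise ValueError("chlfstart and chlfend must have the same length")
    if not isinstance(DinY, int) or DinY <= 0:
        raise ValueError("DinY must be a positive integer")
    total = 0
    for start, end in zip(chlfstart, chlfend):
        if start > end:
            raise ValueError(f"start year {start} is after end year {end}")
        # Each year adds DinY days, plus one for leap years.
        for year in range(start, end + 1):
            total += DinY + int(_is_leap(year))
    return total
```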