data_alignment

Hydrological_model_validator.Processing.data_alignment.align_numpy_arrays(mod_vals: ndarray, sat_vals: ndarray) → Tuple[ndarray, ndarray]

Align two numpy arrays by removing elements where either array contains NaN.

This function creates a boolean mask identifying indices where both input arrays have valid (non-NaN) data, then returns filtered arrays containing only those valid data points.

Parameters:

mod_vals (np.ndarray) – Array of model values.
sat_vals (np.ndarray) – Array of satellite values, must be the same shape as mod_vals.

Returns:

Tuple of two numpy arrays (mod_vals_filtered, sat_vals_filtered) containing only elements where both inputs have valid (non-NaN) data.

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

TypeError – If either input is not a numpy ndarray.
ValueError – If input arrays do not have the same shape.

Example

>>> import numpy as np
>>> mod = np.array([1.0, np.nan, 3.0, 4.0])
>>> sat = np.array([1.5, 2.0, np.nan, 4.5])
>>> mod_filt, sat_filt = align_numpy_arrays(mod, sat)
>>> print(mod_filt)
[1. 4.]
>>> print(sat_filt)
[1.5 4.5]
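
The mask-and-filter step described above can be sketched in plain NumPy. This is a minimal illustration of the technique, not the package's implementation; the name align_sketch is hypothetical, and the checks simply mirror the Raises section.

```python
import numpy as np


def align_sketch(mod_vals: np.ndarray, sat_vals: np.ndarray):
    """Hypothetical stand-in illustrating the mask-and-filter idea."""
    if not isinstance(mod_vals, np.ndarray) or not isinstance(sat_vals, np.ndarray):
        raise TypeError("both inputs must be numpy arrays")
    if mod_vals.shape != sat_vals.shape:
        raise ValueError("inputs must have the same shape")
    # True only where neither array holds NaN
    valid = ~np.isnan(mod_vals) & ~np.isnan(sat_vals)
    # Boolean indexing keeps the valid pairs and returns 1D arrays
    return mod_vals[valid], sat_vals[valid]


mod = np.array([[1.0, np.nan], [3.0, 4.0]])
sat = np.array([[1.5, 2.0], [np.nan, 4.5]])
mod_filt, sat_filt = align_sketch(mod, sat)
```

NumPy boolean indexing flattens multi-dimensional input, so in this sketch the 2D inputs come back as 1D arrays. Whether align_numpy_arrays behaves the same way for multi-dimensional input is not stated here.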

Hydrological_model_validator.Processing.data_alignment.align_pandas_series(mod_series: Series, sat_series: Series) → Tuple[ndarray, ndarray]

Align two pandas Series by their index, returning numpy arrays of values where both Series have overlapping indices and non-NaN data.

This function finds the intersection of the indices from the two Series, filters out any entries where either Series has NaN, and returns two numpy arrays containing only the valid paired data, ready for further analysis or comparison.

Parameters:

mod_series (pd.Series) – Pandas Series containing model data.
sat_series (pd.Series) – Pandas Series containing satellite data.

Returns:

Tuple of two numpy arrays: (aligned model values, aligned satellite values), where each array contains only values at indices where both Series have non-NaN data.

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

TypeError – If either input is not a pandas Series.

Example

>>> import pandas as pd
>>> model_s = pd.Series([1.0, None, 3.0, 4.0], index=pd.date_range("2023-01-01", periods=4))
>>> sat_s = pd.Series([1.5, 2.0, None, 4.5], index=pd.date_range("2023-01-01", periods=4))
>>> mod_vals, sat_vals = align_pandas_series(model_s, sat_s)
>>> print(mod_vals)
[1. 4.]
>>> print(sat_vals)
[1.5 4.5]
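
The example above uses two Series with identical indices. The following sketch, written with plain pandas calls rather than the package function, shows the two steps (index intersection, then NaN filtering) on Series that only partially overlap. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

model_s = pd.Series([1.0, np.nan, 3.0, 4.0],
                    index=pd.date_range("2023-01-01", periods=4))
# Satellite series starts one day later, so only three dates overlap
sat_s = pd.Series([2.0, 3.5, np.nan, 5.0],
                  index=pd.date_range("2023-01-02", periods=4))

# Step 1: restrict both Series to the shared index labels
common = model_s.index.intersection(sat_s.index)
mod_common = model_s.loc[common]
sat_common = sat_s.loc[common]

# Step 2: keep only positions where both values are present
valid = mod_common.notna() & sat_common.notna()
mod_vals = mod_common[valid].to_numpy()
sat_vals = sat_common[valid].to_numpy()
```

Only 2023-01-03 survives both steps: 2023-01-01 and 2023-01-05 fall outside the shared index, and the other two shared dates each have a NaN on one side.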

Hydrological_model_validator.Processing.data_alignment.apply_3d_mask(data: ndarray, mask3d: ndarray) → ndarray

Apply a 3D mask to a data array, setting masked elements to NaN where the mask is zero.

This function takes a 3D mask array with shape (depth, lat, lon) and applies it to the input data array, whose last three dimensions must match (or be broadcast-compatible with) the mask shape. Any element in data corresponding to a zero in the mask is replaced by np.nan. The mask is first broadcast to the last three dimensions of data, then broadcast again across any leading dimensions to match the full data shape.

Parameters:

data (np.ndarray) – Data array with shape (…, depth, lat, lon) or exactly (depth, lat, lon).
mask3d (np.ndarray) – 3D mask array of shape (depth, lat, lon), where zero values indicate masked regions.

Returns:

Data array of the same shape as the input, with masked elements set to np.nan.

Return type:

np.ndarray

Raises:

TypeError – If either data or mask3d is not a numpy ndarray.
ValueError – If mask3d is not 3-dimensional or cannot be broadcast to the last three dimensions of data.

Example

>>> import numpy as np
>>> data = np.ones((2, 3, 4, 5))
>>> mask = np.ones((3, 4, 5))
>>> mask[1, 2, 3] = 0
>>> masked_data = apply_3d_mask(data, mask)
>>> np.isnan(masked_data[:, 1, 2, 3]).all()
True
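
The broadcasting step can be illustrated with standard NumPy calls. This is a sketch of the general technique under the shapes used in the example above, not the package's own code.

```python
import numpy as np

# Data with a leading time axis: (time, depth, lat, lon)
data = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
mask3d = np.ones((3, 4, 5))
mask3d[1, 2, 3] = 0  # mask a single grid cell

# Expand the mask across the leading axis; broadcast_to returns a view, not a copy
mask_full = np.broadcast_to(mask3d, data.shape)

# Write NaN where the mask is zero, keep the data everywhere else
masked = np.where(mask_full == 0, np.nan, data)
```

Because the masked cell is repeated along the time axis, exactly two elements become NaN here. The data array needs a floating-point dtype, since integer arrays cannot hold NaN.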

Hydrological_model_validator.Processing.data_alignment.extract_mod_sat_keys(taylor_dict: Dict) → Tuple[str, str]

Identify and return the keys corresponding to model and satellite data within a dictionary.

This function searches for keys commonly associated with model data (e.g., ‘mod’, ‘model’, ‘predicted’) and satellite data (e.g., ‘sat’, ‘satellite’, ‘observed’) within the provided dictionary. It returns a tuple containing the identified model and satellite keys.

Parameters:

taylor_dict (dict) – Dictionary expected to contain keys for model and satellite datasets.

Returns:

Tuple with two strings:
- model_key: Key associated with model data in the dictionary.
- satellite_key: Key associated with satellite data in the dictionary.

Return type:

Tuple[str, str]

Raises:

TypeError – If the input is not a dictionary.
ValueError – If suitable keys for model or satellite data cannot be found in the dictionary.

Example

>>> data = {'model': ..., 'satellite': ...}
>>> extract_mod_sat_keys(data)
('model', 'satellite')
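
The exact matching rules are not spelled out above, so the following is only one plausible way to implement such a lookup. The function find_keys_sketch, the candidate tuples, and the case-insensitive exact match are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Assumed candidate names, taken from the examples in the description above
MODEL_NAMES = ("mod", "model", "predicted")
SAT_NAMES = ("sat", "satellite", "observed")


def find_keys_sketch(d: Dict) -> Tuple[str, str]:
    """Hypothetical lookup: exact, case-insensitive match against the candidates."""
    if not isinstance(d, dict):
        raise TypeError("input must be a dictionary")

    def pick(candidates: Sequence[str]) -> str:
        # Return the first dictionary key whose lowercase form is a candidate
        for key in d:
            if isinstance(key, str) and key.lower() in candidates:
                return key
        raise ValueError(f"no key matching any of {candidates}")

    return pick(MODEL_NAMES), pick(SAT_NAMES)


keys = find_keys_sketch({"Model": [1.0], "satellite": [1.1]})
```

The sketch returns the keys as they appear in the dictionary, so "Model" keeps its capitalisation. The real function may match differently, for example on substrings.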

Hydrological_model_validator.Processing.data_alignment.gather_monthly_data_across_years(data_dict: Dict[str, Dict[int, List[ndarray | list]]], key: str, month_idx: int) → ndarray

Collect and concatenate data for a specified month across all years for a given dataset key.

This function extracts monthly data arrays or lists for the specified key (e.g., model or satellite) from each year in the provided nested dictionary. It flattens each month’s data, concatenates all years’ data for that month into a single 1D numpy array, and removes any NaN values.

Parameters:

data_dict (dict) – Nested dictionary containing data arrays/lists keyed first by dataset keys (e.g., ‘mod’, ‘sat’), then by year (int), where each year maps to a list of monthly arrays or lists.
key (str) – Dataset key to select data from data_dict (e.g., ‘mod’ or ‘sat’).
month_idx (int) – Zero-based month index to select (0 = January, …, 11 = December).

Returns:

One-dimensional numpy array of concatenated valid (non-NaN) data for the specified month across all years.

Return type:

np.ndarray

Raises:

ValueError – If data_dict is not a dictionary or key is not found in it, or data for a year/month is invalid.
IndexError – If month_idx is not in the range 0 to 11 or if any year’s data does not have enough monthly entries.

Example

>>> import numpy as np
>>> data = {
...     'mod': {
...         2020: [np.array([1, 2, np.nan]), np.array([3, 4]), *[np.array([])]*10],
...         2021: [np.array([5, np.nan]), np.array([6, 7]), *[np.array([])]*10]
...     }
... }
>>> gather_monthly_data_across_years(data, 'mod', 0)
array([1., 2., 5.])
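
The flatten, concatenate, and filter sequence described above can be sketched directly in NumPy. This is an illustration, not the package's code. The ascending year order is an assumption, since the description does not say in which order years are combined.

```python
import numpy as np

data = {
    "mod": {
        # January holds a 2D field; the remaining 11 months are empty
        2020: [np.array([[1.0, 2.0], [np.nan, 3.0]])] + [np.array([])] * 11,
        # A plain list instead of an array for January
        2021: [[4.0, np.nan]] + [np.array([])] * 11,
    }
}
month_idx = 0  # January

# Flatten the selected month for each year (assumed ascending year order)
chunks = [
    np.asarray(data["mod"][year][month_idx], dtype=float).ravel()
    for year in sorted(data["mod"])
]
combined = np.concatenate(chunks)

# Drop missing values after concatenation
january = combined[~np.isnan(combined)]
```

Converting with np.asarray lets arrays and plain lists be handled the same way, and ravel reduces any spatial shape to one dimension before concatenation.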

Hydrological_model_validator.Processing.data_alignment.get_common_series_by_year(data_dict: Dict[str, Dict[int, Series]]) → List[Tuple[str, ndarray, ndarray]]

Extract and align model and satellite time series data by year, returning only overlapping data points.

This function takes a dictionary containing yearly model and satellite data as pandas Series, aligns them on their time indices for each year, and returns numpy arrays of paired values where both datasets have valid (non-NaN) data.

Parameters:

data_dict (dict) – Dictionary with keys for model and satellite data (e.g., ‘model’, ‘satellite’), each mapping to a dictionary keyed by year (int), with pandas Series as values.

Returns:

List of tuples, each containing:
- year as a string,
- numpy array of aligned model values for that year,
- numpy array of aligned satellite values for that year.
Only years with overlapping valid data are included.

Return type:

List[Tuple[str, np.ndarray, np.ndarray]]

Raises:

TypeError – If input is not a dictionary or if the model/satellite data are not dictionaries keyed by years.

Notes

This function depends on extract_mod_sat_keys(data_dict) to determine the model and satellite keys.

Example

>>> import pandas as pd
>>> import numpy as np
>>> data = {
...     'model': {
...         2000: pd.Series([1.0, 2.0, np.nan], index=pd.date_range('2000-01-01', periods=3)),
...         2001: pd.Series([4.0, 5.0], index=pd.date_range('2001-01-01', periods=2)),
...     },
...     'satellite': {
...         2000: pd.Series([1.1, 2.1, 3.1], index=pd.date_range('2000-01-01', periods=3)),
...         2001: pd.Series([4.1, np.nan], index=pd.date_range('2001-01-01', periods=2)),
...     }
... }
>>> get_common_series_by_year(data)
[('2000', array([1., 2.]), array([1.1, 2.1])), ('2001', array([4.]), array([4.1]))]

Hydrological_model_validator.Processing.data_alignment.get_common_series_by_year_month(data_dict: Dict[str, Dict[int | str, List[ndarray]]]) → List[Tuple[int, int, ndarray, ndarray]]

Extract and align model and satellite data by year and month.

This function iterates over all available years and months in the input data, aligning model and satellite arrays for each month. It filters out elements where either array contains NaN values, returning only valid data pairs.

Parameters:

data_dict (dict) – Dictionary with two top-level keys (e.g., ‘model’ and ‘satellite’). Each key maps to a dictionary where each year (int or str) maps to a list of 12 numpy arrays, one per month, representing time-resolved spatial or summary data.

Returns:

A list of tuples, each containing:
- year as an integer,
- month index (0-based, 0 = January, 11 = December),
- NumPy array of model values with valid data,
- NumPy array of satellite values with valid data.

Return type:

List[Tuple[int, int, np.ndarray, np.ndarray]]

Raises:

TypeError – If input is not structured as expected (e.g., dicts or lists missing or incorrect types).
ValueError – If a year does not contain 12 monthly entries.

Example

>>> import numpy as np
>>> data = {
...     'model': {
...         2000: [np.array([1.0, np.nan]), np.array([2.0]), *[np.array([])]*10]
...     },
...     'satellite': {
...         2000: [np.array([1.1, 2.2]), np.array([2.1]), *[np.array([])]*10]
...     }
... }
>>> get_common_series_by_year_month(data)
[(2000, 0, array([1.]), array([1.1])), (2000, 1, array([2.]), array([2.1]))]
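
The per-year, per-month iteration can be sketched as a nested loop. This is an illustrative reconstruction, not the package's implementation. Skipping months with no valid pairs is an assumption, chosen because it matches the two-entry output of the example above.

```python
import numpy as np

data = {
    "model": {2000: [np.array([1.0, np.nan]), np.array([2.0])] + [np.array([])] * 10},
    "satellite": {2000: [np.array([1.1, 2.2]), np.array([2.1])] + [np.array([])] * 10},
}

results = []
for year in sorted(data["model"]):
    mod_months = data["model"][year]
    sat_months = data["satellite"][year]
    if len(mod_months) != 12 or len(sat_months) != 12:
        raise ValueError(f"year {year} does not have 12 monthly entries")
    for month, (mod, sat) in enumerate(zip(mod_months, sat_months)):
        # True where both arrays hold a number
        valid = ~np.isnan(mod) & ~np.isnan(sat)
        # Assumption: months without any valid pair are skipped
        if valid.any():
            results.append((int(year), month, mod[valid], sat[valid]))
```

The int(year) conversion reflects the documented return type, in which the year is an integer even when the input dictionary uses string keys.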

Hydrological_model_validator.Processing.data_alignment.get_valid_mask(mod_vals: ndarray, sat_vals: ndarray) → ndarray

Generate a boolean mask identifying elements where both model and satellite data are valid (non-NaN).

This function compares two numpy arrays element-wise and returns a boolean array that is True only at positions where neither array has NaN values, effectively marking data points valid for both datasets. This mask can be used for paired analysis or filtering.

Parameters:

mod_vals (np.ndarray) – Array of model data values, can be of any shape.
sat_vals (np.ndarray) – Array of satellite data values, must have the same shape as mod_vals.

Returns:

Boolean numpy array of the same shape as inputs, where True indicates positions with valid (non-NaN) data in both mod_vals and sat_vals, and False otherwise.

Return type:

np.ndarray

Raises:

TypeError – If either mod_vals or sat_vals is not a numpy ndarray.
ValueError – If the shapes of mod_vals and sat_vals do not match.

Example

>>> import numpy as np
>>> model_data = np.array([1.0, np.nan, 3.0, 4.0])
>>> satellite_data = np.array([1.5, 2.0, np.nan, 4.5])
>>> mask = get_valid_mask(model_data, satellite_data)
>>> print(mask)
[ True False False  True]
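
As an illustration of the paired analysis mentioned above, the sketch below builds an equivalent mask with plain NumPy on 2D input and uses it to compute a mean difference over valid pairs. The bias calculation is only an example of downstream use and is not part of this module.

```python
import numpy as np

model_data = np.array([[1.0, np.nan], [3.0, 4.0]])
satellite_data = np.array([[1.5, 2.0], [np.nan, 4.5]])

# Same-shape boolean mask: True where both arrays hold a number
mask = ~np.isnan(model_data) & ~np.isnan(satellite_data)

# Number of usable pairs
n_pairs = int(mask.sum())

# Mean difference (model minus satellite) over valid pairs only
bias = float(np.mean(model_data[mask] - satellite_data[mask]))
```

Unlike the filtered arrays, the mask keeps the input shape, so it can also be used to locate where on a grid the valid pairs sit.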

Hydrological_model_validator.Processing.data_alignment.get_valid_mask_pandas(mod_series: Series, sat_series: Series) → Series

Generate a boolean pandas Series mask indicating positions where both model and satellite data Series have valid (non-NaN) values, aligned by their common index.

This function takes two pandas Series, aligns them on the intersection of their indices, and returns a boolean Series that is True where both input Series have non-missing data. This mask can be used to filter or compare paired time series or other indexed data.

Parameters:

mod_series (pd.Series) – Pandas Series containing model data values.
sat_series (pd.Series) – Pandas Series containing satellite data values.

Returns:

Boolean Series indexed by the intersection of input Series indices, where True indicates valid data points (non-NaN) in both inputs.

Return type:

pd.Series

Raises:

TypeError – If either input is not a pandas Series.

Example

>>> import pandas as pd
>>> model_s = pd.Series([1.0, None, 3.0, 4.0], index=pd.date_range("2023-01-01", periods=4))
>>> sat_s = pd.Series([1.5, 2.0, None, 4.5], index=pd.date_range("2023-01-01", periods=4))
>>> mask = get_valid_mask_pandas(model_s, sat_s)
>>> print(mask)
2023-01-01     True
2023-01-02    False
2023-01-03    False
2023-01-04     True
Freq: D, dtype: bool
