Efficiency_metrics


Hydrological_model_validator.Processing.Efficiency_metrics.compute_error_timeseries(model_sst_data: DataArray, sat_sst_data: DataArray, ocean_mask: DataArray) → DataFrame

Compute daily error statistics between model and satellite SST data within a specified basin mask.

For each time step, this function applies the spatial mask to both model and satellite data, computes a suite of statistical metrics on the valid values, and returns a time-indexed DataFrame containing these metrics.

Parameters:

model_sst_data (xarray.DataArray) – Sea Surface Temperature (SST) data from the model, with dimensions including ‘time’, ‘lat’, and ‘lon’.
sat_sst_data (xarray.DataArray) – Observed satellite SST data, aligned in space and time with the model data.
ocean_mask (xarray.DataArray) – Boolean mask (with dimensions ‘lat’ and ‘lon’) selecting the spatial domain (e.g., a basin) over which statistics are computed. True (or 1) values mark cells to include.

Returns:

DataFrame indexed by time (daily), where each row contains statistics computed between model and satellite SST for the corresponding day. Columns may include metrics such as:

- Mean Bias
- Standard Deviation of Error
- Correlation Coefficient
- RMSE
- Relative metrics, etc. (as defined by compute_stats_single_time)

Return type:

pandas.DataFrame

Notes

Assumes input DataArrays are spatially aligned and share a common time coordinate.
Invalid (NaN or masked) values are excluded before metric computation.
The function compute_stats_single_time must return a dictionary or similar structure convertible to a DataFrame row.

Examples

>>> df = compute_error_timeseries(model_sst_data, sat_sst_data, basin_mask)
>>> df.head()
                     mean_bias  rmse   corr
2000-01-01             -0.12    0.45   0.88
2000-01-02             -0.15    0.51   0.85
...
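
The per-time-step masking described above can be sketched with plain NumPy arrays. This is only an illustration under stated assumptions, not the library's implementation: the helper name `error_timeseries_sketch` and its two output keys are hypothetical, and the inputs are assumed to be a pair of (time, lat, lon) arrays plus a boolean (lat, lon) mask.

```python
import numpy as np

def error_timeseries_sketch(model, sat, mask):
    """Hypothetical helper: per-time-step mean bias and RMSE inside a mask."""
    rows = []
    for t in range(model.shape[0]):
        # Keep only the cells selected by the 2D mask (result is 1D).
        m = model[t][mask]
        o = sat[t][mask]
        # Drop pairs where either value is missing.
        valid = np.isfinite(m) & np.isfinite(o)
        if not valid.any():
            rows.append({"mean_bias": np.nan, "rmse": np.nan})
            continue
        diff = m[valid] - o[valid]
        rows.append({
            "mean_bias": float(diff.mean()),
            "rmse": float(np.sqrt((diff ** 2).mean())),
        })
    return rows

# Two time steps on a 2x2 grid; the mask excludes the bottom-right cell.
model = np.array([[[1.0, 2.0], [3.0, 9.0]],
                  [[2.0, np.nan], [4.0, 9.0]]])
sat = np.array([[[1.5, 2.5], [3.5, 0.0]],
                [[2.0, 3.0], [3.0, 0.0]]])
mask = np.array([[True, True], [True, False]])
rows = error_timeseries_sketch(model, sat, mask)
```

A list of dictionaries like `rows` can be turned into a time-indexed table with `pandas.DataFrame(rows, index=times)`, where `times` stands for the shared time coordinate.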

Hydrological_model_validator.Processing.Efficiency_metrics.compute_spatial_efficiency(model_da: DataArray, sat_da: DataArray, time_group: Literal['month', 'year'] = 'month') → Tuple[DataArray, DataArray, DataArray, DataArray, DataArray, DataArray]

Compute spatial efficiency metrics between model and satellite data aggregated over time groups.

This function calculates multiple performance metrics spatially across the domain by aggregating the input datasets over calendar months or years. It returns time-resolved maps for each metric.

Parameters:

model_da (xarray.DataArray) – Model data with a ‘time’ coordinate.
sat_da (xarray.DataArray) – Satellite (observed) data with a ‘time’ coordinate, matching the model in space and time.
time_group ({'month', 'year'}, optional) –
Temporal aggregation level:
- ’month’: groups data by calendar month (1–12),
- ’year’: groups data by unique years in the time dimension.

Returns:

Six DataArrays with spatial metrics computed for each time group (month or year), with dimensions: (time_group, lat, lon). The returned metrics are:

mb_all : Mean Bias

sde_all : Standard Deviation of the Error

cc_all : Pearson Cross-Correlation

rm_all : Standard Deviation of the Model

ro_all : Standard Deviation of the Observation

urmse_all : Unbiased Root Mean Squared Error

Return type:

tuple of xarray.DataArray

Raises:

ValueError – If time_group is not ‘month’ or ‘year’.

Notes

Input DataArrays must have a ‘time’ coordinate with datetime-like values.
Each metric is computed for all available times in the group (month or year).
The function assumes spatial alignment between model and satellite datasets.

Examples

>>> mb, sde, cc, rm, ro, urmse = compute_spatial_efficiency(model_da, sat_da, time_group="month")
>>> mb.sel(month=1).plot()  # Plot Mean Bias for January
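
To make the grouping step concrete, here is a minimal NumPy sketch that builds one mean-bias map per calendar month. It is an assumption-laden illustration: `monthly_mean_bias_maps` is a hypothetical helper, the inputs are plain (time, lat, lon) arrays with a separate integer `months` vector, and NaN handling is left out for brevity.

```python
import numpy as np

def monthly_mean_bias_maps(model, sat, months):
    """Hypothetical helper: one mean-bias map per calendar month present."""
    maps = {}
    for month in np.unique(months):
        in_group = months == month
        diff = model[in_group] - sat[in_group]
        # Average over the time axis only, so the (lat, lon) shape survives.
        maps[int(month)] = diff.mean(axis=0)
    return maps

# Four time steps (two in January, two in February) on a 1x2 grid.
months = np.array([1, 1, 2, 2])
model = np.array([[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]])
sat = np.array([[[0.0, 2.0]], [[2.0, 2.0]], [[5.0, 5.0]], [[5.0, 5.0]]])
maps = monthly_mean_bias_maps(model, sat, months)
```

With xarray inputs, the same grouping is usually expressed through `groupby('time.month')` or `groupby('time.year')`, which keeps the coordinates attached to each map.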

Hydrological_model_validator.Processing.Efficiency_metrics.compute_stats_single_time(model_slice: ndarray, sat_slice: ndarray) → dict

Compute error statistics between model and satellite data for a single time slice.

This function evaluates a set of core statistical metrics comparing model output and satellite observations for one timestep, using only valid (non-NaN) paired values.

Parameters:

model_slice (np.ndarray) – 1D array of model data values at a single timestep, typically flattened from 2D (lat/lon).
sat_slice (np.ndarray) – 1D array of satellite observation values at the same timestep and spatial extent.

Returns:

Dictionary containing the following metrics:

- ‘mean_bias’ (float): Mean difference between model and satellite values.
- ‘unbiased_rmse’ (float): Root Mean Square Error after removing mean bias.
- ‘std_error’ (float): Standard deviation of the model-satellite difference.
- ‘cross_correlation’ (float): Pearson correlation coefficient between model and satellite.

If no valid data pairs exist, all values are returned as np.nan.

Return type:

dict

Notes

Only pairs where both model and satellite values are finite (non-NaN) are used.
This function assumes input arrays are already aligned in space.

Examples

>>> m = np.array([20.1, 19.5, np.nan, 21.0])
>>> o = np.array([19.8, 19.7, 20.0, 21.1])
>>> compute_stats_single_time(m, o)
{'mean_bias': 0.0X, 'unbiased_rmse': 0.0Y, 'std_error': 0.0Z, 'cross_correlation': 0.99}
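
The four documented metrics can be reproduced with a few NumPy calls. The sketch below is a hypothetical re-implementation (`stats_single_time_sketch` is not part of the package); in particular, the use of the population standard deviation (`ddof=0`) is an assumption, since the source does not state which convention is used.

```python
import numpy as np

def stats_single_time_sketch(model_slice, sat_slice):
    """Hypothetical re-implementation of the four documented metrics."""
    m = np.asarray(model_slice, dtype=float)
    o = np.asarray(sat_slice, dtype=float)
    valid = np.isfinite(m) & np.isfinite(o)
    keys = ("mean_bias", "unbiased_rmse", "std_error", "cross_correlation")
    if not valid.any():
        return dict.fromkeys(keys, np.nan)
    m, o = m[valid], o[valid]
    diff = m - o
    # Centre both series before differencing to remove the mean bias.
    centred = (m - m.mean()) - (o - o.mean())
    return {
        "mean_bias": float(diff.mean()),
        "unbiased_rmse": float(np.sqrt((centred ** 2).mean())),
        "std_error": float(diff.std()),  # population std (ddof=0): an assumption
        "cross_correlation": float(np.corrcoef(m, o)[0, 1]),
    }

m = np.array([20.1, 19.5, np.nan, 21.0])
o = np.array([19.8, 19.7, 20.0, 21.1])
stats = stats_single_time_sketch(m, o)
```

Under the `ddof=0` convention the unbiased RMSE and the standard deviation of the error coincide, which is why the two values agree in this sketch.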

Hydrological_model_validator.Processing.Efficiency_metrics.index_of_agreement(obs: ndarray | Sequence[float], pred: ndarray | Sequence[float]) → float

Calculate the Index of Agreement (d) between observed and predicted values.

The Index of Agreement is a standardized measure of the degree of model prediction error, which varies between 0 (no agreement) and 1 (perfect agreement).

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.

Returns:

Index of Agreement (d), or np.nan if insufficient valid data or denominator is zero.

Return type:

float

Notes

Excludes any pairs where either observed or predicted values are NaN.
Requires at least two valid data points to compute.
Denominator involves sums of absolute deviations from the observed mean.
Unlike a simple correlation coefficient, the metric penalizes systematic under- and over-prediction.

Examples

>>> obs = np.array([1, 2, 3, 4, 5])
>>> pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
>>> index_of_agreement(obs, pred)
0.97  # example output (actual value depends on data)

Hydrological_model_validator.Processing.Efficiency_metrics.index_of_agreement_j(obs: Sequence[float] | ndarray, pred: Sequence[float] | ndarray, j: float = 1) → float

Compute modified Index of Agreement (d_j) with arbitrary exponent j.

This generalizes the Index of Agreement by raising absolute deviations to the power j, allowing flexible emphasis on prediction errors.

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.
j (float, optional) – Exponent parameter applied to absolute deviations (default is 1).

Returns:

Modified Index of Agreement (d_j), or np.nan if insufficient valid data or zero denominator.

Return type:

float

Notes

The modified index is defined as: d_j = 1 - (sum(|obs - pred|^j) / sum((|pred - mean(obs)| + |obs - mean(obs)|)^j))
Requires at least two paired valid values.
If denominator is zero (lack of variability), returns np.nan.
Larger j values penalize larger deviations more heavily.

Examples

>>> obs = np.array([2, 3, 4])
>>> pred = np.array([2.1, 2.9, 3.8])
>>> index_of_agreement_j(obs, pred, j=2)
0.9913...  # approximate value from the d_j definition in the Notes (j=2)
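
The d_j definition given in the Notes translates almost line by line into NumPy. The function below is a sketch under that definition only; the name `index_of_agreement_j_sketch` is hypothetical and the package's own edge-case handling may differ.

```python
import numpy as np

def index_of_agreement_j_sketch(obs, pred, j=1.0):
    """Sketch of d_j = 1 - sum(|o - p|^j) / sum((|p - mean(o)| + |o - mean(o)|)^j)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Keep only pairs where both values are present.
    valid = np.isfinite(obs) & np.isfinite(pred)
    obs, pred = obs[valid], pred[valid]
    if obs.size < 2:
        return np.nan
    obs_mean = obs.mean()
    numerator = np.sum(np.abs(obs - pred) ** j)
    denominator = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** j)
    if denominator == 0:
        return np.nan  # no variability around the observed mean
    return float(1.0 - numerator / denominator)

obs = np.array([2, 3, 4])
pred = np.array([2.1, 2.9, 3.8])
d1 = index_of_agreement_j_sketch(obs, pred, j=1)
d2 = index_of_agreement_j_sketch(obs, pred, j=2)
```

Comparing `d1` and `d2` on the same data is a quick way to see how strongly the exponent changes the score.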

Hydrological_model_validator.Processing.Efficiency_metrics.ln_nse(obs: Sequence[float] | ndarray, pred: Sequence[float] | ndarray) → float

Compute the Nash–Sutcliffe Efficiency (NSE) on the natural logarithms of observed and predicted data.

This metric evaluates model performance emphasizing relative differences by transforming data with the natural logarithm. It is useful when data span several orders of magnitude or when multiplicative errors are more meaningful.

Parameters:

obs (array-like) – Observed values (must be strictly positive).
pred (array-like) – Predicted values (must be strictly positive).

Returns:

Logarithmic NSE value, or np.nan if there is insufficient valid data or if input contains non-positive values.

Return type:

float

Notes

Both obs and pred must contain strictly positive values; zeros or negatives will be excluded from the calculation.
The function computes NSE on ln(obs) and ln(pred).
Requires at least two valid paired observations.
Returns np.nan if denominator in NSE calculation is zero or data are insufficient.

Examples

>>> obs = np.array([1.0, 10.0, 100.0, 1000.0])
>>> pred = np.array([1.1, 9.5, 110.0, 950.0])
>>> ln_nse(obs, pred)
0.95  # example output (actual value depends on data)
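
A minimal sketch of the log-transformed NSE, following the Notes above: non-positive pairs are dropped, then the ordinary NSE is computed on the logarithms. The function name `ln_nse_sketch` is hypothetical and this is not the package's code.

```python
import numpy as np

def ln_nse_sketch(obs, pred):
    """Sketch: NSE computed on ln(obs) and ln(pred), positive pairs only."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    with np.errstate(invalid="ignore"):
        # NaN > 0 evaluates to False, so missing values are dropped too.
        valid = (obs > 0) & (pred > 0)
    if valid.sum() < 2:
        return np.nan
    log_obs = np.log(obs[valid])
    log_pred = np.log(pred[valid])
    denominator = np.sum((log_obs - log_obs.mean()) ** 2)
    if denominator == 0:
        return np.nan
    return float(1.0 - np.sum((log_obs - log_pred) ** 2) / denominator)

obs = np.array([1.0, 10.0, 100.0, 1000.0])
pred = np.array([1.1, 9.5, 110.0, 950.0])
score = ln_nse_sketch(obs, pred)
```

Because the errors here are roughly 5 to 10 percent at every magnitude, each pair contributes a similar amount to the numerator; on the untransformed scale the largest observation would dominate.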

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_index_of_agreement(dictionary: Dict[str, Dict[int, List[ndarray | List[float]]]]) → List[float]

Compute the monthly Index of Agreement (d) between model and satellite datasets.

This function aggregates paired model and satellite data across multiple years, then calculates the Index of Agreement for each month to evaluate the agreement between predicted and observed values.

Parameters:

dictionary (dict) – Dictionary containing keys with ‘mod’ and ‘sat’ indicating model and satellite data. Each key maps to a dict where each year contains a list or array of 12 monthly data arrays.

Returns:

List of 12 Index of Agreement values, one per month.

Return type:

list of float

Raises:

KeyError – If no keys containing ‘mod’ or ‘sat’ are found in the dictionary.

Notes

The function concatenates monthly data across all years before computing the metric.
Handles NaNs by excluding them pairwise in the calculation.
Requires at least two valid data points per month to return a numeric result.
Returns np.nan for months with insufficient data.

Examples

>>> data = {
...     'mod_data': {2020: [np.array([1,2]), np.array([3,4]), ...], ...},
...     'sat_data': {2020: [np.array([1.1,1.9]), np.array([2.9,4.1]), ...], ...}
... }
>>> monthly_index_of_agreement(data)
[0.98, 0.95, ..., 0.97]  # 12 values for each month

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_index_of_agreement_j(dictionary: Dict[str, Dict[int, List[ndarray | list]]], j: float = 1) → List[float]

Compute monthly modified Index of Agreement (d_j) with exponent j from paired model and satellite data.

Calculates the modified Index of Agreement for each calendar month by aggregating all paired model and satellite data across years. The exponent j controls the sensitivity of the metric to deviations, with j=1 corresponding to the standard index.

Parameters:

dictionary (dict) – Dictionary with keys containing ‘mod’ and ‘sat’ for model and satellite data. Each key maps to a dict of years, where each year is a list or array of 12 monthly arrays.
j (float, optional) – Exponent parameter for the modified Index of Agreement (default is 1).

Returns:

Modified Index of Agreement (d_j) values for each month (length 12).

Return type:

list of float

Raises:

KeyError – If model or satellite keys are missing from the dictionary.

Notes

Requires at least two paired valid observations per month.
Returns np.nan for months with insufficient data or zero variability.
The metric generalizes the traditional Index of Agreement by raising deviations to the power j, allowing emphasis on different scales of error.

Examples

>>> dictionary = {
...     'mod_data': {2020: [np.array([1,2]), np.array([3,4]), ...], 2021: [...], ...},
...     'sat_data': {2020: [np.array([1.1,2.1]), np.array([3.1,3.9]), ...], 2021: [...], ...}
... }
>>> monthly_index_of_agreement_j(dictionary, j=2)
[0.85, 0.88, ..., 0.90]  # list of 12 floats, one per month

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_ln_nse(dictionary: Dict[str, Dict[int, List[ndarray | list]]]) → List[float]

Compute monthly logarithmic Nash–Sutcliffe Efficiency (ln NSE) from paired model and satellite data.

This metric evaluates model performance on the natural logarithm scale, emphasizing relative differences and multiplicative errors.

Parameters:

dictionary (dict) – Dictionary with keys containing ‘mod’ and ‘sat’ for model and satellite data. Each key maps to a dict of years, each year a list/array of 12 monthly arrays.

Returns:

Logarithmic NSE values for each month (length 12).

Return type:

list of float

Raises:

KeyError – If model or satellite keys are missing.

Notes

Only pairs of positive values (both observed and predicted) are considered for each month.
Months without sufficient valid data yield np.nan.
Relies on the ln_nse function to compute the metric.

Examples

>>> data = {
...     'model': {2020: [np.array([1,2]), np.array([3,4]), ..., np.array([11,12])]},
...     'satellite': {2020: [np.array([1.1,2.1]), np.array([2.9,3.8]), ..., np.array([10.9,11.8])]}
... }
>>> monthly_ln_nse(data)
[0.95, 0.87, ..., 0.93]  # example output (actual values depend on data)

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_nse(dictionary: Dict[str, Dict[int, List[ndarray | List[float]]]]) → List[float]

Compute monthly Nash–Sutcliffe Efficiency (NSE) between model and satellite datasets.

This function aggregates paired model and satellite data over multiple years, calculates the NSE for each calendar month, and returns a list of monthly NSE values.

Parameters:

dictionary (dict) – Dictionary containing keys with ‘mod’ and ‘sat’ for model and satellite data. Each key maps to a dictionary of years, where each year is a list or array of 12 monthly data arrays.

Returns:

NSE values for each month (length 12). Each value represents the NSE aggregated over all years for that month.

Return type:

list of float

Raises:

KeyError – If no model or satellite keys are found in the input dictionary.

Notes

The function concatenates monthly data across all years before computing NSE.
Pairs with NaN values in either dataset are excluded.
Returns np.nan for months where valid paired data is insufficient.

Examples

>>> data = {
...     'model_data': {
...         2020: [np.array([1, 2]), np.array([2, 3]), ...],  # 12 monthly arrays
...         2021: [np.array([1.1, 1.9]), np.array([2.1, 3.1]), ...]
...     },
...     'satellite_data': {
...         2020: [np.array([1, 2]), np.array([2, 2.9]), ...],
...         2021: [np.array([1.0, 2.0]), np.array([2.0, 3.0]), ...]
...     }
... }
>>> monthly_nse(data)
[0.95, 0.89, ..., 0.92]  # Example output, one value per month

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_nse_j(dictionary: Dict[str, Dict[int, List[ndarray | list]]], j: float = 1) → List[float]

Compute monthly modified Nash–Sutcliffe Efficiency (E_j) for arbitrary exponent j from paired model and satellite data.

This generalizes the NSE by raising absolute differences to the power j, allowing flexible emphasis on deviations.

Parameters:

dictionary (dict) – Dictionary with keys containing ‘mod’ and ‘sat’ for model and satellite data. Each key maps to a dict of years, each year a list/array of 12 monthly arrays.
j (float, optional) – Exponent for the absolute difference (default is 1).

Returns:

Modified NSE values for each month (length 12).

Return type:

list of float

Raises:

KeyError – If model or satellite keys are missing.

Notes

The function calculates E_j = 1 - sum(|obs - pred|^j) / sum(|obs - mean(obs)|^j) for each month.
Requires at least two valid paired values per month, else returns np.nan.
Higher values of j increase sensitivity to large errors.

Examples

>>> data = {
...     'model': {2020: [np.array([1, 2]), ..., np.array([11, 12])]},
...     'satellite': {2020: [np.array([1.1, 2.1]), ..., np.array([10.9, 11.8])]}
... }
>>> monthly_nse_j(data, j=2)
[0.90, 0.85, ..., 0.88]  # example output (depends on data)

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_r_squared(data_dict: Dict[str, Dict[int, List[ndarray | List[float]]]]) → List[float]

Compute monthly R² values between model and satellite datasets over multiple years.

Parameters:

data_dict (dict) –

Dictionary with structure:

{
    'BASSTmod': {year1: [12 arrays], year2: […], …},
    'BASSTsat': {year1: [12 arrays], year2: […], …}
}

Keys should contain ‘mod’ for model and ‘sat’ for satellite data. Each value is a dictionary mapping years to lists of 12 monthly 2D arrays.

Returns:

List of 12 R² values (one for each month from January to December).

Return type:

list of float

Notes

This function concatenates data across all years for each month and calculates the R² between the model and satellite data for that month.
NaN values are excluded from the computation.
If no valid data exists for a month, the R² for that month is set to np.nan.

Examples

>>> mod_data = {
...     2000: [np.array([[1, 2], [3, 4]]) for _ in range(12)],
...     2001: [np.array([[2, 3], [4, 5]]) for _ in range(12)]
... }
>>> sat_data = {
...     2000: [np.array([[1, 2.1], [3.1, 4]]) for _ in range(12)],
...     2001: [np.array([[2.2, 3], [4.1, 5.1]]) for _ in range(12)]
... }
>>> data_dict = {'BASSTmod': mod_data, 'BASSTsat': sat_data}
>>> monthly_r_squared(data_dict)
[0.997..., 0.997..., ..., 0.997...]  # 12 values
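
The month-wise aggregation shared by the `monthly_*` functions can be sketched as follows: locate the model and satellite entries by substring, flatten and concatenate each month's arrays across years, drop NaN pairs, then apply the metric. This is a hypothetical illustration (`monthly_r_squared_sketch` is not the package's function) and it assumes both entries cover the same years with matching array shapes.

```python
import numpy as np

def monthly_r_squared_sketch(data_dict):
    """Sketch: squared Pearson correlation per calendar month, all years pooled."""
    # Locate the model and satellite entries by substring, as documented.
    mod_key = next((k for k in data_dict if "mod" in k), None)
    sat_key = next((k for k in data_dict if "sat" in k), None)
    if mod_key is None or sat_key is None:
        raise KeyError("expected one key containing 'mod' and one containing 'sat'")
    results = []
    for month in range(12):
        # Flatten each year's 2D field and join all years for this month.
        mod = np.concatenate([np.ravel(data_dict[mod_key][year][month])
                              for year in sorted(data_dict[mod_key])]).astype(float)
        sat = np.concatenate([np.ravel(data_dict[sat_key][year][month])
                              for year in sorted(data_dict[sat_key])]).astype(float)
        valid = np.isfinite(mod) & np.isfinite(sat)
        if valid.sum() < 2:
            results.append(np.nan)
            continue
        r = np.corrcoef(sat[valid], mod[valid])[0, 1]
        results.append(float(r ** 2))
    return results

mod_data = {
    2000: [np.array([[1, 2], [3, 4]]) for _ in range(12)],
    2001: [np.array([[2, 3], [4, 5]]) for _ in range(12)],
}
sat_data = {
    2000: [np.array([[1, 2.1], [3.1, 4]]) for _ in range(12)],
    2001: [np.array([[2.2, 3], [4.1, 5.1]]) for _ in range(12)],
}
values = monthly_r_squared_sketch({"BASSTmod": mod_data, "BASSTsat": sat_data})
```

Swapping the last two lines of the loop for another metric (NSE, d_j, and so on) gives the corresponding monthly variant.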

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_relative_index_of_agreement(dictionary: Dict[str, Dict[int, List[ndarray | list]]]) → List[float]

Compute the Relative Index of Agreement (d_rel) for each calendar month by aggregating paired observed (satellite) and predicted (model) data across multiple years.

This metric assesses proportional agreement between observations and predictions on a monthly basis, by evaluating relative errors normalized by observations. It is sensitive to proportional differences rather than absolute errors.

Parameters:

dictionary (dict) –

Dictionary containing paired model and satellite monthly data with keys containing ‘mod’ and ‘sat’ respectively. Each key maps to a dictionary of years, where each year contains a list or array of 12 elements representing monthly data:

{
    'mod…': {year1: [month_0_data, …, month_11_data], year2: […], …},
    'sat…': {year1: [month_0_data, …, month_11_data], year2: […], …}
}

Returns:

List of 12 Relative Index of Agreement values, one for each month (January=0,…,December=11). Returns np.nan for months with insufficient or invalid data.

Return type:

list of float

Notes

Observations (satellite data) with zero values are excluded to avoid division by zero errors.
Requires at least two valid paired observations per month.
Returns np.nan if the observations have zero variance (all equal) or denominator is zero.
The metric ranges typically between 0 and 1, with values closer to 1 indicating better agreement.

Examples

>>> dictionary = {
...     'mod_data': {
...         2020: [np.array([1,2]), np.array([3,4]), ...],  # 12 months of data per year
...         2021: [np.array([2,3]), np.array([4,5]), ...]
...     },
...     'sat_data': {
...         2020: [np.array([1.1,1.9]), np.array([2.9,4.1]), ...],
...         2021: [np.array([2.1,2.8]), np.array([3.8,5.2]), ...]
...     }
... }
>>> monthly_relative_index_of_agreement(dictionary)
[0.95, 0.91, ..., 0.89]  # example output list with 12 values

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_relative_nse(dictionary: Dict[str, Dict[int, List[ndarray | list]]]) → List[float]

Compute monthly Relative Nash–Sutcliffe Efficiency (Relative NSE) from paired model and satellite data.

This function calculates the Relative NSE metric for each calendar month by aggregating data across all available years. It compares relative deviations normalized by observations, emphasizing proportional accuracy of model predictions compared to satellite observations.

Parameters:

dictionary (dict) – Dictionary containing keys with ‘mod’ and ‘sat’ identifying model and satellite datasets. Each key maps to a dict of years, with each year containing a list or array of 12 monthly data arrays.

Returns:

Relative NSE values computed for each month (length 12). Returns np.nan for months with insufficient or invalid data.

Return type:

list of float

Raises:

KeyError – If either model (‘mod’) or satellite (‘sat’) keys are missing in the dictionary.

Notes

Observations with zero values are excluded to avoid division by zero.
Requires at least two valid paired observations per month to compute the metric.
Relative NSE close to 1 indicates good proportional agreement between model and observations.
Months with zero variance in relative observations or insufficient data return np.nan.

Examples

>>> dictionary = {
...     'mod': {
...         2020: [np.array([10, 15]), np.array([20, 25]), ..., np.array([30, 35])],  # 12 arrays for each month
...         2021: [np.array([12, 16]), np.array([22, 26]), ..., np.array([32, 37])]
...     },
...     'sat': {
...         2020: [np.array([9, 14]), np.array([19, 24]), ..., np.array([29, 34])],
...         2021: [np.array([11, 15]), np.array([21, 25]), ..., np.array([31, 36])]
...     }
... }
>>> monthly_relative_nse(dictionary)
[0.95, 0.97, ..., 0.93]  # example output for each month

Hydrological_model_validator.Processing.Efficiency_metrics.monthly_weighted_r_squared(dictionary: Dict[str, Dict[int, List[ndarray | List[float]]]]) → List[float]

Compute weighted coefficient of determination (weighted R²) for each calendar month across multiple years, using paired model and satellite datasets.

The weighting adjusts the R² based on the slope of the linear regression between predicted (model) and observed (satellite) values, emphasizing months where the relationship is closer to a 1:1 correspondence.

Parameters:

dictionary (dict) – Dictionary containing keys with substrings ‘mod’ and ‘sat’ representing model and satellite data, respectively. Each key maps to a dictionary of years, where each year corresponds to a list or array of 12 monthly arrays of data points.

Returns:

List of 12 weighted R² values, each representing one calendar month (January to December).

Return type:

list of float

Raises:

KeyError – Raised if no keys containing ‘mod’ or ‘sat’ are found in the input dictionary.

Notes

For each month, data from all years are concatenated to form a single paired dataset of model and satellite values.
NaN values in either dataset are excluded from the calculations.
The weighted R² combines the classical coefficient of determination with a weighting factor derived from the slope of the regression line between satellite (observed) and model (predicted) data.
This method penalizes cases where the prediction slope deviates significantly from unity, even if correlation is high.

Examples

>>> data = {
...     'model': {
...         2000: [np.array([1, 2]), np.array([3, 4])] * 6,
...         2001: [np.array([2, 3]), np.array([4, 5])] * 6,
...     },
...     'satellite': {
...         2000: [np.array([1.1, 2.1]), np.array([2.9, 4.1])] * 6,
...         2001: [np.array([2.2, 2.9]), np.array([3.8, 5.2])] * 6,
...     }
... }
>>> monthly_weighted_r_squared(data)
[0.95, 0.92, ..., 0.90]

Hydrological_model_validator.Processing.Efficiency_metrics.nse(obs: ndarray | Sequence[float], pred: ndarray | Sequence[float]) → float

Compute Nash–Sutcliffe Efficiency (NSE) between observed and predicted data.

NSE is a normalized statistic that determines the relative magnitude of the residual variance (“noise”) compared to the variance of the observed data (“signal”). It is widely used to assess the predictive skill of hydrological models.

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.

Returns:

NSE value, ranging from -∞ to 1:

- NSE = 1 indicates a perfect match between observed and predicted data.
- NSE = 0 indicates that the model predictions are as accurate as the mean of the observations.
- NSE < 0 indicates that the observed mean is a better predictor than the model.

Returns np.nan if insufficient valid data or if variance of observed data is zero.

Return type:

float

Notes

The function ignores pairs where either observation or prediction is NaN.
At least two valid data points are required to compute NSE.
NSE is sensitive to extreme values and assumes that observations are error-free.

Examples

>>> obs = np.array([3, -0.5, 2, 7])
>>> pred = np.array([2.5, 0.0, 2, 8])
>>> nse(obs, pred)
0.9486...
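
The interpretation of NSE = 1 and NSE = 0 can be checked with a short sketch of the standard definition, 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2). The function name `nse_sketch` is hypothetical; the NaN and zero-variance handling follows the Notes above, not the package's source.

```python
import numpy as np

def nse_sketch(obs, pred):
    """Sketch of NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(pred)
    obs, pred = obs[valid], pred[valid]
    if obs.size < 2:
        return np.nan
    signal = np.sum((obs - obs.mean()) ** 2)  # variability of the observations
    if signal == 0:
        return np.nan
    noise = np.sum((obs - pred) ** 2)         # residual sum of squares
    return float(1.0 - noise / signal)

obs = np.array([3, -0.5, 2, 7])
perfect = nse_sketch(obs, obs)                        # predictions equal observations -> 1.0
mean_only = nse_sketch(obs, np.full(4, obs.mean()))   # always predict the mean -> 0.0
```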

Hydrological_model_validator.Processing.Efficiency_metrics.nse_j(obs: Sequence[float] | ndarray, pred: Sequence[float] | ndarray, j: float = 1) → float

Compute modified Nash–Sutcliffe Efficiency (E_j) for an arbitrary exponent j.

This generalizes the Nash–Sutcliffe Efficiency by raising the absolute differences between observed and predicted values to the power j, allowing flexible weighting of deviations.

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.
j (float, optional) – Exponent applied to absolute differences (default is 1).

Returns:

Modified NSE value (E_j), or np.nan if insufficient valid data or zero denominator.

Return type:

float

Notes

The modified NSE is defined as: E_j = 1 - (sum(|obs - pred|^j) / sum(|obs - mean(obs)|^j))
Requires at least two paired valid values.
If the denominator is zero (no variability in obs), returns np.nan.
Increasing j increases sensitivity to larger errors.

Examples

>>> obs = np.array([1, 2, 3, 4])
>>> pred = np.array([1.1, 1.9, 2.8, 4.2])
>>> nse_j(obs, pred, j=2)
0.98  # approximate value from the E_j definition in the Notes (j=2)
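
The E_j definition from the Notes can be written directly in NumPy. This is a sketch of that formula only; `nse_j_sketch` is a hypothetical name and the package's own implementation may treat edge cases differently.

```python
import numpy as np

def nse_j_sketch(obs, pred, j=1.0):
    """Sketch of E_j = 1 - sum(|o - p|^j) / sum(|o - mean(o)|^j)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(pred)
    obs, pred = obs[valid], pred[valid]
    if obs.size < 2:
        return np.nan
    denominator = np.sum(np.abs(obs - obs.mean()) ** j)
    if denominator == 0:
        return np.nan  # no variability in the observations
    numerator = np.sum(np.abs(obs - pred) ** j)
    return float(1.0 - numerator / denominator)

obs = np.array([1, 2, 3, 4])
pred = np.array([1.1, 1.9, 2.8, 4.2])
e1 = nse_j_sketch(obs, pred, j=1)
e2 = nse_j_sketch(obs, pred, j=2)
```

With `j=2` the expression reduces to the ordinary NSE, so `e2` can be cross-checked against any standard NSE routine.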

Hydrological_model_validator.Processing.Efficiency_metrics.r_squared(obs: ndarray | Sequence[float], pred: ndarray | Sequence[float]) → float

Calculate the coefficient of determination (r²) between observed and predicted data.

Parameters:

obs (np.ndarray) – Array of observed values.
pred (np.ndarray) – Array of predicted values.

Returns:

The coefficient of determination (r²), which quantifies how well predictions approximate the observed values. Returns np.nan if fewer than 2 valid (non-NaN) pairs.

Return type:

float

Notes

NaN values in either input array are ignored.
r² is the square of the Pearson correlation coefficient between obs and pred.
r² ranges from 0 (no correlation) to 1 (perfect linear correlation).

Examples

>>> import numpy as np
>>> obs = np.array([3.0, 4.5, 5.2, np.nan, 6.1])
>>> pred = np.array([2.8, 4.7, 5.0, 5.9, np.nan])
>>> r_squared(obs, pred)
0.9641...  # High correlation; pairs containing a NaN are ignored

>>> obs = np.array([1.0, 2.0, 3.0])
>>> pred = np.array([1.1, 1.9, 3.1])
>>> r_squared(obs, pred)
0.9868...

>>> obs = np.array([np.nan, np.nan])
>>> pred = np.array([1.0, 2.0])
>>> r_squared(obs, pred)
nan
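
Since the Notes define r² as the squared Pearson correlation, a sketch needs little more than `np.corrcoef`. The function name `r_squared_sketch` is hypothetical. The usage lines illustrate a known property of this definition: r² measures linear association only, so a prediction that is a biased linear rescaling of the observations still scores 1.

```python
import numpy as np

def r_squared_sketch(obs, pred):
    """Sketch: squared Pearson correlation over pairs without NaN."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(pred)
    if valid.sum() < 2:
        return np.nan
    r = np.corrcoef(obs[valid], pred[valid])[0, 1]
    return float(r ** 2)

obs = np.array([1.0, 2.0, 3.0, 4.0])
rescaled = 2.0 * obs + 5.0                 # strongly biased, but perfectly linear
score = r_squared_sketch(obs, rescaled)    # still (numerically) 1
```

This is the motivation for bias-sensitive companions such as NSE or the weighted R² documented on this page.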

Hydrological_model_validator.Processing.Efficiency_metrics.relative_index_of_agreement(obs: Sequence[float] | ndarray, pred: Sequence[float] | ndarray) → float

Compute the Relative Index of Agreement (d_rel) between observed and predicted values.

This metric assesses the agreement between predictions and observations by evaluating relative errors normalized by the observations, making it sensitive to proportional differences.

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.

Returns:

Relative Index of Agreement value ranging typically between 0 and 1, where values closer to 1 indicate better agreement. Returns np.nan if the calculation is invalid due to insufficient data, zero variance, or division by zero.

Return type:

float

Notes

Observations with zero values are excluded to avoid division by zero errors.
Requires at least two valid paired observations.
Returns np.nan if the observations have zero variance (all equal).
Sensitive to relative rather than absolute errors.

Examples

>>> obs = np.array([10, 20, 30, 40])
>>> pred = np.array([11, 19, 28, 39])
>>> relative_index_of_agreement(obs, pred)
0.92  # example output

Hydrological_model_validator.Processing.Efficiency_metrics.relative_nse(obs: Sequence[float] | ndarray, pred: Sequence[float] | ndarray) → float

Compute the Relative Nash–Sutcliffe Efficiency (Relative NSE) between observations and predictions.

This metric evaluates model performance by comparing relative deviations (normalized by observations) rather than absolute deviations, making it sensitive to proportional errors.

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.

Returns:

Relative NSE value, or np.nan if insufficient data, division by zero, or invalid calculation occurs.

Return type:

float

Notes

Observations with zero values are excluded to avoid division by zero.
Requires at least two valid paired observations.
Relative NSE close to 1 indicates good model performance on relative scale.
A zero denominator (no variance in the relative observations) returns np.nan.

Examples

>>> obs = np.array([10, 20, 30, 40])
>>> pred = np.array([11, 18, 33, 39])
>>> relative_nse(obs, pred)
0.95  # example output
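
The page does not state the formula for the relative NSE. The sketch below assumes the commonly cited form E_rel = 1 - sum(((obs - pred) / obs)^2) / sum(((obs - mean(obs)) / mean(obs))^2); both this formula and the name `relative_nse_sketch` are assumptions for illustration, and the package may normalize differently.

```python
import numpy as np

def relative_nse_sketch(obs, pred):
    """Sketch under an assumed formula; not taken from the package source."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Drop NaN pairs and zero observations (they would divide by zero).
    valid = np.isfinite(obs) & np.isfinite(pred) & (obs != 0)
    obs, pred = obs[valid], pred[valid]
    if obs.size < 2:
        return np.nan
    obs_mean = obs.mean()
    if obs_mean == 0:
        return np.nan
    numerator = np.sum(((obs - pred) / obs) ** 2)
    denominator = np.sum(((obs - obs_mean) / obs_mean) ** 2)
    if denominator == 0:
        return np.nan
    return float(1.0 - numerator / denominator)

obs = np.array([10, 20, 30, 40])
pred = np.array([11, 18, 33, 39])
score = relative_nse_sketch(obs, pred)
```

Each residual is divided by its own observation, so a 10 percent miss counts the same at 10 as at 40.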

Hydrological_model_validator.Processing.Efficiency_metrics.weighted_r_squared(obs: ndarray | list, pred: ndarray | list) → float

Compute weighted coefficient of determination (weighted R²) between observed and predicted data.

The weighting accounts for the slope of the regression line between predicted and observed values, emphasizing cases where the slope is close to 1 (ideal 1:1 relation).

Parameters:

obs (array-like) – Observed values.
pred (array-like) – Predicted values.

Returns:

Weighted R² value or np.nan if insufficient data.

Return type:

float

Notes

The function first computes the standard R² between obs and pred.
It then fits a linear regression line pred = slope * obs + intercept.
The absolute value of the slope is used to weight the R²:
- If slope is close to 1, weight ≈ 1 (no change).
- If slope deviates far from 1, weight is reduced, clipped between 0.1 and 1.
This penalizes cases where prediction trends differ substantially from observations, even if correlation is high.
A small floor of 0.1 prevents the weighted R² from becoming zero or negligible when the slope is near zero.

Examples

>>> obs = np.array([1, 2, 3, 4, 5])
>>> pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
>>> weighted_r_squared(obs, pred)
0.98  # example output (actual value depends on data)
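
The Notes describe the weighting only qualitatively, so the sketch below has to pick a concrete rule. It assumes weight = |slope| when |slope| ≤ 1 and 1/|slope| otherwise, clipped to [0.1, 1]; that rule and the name `weighted_r_squared_sketch` are assumptions, not the package's confirmed behaviour.

```python
import numpy as np

def weighted_r_squared_sketch(obs, pred):
    """Sketch: r^2 scaled by an assumed slope-based weight in [0.1, 1]."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(pred)
    obs, pred = obs[valid], pred[valid]
    if obs.size < 2:
        return np.nan
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    # Fit pred = slope * obs + intercept, as described in the Notes.
    slope, _intercept = np.polyfit(obs, pred, 1)
    b = abs(slope)
    # Assumed weight: |b| when |b| <= 1, otherwise 1/|b|, then clipped.
    weight = b if b <= 1 else 1.0 / b
    weight = float(np.clip(weight, 0.1, 1.0))
    return float(weight * r2)

obs = np.array([1, 2, 3, 4, 5])
pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
near_unity = weighted_r_squared_sketch(obs, pred)
too_steep = weighted_r_squared_sketch(obs, 3.0 * obs)  # r^2 = 1 but slope = 3
```

The second call shows the intended penalty: a perfectly correlated prediction with a slope of 3 is scored well below 1.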
