utils

Hydrological_model_validator.Processing.utils.build_bfm_filename(year: int, filename_fragments: Dict[str, str]) → str

Construct a BFM filename from the given year and filename fragments.

Hydrological_model_validator.Processing.utils.check_numeric_data(data_dict: Dict[str, Dict[int, List[ndarray]]]) → None

Validate that all monthly data arrays in the input dictionary contain numeric data and follow the expected structure (12 numpy arrays per year per key).

This function checks that the data dictionary for ‘model’ and ‘satellite’ contains, for each year, exactly 12 monthly numpy arrays. It raises errors if any array is non-numeric or if the structure is invalid.

Parameters:

data_dict (dict) – Dictionary with keys such as ‘model’ and ‘satellite’, each mapping to a dictionary where keys are years (int) and values are lists of 12 numpy arrays (one per month).

Return type:

None

Raises:

ValueError – If the data for any year and key is not a list or tuple of length 12, or if any monthly numpy array contains non-numeric data.
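
The checks described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the name check_numeric_data_sketch and the error messages are hypothetical, and "numeric" is assumed to mean a dtype that NumPy classifies under np.number.

```python
from typing import Dict, List

import numpy as np


def check_numeric_data_sketch(data_dict: Dict[str, Dict[int, List[np.ndarray]]]) -> None:
    """Hypothetical re-implementation of the documented checks."""
    for key, yearly in data_dict.items():
        for year, months in yearly.items():
            # Structure check: a list or tuple holding exactly 12 monthly arrays.
            if not isinstance(months, (list, tuple)) or len(months) != 12:
                raise ValueError(f"{key}/{year}: expected a list or tuple of 12 arrays")
            for month_index, arr in enumerate(months):
                # Assumption: an array is numeric if its dtype is a NumPy number type.
                if not np.issubdtype(np.asarray(arr).dtype, np.number):
                    raise ValueError(f"{key}/{year}: month {month_index + 1} is not numeric")


# Same shape of input as the documented example: one filled month, eleven empty ones.
valid = {"model": {2000: [np.array([1.0, 2.0])] + [np.array([])] * 11}}
check_numeric_data_sketch(valid)  # no exception is raised
```

Empty arrays pass because np.array([]) defaults to a floating-point dtype.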

Example

>>> data = {
...     'model': {2000: [np.array([1.0, 2.0])] + [np.array([])] * 11},
...     'satellite': {2000: [np.array([1.1, 2.1])] + [np.array([])] * 11}
... }
>>> check_numeric_data(data)  # passes silently if valid

Hydrological_model_validator.Processing.utils.convert_dataarrays_in_df(df: DataFrame) → DataFrame

Convert any xarray.DataArray objects stored in DataFrame cells into dictionaries with dims, coords, and data, leaving all other cells unchanged.

Parameters:

df – pandas DataFrame possibly containing xarray.DataArray in cells.

Returns:

A new DataFrame where DataArray cells are converted to dicts.

Hydrological_model_validator.Processing.utils.extract_options(user_kwargs: Dict[str, Any], default_dict: Dict[str, Any], prefix: str = '') → Dict[str, Any]

Extract options from user_kwargs by overriding values in default_dict for keys optionally prefixed by prefix.

Parameters:
  • user_kwargs (dict) – Dictionary of user-supplied keyword arguments.

  • default_dict (dict) – Dictionary of default options to be updated.

  • prefix (str, optional) – Prefix to prepend to keys when looking them up in user_kwargs (default is “”).

Returns:

New dictionary with updated options from user_kwargs. For each key in default_dict, the function first checks if prefix + key exists in user_kwargs and uses that value; otherwise, it checks for the key without prefix.

Return type:

dict

Raises:

ValueError – If inputs are not dictionaries or prefix is not a string.

Examples

>>> defaults = {'color': 'blue', 'linewidth': 2}
>>> user_args = {'plot_color': 'red', 'linewidth': 3}
>>> extract_options(user_args, defaults, prefix='plot_')
{'color': 'red', 'linewidth': 3}
>>> extract_options(user_args, defaults)
{'color': 'blue', 'linewidth': 3}
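
The lookup order described under Returns (prefixed key first, then the bare key, then the default) can be sketched in plain Python. The function name extract_options_sketch is hypothetical and the body is an assumption based only on the description above, not the package's source.

```python
from typing import Any, Dict


def extract_options_sketch(
    user_kwargs: Dict[str, Any], default_dict: Dict[str, Any], prefix: str = ""
) -> Dict[str, Any]:
    """Hypothetical re-implementation of the documented lookup order."""
    if not isinstance(user_kwargs, dict) or not isinstance(default_dict, dict):
        raise ValueError("user_kwargs and default_dict must be dictionaries")
    if not isinstance(prefix, str):
        raise ValueError("prefix must be a string")

    options = dict(default_dict)  # copy, so the defaults are never mutated
    for key in default_dict:
        if prefix + key in user_kwargs:
            # The prefixed key takes priority.
            options[key] = user_kwargs[prefix + key]
        elif key in user_kwargs:
            # Fall back to the key without the prefix.
            options[key] = user_kwargs[key]
    return options


defaults = {"color": "blue", "linewidth": 2}
user_args = {"plot_color": "red", "linewidth": 3}
with_prefix = extract_options_sketch(user_args, defaults, prefix="plot_")
without_prefix = extract_options_sketch(user_args, defaults)
```

Keys in user_kwargs that do not correspond to any key in default_dict are ignored in this sketch.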

Hydrological_model_validator.Processing.utils.find_key(dictionary: Dict[Any, Any], possible_keys: Iterable[str]) → str | None

Find the first key in a dictionary containing any of the substrings in possible_keys (case insensitive).

Parameters:
  • dictionary (dict) – Dictionary to search keys in.

  • possible_keys (iterable of str) – Iterable of substrings to look for in the dictionary keys.

Returns:

The first matching key found that contains any substring from possible_keys (case insensitive), or None if no key matches.

Return type:

Optional[str]

Raises:

ValueError – If dictionary is not a dict or possible_keys is not an iterable of strings.

Examples

>>> d = {'Temperature': 23, 'Salinity': 35}
>>> find_key(d, ['temp', 'sal'])
'Temperature'
>>> find_key(d, ['sal'])
'Salinity'
>>> find_key(d, ['pressure'])
None
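
A case-insensitive substring search of this kind can be sketched as below. The name find_key_sketch is hypothetical, and the sketch assumes that "first" means first in the dictionary's insertion order; the documentation does not say which ordering the real function uses.

```python
from typing import Any, Dict, Iterable, Optional


def find_key_sketch(dictionary: Dict[Any, Any], possible_keys: Iterable[str]) -> Optional[Any]:
    """Hypothetical re-implementation of the documented search."""
    if not isinstance(dictionary, dict):
        raise ValueError("dictionary must be a dict")
    try:
        substrings = list(possible_keys)
    except TypeError as exc:
        raise ValueError("possible_keys must be an iterable of strings") from exc
    if not all(isinstance(item, str) for item in substrings):
        raise ValueError("possible_keys must be an iterable of strings")

    lowered = [item.lower() for item in substrings]
    # Assumption: keys are scanned in insertion order and the first hit wins.
    for key in dictionary:
        key_text = str(key).lower()
        if any(item in key_text for item in lowered):
            return key
    return None


d = {"Temperature": 23, "Salinity": 35}
first_match = find_key_sketch(d, ["temp", "sal"])
no_match = find_key_sketch(d, ["pressure"])
```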

Hydrological_model_validator.Processing.utils.find_key_variable(nc_vars: Iterable[str], candidates: List[str]) → str

Return the first variable name found in nc_vars from candidates list, or raise KeyError if none found.

Parameters:
  • nc_vars (iterable) – Collection of variable names available (e.g., keys of a NetCDF dataset).

  • candidates (list) – List of candidate variable names to search for.

Returns:

The first variable name found in nc_vars from the candidates list.

Return type:

str

Raises:

KeyError – If none of the candidate variable names are found in nc_vars.

Example

>>> vars_available = ['temp', 'salinity', 'depth']
>>> candidates = ['chlorophyll', 'salinity', 'temperature']
>>> find_key_variable(vars_available, candidates)
'salinity'
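
Unlike find_key, this lookup matches whole names rather than substrings, which is why 'temperature' does not match 'temp' in the example. A minimal sketch follows; the name find_key_variable_sketch is hypothetical, and it assumes the candidates are tried in list order, which the documentation does not state explicitly.

```python
from typing import Iterable, List


def find_key_variable_sketch(nc_vars: Iterable[str], candidates: List[str]) -> str:
    """Hypothetical re-implementation of the documented exact-name lookup."""
    available = set(nc_vars)  # exact-name membership test
    # Assumption: candidates are checked in the order given.
    for name in candidates:
        if name in available:
            return name
    raise KeyError(f"None of the candidates {candidates} were found")


vars_available = ["temp", "salinity", "depth"]
found = find_key_variable_sketch(vars_available, ["chlorophyll", "salinity", "temperature"])
```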

Hydrological_model_validator.Processing.utils.hal_threshold(slice_data: ndarray, mask_shallow: ndarray, mask_deep: ndarray) → ndarray

Compute invalid mask for salinity based on shallow and deep thresholds.

Parameters:
  • slice_data (np.ndarray) – 2D array of salinity data (Y, X).

  • mask_shallow (np.ndarray) – Boolean mask where True corresponds to shallow depths.

  • mask_deep (np.ndarray) – Boolean mask where True corresponds to deep depths.

Returns:

Boolean mask of invalid salinity points.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> salinity = np.array([[26, 41], [37, 39]])
>>> shallow_mask = np.array([[True, True], [False, False]])
>>> deep_mask = np.array([[False, False], [True, True]])
>>> hal_threshold(salinity, shallow_mask, deep_mask)
array([[False,  True],
       [False, False]])

Hydrological_model_validator.Processing.utils.infer_years_from_path(directory: str | Path, *, target_type: str = 'file', pattern: str = '_(\\d{4})\\.nc$', debug: bool = False) → Tuple[int, int, List[int]]

Infer available years from directory content by matching a regex pattern on file or folder names.

Parameters:
  • directory (str or Path) – Directory path to scan.

  • target_type (str, optional) – Type of items to scan in the directory: “file” or “folder”. Default is “file”.

  • pattern (str, optional) – Regex pattern to extract the year as a capturing group (e.g. r'_(\d{4})\.nc$' or r'output\s*(\d{4})'). The year must be captured in the first group.

  • debug (bool, optional) – If True, prints debug info.

Returns:

  • Ybeg (int) – Earliest year found.

  • Yend (int) – Latest year found.

  • ysec (List[int]) – List of all years from Ybeg to Yend inclusive.

Raises:

ValueError – If no matching years are found or the directory does not exist.
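
The scan described above can be sketched with the standard library alone. The name infer_years_sketch and the output_YYYY.nc file names are hypothetical, the debug flag is omitted, and the body is an assumption based on the documented parameters and return values.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple, Union


def infer_years_sketch(
    directory: Union[str, Path],
    *,
    target_type: str = "file",
    pattern: str = r"_(\d{4})\.nc$",
) -> Tuple[int, int, List[int]]:
    """Hypothetical re-implementation of the documented directory scan."""
    root = Path(directory)
    if not root.is_dir():
        raise ValueError(f"Directory does not exist: {root}")

    regex = re.compile(pattern)
    years = set()
    for entry in root.iterdir():
        # Keep only files or only folders, depending on target_type.
        wanted = entry.is_file() if target_type == "file" else entry.is_dir()
        if not wanted:
            continue
        match = regex.search(entry.name)
        if match:
            years.add(int(match.group(1)))  # the year is the first capturing group

    if not years:
        raise ValueError(f"No years matching {pattern!r} found in {root}")

    ybeg, yend = min(years), max(years)
    # ysec spans the full range, including years with no matching entry.
    return ybeg, yend, list(range(ybeg, yend + 1))


with tempfile.TemporaryDirectory() as tmp:
    for year in (2000, 2001, 2003):
        (Path(tmp) / f"output_{year}.nc").touch()
    result = infer_years_sketch(tmp)
```

Here result is (2000, 2003, [2000, 2001, 2002, 2003]): the year 2002 appears in the list even though no file exists for it, matching the description of ysec as all years from Ybeg to Yend inclusive.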

Hydrological_model_validator.Processing.utils.temp_threshold(slice_data: ndarray, mask_shallow: ndarray, mask_deep: ndarray) → ndarray

Compute invalid mask for temperature based on shallow and deep thresholds.

Parameters:
  • slice_data (np.ndarray) – 2D array of temperature data (Y, X).

  • mask_shallow (np.ndarray) – Boolean mask where True corresponds to shallow depths.

  • mask_deep (np.ndarray) – Boolean mask where True corresponds to deep depths.

Returns:

Boolean mask of invalid temperature points.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> temp = np.array([[10, 36], [7, 9]])
>>> shallow_mask = np.array([[True, True], [False, False]])
>>> deep_mask = np.array([[False, False], [True, True]])
>>> temp_threshold(temp, shallow_mask, deep_mask)
array([[False,  True],
       [ True, False]])
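
A depth-dependent threshold mask of this kind can be sketched with NumPy boolean operations. The function threshold_mask_sketch and its two range parameters are hypothetical, and the numeric limits used below are placeholders chosen only for illustration; the documentation does not state the thresholds the package actually applies.

```python
from typing import Tuple

import numpy as np


def threshold_mask_sketch(
    slice_data: np.ndarray,
    mask_shallow: np.ndarray,
    mask_deep: np.ndarray,
    shallow_range: Tuple[float, float],
    deep_range: Tuple[float, float],
) -> np.ndarray:
    """Flag values outside a depth-specific valid range (hypothetical helper)."""
    shallow_low, shallow_high = shallow_range
    deep_low, deep_high = deep_range
    # A point is invalid if it lies outside the range that applies to its depth class.
    bad_shallow = mask_shallow & ((slice_data < shallow_low) | (slice_data > shallow_high))
    bad_deep = mask_deep & ((slice_data < deep_low) | (slice_data > deep_high))
    return bad_shallow | bad_deep


temp = np.array([[10, 36], [7, 9]])
shallow_mask = np.array([[True, True], [False, False]])
deep_mask = np.array([[False, False], [True, True]])

# Placeholder limits, NOT the package's real thresholds.
invalid = threshold_mask_sketch(
    temp, shallow_mask, deep_mask, shallow_range=(5.0, 35.0), deep_range=(8.0, 25.0)
)
```

Points covered by neither mask are never flagged in this sketch, since each comparison is combined with its depth mask before the two results are merged.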