BFM_Data_Reader

Hydrological_model_validator.Processing.BFM_data_reader.extract_and_filter_benthic_data(data_4d: ndarray, Bmost: ndarray, dz: float = 2.0, variable_key: str | None = None) -> ndarray

Extract bottom layer data from a 4D array and apply depth-based threshold filtering.

This function extracts data from the bottom-most layer indicated by Bmost indices from a 4D array data_4d with dimensions (time, depth, Y, X). It then applies filtering based on depth-dependent thresholds for specific variables (‘votemper’ for temperature, ‘vosaline’ for salinity). Invalid data points outside the thresholds are set to NaN.

Parameters:
  • data_4d (np.ndarray) – 4D numpy array with shape (time, depth, Y, X), containing the variable data over time, vertical layers, and spatial grid.

  • Bmost (np.ndarray) – 2D array with shape (Y, X) of 1-based indices indicating the bottom layer depth at each spatial position.

  • dz (float, optional) – Thickness of each vertical layer in meters. Default is 2.0.

  • variable_key (str, optional) – Variable name to apply threshold filtering. Supported values are ‘votemper’ (temperature) and ‘vosaline’ (salinity). If None or unsupported, no filtering is applied.

Returns:

3D numpy array with shape (time, Y, X) containing the extracted bottom layer data, filtered by the specified depth-dependent thresholds.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> data_4d = np.random.rand(12, 10, 5, 5)  # 12 time steps, 10 depth layers, 5x5 spatial grid
>>> Bmost = np.array([[10, 9, 8, 10, 7],
...                   [10, 10, 9, 8, 7],
...                   [9, 10, 10, 9, 8],
...                   [8, 9, 10, 10, 9],
...                   [7, 8, 9, 10, 10]])  # bottom layer indices (1-based)
>>> benthic_filtered = extract_and_filter_benthic_data(data_4d, Bmost, dz=2.0, variable_key='votemper')
>>> print(benthic_filtered.shape)
(12, 5, 5)
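
The depth-dependent filtering described for this function can be sketched as follows. This is only an illustration, not the library's implementation: the helper name `filter_bottom_by_depth`, the rule that bottom depth is approximately `Bmost * dz`, the `depth_split` value, and both valid ranges are assumptions chosen for the example.

```python
import numpy as np

def filter_bottom_by_depth(bottom, Bmost, dz=2.0, depth_split=50.0,
                           shallow_range=(5.0, 25.0), deep_range=(10.0, 20.0)):
    """Set values outside a depth-dependent valid range to NaN.

    All numeric limits here are placeholders, not the thresholds
    actually used by extract_and_filter_benthic_data.
    """
    # Assumption: bottom depth in metres is the 1-based layer index times dz.
    depth = Bmost * dz                      # shape (Y, X)
    shallow = depth < depth_split
    # Pick the lower and upper limit per grid cell.
    lo = np.where(shallow, shallow_range[0], deep_range[0])
    hi = np.where(shallow, shallow_range[1], deep_range[1])
    # (time, Y, X) broadcasts against (Y, X).
    valid = (bottom >= lo) & (bottom <= hi)
    return np.where(valid, bottom, np.nan)

# Two time steps on a 1x2 grid: one shallow cell, one deep cell.
bottom = np.array([[[12.0, 12.0]],
                   [[30.0, 25.0]]])
Bmost = np.array([[10, 40]])
filtered = filter_bottom_by_depth(bottom, Bmost)
```

In this sketch the first time step passes both ranges, while the second is replaced by NaN in both cells.
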

Hydrological_model_validator.Processing.BFM_data_reader.extract_bottom_layer(data: ndarray, Bmost: ndarray) -> list[ndarray]

Extract the bottom layer data from a 4D array using provided bottom layer indices.

This function extracts the bottom-most layer values for each spatial location (y, x) from a 4D data array with dimensions (time, depth, y, x). The bottom layer is specified by the 1-based indices provided in Bmost. The function returns a list of 2D arrays, each corresponding to a time slice, containing the bottom layer data.

Parameters:
  • data (np.ndarray) – 4D numpy array with shape (time, depth, y, x), representing data collected over time, vertical layers, and spatial grid.

  • Bmost (np.ndarray) – 2D array of shape (y, x) containing 1-based indices indicating the bottom layer depth for each spatial location. Indices are converted internally to zero-based.

Returns:

A list of 2D numpy arrays, each of shape (y, x), where each array corresponds to a time slice containing the extracted bottom layer data.

Return type:

list of np.ndarray

Raises:
  • TypeError – If input arrays are not numpy ndarrays.

  • ValueError – If input arrays do not have the expected dimensions or shapes, or if indices are invalid.

Example

>>> import numpy as np
>>> data = np.random.rand(3, 4, 2, 2)  # time=3, depth=4, y=2, x=2
>>> Bmost = np.array([[4, 3],
...                   [2, 1]])  # bottom indices for each (y,x)
>>> bottom_layers = extract_bottom_layer(data, Bmost)
>>> print(len(bottom_layers))
3
>>> print(bottom_layers[0].shape)
(2, 2)
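
One way to perform this kind of per-cell layer selection is `np.take_along_axis`. The sketch below is a possible approach under that assumption, not necessarily how `extract_bottom_layer` is written; the name `bottom_layer_sketch` is hypothetical and input validation is omitted.

```python
import numpy as np

def bottom_layer_sketch(data, Bmost):
    """Pick, for every (y, x), the layer given by the 1-based index in Bmost."""
    idx = (Bmost - 1).astype(int)               # 1-based -> 0-based
    # Index array needs the same number of dimensions as data;
    # the time axis is filled in by broadcasting.
    idx4 = idx[None, None, :, :]                # shape (1, 1, y, x)
    bottom = np.take_along_axis(data, idx4, axis=1)[:, 0]   # (time, y, x)
    # Return one 2D array per time slice, as described above.
    return [bottom[t] for t in range(bottom.shape[0])]

data = np.arange(3 * 4 * 2 * 2, dtype=float).reshape(3, 4, 2, 2)
Bmost = np.array([[4, 3],
                  [2, 1]])
layers = bottom_layer_sketch(data, Bmost)
```

Each element of `layers` has shape `(2, 2)`, and `layers[t][y, x]` equals `data[t, Bmost[y, x] - 1, y, x]`.
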

Hydrological_model_validator.Processing.BFM_data_reader.process_year(year: int, IDIR: str | Path, mask3d: ndarray, Bmost: ndarray, filename_fragments: Dict[str, str], variable_key: str) -> Tuple[int, ndarray]

Process benthic parameter data for a single year by reading, masking, extracting, and filtering bottom layer data from model output files.

This function locates the compressed model data file for the specified year, decompresses it, loads the specified variable’s data, applies a 3D mask to filter out invalid points, and extracts the bottom layer benthic parameter values using depth indices (Bmost). It then applies variable-specific filtering to ensure data quality.

Parameters:
  • year (int) – The year of the dataset to process.

  • IDIR (str or Path) – Base directory path where the model output files are stored.

  • mask3d (np.ndarray) – A 3D mask array of shape (depth, Y, X) where zeros indicate invalid data points to be masked out (set as NaN).

  • Bmost (np.ndarray) – A 2D array of shape (Y, X) containing 1-based indices that indicate the bottom vertical layer for each spatial point.

  • filename_fragments (dict) – Dictionary containing filename fragments with keys ‘ffrag1’, ‘ffrag2’, and ‘ffrag3’, used to construct the filename of the model output.

  • variable_key (str) – The name of the variable to extract from the dataset (e.g., ‘votemper’, ‘vosaline’).

Returns:

A tuple containing:

  • The processed year (int).

  • A 3D numpy array of shape (time, Y, X) with the extracted and filtered bottom layer parameter data.

Return type:

Tuple[int, np.ndarray]

Raises:
  • FileNotFoundError – If the compressed model output file for the given year does not exist.

  • KeyError – If the specified variable_key is not found in the dataset.

  • ValueError – If the spatial dimensions of the data do not match those of the provided mask.

Example

>>> import numpy as np
>>> year = 2005
>>> base_dir = "/data/model_output"
>>> mask = np.ones((10, 20, 30))  # example mask with all valid points
>>> Bmost_indices = np.full((20, 30), 10)  # bottom layer is the 10th depth for all
>>> fragments = {'ffrag1': 'model', 'ffrag2': 'output', 'ffrag3': 'nc'}
>>> variable = 'votemper'
>>> yr, benthic_arr = process_year(year, base_dir, mask, Bmost_indices, fragments, variable)
>>> print(yr)
2005
>>> print(benthic_arr.shape[1:])  # the leading time dimension depends on the input file
(20, 30)
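
The masking step (zeros in `mask3d` become NaN in the data) can be illustrated with a small broadcasted `np.where`. This is a sketch under assumptions: the helper name `apply_mask3d_sketch` is hypothetical, and the shape check here compares depth as well as the spatial dimensions, which may be stricter than the check `process_year` performs.

```python
import numpy as np

def apply_mask3d_sketch(data_4d, mask3d):
    """Return a copy of data_4d with NaN wherever mask3d is zero."""
    # Assumption: the mask must match the (depth, Y, X) part of the data.
    if data_4d.shape[1:] != mask3d.shape:
        raise ValueError(
            f"mask shape {mask3d.shape} does not match data shape {data_4d.shape[1:]}"
        )
    # Add a leading axis so the mask broadcasts over time.
    return np.where(mask3d[None, ...] == 0, np.nan, data_4d)

data = np.ones((2, 2, 1, 2))              # (time, depth, Y, X)
mask = np.array([[[1, 0]],
                 [[1, 1]]])               # (depth, Y, X), one invalid cell
masked = apply_mask3d_sketch(data, mask)
```

The single zero in `mask` invalidates that cell at every time step, so `masked` contains two NaN values.
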

Hydrological_model_validator.Processing.BFM_data_reader.read_benthic_parameter(IDIR: str | Path, mask3d: ndarray, Bmost: ndarray, filename_fragments: Dict[str, str], variable_key: str) -> Dict[int, List[ndarray]]

Reads benthic parameter data (e.g., temperature, salinity) from monthly averaged compressed NetCDF files over all available years in a specified directory.

The function scans the given directory for yearly folders matching a pattern, then concurrently processes each year’s data by applying spatial and depth masks, extracting bottom layer values, and filtering the data based on the variable’s quality thresholds. The results are collected as a dictionary mapping each year to a list of 12 monthly 2D arrays representing the benthic parameter.

Parameters:
  • IDIR (str or Path) – Base directory containing the MODEL output data folders.

  • mask3d (np.ndarray) – 3D mask array with shape (depth, Y, X), where 0 indicates invalid points that should be masked out (set to NaN).

  • Bmost (np.ndarray) – 2D array with shape (Y, X) containing 1-based indices of the bottom vertical layer at each spatial location.

  • filename_fragments (dict) – Dictionary with keys ‘ffrag1’, ‘ffrag2’, ‘ffrag3’ containing parts of the filename used to locate the NetCDF files.

  • variable_key (str) – Key name of the variable to extract from the dataset, such as ‘votemper’ or ‘vosaline’.

Returns:

Dictionary mapping each processed year (int) to a list of 12 numpy 2D arrays (Y, X) representing monthly bottom parameter values.

Return type:

Dict[int, List[np.ndarray]]

Raises:
  • ValueError – If any required filename fragment is missing, or if inputs have incorrect types or dimensions.

  • FileNotFoundError – If the specified data directory does not exist.

Example

>>> import numpy as np
>>> base_dir = "/data/model_output"
>>> mask = np.ones((10, 50, 60))  # mask with all valid points
>>> Bmost_indices = np.full((50, 60), 10)  # bottom layer index = 10 everywhere
>>> fragments = {'ffrag1': 'model', 'ffrag2': 'output', 'ffrag3': 'nc'}
>>> variable = 'votemper'
>>> data_by_year = read_benthic_parameter(base_dir, mask, Bmost_indices, fragments, variable)
>>> for year, monthly_data in data_by_year.items():
...     print(f"Year {year} has {len(monthly_data)} months of data, shape {monthly_data[0].shape}")
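
The year-by-year collection step could look roughly like the sketch below. It assumes a thread pool from `concurrent.futures` and a worker that returns `(year, array)` in the same form as `process_year`; `fake_process_year` and `collect_by_year` are hypothetical stand-ins that read no files.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fake_process_year(year):
    """Stand-in for process_year: returns (year, array of shape (12, Y, X))."""
    return year, np.full((12, 2, 3), float(year))

def collect_by_year(years, worker):
    """Run the worker for every year concurrently and split each result by month."""
    results = {}
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the order of the input years.
        for year, arr in pool.map(worker, years):
            # One 2D (Y, X) array per time step along the first axis.
            results[year] = [arr[m] for m in range(arr.shape[0])]
    return results

data_by_year = collect_by_year([2000, 2001], fake_process_year)
```

Here every year maps to a list of 12 arrays of shape `(2, 3)`, mirroring the return structure described above.
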

Hydrological_model_validator.Processing.BFM_data_reader.read_bfm_chemical(IDIR: str | Path, mask3d: ndarray, Bmost: ndarray, filename_fragments: Dict[str, str], variable_key: str) -> Dict[int, List[ndarray]]

Reads BFM NetCDF model output for a specified chemical variable over multiple years, applies a spatial mask, extracts bottom layer values, and returns the data organized by year.

This function scans the given base directory for yearly output folders matching a pattern, constructs filenames from provided fragments, and processes each year’s compressed NetCDF files. For each year, it unzips the file (if needed), reads the specified variable, applies a 3D mask to invalidate certain points, extracts the bottom-most valid layer based on Bmost indices, and collects the monthly or time-step data as 2D arrays.

Parameters:
  • IDIR (str or Path) – Base directory path where BFM model output folders are located.

  • mask3d (np.ndarray) – 3D numpy array (depth, Y, X) used as a mask; elements with zero are masked (set to NaN).

  • Bmost (np.ndarray) – 2D numpy array (Y, X) containing 1-based indices indicating the bottom-most valid layer per grid cell.

  • filename_fragments (dict) – Dictionary with keys such as ‘ffrag1’, ‘ffrag2’, ‘ffrag3’ used to construct filenames for the NetCDF files.

  • variable_key (str) – Name of the chemical variable to extract from the NetCDF files (e.g., ‘O2’, ‘NO3’).

Returns:

Dictionary where each key is a year (int) and each value is a list of 2D numpy arrays representing the extracted bottom layer chemical parameter for each time step (e.g., months).

Return type:

dict[int, list[np.ndarray]]

Notes

  • The function will unzip compressed .gz files if uncompressed NetCDF files are not already present.

  • Unzipped files are deleted after reading to conserve disk space.

  • Processing is done concurrently across years using a thread pool for speed.

Example

>>> import numpy as np
>>> base_dir = "/path/to/bfm/output"
>>> mask = np.ones((10, 50, 60))
>>> Bmost_indices = np.full((50, 60), 10)
>>> fragments = {'ffrag1': 'chem', 'ffrag2': 'monthly', 'ffrag3': 'nc'}
>>> chemical_var = 'O2'
>>> data_by_year = read_bfm_chemical(base_dir, mask, Bmost_indices, fragments, chemical_var)
>>> for year, monthly_layers in data_by_year.items():
...     print(f"Year {year} has {len(monthly_layers)} time slices, shape {monthly_layers[0].shape}")
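
The unzip-then-delete behaviour mentioned in the Notes can be sketched with the standard library alone. The context manager below is a hypothetical illustration, not the function's actual code. As a design choice of this sketch, it removes the uncompressed file only if it created that file itself, and it leaves the NetCDF reading step out.

```python
import gzip
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temporarily_unzipped(gz_path):
    """Yield the uncompressed file path; clean up afterwards if we created it."""
    gz_path = Path(gz_path)
    target = gz_path.with_suffix("")        # "file.nc.gz" -> "file.nc"
    created = False
    if not target.exists():
        # gzip.open raises FileNotFoundError if the archive is missing.
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        created = True
    try:
        yield target
    finally:
        if created:
            target.unlink()                 # conserve disk space

# Demonstration with a throwaway archive instead of real model output.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.nc.gz"
    with gzip.open(archive, "wb") as f:
        f.write(b"payload")
    with temporarily_unzipped(archive) as nc_path:
        content = nc_path.read_bytes()
        existed_inside = nc_path.exists()
    existed_after = nc_path.exists()
```

Inside the `with` block the uncompressed file is available for reading; once the block exits, the temporary copy is gone.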