file_IO

Hydrological_model_validator.Processing.file_io.call_interpolator(varname: str, data_level: int, input_dir: str | PathLike, output_dir: str | PathLike, mask_file: str | PathLike) → None

Call the MATLAB interpolation function Interpolator_v2 via the MATLAB Engine API for Python.

This function starts a MATLAB session, adds the directory containing the MATLAB function to the MATLAB path, and calls Interpolator_v2 with the provided arguments.

Parameters:
  • varname (str) – Variable name to interpolate (e.g., ‘temperature’).

  • data_level (int) – Data level identifier (such as depth or layer index).

  • input_dir (str or os.PathLike) – Path to the directory containing input data files.

  • output_dir (str or os.PathLike) – Path to the directory where output files will be saved.

  • mask_file (str or os.PathLike) – Path to the mask file required by the interpolation function.

Raises:

RuntimeError – If the MATLAB engine fails to start or if the interpolation function raises an error.

Example

>>> call_interpolator(
...     varname='sst',
...     data_level=0,
...     input_dir='/data/input',
...     output_dir='/data/output',
...     mask_file='/data/mask/mesh_mask.nc'
... )
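The workflow described above (start MATLAB, extend the MATLAB path, call Interpolator_v2, translate failures into RuntimeError) can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the names call_interpolator_sketch and run_interpolator, the extra matlab_dir and start_engine parameters, and the argument order passed to Interpolator_v2 are all hypothetical. The engine factory is injectable so the logic can be exercised without a MATLAB installation; by default it uses matlab.engine.start_matlab from the MATLAB Engine API for Python.

```python
import os
from typing import Any, Callable, Optional, Union

PathArg = Union[str, os.PathLike]


def run_interpolator(
    eng: Any,
    matlab_dir: PathArg,
    varname: str,
    data_level: int,
    input_dir: PathArg,
    output_dir: PathArg,
    mask_file: PathArg,
) -> None:
    """Add matlab_dir to the MATLAB path, then call Interpolator_v2 on an open engine."""
    # The MATLAB Engine expects plain strings, so path-like objects are converted with os.fspath.
    eng.addpath(os.fspath(matlab_dir), nargout=0)
    # ASSUMPTION: Interpolator_v2 takes its arguments in this order and returns nothing (nargout=0).
    eng.Interpolator_v2(
        varname,
        data_level,
        os.fspath(input_dir),
        os.fspath(output_dir),
        os.fspath(mask_file),
        nargout=0,
    )


def call_interpolator_sketch(
    varname: str,
    data_level: int,
    input_dir: PathArg,
    output_dir: PathArg,
    mask_file: PathArg,
    matlab_dir: PathArg,  # hypothetical: folder containing Interpolator_v2.m
    start_engine: Optional[Callable[[], Any]] = None,  # hypothetical injection point
) -> None:
    """Start MATLAB, run the interpolation, and always shut the engine down."""
    try:
        if start_engine is None:
            import matlab.engine  # MATLAB Engine API for Python (installed from MATLAB)
            start_engine = matlab.engine.start_matlab
        eng = start_engine()
    except Exception as exc:
        raise RuntimeError(f"Failed to start the MATLAB engine: {exc}") from exc
    try:
        run_interpolator(eng, matlab_dir, varname, data_level, input_dir, output_dir, mask_file)
    except Exception as exc:
        raise RuntimeError(f"Interpolator_v2 failed: {exc}") from exc
    finally:
        eng.quit()  # release the MATLAB session even when the call fails
```

Keeping the MATLAB call in a separate function that receives an already-open engine means the same helper could be reused to run several variables in one session, avoiding the start-up cost of MATLAB for each call.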
Hydrological_model_validator.Processing.file_io.find_file_with_keywords(files: List[Any], keywords: List[str], description: str) → Any

Search for a file among a list whose name contains any of the specified keywords.

This function filters a list of file-like objects to find those whose filenames include any of the provided keywords (case-insensitive). If exactly one match is found, it is returned. If multiple matches are found, the user is prompted to select the exact filename. If no matches are found, the user is prompted to input the filename manually.

Parameters:
  • files (list of path-like or file-like objects) – List of objects exposing a .name attribute that holds the filename (for example, pathlib.Path instances).

  • keywords (list of str) – List of keywords to search for in filenames (case-insensitive).

  • description (str) – Description of the file purpose, used in prompts to the user.

Returns:

The selected file object matching the keywords or user input.

Return type:

file-like object

Raises:

FileNotFoundError – If the filename entered by the user is found neither among the candidate files nor in the folder.

Example

>>> from pathlib import Path
>>> files = [Path("data_obs.nc"), Path("data_sim.nc"), Path("readme.txt")]
>>> keywords = ["obs", "observed"]
>>> selected_file = find_file_with_keywords(files, keywords, "observed data")
>>> print(selected_file.name)
data_obs.nc
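The matching and prompting logic described for find_file_with_keywords can be sketched as below. This is a hypothetical reimplementation (find_file_with_keywords_sketch and its input_fn parameter are illustrative names), and, unlike the documented function, it only looks for a typed filename among the candidate objects rather than also checking the folder on disk.

```python
from pathlib import Path
from typing import Any, Callable, List, Sequence


def find_file_with_keywords_sketch(
    files: Sequence[Any],
    keywords: List[str],
    description: str,
    input_fn: Callable[[str], str] = input,  # hypothetical: replace to avoid interactive prompts
) -> Any:
    """Return the single file whose name contains a keyword; otherwise ask the user."""
    lowered = [k.lower() for k in keywords]
    # Case-insensitive substring match against each object's .name attribute.
    matches = [f for f in files if any(k in f.name.lower() for k in lowered)]
    if len(matches) == 1:
        return matches[0]
    if matches:
        names = ", ".join(f.name for f in matches)
        prompt = f"Multiple candidates for {description}: {names}\nEnter the exact filename: "
        pool = matches
    else:
        prompt = f"No file matched for {description}. Enter the filename manually: "
        pool = list(files)
    chosen = input_fn(prompt).strip()
    for f in pool:
        if f.name == chosen:
            return f
    raise FileNotFoundError(f"{chosen!r} not found among the candidate files for {description}")
```

With the three files from the example above, the call returns data_obs.nc without prompting, because exactly one name contains "obs".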
Hydrological_model_validator.Processing.file_io.load_dataset(year: int | str, IDIR: str | Path) → Tuple[int | str, Dataset | None]

Load a NetCDF dataset file for a specified year from a given directory.

Parameters:
  • year (int or str) – Year identifier used to construct the filename ‘Msst_{year}.nc’.

  • IDIR (str or Path) – Path to the directory containing the dataset files.

Returns:

A tuple containing:

  • year (int or str) – The input year identifier.

  • ds (xarray.Dataset or None) – The loaded dataset if the file exists and opens successfully; otherwise None.

Return type:

tuple

Raises:

ValueError – If the input directory does not exist or is not a directory.

Notes

  • Prints messages indicating progress or warnings.

  • Catches exceptions during dataset loading and returns None if loading fails.

Example

>>> year, dataset = load_dataset(2020, "/data/sea_surface_temp")
>>> if dataset is not None:
...     print(dataset)
... else:
...     print("Dataset not found or failed to load.")
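A rough sketch of the behaviour described for load_dataset (directory validation, the Msst_{year}.nc naming pattern, and returning None instead of raising when loading fails) might look like this. The name load_dataset_sketch and the opener parameter are hypothetical; by default the sketch falls back to xarray.open_dataset, and the printed messages are illustrative rather than the package's exact output.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple, Union


def load_dataset_sketch(
    year: Union[int, str],
    IDIR: Union[str, Path],
    opener: Optional[Callable[[Path], Any]] = None,  # hypothetical: defaults to xarray.open_dataset
) -> Tuple[Union[int, str], Optional[Any]]:
    """Open IDIR/Msst_{year}.nc and return (year, dataset), or (year, None) on failure."""
    idir = Path(IDIR)
    if not idir.is_dir():
        raise ValueError(f"Input directory {idir} does not exist or is not a directory")
    if opener is None:
        import xarray as xr
        opener = xr.open_dataset
    file_path = idir / f"Msst_{year}.nc"
    if not file_path.exists():
        print(f"Warning: {file_path} not found")
        return year, None
    try:
        print(f"Loading {file_path}")
        return year, opener(file_path)
    except Exception as exc:  # loading errors are reported, not raised
        print(f"Warning: failed to load {file_path}: {exc}")
        return year, None
```

Returning the year alongside the dataset makes the function convenient to map over a list of years, for example in a thread pool, since each result still identifies which year it belongs to.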
Hydrological_model_validator.Processing.file_io.mask_reader(BaseDIR: str | Path) → Tuple[ndarray, Tuple[ndarray, ...], Tuple[ndarray, ...], ndarray, ndarray]

Load the land-sea mask and associated latitude/longitude fields from a NEMO ‘mesh_mask.nc’ file.

This function extracts:

  • A 2D land-sea surface mask (0 = land, 1 = ocean).

  • Indices of land grid points in the 2D and 3D masks.

  • Latitude and longitude arrays on the model grid.

Parameters:

BaseDIR (str or Path) – Path to the directory containing the ‘mesh_mask.nc’ NetCDF file.

Returns:

A tuple containing:

  • Mmask (np.ndarray) – 2D surface land-sea mask array with shape (y, x).

  • Mfsm (tuple of np.ndarray) – Indices (y, x) where the surface mask equals 0 (land).

  • Mfsm_3d (tuple of np.ndarray) – Indices (z, y, x) where the full 3D mask equals 0 (land).

  • Mlat (np.ndarray) – Latitude array with the same shape as Mmask.

  • Mlon (np.ndarray) – Longitude array with the same shape as Mmask.

Return type:

tuple

Hydrological_model_validator.Processing.file_io.read_nc_variable_from_gz_in_memory(file_gz: Path, variable_key: str) → ndarray

Read a variable directly from a gzipped NetCDF (.nc.gz) file in memory using xarray.

This function decompresses the gzipped file in memory without writing to disk, then opens the NetCDF dataset and extracts the requested variable.

Parameters:
  • file_gz (Path) – Path to the gzipped NetCDF (.nc.gz) file.

  • variable_key (str) – Name of the variable to extract from the dataset.

Returns:

NumPy array containing the data of the requested variable.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If the gzipped file does not exist.

  • KeyError – If the specified variable_key is not found in the dataset.

Example

>>> from pathlib import Path
>>> data = read_nc_variable_from_gz_in_memory(Path("data/sample.nc.gz"), "temperature")
>>> print(data.shape)
(50, 100)
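The in-memory decompression step described above can be sketched with the standard library alone. gz_to_memory_buffer is a hypothetical helper name; it shows only how a .nc.gz file can be inflated into a seekable buffer without touching the disk, not the package's actual code.

```python
import gzip
import io
from pathlib import Path


def gz_to_memory_buffer(file_gz: Path) -> io.BytesIO:
    """Decompress a .gz file entirely in memory and return a seekable buffer."""
    file_gz = Path(file_gz)
    if not file_gz.exists():
        raise FileNotFoundError(f"{file_gz} does not exist")
    # gzip.decompress inflates the whole payload into RAM; nothing is written to disk.
    return io.BytesIO(gzip.decompress(file_gz.read_bytes()))
```

The resulting buffer could then be handed to xarray.open_dataset, which accepts file-like objects when a suitable engine is installed (for example scipy for NetCDF3 files or h5netcdf for NetCDF4/HDF5 files). Indexing the dataset with a missing name, as in ds[variable_key], raises KeyError, which matches the documented behaviour. Note that the whole decompressed file must fit in memory.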
Hydrological_model_validator.Processing.file_io.read_nc_variable_from_unzipped_file(file_nc: Path, variable_key: str) → ndarray

Read a variable array from a NetCDF (.nc) file.

Parameters:
  • file_nc (Path) – Path to the NetCDF (.nc) file.

  • variable_key (str) – Name of the variable to extract from the file.

Returns:

The data array corresponding to the specified variable.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If the NetCDF file does not exist at the given path.

  • KeyError – If the specified variable_key is not found in the NetCDF file.

Example

>>> from pathlib import Path
>>> data = read_nc_variable_from_unzipped_file(Path("data/sample.nc"), "temperature")
>>> print(data.shape)
(50, 100)
Hydrological_model_validator.Processing.file_io.select_3d_variable(ds: Dataset, label: str) → DataArray

Select a 3D variable from an xarray Dataset, prompting the user if multiple candidates exist.

This function searches the given Dataset for variables with exactly three dimensions. If none are found, it raises a ValueError. If multiple 3D variables are found, it prompts the user to select one by displaying the variable names and shapes. If only one 3D variable exists, it is selected automatically.

Parameters:
  • ds (xr.Dataset) – The xarray Dataset containing the variables to search.

  • label (str) – A descriptive label for the Dataset, used in prompts and error messages.

Returns:

The selected 3D variable as an xarray DataArray.

Return type:

xr.DataArray

Raises:

ValueError – If no 3D variables are found in the Dataset.

Example

>>> var = select_3d_variable(my_dataset, "Observed data")
⚠️ Multiple 3D variables found in Observed data: ['temp', 'salinity']
1: temp (shape: (time, lat, lon))
2: salinity (shape: (time, lat, lon))
Select variable number: 1
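The selection logic described for select_3d_variable can be sketched against any mapping of names to array-like objects exposing .ndim and .shape, such as ds.data_vars of an xarray Dataset. select_3d_variable_sketch and its input_fn parameter are hypothetical, and the out-of-range check is an addition that the documented function may not perform.

```python
from typing import Any, Callable, Mapping


def select_3d_variable_sketch(
    data_vars: Mapping[str, Any],  # e.g. ds.data_vars for an xarray Dataset
    label: str,
    input_fn: Callable[[str], str] = input,  # hypothetical: replace to avoid interactive prompts
) -> Any:
    """Return the only 3D variable, or ask the user to choose among several."""
    candidates = {name: var for name, var in data_vars.items() if var.ndim == 3}
    if not candidates:
        raise ValueError(f"No 3D variables found in {label}")
    if len(candidates) == 1:
        return next(iter(candidates.values()))
    names = list(candidates)
    print(f"Multiple 3D variables found in {label}: {names}")
    for i, name in enumerate(names, start=1):
        print(f"{i}: {name} (shape: {candidates[name].shape})")
    choice = int(input_fn("Select variable number: "))
    if not 1 <= choice <= len(names):
        raise ValueError(f"Selection {choice} is out of range 1..{len(names)}")
    return candidates[names[choice - 1]]
```

Filtering on ndim rather than on dimension names keeps the check independent of how a given dataset labels its time, latitude, and longitude axes.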
Hydrological_model_validator.Processing.file_io.unzip_gz_to_file(file_gz: Path, target_file: Path) → None

Decompress a .gz compressed file to a specified target file.

Parameters:
  • file_gz (Path) – Path to the input .gz compressed file.

  • target_file (Path) – Path to the output decompressed file.

Raises:

FileNotFoundError – If the input .gz file does not exist.

Notes

  • Creates the parent directory of the target file if it does not exist.

  • Overwrites the target file if it already exists.

Example

>>> from pathlib import Path
>>> unzip_gz_to_file(Path("data/archive.nc.gz"), Path("data/archive.nc"))
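A minimal standard-library sketch of the behaviour listed above (existence check, parent-directory creation, overwriting the target) follows. unzip_gz_to_file_sketch is a hypothetical name, not the package's implementation.

```python
import gzip
import shutil
from pathlib import Path


def unzip_gz_to_file_sketch(file_gz: Path, target_file: Path) -> None:
    """Stream-decompress file_gz into target_file, overwriting it if present."""
    file_gz, target_file = Path(file_gz), Path(target_file)
    if not file_gz.exists():
        raise FileNotFoundError(f"{file_gz} does not exist")
    target_file.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so large archives are never fully loaded into memory.
    with gzip.open(file_gz, "rb") as src, open(target_file, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Streaming to disk trades extra disk I/O for a small, constant memory footprint, which is the opposite trade-off from reading a .nc.gz file entirely in memory.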