Hydrological Model Validator#
This project provides a set of tools for evaluating the performance of Bio-Geo-Hydrological simulations and analyzing their outputs.
The focus of this repository is on post-processing, offering utilities to:
Clean and pre-process relevant dataframes
Interpolate observed datasets to handle missing values and apply proper masking
Compare observed and simulated outputs for validation and performance assessment
Analyze the simulation outputs for further insights
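As a rough illustration of the interpolation and masking step listed above, the sketch below fills short gaps in an observed series and then masks the simulated series wherever no observation is available. It uses plain pandas and NumPy. The sample values, the variable names, and the choice of linear-in-time interpolation are assumptions made for illustration, not the repository's own routines.

```python
import numpy as np
import pandas as pd

# Hypothetical daily observed series with missing days.
dates = pd.date_range("2000-01-01", periods=6, freq="D")
obs = pd.Series([10.0, np.nan, 12.0, np.nan, np.nan, 15.0], index=dates)

# Fill gaps linearly in time, but at most one consecutive value per gap,
# and never extrapolate beyond the first/last valid observation.
obs_filled = obs.interpolate(method="time", limit=1, limit_area="inside")

# Mask the simulated values wherever the observations are still missing,
# so that both series are compared on the same valid points.
sim = pd.Series([10.5, 11.2, 11.8, 12.9, 14.1, 15.3], index=dates)
valid = obs_filled.notna()
sim_masked = sim.where(valid)

print(pd.DataFrame({"obs": obs_filled, "sim": sim_masked}))
```

Limiting the number of consecutive values that may be filled keeps long gaps visible as missing data instead of replacing them with invented values.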
Features#
Data cleaning and transformation
Missing data interpolation and masking
Automated validation metrics and visual comparison
Optional PDF report generation
Modular structure for customizable analysis workflows
Project Structure#
The project is organized into two main parts:
Quick-Use Toolkit
A high-level interface that allows users to input datasets and automatically generate:
Validation plots
Summary dataframes
A PDF report (optional)
Modular Subcomponents
A collection of standalone functions and classes for users who prefer to build custom analysis pipelines or integrate specific components into other projects. These are collected into three submodules, which can be used on their own or combined:
Processing/: Functions for reading, cleaning, transforming, and analyzing input datasets.
Plotting/: Tools for generating a variety of plots from the processed results, including time series, scatter plots, and performance metrics.
Report/: Utilities for creating structured PDF reports, incorporating plots, summary statistics, and metadata.
Note: This repository is part of a Physics of the Earth System thesis and may be expanded in the future to include additional variables and more advanced analytical features.
Model/Simulation Evaluation#
The current evaluation approach is based on a direct comparison between simulated and observed datasets over the same time window. The results are presented through a variety of plots and statistical performance metrics.
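A minimal sketch of what comparing "over the same time window" can look like in practice: the two series are restricted to the timestamps they share before any statistic is computed. The function name `align_on_common_window` and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def align_on_common_window(obs: pd.Series, sim: pd.Series) -> pd.DataFrame:
    """Return paired obs/sim values restricted to timestamps present in both."""
    paired = pd.concat({"obs": obs, "sim": sim}, axis=1, join="inner")
    # Drop pairs where either value is missing.
    return paired.dropna()


# Hypothetical series that overlap on two days only.
obs = pd.Series([1.0, 2.0, 3.0],
                index=pd.date_range("2000-01-01", periods=3, freq="D"))
sim = pd.Series([1.5, 2.5, 3.5],
                index=pd.date_range("2000-01-02", periods=3, freq="D"))

paired = align_on_common_window(obs, sim)
bias = (paired["sim"] - paired["obs"]).mean()
print(paired)
print("mean bias:", bias)
```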
Analytical Tools#
The following visualization and statistical tools are used to evaluate model performance:
Time Series & Scatter Plots
General time series plots for visual inspection
Seasonal scatter plots for intra-annual trends
Distribution Plots
Box-and-whisker plots
Violin plots
Multivariate Performance Plots
Target diagrams
Taylor diagrams
Efficiency Metrics
A wide set of statistical coefficients is implemented to evaluate model accuracy:
Coefficient of Determination (R²): standard, weighted
Index of Agreement (d): standard, modified, relative
Nash–Sutcliffe Efficiency (NSE): standard, logarithmic, modified, relative
Error Decomposition
Time-series and Spectral Analysis
Compared against cloud coverage patterns
Spatial Performance Mapping
Annual and monthly resolution maps showing model performance across geographic regions
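For reference, the standard forms of three of the efficiency metrics listed above can be sketched with NumPy as follows. These are the textbook definitions (squared Pearson correlation, Willmott's index of agreement, and the Nash–Sutcliffe efficiency). The function names and signatures are assumptions for illustration and may differ from those in the repository.

```python
import numpy as np


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    r = np.corrcoef(np.asarray(obs, float), np.asarray(sim, float))[0, 1]
    return r ** 2


def index_of_agreement(obs, sim):
    """Standard index of agreement d (Willmott)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    obs_mean = obs.mean()
    denom = np.sum((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / denom


def nse(obs, sim):
    """Standard Nash-Sutcliffe efficiency: 1 is a perfect match."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


# Hypothetical paired values.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print("R2 :", r_squared(obs, sim))
print("d  :", index_of_agreement(obs, sim))
print("NSE:", nse(obs, sim))
```

All three reach 1 for a perfect match. NSE can become negative when the simulation performs worse than the observed mean, whereas d and R² are bounded between 0 and 1.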
Expansion of the Analysis: Bottom (Benthic) σ-Layer#
As the first direction for expanded analysis, this repository introduces tools focused on the extraction and study of the bottom σ-layer of the simulation grid. This layer is particularly relevant for investigating the formation of deep water masses and the distribution of bio-geochemical variables near the seabed.
Once the model has been validated using the core evaluation tools, users can apply these modules to explore processes such as:
Stratification and mixing at depth
Tracer evolution in deep layers (e.g., nutrients, oxygen, carbon compounds)
Temporal variability in bottom water properties
For implementation details and example workflows, refer to the test cases provided in the Test_cases/ directory.
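The idea of extracting a bottom layer can be sketched with NumPy as below. The sketch assumes a 3D field ordered as (depth, y, x) with NaN wherever there is no water, and picks the deepest valid value in each column. The array layout and the function name `extract_bottom_layer` are assumptions for illustration; the repository's own extraction may work differently.

```python
import numpy as np


def extract_bottom_layer(field):
    """Return the deepest non-NaN value of each column of a (depth, y, x) array."""
    field = np.asarray(field, dtype=float)
    valid = ~np.isnan(field)
    n_levels = field.shape[0]
    # Search each column from the bottom up for the first valid level.
    deepest = n_levels - 1 - np.argmax(valid[::-1], axis=0)
    bottom = np.take_along_axis(field, deepest[np.newaxis], axis=0)[0]
    # Columns without any valid level (e.g. land points) stay NaN.
    bottom[~valid.any(axis=0)] = np.nan
    return bottom


# Hypothetical field: 3 levels, 1 row, 3 columns (full, shallow, land).
field = np.array([
    [[1.0, 1.0, np.nan]],
    [[2.0, 2.0, np.nan]],
    [[3.0, np.nan, np.nan]],
])
bottom = extract_bottom_layer(field)
print(bottom)
```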
Installation Guide#
This project can be installed using conda (recommended) or pip across all major operating systems. Below you’ll also find guidance for optional tools such as MATLAB and CDO (Linux only), which some of the functions and routines rely on.
Python Environment Setup#
Python version supported :
Conda (Recommended)
All Systems
# Create a new conda environment
conda create -n hydroval python=3.10
# Activate the environment
conda activate hydroval
# Install the package and dependencies in editable mode
pip install -e .
Pip Only (Without Conda)
All Systems
# Optionally create and activate a virtual environment (recommended)
python -m venv env
source env/bin/activate # Windows: env\Scripts\activate
# Install the package and dependencies in editable mode
pip install -e .
Alternative Pip Options
--user (No admin rights)
pip install --user -e .
-e (Editable/development mode)
Use -e when actively developing or modifying the source code.
pip install -e .
MATLAB (Optional but needed for the interpolator script)#
MATLAB Setup (All Systems)
Description#
Some test cases or post-analysis steps may require MATLAB. Make sure it’s installed and available via your system’s PATH.
To run the interpolator correctly, the script needs access to the m_map, mexcdf, and nctoolbox toolboxes. Make sure their folders are on the search path of your MATLAB installation. For a guide on how to add paths in MATLAB, refer to the MATLAB Add Folder to Path documentation.
The interpolator is written in MATLAB so that the process uses the same tools as NOAA. This keeps the process NOAA compliant and allows future integration of this repository with other NOAA tools.
m_map toolbox: https://www.eoas.ubc.ca/~rich/map.html
mexcdf toolbox: https://www.mathworks.com/matlabcentral/fileexchange/26310-netcdf-interface-for-matlab-mexcdf
nctoolbox toolbox: https://github.com/nctoolbox/nctoolbox
CDO - Climate Data Operators (Linux Only)#
CDO Setup (Linux Only)
⚠️ CDO is supported only on Linux-based systems.#
# Ubuntu/Debian
sudo apt install cdo
# Or use conda
conda install -c conda-forge cdo
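Since CDO is optional and Linux only, a script that depends on it may want to check for it before running. The following is a minimal sketch using only the Python standard library; the helper name `cdo_available` is hypothetical and not part of the package.

```python
import platform
import shutil


def cdo_available() -> bool:
    """True when running on Linux and a `cdo` executable is found on PATH."""
    return platform.system() == "Linux" and shutil.which("cdo") is not None


if cdo_available():
    print("CDO found, CDO-based routines can be used.")
else:
    print("CDO not available, skipping CDO-based routines.")
```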
Helpful Links#
Official Documentation
Usage Guide: GenerateReport CLI#
The GenerateReport command-line interface (CLI) allows users to generate evaluation reports from observed and simulated Bio-Geo-Hydrological datasets.
Basic Command#
GenerateReport [input_folder_or_dict] [OPTIONS]
Positional Argument#
usage: GenerateReport [-h] [--output-dir path] [--check] [--no-pdf] [--verbose] [--open-report]
[--variable var_name] [--unit unit_str] [--no-banner] [--info] [--version]
[input]
Generate a comprehensive evaluation report from observed and simulated Bio-Geo-Hydrological datasets.
positional arguments:
  input                 Path to the input data directory or a dictionary of file paths.
                        You can pass:
                          - a folder containing: obs_spatial, sim_spatial, obs_ts, sim_ts, and mask
                          - or a stringified dictionary (JSON or Python format) mapping these keys:
                            {
                              "obs_spatial": "obs_spatial.nc",
                              "sim_spatial": "sim_spatial.nc",
                              "obs_ts": "obs_timeseries.csv",
                              "sim_ts": "sim_timeseries.csv",
                              "mask": "mask.nc"
                            }
options:
  -h, --help            Show this help message and exit
  --output-dir path     Destination folder for report and plots (default: ./REPORT)
  --check               Validate input files and structure only, no report generation
  --no-pdf              Skip PDF generation, only output plots and dataframes
  --verbose             Enable detailed logging
  --open-report         Automatically open the PDF report if generated
  --variable var_name   Name of the target variable (e.g. "Chlorophyll-a")
  --unit unit_str       Unit of the variable (e.g. "mg/L", "m3/s"), LaTeX-ready
  --no-banner           Suppress ASCII banner (useful for batch jobs)
  --info                Show program description and exit
  --version             Show version and exit
Examples#
Minimal Run (Interactive)
GenerateReport ./data
With Output Directory & No PDF
GenerateReport ./data --output-dir ./results --no-pdf
Using a JSON-Style Dictionary
GenerateReport "{ \"obs_spatial\": \"obs.nc\", \"sim_spatial\": \"sim.nc\", \"obs_ts\": \"obs.csv\", \"sim_ts\": \"sim.csv\", \"mask\": \"mask.nc\" }"
Quiet Batch Run (No Banner, Auto Open Report)
GenerateReport ./data --no-banner --open-report
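The positional argument accepts either a folder path or a stringified dictionary in JSON or Python format. One way such an argument could be interpreted is sketched below with the standard library only. The function name `parse_input_argument` and the validation logic are assumptions for illustration, not the actual GenerateReport implementation; only the five key names come from the help text.

```python
import ast
import json
from pathlib import Path

# Keys named in the CLI help text.
REQUIRED_KEYS = ("obs_spatial", "sim_spatial", "obs_ts", "sim_ts", "mask")


def parse_input_argument(value: str):
    """Return a dict of file paths if `value` is a dictionary string, else a Path."""
    text = value.strip()
    if not text.startswith("{"):
        return Path(text)
    try:
        mapping = json.loads(text)          # JSON format (double quotes)
    except json.JSONDecodeError:
        mapping = ast.literal_eval(text)    # Python literal (single quotes)
    if not isinstance(mapping, dict):
        raise ValueError("input must be a folder path or a dictionary")
    missing = [key for key in REQUIRED_KEYS if key not in mapping]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return mapping


print(parse_input_argument("./data"))
print(parse_input_argument(
    '{"obs_spatial": "obs.nc", "sim_spatial": "sim.nc", '
    '"obs_ts": "obs.csv", "sim_ts": "sim.csv", "mask": "mask.nc"}'
))
```

`ast.literal_eval` is used for the Python-format fallback because it only evaluates literals and does not execute arbitrary code.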
For examples of how to use the individual functions available in the repository (other than the report-generation ones), and more generally for importing and using them in your own scripts, refer to the test cases in the Test_cases/ folder and their respective TEST_CASES_README file.
Test Cases and Pytests#
This repository includes a suite of example routines and automated tests to ensure the correct functionality of its components. All tests are located in the Test_cases/ directory.
Test Case Scripts#
These are step-by-step, verbose scripts that demonstrate how to apply the tools for data cleaning, analysis, and reporting. They are ideal for understanding the intended usage.
Available Test Cases#
Data_cleaner_setupper.py
Demonstrates how to clean and prepare datasets for analysis.
Includes the MATLAB script Interpolator_v2.m to perform bilinear interpolation on observed datasets.
SST_data_analyzer.py & CHL_data_analyzer.py
Practical examples of analysis workflows using Sea Surface Temperature and Chlorophyll-a datasets.
These are simplified, didactic illustrations of what the Report_generator submodule automates.
Benthic_layer.py
Focuses on extracting and analyzing bottom σ-layers, emphasizing dense water formation and bio-geochemical tracers near the seabed.
Pytests and Code Quality#
Automated testing ensures reliability and stability of the modules, using pytest for the test suite and flake8 for style checks.
You can run them via:
pytest
flake8 src/
These tools verify logic correctness, class behavior, and code style compliance.
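As an illustration of the kind of check pytest collects, the sketch below tests a small helper. Both the `rmse` helper and the test names are hypothetical and do not mirror the repository's actual test suite. Any function whose name starts with `test_` in a file named `test_*.py` is picked up when `pytest` runs.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error between two equally long sequences (hypothetical helper)."""
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def test_rmse_is_zero_for_identical_series():
    assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0


def test_rmse_known_value():
    # Errors are 3 and 4, so the RMSE is sqrt((9 + 16) / 2).
    assert math.isclose(rmse([0.0, 0.0], [3.0, 4.0]), math.sqrt(12.5))


def test_rmse_rejects_length_mismatch():
    try:
        rmse([1.0], [1.0, 2.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```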
Code Quality Reports#
This project is continuously monitored with external quality and coverage tools.
Bibliography#
Summary diagrams for coupled hydrodynamic-ecosystem model skill assessment (Jolliff et al., 2008)
Defining a Simplified Yet “Realistic” Equation of State for Seawater (Roquet et al., 2015)
Climatological analysis of the Adriatic Sea thermohaline characteristics (Giorgietti A., 1998)