Hydrological Model Validator

This project provides a set of tools for evaluating the performance of Bio-Geo-Hydrological simulations and analyzing their outputs.

The focus of this repository is on post-processing, offering utilities to:

Clean and pre-process relevant dataframes
Interpolate observed datasets to handle missing values and apply proper masking
Compare observed and simulated outputs for validation and performance assessment
Analyse the simulation outputs for further insights
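Here is a rough illustration of the interpolation and masking steps listed above. This NumPy sketch fills interior gaps in a 1-D observed series by linear interpolation and masks out cells flagged as land in a 2-D field. The helper names `fill_gaps` and `apply_mask` are hypothetical, and the repository's own routines may differ. For instance, the test cases use bilinear interpolation in MATLAB for gridded observations.

```python
import numpy as np


def fill_gaps(series):
    """Linearly interpolate interior NaN gaps; leading/trailing NaNs are kept."""
    values = np.asarray(series, dtype=float)
    filled = values.copy()
    valid = ~np.isnan(values)
    if valid.sum() < 2:
        # Fewer than two valid points: nothing to interpolate between
        return filled
    idx = np.arange(values.size)
    # left/right=NaN disables extrapolation beyond the first and last valid points
    filled[~valid] = np.interp(idx[~valid], idx[valid], values[valid],
                               left=np.nan, right=np.nan)
    return filled


def apply_mask(field, sea_mask):
    """Return a masked array hiding cells where sea_mask is False (e.g. land)."""
    return np.ma.masked_where(~np.asarray(sea_mask, dtype=bool), field)
```

For example, `fill_gaps([1.0, nan, 3.0, nan, nan, 6.0, nan])` returns `[1, 2, 3, 4, 5, 6, nan]`. The trailing gap stays missing because no valid value follows it.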

Features

Data cleaning and transformation
Missing data interpolation and masking
Automated validation metrics and visual comparison
Optional PDF report generation
Modular structure for customizable analysis workflows

Project Structure

The project is organized into two main objectives:

Quick-Use Toolkit
A high-level interface that allows users to input datasets and automatically generate:
- Validation plots
- Summary dataframes
- A PDF report (optional)
Modular Subcomponents
A collection of standalone functions and classes for users who prefer to build custom analysis pipelines or integrate specific components into other projects. These are grouped into three submodules, each of which can be used on its own or combined with the others:
- Processing/: Functions for reading, cleaning, transforming, and analyzing input datasets.
- Plotting/: Tools for generating a variety of plots from the processed results, including time series, scatter plots, and performance metrics.
- Report/: Utilities for creating structured PDF reports, incorporating plots, summary statistics, and metadata.

Note: This repository is part of a Physics of the Earth System thesis and may be expanded in the future to include additional variables and more advanced analytical features.

Model/Simulation Evaluation

The current evaluation approach is based on a direct comparison between simulated and observed datasets over the same time window. The results are presented through a variety of plots and statistical performance metrics.
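Before computing any metric, the observed and simulated series need to cover exactly the same timestamps. The sketch below shows one minimal way to do this alignment in NumPy. It assumes daily timestamps and uses a hypothetical helper name. It keeps only the dates present in both series and drops pairs where either value is missing.

```python
import numpy as np


def align_common_window(obs_time, obs_vals, sim_time, sim_vals):
    """Keep only timestamps present in both series (hypothetical helper)."""
    obs_time = np.asarray(obs_time, dtype="datetime64[D]")
    sim_time = np.asarray(sim_time, dtype="datetime64[D]")
    # Shared dates plus the positions of those dates in each input
    common, obs_idx, sim_idx = np.intersect1d(obs_time, sim_time, return_indices=True)
    obs_aligned = np.asarray(obs_vals, dtype=float)[obs_idx]
    sim_aligned = np.asarray(sim_vals, dtype=float)[sim_idx]
    # Drop pairs where either value is missing so metrics compare like with like
    ok = ~(np.isnan(obs_aligned) | np.isnan(sim_aligned))
    return common[ok], obs_aligned[ok], sim_aligned[ok]
```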

Analytical Tools

The following visualization and statistical tools are used to evaluate model performance:

Time Series & Scatter Plots
- General time series plots for visual inspection
- Seasonal scatter plots for intra-annual trends
Distribution Plots
- Box-and-whisker plots
- Violin plots
Multivariate Performance Plots
- Target diagrams
- Taylor diagrams
Efficiency Metrics
A wide set of statistical coefficients is implemented to evaluate model accuracy:
- Coefficient of Determination (R²)
  - Standard
  - Weighted
- Index of Agreement (d)
  - Standard
  - Modified
  - Relative
- Nash–Sutcliffe Efficiency (NSE)
  - Standard
  - Logarithmic
  - Modified
  - Relative
Error Decomposition
- Time-series and Spectral Analysis
  - Compared against cloud coverage patterns
- Spatial Performance Mapping
  - Annual and monthly resolution maps showing model performance across geographic regions
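The efficiency metrics listed above follow the definitions collected in Krause et al. (2005), cited in the Bibliography. The NumPy sketch below restates them for reference, with some caveats:

- It is not the repository's implementation, and the function names are hypothetical.
- It assumes 1-D arrays of equal length without NaNs.
- The modified variants use the exponent j = 1.

```python
import numpy as np


def _as_arrays(obs, sim):
    """Convert inputs to float arrays (assumes equal length and no NaNs)."""
    return np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    o, p = _as_arrays(obs, sim)
    return float(np.corrcoef(o, p)[0, 1] ** 2)


def weighted_r_squared(obs, sim):
    """R² weighted by the slope b of the least-squares regression of sim on obs."""
    o, p = _as_arrays(obs, sim)
    b = np.polyfit(o, p, 1)[0]
    r2 = r_squared(o, p)
    return float(abs(b) * r2 if b <= 1 else r2 / abs(b))


def nse(obs, sim, j=2):
    """Nash-Sutcliffe efficiency: j=2 standard, j=1 modified (E1)."""
    o, p = _as_arrays(obs, sim)
    return float(1 - np.sum(np.abs(o - p) ** j) / np.sum(np.abs(o - o.mean()) ** j))


def log_nse(obs, sim):
    """NSE on log-transformed values (requires strictly positive data)."""
    o, p = _as_arrays(obs, sim)
    return nse(np.log(o), np.log(p))


def relative_nse(obs, sim):
    """Relative NSE: deviations scaled by the observations (obs must be non-zero)."""
    o, p = _as_arrays(obs, sim)
    om = o.mean()
    return float(1 - np.sum(((o - p) / o) ** 2) / np.sum(((o - om) / om) ** 2))


def index_of_agreement(obs, sim, j=2):
    """Willmott's index of agreement: j=2 standard d, j=1 modified d1."""
    o, p = _as_arrays(obs, sim)
    om = o.mean()
    potential = np.sum((np.abs(p - om) + np.abs(o - om)) ** j)
    return float(1 - np.sum(np.abs(o - p) ** j) / potential)


def relative_index_of_agreement(obs, sim):
    """Relative d: errors scaled by obs, potential error scaled by the obs mean."""
    o, p = _as_arrays(obs, sim)
    om = o.mean()
    potential = np.sum(((np.abs(p - om) + np.abs(o - om)) / om) ** 2)
    return float(1 - np.sum(((o - p) / o) ** 2) / potential)
```

For example, with `obs = [1, 2, 3, 4]` and `sim = [1, 2, 3, 5]`, `nse` gives 0.8 and `index_of_agreement` gives 26/27 (about 0.963).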

Expansion of the Analysis: Bottom (Benthic) σ-Layer

As the first direction for expanded analysis, this repository introduces tools focused on the extraction and study of the bottom σ-layer of the simulation grid. This layer is particularly relevant for investigating the formation of deep water masses and the distribution of bio-geochemical variables near the seabed.

Once the model has been validated using the core evaluation tools, users can apply these modules to explore processes such as:

Stratification and mixing at depth
Tracer evolution in deep layers (e.g., nutrients, oxygen, carbon compounds)
Temporal variability in bottom water properties

For implementation details and example workflows, refer to the test cases provided in the Test_cases/ directory.
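A common way to extract a bottom layer from a 3-D model field is to take, for each water column, the deepest level that holds a valid value. The sketch below does this with NumPy under these assumptions:

- The array has shape (depth, y, x), ordered from surface to bottom.
- NaN marks land and cells below the seabed.
- The function name is hypothetical.

On a grid where every water column has the same number of valid σ-levels, this simply returns the last level.

```python
import numpy as np


def extract_bottom_layer(field):
    """Return the deepest non-NaN value in each (y, x) column of a (depth, y, x) array.

    Columns that are entirely NaN (e.g. land) stay NaN in the output.
    """
    field = np.asarray(field, dtype=float)
    valid = ~np.isnan(field)
    nz = field.shape[0]
    # Flip the depth axis so argmax finds the first valid cell counted from the bottom
    levels_from_bottom = np.argmax(valid[::-1], axis=0)
    bottom_idx = nz - 1 - levels_from_bottom
    bottom = np.take_along_axis(field, bottom_idx[np.newaxis, ...], axis=0)[0]
    # Columns with no valid level at all are land
    bottom[~valid.any(axis=0)] = np.nan
    return bottom
```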

Installation Guide

This project can be installed with conda (recommended) or pip on all major operating systems. Below you'll also find guidance for the optional tools MATLAB and CDO (Linux only), which are used by some of the functions and routines.

Python Environment Setup

Python version supported:

Conda (Recommended)

All Systems

# Create a new conda environment
conda create -n hydroval python=3.10

# Activate the environment
conda activate hydroval

# Install the package and dependencies in editable mode
pip install -e .

Pip Only (Without Conda)

All Systems

# Optionally create and activate a virtual environment (recommended)
python -m venv env
source env/bin/activate      # Windows: env\Scripts\activate

# Install the package and dependencies in editable mode
pip install -e .

Alternative Pip Options

--user (No admin rights)

pip install --user -e .

-e (Editable/development mode)

pip install -e .

Use -e when actively developing or modifying the source code.

MATLAB (Optional, but needed for the interpolator script)

MATLAB Setup (All Systems)

Description

Some test cases or post-analysis steps may require MATLAB. Make sure it’s installed and available via your system’s PATH.

🔗 Official MATLAB Installation Guide

To correctly run the interpolator, the toolboxes m_map, mexcdf, and nctoolbox need to be accessible by the script. Please make sure that their paths are reachable by your MATLAB installation. For a guide on how to add paths in MATLAB, please refer to MATLAB Add Folder to Path Documentation.

The interpolation is done in MATLAB to keep the process NOAA-compliant. Using the same tools as NOAA allows future integration of this repository with other NOAA tools.

m_map toolbox: https://www.eoas.ubc.ca/~rich/map.html

mexcdf toolbox: https://www.mathworks.com/matlabcentral/fileexchange/26310-netcdf-interface-for-matlab-mexcdf

nctoolbox: https://github.com/nctoolbox/nctoolbox
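You may prefer to launch the interpolator from a Python workflow. MATLAB's `-batch` startup option (MATLAB R2019a and later) runs a statement non-interactively, and the sketch below builds such a call. It rests on several assumptions:

- The toolbox locations are hypothetical placeholders.
- `Interpolator_v2.m` (the script shipped with the test cases) can run as a plain script from its own folder, without arguments.
- The helper names are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_matlab_command(script: str, toolbox_dirs: List[str], workdir: str) -> List[str]:
    """Build a `matlab -batch` call that adds toolboxes to the path, then runs a script."""
    # genpath adds each toolbox folder together with its subfolders
    add_paths = "; ".join(f"addpath(genpath('{Path(d).as_posix()}'))" for d in toolbox_dirs)
    # Scripts are invoked by name, without the .m extension
    statement = f"{add_paths}; cd('{Path(workdir).as_posix()}'); {Path(script).stem}"
    return ["matlab", "-batch", statement]


def run_matlab(cmd: List[str]) -> None:
    """Run the command, failing early if MATLAB is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("MATLAB executable not found on PATH")
    subprocess.run(cmd, check=True)


# Hypothetical toolbox locations; replace with your own installation paths
cmd = build_matlab_command(
    "Interpolator_v2.m",
    ["/opt/toolboxes/m_map", "/opt/toolboxes/mexcdf", "/opt/toolboxes/nctoolbox"],
    "./Test_cases",
)
# run_matlab(cmd)
```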

CDO - Climate Data Operators (Linux Only)

CDO Setup (Linux Only)

⚠️ CDO is supported only on Linux-based systems.

# Ubuntu/Debian
sudo apt install cdo

# Or use conda
conda install -c conda-forge cdo

🔗 Official CDO Installation Guide
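Before running routines that depend on MATLAB or CDO, you can check that the optional tools are reachable with a quick Python standard-library check. This is a sketch: `matlab` and `cdo` are the usual executable names but may differ on your system, and the helper name is hypothetical.

```python
import platform
import shutil
from typing import Dict, Iterable, Optional


def check_external_tools(tools: Iterable[str] = ("matlab", "cdo")) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    found = {tool: shutil.which(tool) for tool in tools}
    for tool, path in found.items():
        print(f"{tool:>8}: {path if path else 'NOT FOUND on PATH'}")
    # This README documents the CDO-based routines as Linux-only
    if "cdo" in found and platform.system() != "Linux":
        print("note: CDO-based routines are documented as Linux-only")
    return found


check_external_tools()
```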

Helpful Links

Official Documentation

🐍 Python Installation

📦 Anaconda/Miniconda Installation

🪟 WSL for Windows Users

🌐 MATLAB Installation

🌊 CDO (Linux only)

Usage Guide: GenerateReport CLI

The GenerateReport command-line interface (CLI) allows users to generate evaluation reports from observed and simulated Bio-Geo-Hydrological datasets.

Basic Command

GenerateReport [input_folder_or_dict] [OPTIONS]

Positional Argument

usage: GenerateReport [-h] [--output-dir path] [--check] [--no-pdf] [--verbose]
                      [--open-report] [--variable var_name] [--unit unit_str]
                      [--no-banner] [--info] [--version]
                      [input]

Generate a comprehensive evaluation report from observed and simulated Bio-Geo-Hydrological datasets.

positional arguments:
  input                 Path to the input data directory or a dictionary of file paths.
                        You can pass:
                          - a folder containing: obs_spatial, sim_spatial, obs_ts, sim_ts, and mask
                          - or a stringified dictionary (JSON or Python format) mapping these keys:
                            {
                              "obs_spatial": "obs_spatial.nc",
                              "sim_spatial": "sim_spatial.nc",
                              "obs_ts": "obs_timeseries.csv",
                              "sim_ts": "sim_timeseries.csv",
                              "mask": "mask.nc"
                            }

options:
  -h, --help            Show this help message and exit
  --output-dir path     Destination folder for report and plots (default: ./REPORT)
  --check               Validate input files and structure only, no report generation
  --no-pdf              Skip PDF generation, only output plots and dataframes
  --verbose             Enable detailed logging
  --open-report         Automatically open the PDF report if generated
  --variable var_name   Name of the target variable (e.g. "Chlorophyll-a")
  --unit unit_str       Unit of the variable (e.g. "mg/L", "m3/s"), LaTeX-ready
  --no-banner           Suppress ASCII banner (useful for batch jobs)
  --info                Show program description and exit
  --version             Show version and exit
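The positional `input` accepts either a folder or a stringified dictionary in JSON or Python syntax. The sketch below shows one way such an argument could be interpreted. It is hypothetical and is not the CLI's actual code. Because this README does not specify how files are discovered inside a folder, the folder branch only checks that the directory exists.

```python
import ast
import json
from pathlib import Path
from typing import Dict, Union

REQUIRED_KEYS = ("obs_spatial", "sim_spatial", "obs_ts", "sim_ts", "mask")


def parse_input(arg: str) -> Union[Path, Dict[str, Path]]:
    """Interpret the positional argument as a folder or a stringified dict."""
    text = arg.strip()
    if text.startswith("{"):
        try:
            mapping = json.loads(text)
        except json.JSONDecodeError:
            # Fall back to Python-literal syntax, e.g. single-quoted keys
            mapping = ast.literal_eval(text)
        if not isinstance(mapping, dict):
            raise ValueError("input dictionary must be a mapping")
        missing = [key for key in REQUIRED_KEYS if key not in mapping]
        if missing:
            raise ValueError(f"missing keys: {', '.join(missing)}")
        return {key: Path(mapping[key]) for key in REQUIRED_KEYS}
    folder = Path(text)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return folder
```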

Examples

Minimal Run (Interactive)

GenerateReport ./data

With Output Directory & No PDF

GenerateReport ./data --output-dir ./results --no-pdf

Using a JSON-Style Dictionary

GenerateReport "{ \"obs_spatial\": \"obs.nc\", \"sim_spatial\": \"sim.nc\", \"obs_ts\": \"obs.csv\", \"sim_ts\": \"sim.csv\", \"mask\": \"mask.nc\" }"

Quiet Batch Run (No Banner, Auto Open Report)

GenerateReport ./data --no-banner --open-report
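When calling the CLI from a Python script, hand-escaping the dictionary string as in the JSON-style example can be avoided. `json.dumps` builds the string, and passing an argument list to `subprocess.run` skips the shell entirely. This is a sketch with hypothetical helper names and file names. It only uses flags from the help text above.

```python
import json
import subprocess
from typing import Dict, List, Optional


def build_report_command(files: Dict[str, str], output_dir: str,
                         extra_flags: Optional[List[str]] = None) -> List[str]:
    """Assemble GenerateReport arguments; json.dumps produces the dictionary string."""
    return ["GenerateReport", json.dumps(files), "--output-dir", output_dir, *(extra_flags or [])]


def run_report(cmd: List[str]) -> None:
    """Run the CLI without a shell, so no quote escaping is needed."""
    subprocess.run(cmd, check=True)


files = {
    "obs_spatial": "obs.nc",
    "sim_spatial": "sim.nc",
    "obs_ts": "obs.csv",
    "sim_ts": "sim.csv",
    "mask": "mask.nc",
}
cmd = build_report_command(files, "./results", ["--no-pdf", "--no-banner"])
# run_report(cmd)  # requires GenerateReport to be installed and on PATH
```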

For examples of how to use the individual functions in the repository (other than the report-generation ones), and of in-script imports and usage in general, refer to the test cases in the Test_cases/ folder and the accompanying TEST_CASES_README file.

Test Cases and Pytests

This repository includes a suite of example routines and automated tests to ensure the correct functionality of its components. All tests are located in the Test_cases/ directory.

Test Case Scripts

These are step-by-step, verbose scripts that demonstrate how to apply the tools for data cleaning, analysis, and reporting. They are ideal for understanding the intended usage.

Available Test Cases

Data_cleaner_setupper.py
Demonstrates how to clean and prepare datasets for analysis.
Includes the MATLAB script Interpolator_v2.m to perform bilinear interpolation on observed datasets.

SST_data_analyzer.py & CHL_data_analyzer.py
Practical examples of analysis workflows using Sea Surface Temperature and Chlorophyll-a datasets.
These are simplified, didactic illustrations of what the Report_generator submodule automates.

Benthic_layer.py
Focuses on extracting and analyzing bottom σ-layers, emphasizing dense water formation and bio-geochemical tracers near the seabed.
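For readers without MATLAB, the sketch below shows in NumPy what bilinear interpolation onto arbitrary points does. It is not a port of Interpolator_v2.m. It assumes strictly ascending longitude and latitude axes and a `values[lat, lon]` layout, and the function name is hypothetical.

```python
import numpy as np


def bilinear(lon, lat, values, qlon, qlat):
    """Bilinearly interpolate values[lat, lon] at query points (qlon, qlat).

    Assumes strictly ascending lon/lat axes; points outside the grid, or whose
    enclosing cell touches a NaN node, come back as NaN.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)
    qlon = np.atleast_1d(np.asarray(qlon, dtype=float))
    qlat = np.atleast_1d(np.asarray(qlat, dtype=float))
    out = np.full(qlon.shape, np.nan)
    inside = (qlon >= lon[0]) & (qlon <= lon[-1]) & (qlat >= lat[0]) & (qlat <= lat[-1])
    x, y = qlon[inside], qlat[inside]
    # Lower-left node of the enclosing cell (clipped so the last edge is included)
    i = np.clip(np.searchsorted(lon, x, side="right") - 1, 0, lon.size - 2)
    j = np.clip(np.searchsorted(lat, y, side="right") - 1, 0, lat.size - 2)
    # Fractional position inside the cell, in [0, 1]
    tx = (x - lon[i]) / (lon[i + 1] - lon[i])
    ty = (y - lat[j]) / (lat[j + 1] - lat[j])
    out[inside] = (values[j, i] * (1 - tx) * (1 - ty)
                   + values[j, i + 1] * tx * (1 - ty)
                   + values[j + 1, i] * (1 - tx) * ty
                   + values[j + 1, i + 1] * tx * ty)
    return out
```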

Pytests and Code Quality

Automated testing ensures reliability and stability of the modules, using:

pytest

flake8 (for linting and style enforcement)

You can run them via:

pytest
flake8 src/

These tools verify logic correctness, class behavior, and code style compliance.
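Here is a minimal example of the kind of unit test pytest collects. Any `test_*` function in a `test_*.py` file is run, and plain `assert` statements are reported on failure. The file name and the local `nse` stand-in are hypothetical; the repository's real tests live in Test_cases/.

```python
# test_metrics_sketch.py (hypothetical file name)
import numpy as np


def nse(obs, sim):
    """Standard Nash-Sutcliffe efficiency (local stand-in for the metric under test)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def test_perfect_simulation_scores_one():
    obs = np.array([1.0, 2.0, 3.0, 4.0])
    assert np.isclose(nse(obs, obs), 1.0)


def test_mean_prediction_scores_zero():
    # Predicting the observed mean everywhere gives NSE = 0 by construction
    obs = np.array([1.0, 2.0, 3.0, 4.0])
    assert np.isclose(nse(obs, np.full_like(obs, obs.mean())), 0.0)
```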

Code Quality Reports

This project is continuously monitored with external quality and coverage tools:

Codacy

Codebeat

Codecov

Documentation

Bibliography

The Northern Adriatic Forecasting System for Circulation and Biogeochemistry: Implementation and Preliminary Results (Scroccaro I. et al., 2022)

Comparison of different efficiency criteria for hydrological model assessment (Krause P. et al., 2005)

Summary diagrams for coupled hydrodynamic-ecosystem model skill assessment (Jolliff et al., 2008)

The International Thermodynamic Equation of Seawater 2010 (TEOS-10): Calculation and Use of Thermodynamic Properties (McDougall et al., 2010)

Defining a Simplified Yet “Realistic” Equation of State for Seawater (Roquet et al., 2015)

Climatological analysis of the Adriatic Sea thermohaline characteristics (Giorgetti A., 1998)

Evaluation of different Maritime rapid environmental assessment procedures with a focus on acoustic performance (Oddo et al., 2022)

A study of the hydrographic conditions in the Adriatic Sea from numerical modelling and direct observations (2000–2008) (Oddo et al., 2011)
