Changelog


Version: 4.10.4#

Date: 23/06/2025

Summary#

Sphinx documentation now available online, helper script added for test-case dataset download, and minor fixes applied to improve documentation consistency and test stability.

Sphinx Documentation#

A Sphinx-based documentation site has been deployed to consolidate all key project materials in one accessible HTML format.

Included in the site:

  • README.md

  • TEST_CASES_README.md

  • CHANGELOG.md

  • Autogenerated API documentation

Due to Markdown/HTML formatting differences, some files have duplicated versions to avoid rendering issues (e.g., image paths and layout inconsistencies).
Both the GitHub .md and Sphinx .rst versions will be maintained in parallel going forward.

Test Dataset Downloader#

Because the test-case dataset is too large for GitHub’s LFS, it has been hosted on Google Drive.

To automate retrieval:

  • A script named Download_data.py has been added in the Test_cases/ folder.

  • Running it will:

    • Download the zipped dataset

    • Unzip it in-place for immediate use

Best practice: Keep both zipped and unzipped versions locally to ensure uninterrupted access if the external link becomes temporarily unavailable.
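The unzip-in-place step can be sketched with the standard library alone. This is a minimal illustration, not the contents of Download_data.py: the function name unzip_in_place is hypothetical, and the download step is left out because the changelog does not state the link or the tool used to fetch it.

```python
import zipfile
from pathlib import Path


def unzip_in_place(zip_path):
    """Extract an archive into its own folder and leave the .zip untouched."""
    zip_path = Path(zip_path)
    target_dir = zip_path.parent
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    # The archive is not deleted, so a zipped copy stays available locally.
    return target_dir
```

Keeping the archive after extraction matches the best practice above: the zipped and unzipped copies then sit side by side.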

Minor Fixes#

Docstring Refactor#

  • Plotting function docstrings have been reformatted:

    • Removed 2-column style (which broke HTML formatting)

    • Now compliant with Sphinx autodoc parsing

Import Fixes#

  • Relative imports have been fully replaced with absolute imports to prevent breakages during Sphinx auto-build.

Test Errors#

  • Some tests failed unexpectedly; the cause is unclear but is likely linked to the recent layout changes.

  • These issues have been manually addressed and all test suites now pass.

✅ Project Status#

This marks the official completion of the project’s primary development phase.

Final milestone achieved. All deliverables are in place, documented, and operational.


Version: 4.10.3#

Date: 21/06/2025

Summary#

Documentation overhaul and minor hotfixes.

Documentation Updates#

New README.md#

The main README has been fully rewritten to better reflect the scope, usage, and structure of the project. It now includes:

  • Project overview and capabilities

  • Installation instructions (including dependencies like MATLAB and CDO)

  • Example usage and command-line options

  • Extended bibliography and reference list

  • Project health badges from Codecov, Codacy, and Codebeat

docs/TEST_CASES_README.md#

A dedicated markdown file has been added under docs/ to document the test cases:

  • Descriptions of each test case

  • Example output plots

  • Clarification on purpose and expected input/output

A future integration with Sphinx is planned to combine this with the README.md and CHANGELOG.md into a full API-style documentation.


Hotfixes#

plot_spatial_efficiency#

Fixed a layout issue affecting multi-year datasets (7+ years):
The recent refactor to improve handling of short datasets (e.g., 1 year) broke the centering logic for longer timelines. Plot layout is now adaptive and visually consistent across all timescales.


Version: 4.10.2#

Date: 21/06/2025

Summary#

Introduction of new tests

Report_generator tests#

More tests have been created to bring the Report_generator to an acceptable level of coverage.

Version: 4.10.1#

Date: 21/06/2025

Summary#

Minor release focused on general hotfixes and improved cross-platform behavior.

Hotfixes#

PDF Opening in WSL#

The Report_generator’s --open-report functionality has been reworked for better compatibility with WSL environments.
The routine now falls back to xdg-open or os.startfile alternatives, depending on platform detection. This ensures PDFs open correctly across Linux, Windows, and WSL instances when using the CLI flag --open-report.
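The platform detection described here could look roughly like the sketch below. It is an assumption-laden illustration rather than the Report_generator code: detect_platform and opener_for are hypothetical names, and treating WSL like Linux (trying xdg-open) is a guess at the behavior. The only external fact relied on is that WSL kernels include "microsoft" in their release string.

```python
import platform


def detect_platform(system=None, release=None):
    """Classify the runtime as 'windows', 'wsl', 'linux' or 'other'."""
    system = system if system is not None else platform.system()
    release = release if release is not None else platform.release()
    if system == "Windows":
        return "windows"
    if system == "Linux":
        # WSL kernels report "microsoft" (any letter case) in the release.
        return "wsl" if "microsoft" in release.lower() else "linux"
    return "other"


def opener_for(kind):
    """Name the opener mentioned in this entry for a platform kind."""
    if kind == "windows":
        return "os.startfile"
    if kind in ("linux", "wsl"):
        # Assumption: WSL is handled like Linux here.
        return "xdg-open"
    return None
```

Passing system and release explicitly makes the detection testable without running on each platform.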

Taylor Diagram (Single Marker Bug)#

Fixed a rendering bug in the comprehensive_taylor_diagram plotting function:

  • When only a single marker is passed and overlay="on" is set, the native SkillMetrics library failed to place it correctly.

  • Now, a manual fallback mechanism adds the marker without using the looped overlay logic.


Version: 4.10.0#

Date: 21/06/2025

Summary#

Major release introducing a third submodule for automatic PDF report generation via CLI. Also includes CLI enhancements, helper utilities, and hotfixes across plotting and data handling functions.

Report generator CLI#

A command-line interface has been introduced to automatically generate a PDF report for model performance evaluation based on minimal input datasets.

Invoked via the GenerateReport entrypoint, this routine:

  • Accepts input via file paths or dictionaries

  • Runs SST/CHL-like analyses automatically (same ones proposed in the associated test cases)

  • Optionally compiles the results into a PDF (plots + summaries)

  • Always saves individual plots/dataframes, regardless of PDF output

Key CLI flags/options:

  • input: str or dict path(s) to data

  • --output-dir: optional output path

  • --check: validate structure only (no run)

  • --no-pdf: suppress PDF creation

  • --verbose: print run-time messages

  • --open-report: open PDF post-run

  • --variable, --unit: override plot labels

  • --no-banner: suppress ASCII banner

  • --version, --info: metadata display
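For orientation, the flags listed above map naturally onto an argparse parser. The sketch below is not the actual GenerateReport parser: defaults, help strings, the optional positional input, and the use of plain boolean switches for --version and --info are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags listed in this entry."""
    parser = argparse.ArgumentParser(prog="GenerateReport")
    # Optional here so that --version/--info can run without input data.
    parser.add_argument("input", nargs="?", help="path(s) to the input data")
    parser.add_argument("--output-dir", default=None, help="output folder")
    parser.add_argument("--check", action="store_true", help="validate only")
    parser.add_argument("--no-pdf", action="store_true", help="skip the PDF")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--open-report", action="store_true")
    parser.add_argument("--variable", default=None, help="plot label override")
    parser.add_argument("--unit", default=None, help="unit label override")
    parser.add_argument("--no-banner", action="store_true")
    parser.add_argument("--version", action="store_true")
    parser.add_argument("--info", action="store_true")
    return parser


args = build_parser().parse_args(["data/", "--no-pdf", "--variable", "SST"])
print(args.no_pdf, args.variable, args.open_report)
```

argparse converts dashes to underscores, so --no-pdf and --open-report are read back as args.no_pdf and args.open_report.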

Report submodule#

All functions and classes for PDF report composition have been moved into a standalone submodule.
This enables advanced users to programmatically generate custom reports.

  • Includes layout tools, content templates, and internal page managers

  • Covered by 3 new testing suites:

    • Report classes

    • Report functions

    • Full report generation flow

Report helper functions#

To support the new submodule, various utilities were added or improved:

file_io.py

  • find_file_with_keywords: auto-match filenames

  • select_3d_variable: extract usable DataArray from Dataset

time_utils.py

  • is_invalid_time_index: check for broken time series

  • ensure_datetime_index: enforce time indexing

  • prompt_for_datetime_index: prompt user to define one

utils.py

  • convert_dataarrays_in_df: safely convert 3D/2D DataArray to DataFrame

Additional input label normalization ensures all common synonyms (mod, sim, obs, sat, etc.) are interpreted correctly.
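A label normalization of this kind can be sketched as a small lookup. The canonical names "model" and "satellite", the extra synonyms, and the function name normalize_label are assumptions made for the example; the changelog only lists mod, sim, obs, and sat as recognized inputs.

```python
# Hypothetical synonym table; only mod, sim, obs and sat come from the changelog.
SYNONYMS = {
    "model": {"model", "mod", "sim", "simulation"},
    "satellite": {"satellite", "sat", "obs", "observation"},
}


def normalize_label(label):
    """Map a user-supplied label to its canonical name."""
    key = label.strip().lower()
    for canonical, names in SYNONYMS.items():
        if key in names:
            return canonical
    raise ValueError(f"Unrecognized input label: {label!r}")
```

Raising on unknown labels, rather than guessing, keeps a mistyped key from being silently assigned to the wrong dataset.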

Hotfixes#

  • plot_spatial_efficiency:

    • Fixed handling of 1-column layouts

    • Improved colorbar/suptitle rendering

  • compute_fft and plot_spectral:

    • Now skip ZeroDivision instances gracefully

  • CHL/SST test cases:

    • Corrected geolocation error caused by inconsistent ocean masks

Future Work#

Next steps before final release:

  • Fully refactor the README.md with diagrams and working image links

  • Publish the public test-case dataset

  • General hotfixes

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.9.1#

Date: 13/06/2025

Summary#

Test case scripts have been updated to function as proper entrypoints, improving general usability.

Addition of flake8 linting.

New Entrypoints#

One of the final steps toward full project modularization has been completed:
The test case scripts — which illustrate example usages — have been refactored into callable entrypoints.

The updated names are as follows:

  • sst-analyze to call SST_data_analyzer.py

  • chl-analyze to call CHL_data_analyzer.py

  • bfm-analyze to call Benthic_Layer.py

  • data-setupper to call Data_reader_setupper.py

The installation of these new entrypoints is handled by the setup.py script and the pyproject.toml file.

Key changes:

  • Each script is now wrapped in a main() function, called explicitly at runtime

  • File paths are now relative to __file__ rather than the working directory (cwd), ensuring portability and reducing errors

  • Additional verbosity has been added to clarify script behavior and improve interpretability for users and contributors
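The first two changes can be illustrated with a short sketch. It assumes a hypothetical "data" folder next to the script; the helper name script_dir and the way main receives the script location are illustrative choices, not the structure of the actual test-case scripts.

```python
from pathlib import Path


def script_dir(script_file):
    """Return the folder that contains the given script file."""
    return Path(script_file).resolve().parent


def main(script_file):
    """Resolve inputs relative to the script, not the working directory."""
    # Hypothetical layout: a "data" folder sitting next to the script.
    data_dir = script_dir(script_file) / "data"
    print(f"Reading inputs from {data_dir}")
    return data_dir


# In a real script the entry point would end with: main(__file__)
```

Because the path is anchored to the script location, the command behaves the same whether it is launched from the repository root or from any other folder.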

Flake8#

To further ensure correct code syntax, a flake8 linting step has been added to the ci.yml file.

Future Work#

Next steps before final release:

  • Integrate argparse and build a __main__.py controller for the two primary modules

  • Fully refactor the README.md with diagrams and working image links

  • Publish the public test-case dataset

  • Final pass of module and import cleanup

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.9.0#

Date: 11/06/2025

Summary#

Major update introducing climate data analysis utilities, .json save support for model data, and enhanced control over line plots. Minor cleanup and test alignment also included.

Climate Data Analysis#

To broaden the analytical capabilities of the project, new statistical tools have been added to the stats_math_utils module, primarily aimed at climate and long-term signal analysis. Each function is equipped with Timer logging and proper test coverage.

Newly added functions include:

  • detrend_linear, detrend_poly_dim — for lightweight linear and polynomial detrending

  • monthly_anomaly, yearly_anomaly, detrended_monthly_anomaly — anomaly isolators for different timescales

  • np_covariance, np_correlation, np_regression — NumPy-based correlation and regression metrics

  • extract_multidecadal_peak, extract_multidecadal_peaks_from_spectra — signal diagnostics from spectral power densities

  • identify_extreme_events — simple threshold-based extreme event detection

.json Model Save Support#

The save_model_data function now supports .json output alongside already existing file types.

Updated plot_line Logic#

To address incorrect NaN handling by seaborn, the plot_line function now supports two rendering engines:

  • matplotlib (default) — maintains breaks in line plots for missing data

  • seaborn — can still be manually selected by setting library='sns', but note it masks NaN values incorrectly in time series

This change ensures that all timeseries plots show missing data clearly and behave as expected in scientific use.

Minor Cleanup#

  • Unified dataset and mask imports in SST_data_analyser and CHL_data_analyser test scripts

  • All imports are now grouped at the top for clarity and to avoid repeated disk reads

Future Work#

With the exception of critical bugs or fixes, this is the final major feature release before introducing CLI tools and project packaging. Final steps include:

  • argparse integration and __main__.py setup

  • Setup of script entrypoints (test cases + main)

  • README.md refactor with updated figures and proper image paths

  • Final publication of the public test-case dataset

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.8#

Date: 11/06/2025

Summary#

This patch delivers hotfixes to the dense_water_timeseries function and its associated tests, along with proactive adjustments for upcoming matplotlib changes regarding colormap handling.

dense_water_timeseries Enhancements#

The dense_water_timeseries function has been enhanced with two key additions:

  • A savefig flag to enable conditional plot saving.

  • An output_path parameter to define where plots are saved.

These improvements support streamlined figure generation and automation. The associated test suite has been updated accordingly to validate the new behavior.

Colormap Futureproofing#

In preparation for the removal of the deprecated cm.get_cmap function, announced for matplotlib version 3.11, colormap access has been refactored from

cmap = cm.get_cmap("plasma")

to

cmap = plt.colormaps["plasma"]

This change ensures compatibility with upcoming versions and affects:

  • bfm_plots.py

  • formatting.py

Known Limitations#

A few deprecation warnings remain when running the full test suite. These are not critical and are due to:

  • Legacy test data usage that will be cleaned up during the final documentation/testing phase.

  • Upcoming matplotlib changes with insufficient current documentation.

These warnings will be addressed once clearer upstream documentation is available and regression testing is complete.

Future Work#

  • Extend logging and timing to remaining modules (stats_math_utils, Data_saver)

  • Introduce climate data analysis tools.

  • Finalize README.md overhaul and fix broken image paths

  • Publish a clean, public test-case dataset for reproducibility

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.7#

Date: 10/06/2025

Summary#

Introduced logging and performance timing utilities to monitor function usage and track computational costs across the codebase.

Logging and Timing Utilities#

A new utility class, Timer, has been added to the time_utils.py submodule. This tool is designed to be integrated with all core functions to monitor execution duration and improve traceability.

Logging is now performed via two channels:

  • app.log: A traditional log file with human-readable messages and timestamps.

  • eliot.log: A structured .json log formatted for use with the eliot-tree tool, enabling users to visualize computation flow in a hierarchical tree structure.

Nearly all functions in the Processing submodule have been wrapped with the Timer, with the exceptions of:

  • stats_math_utils

  • Data_saver

These will be updated in upcoming patches after:

  • Logging is fully integrated into the remaining computational submodules.

  • Refactors to handle deprecations and new features (e.g., JSON export for model data in Data_saver and extended analysis in stats_math_utils) are completed.

    The Timer and logging mechanisms are implemented without decorators, and instead integrated via manual indentation within while loops or inline blocks.
    While decorators would have led to cleaner function definitions, they do not support fine-grained logging of internal steps. This trade-off prioritizes thorough logging over code brevity.
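One way to time individual steps inside a function without decorators is a context manager, sketched below. This is an assumption about the general approach, not the Timer class from time_utils.py: the class name SketchTimer, the logger name, and the message format are hypothetical, and the eliot.log channel is not covered.

```python
import logging
import time

logger = logging.getLogger("timing_sketch")


class SketchTimer:
    """Context manager that logs how long the enclosed block took."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        logger.info("%s finished in %.4f s", self.label, self.elapsed)
        return False  # never swallow exceptions raised inside the block


def mean_of(values):
    """Example function with one timed internal step."""
    with SketchTimer("mean_of: averaging") as timer:
        result = sum(values) / len(values)
    return result, timer.elapsed
```

Each timed step costs one extra indentation level, which is the price of step-level log entries compared with a single decorator on the whole function.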

Future Work#

  • Climate data analysis: Finalize the new analytical functions.

  • Documentation overhaul: Complete the README.md rework and finalize all documentation elements.

  • Public test-case datasets: Ensure reproducibility by uploading a curated set of test data.

  • Extend logging utilities: Finalize integration of the logging/timing system across all remaining submodules.

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.6#

Date: 09/06/2025

Summary#

This minor patch finalizes the in-script documentation, introduces fixes to input validation logic, and updates related test scripts accordingly. These refinements ensure greater robustness and compatibility of the testing infrastructure.

Documentation Expansion#

The in-script documentation for Plots.py and bfm_plot.py has been completed, marking the conclusion of this phase. All currently implemented modules and functions now include:

  • Descriptive docstrings

  • Clear inline comments

  • Logical code block headers

Test Case Hotfixes#

Test case scripts, particularly those related to the CHL and SST data analyses, were failing due to stricter input validation introduced in earlier patches. These issues have now been addressed, and test executions pass as expected.

Input Validation & Logic Hotfixes#

The following corrections were made to resolve issues introduced by overly strict or incorrect validation logic:

  • compute_density_bottom

    • Adjusted threshold for Bmost validity from >=1 to >=0 to support surface-level datasets.

    • Corrected input validation: previously assumed input was a list during yearly iteration, now correctly handles input as a nested Dict[Year][Month].

  • calc_density

    • Removed incorrect check requiring the first dimension of temperature/salinity to match that of depths. This mismatch is expected in many real-world cases.

  • compute_dense_water_volume

    • Removed boolean-only check on mask3d, as masks may also contain numeric information (e.g., depth).

Associated tests were updated or removed accordingly, resulting in a negligible change to overall test coverage, now reaching 95.19%.
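As an illustration of the corrected compute_density_bottom check, a nested year-to-month mapping can be validated as sketched below. The function name and error messages are hypothetical, and the types of the keys and stored values are deliberately not checked because the changelog does not specify them.

```python
def validate_year_month_mapping(data):
    """Check that data is shaped like a nested Dict[Year][Month] mapping."""
    if not isinstance(data, dict):
        raise TypeError("Expected a dict keyed by year.")
    for year, months in data.items():
        if not isinstance(months, dict):
            raise TypeError(f"Entry for year {year!r} must be a dict keyed by month.")
    return True
```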

Future Work#

  • New feature set for climate data analysis.

  • Documentation overhaul including a repaired and updated README.md

  • Public release of full test-case datasets to ensure reproducibility

  • Logging and timing utilities to be added for performance profiling

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.5#

Date: 08/06/2025

Summary#

This minor patch continues test coverage expansion with two new suites focused on the plotting modules. Input validation has also been improved, particularly for output-related parameters.

Test Coverage#

Two new test suites have been added for the Plots.py and bfm_plots.py submodules, further enhancing overall test coverage.
With these additions, the project’s coverage has now reached 95.23%.

Note: due to the structure of the plotting functions and the extent of the test data used, the new test suites take a while to complete.

Hotfixes & Enhancements#

Input Validation Improvements#

  • Improved input validation for output_path, variable_label, and unit_label in plotting functions.

  • Replaced direct attribute access with getattr(options, 'output_path', None) to safely check for the presence of output_path and provide clearer error messages when missing.

  • Similar validation was added for variable_label and unit_label to avoid AttributeError when default values were not properly handled, especially when generating LaTeX-formatted labels.

  • Added clauses to skip empty data in the dense_water_timeseries function to avoid unnecessary plotting.
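The getattr pattern mentioned above can be sketched as follows. The options object is stood in for by types.SimpleNamespace, and the function name and error message are illustrative rather than taken from the plotting modules.

```python
from types import SimpleNamespace


def resolve_output_path(options):
    """Return options.output_path or fail with an explicit message."""
    # getattr with a default avoids AttributeError when the field is absent.
    output_path = getattr(options, "output_path", None)
    if output_path is None:
        raise ValueError("output_path must be specified in the plotting options.")
    return output_path


print(resolve_output_path(SimpleNamespace(output_path="figures/")))
```

A missing attribute and an attribute explicitly set to None are treated the same way, so callers get one clear error in both cases.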

Plot Behavior Consistency#

  • Reverted the interval used in plt.pause() to a hardcoded value to ensure consistent behavior during both test execution and interactive use.

    This pause allows the user to confirm that plots are correctly generated before they are saved. However, for full plot inspection, users should refer to the saved image files in the specified output_path.

Future Work#

  • Introduce new feature set for climate data analysis.

  • Complete a full documentation overhaul and repair the broken README.md

  • Upload the full test-case dataset to support reproducibility and public testing

  • Begin adding logging and timing utilities to evaluate and profile performance

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.4#

Date: 07/06/2025

Summary#

Extended test coverage to the Taylor and Target plotting functions.

Testing expansion#

New test units have been created for the Taylor and Target plotting functions, moving the project closer to full coverage.

Current coverage for these test suites is greater than 90%.

Current overall testing coverage is

Future Work#

  • Finalize tests for all remaining plotting scripts (general plotting functions and bfm specific plots)

  • Introduce new feature set for climate data analysis.

  • Complete a full documentation overhaul and repair the broken README.md

  • Upload the full test-case dataset to support reproducibility and public testing

  • Begin adding logging and timing utilities to evaluate and profile performance

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.3#

Date: 07/06/2025

Summary#

This minor patch expands test coverage, removes legacy files, and introduces a new utility for numeric data validation.

Test Coverage#

All scripts in the Processing submodule have now reached >90% test coverage.

Contrary to earlier statements, new test modules have also been added for both model and satellite data reading functions. While these are still tailored to specific test-case datasets, their inclusion improves confidence and completeness in the test suite.

Deprecation of Legacy Files#

As part of a general repository cleanup, the Costants.py file has been removed. All constants previously defined in this module are now obsolete and unused by the current version of the codebase.

New Utility: Numeric Data Checker#

A new utility function, check_numeric_data, has been added to the utils.py submodule.
This function checks whether inputs are valid numeric types (e.g., integers, floats, NumPy arrays), which is essential for computational routines that assume numeric-only input.

Users are encouraged to validate their data with this function when unsure before using processing functions that rely on numeric inputs.
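A numeric check of this kind could be sketched as below. This is not the check_numeric_data implementation: the function name, the rejection of booleans, and the duck-typed handling of array-like objects through tolist() (a method NumPy arrays provide) are assumptions.

```python
from numbers import Real


def is_numeric_sketch(value):
    """Return True if value is a real number or a container of real numbers."""
    # bool is a subclass of int; it is rejected here as a design choice.
    if isinstance(value, bool):
        return False
    if isinstance(value, Real):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_numeric_sketch(item) for item in value)
    # Array-like objects (for example NumPy arrays) expose tolist().
    if hasattr(value, "tolist"):
        return is_numeric_sketch(value.tolist())
    return False
```

Note that an empty list passes this check, so callers that need at least one value should test for that separately.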

Future Work#

  • Finalize tests for all remaining plotting scripts

  • Introduce new feature set for climate data analysis.

  • Complete a full documentation overhaul and repair the broken README.md

  • Upload the full test-case dataset to support reproducibility and public testing

  • Begin adding logging and timing utilities to evaluate and profile performance

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.2#

Date: 06/06/2025

Summary#

This minor patch establishes groundwork for connecting the repository to multiple third-party tools aimed at enhancing documentation, code quality, and testing infrastructure. It also includes general hotfixes to both test suites and associated functions.

Repository Enhancements#

Building on the previously implemented Continuous Integration with Codecov, this update initializes support for several additional tools:

  • Sphinx and ReadTheDocs
    These tools have been integrated to generate an HTML-based project documentation site. The initial build includes:

    • README contents

    • CHANGELOG

    • Requirements

    • Test case descriptions

    As part of this integration, the CHANGELOG.md file has been relocated to the docs/ folder, along with a copy of the requirements.txt and all example images previously used in the README.md.
    This causes the current root README.md to appear broken. This will be resolved during the final documentation overhaul near project completion.

  • Codacy and Codebeat
    These platforms have been linked to provide automated, objective feedback on code quality, including complexity metrics and maintainability suggestions.

The setup.py has also been expanded to encompass more general information.

Test Coverage Update#

Codecov-reported test coverage has increased to 59.91%. While the global improvement is modest, several key modules now exceed 90% coverage, including:

  • Efficiency_metrics

  • Data_alignment

  • Data_saver

  • file_io

  • BFM_data_reader

  • time_utils

  • Taylor_computations

  • Missing_data

  • stats_math_utils

  • utils

These improvements are primarily due to expanded testing targeting input validation and edge cases.

General Enhancements#

All Processing modules that are currently tested have been updated with:

  • Enhanced docstrings and inline comments

  • Stricter input validation

  • Clearer error messages for robustness

Hotfixes#

  • Removed a partial duplication of the convert_to_serializable function

  • Updated relevant test cases to align with the revised logic

  • In test_efficiency, redundant test dictionaries were moved into reusable setup functions to reduce complexity and improve readability

Future Works#

  • Patch and finalize all remaining untested or partially tested scripts

  • Introduce a new feature set for climate data analysis.

  • Complete documentation overhaul and fix the broken README.md

  • Upload the public test-case dataset to support reproducibility

Planned project deadline: June 22, 2025 (subject to change)


Version: 4.8.1#

Date: 06/06/2025

Summary#

This minor patch finalizes the in-script documentation and completes input validation integration for all Processing submodules that have already undergone testing. These improvements enhance code clarity, safety, and maintainability.

Hotfixes#

Updated the following test scripts to accommodate recent input validation changes:

  • test_target_computations

  • test_taylor_computations

  • test_time_utils

  • test_utils


Version: 4.8.0#

Date: 06/06/2025

Summary#

This release introduces continuous integration (CI) with Codecov-linked testing, expands installation tools, and includes multiple hotfixes across test cases to ensure compatibility and consistency.

Codecov Integration#

To improve test quality and tracking, the Codecov tool has been integrated with the GitHub repository, offering automated test coverage analysis. The current test coverage is approximately 57%, primarily limited by missing tests for data readers and plotting routines.

Target coverage: 85–90% across all tested modules.

Current coverage highlights:

  • formatting.py: 71%

  • BFM_data_reader.py: 80%

  • Data_saver.py: 76%

  • Density.py: 63%

  • file_io.py: 82%


Installation Tools#

Multiple files have been added or enhanced to support proper installation of the package:

  • requirements.txt

  • pyproject.toml

  • setup.py

  • MANIFEST.in

Entrypoints have been initialized, with plans to enable:

  • Running test case scripts as CLI entry points.

  • Executing main processing/plotting functions through command-line interfaces in future releases.

Hotfixes & Documentation#

  • Expanded docstrings, inline comments, and input validation across modules.

  • Fixed issues in tests involving matlab.engine to align with CI and ensure accurate coverage tracking under Codecov.

Future Work#

  • Finalize in-script documentation across all modules.

  • Extend test suite to improve test coverage to >85%, especially in:

    • Plotting functions

    • Satellite/model data readers

    • Remaining utility modules


Version: 4.7.1#

Date: 05/06/2025

Summary#

This patch introduces several hotfixes along with expanded documentation and enhanced input validation for the Density.py module.

Enhancements#

  • Expanded inline comments, docstrings, and added input validation in Density.py for improved readability and reliability.

Hotfixes#

  • Adjusted dummy_Bmost in tests to use 0-based indexing instead of 1-based.

  • Updated test_get_common_series_by_year_month_invalid_structure to raise a TypeError (a formal fix to reflect the expected behavior).

  • Removed the 12-month validation check in data_alignment.py functions to allow compatibility with smaller datasets.


Version: 4.7.0#

Date: 05/06/2025

Summary#

The Data_saver.py script has been updated to improve clarity, robustness, and flexibility for users.

Enhancements#

  • Expanded documentation with improved docstrings, inline comments, and input validation to ensure safer use.

  • Two new utility functions added to support saving data in .json format, broadening the script’s export capabilities:

    • convert_to_serializable: Converts standard Python objects (including NumPy arrays and datetime types) into JSON-conformant formats.

    • save_variable_to_json: Handles the actual save operation to a .json file.
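The division of labor between the two helpers can be sketched as follows. The code is an illustration under assumptions, not the Data_saver.py source: the function names to_serializable and save_to_json are hypothetical, datetimes are written as ISO 8601 strings, and NumPy objects are handled only through their tolist() method so the sketch runs without NumPy installed.

```python
import json
from datetime import date, datetime


def to_serializable(obj):
    """Recursively convert obj into JSON-compatible Python types."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(key): to_serializable(val) for key, val in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_serializable(item) for item in obj]
    if hasattr(obj, "tolist"):
        # Covers array-like objects such as NumPy arrays and scalars.
        return to_serializable(obj.tolist())
    return obj


def save_to_json(obj, path):
    """Convert obj and write it to a .json file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(to_serializable(obj), handle, indent=2)
```

Separating conversion from writing lets the conversion be tested on in-memory objects without touching the disk.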


Version: 4.6.3#

Date: 05/06/2025

Summary#

The data_alignment.py script has been updated with more extensive documentation and input validation enhancements.

Enhancements#

  • Expanded and clarified all docstrings to better describe function behavior and expected inputs.

  • Added comprehensive inline comments to improve code readability and maintainability.

  • Implemented general input validation across all functions to ensure proper usage and clearer error reporting.


Version: 4.6.2#

Date: 05/06/2025

Summary#

The BFM_data_reader.py submodule, along with its corresponding pytest, has been updated, cleaned, and enhanced with improved documentation and validation.

Documentation Expansion#

  • Performed a comprehensive rework of BFM_data_reader.py, including:

    • Expanded and clarified docstrings.

    • Improved inline comments for better maintainability and readability.

    • Extensive input validation checks to ensure robustness and user feedback.

  • Updated test_bfm_data_reader.py to align with the new input validation logic and avoid testing conflicts.


Version: 4.6.1#

Date: 05/06/2025

Summary#

This hotfix keeps the tests aligned with recent functional changes and improves robustness through better input validation.

Hotfixes#

  • Updated the following test modules to reflect recent changes:

    • test_efficiency_metrics.py

    • test_stats_math_utils.py

    • test_utils.py

  • Minor fixes were applied to associated functions to:

    • Ensure consistent input validation

    • Improve edge case handling

    • Maintain compatibility with expanded testing coverage


Version: 4.6.0#

Date: 04/06/2025

Summary#

This update introduces a new section focused on temporal and spectral analysis of error components (mean bias, unbiased RMSE, standard deviation error, and cross-correlation). Additionally, it fixes a critical issue in the standard_deviation_error computation.

Temporal Analysis#

The new functionality computes the time evolution of key error metrics using 2D daily mean datasets. These time series are then correlated with cloud coverage percentages to explore how cloud cover affects model performance. This analysis is handled via:

  • compute_error_timeseries

  • compute_stats_single_time

Both are located in the Efficiency_metrics.py submodule and are supported by statistical tools from stats_math_utils.py.

New plotting utilities have been introduced to visualize these insights.

Spectral Analysis#

Two types of spectral analysis have been introduced:

  • Power Spectral Density (PSD)

  • Cross Spectral Density (CSD)

These help identify dominant temporal frequencies and assess relationships between error signals and cloud cover.

All related computations are located in stats_math_utils.py.

Hotfix#

  • Fixed incorrect computation logic in the standard_deviation_error function, which previously returned incorrect values.

Future Works#

  • Add test coverage for new functions introduced in this update.

  • Continue expanding and refining in-script documentation for improved clarity.

  • Rework and expand the README.md for clearer user guidance.

  • Complete the packaging setup:

    • AUTHORS

    • MANIFEST.in

    • pyproject.toml

    • requirements.txt

    • Optional environment.yml


Version: 4.5.0#

Date: 04/06/2025

Summary#

This update introduces a complete pytest suite to validate the project’s functionality and ensure stability. It also includes several minor hotfixes to align behavior with intended design and pass the newly added tests.

Pytest implementation#

A new Pytests folder has been added under the Test cases directory. This folder contains all the pytest scripts created to test nearly all implemented functions across the project. Each test includes a concise comment explaining its purpose and additional inline documentation clarifying the logic and expected results.

NOTE: Pytests were not added for the model/satellite data loading modules and the plotting functions due to the complexity of mocking paths and structured datasets. These components are already validated through the main test-case scripts (SST, CHL, Benthic). However, helper functions used within those modules are fully covered — e.g., plotting helpers are tested in test_formatting, and file-related logic is covered in test_file_io.

Hotfixes#

Several functions were hotfixed to:

  • Restore correct behavior where recent changes introduced inconsistencies.

  • Ensure compatibility and accuracy in alignment with test coverage.

  • Standardize outputs and edge-case handling across the board.

Future Works#

  • Expand and refine in-script documentation for improved clarity and maintainability.

  • Rework and expand the README.md to better guide users through installation and usage.

  • Implement remaining files required for complete user installation: AUTHORS, MANIFEST.in, pyproject.toml, requirements.txt, and optional environment.yml.

  • Introduce a more detailed error decomposition in the temporal analysis (timeseries).

  • Begin exploration of frequency-domain error analysis using Fast Fourier Transform (FFT).


Version: 4.4.0#

Date: 02/06/2025

Summary#

This minor release introduces cloud coverage analysis and improves the spatial efficiency plotting system. It also formalizes the resampling process into a reusable function for streamlined time-based analysis.

New Feature: Cloud Coverage Timeseries#

  • The percentage of cloud cover over the basin is now plotted alongside the temporal bias in a dedicated timeseries figure.

  • The plotting function has been updated to split the output into two plots:

    • Plot 1: Standard timeseries metrics (e.g., observed vs. modeled values).

    • Plot 2: BIAS (moved from Plot 1) and the new cloud coverage percentage, visualized using:

      • Raw data,

      • 7-day running mean,

      • 30-day running mean.

  • Pearson correlation coefficients between the BIAS and the cloud coverage are computed and printed for each version (raw, 7-day running mean, 30-day running mean).
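
The smoothing and correlation steps can be sketched with pandas as follows. The data are synthetic, and the centred window and `min_periods=1` are assumptions chosen for the example; the project's own settings may differ.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the real BIAS and cloud-cover data.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=365, freq="D")
cloud = pd.Series(rng.uniform(0, 100, dates.size), index=dates)
bias = 0.01 * cloud + pd.Series(rng.normal(0, 0.2, dates.size), index=dates)


def smoothed_versions(series: pd.Series) -> dict:
    """Return the raw series plus centred 7-day and 30-day running means."""
    return {
        "raw": series,
        "7-day": series.rolling(7, center=True, min_periods=1).mean(),
        "30-day": series.rolling(30, center=True, min_periods=1).mean(),
    }


bias_v = smoothed_versions(bias)
cloud_v = smoothed_versions(cloud)

# Series.corr uses the Pearson coefficient by default.
correlations = {name: bias_v[name].corr(cloud_v[name]) for name in bias_v}
for name, r in correlations.items():
    print(f"{name:>6}: r = {r:.3f}")
```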

Resampling Function#

  • The previously inline resampling logic has been extracted into a standalone function.

  • This function is now part of the time_utils.py submodule and supports consistent and efficient time aggregation for monthly and yearly analyses.
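
A minimal sketch of such a reusable resampling helper is shown below, using pandas. The function name, argument names, and the choice of the mean as aggregator are hypothetical; the real function in time_utils.py may have a different interface.

```python
import pandas as pd


def resample_mean(series: pd.Series, period: str) -> pd.Series:
    """Aggregate a time-indexed series to monthly or yearly means."""
    rules = {"monthly": "MS", "yearly": "YS"}  # labels at the start of each period
    if period not in rules:
        raise ValueError(f"period must be one of {sorted(rules)}, got {period!r}")
    if not isinstance(series.index, pd.DatetimeIndex):
        raise TypeError("series must have a DatetimeIndex")
    return series.resample(rules[period]).mean()


# Usage: one leap year of daily values.
daily = pd.Series(
    range(366), index=pd.date_range("2000-01-01", periods=366, freq="D"), dtype=float
)
monthly = resample_mean(daily, "monthly")
yearly = resample_mean(daily, "yearly")
```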

Spatial Efficiency Plot Enhancements#

  • Subplot label alignment and positioning have been adjusted to improve readability and reduce clutter.

  • All previously hardcoded configuration values (e.g., vmin/vmax, colormap, layout) have been moved into a default options file for easier maintenance and customization.

  • Colorbar unit labels now support LaTeX-style formatting, improving clarity and visual consistency across plots.

Future Work#

Unless further expansion of the project is required, future patches will focus on improving the documentation by adding extensive comments to the existing functions.


Version: 4.3.2#

Date: 02/06/2025

Summary#

This patch extends the spatial performance evaluation capabilities to include yearly average datasets, alongside enhancements to plotting, code robustness, and minor hotfixes.

Yearly Spatial Performance#

The compute_spatial_efficiency function has been adapted to support yearly performance evaluation, in addition to the existing monthly metrics.

  • The SST_data_analyser.py and CHL_data_analyser.py scripts now include examples for yearly spatial analysis.

  • Supporting functions in stats_math_utils.py were modified to accept flexible time input handling via a new argument.

  • Result dictionaries have been expanded to structure yearly results for future use and reference.

While the data is stored, plotting remains the primary method of interpreting 2D spatial performance outputs.

Enhanced Plotting#

The plotting function has been upgraded to dynamically adapt to the number of metrics:

  • The figure layout now prefers a square-like shape (2×N for ≤4 plots, 3×N for >4 plots).

  • Rows with fewer plots than the max are center-aligned for improved visual balance.

  • Added fail-safes in filename generation to replace illegal path characters (e.g., / in metric titles).

  • Improved overall layout and clarity, with better visual spacing and text placement.

Hotfixes#

  • Removed incorrect (°C) label from CHL_data_analyser.py, left over from SST-related code.

  • Resolved a bug in plt.savefig() where slashes (/) in titles were misinterpreted as folder paths, breaking the save process.
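
The kind of fail-safe described here can be sketched as a small sanitising step applied to the title before it is used as a file name. The helper name and the exact set of replaced characters are assumptions for illustration.

```python
import re


def safe_filename(title: str, replacement: str = "_") -> str:
    """Replace characters that are illegal or ambiguous in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title).strip()
    # Fall back to a generic name if nothing usable is left.
    return cleaned or "untitled"


# A slash in a unit no longer creates an unintended sub-folder.
filename = safe_filename("Chl (mg/m3)") + ".png"
```

The resulting string can then be joined to the output directory and passed to `plt.savefig()`.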

Future Work#

Planned improvements for the next patch include:

  • Externalizing all plot_spatial_efficiency configuration options to a dedicated default file.

  • Delegating unit label formatting to the format_unit() function with full LaTeX support.

  • Refactoring the monthly resampling code block into a standalone reusable function.

  • Introducing cloud coverage timeseries plots.

  • Separating the timeseries BIAS plot from the current gridspec (gs) layout to align it with cloud coverage analysis.


Version: 4.3.1#

Date: 01/06/2025

Summary#

Hotfix release addressing issues in value range handling for spatial metrics plots, along with improvements to the visualization layout and inline documentation for test scripts.

Plot Enhancements#

  • Subplot layout adjusted to 3 columns × 4 rows (previously 4×3) for improved readability.

  • Added coordinate labels and grid lines to maps; improved geolocation accuracy.

  • Colorbars repositioned to the bottom of the page for consistency across metrics.

  • Introduced a custom colorbar for the cross-correlation metric.

  • Enhanced plot text rendering for better clarity.

  • Added support for saving plots to a specified output directory.

Hotfixes#

  • Resolved an issue where vmin and vmax values were not correctly propagated to the plotting function, resulting in inconsistent value ranges across plots.

Test Case Updates#

  • Added explanatory comments to the SST_data_analyser.py and CHL_data_analyser.py scripts to improve readability and ease of use.

NOTE – Commit 797b3ab:
An earlier commit message incorrectly stated the removal of an import. In reality, the update introduced a higher-resolution coastline and adjusted the land mask facecolor in map plots. Future commit messages will be reviewed more carefully to avoid misreporting changes.


Version: 4.3.0#

Date: 01/06/2025

Summary#

This version introduces a new section of the project focused on spatial performance analysis for the Sea Surface Temperature (SST) and Chlorophyll (CHL) fields. The SST_data_analyser.py and CHL_data_analyser.py test scripts have been expanded to demonstrate the new workflows.

Spatial Performance Analysis#

As a natural evolution of the project, spatial performance evaluation has been added to complement the existing temporal analysis. This feature is specifically designed for physical parameters retrieved from NEMO simulations, namely SST and CHL.

Dataset Structure#

Each metric is designed to support two types of datasets:

  • Monthly Composite Averages (e.g., all Januaries, all Februaries, etc.)

  • Yearly Averages (e.g., 2000, 2001, etc.)

These datasets are created by resampling the outputs, already interpolated to regular daily values, into regular monthly values, which are then used to build the two dataset types listed above.
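
The chain from daily values to the two dataset types can be sketched with pandas on a synthetic series. This is only an illustration of the grouping logic; the project works on gridded fields, and averaging monthly means to obtain yearly values (which weights every month equally) is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Three years of synthetic daily values.
dates = pd.date_range("2000-01-01", "2002-12-31", freq="D")
daily = pd.Series(np.arange(dates.size, dtype=float), index=dates)

# Step 1: regular monthly values.
monthly = daily.resample("MS").mean()

# Step 2a: monthly composites (all Januaries, all Februaries, ...).
monthly_composite = monthly.groupby(monthly.index.month).mean()

# Step 2b: yearly averages (2000, 2001, 2002).
yearly = monthly.groupby(monthly.index.year).mean()
```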

Resampling Methods#

Three approaches (two primary) are supported for generating the monthly datasets:

  1. Xarray + ThreadPoolExecutor
    Uses xarray.resample with chunking and multithreading for in-memory operations.

    Currently a standalone code block in test scripts—planned for functional integration in future updates.

  2. CDO-Based Resampling
    If installed, Climate Data Operators (CDO) can be used via direct command-line calls from the scripts.

    Detailed documentation for this option will be included in the upcoming 5.0 README update.
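
For the CDO route, a command-line call from Python could look like the sketch below. `monmean` is the CDO operator for monthly means; the function names and file names are placeholders, and the scripts in this project may invoke CDO differently.

```python
import shutil
import subprocess
from pathlib import Path


def build_cdo_monmean(infile, outfile) -> list:
    """Build the argument list for `cdo monmean <infile> <outfile>`."""
    return ["cdo", "monmean", str(Path(infile)), str(Path(outfile))]


def run_cdo_monmean(infile, outfile) -> None:
    """Run the monthly-mean operator, failing early if CDO is not installed."""
    if shutil.which("cdo") is None:
        raise RuntimeError("CDO executable not found on PATH")
    # check=True turns a non-zero CDO exit status into CalledProcessError.
    subprocess.run(build_cdo_monmean(infile, outfile), check=True)


command = build_cdo_monmean("daily.nc", "monthly.nc")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.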

NOTE:
Monthly datasets are required for this new spatial analysis section.
To ensure full usability of the test case scripts without requiring a CDO installation, a precomputed monthly dataset is included in the test case data/ folder. Without a monthly dataset, the spatial analysis will not run as intended.

Efficiency Metrics#

Five metrics are available for spatial performance evaluation:

  • Mean Bias

  • Standard Deviation Error

  • Raw Standard Deviation

  • Cross-Correlation

  • Unbiased RMSE

These can be visualized in 3×4 multi-panel plots.
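
The changelog does not give the formulas behind these metrics, so the sketch below uses common textbook definitions computed per pixel along the time axis. The function name, the dictionary keys, and the reading of "Standard Deviation Error" as the difference between model and observation standard deviations are all assumptions.

```python
import numpy as np


def spatial_metrics(model: np.ndarray, obs: np.ndarray) -> dict:
    """Per-pixel metrics along axis 0 for arrays shaped (time, y, x)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if model.shape != obs.shape or model.ndim != 3:
        raise ValueError("model and obs must share a (time, y, x) shape")

    # Keep only time steps where both fields are valid.
    valid = np.isfinite(model) & np.isfinite(obs)
    m = np.where(valid, model, np.nan)
    o = np.where(valid, obs, np.nan)

    m_mean, o_mean = np.nanmean(m, axis=0), np.nanmean(o, axis=0)
    m_std, o_std = np.nanstd(m, axis=0), np.nanstd(o, axis=0)
    m_anom, o_anom = m - m_mean, o - o_mean

    # Pixels with zero variance yield NaN correlation instead of an error.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nanmean(m_anom * o_anom, axis=0) / (m_std * o_std)

    return {
        "mean_bias": m_mean - o_mean,
        "std_error": m_std - o_std,
        "std_model": m_std,
        "std_obs": o_std,
        "cross_correlation": corr,
        "unbiased_rmse": np.sqrt(np.nanmean((m_anom - o_anom) ** 2, axis=0)),
    }
```

Each returned array has shape (y, x), so it can be drawn directly as one map panel.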

Future Work#

  • Integration of yearly spatial performance evaluation

  • Enhanced plotting: improved colormaps and presentation

  • Rework of cloud coverage timeseries and timeseries() function

  • Final cleanup pass for docstrings and comments

  • CLI support with argparse, structured logging, and function-level testing


Version: 4.2.9#

Date: 30/05/2025

Summary#

This patch addresses a series of critical issues introduced in previous updates. All hotfixes have been applied to restore functionality.

Hotfixes#

  • Benthic_physical_plots:
    Fixed mismatch in the number of values extracted from get_benthic_plot_parameters (increased from 7 to 8).

  • get_benthic_plot_parameters:
    Corrected the swifs setting for the O2o field, which was incorrectly set to True; it is now False.

  • fill_anular_region:
    Reverted from polygon back to ax.fill to ensure compatibility with Cartesian-based SkillMetrics plots.

  • Data_saver.py:
    Fixed import errors caused by typos in typing and pathlib imports.

  • MOD_data_reader and SAT_data_reader:
    Resolved typo in the typing import statement.

  • eliminate_empty_field:
    Fixed an issue with NaN slicing. A warning is currently ignored, as it does not affect output and will be addressed in a future update.

  • check_missing_days:
    Corrected the printed start year in timeline messages (from 1988 to 2000). This was a display-only issue and did not affect the underlying data.

  • get_season_mask:
    Fixed an error that occurred when attempting to convert an already-existing array, preventing unnecessary conversion.

Validation#

All test case example scripts executed successfully, and no additional runtime errors were detected.


Version: 4.2.8#

Date: 30/05/2025

Summary#

This patch continues the series of reworks focused on improving code clarity, stability, and performance, specifically targeting the formatting.py submodule and related functions.

General Enhancements#

  • Refactored – formatting.py:
    All functions in the submodule have been cleaned up and optimized for improved readability and reliability.

  • Function Updates:

    • format_unit: Further optimized to ensure accurate and consistent label generation.

    • get_variable_label_unit: Now depends on format_unit for standardized output.

    • fill_anular_region: Reworked to use polygon instead of fill for target plot rendering.

    • get_min_max_for_identity_line: Vectorized to enhance computational performance.

    • compute_geolocalised_coords: Updated to use np.arange for faster coordinate computation.

    • swifs_colormap: Rewritten for clarity, with improved handling of default cases and added type annotations for return values.

    • get_benthic_plot_parameters: Cleaned up and typed for better readability and stability.

Future Work#

Note:
This update concludes the current patch cycle aimed at optimizing and cleaning up existing code. The following patches will implement the spatial-performance functions.
The next major revision will include:

  • Final pass for comment and docstring consistency

  • Integration of the argparse library for CLI support

  • Structured logging implementation

  • Unit testing of main computation functions


Version: 4.2.7#

Date: 30/05/2025

Summary#

This release includes a refactoring of submodule functions to enhance code clarity, stability, and maintainability, alongside minor performance optimizations.

General Enhancements#

  • Refactoring – stats_math_utils.py:
    Cleaned and reorganized function implementations for improved stability and readability.
    Variable names have been updated to remove assumptions tied to specific data sources (e.g., model vs. satellite).
    Comments and docstrings have been revised to improve clarity and documentation quality.

  • Optimization – compute_coverage_stats:
    Slight improvements to input handling for better robustness and reduced likelihood of runtime errors.

  • Cleanup – time_utils.py:
    Function declarations have been cleaned and input validation messages rewritten for clarity.
    Optimizations were applied to reduce nested loops and improve performance.

  • Improvements – utils.py:
    Enhanced error messages and streamlined function parameters.
    Refactored internal logic to eliminate unnecessary nested loops and improve efficiency.


Version: 4.2.6#

Date: 30/05/2025

Summary#

This release includes cleanups and performance improvements for data reading modules, enhancing readability, reliability, and execution speed.

General Enhancements#

  • Refactoring – MOD_data_reader.py and SAT_data_reader.py:
    Functions in these submodules have been refactored to improve code clarity and reduce computational overhead.

  • Input Validation Improvements:
    Replaced outdated assert statements with explicitly raised exceptions for more robust input validation and clearer error handling.


Version: 4.2.5#

Date: 30/05/2025

Summary#

This release provides cleanup and performance improvements in the Efficiency_metrics.py submodule, as well as bug fixes affecting test cases in Benthic_layer.py.

General Enhancements#

  • Refactoring – Efficiency_metrics.py:
    Functions within this submodule have been cleaned up for improved readability and maintainability.
    Optimization efforts targeted reducing computational time, particularly by minimizing nested loops.

Bug Fixes#

  • Benthic_physical_plot Bug:
    Fixed an issue where an incorrect number of variables was extracted during the execution of the get_benthic_plot_parameters helper function.

  • compute_dense_water_volume Bug:
    Resolved a problem where the valid_mask parameter was not being passed correctly to the calc_density helper function.


Version: 4.2.4#

Date: 29/05/2025

Summary#

This release focuses on improved documentation and minor optimizations within the file_io.py module.

General Enhancements#

  • Expanded Docstrings:
    Function docstrings within file_io.py have been updated to provide clearer, more exhaustive information for end users and developers.

  • mask_reader Optimization:
    Refactored to use slice instead of squeeze for removing extra dimensions, ensuring safer array handling.

  • call_interpolator Cleanup:
    General code cleanup and minor optimizations to improve readability and maintainability.
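
The reason indexing is safer than squeeze for dropping an extra dimension can be seen on a small array. The shapes below are invented for the example and are not those of the project's mask file.

```python
import numpy as np

# A mask with a leading singleton dimension and, by coincidence, a single row.
mask = np.ones((1, 1, 5))

# squeeze() drops every length-1 axis, including the one that should stay.
squeezed = np.squeeze(mask)   # shape (5,)

# Indexing the first axis removes only that axis.
sliced = mask[0]              # shape (1, 5)
```

With indexing the result keeps its two spatial dimensions whatever their size.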


Version: 4.2.3#

Date: 29/05/2025

Summary#

This update refactors the Density.py submodule to enhance performance, maintainability, and numerical stability in density-related computations.

Enhancements#

  • Code Cleanup and Typing Improvements:
    Functions within the Density.py submodule have undergone general cleanup. Input types are now explicitly defined using typing annotations, and docstrings have been expanded for improved clarity and documentation.

  • NaN Value Handling:
    A general masking step has been added to the calc_density function to filter out NaN values during computation, increasing reliability.

Dense Water Mass Computation#

  • Function Rework – compute_dense_water_volume:
    The function has been restructured to utilize existing utility functions for reading .gz datasets, promoting code reuse and modularity.

  • Loop Optimization:
    Deprecated nested for loops have been replaced with efficient NumPy operations such as where and broadcast, significantly improving performance.
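
The type of rewrite described here can be illustrated with a generic thresholding task. Both functions below are hypothetical and only show that a triple loop and a single np.where expression give the same result.

```python
import numpy as np


def above_threshold_loop(field: np.ndarray, threshold: float) -> np.ndarray:
    """Cell-by-cell version using three nested loops."""
    out = np.zeros(field.shape, dtype=bool)
    for k in range(field.shape[0]):
        for j in range(field.shape[1]):
            for i in range(field.shape[2]):
                value = field[k, j, i]
                out[k, j, i] = (not np.isnan(value)) and value >= threshold
    return out


def above_threshold_vectorized(field: np.ndarray, threshold: float) -> np.ndarray:
    """Same result in one pass: NaN cells become -inf and never pass the test."""
    return np.where(np.isnan(field), -np.inf, field) >= threshold
```

The vectorized form lets NumPy run the comparison in compiled code instead of the Python interpreter.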

Note:
Since the recent changes primarily involve refactoring, minor fixes, and performance optimizations, with no major new features, the versioning has been adjusted to reflect these as patch-level updates. The old versions 4.3.0 and 4.4.0 have been renamed to 4.2.1 and 4.2.2.


Version: 4.2.2#

Date: 29/05/2025

Summary#

This patch includes refactoring of the Data_saver.py submodule to enhance performance, stability, and input flexibility.

Enhancements#

  • Refactored Data Saving Functions:
    Functions in the Data_saver.py submodule have been optimized for improved path handling. File paths can now be provided as either Path objects or str, offering greater flexibility.

  • Improved Error Handling:
    Replaced legacy assert statements with explicitly raised exceptions to provide clearer and more reliable error reporting.
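
Accepting either type usually comes down to normalising the argument with `Path(...)` at the top of the function. The helper below is a hypothetical sketch of that pattern combined with explicit error raising; it is not taken from Data_saver.py.

```python
from pathlib import Path
from typing import Union


def resolve_output_dir(output_dir: Union[str, Path]) -> Path:
    """Normalise a str or Path argument to a Path, with explicit validation."""
    if not isinstance(output_dir, (str, Path)):
        raise TypeError(
            f"output_dir must be str or Path, got {type(output_dir).__name__}"
        )
    path = Path(output_dir)
    if path.exists() and not path.is_dir():
        raise ValueError(f"{path} exists but is not a directory")
    return path
```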


Version: 4.2.1#

Date: 29/05/2025

Summary#

This update focuses on performance and reliability improvements within the data_alignment.py submodule.

Enhancements#

  • Refactored apply_3d_mask Function:
    The apply_3d_mask function has been reworked to simplify broadcasting across datasets, improving clarity and maintainability.

  • Improved Input Validation:
    Added explicitly raised exceptions to enforce correct input types and shapes, ensuring more robust error handling and early failure detection.
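
A minimal sketch of mask broadcasting with input validation is given below. The function name carries a `_sketch` suffix because this is not the project's apply_3d_mask; the convention that non-zero mask values mark cells to keep is an assumption.

```python
import numpy as np


def apply_3d_mask_sketch(data, mask2d) -> np.ndarray:
    """Apply a (y, x) mask to every time step of a (time, y, x) array."""
    data = np.asarray(data, dtype=float)
    mask2d = np.asarray(mask2d)
    if data.ndim != 3:
        raise ValueError(f"data must be 3D, got {data.ndim}D")
    if mask2d.shape != data.shape[1:]:
        raise ValueError(
            f"mask shape {mask2d.shape} does not match data grid {data.shape[1:]}"
        )
    # The new leading axis lets NumPy broadcast the mask over time.
    keep = mask2d.astype(bool)[np.newaxis, :, :]
    return np.where(keep, data, np.nan)
```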


Version: 4.2.0#

Date: 29/05/2025

Summary#

This release introduces a new computation feature: the percentage of cloud coverage and the percentage of available data within a basin. These enhancements lay the groundwork for more detailed analyses in future updates.

New Features#

  • Coverage Statistics Calculation:
    A new function, compute_coverage_stats, has been added to the stats_math_utils.py submodule. It calculates:

    • Percentage of available data in a basin

    • Percentage of cloud coverage

    These metrics are intended for future visualization alongside time series data plotted using the timeseries function.
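
One way to compute the two percentages is sketched below. It assumes that missing satellite pixels are stored as NaN and that every missing pixel inside the basin counts as cloud; the function name and signature are hypothetical and may not match compute_coverage_stats.

```python
import numpy as np


def coverage_stats_sketch(data, basin_mask):
    """Return (% available data, % cloud cover) per time step.

    data: (time, y, x) array with NaN where no observation exists.
    basin_mask: (y, x) array, truthy inside the basin.
    """
    data = np.asarray(data, dtype=float)
    basin = np.asarray(basin_mask, dtype=bool)
    n_basin = int(basin.sum())
    if n_basin == 0:
        raise ValueError("basin_mask selects no cells")
    # The 2D basin mask broadcasts over the time axis.
    valid = np.isfinite(data) & basin
    available = 100.0 * valid.sum(axis=(1, 2)) / n_basin
    cloud = 100.0 - available
    return available, cloud
```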

Fixes & Improvements#

  • Dependency Cleanup:
    Removed deprecated libraries and modules from the SST_data_analyser.py test script.

  • Bug Fix – Dataset Selection:
    Resolved an issue in the read_model_data function that caused incorrect dataset selection under certain conditions.


Version: 4.1.0#

Date: 28/05/2025

Summary#

Version 4.1.0 introduces a full refactor of the Data_reader_setupper.py test case, integrating the Interpolator_v2.m MATLAB script and consolidating data reading and saving functions for both satellite and model datasets. This release also marks the formal deprecation of L4 data handling, shifting all analysis to L3s datasets.

Major Changes#

Data_reader_setupper.py#

  • Complete refactor of the test case script

  • Integrated Interpolator_v2.m using matlab.engine

  • Unified data reading and saving logic for satellite and model datasets

  • All previously used reader functions are now deprecated

  • File management is critical: avoid moving files while the interpolator is running

Unified Data Reading Functions#

MOD_data_reader.py & SAT_data_reader.py#
  • Each script now contains a single function for reading model or satellite datasets

  • Handles both chl and sst variables via the new varname argument

  • Future work will focus on improved robustness for irregular key names (e.g., adjusted_sea_surface_temperature in CMEMS)

Missing_data.py#

  • Rewritten to eliminate dependency on external constants

  • Now runs autonomously and prepares for future optimizations

Data_saver.py#

  • Merged functions into dedicated save_model_data() and save_satellite_data() routines

  • Simplifies saving of processed and interpolated data

Interpolator_v2.m Integration#

  • Executed from Python using matlab.engine

  • New helper: call_interpolator()

  • File structure and paths must remain unchanged during execution

  • setup.py and MANIFEST.in updated to include required MATLAB files during installation
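
Calling a MATLAB script through matlab.engine generally follows the pattern below. This is a sketch, not the project's call_interpolator(): the function name is hypothetical, and it assumes Interpolator_v2.m can be run without arguments, which the changelog does not state.

```python
from pathlib import Path


def run_interpolator_sketch(script_dir) -> None:
    """Start MATLAB, run Interpolator_v2.m from script_dir, then shut down."""
    script_dir = Path(script_dir)
    if not (script_dir / "Interpolator_v2.m").is_file():
        raise FileNotFoundError(f"Interpolator_v2.m not found in {script_dir}")

    # Imported lazily: requires the MATLAB Engine API for Python.
    import matlab.engine

    eng = matlab.engine.start_matlab()
    try:
        eng.addpath(str(script_dir), nargout=0)
        # nargout=0 because the script is run for its side effects only.
        eng.Interpolator_v2(nargout=0)
    finally:
        eng.quit()
```

Checking for the file before starting the engine gives a clear error if files were moved, which matches the warning above about keeping paths unchanged.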

Deprecations#

  • L4 Data: Now officially deprecated. Support removed from both CHL_data_analyzer.py and SST_data_analyzer.py

  • Legacy Reader Functions: Replaced by new unified readers

Minor Fixes & Adjustments#

  • Adjusted file names and dictionary keys in:

    • CHL_data_analyzer.py

    • SST_data_analyzer.py

  • Minor typo corrections in Benthic_layer.py

  • Fixed label positioning in Taylor_diagrams.py monthly plot function

Future Work#

  • Further optimization and cleanup of Data_reader_setupper.py functions

  • General cleanup of unused files in the GitHub repository

  • Begin development of spatial performance analysis modules


Version: 4.0.0 - OFFICIAL RELEASE#

Date: 25/05/2025

Summary#

This marks the official release of version 4.0.0. The Benthic_layer.py test case and all related BFM function scripts have been fully overhauled and are now operational. Additionally, setup.py has been updated to include all necessary dependencies for full functionality.

Benthic_layer.py Functionality#

While the high-level purpose of the Benthic_layer.py script remains the same, its underlying functions have been entirely restructured. All previous versions are now deprecated.

Modular BFM Function Scripts#

All functions related to BFM simulation variable handling are now organized in a dedicated module folder. This improves clarity, reuse, and integration with the rest of the project. Below is a breakdown of the updated logic and implementations.

Geolocalization#

  • Introduced geo_coords function to convert raw Cartesian coordinates into geodetic coordinates (longitude, latitude) and Eulerian angles (φ and λ).

  • Uses input horizontal resolution in degrees.

Bottom Layer Computation#

  • New functions: compute_Bmost() and compute_Bleast() to extract benthic and surface layers, respectively.

  • Visualization enhancements include:

    • deep colormap from cmocean

    • Optional 3D rendering using Plotly (surface and 3dmesh), paving the way for advanced deep water mass analysis.

Temperature and Salinity#

  • Functions now support parallel loading for improved performance.

  • Datasets are cached for reuse in subsequent density calculations.

  • Plotting options externalized for customization.

  • Colormaps:

    • Temperature: cmocean.thermal

    • Salinity: cmocean.haline

Density#

  • Computation logic extracted into a dedicated function.

  • EOS-80 remains the default method, aligning with simulation standards.

  • Optional methods retained for future flexibility.

  • Plotting colormap: cmocean.dense

Dense Water Masses#

  • New functionality added for detection and visualization of dense water masses (threshold: 1029.2 kg/m³, per Oddo et al.).

  • Enhancements:

    • 2D maps with black contour overlays

    • 3D volume estimation via cell counting (800×800×2 m³)

    • Timeline plotting of deep water formation across all methods
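
Volume estimation by cell counting can be sketched as follows, using the threshold and cell size quoted above. The function name is hypothetical, and treating cells exactly at the threshold as dense (>=) is an assumption.

```python
import numpy as np

DENSE_THRESHOLD = 1029.2        # kg/m^3, threshold quoted above
CELL_VOLUME_M3 = 800 * 800 * 2  # m^3 per grid cell (800 m x 800 m x 2 m)


def dense_water_volume(density: np.ndarray) -> float:
    """Volume in m^3 of all cells at or above the density threshold."""
    density = np.asarray(density, dtype=float)
    # NaN cells (land or missing data) are never counted.
    dense = np.isfinite(density) & (density >= DENSE_THRESHOLD)
    return float(dense.sum() * CELL_VOLUME_M3)
```

Applying the function to each time step of a density field gives the values for a timeline plot.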

Chemical Species#

  • Data loading and plotting routines are now fully separated.

  • Parallel loading avoided due to memory limitations of large .nc files.

  • All plots use logarithmic color scaling with cmocean and matplotlib colormaps.

  • Enhanced colorbars and axis labeling for clarity.
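
Logarithmic colour steps can be generated with NumPy as in the sketch below. The function name and the number of levels are hypothetical. In matplotlib, the same effect is usually obtained by passing `matplotlib.colors.LogNorm(vmin, vmax)` as the `norm` argument of the plotting call.

```python
import numpy as np


def log_levels(vmin: float, vmax: float, n_levels: int = 10) -> np.ndarray:
    """Return n_levels values evenly spaced in log10 between vmin and vmax."""
    if vmin <= 0 or vmax <= vmin:
        raise ValueError("log scaling needs 0 < vmin < vmax")
    return np.logspace(np.log10(vmin), np.log10(vmax), n_levels)


# Five levels spanning four orders of magnitude: one level per decade.
levels = log_levels(0.01, 100.0, 5)
```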

Oxygen#
  • Uses scalar steps and cmocean.oxy colormap.

  • Thresholds:

    • Hypoxia: < 62.5 mmol/m³

    • Hyperoxia: > 312.5 mmol/m³

  • Thresholds adjustable via the formatting.py module.
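
The two thresholds split oxygen concentrations into three classes, which can be sketched as below. The function name and the "normoxic" and "missing" labels are choices made for this example, not names taken from formatting.py.

```python
import numpy as np

HYPOXIA_MAX = 62.5     # mmol/m^3, values below this are hypoxic
HYPEROXIA_MIN = 312.5  # mmol/m^3, values above this are hyperoxic


def classify_oxygen(o2) -> np.ndarray:
    """Label each concentration as hypoxic, normoxic, hyperoxic or missing."""
    o2 = np.asarray(o2, dtype=float)
    labels = np.full(o2.shape, "normoxic", dtype=object)
    labels[o2 < HYPOXIA_MAX] = "hypoxic"
    labels[o2 > HYPEROXIA_MIN] = "hyperoxic"
    labels[np.isnan(o2)] = "missing"
    return labels
```

Values exactly at a threshold fall in the middle class because both comparisons are strict, as in the list above.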

Chlorophyll-a#
  • Logarithmic steps

  • Colormap: viridis

N-family (Nutrients)#
  • Logarithmic steps

  • Colormap: YlGnBu

P-family (Primary Producers)#
  • Logarithmic steps

  • Colormap: cmocean.algae

Z-family (Secondary Producers)#
  • Logarithmic steps

  • Colormap: cmocean.turbid

R-family (Particulate Organic Matter)#
  • Logarithmic steps

  • Colormap: cmocean.matter

Future Work#

The refactor of the Benthic_layer.py suite is now complete. Planned next steps include:

  • Integration of L3S Sea Surface Temperature data, replacing L4 due to underperformance.

  • Enhancements to the SST reading functions, with improved handling of missing or invalid data.

  • General performance optimization and code cleanup across all test case scripts.


Version: 4.0.0-δ - UNSTABLE#

Date: 23/05/2025

Summary#

This update brings the Data_reader_setupper.py test case script back online, resolving previous issues related to function imports and path handling.

Data_reader_setupper.py Reactivation#

  • The script Data_reader_setupper.py is now functional again.

  • Legacy hardcoded paths have been replaced with dynamic internal paths, removing the need for manual path extensions via sys.path.append.

  • This change enhances modularity and reduces the risk of import errors during execution or testing.

Note:
While the script is operational, optimization of the functions used within it has been deferred to a future update. The focus will shift to performance improvements once the L3s Sea Surface Temperature data integration is underway.


Version: 4.0.0-γ - UNSTABLE#

Date: 23/05/2025

Summary#

This incremental update completes the rework of plotting functions used in the SST and CHL analysis workflows by updating the Taylor Diagrams and Target Plots plotting and computational scripts following the same conventions introduced in the previous release.

Function Headers and Documentation#

  • Both Taylor and Target plotting functions now include detailed headers and inline comments designed to enhance code readability and provide clear guidance on usage.

  • These headers document function purpose, input arguments, return values, and expected keyword arguments.

Computational Function Improvements#

  • The underlying computation functions supporting these plots have been refined to:

    • Employ itertools for efficient iteration where applicable.

    • Replace all assert statements with explicitly raised exceptions to ensure robustness, even in optimized Python execution modes.

    • Include standardized function headers; comprehensive inline comments will be introduced in a subsequent update.

Future Work#

  • The refactoring of analysis test case functions will be temporarily paused, with plans to revisit optimization efforts at a later stage.

  • Next steps include fixing the Data_reader_setupper.py test case script to align with updated function paths; however, function re-optimization will wait until integration of the Sea Surface Temperature L3s data is complete.

  • Subsequently, the Benthic_layer scripts will be corrected and overhauled to improve computational performance and extend functionality, including the planned calculation of deep water formation volumes.


Version: 4.0.0-β - UNSTABLE#

Date: 22/05/2025

Summary#

This beta release continues the structural and functional overhaul of the project. Key updates include replacing assert statements with explicitly raised exceptions for robustness, the full integration of default plotting options, and improved documentation through consistent function headers. Additionally, the SST and CHL analysis test cases are now operational again following updates to the internal function paths.

This version is still UNSTABLE. While core scripts for SST and CHL analysis are functioning, most other scripts remain incompatible due to unrefactored paths and legacy syntax. Use is advised only for testing specific updated modules.

Major Changes#

RaiseErrors Replace assert Statements#

  • All validation previously handled through assert statements has been replaced with raise ValueError(...) or appropriate exceptions.

  • This ensures checks remain active even when scripts are executed in optimized (-O) mode, increasing the robustness and reliability of the library at the cost of some runtime performance.
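
The difference between the two styles is shown in the sketch below; both function names are hypothetical. When Python runs with `-O`, assert statements are removed at compile time, so only the second function still validates its input.

```python
def check_positive_assert(x: float) -> float:
    # Removed entirely under `python -O`, so invalid input passes silently.
    assert x > 0, "x must be positive"
    return x


def check_positive_raise(x: float) -> float:
    # Always executed, whatever the interpreter flags.
    if x <= 0:
        raise ValueError("x must be positive")
    return x
```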

Default Plotting Options Refactored#

  • Plotting functions used in SST_data_analyzer.py and CHL_data_analyzer.py now fully rely on centralized default options.

  • Legacy hardcoded options have been moved to a dedicated defaults file, allowing users to override or extend behavior more flexibly.

  • The default dpi remains set at 300 to maintain publication-quality output, but this will be lowered in the final release for faster rendering.
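
Centralised defaults that users can override are commonly implemented by merging dictionaries, as in the sketch below. Only the dpi value of 300 comes from this changelog; the other keys, the dictionary name, and the function name are hypothetical.

```python
DEFAULT_PLOT_OPTIONS = {
    "dpi": 300,          # value stated above
    "cmap": "viridis",   # hypothetical default
    "figsize": (10, 6),  # hypothetical default
}


def resolve_plot_options(**kwargs) -> dict:
    """Return the defaults updated with any user-supplied overrides."""
    unknown = set(kwargs) - set(DEFAULT_PLOT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown plot options: {sorted(unknown)}")
    # Later entries win, so user values replace the defaults.
    return {**DEFAULT_PLOT_OPTIONS, **kwargs}


draft_options = resolve_plot_options(dpi=100)
```

Because the merge builds a new dictionary, the shared defaults are never modified by a single call.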

Function Headers and Documentation#

  • All plotting functions (excluding Taylor and Target plots) and newly added scripts now include comprehensive headers:

    • Function purpose

    • Expected inputs and return types

    • Supported keyword arguments (kwargs)

    • Example usage

  • This marks the beginning of a broader documentation effort to improve code clarity and onboarding for new contributors.

Reactivated Test Cases#

  • Both SST and CHL test case scripts are now functional again after internal path corrections.

  • The setup.py file has been updated accordingly, though users are still advised to install missing dependencies manually for full compatibility.

Fixed Issues#

  • Taylor Diagram Tick Labeling

    • RMSD ticks are now configurable via a tickRMS parameter.

    • The first tick value determines both the tick spacing and the RMSD label position, resolving longstanding issues of fixed/static placement.

  • Validation of Target and Regression Plot Behavior

    • Following extensive review and expert consultation, the anomalous behavior observed in Target Plots and Regression Lines for L4 CHL data is confirmed to be data-driven, not a bug.

    • Validation artifacts are present in the dataset itself; a pytest test suite will be released in the near future to systematically verify these findings.

Future Work#

Near-Term Roadmap#

  • Add headers and docstrings to Taylor and Target plot functions, improving readability and consistency.

  • Refactor internal for loops using itertools to reduce redundancy and optimize performance.

Upcoming Feature Development#

  • Begin refactoring of the Benthic_layer.py test script:

    • Modularize computation and plotting functions

    • Implement monthly volume calculations for deep water formation
      (based on upcoming work from Oddo et al.)

  • Add functionality to export plotting data as both .csv and .nc files (currently postponed due to priority conflicts).

  • Launch support for L3s data in Sea Surface Temperature analysis.

    • This will deprecate support for L4 data due to unsatisfactory reliability and quality of results.


Version: 4.0.0-α - UNSTABLE#

Date: 20/05/2025

Summary#

This release marks a major overhaul of the project, transitioning it from standalone scripts into a fully modular Python package. The deprecated Corollary.py and Auxilliary.py scripts have been restructured and their functions relocated to more logically organized modules. Several changes to the package structure and default plotting options are introduced to enhance usability and maintainability.

Major Changes#

Hydrological_model_validator as a Python Package#

  • The core structure of the Hydrological_model_validator project has been reworked into a Python package, making it easier to install and use as a library.

  • A new Setup.py script has been introduced, enabling installation of necessary dependencies. However, as not all dependencies are included, manual installation of additional libraries (as listed in the README) is still required.

  • The Processing and Plotting modules have been reorganized as submodules, allowing users to import specific functions from their respective scripts.

  • Use of the Path class from the Python pathlib library has been deprecated across the codebase, though it will remain in test case scripts for accessing data directories.

Deprecations#

  • The Corollary.py and Auxilliary.py scripts are now officially deprecated due to the complexity and overabundance of functions. These functionalities have been moved to more specialized scripts to improve organization and maintainability.

New Functionality: Modularized Script Collections#

To better organize the deprecated functions from Corollary.py and Auxilliary.py, new themed scripts have been introduced within the Processing and Plotting modules. These changes ensure better modularity and improve the user experience by grouping related functions. The new scripts are as follows:

Processing Module:#

  • time_utils.py:

    • leapyear

    • true_true_time_series_length

    • split_to_monthly, split_to_yearly

    • get_common_years

    • get_season_mask

  • data_alignment.py:

    • get_valid_mask, get_valid_mask_pandas

    • align_pandas_series, align_numpy_series

    • get_common_series_by_year, get_common_series_by_year_month

    • extract_mod_sat_key

    • gather_monthly_data_across_years

  • file_io.py:

    • mask_reader

    • load_dataset

  • stat_math_utils.py:

    • fit_huber

    • fit_lowess

    • round_up_to_nearest

  • utils.py:

    • find_key

Plotting Module:#

  • formatting.py:

    • format_unit

    • get_variable_from_label_unit

    • fill_anular_region

    • get_min_max_for_identity_line

    • _style_axis_custom

These scripts will be expanded as necessary to accommodate additional functions and improve usability.

Default Plotting Options#

  • All previous options used in the plotting functions are now set as defaults. If the user does not provide custom options, the package will automatically apply these default settings, improving ease of use and flexibility for customizations. This will be further enhanced in the upcoming test case update.

Test Case Scripts#

  • Test case scripts have been relocated to a dedicated folder alongside the data folder. This structure allows for better organization and easier management of test data moving forward.

UNSTABLE#

  • Important Notice: This release is extremely unstable due to the fundamental changes in file paths and the overall structure of the package. Many old paths used to fetch functions have been broken, and some functions are still in the process of being integrated into the new structure.

  • It is advised to avoid using this release for anything beyond basic plotting functions. Upcoming updates will address these issues and re-implement missing functionalities, restoring full compatibility.

Future Work#

  • The immediate focus is to re-enable all core functions within the new package structure and restore their usability as quickly as possible.

  • Future updates will:

    • Move default options for Target and Taylor plots into a centralized configuration file.

    • Rework the Benthic_layer.py test case script to separate computational and plotting functions, which are currently intertwined.

    • Provide the option to save plot data in both .csv and .nc formats.

    • Optimize the data reading/setup scripts, with a focus on improving performance and finally integrating the interpolator into the Python ecosystem.


Version: 3.1.1#

Date: 19/05/2025

Summary#

Small patch fixing typos in both analyser scripts and a couple of bugs in the Target_plot.py and Taylor_diagrams.py function scripts.

Taylor_diagrams.py#

Fixed a visualization issue in Taylor_diagrams.py where the title was not displayed properly in the saved image. The plot extent has been enlarged to leave more room for the text.

Target_plots.py#

Fixed a bug that caused the yearly Target plot to be saved as a blank white image.


Version: 3.1#

Date: 18/05/2025

Summary#

This update introduces a rework of the Whiskerbox and Violin plot functions and adds a new utility for streamlined variable extraction.

Plot Enhancements#

  • Whiskerbox and Violin plots have been restructured to follow the same logic and structure used in the other plotting functions.

  • These plots now support:

    • Automatic key extraction from nested dictionaries.

    • Dynamic title and label formatting via existing auxiliary functions.

New Helper Function: gather_monthly_data_across_years#

  • A new utility function, gather_monthly_data_across_years, has been implemented to facilitate data extraction across multiple years.

  • Currently tailored for box/violin plot input, but will be tested and adapted for wider use across additional plotting and computation workflows.
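
The kind of extraction this helper performs can be sketched as follows. This is not the package's implementation: the function name `pool_month_across_years` and the assumed input layout (a dictionary mapping each year to a list of 12 monthly lists) are hypothetical.

```python
def pool_month_across_years(data_by_year, month_index):
    """Pool the values of one month (0 = January) across all years.

    Assumes data_by_year maps year -> list of 12 monthly lists; this layout
    is an assumption, not the package's documented structure.
    """
    pooled = []
    for year in sorted(data_by_year):              # chronological order
        pooled.extend(data_by_year[year][month_index])
    return pooled


# Two years of toy data: each month holds two values.
toy = {
    2001: [[m + 0.1, m + 0.2] for m in range(12)],
    2000: [[m + 0.0, m + 0.5] for m in range(12)],
}
january = pool_month_across_years(toy, 0)
print(january)  # [0.0, 0.5, 0.1, 0.2]
```

Calling the function once per month yields twelve pooled samples, which is the shape a box or violin plot typically expects.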

Future Direction#

  • Further optimize plotting routines for speed and clarity.

  • Begin reworking data loading and interpolation functions for faster runtime.

  • Explore full Python replacement of the current MATLAB Interpolato.m script.

  • Improve the changelogs by listing the new functions added in each update. The functions added in the previous 3.x.x updates are:

    • ver 3.0.1:

      • In Corollary.py:

        • get_common_series_by_year (slices dataset based on years)

        • get_common_series_by_year_month (slices dataset based on years and months)

    • ver 3.0:

      • All of Auxiliary.py (helper functions for the computations behind the plotting functions, including statistics and other utilities)

      • All of Target_computations.py (computations and normalisations necessary for the correct plotting of the Target Plots)

      • All of Taylor_computations.py (computations and normalisations necessary for the correct plotting of the Taylor Diagrams)

      • All of Density.py (bundles necessary density computations)

      • In Corollary.py:

        • extract_mod_sat_keys (allows for the identification/extraction of model and satellite dictionary keys)

Known Issues#

  • RMSD Label Placement: Labels are currently tied to a fixed first arc value. Further investigation into the SkillMetrics library is ongoing to determine how arc ranges are defined and whether label placement can be dynamically bound to them.

  • Static RMSD Arc Ticks: Taylor diagrams use the same arc ticks across plots. While this helps with consistency in test cases, dynamic adjustment would improve generality. Removing the tickrms override may solve this, but could also interfere with label alignment (see above).

  • Unexpected Target Plot Results: Initial performance scores from Target plots appear lower than anticipated. Ongoing testing will determine if this is a bug, data artifact, or an accurate model assessment.

  • Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.


Version: 3.0.1#

Date: 18/05/2025

Summary#

This is a minor update focused on expanding DataFrame usability within the SST and CHL analysis scripts and improving dataset loading performance.

Expanded Use of Pandas DataFrames#

  • SST and CHL analysis scripts now fully leverage pandas DataFrames, enabling:

    • Seamless integration of the datetime dimension.

    • More efficient time-based slicing into monthly and yearly datasets using native pandas methods.

  • Enhances clarity and performance for long-term and seasonal trend analysis.
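
As an illustration of time-based slicing with native pandas methods, the sketch below groups a synthetic daily series by year and by (year, month). The column name `sst` and the synthetic data are assumptions; the analysis scripts may organise their DataFrames differently.

```python
import pandas as pd

# Synthetic daily series covering two years (values are just a counter).
dates = pd.date_range("2000-01-01", "2001-12-31", freq="D")
sst = pd.DataFrame({"sst": range(len(dates))}, index=dates)

# Yearly subsets: one DataFrame per calendar year.
yearly = {year: frame for year, frame in sst.groupby(sst.index.year)}

# Monthly subsets: keyed by (year, month).
monthly = {
    key: frame
    for key, frame in sst.groupby([sst.index.year, sst.index.month])
}

print(len(yearly[2000]))        # 366 days, since 2000 is a leap year
print(len(monthly[(2000, 2)]))  # 29 days in February 2000
```

Because the index is a `DatetimeIndex`, leap years and month lengths are handled by pandas without manual bookkeeping.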

Faster Dataset Loading#

  • Introduced parallel loading of SST datasets using ThreadPoolExecutor from Python’s concurrent.futures module.

  • Significantly improves script runtime when dealing with large temporal datasets.
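
A minimal sketch of parallel loading with `ThreadPoolExecutor` is shown below. The `load_year` function is a stand-in that returns dummy data; a real loader would open and read one file per year.

```python
from concurrent.futures import ThreadPoolExecutor


def load_year(year):
    """Stand-in loader; a real one would open and read that year's file."""
    return {"year": year, "values": [year + 0.5] * 3}


years = [2000, 2001, 2002]

# executor.map preserves the input order, so results line up with `years`.
with ThreadPoolExecutor(max_workers=4) as executor:
    datasets = list(executor.map(load_year, years))

sst_by_year = {d["year"]: d["values"] for d in datasets}
print(sorted(sst_by_year))
```

Threads tend to help when the work is dominated by file I/O; the gain for CPU-bound processing is usually smaller.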


Version: 3.0#

Date: 18/05/2025

Summary#

This version introduces a major rework and optimization of the plotting functions used for model validation and comparison. It focuses on improving clarity, maintainability, and performance in both visual output and computational workflow.

New Taylor Diagrams and Target Plots#

  • Normalization: All monthly validation parameters are now normalized by their respective standard deviations, allowing for the unified display of all markers in a single diagram.

  • Marker Logic Update: Marker representations have been reworked based on a consistent logic [insert table when available].

  • Enhanced Visualization:

    • Taylor Diagrams: Now include RMSD arcs and repositioned RMSD labels outside the plot area to avoid marker overlap.

    • Target Plots: Include color-coded performance zones to quickly assess model accuracy and bias.
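
The normalization step can be sketched as below. It assumes that each statistic is divided by the standard deviation of the reference (satellite) series, which is a common convention but is not confirmed by this changelog; the function name is hypothetical.

```python
import math
import statistics


def normalised_stats(model, reference):
    """Return (std ratio, correlation, normalised centred RMSD, normalised bias)."""
    mean_m = statistics.fmean(model)
    mean_r = statistics.fmean(reference)
    std_m = statistics.pstdev(model)
    std_r = statistics.pstdev(reference)   # assumed normalisation factor

    # Centred RMSD: RMS difference after removing each series' mean.
    crmsd = math.sqrt(statistics.fmean(
        [((m - mean_m) - (r - mean_r)) ** 2 for m, r in zip(model, reference)]
    ))
    covariance = statistics.fmean(
        [(m - mean_m) * (r - mean_r) for m, r in zip(model, reference)]
    )
    correlation = covariance / (std_m * std_r)

    return std_m / std_r, correlation, crmsd / std_r, (mean_m - mean_r) / std_r


sigma, corr, crmsd, bias = normalised_stats([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(sigma, corr, crmsd, bias)
```

With this normalization the reference always sits at a standard deviation of 1, so markers from different months share one set of axes. The normalized quantities satisfy the usual Taylor relation: centred RMSD squared equals 1 + sigma squared - 2 * sigma * correlation.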

Violin Plots#

  • Introduced violin plots as an alternative to whisker-box plots.

  • Violin plots offer a smoother visual of data distribution but are less informative regarding outliers.

  • This plot type is included for completeness and comparative analysis.

Seaborn Integration#

  • Most plotting functions now utilize the Seaborn library.

  • Advantages include:

    • Better integration with pandas DataFrames.

    • More expressive and customizable visualizations.

    • Improved consistency across plots.

Separation of Computations#

A significant refactor has begun to modularize core functionality:

  • Extracted key routines from plotting scripts into a new Auxiliary script:

    • Label formatting (e.g., variable names, units).

    • Key identification from datasets.

    • Seasonal masks and data groupings.

    • Statistical calculations required for Taylor and Target diagrams.

    • Regression line generation (Huber, LOWESS, etc.).

This modularization paves the way for cleaner, more testable code in preparation for the final pytest integration.

Future Direction#

  • Further optimize plotting routines for speed and clarity.

  • Begin reworking data loading and interpolation functions for faster runtime.

  • Explore full Python replacement of the current MATLAB Interpolato.m script.

Known Issues#

  • RMSD Label Placement: Labels are currently tied to a fixed first arc value. Further investigation into the SkillMetrics library is ongoing to determine how arc ranges are defined and whether label placement can be dynamically bound to them.

  • Static RMSD Arc Ticks: Taylor diagrams use the same arc ticks across plots. While this helps with consistency in test cases, dynamic adjustment would improve generality. Removing the tickrms override may solve this, but could also interfere with label alignment (see above).

  • Unexpected Target Plot Results: Initial performance scores from Target plots appear lower than anticipated. Ongoing testing will determine if this is a bug, data artifact, or an accurate model assessment.

  • Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.


Version: 2.11#

Date: 14/05/2025

Summary#

Whisker-box plots have been implemented for satellite Basin Average SST and CHL datasets.

Whisker Plots#

A new visualization tool — the whisker-box plot — has been added to both the SST and CHL analysis scripts. These plots provide a clearer view of statistical distributions, highlighting mean values and outliers in the Basin Average datasets. Their primary purpose is to enhance model performance evaluation by offering a more nuanced look at dataset variability.

Future Developments#

With the implementation of this feature, the 2.x development cycle is considered complete.

The next major update will initiate Version 3.0, which will focus on:

  • Refactoring all functions to improve computational efficiency and streamline logic.

  • Introducing new libraries, such as Seaborn, for more advanced and elegant plotting.

  • Ensuring result consistency, with side-by-side testing to confirm output reliability compared to previous versions.

  • Resolving known issues and bugs from earlier versions.

  • Expanding documentation, including:

    • Clearer comments and structure within the codebase.

    • A step-by-step guide for running the test case.

    • A pytest module to automate testing of computational functions.

Version 3.0 will mark a shift toward a more maintainable, scalable, and user-friendly project structure.

Known Issues#

  • Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.

  • Taylor and Target plots continue to depend on pre-defined .csv configuration files.

  • Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.

  • LaTeX rendering in colorbar labels may break under certain conditions.

  • While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.


Version: 2.10#

Date: 13/05/2025

Summary#

The Benthic Layer Analysis script has been expanded to version 2.0 with significant enhancements to data extraction, analysis, and visualization.

This update builds upon the initial version, adding functionality for the extraction and plotting of temperature and salinity data at the benthic layer. Additionally, the script now computes and visualizes the density field using three distinct equations of state, providing more accurate insights into deep water formation. As a result, the pressure field will no longer be included in the project, as the density field is deemed a more reliable representation of the evolution of dense water formation.

Key Enhancements:#

  • Temperature and Salinity Maps:
    The temperature and salinity values at the benthic layer are now extracted using a method similar to that employed for biogeochemical fields. These values are georeferenced and plotted using the same function as used for the biogeochemical species, ensuring consistent map generation.

  • Density Computation & Plotting:
    Temperature and salinity data are now processed to compute the density field, using the following three equations of state:

    • Simplified equation of state

    • Equation of State for Seawater (1980)

    • Thermodynamic Equation of State (2010)

    All three density fields are plotted using a fixed color range to allow for easy comparison of the differences between the equations of state.

Papers describing these three equations of state are listed in the Bibliography section of the README for readers who want to understand the differences between them.
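
For orientation, a linearised equation of state is sketched below as one possible reading of "simplified". The form and every coefficient are illustrative assumptions and may not match what the script uses.

```python
def linear_density(temperature, salinity,
                   rho0=1025.0, t0=10.0, s0=35.0,
                   alpha=2.0e-4, beta=7.6e-4):
    """Linearised density in kg/m^3 from temperature (deg C) and salinity.

    All reference values and coefficients are illustrative assumptions.
    """
    return rho0 * (1.0 - alpha * (temperature - t0) + beta * (salinity - s0))


print(linear_density(10.0, 35.0))   # reference state returns rho0
print(linear_density(8.0, 38.5))    # colder, saltier water is denser
```

In this form density rises as water gets colder or saltier, which is the behaviour relevant to dense water formation.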

Visual Enhancements:#

  • Plots now feature fixed color ranges, making it easier to identify and interpret the phenomena illustrated by the maps and plots.

Future Developments:#

  • Ongoing improvements to the density computation.

  • Additional enhancements to data visualization and analytical functions.

Known Issues#

  • Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.

  • Taylor and Target plots continue to depend on pre-defined .csv configuration files.

  • Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.

  • LaTeX rendering in colorbar labels may break under certain conditions.

  • While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.


Version: 2.10#

Date: 10/05/2025

Summary#

Initial version (v1.0) of the Benthic Geochemical Analysis script completed.

Benthic Geochemical Analysis Script#

This first iteration of the Benthic Layer Analysis script introduces foundational functionality for exploring geochemical dynamics at the sediment-water interface using output from the BFM-NEMO coupled model.

Features:#

  • Computes the deepest active layer (out of 48 vertical layers) in each model grid cell across the domain.

  • Enables visualization of the model basin bathymetry and deepest layer distribution.

  • Allows users to select a chemical species from the simulation for spatial plotting.

  • Generates georeferenced 2D contour maps of selected species at the benthic interface, enriched with coastlines.

  • The default contour resolution is 51 levels for the geochemical species (configurable in code); the Benthic Depth plot uses 26.
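
The search for the deepest active layer in a single grid column can be sketched as follows. It assumes that index 0 is the surface and that inactive cells are stored as NaN; the actual model output may flag them differently.

```python
import math


def deepest_active_layer(column):
    """Return the index of the deepest non-NaN layer in a column, or None.

    Assumes index 0 is the surface and that inactive (land or below-seabed)
    cells are stored as NaN; both are assumptions about the model output.
    """
    for k in range(len(column) - 1, -1, -1):   # scan from the bottom up
        if not math.isnan(column[k]):
            return k
    return None                                 # all-NaN column: land cell


nan = float("nan")
columns = {
    "deep": [14.0, 13.5, 13.1, 12.9],
    "shallow": [15.2, 14.8, nan, nan],
    "land": [nan, nan, nan, nan],
}
bottom = {name: deepest_active_layer(col) for name, col in columns.items()}
print(bottom)  # {'deep': 3, 'shallow': 1, 'land': None}
```

Applying the same search to every grid cell gives a 2D map of bottom indices, which can then be used to pick the benthic value of any 3D field.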

Future Development#

This script represents the first half of the full analysis pipeline. Future updates will introduce:

  • Computation and visualization of the pressure field within the water column.

  • Diagnostic tools for investigating deep water formation processes in the Northern Adriatic Sea.

Known Issues#

  • Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.

  • Taylor and Target plots continue to depend on pre-defined .csv configuration files.

  • Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.

  • LaTeX rendering in colorbar labels may break under certain conditions.

  • While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.


Version: 2.9#

Date: 06/05/2025

Summary#

Introduced seasonal scatterplots for both Sea Surface Temperature (SST) and Chlorophyll (CHL) datasets to support more detailed seasonal analysis.

Seasonal Scatterplots#

  • Based on the insights from previous scatterplot analyses, new plots have been developed to break down basin-averaged values by season.

  • The data is first decomposed into seasonal subsets and visualized in individual season-specific scatterplots.

  • A combined scatterplot is also generated, consolidating all seasonal data and color-coding points according to their respective seasons.

  • Each plot includes:

    • A best-fit line

    • A Huber regression line for robust linear fitting (less sensitive to outliers)

    • A LOWESS (Locally Weighted Scatterplot Smoothing) non-linear regression line to highlight trends in densely clustered areas.
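
The seasonal decomposition step can be sketched as below. It assumes meteorological seasons (DJF, MAM, JJA, SON); the package may define its season boundaries differently, and `split_by_season` is a hypothetical name.

```python
from datetime import date

# Meteorological seasons; the package's own season boundaries may differ.
SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def split_by_season(dates, model, satellite):
    """Group (model, satellite) pairs by the season of their date."""
    seasons = {name: [] for name in ("DJF", "MAM", "JJA", "SON")}
    for day, mod, sat in zip(dates, model, satellite):
        seasons[SEASON_BY_MONTH[day.month]].append((mod, sat))
    return seasons


dates = [date(2000, 1, 15), date(2000, 4, 15), date(2000, 7, 15), date(2000, 12, 15)]
pairs = split_by_season(dates, [13.1, 15.0, 24.2, 14.0], [13.4, 15.3, 23.9, 14.2])
print(pairs["DJF"])  # [(13.1, 13.4), (14.0, 14.2)]
```

Each seasonal subset can feed its own scatterplot, while the season label doubles as the colour key in the combined plot.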

Known Issues#

  • The Taylor diagrams still use a fixed RMSD range; dynamic scaling is planned for a future update.

  • Taylor and Target plots continue to rely on static .csv files, which limits flexibility.

  • Some anomalous values have been observed in the CHL regression fits; further investigation is required to determine whether these are data artifacts or bugs.


Version: 2.8#

Date: 24/04/2025

Summary#

Updated the CHL analysis script and performed minor cleanup in the SST script.

Chlorophyll Analysis Script Updated to Version 2.0#

  • The CHL analysis script has been updated to align with the improvements made in the SST script.

  • Comments have been revised to more clearly identify level 3 and level 4 analysis sections.

  • Plots are now displayed for 3 seconds before closing and are saved in dynamically created folders.

Other Changes#

  • Added a print statement to the SST analysis script to inform the user when the BIAS has been computed.

Known Issues#

  • Similar to the SST plots, the CHL plots have a fixed range for the RMSD in the Taylor diagrams. Future updates aim to make the RMSD range dynamic for improved graph display.


Version: 2.7#

Date: 24/04/2025

Summary#

Updated the SST analysis script and fixed minor path-related issues.

Sea Surface Temperature Data Analysis Script Version 2.0#

  • The SST analysis script has been updated to version 2.0, enhancing user interaction through improved print statements and more effective plot handling.

  • Implemented dynamic path creation, enabling the automatic creation of ad-hoc folders to save plots, as seen in previous scripts.

  • Plots are now displayed for 3 seconds before automatically closing; this duration can be adjusted by modifying the value in the plotting functions.
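
Dynamic folder creation of this kind can be sketched with `pathlib`. The helper name and the folder names are illustrative assumptions, not the script's actual layout.

```python
from pathlib import Path
import tempfile


def make_output_dir(base_dir, *parts):
    """Create (if needed) and return a nested output folder."""
    out_dir = Path(base_dir).joinpath(*parts)
    out_dir.mkdir(parents=True, exist_ok=True)   # no error if it already exists
    return out_dir


# Demonstrated in a temporary directory; folder names are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    target = make_output_dir(tmp, "output", "SST", "timeseries")
    created = target.is_dir()
    again = make_output_dir(tmp, "output", "SST", "timeseries")  # safe to repeat
    same = (target == again)

print(created, same)
```

Using `parents=True` and `exist_ok=True` makes the call idempotent, so the script can run repeatedly without checking whether the folders already exist.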

Other Changes#

  • Removed the Plot outputs folder to declutter the project structure. Example outputs will be provided in a future update to the README.

  • Removed leap_year.py, as its functions have been migrated to Corollary.py.

  • Merged scatter plots, time series plots, and efficiency plots functions into a single script, Plots.py.

Known Issues#

  • The RMSD (Root Mean Square Deviation) in the Taylor diagrams is not currently displayed correctly. This issue stems from the .csv files provided in the folder. Future updates will aim to make the ranges dynamic for improved graph display.


Version: 2.6#

Date: 23/04/2025

Summary#

Introduced basin average computation and performed significant cleanup of old scripts.

Data Setupper Complete#

  • The data reading and setup script is now fully complete, with the addition of functionality to compute the daily mean basin average time series for SST datasets (both satellite and model).

  • The data saving script has been updated to support saving the new basin average data.

  • The CHL dataset still requires processing through the interpolator to compute its basin average.
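
A daily basin average can be sketched as the mean of all valid cells in each daily field. The version below is unweighted and treats NaN as land or missing data; whether the script applies area weighting is not stated here.

```python
import math


def basin_average(field):
    """Mean of all non-NaN cells in a 2D field; NaN if no valid cell exists."""
    valid = [v for row in field for v in row if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


nan = float("nan")
# Two daily 2x3 fields; NaN marks land or missing pixels.
daily_fields = [
    [[nan, 14.0, 16.0], [nan, 15.0, 15.0]],
    [[nan, 13.0, 15.0], [nan, nan, 14.0]],
]
series = [basin_average(day) for day in daily_fields]
print(series)  # [15.0, 14.0]
```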

Other Changes#

  • Conducted a major cleanup of the functions folder. All outdated functions have been removed, with the exception of the leap_year function, which remains in use for analysis scripts.


Version: 2.5#

Date: 23/04/2025

Summary#

Expanded data saving functionality to support model datasets.

Data Saver Expanded#

  • The Data_saver.py script has been extended with new functions to handle saving model datasets.

  • Due to the large size of the model data, it has been split into multiple files, with each file corresponding to a specific year.

Other Changes#

  • Added assertions for enhanced code stability and error handling.

  • Reorganized files and functions to improve code structure and maintainability.


Version: 2.4#

Date: 23/04/2025

Summary#

Introduced functions for reading model data and expanded the main script to support them.

Model Data#

  • Added necessary functions for reading model data, now integrated into a new script.

  • The main code has been updated to handle the newly added model data reading functionality.

Other Changes#

  • Reorganized functions for improved clarity and maintainability.

  • Added assertions to enhance code stability and error handling.


Version: 2.3#

Date: 23/04/2025

Summary#

Introduced the ability to import and apply a mask to align satellite and model data.

Mask Handling#

  • The model data contains NaN values for landmasses, so a mask has been implemented to ensure proper alignment between satellite and model datasets.

  • The mask is essential for the interpolation process, and the one used in the current test case will be provided at a later stage.
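
Applying a land/sea mask can be sketched as follows. The encoding (1 for sea, 0 for land) and the helper name are assumptions; the mask provided with the test case may use a different convention.

```python
import math

nan = float("nan")


def apply_mask(field, mask):
    """Keep values where mask is truthy (sea); set the rest to NaN."""
    return [
        [value if keep else nan for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(field, mask)
    ]


# 1 = sea, 0 = land; this encoding is an assumption.
mask = [[0, 1, 1], [0, 0, 1]]
satellite = [[18.0, 14.2, 15.1], [17.5, 16.9, 15.6]]
masked_sat = apply_mask(satellite, mask)
valid_count = sum(not math.isnan(v) for row in masked_sat for v in row)
print(valid_count)  # 3
```

After masking, the satellite field has NaN in the same cells as the model field, so point-by-point comparisons only involve sea cells.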

Other Changes#

  • Reorganized functions for improved code structure.

  • Created a corollary file to house additional functions that are not directly related to missing data handling or data reading.


Version: 2.2#

Date: 23/04/2025

Summary#

Introduced the Data_saver.py script to enhance data management and facilitate the transition to the interpolator.

Data Saving#

  • Two new steps have been implemented to allow users to save manipulated satellite datasets for validation or later processing.

  • Users can now save datasets in a dedicated folder in either .mat or .nc formats:

    • .mat files contain all necessary variables for interpolation.

    • .nc files offer flexibility for future expansion if needed.

  • Each saved dataset is timestamped with the date of the run for tracking purposes.
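
Timestamped file names can be built as in the sketch below. The naming pattern and the `timestamped_name` helper are hypothetical; the script's actual file names may differ.

```python
from datetime import date


def timestamped_name(stem, extension, run_date=None):
    """Build a file name such as 'SST_clean_2025-04-23.nc'."""
    run_date = run_date or date.today()          # default: date of the run
    return f"{stem}_{run_date.isoformat()}.{extension}"


name = timestamped_name("SST_clean", "nc", date(2025, 4, 23))
print(name)  # SST_clean_2025-04-23.nc
```

ISO-formatted dates sort chronologically when file names are listed alphabetically.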

Other Changes#

  • Reorganized and moved functions to continue restructuring the codebase for improved modularity and maintainability.


Version: 2.1#

Date: 23/04/2025

Summary#

Introduced a dedicated script for reading satellite SST (Sea Surface Temperature) data.

Code Structure#

  • Added functions for reading satellite SST data, now integrated into the main script.

  • The overall structure remains streamlined, with new functionality focused on SST data handling.

Other Changes#

  • Refined and reorganized functions for improved clarity and maintainability.


Version: 2.0#

Date: 22/04/2025

Summary#

Reworked the Data_reader_setupper script from the ground up, with the script now moved to the home directory.

Code Structure#

  • Currently works only with the satellite CHL datasets.

  • The structure of the functions remains unchanged, but they have been moved to a dedicated SAT data script.

  • The user can now define which level of data to handle via an input line.

Other Changes#

  • Added assertions to improve code stability.


Version: 1.0#

Date: 20/04/2025

Summary#

Initial implementation of core functionality up to the Data Analysis and Efficiency Metrics stage. The codebase is structured to handle both satellite and model data through two distinct scripts, with support for saving outputs in .nc and .mat formats.

Data Handling#

  • Satellite Data

    • CHL (Chlorophyll): Loads data into a single array and checks for missing fields. Requires the data level to be specified via a variable in the code.

    • SST (Sea Surface Temperature): Reads data and converts temperature values to Celsius.

  • Model Data

    • CHL: Retrieves and reads chlorophyll data from model outputs.

    • SST: Retrieves and reads sea surface temperature data; also computes basin-averaged SST values.

Code Structure#

  • Core functions are distributed across multiple files.

  • Interpolation is handled using a MATLAB script due to Python limitations in managing grids with adjacent cells of identical values.

    • This MATLAB script also computes the basin average for CHL data.

Analysis#

  • SST and CHL analyses are performed in separate scripts to mitigate RAM limitations.

  • Implemented analyses include:

    • Time series plots

    • Scatter plots

    • Taylor diagrams

    • Target diagrams

    • Multiple efficiency metrics (refer to Krause et al., 2005, as listed in the Bibliography)
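
As an example of such a metric, the Nash-Sutcliffe efficiency, one of the criteria discussed by Krause et al. (2005), can be sketched as below. Which metrics the scripts compute, and how, is not specified in this changelog, so the function is illustrative only.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(obs, obs)
mean_only = nash_sutcliffe(obs, [2.5, 2.5, 2.5, 2.5])
print(perfect, mean_only)  # 1.0 0.0
```

Negative values indicate that the observed mean would be a better predictor than the model.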