Changelog#
Version: 4.10.4#
Date: 23/06/2025
Summary#
Sphinx documentation now available online, helper script added for test-case dataset download, and minor fixes applied to improve documentation consistency and test stability.
Sphinx Documentation#
A Sphinx-based documentation site has been deployed to consolidate all key project materials in one accessible HTML format.
Included in the site:
README.md
TEST_CASES_README.md
CHANGELOG.md
Autogenerated API documentation
Due to Markdown/HTML formatting differences, some files have duplicated versions to avoid rendering issues (e.g., image paths and layout inconsistencies).
Both the GitHub .md and Sphinx .rst versions will be maintained in parallel going forward.
Test Dataset Downloader#
Because the test-case dataset is too large for GitHub’s LFS, it has been hosted on Google Drive.
To automate retrieval:
A script named Download_data.py has been added in the Test_cases/ folder.
Running it will:
Download the zipped dataset
Unzip it in-place for immediate use
Best practice: Keep both zipped and unzipped versions locally to ensure uninterrupted access if the external link becomes temporarily unavailable.
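The two steps above can be sketched as follows. This is a minimal illustration, not the contents of Download_data.py: the function name, the `archive_name` default, and the injectable `download` callable are all assumptions, and a plain `urlretrieve` only works for a direct-download URL (large Google Drive files may need a dedicated client).

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_and_unzip(url, dest_dir, archive_name="dataset.zip", download=urlretrieve):
    """Download a zip archive into dest_dir and extract it next to itself.

    The archive is kept after extraction, so a local copy survives if the
    remote link later becomes unavailable.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / archive_name

    # Skip the download when a previous run already fetched the archive.
    if not archive_path.exists():
        download(url, archive_path)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    return archive_path
```

Keeping the zip on disk matches the best-practice note: a second run re-extracts from the local archive without touching the network.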
Minor Fixes#
Docstring Refactor#
Plotting function docstrings have been reformatted:
Removed 2-column style (which broke HTML formatting)
Now compliant with Sphinx autodoc parsing
Import Fixes#
Relative imports have been fully replaced with absolute imports to prevent breakages during Sphinx auto-build.
Test Errors#
Some tests failed unexpectedly for reasons that remain unclear (likely linked to recent layout changes).
These issues have been manually addressed and all test suites now pass.
✅ Project Status#
This marks the official completion of the project’s primary development phase.
Final milestone achieved. All deliverables are in place, documented, and operational.
Version: 4.10.3#
Date: 21/06/2025
Summary#
Documentation overhaul and minor hotfixes.
Documentation Updates#
New README.md#
The main README has been fully rewritten to better reflect the scope, usage, and structure of the project. It now includes:
Project overview and capabilities
Installation instructions (including dependencies like MATLAB and CDO)
Example usage and command-line options
Extended bibliography and reference list
Project health badges from Codecov, Codacy, and Codebeat
docs/TEST_CASES_README.md#
A dedicated markdown file has been added under docs/ to document the test cases:
Descriptions of each test case
Example output plots
Clarification on purpose and expected input/output
A future integration with Sphinx is planned to combine this with the README.md and CHANGELOG.md into full API-style documentation.
Hotfixes#
plot_spatial_efficiency#
Fixed a layout issue affecting multi-year datasets (7+ years):
The recent refactor to improve handling of short datasets (e.g., 1 year) broke the centering logic for longer timelines. Plot layout is now adaptive and visually consistent across all timescales.
Version: 4.10.2#
Date: 21/06/2025
Summary#
Introduction of new tests
Report_generator tests#
More tests have been created to bring the coverage of the Report_generator to an acceptable level.
Version: 4.10.1#
Date: 21/06/2025
Summary#
Minor release focused on general hotfixes and improved cross-platform behavior.
Hotfixes#
PDF Opening in WSL#
The Report_generator’s --open-report functionality has been reworked for better compatibility with WSL environments.
The routine now falls back to xdg-open or os.startfile alternatives, depending on platform detection. This ensures PDFs open correctly across Linux, Windows, and WSL instances when using the CLI flag --open-report.
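A platform check of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the Report_generator code: the function names are hypothetical, the "microsoft" substring test is a common heuristic for spotting WSL kernels, and `wslview` (from the wslu package) is only one possible way to hand a file to the Windows side.

```python
import os
import platform
import subprocess


def pick_opener(system=None, release=None):
    """Return "startfile", "wsl" or "xdg-open" for the given platform."""
    system = system or platform.system()
    release = release or platform.release()
    if system == "Windows":
        return "startfile"
    # WSL kernels usually report "microsoft" in their release string.
    if system == "Linux" and "microsoft" in release.lower():
        return "wsl"
    return "xdg-open"


def open_pdf(path):
    """Open a PDF with a viewer suited to the detected platform."""
    opener = pick_opener()
    if opener == "startfile":
        os.startfile(path)  # only exists on Windows
    elif opener == "wsl":
        # Assumed helper: wslview forwards the file to the Windows default app.
        subprocess.run(["wslview", str(path)], check=False)
    else:
        subprocess.run(["xdg-open", str(path)], check=False)
```

Separating the decision (`pick_opener`) from the side effect (`open_pdf`) keeps the detection logic testable on any machine.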
Taylor Diagram (Single Marker Bug)#
Fixed a rendering bug in the comprehensive_taylor_diagram plotting function:
When only a single marker is passed and overlay="on" is set, the native SkillMetrics library failed to place it correctly.
Now, a manual fallback mechanism adds the marker without using the looped overlay logic.
Version: 4.10.0#
Date: 21/06/2025
Summary#
Major release introducing a third submodule for automatic PDF report generation via CLI. Also includes CLI enhancements, helper utilities, and hotfixes across plotting and data handling functions.
Report generator CLI#
A command-line interface has been introduced to automatically generate a PDF report for model performance evaluation based on minimal input datasets.
Invoked via the GenerateReport entrypoint, this routine:
Accepts input via file paths or dictionaries
Runs SST/CHL-like analyses automatically (same ones proposed in the associated test cases)
Optionally compiles the results into a PDF (plots + summaries)
Always saves individual plots/dataframes, regardless of PDF output
Key CLI flags/options:
input: str or dict path(s) to data
--output-dir: optional output path
--check: validate structure only (no run)
--no-pdf: suppress PDF creation
--verbose: print run-time messages
--open-report: open PDF post-run
--variable, --unit: override plot labels
--no-banner: suppress ASCII banner
--version, --info: metadata display
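For orientation, the flags listed above map onto an argparse parser along these lines. This is a sketch assuming argparse is used; the defaults, help strings, and the `build_parser` name are illustrative, and `--version`/`--info` are left out because their output is not described here.

```python
import argparse


def build_parser():
    """Assemble a parser mirroring the documented GenerateReport flags."""
    parser = argparse.ArgumentParser(prog="GenerateReport")
    parser.add_argument("input", help="path(s) to the input data")
    parser.add_argument("--output-dir", default=None, help="optional output path")
    parser.add_argument("--check", action="store_true", help="validate structure only")
    parser.add_argument("--no-pdf", action="store_true", help="suppress PDF creation")
    parser.add_argument("--verbose", action="store_true", help="print run-time messages")
    parser.add_argument("--open-report", action="store_true", help="open PDF post-run")
    parser.add_argument("--variable", default=None, help="override the variable label")
    parser.add_argument("--unit", default=None, help="override the unit label")
    parser.add_argument("--no-banner", action="store_true", help="suppress ASCII banner")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example inert.
args = build_parser().parse_args(["data/", "--no-pdf", "--variable", "SST"])
print(args.no_pdf, args.variable, args.output_dir)
```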
Report submodule#
All functions and classes for PDF report composition have been moved into a standalone submodule.
This enables advanced users to programmatically generate custom reports.
Includes layout tools, content templates, and internal page managers
Covered by 3 new testing suites:
Report classes
Report functions
Full report generation flow
Report helper functions#
To support the new submodule, various utilities were added or improved:
file_io.py
find_file_with_keywords: auto-match filenames
select_3d_variable: extract usable DataArray from Dataset
time_utils.py
is_invalid_time_index: check for broken time series
ensure_datetime_index: enforce time indexing
prompt_for_datetime_index: prompt user to define one
utils.py
convert_dataarrays_in_df: safely convert 3D/2D DataArray to DataFrame
Additional input label normalization ensures all common synonyms (mod, sim, obs, sat, etc.) are interpreted correctly.
Hotfixes#
plot_spatial_efficiency:
Fixed handling of 1-column layouts
Improved colorbar/suptitle rendering
compute_fft and plot_spectral:
Now skip ZeroDivision instances gracefully
CHL/SST test cases:
Corrected geolocation error caused by inconsistent ocean masks
Future Work#
Next steps before final release:
Fully refactor the README.md with diagrams and working image links
Publish the public test-case dataset
General hotfixes
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.9.1#
Date: 13/06/2025
Summary#
Test case scripts have been updated to function as proper entrypoints, improving general usability.
Addition of flake8 linting.
New Entrypoints#
One of the final steps toward full project modularization has been completed:
The test case scripts — which illustrate example usages — have been refactored into callable entrypoints.
The updated names are as follows:
sst-analyze to call SST_data_analyzer.py
chl-analyze to call CHL_data_analyzer.py
bfm-analyze to call Benthic_Layer.py
data-setupper to call Data_reader_setupper.py
The installation of these new entrypoints is handled by the setup.py script and the pyproject.toml file.
Key changes:
Each script is now wrapped in a main() function, called explicitly at runtime
File paths are now relative to __file__ rather than the working directory (cwd), ensuring portability and reducing errors
Additional verbosity has been added to clarify script behavior and improve interpretability for users and contributors
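The pattern behind the first two changes can be sketched as below. The helper name `data_path` and the "Data" folder are placeholders chosen for the example; the actual scripts may organise their paths differently.

```python
from pathlib import Path


def data_path(script_file, *parts):
    """Build a path relative to the script's own folder, not the cwd."""
    return Path(script_file).resolve().parent.joinpath(*parts)


def main(script_file=None):
    """Entry point body; an installed console script would call this."""
    # In a real script the argument would simply be __file__.
    script_file = script_file or __file__
    dataset_dir = data_path(script_file, "Data")  # "Data" is a placeholder name
    print(f"Reading inputs from {dataset_dir}")
    return dataset_dir
```

Because the path is anchored to the script location, the entrypoint behaves the same whether it is launched from the repository root or from any other directory.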
Flake8#
To further enforce correct code syntax, a flake8 linting step has been added to the ci.yml file.
Future Work#
Next steps before final release:
Integrate argparse and build a __main__.py controller for the two primary modules
Fully refactor the README.md with diagrams and working image links
Publish the public test-case dataset
Final pass of module and import cleanup
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.9.0#
Date: 11/06/2025
Summary#
Major update introducing climate data analysis utilities, .json save support for model data, and enhanced control over line plots. Minor cleanup and test alignment also included.
Climate Data Analysis#
To broaden the analytical capabilities of the project, new statistical tools have been added to the stats_math_utils module, primarily aimed at climate and long-term signal analysis. Each function is equipped with Timer logging and proper test coverage.
Newly added functions include:
detrend_linear, detrend_poly_dim: lightweight linear and polynomial detrending
monthly_anomaly, yearly_anomaly, detrended_monthly_anomaly: anomaly isolators for different timescales
np_covariance, np_correlation, np_regression: NumPy-based correlation and regression metrics
extract_multidecadal_peak, extract_multidecadal_peaks_from_spectra: signal diagnostics from spectral power densities
identify_extreme_events: simple threshold-based extreme event detection
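To make two of these ideas concrete, here is a dependency-free sketch of linear detrending and threshold-based event detection. The function names and signatures are hypothetical and deliberately differ from the real stats_math_utils functions, which operate on array data and may handle time axes, NaNs, and thresholds differently.

```python
def detrend_linear_sketch(values):
    """Remove the least-squares straight line from an evenly spaced series."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    # Ordinary least-squares slope of the values against the sample index.
    slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope /= sum((t - t_mean) ** 2 for t in range(n))
    intercept = v_mean - slope * t_mean
    return [v - (slope * t + intercept) for t, v in enumerate(values)]


def extreme_event_indices(values, threshold):
    """Return the indices whose value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]
```

A perfectly linear series detrends to zeros, which makes a convenient sanity check before applying the idea to real anomalies.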
.json Model Save Support#
The save_model_data function now supports .json output alongside already existing file types.
Updated plot_line Logic#
To address incorrect NaN handling by seaborn, the plot_line function now supports two rendering engines:
matplotlib (default): maintains breaks in line plots for missing data
seaborn: can still be selected manually by setting library='sns', but note that it masks NaN values incorrectly in time series
This change ensures that all timeseries plots reflect missing data clearly and behave as expected under scientific studies.
Minor Cleanup#
Unified dataset and mask imports in the SST_data_analyser and CHL_data_analyser test scripts
All imports are now grouped at the top for clarity and to avoid repeated disk reads
Future Work#
With the exception of critical bugs or fixes, this is the final major feature release before introducing CLI tools and project packaging. Final steps include:
argparse integration and __main__.py setup
Setup of script entrypoints (test cases + main)
README.md refactor with updated figures and proper image paths
Final publication of the public test-case dataset
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.8#
Date: 11/06/2025
Summary#
This patch delivers hotfixes to the dense_water_timeseries function and its associated tests, along with proactive adjustments for upcoming matplotlib changes regarding colormap handling.
dense_water_timeseries Enhancements#
The dense_water_timeseries function has been enhanced with two key additions:
A savefig flag to enable conditional plot saving.
An output_path parameter to define where plots are saved.
These improvements support streamlined figure generation and automation. The associated test suite has been updated accordingly to validate the new behavior.
Colormap Futureproofing#
In preparation for the deprecation of the cm submodule in matplotlib version 3.11, colormap access has been refactored from
cmap = cm.get_cmap("plasma")
to
cmap = plt.colormaps["plasma"]
This change ensures compatibility with upcoming versions and affects:
bfm_plots.py
formatting.py
Known Limitations#
A few deprecation warnings remain when running the full test suite. These are not critical and are due to:
Legacy test data usage that will be cleaned up during the final documentation/testing phase.
Upcoming matplotlib changes with insufficient current documentation.
These warnings will be addressed once clearer upstream documentation is available and regression testing is complete.
Future Work#
Extend logging and timing to remaining modules (stats_math_utils, Data_saver)
Introduce climate data analysis tools.
Finalize README.md overhaul and fix broken image paths
Publish a clean, public test-case dataset for reproducibility
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.7#
Date: 10/06/2025
Summary#
Introduced logging and performance timing utilities to monitor function usage and track computational costs across the codebase.
Logging and Timing Utilities#
A new utility class, Timer, has been added to the time_utils.py submodule. This tool is designed to be integrated with all core functions to monitor execution duration and improve traceability.
Logging is now performed via two channels:
app.log: A traditional log file with human-readable messages and timestamps.
eliot.log: A structured .json log formatted for use with the eliot-tree tool, enabling users to visualize computation flow in a hierarchical tree structure.
Nearly all functions in the Processing submodule have been wrapped with the Timer, with the exceptions of:
stats_math_utils
Data_saver
These will be updated in upcoming patches after:
Logging is fully integrated into the remaining computational submodules.
Refactors to handle deprecations and new features (e.g., JSON export for model data in Data_saver and extended analysis in stats_math_utils) are completed.
The Timer and logging mechanisms are implemented without decorators, and instead integrated via manual indentation within while loops or inline blocks.
While decorators would have led to cleaner function definitions, they do not support fine-grained logging of internal steps. This trade-off prioritizes thorough logging over code brevity.
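The trade-off is easier to see in code. The sketch below assumes Timer behaves like a context manager that logs elapsed time; the real class in time_utils.py is not shown in this changelog, so its interface, the logger name, and the eliot integration may all differ.

```python
import logging
import time

logger = logging.getLogger("timer_sketch")


class Timer:
    """Context manager that logs how long the enclosed block took."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        logger.info("%s finished in %.4f s", self.label, self.elapsed)
        return False  # never swallow exceptions raised inside the block


def process(values):
    """Example function whose internal steps are timed individually."""
    with Timer("process"):
        with Timer("process.square"):  # inner step gets its own log entry
            squared = [v * v for v in values]
        with Timer("process.sum"):
            return sum(squared)
```

A decorator would only have produced the outer "process" entry; wrapping inner blocks by hand is what yields the per-step timings, at the cost of extra indentation.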
Future Work#
Climate data analysis: Finalize the new analytical functions.
Documentation overhaul: Complete the README.md rework and finalize all documentation elements.
Public test-case datasets: Ensure reproducibility by uploading a curated set of test data.
Extend logging utilities: Finalize integration of the logging/timing system across all remaining submodules.
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.6#
Date: 09/06/2025
Summary#
This minor patch finalizes the in-script documentation, introduces fixes to input validation logic, and updates related test scripts accordingly. These refinements ensure greater robustness and compatibility of the testing infrastructure.
Documentation Expansion#
The in-script documentation for Plots.py and bfm_plots.py has been completed, marking the conclusion of this phase. All currently implemented modules and functions now include:
Descriptive docstrings
Clear inline comments
Logical code block headers
Test Case Hotfixes#
Test case scripts, particularly those related to the CHL and SST data analyses, were failing due to stricter input validation introduced in earlier patches. These issues have now been addressed, and test executions pass as expected.
Input Validation & Logic Hotfixes#
The following corrections were made to resolve issues introduced by overly strict or incorrect validation logic:
compute_density_bottom
Adjusted threshold for Bmost validity from >=1 to >=0 to support surface-level datasets.
Corrected input validation: previously assumed input was a list during yearly iteration, now correctly handles input as a nested Dict[Year][Month].
calc_density
Removed incorrect check requiring the first dimension of temperature/salinity to match that of depths. This mismatch is expected in many real-world cases.
compute_dense_water_volume
Removed boolean-only check on mask3d, as masks may also contain numeric information (e.g., depth).
Associated tests were updated or removed accordingly, resulting in a negligible change to overall test coverage, now reaching 95.19%.
Future Work#
New feature set for climate data analysis.
Documentation overhaul including a repaired and updated README.md
Public release of full test-case datasets to ensure reproducibility
Logging and timing utilities to be added for performance profiling
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.5#
Date: 08/06/2025
Summary#
This minor patch continues test coverage expansion with two new suites focused on the plotting modules. Input validation has also been improved, particularly for output-related parameters.
Test Coverage#
Two new test suites have been added for the Plots.py and bfm_plots.py submodules, further enhancing overall test coverage.
With these additions, the project’s coverage has now reached 95.23%.
Note: due to the structure of the plotting functions and the extent of the test data used, the new test suites take a while to complete.
Hotfixes & Enhancements#
Input Validation Improvements#
Improved input validation for output_path, variable_label, and unit_label in plotting functions.
Replaced direct attribute access with getattr(options, 'output_path', None) to safely check for the presence of output_path and provide clearer error messages when missing.
Similar validation was added for variable_label and unit_label to avoid AttributeError when default values were not properly handled, especially when generating LaTeX-formatted labels.
Added clauses to skip empty data in the dense_water_timeseries function to avoid unnecessary plotting.
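The getattr-based check described above follows a common pattern, sketched here with a stand-in options object. The helper name, the use of SimpleNamespace, and the exact error type and message are assumptions made for the example.

```python
from types import SimpleNamespace


def resolve_output_path(options):
    """Fetch options.output_path, failing with a clear message if absent."""
    # getattr with a default avoids an AttributeError on a missing attribute.
    output_path = getattr(options, "output_path", None)
    if output_path is None:
        raise ValueError("output_path must be provided to save the plot")
    return output_path


print(resolve_output_path(SimpleNamespace(output_path="figures/")))
```

The same shape applies to variable_label and unit_label: look the attribute up with a default, then decide explicitly what a missing value means.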
Plot Behavior Consistency#
Reverted the interval used in plt.pause() to a hardcoded value to ensure consistent behavior during both test execution and interactive use.
This pause allows the user to confirm that plots are correctly generated before they are saved. However, for full plot inspection, users should refer to the saved image files in the specified output_path.
Future Work#
Introduce new feature set for climate data analysis.
Complete a full documentation overhaul and repair the broken README.md
Upload the full test-case dataset to support reproducibility and public testing
Begin adding logging and timing utilities to evaluate and profile performance
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.4#
Date: 07/06/2025
Summary#
Extended coverage of testing script to Taylor and Target plotting functions.
Testing expansion#
New test units have been created for the Taylor and Target plotting functions, making progress toward full project coverage.
Current coverage for these test suites is greater than 90%.
Current overall testing coverage is
Future Work#
Finalize tests for all remaining plotting scripts (general plotting functions and bfm specific plots)
Introduce new feature set for climate data analysis.
Complete a full documentation overhaul and repair the broken README.md
Upload the full test-case dataset to support reproducibility and public testing
Begin adding logging and timing utilities to evaluate and profile performance
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.3#
Date: 07/06/2025
Summary#
This minor patch expands test coverage, removes legacy files, and introduces a new utility for numeric data validation.
Test Coverage#
All scripts in the Processing submodule have now reached >90% test coverage.
Contrary to earlier statements, new test modules have also been added for both model and satellite data reading functions. While these are still tailored to specific test-case datasets, their inclusion improves confidence and completeness in the test suite.
Deprecation of Legacy Files#
As part of a general repository cleanup, the Costants.py file has been removed. All constants previously defined in this module are now obsolete and unused by the current version of the codebase.
New Utility: Numeric Data Checker#
A new utility function, check_numeric_data, has been added to the utils.py submodule.
This function checks whether inputs are valid numeric types (e.g., integers, floats, NumPy arrays), which is essential for computational routines that assume numeric-only input.
Users are encouraged to validate their data with this function when unsure before using processing functions that rely on numeric inputs.
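As a rough idea of what such a check involves, the sketch below validates plain Python numbers and nested lists or tuples. It is not the check_numeric_data implementation: the name `is_numeric` is hypothetical, NumPy arrays are not covered, and the choice to reject booleans and empty containers is an assumption.

```python
from numbers import Number


def is_numeric(data):
    """True for numbers and for (nested) lists/tuples containing only numbers."""
    if isinstance(data, bool):
        return False  # bool subclasses int; treated as non-numeric here
    if isinstance(data, Number):
        return True
    if isinstance(data, (list, tuple)):
        return len(data) > 0 and all(is_numeric(item) for item in data)
    return False
```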
Future Work#
Finalize tests for all remaining plotting scripts
Introduce new feature set for climate data analysis.
Complete a full documentation overhaul and repair the broken README.md
Upload the full test-case dataset to support reproducibility and public testing
Begin adding logging and timing utilities to evaluate and profile performance
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.2#
Date: 06/06/2025
Summary#
This minor patch establishes groundwork for connecting the repository to multiple third-party tools aimed at enhancing documentation, code quality, and testing infrastructure. It also includes general hotfixes to both test suites and associated functions.
Repository Enhancements#
Building on the previously implemented Continuous Integration with Codecov, this update initializes support for several additional tools:
Sphinx and ReadTheDocs
These tools have been integrated to generate an HTML-based project documentation site. The initial build includes:
README contents
CHANGELOG
Requirements
Test case descriptions
As part of this integration, the CHANGELOG.md file has been relocated to the docs/ folder, along with a copy of the requirements.txt and all example images previously used in the README.md.
This causes the current root README.md to appear broken. This will be resolved during the final documentation overhaul near project completion.
Codacy and Codebeat
These platforms have been linked to provide automated, objective feedback on code quality, including complexity metrics and maintainability suggestions.
The setup.py has also been expanded to encompass more general information.
Test Coverage Update#
Codecov-reported test coverage has increased to 59.91%. While the global improvement is modest, several key modules now exceed 90% coverage, including:
Efficiency_metrics
Data_alignment
Data_saver
file_io
BFM_data_reader
time_utils
Taylor_computations
Missing_data
stats_math_utils
utils
These improvements are primarily due to expanded testing targeting input validation and edge cases.
General Enhancements#
All Processing modules that are currently tested have been updated with:
Enhanced docstrings and inline comments
Stricter input validation
Clearer error messages for robustness
Hotfixes#
Removed a partial duplication of the convert_to_serialization function
Updated relevant test cases to align with the revised logic
In test_efficiency, redundant test dictionaries were moved into reusable setup functions to reduce complexity and improve readability
Future Works#
Patch and finalize all remaining untested or partially tested scripts
Introduce a new feature set for climate data analysis.
Complete documentation overhaul and fix the broken README.md
Upload the public test-case dataset to support reproducibility
Planned project deadline: June 22, 2025 (subject to change)
Version: 4.8.1#
Date: 06/06/2025
Summary#
This minor patch finalizes the in-script documentation and completes input validation integration for all Processing submodules that have already undergone testing. These improvements enhance code clarity, safety, and maintainability.
Hotfixes#
Updated the following test scripts to accommodate recent input validation changes:
test_target_computations
test_taylor_computations
test_time_utils
test_utils
Version: 4.8.0#
Date: 06/06/2025
Summary#
This release introduces continuous integration (CI) with Codecov-linked testing, expands installation tools, and includes multiple hotfixes across test cases to ensure compatibility and consistency.
Codecov Integration#
To improve test quality and tracking, the Codecov tool has been integrated with the GitHub repository, offering automated test coverage analysis. The current test coverage is approximately 57%, primarily limited by missing tests for data readers and plotting routines.
Target coverage: 85–90% across all tested modules.
Current coverage highlights:
formatting.py: 71%
BFM_data_reader.py: 80%
Data_saver.py: 76%
Density.py: 63%
file_io.py: 82%
Installation Tools#
Multiple files have been added or enhanced to support proper installation of the package:
requirements.txt
pyproject.toml
setup.py
MANIFEST.in
Entrypoints have been initialized, with plans to enable:
Running test case scripts as CLI entry points.
Executing main processing/plotting functions through command-line interfaces in future releases.
Hotfixes & Documentation#
Expanded docstrings, inline comments, and input validation across modules.
Fixed issues in tests involving matlab.engine to align with CI and ensure accurate coverage tracking under Codecov.
Future Work#
Finalize in-script documentation across all modules.
Extend test suite to improve test coverage to >85%, especially in:
Plotting functions
Satellite/model data readers
Remaining utility modules
Version: 4.7.1#
Date: 05/06/2025
Summary#
This patch introduces several hotfixes along with expanded documentation and enhanced input validation for the Density.py module.
Enhancements#
Expanded inline comments and docstrings, and added input validation in Density.py for improved readability and reliability.
Hotfixes#
Adjusted dummy_Bmost in tests to use 0-based indexing instead of 1-based.
Updated test_get_common_series_by_year_month_invalid_structure to raise a TypeError (a formal fix to reflect the expected behavior).
Removed the 12-month validation check in data_alignment.py functions to allow compatibility with smaller datasets.
Version: 4.7.0#
Date: 05/06/2025
Summary#
The Data_saver.py script has been updated to improve clarity, robustness, and flexibility for users.
Enhancements#
Expanded documentation with improved docstrings, inline comments, and input validation to ensure safer use.
Two new utility functions added to support saving data in .json format, broadening the script’s export capabilities:
convert_to_serializable: Converts standard Python objects (including NumPy arrays and datetime types) into JSON-conformant formats.
save_variable_to_json: Handles the actual save operation to a .json file.
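The division of labour between the two helpers can be illustrated with the standard json module. The names below are hypothetical stand-ins, not the Data_saver.py functions, and NumPy objects are handled by duck-typing on `tolist()` so that the sketch needs no third-party import.

```python
import json
from datetime import date, datetime


def to_serializable(obj):
    """Fallback for json.dump: convert objects the encoder cannot handle."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # NumPy arrays and scalars expose tolist(); duck-typing avoids the import.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def save_to_json(variable, path):
    """Write a variable to a .json file, converting awkward types on the way."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(variable, handle, default=to_serializable, indent=2)
```

Passing the converter through `default=` means it is only consulted for objects the encoder rejects, so ordinary dicts, lists, and numbers are written unchanged.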
Version: 4.6.3#
Date: 05/06/2025
Summary#
The data_alignment.py script has been updated with more extensive documentation and input validation enhancements.
Enhancements#
Expanded and clarified all docstrings to better describe function behavior and expected inputs.
Added comprehensive inline comments to improve code readability and maintainability.
Implemented general input validation across all functions to ensure proper usage and clearer error reporting.
Version: 4.6.2#
Date: 05/06/2025
Summary#
The BFM_data_reader.py submodule, along with its corresponding pytest, has been updated, cleaned, and enhanced with improved documentation and validation.
Documentation Expansion#
Performed a comprehensive rework of BFM_data_reader.py, including:
Expanded and clarified docstrings.
Improved inline comments for better maintainability and readability.
Extensive input validation checks to ensure robustness and user feedback.
Updated test_bfm_data_reader.py to align with the new input validation logic and avoid testing conflicts.
Version: 4.6.1#
Date: 05/06/2025
Summary#
This hotfix keeps the tests aligned with recent functional updates and improves robustness through better input validation.
Hotfixes#
Updated the following test modules to reflect recent changes:
test_efficiency_metrics.py
test_stats_math_utils.py
test_utils.py
Minor fixes were applied to associated functions to:
Ensure consistent input validation
Improve edge case handling
Maintain compatibility with expanded testing coverage
Version: 4.6.0#
Date: 04/06/2025
Summary#
This update introduces a new section focused on temporal and spectral analysis of error components (mean bias, unbiased RMSE, standard deviation error, and cross-correlation). Additionally, it fixes a critical issue in the standard_deviation_error computation.
Temporal Analysis#
The new functionality computes the time evolution of key error metrics using 2D daily mean datasets. These time series are then correlated with cloud coverage percentages to explore how cloud cover affects model performance. This analysis is handled via:
compute_error_timeseries
compute_stats_single_time
Both are located in the Efficiency_metrics.py submodule and supported by statistical tools from stats_math_utils.py.
New plotting utilities have been introduced to visualize these insights.
Spectral Analysis#
Two types of spectral analysis have been introduced:
Power Spectral Density (PSD)
Cross Spectral Density (CSD)
These help identify dominant temporal frequencies and assess relationships between error signals and cloud cover.
All related computations are located in stats_math_utils.py.
Hotfix#
Fixed incorrect computation logic in the standard_deviation_error function, which previously returned incorrect values.
Future Works#
Add test coverage for new functions introduced in this update.
Continue expanding and refining in-script documentation for improved clarity.
Rework and expand the README.md for clearer user guidance.
Complete the packaging setup:
AUTHORS
MANIFEST.in
pyproject.toml
requirements.txt
Optional environment.yml
Version: 4.5.0#
Date: 04/06/2025
Summary#
This update introduces a complete pytest suite to validate the project’s functionality and ensure stability. It also includes several minor hotfixes to align behavior with intended design and pass the newly added tests.
Pytest implementation#
A new Pytests folder has been added under the Test cases directory. This folder contains all the pytest scripts created to test nearly all implemented functions across the project. Each test includes a concise comment explaining its purpose and additional inline documentation clarifying the logic and expected results.
NOTE: Pytests were not added for the model/satellite data loading modules and the plotting functions due to the complexity of mocking paths and structured datasets. These components are already validated through the main test-case scripts (SST, CHL, Benthic). However, helper functions used within those modules are fully covered; e.g., plotting helpers are tested in test_formatting, and file-related logic is covered in test_file_io.
Hotfixes#
Several functions were hotfixed to:
Restore correct behavior where recent changes introduced inconsistencies.
Ensure compatibility and accuracy in alignment with test coverage.
Standardize outputs and edge-case handling across the board.
Future Works#
Expand and refine in-script documentation for improved clarity and maintainability.
Rework and expand the README.md to better guide users through installation and usage.
Implement remaining files required for complete user installation: AUTHORS, MANIFEST.in, pyproject.toml, requirements.txt, and optional environment.yml.
Introduce a more detailed error decomposition in the temporal analysis (timeseries).
Begin exploration of frequency-domain error analysis using Fast Fourier Transform (FFT).
Version: 4.4.0#
Date: 02/06/2025
Summary#
This minor release introduces cloud coverage analysis and improves the spatial efficiency plotting system. It also formalizes the resampling process into a reusable function for streamlined time-based analysis.
New Feature: Cloud Coverage Timeseries#
The % of cloud cover over the basin is now plotted alongside the temporal bias in a dedicated timeseries figure.
The plotting function has been updated to split the output into two plots:
Plot 1: Standard timeseries metrics (e.g., observed vs. modeled values).
Plot 2: BIAS (moved from Plot 1) and the new cloud coverage percentage, visualized using:
Raw data,
7-day running mean,
30-day running mean.
Pearson correlation coefficients are computed and printed for each version using the BIAS and the cloud coverage.
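The correlation step can be sketched without any dependencies, as below. The function names are illustrative, and the trailing (rather than centered) window is an assumption; the project's own smoothing may be aligned differently.

```python
from math import sqrt


def running_mean(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

Applying `running_mean` with the same window to both the BIAS and the cloud-cover series before calling `pearson` keeps the two smoothed series aligned in length.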
Resampling Function#
The previously inline resampling logic has been extracted into a standalone function.
This function is now part of the time_utils.py submodule and supports consistent and efficient time aggregation for monthly and yearly analyses.
Spatial Efficiency Plot Enhancements#
Subplot label alignment and positioning have been adjusted to improve readability and reduce clutter.
All previously hardcoded configuration values (e.g., vmin/vmax, colormap, layout) have been moved into a default options file for easier maintenance and customization.
Colorbar unit labels now support LaTeX-style formatting, improving clarity and visual consistency across plots.
Future works#
Unless a further expansion of the project is required, future patches will focus on improving the documentation by adding extensive comments to the existing functions.
Version: 4.3.2#
Date: 02/06/2025
Summary#
This patch extends the spatial performance evaluation capabilities to include yearly average datasets, alongside enhancements to plotting, code robustness, and minor hotfixes.
Yearly Spatial Performance#
The compute_spatial_efficiency function has been adapted to support yearly performance evaluation, in addition to the existing monthly metrics.
The SST_data_analyser.py and CHL_data_analyser.py scripts now include examples for yearly spatial analysis.
Supporting functions in stats_math_utils.py were modified to accept flexible time input handling via a new argument.
Result dictionaries have been expanded to structure yearly results for future use and reference.
While the data is stored, plotting remains the primary method of interpreting 2D spatial performance outputs.
Enhanced Plotting#
The plotting function has been upgraded to dynamically adapt to the number of metrics:
The figure layout now prefers a square-like shape (2×N for ≤4 plots, 3×N for >4 plots).
Rows with fewer plots than the max are center-aligned for improved visual balance.
Added fail-safes in filename generation to replace illegal path characters (e.g., / in metric titles).
Improved overall layout and clarity, with better visual spacing and text placement.
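One possible reading of the layout rule above is sketched below. It assumes that "2×N" and "3×N" refer to the number of columns; the changelog does not state this explicitly. `grid_shape` and `centered_offsets` are hypothetical names, not functions from the package.

```python
from math import ceil


def grid_shape(n_plots):
    """Return (n_rows, n_cols): 2 columns up to 4 plots, 3 columns beyond."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    n_cols = 2 if n_plots <= 4 else 3
    return ceil(n_plots / n_cols), n_cols


def centered_offsets(n_plots):
    """Horizontal shift, in column units, that centers each row."""
    n_rows, n_cols = grid_shape(n_plots)
    offsets = []
    for row in range(n_rows):
        # The last row may hold fewer plots than the grid width.
        in_row = min(n_cols, n_plots - row * n_cols)
        offsets.append((n_cols - in_row) / 2)
    return offsets
```

For five metrics this gives a 2×3 grid whose second row is shifted by half a column, so its two plots sit centered under the three above.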
Hotfixes#
Removed incorrect (°C) label from CHL_data_analyser.py, left over from SST-related code.
Resolved a bug in plt.savefig() where slashes (/) in titles were misinterpreted as folder paths, breaking the save process.
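A fail-safe of this kind could look like the sketch below. The helper name `safe_filename` and the exact set of replaced characters are assumptions; the changelog only mentions the slash.

```python
import re

# Characters that are unsafe in file names on common operating systems.
_ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, replacement="_"):
    """Replace path-breaking characters so the title can be used as a file name."""
    cleaned = _ILLEGAL_CHARS.sub(replacement, title).strip()
    return cleaned or "untitled"
```

A title such as `Bias (mg/m3)` then becomes `Bias (mg_m3)` and no longer reads as a nested folder when passed to the save call.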
Future Work#
Planned improvements for the next patch include:
Externalizing all plot_spatial_efficiency configuration options to a dedicated default file.
Delegating unit label formatting to the format_unit() function with full LaTeX support.
Refactoring the monthly resampling code block into a standalone reusable function.
Introducing cloud coverage timeseries plots.
Separating the timeseries BIAS plot from the current gridspec (gs) layout to align it with cloud coverage analysis.
Version: 4.3.1#
Date: 01/06/2025
Summary#
Hotfix release addressing issues in value range handling for spatial metrics plots, along with improvements to the visualization layout and inline documentation for test scripts.
Plot Enhancements#
Subplot layout adjusted to 3 columns × 4 rows (previously 4×3) for improved readability.
Added coordinate labels and grid lines to maps; improved geolocation accuracy.
Colorbars repositioned to the bottom of the page for consistency across metrics.
Introduced a custom colorbar for the cross-correlation metric.
Enhanced plot text rendering for better clarity.
Added support for saving plots to a specified output directory.
Hotfixes#
Resolved an issue where vmin and vmax values were not correctly propagated to the plotting function, resulting in inconsistent value ranges across plots.
Test Case Updates#
Added explanatory comments to the SST_data_analyser.py and CHL_data_analyser.py scripts to improve readability and ease of use.
NOTE – Commit 797b3ab:
An earlier commit message incorrectly stated the removal of an import. In reality, the update introduced a higher-resolution coastline and adjusted the land mask facecolor in map plots. Future commit messages will be reviewed more carefully to avoid misreporting changes.
Version: 4.3.0#
Date: 01/06/2025
Summary#
This version introduces a new section of the project focused on spatial performance analysis for the Sea Surface Temperature (SST) and Chlorophyll (CHL) fields. The SST_data_analyser.py and CHL_data_analyser.py test scripts have been expanded to demonstrate the new workflows.
Spatial Performance Analysis#
As a natural evolution of the project, spatial performance evaluation has been added to complement the existing temporal analysis. This feature is specifically designed for physical parameters retrieved from NEMO simulations, namely SST and CHL.
Dataset Structure#
Each metric is designed to support two types of datasets:
Monthly Composite Averages (e.g., all Januaries, all Februaries, etc.)
Yearly Averages (e.g., 2000, 2001, etc.)
These datasets are created by resampling the interpolated daily outputs into regular monthly values, which are then used to build the two dataset types listed above.
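The aggregation chain (daily to monthly, then monthly to composites or yearly averages) can be illustrated with scalar values in plain Python. This is only a sketch: all three function names are hypothetical, the real workflow operates on gridded fields, and taking the yearly average as the unweighted mean of the monthly means is an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def daily_to_monthly(daily):
    """Map {date: value} to {(year, month): monthly mean}."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    return {key: fmean(vals) for key, vals in sorted(groups.items())}


def monthly_composites(monthly):
    """Average all Januaries together, all Februaries together, and so on."""
    groups = defaultdict(list)
    for (_, month), value in monthly.items():
        groups[month].append(value)
    return {month: fmean(vals) for month, vals in sorted(groups.items())}


def yearly_averages(monthly):
    """Average the monthly means of each year."""
    groups = defaultdict(list)
    for (year, _), value in monthly.items():
        groups[year].append(value)
    return {year: fmean(vals) for year, vals in sorted(groups.items())}


daily = {
    date(2000, 1, 1): 1.0,
    date(2000, 1, 2): 3.0,
    date(2000, 2, 1): 4.0,
    date(2001, 1, 1): 6.0,
}
monthly = daily_to_monthly(daily)
```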
Resampling Methods#
Three approaches (two primary) are supported for generating the monthly datasets:
Xarray + ThreadPoolExecutor
Uses xarray.resample with chunking and multithreading for in-memory operations.
Currently a standalone code block in test scripts; planned for functional integration in future updates.
CDO-Based Resampling
If installed, Climate Data Operators (CDO) can be used via direct command-line calls from the scripts.
Detailed documentation for this option will be included in the upcoming 5.0 README update.
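A command-line call to CDO from Python might be wrapped as in the sketch below. It uses CDO's `monmean` operator for monthly means; the wrapper names and the file names in the usage note are placeholders, and the project's actual invocation may differ.

```python
import shutil
import subprocess


def monthly_mean_command(infile, outfile):
    """Build the CDO call; `monmean` computes monthly means."""
    return ["cdo", "monmean", str(infile), str(outfile)]


def run_monthly_mean(infile, outfile):
    """Run CDO if it is installed, otherwise fail with a clear message."""
    if shutil.which("cdo") is None:
        raise RuntimeError("CDO executable not found on PATH")
    subprocess.run(monthly_mean_command(infile, outfile), check=True)
```

Calling `run_monthly_mean("daily.nc", "monthly.nc")` would then execute `cdo monmean daily.nc monthly.nc`. The availability check lets a script fall back to the xarray approach when CDO is missing.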
NOTE:
Monthly datasets are required for this new spatial analysis section.
To ensure full usability of the test case scripts without requiring CDO installation, a precomputed monthly dataset is included in the test case data/ folder. Skipping this step will prevent the spatial analysis from running as intended.
Efficiency Metrics#
Five metrics are available for spatial performance evaluation:
Mean Bias
Standard Deviation Error
Raw Standard Deviation
Cross-Correlation
Unbiased RMSE
These can be visualized in 3×4 multi-panel plots.
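The changelog does not give the formulas, so the sketch below uses common textbook definitions for one grid cell, computed over its time series. The function name, the dictionary keys, and the use of the population standard deviation are assumptions. Both standard deviations are returned because the text does not say which one "Raw Standard Deviation" refers to.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    """Population standard deviation."""
    m = _mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def spatial_metrics(model, reference):
    """Metrics for one grid cell from two equally long time series."""
    m_mean, r_mean = _mean(model), _mean(reference)
    m_anom = [v - m_mean for v in model]
    r_anom = [v - r_mean for v in reference]
    m_std, r_std = _std(model), _std(reference)
    covariance = _mean([a * b for a, b in zip(m_anom, r_anom)])
    return {
        "mean_bias": m_mean - r_mean,
        "std_error": m_std - r_std,
        "std_model": m_std,
        "std_reference": r_std,
        "cross_correlation": covariance / (m_std * r_std),
        # RMSE of the anomalies, i.e. with the mean bias removed.
        "unbiased_rmse": sqrt(
            _mean([(a - b) ** 2 for a, b in zip(m_anom, r_anom)])
        ),
    }
```

Applying the function to every cell of the grid yields one 2D map per metric, which is the kind of output the multi-panel plots display.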
Future Work#
Integration of yearly spatial performance evaluation
Enhanced plotting: improved colormaps and presentation
Rework of cloud coverage timeseries and timeseries() function
Final cleanup pass for docstrings and comments
CLI support with argparse, structured logging, and function-level testing
Version: 4.2.9#
Date: 30/05/2025
Summary#
This patch addresses a series of critical issues introduced in previous updates. All hotfixes have been applied to restore functionality.
Hotfixes#
Benthic_physical_plots:
Fixed mismatch in the number of values extracted from get_benthic_plot_parameters (increased from 7 to 8).
get_benthic_plot_parameters:
Corrected swifs usage for the O2o field which was incorrectly set to True, now set to False.
fill_anular_region:
Reverted from polygon back to ax.fill to ensure compatibility with Cartesian-based SkillMetrics plots.
Data_saver.py:
Fixed import errors caused by typos in typing and pathlib imports.
MOD_data_reader and SAT_data_reader:
Resolved typo in the typing import statement.
eliminate_empty_field:
Fixed an issue with NaN slicing. A warning is currently ignored, as it does not affect output and will be addressed in a future update.
check_missing_days:
Corrected the printed start year in timeline messages (from 1988 to 2000). This was a display-only issue and did not affect the underlying data.
get_season_mask:
Fixed an error that occurred when attempting to convert an already-existing array, preventing unnecessary conversion.
Validation#
All test case example scripts executed successfully, and no additional runtime errors were detected.
Version: 4.2.8#
Date: 30/05/2025
Summary#
This patch continues the series of reworks focused on improving code clarity, stability, and performance, specifically targeting the formatting.py submodule and related functions.
General Enhancements#
Refactored – formatting.py:
All functions in the submodule have been cleaned up and optimized for improved readability and reliability.
Function Updates:
format_unit: Further optimized to ensure accurate and consistent label generation.
get_variable_label_unit: Now depends on format_unit for standardized output.
fill_anular_region: Reworked to use polygon instead of fill for target plot rendering.
get_min_max_for_identity_line: Vectorized to enhance computational performance.
compute_geolocalised_coords: Updated to use np.arange for faster coordinate computation.
swifs_colormap: Rewritten for clarity, with improved handling of default cases and added Typing for return values.
get_benthic_plot_parameters: Cleaned up and typed for better readability and stability.
Future Work#
Note:
This update concludes the current patch cycle aimed at optimizing and cleaning up existing code. The spatial performance functions will be implemented in the following patches.
The next major revision will include:
Final pass for comment and docstring consistency
Integration of the argparse library for CLI support
Structured logging implementation
Unit testing of main computation functions
Version: 4.2.7#
Date: 30/05/2025
Summary#
This release includes a refactoring of submodule functions to enhance code clarity, stability, and maintainability, alongside minor performance optimizations.
General Enhancements#
Refactoring – stats_math_utils.py:
Cleaned and reorganized function implementations for improved stability and readability.
Variable names have been updated to remove assumptions tied to specific data sources (e.g., model vs. satellite).
Comments and docstrings have been revised to improve clarity and documentation quality.
Optimization – compute_coverage_stats:
Slight improvements to input handling for better robustness and reduced likelihood of runtime errors.
Cleanup – time_utils.py:
Function declarations have been cleaned and input validation messages rewritten for clarity.
Optimizations were applied to reduce nested loops and improve performance.
Improvements – utils.py:
Enhanced error messages and streamlined function parameters.
Refactored internal logic to eliminate unnecessary nested loops and improve efficiency.
Version: 4.2.6#
Date: 30/05/2025
Summary#
This release includes cleanups and performance improvements for data reading modules, enhancing readability, reliability, and execution speed.
General Enhancements#
Refactoring – MOD_data_reader.py and SAT_data_reader.py:
Functions in these submodules have been refactored to improve code clarity and reduce computational overhead.
Input Validation Improvements:
Replaced outdated assert statements with explicit RaiseErrors for more robust input validation and clearer error handling.
Version: 4.2.5#
Date: 30/05/2025
Summary#
This release provides cleanup and performance improvements in the Efficiency_metrics.py submodule, as well as bug fixes affecting test cases in Benthic_layer.py.
General Enhancements#
Refactoring – Efficiency_metrics.py:
Functions within this submodule have been cleaned up for improved readability and maintainability.
Optimization efforts targeted reducing computational time, particularly by minimizing nested loops.
Bug Fixes#
Benthic_physical_plot Bug:
Fixed an issue where an incorrect number of variables was extracted during the execution of the get_benthic_plot_parameters helper function.
compute_dense_water_volume Bug:
Resolved a problem where the valid_mask parameter was not being passed correctly to the calc_density helper function.
Version: 4.2.4#
Date: 29/05/2025
Summary#
This release focuses on improved documentation and minor optimizations within the file_io.py module.
General Enhancements#
Expanded Docstrings:
Function docstrings within file_io.py have been updated to provide clearer, more exhaustive information for end users and developers.
mask_reader Optimization:
Refactored to use slice instead of squeeze for removing extra dimensions, ensuring safer array handling.
call_interpolator Cleanup:
General code cleanup and minor optimizations to improve readability and maintainability.
Version: 4.2.3#
Date: 29/05/2025
Summary#
This update refactors the Density.py submodule to enhance performance, maintainability, and numerical stability in density-related computations.
Enhancements#
Code Cleanup and Typing Improvements:
Functions within the Density.py submodule have undergone general cleanup. Input types are now explicitly defined using typing annotations, and docstrings have been expanded for improved clarity and documentation.
NaN Value Handling:
A general masking step has been added to the calc_density function to filter out NaN values during computation, increasing reliability.
Dense Water Mass Computation#
Function Rework – compute_dense_water_volume:
The function has been restructured to utilize existing utility functions for reading .gz datasets, promoting code reuse and modularity.
Loop Optimization:
Deprecated nested for loops have been replaced with efficient NumPy operations such as where and broadcast, significantly improving performance.
Note:
Since the recent changes primarily involve refactoring, minor fixes, and performance optimizations, with no major new features, the versioning has been adjusted to reflect these as patch-level updates. The old versions 4.3.0 and 4.4.0 have been renamed to 4.2.1 and 4.2.2.
Version: 4.2.2#
Date: 29/05/2025
Summary#
This patch includes refactoring of the Data_saver.py submodule to enhance performance, stability, and input flexibility.
Enhancements#
Refactored Data Saving Functions:
Functions in the Data_saver.py submodule have been optimized for improved path handling. File paths can now be provided as either Path objects or str, offering greater flexibility.
Improved Error Handling:
Replaced legacy assert statements with explicit RaiseErrors to provide clearer and more reliable error reporting.
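Accepting both path types usually comes down to normalizing the argument once at the top of the function. The helper below is a hypothetical sketch of that pattern and is not taken from Data_saver.py.

```python
from pathlib import Path
from typing import Union


def resolve_output_dir(output_path: Union[str, Path]) -> Path:
    """Accept a str or Path and always hand back a Path."""
    if not isinstance(output_path, (str, Path)):
        raise TypeError(
            f"output_path must be str or Path, got {type(output_path).__name__}"
        )
    return Path(output_path).expanduser()
```

Downstream code can then rely on `Path` methods such as `mkdir` or the `/` operator regardless of what the caller supplied.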
Version: 4.2.1#
Date: 29/05/2025
Summary#
This update focuses on performance and reliability improvements within the data_alignment.py submodule.
Enhancements#
Refactored apply_3d_mask Function:
The apply_3d_mask function has been reworked to simplify broadcasting across datasets, improving clarity and maintainability.
Improved Input Validation:
Added RaiseErrors to enforce correct input types and shapes, ensuring more robust error handling and early failure detection.
Version: 4.2.0#
Date: 29/05/2025
Summary#
This release introduces a new computation feature: the percentage of cloud coverage and the percentage of available data within a basin. These enhancements lay the groundwork for more detailed analyses in future updates.
New Features#
Coverage Statistics Calculation:
A new function, compute_coverage_stats, has been added to the stats_math_utils.py submodule. It calculates:
Percentage of available data in a basin
Percentage of cloud coverage
These metrics are intended for future visualization alongside time series data plotted using the timeseries function.
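The idea behind the two percentages can be illustrated for a single 2D field. This sketch is not the signature of compute_coverage_stats: the function name is hypothetical, and it assumes that missing cells inside the basin are NaN (or None) and that every missing cell counts as cloud.

```python
from math import isnan


def coverage_percentages(field, basin_mask):
    """Return (available %, cloud %) for the cells inside the basin."""
    basin_cells = 0
    valid_cells = 0
    for row, mask_row in zip(field, basin_mask):
        for value, in_basin in zip(row, mask_row):
            if not in_basin:
                continue
            basin_cells += 1
            if value is not None and not isnan(value):
                valid_cells += 1
    if basin_cells == 0:
        raise ValueError("basin_mask selects no cells")
    available = 100.0 * valid_cells / basin_cells
    return available, 100.0 - available
```

Evaluating this for each day produces the two time series that can later be plotted next to the other timeseries metrics.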
Fixes & Improvements#
Dependency Cleanup:
Removed deprecated libraries and modules from the SST_data_analyser.py test script.
Bug Fix – Dataset Selection:
Resolved an issue in the read_model_data function that caused incorrect dataset selection under certain conditions.
Version: 4.1.0#
Date: 28/05/2025
Summary#
Version 4.1.0 introduces a full refactor of the Data_reader_setupper.py test case, integrating the Interpolator_v2.m MATLAB script and consolidating data reading and saving functions for both satellite and model datasets. This release also marks the formal deprecation of L4 data handling, shifting all analysis to L3s datasets.
Major Changes#
Data_reader_setupper.py#
Complete refactor of the test case script
Integrated Interpolator_v2.m using matlab.engine
Unified data reading and saving logic for satellite and model datasets
All previously used reader functions are now deprecated
File management is critical: avoid moving files while the interpolator is running
Unified Data Reading Functions#
MOD_data_reader.py & SAT_data_reader.py#
Each script now contains a single function for reading model or satellite datasets
Handles both chl and sst variables via the new varname argument
Future work will focus on improved robustness for irregular key names (e.g., adjusted_sea_surface_temperature in CMEMS)
Missing_data.py#
Rewritten to eliminate dependency on external constants
Now runs autonomously and prepares for future optimizations
Data_saver.py#
Merged functions into dedicated save_model_data() and save_satellite_data() routines
Simplifies saving of processed and interpolated data
Interpolator_v2.m Integration#
Executed from Python using matlab.engine
New helper: call_interpolator()
File structure and paths must remain unchanged during execution
setup.py and MANIFEST.in updated to include required MATLAB files during installation
Deprecations#
L4 Data: Now officially deprecated. Support removed from both CHL_data_analyzer.py and SST_data_analyzer.py
Legacy Reader Functions: Replaced by new unified readers
Minor Fixes & Adjustments#
Adjusted file names and dictionary keys in:
CHL_data_analyzer.py
SST_data_analyzer.py
Minor typo corrections in Benthic_layer.py
Fixed label positioning in Taylor_diagrams.py monthly plot function
Future Work#
Further optimization and cleanup of Data_reader_setupper.py functions
General cleanup of unused files in the GitHub repository
Begin development of spatial performance analysis modules
Version: 4.0.0 - OFFICIAL RELEASE#
Date: 25/05/2025
Summary#
This marks the official release of version 4.0.0. The Benthic_layer.py test case and all related BFM function scripts have been fully overhauled and are now operational. Additionally, setup.py has been updated to include all necessary dependencies for full functionality.
Benthic_layer.py Functionality#
While the high-level purpose of the Benthic_layer.py script remains the same, its underlying functions have been entirely restructured. All previous versions are now deprecated.
Modular BFM Function Scripts#
All functions related to BFM simulation variable handling are now organized in a dedicated module folder. This improves clarity, reuse, and integration with the rest of the project. Below is a breakdown of the updated logic and implementations.
Geolocalization#
Introduced geo_coords function to convert raw Cartesian coordinates into geodetic coordinates (longitude, latitude) and Eulerian angles (φ and λ).
Uses input horizontal resolution in degrees.
Bottom Layer Computation#
New functions: compute_Bmost() and compute_Bleast() to extract benthic and surface layers, respectively.
Visualization enhancements include:
deep colormap from cmocean
Optional 3D rendering using Plotly (surface and 3dmesh), paving the way for advanced deep water mass analysis.
Temperature and Salinity#
Functions now support parallel loading for improved performance.
Datasets are cached for reuse in subsequent density calculations.
Plotting options externalized for customization.
Colormaps:
Temperature: cmocean.thermal
Salinity: cmocean.haline
Density#
Computation logic extracted into a dedicated function.
EOS-80 remains the default method, aligning with simulation standards.
Optional methods retained for future flexibility.
Plotting colormap: cmocean.dense
Dense Water Masses#
New functionality added for detection and visualization of dense water masses (threshold: 1029.2 kg/m³, per Oddo et al.).
Enhancements:
2D maps with black contour overlays
3D volume estimation via cell counting (800×800×2 m³)
Timeline plotting of deep water formation across all methods
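The cell-counting estimate reduces to simple arithmetic: each cell is 800 × 800 × 2 = 1,280,000 m³, and the volume is that figure times the number of cells above the threshold. The sketch below works on a nested list for illustration. The function name is hypothetical, and treating the threshold as a strict inequality is an assumption.

```python
DENSE_WATER_THRESHOLD = 1029.2  # kg/m^3, threshold quoted in the changelog
CELL_VOLUME = 800 * 800 * 2     # m^3 per grid cell (800 m x 800 m x 2 m)


def dense_water_volume(density, threshold=DENSE_WATER_THRESHOLD,
                       cell_volume=CELL_VOLUME):
    """Volume in m^3 of all cells whose density exceeds the threshold."""
    n_cells = sum(
        1
        for layer in density
        for row in layer
        for rho in row
        # NaN compares False, so land or missing cells are skipped.
        if rho is not None and rho > threshold
    )
    return n_cells * cell_volume


density = [
    [[1029.3, 1028.0], [1029.2, float("nan")]],
    [[1030.0, 1029.25], [1027.0, 1029.1]],
]
volume = dense_water_volume(density)  # 3 cells exceed the threshold
```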
Chemical Species#
Data loading and plotting routines are now fully separated.
Parallel loading avoided due to memory limitations of large .nc files.
All plots use logarithmic color scaling with cmocean and matplotlib colormaps.
Enhanced colorbars and axis labeling for clarity.
Oxygen#
Uses scalar steps and cmocean.oxy colormap.
Thresholds:
Hypoxia: < 62.5 mmol/m³
Hyperoxia: > 312.5 mmol/m³
Thresholds adjustable via the formatting.py module.
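The two thresholds split the oxygen range into three classes, as in the sketch below. The function name and the label "normoxic" for the middle range are illustrative choices, not names from formatting.py.

```python
HYPOXIA_THRESHOLD = 62.5     # mmol/m^3
HYPEROXIA_THRESHOLD = 312.5  # mmol/m^3


def classify_oxygen(o2, hypoxia=HYPOXIA_THRESHOLD,
                    hyperoxia=HYPEROXIA_THRESHOLD):
    """Label an oxygen concentration given in mmol/m^3."""
    if o2 < hypoxia:
        return "hypoxic"
    if o2 > hyperoxia:
        return "hyperoxic"
    return "normoxic"
```

Keeping the thresholds as keyword arguments mirrors the note above that they are adjustable.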
Chlorophyll-a#
Logarithmic steps
Colormap: viridis
N-family (Nutrients)#
Logarithmic steps
Colormap: YlGnBu
P-family (Primary Producers)#
Logarithmic steps
Colormap: cmocean.algae
Z-family (Secondary Producers)#
Logarithmic steps
Colormap: cmocean.turbid
R-family (Particulate Organic Matter)#
Logarithmic steps
Colormap: cmocean.matter
Future Work#
The refactor of the Benthic_layer.py suite is now complete. Planned next steps include:
Integration of L3S Sea Surface Temperature data, replacing L4 due to underperformance.
Enhancements to the SST reading functions, with improved handling of missing or invalid data.
General performance optimization and code cleanup across all test case scripts.
Version: 4.0.0-δ - UNSTABLE#
Date: 23/05/2025
Summary#
This update brings the Data_reader_setupper.py test case script back online, resolving previous issues related to function imports and path handling.
Data_reader_setupper.py Reactivation#
The script Data_reader_setupper.py is now functional again.
Legacy hardcoded paths have been replaced with dynamic internal paths, removing the need for manual path extensions via sys.path.append.
This change enhances modularity and reduces the risk of import errors during execution or testing.
Note:
While the script is operational, optimization of the functions used within it has been deferred to a future update. The focus will shift to performance improvements once the L3s Sea Surface Temperature data integration is underway.
Version: 4.0.0-γ - UNSTABLE#
Date: 23/05/2025
Summary#
This incremental update completes the rework of plotting functions used in the SST and CHL analysis workflows by updating the Taylor Diagrams and Target Plots plotting and computational scripts following the same conventions introduced in the previous release.
Function Headers and Documentation#
Both Taylor and Target plotting functions now include detailed headers and inline comments designed to enhance code readability and provide clear guidance on usage.
These headers document function purpose, input arguments, return values, and expected keyword arguments.
Computational Function Improvements#
The underlying computation functions supporting these plots have been refined to:
Employ itertools for efficient iteration where applicable.
Replace all assert statements with explicit RaiseErrors to ensure robustness, even in optimized Python execution modes.
Include standardized function headers; comprehensive inline comments will be introduced in a subsequent update.
Future Work#
The refactoring of analysis test case functions will be temporarily paused, with plans to revisit optimization efforts at a later stage.
Next steps include fixing the Data_reader_setupper.py test case script to align with updated function paths; however, function re-optimization will wait until integration of the Sea Surface Temperature L3s data is complete.
Subsequently, the Benthic_layer scripts will be corrected and overhauled to improve computational performance and extend functionality, including the planned calculation of deep water formation volumes.
Version: 4.0.0-β - UNSTABLE#
Date: 22/05/2025
Summary#
This beta release continues the structural and functional overhaul of the project. Key updates include replacing assert statements with explicit RaiseErrors for robustness, the full integration of default plotting options, and improved documentation through consistent function headers. Additionally, the SST and CHL analysis test cases are now operational again following updates to the internal function paths.
This version is still UNSTABLE. While core scripts for SST and CHL analysis are functioning, most other scripts remain incompatible due to unrefactored paths and legacy syntax. Use is advised only for testing specific updated modules.
Major Changes#
RaiseErrors Replace assert Statements#
All validation previously handled through assert statements has been replaced with raise ValueError(...) or appropriate exceptions.
This ensures checks remain active even when scripts are executed in optimized (-O) mode, increasing the robustness and reliability of the library at the cost of some runtime performance.
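The difference matters because Python removes `assert` statements entirely when started with `-O`, whereas an explicit `raise` always executes. The following sketch shows the pattern on a hypothetical validator; `validate_window` is not a function from the package.

```python
def validate_window(window):
    """Reject invalid input with exceptions that survive `python -O`."""
    # Old style, silently skipped under -O:
    #     assert window > 0, "window must be positive"
    if not isinstance(window, int):
        raise TypeError("window must be an int")
    if window <= 0:
        raise ValueError("window must be positive")
    return window
```

Choosing `TypeError` for wrong types and `ValueError` for wrong values also lets callers handle the two cases separately.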
Default Plotting Options Refactored#
Plotting functions used in SST_data_analyzer.py and CHL_data_analyzer.py now fully rely on centralized default options.
Legacy hardcoded options have been moved to a dedicated defaults file, allowing users to override or extend behavior more flexibly.
The default dpi remains set at 300 to maintain publication-quality output, but this will be lowered in the final release for faster rendering.
Function Headers and Documentation#
All plotting functions (excluding Taylor and Target plots) and newly added scripts now include comprehensive headers:
Function purpose
Expected inputs and return types
Supported keyword arguments (kwargs)
Example usage
This marks the beginning of a broader documentation effort to improve code clarity and onboarding for new contributors.
Reactivated Test Cases#
Both SST and CHL test case scripts are now functional again after internal path corrections.
The setup.py file has been updated accordingly, though users are still advised to install missing dependencies manually for full compatibility.
Fixed Issues#
Taylor Diagram Tick Labeling
RMSD ticks are now configurable via a tickRMS parameter.
The first tick value determines both the tick spacing and the RMSD label position, resolving longstanding issues of fixed/static placement.
Validation of Target and Regression Plot Behavior
Following extensive review and expert consultation, the anomalous behavior observed in Target Plots and Regression Lines for L4 CHL data is confirmed to be data-driven, not a bug.
Validation artifacts are present in the dataset itself; a pytest test suite will be released in the near future to systematically verify these findings.
Future Work#
Near-Term Roadmap#
Add headers and docstrings to Taylor and Target plot functions, improving readability and consistency.
Refactor internal for loops using itertools to reduce redundancy and optimize performance.
Upcoming Feature Development#
Begin refactoring of the Benthic_layer.py test script:
Modularize computation and plotting functions
Implement monthly volume calculations for deep water formation (based on upcoming work from Oddo et al.)
Add functionality to export plotting data as both .csv and .nc files (currently postponed due to priority conflicts).
Launch support for L3s data in Sea Surface Temperature analysis.
This will deprecate support for L4 data due to unsatisfactory reliability and quality of results.
Version: 4.0.0-α - UNSTABLE#
Date: 20/05/2025
Summary#
This release marks a major overhaul of the project, transitioning it from standalone scripts into a fully modular Python package. The deprecated Corollary.py and Auxilliary.py scripts have been restructured and their functions relocated to more logically organized modules. Several changes to the package structure and default plotting options are introduced to enhance usability and maintainability.
Major Changes#
Hydrological_model_validator as a Python Package#
The core structure of the Hydrological_model_validator project has been reworked into a Python package, making it easier to install and use as a library.
A new Setup.py script has been introduced, enabling installation of necessary dependencies. However, as not all dependencies are included, manual installation of additional libraries (as listed in the README) is still required.
The Processing and Plotting modules have been reorganized as submodules, allowing users to import specific functions from their respective scripts.
The Path command from the pathlib Python library has been deprecated across the codebase, though it will remain in test case scripts for accessing data directories.
Deprecations#
The Corollary.py and Auxilliary.py scripts are now officially deprecated due to the complexity and overabundance of functions. These functionalities have been moved to more specialized scripts to improve organization and maintainability.
New Functionality: Modularized Script Collections#
To better organize the deprecated functions from Corollary.py and Auxilliary.py, new themed scripts have been introduced within the Processing and Plotting modules. These changes ensure better modularity and improve the user experience by grouping related functions. The new scripts are as follows:
Processing Module:#
time_utils.py:
leapyear
true_true_time_series_length
split_to_monthly, split_to_yearly
get_common_years
get_season_mask
data_alignment.py:
get_valid_mask, get_valid_mask_pandas
align_pandas_series, align_numpy_series
get_common_series_by_year, get_common_series_by_year_month
extract_mod_sat_key
gather_monthly_data_across_years
file_io.py:
mask_reader
load_dataset
stat_math_utils.py:
fit_huber
fit_lowess
round_up_to_nearest
utils.py:
find_key
Plotting Module:#
formatting.py:
format_unit
get_variable_from_label_unit
fill_anular_region
get_min_max_for_identity_line
_style_axis_custom
These scripts will be expanded as necessary to accommodate additional functions and improve usability.
Default Plotting Options#
All previous options used in the plotting functions are now set as defaults. If the user does not provide custom options, the package will automatically apply these default settings, improving ease of use and flexibility for customizations. This will be further enhanced in the upcoming test case update.
Test Case Scripts#
Test case scripts have been relocated to a dedicated folder alongside the data folder. This structure allows for better organization and easier management of test data moving forward.
UNSTABLE#
Important Notice: This release is extremely unstable due to the fundamental changes in file paths and the overall structure of the package. Many old paths used to fetch functions have been broken, and some functions are still in the process of being integrated into the new structure.
It is advised to avoid using this release for anything beyond basic plotting functions. Upcoming updates will address these issues and re-implement missing functionalities, restoring full compatibility.
Future Work#
The immediate focus is to re-enable all core functions within the new package structure and restore their usability as quickly as possible.
Future updates will:
Move default options for Target and Taylor plots into a centralized configuration file.
Rework the Benthic_layer.py test case script to separate computational and plotting functions, which are currently intertwined.
Provide the option to save plot data in both .csv and .nc formats.
Optimize the data reading/setup scripts, with a focus on improving performance and finally integrating the interpolator into the Python ecosystem.
Version: 3.1.1#
Date: 19/05/2025
Summary#
Small patch fixing typos in both analyser scripts and a couple of bugs in the Target_plot.py and Taylor_diagrams.py function scripts.
Taylor_diagrams.py#
Fixed a visualization issue in the Taylor_diagrams.py scripts where the title was not properly displayed in the saved image. The plot extent has been enlarged to leave more room for the text.
Target_plots.py#
Fixed a bug that caused the yearly Target plot to be saved as a blank white image.
Version: 3.1#
Date: 18/05/2025
Summary#
This update introduces a rework of the Whiskerbox and Violin plot functions and adds a new utility for streamlined variable extraction.
Plot Enhancements#
Whiskerbox and Violin plots have been restructured to follow the same logic and structure used in the other plotting functions.
These plots now support:
Automatic key extraction from nested dictionaries.
Dynamic title and label formatting via existing auxiliary functions.
New Helper Function: gather_monthly_data_across_years#
A new utility function, gather_monthly_data_across_years, has been implemented to facilitate data extraction across multiple years.
Currently tailored for box/violin plot input, but will be tested and adapted for wider use across additional plotting and computation workflows.
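The kind of extraction described above can be sketched as follows. The data layout (a dictionary mapping each year to a list of twelve monthly value lists) and the function name `gather_month_across_years` are assumptions made for illustration; the actual signature of gather_monthly_data_across_years is not given in the changelog.

```python
def gather_month_across_years(data_by_year, month_index):
    """Concatenate the values of one month (0 = January) over all years."""
    if not 0 <= month_index <= 11:
        raise ValueError("month_index must be in 0..11")
    gathered = []
    for year in sorted(data_by_year):
        gathered.extend(data_by_year[year][month_index])
    return gathered
```

Calling the function once per month yields twelve samples, which is the input shape that box and violin plots typically expect.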
Future Direction#
Further optimize plotting routines for speed and clarity.
Begin reworking data loading and interpolation functions for faster runtime.
Explore full Python replacement of the current MATLAB Interpolato.m script.
Improve the changelogs by listing the new functions added in each update. The functions added in the previous 3.x.x updates are:
ver 3.0.1:
In Corollary.py:
get_common_series_by_year (slices dataset based on years)
get_common_series_by_year_month (slices dataset based on years and months)
ver 3.0:
All of Auxiliary.py (helper functions for the computations required by the plotting functions, including statistics)
All of Target_computations.py (computations and normalisations necessary for the correct plotting of the Target Plots)
All of Taylor_computations.py (computations and normalisations necessary for the correct plotting of the Taylor Diagrams)
All of Density.py (bundles necessary density computations)
In Corollary.py:
extract_mod_sat_keys (allows for the identification/extraction of model and satellite dictionary keys)
Known Issues#
RMSD Label Placement: Labels are currently tied to a fixed first arc value. Further investigation into the SkillMetrics library is ongoing to determine how arc ranges are defined and whether label placement can be dynamically bound to them.
Static RMSD Arc Ticks: Taylor diagrams use the same arc ticks across plots. While this helps with consistency in test cases, dynamic adjustment would improve generality. Removing the tickrms override may solve this, but could also interfere with label alignment (see above).
Unexpected Target Plot Results: Initial performance scores from Target plots appear lower than anticipated. Ongoing testing will determine if this is a bug, data artifact, or an accurate model assessment.
Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.
Version: 3.0.1#
Date: 18/05/2025
Summary#
This is a minor update focused on expanding DataFrame usability within the SST and CHL analysis scripts and improving dataset loading performance.
Expanded Use of Pandas DataFrames#
SST and CHL analysis scripts now fully leverage pandas DataFrames, enabling:
Seamless integration of the datetime dimension.
More efficient time-based slicing into monthly and yearly datasets using native pandas methods.
Enhances clarity and performance for long-term and seasonal trend analysis.
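The kind of time-based slicing described above can be sketched with a synthetic daily series. The column name sst and the generated values are placeholders for this example, not the project's data structures.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for a basin-averaged SST record
dates = pd.date_range("2000-01-01", "2001-12-31", freq="D")
sst = pd.DataFrame({"sst": np.linspace(10.0, 20.0, len(dates))}, index=dates)

# Yearly slice: partial-string indexing on the DatetimeIndex
sst_2000 = sst.loc["2000"]

# Monthly slice across all years: boolean mask on the month attribute
all_januaries = sst[sst.index.month == 1]

# Mean per (year, month) pair using groupby
monthly_means = sst.groupby([sst.index.year, sst.index.month])["sst"].mean()
```

Because the index carries the dates, no separate year or month bookkeeping arrays are needed.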
Faster Dataset Loading#
Introduced parallel loading of SST datasets using ThreadPoolExecutor from Python’s concurrent.futures module.
Significantly improves script runtime when dealing with large temporal datasets.
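A minimal sketch of this pattern is shown below. The load_year function is a hypothetical stand-in that returns dummy values; a real loader would open the SST file for that year.

```python
from concurrent.futures import ThreadPoolExecutor


def load_year(year):
    """Placeholder loader; a real one would open that year's SST file."""
    return year, [year + day / 1000 for day in range(3)]


years = [2000, 2001, 2002]

# executor.map runs load_year concurrently and yields results in input order
with ThreadPoolExecutor(max_workers=4) as executor:
    datasets = dict(executor.map(load_year, years))
```

Threads suit this task because file reading is mostly I/O-bound, so the workers spend their time waiting on disk rather than competing for the interpreter.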
Version: 3.0#
Date: 18/05/2025
Summary#
This version introduces a major rework and optimization of the plotting functions used for model validation and comparison. It focuses on improving clarity, maintainability, and performance in both visual output and computational workflow.
New Taylor Diagrams and Target Plots#
Normalization: All monthly validation parameters are now normalized by their respective standard deviations, allowing for the unified display of all markers in a single diagram.
Marker Logic Update: Marker representations have been reworked based on a consistent logic [insert table when available].
Enhanced Visualization:
Taylor Diagrams: Now include RMSD arcs and repositioned RMSD labels outside the plot area to avoid marker overlap.
Target Plots: Include color-coded performance zones to quickly assess model accuracy and bias.
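For reference, the statistics behind both diagrams can be normalized by the standard deviation of the reference (satellite) data as sketched below. These are the standard textbook definitions; the function name normalized_stats and the dictionary keys are assumptions for this example, and the code in Taylor_computations.py and Target_computations.py may differ in detail.

```python
import numpy as np


def normalized_stats(model, obs):
    """Return Taylor/Target statistics normalized by the observed std."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sigma_obs = obs.std()
    sigma_mod = model.std()
    corr = np.corrcoef(model, obs)[0, 1]
    bias = model.mean() - obs.mean()
    # Centered (unbiased) RMSD: means are removed before differencing
    crmsd = np.sqrt(np.mean(((model - model.mean()) - (obs - obs.mean())) ** 2))
    return {
        "sdev": sigma_mod / sigma_obs,
        "crmsd": crmsd / sigma_obs,
        "bias": bias / sigma_obs,
        "corr": corr,
    }


obs = np.array([1.0, 2.0, 3.0, 4.0])
stats = normalized_stats(obs + 0.5, obs)  # model = observations plus an offset
```

After normalization the reference point sits at a standard deviation of 1 for every month, which is what allows all monthly markers to share one diagram.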
Violin Plots#
Introduced violin plots as an alternative to whisker-box plots.
Violin plots offer a smoother visual of data distribution but are less informative regarding outliers.
This plot type is included for completeness and comparative analysis.
Seaborn Integration#
Most plotting functions now utilize the Seaborn library.
Advantages include:
Better integration with pandas DataFrames.
More expressive and customizable visualizations.
Improved consistency across plots.
Separation of Computations#
A significant refactor has begun to modularize core functionality:
Extracted key routines from plotting scripts into a new Auxiliary script:
Label formatting (e.g., variable names, units).
Key identification from datasets.
Seasonal masks and data groupings.
Statistical calculations required for Taylor and Target diagrams.
Regression line generation (Huber, LOWESS, etc.).
This modularization paves the way for cleaner, more testable code in preparation for the final pytest integration.
Future Direction#
Further optimize plotting routines for speed and clarity.
Begin reworking data loading and interpolation functions for faster runtime.
Explore full Python replacement of the current MATLAB Interpolato.m script.
Known Issues#
RMSD Label Placement: Labels are currently tied to a fixed first arc value. Further investigation into the SkillMetrics library is ongoing to determine how arc ranges are defined and whether label placement can be dynamically bound to them.
Static RMSD Arc Ticks: Taylor diagrams use the same arc ticks across plots. While this helps with consistency in test cases, dynamic adjustment would improve generality. Removing the tickrms override may solve this, but could also interfere with label alignment (see above).
Unexpected Target Plot Results: Initial performance scores from Target plots appear lower than anticipated. Ongoing testing will determine if this is a bug, data artifact, or an accurate model assessment.
Chlorophyll regression analysis occasionally produces anomalous values — further investigation is underway.
Version: 2.11#
Date: 14/05/2025
Summary#
Whisker-box plots have been implemented for satellite Basin Average SST and CHL datasets.
Whisker Plots#
A new visualization tool — the whisker-box plot — has been added to both the SST and CHL analysis scripts. These plots provide a clearer view of statistical distributions, highlighting mean values and outliers in the Basin Average datasets. Their primary purpose is to enhance model performance evaluation by offering a more nuanced look at dataset variability.
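The quantities a whisker-box plot displays can be computed by hand as in the sketch below. It assumes the common 1.5 × IQR rule for flagging outliers; the function name box_stats and the sample values are illustrative only.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Summarize a sample the way a Tukey-style box plot would."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "mean": statistics.fmean(values), "outliers": outliers}


# Five typical basin-average values and one anomalous reading
summary = box_stats([14.1, 14.3, 14.2, 14.4, 14.0, 19.5])
```

Here the single anomalous value falls outside the upper fence and is reported as an outlier, while the quartiles describe the bulk of the sample.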
Future Developments#
With the implementation of this feature, the 2.x development cycle is considered complete.
The next major update will initiate Version 3.0, which will focus on:
Refactoring all functions to improve computational efficiency and streamline logic.
Introducing new libraries, such as Seaborn, for more advanced and elegant plotting.
Ensuring result consistency, with side-by-side testing to confirm output reliability compared to previous versions.
Resolving known issues and bugs from earlier versions.
Expanding documentation, including:
Clearer comments and structure within the codebase.
A step-by-step guide for running the test case.
A pytest module to automate testing of computational functions.
Version 3.0 will mark a shift toward a more maintainable, scalable, and user-friendly project structure.
Known Issues#
Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.
Taylor and Target plots continue to depend on pre-defined .csv configuration files.
Chlorophyll regression analysis occasionally produces anomalous values; further investigation is underway.
LaTeX rendering in colorbar labels may break under certain conditions.
While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.
Version: 2.10#
Date: 13/05/2025
Summary#
The Benthic Layer Analysis script has been expanded to version 2.0 with significant enhancements to data extraction, analysis, and visualization.
This update builds upon the initial version, adding functionality for the extraction and plotting of temperature and salinity data at the benthic layer. Additionally, the script now computes and visualizes the density field using three distinct equations of state, providing more accurate insights into deep water formation. As a result, the pressure field will no longer be included in the project, as the density field is deemed a more reliable representation of the evolution of dense water formation.
Key Enhancements:#
Temperature and Salinity Maps:
The temperature and salinity values at the benthic layer are now extracted using a method similar to that employed for biogeochemical fields. These values are georeferenced and plotted using the same function as used for the biogeochemical species, ensuring consistent map generation.
Density Computation & Plotting:
Temperature and salinity data are now processed to compute the density field, using the following three equations of state:
Simplified equation of state
Equation of State for Seawater (1980)
Thermodynamic Equation of State (2010)
All three density fields are plotted using a fixed color range to allow for easy comparison of the differences between the equations of state.
Papers are provided in the Bibliography section of the README to help the user understand the differences between these three equations of state.
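To give a sense of what a simplified equation of state looks like, the sketch below uses the widely taught linear form, in which density decreases with temperature and increases with salinity around a reference state. The reference values and expansion/contraction coefficients are typical illustrative numbers chosen for this example, not the constants used in Density.py, and the project's simplified formulation may differ.

```python
def linear_density(temp_c, salinity, rho0=1027.0, t0=10.0, s0=35.0,
                   alpha=1.7e-4, beta=7.6e-4):
    """Linearized seawater density in kg/m^3 (illustrative coefficients).

    rho = rho0 * (1 - alpha * (T - T0) + beta * (S - S0))
    """
    return rho0 * (1.0 - alpha * (temp_c - t0) + beta * (salinity - s0))


# Colder, saltier water comes out denser than warmer, fresher water
cold_salty = linear_density(temp_c=9.0, salinity=38.5)
warm_fresh = linear_density(temp_c=14.0, salinity=37.0)
```

The 1980 and 2010 formulations add nonlinear terms and pressure dependence, which is why the three density fields differ when plotted on a common color range.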
Visual Enhancements:#
Plots now feature fixed color ranges, making it easier to identify and interpret the phenomena illustrated by the maps and plots.
Future Developments:#
Ongoing improvements to the density computation.
Additional enhancements to data visualization and analytical functions.
Known Issues#
Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.
Taylor and Target plots continue to depend on pre-defined .csv configuration files.
Chlorophyll regression analysis occasionally produces anomalous values; further investigation is underway.
LaTeX rendering in colorbar labels may break under certain conditions.
While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.
Version: 2.10#
Date: 10/05/2025
Summary#
Initial version (v1.0) of the Benthic Geochemical Analysis script completed.
Benthic Geochemical Analysis Script#
This first iteration of the Benthic Layer Analysis script introduces foundational functionality for exploring geochemical dynamics at the sediment-water interface using output from the BFM-NEMO coupled model.
Features:#
Computes the deepest active layer (out of 48 vertical layers) in each model grid cell across the domain.
Enables visualization of the model basin bathymetry and deepest layer distribution.
Allows users to select a chemical species from the simulation for spatial plotting.
Generates georeferenced 2D contour maps of selected species at the benthic interface, enriched with coastlines.
The default contour resolution is 51 levels for the geochemical species (configurable in code); the Benthic Depth plot uses 26.
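One way to locate the deepest active layer is sketched below on a toy grid with 4 layers instead of 48. It assumes that inactive cells are stored as NaN and that active layers are contiguous from the surface downward; both are assumptions for this example and may not match the model output format.

```python
import numpy as np

# Toy field with shape (layers, y, x); NaN marks cells below the seabed
field = np.full((4, 2, 2), np.nan)
field[:3, 0, 0] = [10.0, 11.0, 12.0]       # three active layers
field[:1, 0, 1] = [20.0]                   # one active layer
field[:, 1, 0] = [30.0, 31.0, 32.0, 33.0]  # all four layers active
# Column (1, 1) stays entirely NaN: land

active = ~np.isnan(field)
n_active = active.sum(axis=0)                        # active layers per column
deepest = np.where(n_active > 0, n_active - 1, -1)   # -1 flags land columns

# Read the value at the deepest active layer of every column
rows, cols = np.indices(deepest.shape)
benthic = np.where(deepest >= 0,
                   field[np.clip(deepest, 0, None), rows, cols],
                   np.nan)
```

The resulting 2D array holds one benthic value per grid cell and can be passed directly to a contour routine.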
Future Development#
This script represents the first half of the full analysis pipeline. Future updates will introduce:
Computation and visualization of the pressure field within the water column.
Diagnostic tools for investigating deep water formation processes in the Northern Adriatic Sea.
Known Issues#
Taylor Diagrams still use static RMSD ranges — dynamic scaling is planned.
Taylor and Target plots continue to depend on pre-defined .csv configuration files.
Chlorophyll regression analysis occasionally produces anomalous values; further investigation is underway.
LaTeX rendering in colorbar labels may break under certain conditions.
While dynamic colorbar scaling may improve usability, the current fixed scaling highlights extremes effectively; additional testing is ongoing.
Version: 2.9#
Date: 06/05/2025
Summary#
Introduced seasonal scatterplots for both Sea Surface Temperature (SST) and Chlorophyll (CHL) datasets to support more detailed seasonal analysis.
Seasonal Scatterplots#
Based on the insights from previous scatterplot analyses, new plots have been developed to break down basin-averaged values by season.
The data is first decomposed into seasonal subsets and visualized in individual season-specific scatterplots.
A combined scatterplot is also generated, consolidating all seasonal data and color-coding points according to their respective seasons.
Each plot includes:
A best-fit line
A Huber regression line for robust linear fitting (less sensitive to outliers)
A LOWESS (Locally Weighted Scatterplot Smoothing) non-linear regression line to highlight trends in densely clustered areas.
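The seasonal decomposition and the ordinary best-fit line can be sketched as follows. The meteorological season grouping (DJF, MAM, JJA, SON), the function name split_by_season, and the sample values are assumptions for this example. The Huber and LOWESS fits are not reproduced here because they rely on additional libraries.

```python
import numpy as np

SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def split_by_season(months, model, satellite):
    """Group paired (model, satellite) values by season."""
    groups = {}
    for m, x, y in zip(months, model, satellite):
        groups.setdefault(SEASONS[m], []).append((x, y))
    return {s: np.array(pairs) for s, pairs in groups.items()}


months = [1, 2, 7, 8, 7, 1]
model = [10.0, 11.0, 24.0, 25.0, 26.0, 12.0]
satellite = [10.5, 11.5, 24.5, 25.5, 26.5, 12.5]

seasonal = split_by_season(months, model, satellite)
# Ordinary least-squares line (slope, intercept) for each season
fits = {s: np.polyfit(p[:, 0], p[:, 1], deg=1) for s, p in seasonal.items()}
```

Each seasonal subset can then be drawn in its own scatterplot, or all subsets overlaid with one color per season.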
Known Issues#
The Taylor diagrams still use a fixed RMSD range; dynamic scaling is planned for a future update.
Taylor and Target plots continue to rely on static .csv files, which limits flexibility.
Some anomalous values have been observed in the CHL regression fits; further investigation is required to determine whether these are data artifacts or bugs.
Version: 2.8#
Date: 24/04/2025
Summary#
Updated the CHL analysis script and performed minor cleanup in the SST script.
Chlorophyll Analysis Script Updated to Version 2.0#
The CHL analysis script has been updated to align with the improvements made in the SST script.
Comments have been revised to more clearly identify level 3 and level 4 analysis sections.
Plots are now displayed for 3 seconds before closing and are saved in dynamically created folders.
Other Changes#
Added a print statement to the SST analysis script to inform the user when the BIAS has been computed.
Known Issues#
Similar to the SST plots, the CHL plots have a fixed range for the RMSD in the Taylor diagrams. Future updates aim to make the RMSD range dynamic for improved graph display.
Version: 2.7#
Date: 24/04/2025
Summary#
Updated the SST analysis script and fixed minor path-related issues.
Sea Surface Temperature Data Analysis Script Version 2.0#
The SST analysis script has been updated to version 2.0, enhancing user interaction through improved print statements and more effective plot handling.
Implemented dynamic path creation, enabling the automatic creation of ad-hoc folders to save plots, as seen in previous scripts.
Plots are now displayed for 3 seconds before automatically closing; this duration can be adjusted by modifying the value in the plotting functions.
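Dynamic folder creation of this kind can be sketched with pathlib. The helper name make_output_dir and the folder names are hypothetical; the demonstration runs in a temporary directory so nothing is left behind.

```python
from pathlib import Path
import tempfile


def make_output_dir(base_dir, *parts):
    """Create (if needed) and return a nested output folder."""
    out_dir = Path(base_dir).joinpath(*parts)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out_dir


with tempfile.TemporaryDirectory() as tmp:
    target = make_output_dir(tmp, "output", "SST", "timeseries")
    created = target.is_dir()
    again = make_output_dir(tmp, "output", "SST", "timeseries")  # safe to repeat
    same = again == target
```

For the timed display, one common matplotlib pattern is to call plt.show(block=False), then plt.pause(3), then plt.close(); whether the plotting functions use exactly this sequence is not stated here.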
Other Changes#
Removed the Plot outputs folder to declutter the project structure. Example outputs will be provided in a future update to the README.
Removed leap_year.py, as its functions have been migrated to Corollary.py.
Merged the scatter plot, time series plot, and efficiency plot functions into a single script, Plots.py.
Known Issues#
The RMSD (Root Mean Square Deviation) in the Taylor diagrams is not currently displayed correctly. This issue stems from the .csv files provided in the folder. Future updates will aim to make the ranges dynamic for improved graph display.
Version: 2.6#
Date: 23/04/2025
Summary#
Introduced basin average computation and performed significant cleanup of old scripts.
Data Setupper Complete#
The data reading and setup script is now fully complete, with the addition of functionality to compute the daily mean basin average time series for SST datasets (both satellite and model).
The data saving script has been updated to support saving the new basin average data.
The CHL dataset still requires processing through the interpolator to compute its basin average.
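A daily basin average of this kind reduces each 2D field to a single value per time step. The sketch below uses an unweighted mean that ignores NaN cells; the array layout (time, lat, lon) is an assumption for this example, and any area weighting the project may apply is not shown.

```python
import numpy as np

# Toy SST cube with shape (time, lat, lon); NaN marks land or missing pixels
sst = np.array([
    [[14.0, np.nan], [16.0, 18.0]],
    [[15.0, np.nan], [17.0, 19.0]],
])

# Daily basin average: mean over both spatial axes, skipping NaN cells
basin_avg = np.nanmean(sst, axis=(1, 2))
```

The result is a 1D time series with one value per day, suitable for time series plots and scatterplots.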
Other Changes#
Conducted a major cleanup of the functions folder. All outdated functions have been removed, with the exception of the leap_year function, which remains in use for analysis scripts.
Version: 2.5#
Date: 23/04/2025
Summary#
Expanded data saving functionality to support model datasets.
Data Saver Expanded#
The Data_saver.py script has been extended with new functions to handle saving model datasets.
Due to the large size of the model data, it has been split into multiple files, with each file corresponding to a specific year.
Other Changes#
Added assertions for enhanced code stability and error handling.
Reorganized files and functions to improve code structure and maintainability.
Version: 2.4#
Date: 23/04/2025
Summary#
Introduced functions for reading model data and expanded the main script to support them.
Model Data#
Added necessary functions for reading model data, now integrated into a new script.
The main code has been updated to handle the newly added model data reading functionality.
Other Changes#
Reorganized functions for improved clarity and maintainability.
Added assertions to enhance code stability and error handling.
Version: 2.3#
Date: 23/04/2025
Summary#
Introduced the ability to import and apply a mask to align satellite and model data.
Mask Handling#
The model data contains NaN values for landmasses, so a mask has been implemented to ensure proper alignment between satellite and model datasets.
The mask is essential for the interpolation process, and the one used in the current test case will be provided at a later stage.
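The effect of the mask can be sketched on a 2 × 2 grid. In the project the mask is imported from a file; here it is derived from the model's NaN cells purely for illustration, and the variable names are placeholders.

```python
import numpy as np

satellite = np.array([[14.0, 15.0], [16.0, 17.0]])
model = np.array([[14.5, np.nan], [16.5, 17.5]])  # NaN marks a land cell

# Sea mask: True where the model has valid (ocean) values
sea_mask = ~np.isnan(model)

# Blank the same cells in the satellite field so both grids cover the same area
satellite_masked = np.where(sea_mask, satellite, np.nan)
```

With both fields sharing the same valid cells, point-by-point comparisons and basin averages are computed over an identical area.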
Other Changes#
Reorganized functions for improved code structure.
Created a corollary file to house additional functions that are not directly related to missing data handling or data reading.
Version: 2.2#
Date: 23/04/2025
Summary#
Introduced the Data_saver.py script to enhance data management and facilitate the transition to the interpolator.
Data Saving#
Two new steps have been implemented to allow users to save manipulated satellite datasets for validation or later processing.
Users can now save datasets in a dedicated folder in either .mat or .nc formats:
.mat files contain all necessary variables for interpolation.
.nc files offer flexibility for future expansion if needed.
Each saved dataset is timestamped with the date of the run for tracking purposes.
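A date-stamped file name can be built as in the sketch below. The naming pattern, the helper timestamped_name, and the saved_data folder are assumptions for this example; the pattern actually used by Data_saver.py is not specified here.

```python
from datetime import date
from pathlib import Path


def timestamped_name(stem, extension, run_date=None):
    """Return a file name such as 'chl_sat_2025-04-23.nc'."""
    run_date = run_date or date.today()
    return f"{stem}_{run_date.isoformat()}{extension}"


out_path = Path("saved_data") / timestamped_name("chl_sat", ".nc", date(2025, 4, 23))
```

Using the ISO date format keeps files from different runs sorted chronologically in a directory listing.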
Other Changes#
Reorganized and moved functions to continue restructuring the codebase for improved modularity and maintainability.
Version: 2.1#
Date: 23/04/2025
Summary#
Introduced a dedicated script for reading satellite SST (Sea Surface Temperature) data.
Code Structure#
Added functions for reading satellite SST data, now integrated into the main script.
The overall structure remains streamlined, with new functionality focused on SST data handling.
Other Changes#
Refined and reorganized functions for improved clarity and maintainability.
Version: 2.0#
Date: 22/04/2025
Summary#
Reworked the Data_reader_setupper script from the ground up, with the script now moved to the home directory.
Code Structure#
Currently works only with the satellite CHL datasets.
The structure of the functions remains unchanged, but they have been moved to a dedicated SAT data script.
The user can now define which level of data to handle via an input line.
Other Changes#
Added assertions to improve code stability.
Version: 1.0
Date: 20/04/2025
Summary#
Initial implementation of core functionality up to the Data Analysis and Efficiency Metrics stage. The codebase is structured to handle both satellite and model data through two distinct scripts, with support for saving outputs in .nc and .mat formats.
Data Handling#
Satellite Data
CHL (Chlorophyll): Loads data into a single array and checks for missing fields. Requires the data level to be specified via a variable in the code.
SST (Sea Surface Temperature): Reads data and converts temperature values to Celsius.
Model Data
CHL: Retrieves and reads chlorophyll data from model outputs.
SST: Retrieves and reads sea surface temperature data; also computes basin-averaged SST values.
Code Structure#
Core functions are distributed across multiple files.
Interpolation is handled using a MATLAB script due to Python limitations in managing grids with adjacent cells of identical values.
This MATLAB script also computes the basin average for CHL data.
Analysis#
SST and CHL analyses are performed in separate scripts to mitigate RAM limitations.
Implemented analyses include:
Time series plots
Scatter plots
Taylor diagrams
Target diagrams
Multiple efficiency metrics (refer to Krause et al., 2005, as listed in the Bibliography)
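Two of the criteria discussed by Krause et al. (2005), the Nash-Sutcliffe efficiency and the index of agreement, can be written as below. Which metrics the analysis scripts implement, and under what names, is not listed here, so the function names are assumptions for this example.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, sim):
    """Index of agreement d, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(obs, obs)           # model reproduces observations
mean_model = nash_sutcliffe(obs, [2.5] * 4)  # model is just the observed mean
```

A model that only predicts the observed mean scores zero on the Nash-Sutcliffe efficiency, which gives the metric a natural baseline for judging skill.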