Skip to content

Add cohort data exporter from PlantsModel#912

Merged
davidorme merged 46 commits intodevelopfrom
911-add-cohort-data-exporter-from-plantsmodel
Jul 3, 2025
Merged

Add cohort data exporter from PlantsModel#912
davidorme merged 46 commits intodevelopfrom
911-add-cohort-data-exporter-from-plantsmodel

Conversation

@davidorme
Copy link
Copy Markdown
Collaborator

@davidorme davidorme commented Jun 21, 2025

Description

This PR introduces the models.plants.exporter module that provides a the CommunityDataExporter tool for plant community data. This data could be worked into the data object and exported by the standard mechanism, but the data is not used by any other model and consists of sets of highly ragged arrays of community data per cell that is much more suited to a data frame style export than the DataArray format.

There are three kinds of data:

  • cohort data - one line per cohort per cell
  • community canopy data - one line per canopy layer per cell
  • stem canopy data - one line per canopy layer per cohort per cell

Each of those different classes has a bunch of attributes that are stored within the communities, canopies and stem_allocation attributes within the Plants Model.

So, this PR:

  • Adds new section of configuration for the plants model schema to specify output paths for each kind of a data and an attribute list to subset the data written out.
[plants.community_data_export]
required_data = ['canopy', ... ]
cohort_attributes = []
community_canopy_attributes = []
stem_canopy_attributes = []
  • Adds the new exporter class CommunityDataExplorer with:

    • an __init__ method that the settings above directly
    • a from_config factory method that creates an instance from the settings loaded in a Config object.
    • a dump method that can be called to compile and output the three data types to CSV file. It switches from write to append mode after first use to avoid duplicating column headers.
    • Some private validation methods for paths and attribute subsets.
  • Updates PlantsModel to require a CommunityDataExplorer object at __init__ and then to call the dump() method at the end of model setup and in each update.

  • Adds a bunch of tests in tests.models.plants.test_explorer.py

Fixes #911 (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@davidorme davidorme linked an issue Jun 21, 2025 that may be closed by this pull request
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 21, 2025

Codecov Report

❌ Patch coverage is 98.07692% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.00%. Comparing base (31cb58d) to head (38e7977).
⚠️ Report is 1738 commits behind head on develop.

Files with missing lines Patch % Lines
virtual_ecosystem/models/plants/exporter.py 97.93% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #912      +/-   ##
===========================================
+ Coverage    93.89%   94.00%   +0.10%     
===========================================
  Files           77       78       +1     
  Lines         5912     6067     +155     
===========================================
+ Hits          5551     5703     +152     
- Misses         361      364       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, I think this approach makes sense, leaving up to the model to export some of the data relevant for it. My only concern is if the data it exports might be ever be updated by a model run after plats within the same timestep. I wonder if it would be better to have an export step that is run for every model after all the updates are done.

@davidorme
Copy link
Copy Markdown
Collaborator Author

My only concern is if the data it exports might be ever be updated by a model run after plats within the same timestep. I wonder if it would be better to have an export step that is run for every model after all the updates are done.

After all the updates are done within a time step. That's an interesting idea that I think might be useful. In this specific case, we're intending to exporting the internal cohort structure and tree allometry, which is only under the control of the plants model, and indeed isn't exposed to the other models through data.

But... I think we could revisit this same structure and make it part of the BaseModel to create model exporter methods for dumping internal model state. I'm just not keen to do that now 😄

@davidorme davidorme marked this pull request as ready for review July 1, 2025 17:55
@davidorme davidorme requested a review from dalonsoa July 1, 2025 17:55
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments, but it is broadly looking good.

Comment on lines +441 to +442
time=self.model_timing.start_time
+ time_index * self.model_timing.update_interval,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might come handy to add a simple method in the base model that returns the current time:

def current_time(self, time_index: int) -> int: # was this in epoch?
    return self.model_timing.start_time + time_index * self.model_timing.update_interval

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to revisit the timing in more detail but this is a good idea.

@davidorme davidorme requested a review from dalonsoa July 2, 2025 19:05
@davidorme
Copy link
Copy Markdown
Collaborator Author

@sallymatson I've finished updating from @dalonsoa's review now. Diego - your suggestions have simplified the code and testing quite a bit. Nothing hugely changed but just more polished, I think.

Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better :)

@davidorme davidorme merged commit 189340b into develop Jul 3, 2025
29 of 30 checks passed
@davidorme davidorme deleted the 911-add-cohort-data-exporter-from-plantsmodel branch July 3, 2025 14:11
@davidorme davidorme restored the 911-add-cohort-data-exporter-from-plantsmodel branch July 3, 2025 14:14
@davidorme davidorme mentioned this pull request Jul 3, 2025
8 tasks
@arne-exe
Copy link
Copy Markdown
Collaborator

Hi @davidorme, I just had a look at this, the exporter looks very neat and will be super useful for looking at the outputs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add cohort data exporter from PlantsModel

4 participants