Skip to content

Added separate bacteria and fungi pools#762

Merged
jacobcook1995 merged 10 commits intodevelopfrom
758-separate-microbes-into-bacteria-and-fungi
Feb 28, 2025
Merged

Added separate bacteria and fungi pools#762
jacobcook1995 merged 10 commits intodevelopfrom
758-separate-microbes-into-bacteria-and-fungi

Conversation

@jacobcook1995
Copy link
Copy Markdown
Collaborator

@jacobcook1995 jacobcook1995 commented Feb 25, 2025

Description

This PR splits the microbial pool into separate fungi and bacteria pools. At the moment these groups behave identically but can be parameterised differently. In future (see #761) I want to split enzymes up by which taxonomic group generated them.

In order to get the existing code to work with multiple microbial groups I had to introduce a MicrobialGroup data class. This basically exists to bundle the relevant constants for a specific microbial group together. I'm not sure whether the way I did this is the most sensible approach so I am open to suggestions.

@hrlai, tagging you in so that you can see the changes coming in. The most significant for your purposes will be that most of the microbial parameters are now split between fungi and bacteria versions.

Fixes #758

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@jacobcook1995 jacobcook1995 linked an issue Feb 25, 2025 that may be closed by this pull request
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 25, 2025

Codecov Report

❌ Patch coverage is 97.89474% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.72%. Comparing base (4894d4d) to head (b606981).
⚠️ Report is 2500 commits behind head on develop.

Files with missing lines Patch % Lines
...ample_data/generation_scripts/soil_example_data.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #762      +/-   ##
===========================================
+ Coverage    94.69%   94.72%   +0.02%     
===========================================
  Files           74       75       +1     
  Lines         5149     5197      +48     
===========================================
+ Hits          4876     4923      +47     
- Misses         273      274       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a small suggestion, but otherwise it looks good.

I think there are a couple of places where there's code duplication, applying the same thing to bacteria and fungi, that could be replaced with a loop, but I'm not sure if it is worth the effort at this time or if the calculation is exactly the same, or if it would be the same in the future, so I would leave it as it is, for now. We can revisit that later, if needed.

Comment on lines +10 to +11
@dataclass
class MicrobialGroup:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get this right, this dataclass is mean to contain the constants that each group (bacteria and fungi) requires, and is not meant to change over the course of the simulation, right?

If that's the case, I'd make this a frozen dataclass, to ensure that, and maybe add Constants to the class name to make it even more clear:

Suggested change
@dataclass
class MicrobialGroup:
@dataclass(frozen=True)
class MicrobialGroupConstants:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep these constants should never change in the course of a simulation, I'll make it a frozen class and rename!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment you already have all of this data packaged in SoilConsts but you are packaging it up into a consistent wrapper and leaving the door open to add more groups? I think as it stands its between two stools - your code allows multiple groups but the config has got you pinned at only one of each.

I think you've basically got microbial functional types here and it would make sense to adopt something like the framework used in the Plants Model update. Rather than having a long list of of parameters which (currently) only allow one of each of fungi and bacteria, then have an array of tables in TOML (or package in a CSV file as Plants does).

[[soil.microbial_group]]
type = "fungal"
max_uptake_rate_labile_C = 0.1
etc = "etc"

If you add type to your dataclass, you can then simply load an instance using MicrobialGroup(**args_from_config) and hence ditch make_bacterial_functional_group and make_fungal_functional_group. The only thing you then need to do is to divide what you get from loading the config into the fungal and bacterial subgroups in make_full_set_of_microbial_groups.

Copy link
Copy Markdown
Collaborator Author

@jacobcook1995 jacobcook1995 Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does make sense, I guess my main concern would be how these constants will get documented because the majority of "constants" used in the soil model in reality vary between functional groups.

Would they be documented as attributes in MicrobialGroup rather than in the constants submodule?

Copy link
Copy Markdown
Collaborator

@davidorme davidorme Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you have a consistent set of attributes across both fungi and bacteria but the values of those attributes differ between the two (and also could vary between different subgroups within those two, but currently don't).

So, I think the attribute itself should be easy to document in the constants submodule - it plays a consistent role - but then the actual values for each group are harder. Ultimately this is becoming more like data - as a user, you'd want to document what sources were used to describe those groups but that isn't something that we document within the code?

I guess you'd also need to handle the case where users omit one of the two! Could you run the simulation without one of them or should it break?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should break if one of the two is omitted. It's not implemented yet but the idea is that each functional groups interacts with the soil pools (somewhat) differently. With the basic idea being the functional groups are defined at a coarse enough level that they represent really meaningfully different actors, as the model is not really designed to simulate competition between similar organisms.

I really don't want to allow users to define arbitrary functional groups, because this would essentially entail spending a lot of development effort for a feature we would tell people not to use (because the results would be nonsensical, like if two bacterial functional groups were added one would be guaranteed to go extinct).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's worth adopting your approach though as it will save me time and effort (and avoid cluttering constants) when I get around to split fungi into multiple groups

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want to allow users to define arbitrary functional groups

Is that then a simple check that the types are unique on the loaded set of groups?

spending a lot of development effort for a feature we would tell people not to use

But we do want them to use it - it's a less cluttered way of defining the constants. We just don't want them to use it wrong 😄 It also opens up using a CSV for the parameterisation, which is much more user friendly than TOML for this kind of configuration.

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to go all the way and refactor the config to support this. It'll end up much cleaner and I don't think it's too hard to do.

Comment on lines +10 to +11
@dataclass
class MicrobialGroup:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment you already have all of this data packaged in SoilConsts but you are packaging it up into a consistent wrapper and leaving the door open to add more groups? I think as it stands its between two stools - your code allows multiple groups but the config has got you pinned at only one of each.

I think you've basically got microbial functional types here and it would make sense to adopt something like the framework used in the Plants Model update. Rather than having a long list of of parameters which (currently) only allow one of each of fungi and bacteria, then have an array of tables in TOML (or package in a CSV file as Plants does).

[[soil.microbial_group]]
type = "fungal"
max_uptake_rate_labile_C = 0.1
etc = "etc"

If you add type to your dataclass, you can then simply load an instance using MicrobialGroup(**args_from_config) and hence ditch make_bacterial_functional_group and make_fungal_functional_group. The only thing you then need to do is to divide what you get from loading the config into the fungal and bacterial subgroups in make_full_set_of_microbial_groups.

@jacobcook1995
Copy link
Copy Markdown
Collaborator Author

@davidorme I've changed the code so that MicrobialGroupConstants now gets populated based on the config rather than the contents of SoilConsts, let me know if my approach looks sensible

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much cleaner to me. I think it's the right direction! We could add loading from CSV but with the current two groups, putting this in TOML isn't anything like the fuss it is for plant functional types. That's something to park, maybe until you add the different fungal types.

I've made a suggestion on the validation code - just easier to read and doesn't mask an error in the exception.

Comment on lines +84 to +108
expected_groups = ["fungi", "bacteria"]

if "soil" not in config:
msg = "Model configuration for soil model not found."
LOGGER.critical(msg)
raise ConfigurationError(msg)

defined_groups = [
group["name"] for group in config["soil"]["microbial_group_definition"]
]

if set(defined_groups) != set(expected_groups):
if not set(expected_groups).issubset(set(defined_groups)):
msg = (
"The following expected soil microbial groups are not defined: "
f"{', '.join(set(expected_groups) - set(defined_groups))}"
)
LOGGER.critical(msg)
if not set(defined_groups).issubset(set(expected_groups)):
msg = (
"The following microbial groups are not valid: "
f"{', '.join(set(defined_groups) - set(expected_groups))}"
)
LOGGER.critical(msg)
raise ConfigurationError(msg)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code below is easier to read - also the current code masks the first error in the exception reporting if both occur.

Suggested change
expected_groups = ["fungi", "bacteria"]
if "soil" not in config:
msg = "Model configuration for soil model not found."
LOGGER.critical(msg)
raise ConfigurationError(msg)
defined_groups = [
group["name"] for group in config["soil"]["microbial_group_definition"]
]
if set(defined_groups) != set(expected_groups):
if not set(expected_groups).issubset(set(defined_groups)):
msg = (
"The following expected soil microbial groups are not defined: "
f"{', '.join(set(expected_groups) - set(defined_groups))}"
)
LOGGER.critical(msg)
if not set(defined_groups).issubset(set(expected_groups)):
msg = (
"The following microbial groups are not valid: "
f"{', '.join(set(defined_groups) - set(expected_groups))}"
)
LOGGER.critical(msg)
raise ConfigurationError(msg)
if "soil" not in config:
msg = "Model configuration for soil model not found."
LOGGER.critical(msg)
raise ConfigurationError(msg)
expected_groups = {"fungi", "bacteria"}
defined_groups = {
group["name"] for group in config["soil"]["microbial_group_definition"]
}
undefined_groups = expected_groups.difference(defined_groups)
unexpected_groups = defined_groups.difference(expected_groups)
if undefined_groups:
msg = (
"The following expected soil microbial groups are not defined: "
f"{', '.join(set(expected_groups) - set(defined_groups))}"
)
LOGGER.critical(msg)
if unexpected_groups:
msg = (
"The following microbial groups are not valid: "
f"{', '.join(set(defined_groups) - set(expected_groups))}"
)
LOGGER.critical(msg)
if undefined_groups or unexpected_groups:
raise ConfigurationError("The soil microbial group configuration contains errors. Please check the log".)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh that's really useful thanks, I didn't think I was approaching that part in an ideal manner.

Also, yes think reading from CSV is something to implement down the road, once we've had a bit more chance to see how using TOML works in practice for these groups

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jacobcook1995 jacobcook1995 merged commit 52f15b8 into develop Feb 28, 2025
16 checks passed
@jacobcook1995 jacobcook1995 deleted the 758-separate-microbes-into-bacteria-and-fungi branch February 28, 2025 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Separate microbes into bacteria and fungi

4 participants