Skip to content

Added trophic flow analytical tool and test#243

Draft
nickwctan wants to merge 1 commit intomainfrom
animal---tool-trophic-mass-flow
Draft

Added trophic flow analytical tool and test#243
nickwctan wants to merge 1 commit intomainfrom
animal---tool-trophic-mass-flow

Conversation

@nickwctan
Copy link
Copy Markdown
Collaborator

@nickwctan nickwctan commented Mar 20, 2026

This PR adds an analytical tool to visualise the animal trophic interaction output from Virtual Ecosystem and calculates the trophic mass flow according to carbon, nitrogen or phosphorus.

Currently, by default, it tracks the carbon sum of resource pool (AKA resource kind) of all consumer_cohort in each timestep. This outputs faceted plot for visualisation purposes.

I envision that this will be extended in the future to create multiple graphical outputs to analyse trophic interactions.

@davidorme Is there a way we can mimic the pytest pathing in Virtual Ecosystem so that it also works similarly in the ve_data_science repo? At the moment I have placed the test file in the same folder, which is not ideal.

@davidorme
Copy link
Copy Markdown
Collaborator

@nickwctan When we chatted in Teams, I hadn't realised that you wanted to push a datafile to act as part of a test suite for the new tool. That's quite a different use case than just uploading a datafile for people to check the code out with. I'm completely on board with adding a datafile that is used for code testing. It would be good to reduce the file size as much as possible though - maybe cut the file down to fewer time steps or cells?

I think the way to do this is to have a tools/testing_data directory - we may end up going for R or python or stick with both for the tools but sharing common test files is sensible!

On the location of the test files, there are a couple of choices of organising files, which includes having them within the code directory:

https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html

It doesn't have to be a separate folder as in the VE repo. I wouldn't worry about this now, we can restructure as needed later.

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickwctan This is really neat - hadn't realised you were also building in testing.

I've got basically two types of comments:

  1. You have added the if __name__=="__main__": sections to both files, but neither of those really reflect how the files should be used in practice (for different reasons for the tests vs the tool).
  2. Your methods on the tool are using return self a lot. That isn't generally how class methods are used in Python - if your method is updating the class instance (i.e. doing self.something = "a thing" then the method can simply return None and the class instance that you ran the method on has been updated.
In [2]: class Test:
   ...: 
   ...:     def __init__(self):
   ...: 
   ...:         self.a = 12
   ...: 
   ...:     def update_a(self) -> None:
   ...: 
   ...:         self.a = 24
   ...: 
   ...: 
   ...: test = Test()
   ...: test.a
   ...: 
Out[2]: 12

In [3]: test.update_a()
   ...: test.a
Out[3]: 24

ETA:

  1. You have a ton of print statements. That is great for debugging - do you actually want these unavoidably appearing in all your notebooks that use the tool? You can filter out printed messages in notebook code, but that isn't something users would normally expect to have to do.

    You could use the logging system to print these messages, which allows a user to suppress logging. I think you have to make a decision about whether these are debugging messages that were useful for you during coding (but should now just be deleted) or if they are actually a useful output that users might sometimes want (in which case, logging, but more complex!)

Comment on lines +88 to +97
# ================================================================
# running the test
# ================================================================

if __name__ == "__main__":
print("Running tests")
test_class_initialisation(analysis())
test_load_data(analysis())
test_process_data(analysis())
print("All tests passed")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't run the tests like this. You'd just do pytest test_trophic_mass_flow.py from the command line and pytest handles everything else.

A Python file does not have to have an if __name__ == "__main__": section so you can simply delete these lines.

Comment on lines +392 to +414
if __name__ == "__main__": #
"""Run test separately.

Only run this chunk of codes when executed directly and not imported.
animal_trophic_interactions.csv can be generated when doing a ve_run.
This csv can be found in ".../out"
Place the csv file in the same folder as your py files.
"""
# create instance for analysis
analysis = TrophicFlowAnalysis(
"animal_trophic_interactions.csv", config={"cumulative": False, "n_cols": 2}
)

# load and process data (method chaining)
analysis.load_data().process_data()

# summary
analysis.summary()

# create plots
analysis.plot_faceted(save_path="mass_flow_faceted.png")

print("\nDone")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you'd never just execute this Python file like this, or need to move the output files. Instead, what you would do is - in the Python analysis script that uses this tool -

from ../../tools.python.trophic_mass_flow import TrophicFlowAnalysis

analysis = TrophicFlowAnalysis('../../data/scenarios/scenario/out/animal_trophic_interactions.csv')

There will be some things to sort out there with the relative locations of the tool module that I need to think about, but I would again remove this and maybe move the code into the docstring to show the workflow.

Comment on lines +48 to +49
print(f"trophic flow initialised for {file_path}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think breaking out the functionality into methods here is good practice, but in terms of actual users is anyone ever going to create a TrophicFlowAnalysis instance without loading and processing the data?

If we include calling those methods in __init__ then the usage is simpler.

Suggested change
print(f"trophic flow initialised for {file_path}")
print(f"trophic flow initialised for {file_path}")
self.load_data()
self.process_data()

Copy link
Copy Markdown
Collaborator

@davidorme davidorme Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, linking over to @Baizurah's recent PR (#241) - if you were using this in a notebook, could you imagine wanting to do other things with the data from animal_trophic_interactions.csv in the file? Would you want to filter it, plot other things etc?

If that is the case, then forcing a reload of that data within this class is much less flexible than having the class __init__ have a df argument that passes in an existing loaded dataframe. You have to pass it a dataframe with the right structure of course, but it is probably a better design for the code. This is sometimes called "Separation of concerns": your class about analysis of a particular dataset, not loading the dataset in the first place.

Comment on lines +173 to +185
def process_data(self) -> "TrophicFlowAnalysis":
"""Run all the processing steps in correct order.

Returns:
A processed dataframe.

"""
print("\nProcessing data...")
self.convert_units()
self.group_and_aggregate()
self.pivot_data()
print("Processing complete!")
return self
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't return self here. The way to do this is return None and have the object just update its own internal state in place.

Suggested change
def process_data(self) -> "TrophicFlowAnalysis":
"""Run all the processing steps in correct order.
Returns:
A processed dataframe.
"""
print("\nProcessing data...")
self.convert_units()
self.group_and_aggregate()
self.pivot_data()
print("Processing complete!")
return self
def process_data(self) -> None:
"""Run all the processing steps in correct order.
Updates the processed data attributes
"""
print("\nProcessing data...")
self.convert_units()
self.group_and_aggregate()
self.pivot_data()
print("Processing complete!")

Comment on lines +144 to +149
def pivot_data(
self,
index_col: str | None = None,
columns_col: str | None = None,
values_col: str | None = None,
) -> "TrophicFlowAnalysis":
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't returning anything, just updating the class to populate self.pivoted_df

Suggested change
def pivot_data(
self,
index_col: str | None = None,
columns_col: str | None = None,
values_col: str | None = None,
) -> "TrophicFlowAnalysis":
def pivot_data(
self,
index_col: str | None = None,
columns_col: str | None = None,
values_col: str | None = None,
) -> None:

Comment on lines +69 to +83
def load_data(self) -> "TrophicFlowAnalysis":
"""Load the animal trophic interaction output.

Load data from the output CSV file from Virtual Ecosystem
into a dataframe.

Returns:
A dataframe.

"""

print("loading data...")
self.df = pd.read_csv(self.file_path, parse_dates=["time"])
print(f" Loaded {len(self.df)} rows, {len(self.df.columns)} columns")
return self
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def load_data(self) -> "TrophicFlowAnalysis":
"""Load the animal trophic interaction output.
Load data from the output CSV file from Virtual Ecosystem
into a dataframe.
Returns:
A dataframe.
"""
print("loading data...")
self.df = pd.read_csv(self.file_path, parse_dates=["time"])
print(f" Loaded {len(self.df)} rows, {len(self.df.columns)} columns")
return self
def load_data(self) -> None:
"""Load the animal trophic interaction output.
Load data from the output CSV file from Virtual Ecosystem
into a dataframe.
Returns:
A dataframe.
"""
print("loading data...")
self.df = pd.read_csv(self.file_path, parse_dates=["time"])
print(f" Loaded {len(self.df)} rows, {len(self.df.columns)} columns")

Comment on lines +85 to +102
def convert_units(
self, columns: list[str] | None = None, multiplier: float = 1000
) -> "TrophicFlowAnalysis":
"""Convert measurement units (default: from kg to g).

Args:
columns: columns to multiply.
multiplier: multiplier factor.

Returns:
A dataframe with units converted.

"""
cols = columns or self.config["column_values"]
if self.config["convert_to_grams"]:
print(f"Converting {cols} by {multiplier} to grams")
self.df[cols] = self.df[cols] * multiplier
return self
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def convert_units(
self, columns: list[str] | None = None, multiplier: float = 1000
) -> "TrophicFlowAnalysis":
"""Convert measurement units (default: from kg to g).
Args:
columns: columns to multiply.
multiplier: multiplier factor.
Returns:
A dataframe with units converted.
"""
cols = columns or self.config["column_values"]
if self.config["convert_to_grams"]:
print(f"Converting {cols} by {multiplier} to grams")
self.df[cols] = self.df[cols] * multiplier
return self
def convert_units(
self, columns: list[str] | None = None, multiplier: float = 1000
) -> None:
"""Convert measurement units (default: from kg to g).
Args:
columns: columns to multiply.
multiplier: multiplier factor.
Returns:
A dataframe with units converted.
"""
cols = columns or self.config["column_values"]
if self.config["convert_to_grams"]:
print(f"Converting {cols} by {multiplier} to grams")
self.df[cols] = self.df[cols] * multiplier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants