@nickwctan When we chatted in Teams, I hadn't realised that you wanted to push a datafile to act as part of a test suite for the new tool. That's quite a different use case from just uploading a datafile for people to check the code out with. I'm completely on board with adding a datafile that is used for code testing. It would be good to reduce the file size as much as possible though - maybe cut the file down to fewer time steps or cells?

On the location of the test files, there are a couple of choices for organising them, including keeping them within the code directory: https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html. It doesn't have to be a separate folder as in the VE repo. I wouldn't worry about this now; we can restructure as needed later.
@nickwctan This is really neat - hadn't realised you were also building in testing.

I've got basically two types of comments:

- You have added `if __name__ == "__main__":` sections to both files, but neither of those really reflects how the files should be used in practice (for different reasons for the tests vs the tool).
- Your methods on the tool use `return self` a lot. That isn't generally how class methods are used in Python - if your method is updating the class instance (i.e. doing `self.something = "a thing"`), then the method can simply return `None`, and the class instance that you ran the method on has been updated.
```python
In [2]: class Test:
   ...:     def __init__(self):
   ...:         self.a = 12
   ...:
   ...:     def update_a(self) -> None:
   ...:         self.a = 24
   ...:
   ...: test = Test()
   ...: test.a
Out[2]: 12

In [3]: test.update_a()
   ...: test.a
Out[3]: 24
```

ETA:
- You have a ton of `print` statements. That is great for debugging, but do you actually want these unavoidably appearing in all your notebooks that use the tool? You can filter out printed messages in notebook code, but that isn't something users would normally expect to have to do. You could use the `logging` system to print these messages, which allows a user to suppress them. I think you have to make a decision about whether these are debugging messages that were useful for you during coding (but should now just be deleted) or whether they are actually useful output that users might sometimes want (in which case, `logging`, but more complex!)
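If the messages are kept, a minimal sketch of the `logging` approach might look like this (the module-level logger name `trophic_mass_flow` is an illustrative assumption, not taken from the PR):

```python
import logging

# Module-level logger for the (hypothetical) tool module. Users who want
# these messages opt in with logging.basicConfig(level=logging.INFO);
# by default, INFO-level messages are not shown at all.
logger = logging.getLogger("trophic_mass_flow")


def process_data() -> None:
    # logger.info replaces print(): hidden unless the user enables INFO output
    logger.info("Processing data...")
    logger.info("Processing complete!")
```

The key difference from `print` is that suppression is the user's choice, made once per notebook, rather than something baked into the tool.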
```python
# ================================================================
# running the test
# ================================================================

if __name__ == "__main__":
    print("Running tests")
    test_class_initialisation(analysis())
    test_load_data(analysis())
    test_process_data(analysis())
    print("All tests passed")
```
You wouldn't run the tests like this. You'd just do `pytest test_trophic_mass_flow.py` from the command line and pytest handles everything else.

A Python file does not have to have an `if __name__ == "__main__":` section, so you can simply delete these lines.
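For reference, pytest collects any `test_*` function in a `test_*.py` file automatically, so the file needs no entry point at all. A minimal sketch (the `Analysis` class here is a stand-in, not the real tool):

```python
# test_trophic_mass_flow.py (sketch): running `pytest test_trophic_mass_flow.py`
# discovers and runs every test_* function itself - no __main__ block needed.


class Analysis:
    """Stand-in for the real analysis class."""

    def __init__(self) -> None:
        self.loaded = False

    def load_data(self) -> None:
        self.loaded = True


def test_load_data() -> None:
    analysis = Analysis()
    analysis.load_data()
    assert analysis.loaded
```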
```python
if __name__ == "__main__":
    """Run test separately.

    Only run this chunk of code when executed directly and not imported.
    animal_trophic_interactions.csv can be generated when doing a ve_run.
    This csv can be found in ".../out"
    Place the csv file in the same folder as your py files.
    """
    # create instance for analysis
    analysis = TrophicFlowAnalysis(
        "animal_trophic_interactions.csv", config={"cumulative": False, "n_cols": 2}
    )

    # load and process data (method chaining)
    analysis.load_data().process_data()

    # summary
    analysis.summary()

    # create plots
    analysis.plot_faceted(save_path="mass_flow_faceted.png")

    print("\nDone")
```
Again, you'd never just execute this Python file like this, or need to move the output files. Instead, what you would do, in the Python analysis script that uses this tool, is:

```python
from ../../tools.python.trophic_mass_flow import TrophicFlowAnalysis

analysis = TrophicFlowAnalysis('../../data/scenarios/scenario/out/animal_trophic_interactions.csv')
```

There will be some things to sort out there with the relative locations of the tool module that I need to think about, but I would again remove this and maybe move the code into the docstring to show the workflow.
```python
print(f"trophic flow initialised for {file_path}")
```
I think breaking out the functionality into methods here is good practice, but in terms of actual users, is anyone ever going to create a `TrophicFlowAnalysis` instance without loading and processing the data?

If we include calling those methods in `__init__`, then the usage is simpler.
Suggested change:

```python
        print(f"trophic flow initialised for {file_path}")
        self.load_data()
        self.process_data()
```
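With loading and processing invoked from `__init__`, construction alone gives a ready-to-use object. A self-contained mock (the class body is a hypothetical stand-in for the real methods):

```python
class TrophicFlowAnalysisMock:
    """Hypothetical mock: __init__ runs the whole load/process pipeline."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path
        self.load_data()
        self.process_data()

    def load_data(self) -> None:
        # stand-in for pd.read_csv(self.file_path, ...)
        self.values = [0.5, 1.5]

    def process_data(self) -> None:
        # stand-in for the kg -> g unit conversion
        self.values = [v * 1000 for v in self.values]


# one line replaces the load_data().process_data() chain
analysis = TrophicFlowAnalysisMock("animal_trophic_interactions.csv")
```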
And actually, linking over to @Baizurah's recent PR (#241) - if you were using this in a notebook, could you imagine wanting to do other things with the data from animal_trophic_interactions.csv in the file? Would you want to filter it, plot other things, etc.?

If that is the case, then forcing a reload of that data within this class is much less flexible than having the class `__init__` take a `df` argument that passes in an existing loaded dataframe. You have to pass it a dataframe with the right structure, of course, but it is probably a better design for the code. This is sometimes called "separation of concerns": your class is about analysis of a particular dataset, not loading the dataset in the first place.
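A sketch of that design, under the assumption of a hypothetical `__init__` signature and method (not the code in this PR): the class receives an already-loaded dataframe and only does analysis.

```python
import pandas as pd


class TrophicAnalysisSketch:
    """Hypothetical separation-of-concerns design: analysis only."""

    def __init__(self, df: pd.DataFrame) -> None:
        # the caller owns loading; we just analyse what we are given
        self.df = df

    def total_mass(self, column: str = "mass") -> float:
        return float(self.df[column].sum())


# the notebook loads (and can filter or plot) the data itself ...
df = pd.DataFrame({"time": [1, 2, 3], "mass": [0.5, 1.0, 1.5]})
# ... and passes it in for the specialised analysis
analysis = TrophicAnalysisSketch(df)
```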
```python
def process_data(self) -> "TrophicFlowAnalysis":
    """Run all the processing steps in correct order.

    Returns:
        A processed dataframe.
    """
    print("\nProcessing data...")
    self.convert_units()
    self.group_and_aggregate()
    self.pivot_data()
    print("Processing complete!")
    return self
```
Don't return `self` here. The way to do this is to return `None` and have the object just update its own internal state in place.
Suggested change:

```python
def process_data(self) -> None:
    """Run all the processing steps in correct order.

    Updates the processed data attributes.
    """
    print("\nProcessing data...")
    self.convert_units()
    self.group_and_aggregate()
    self.pivot_data()
    print("Processing complete!")
```
```python
def pivot_data(
    self,
    index_col: str | None = None,
    columns_col: str | None = None,
    values_col: str | None = None,
) -> "TrophicFlowAnalysis":
```
This isn't returning anything, just updating the class to populate `self.pivoted_df`.
Suggested change:

```python
def pivot_data(
    self,
    index_col: str | None = None,
    columns_col: str | None = None,
    values_col: str | None = None,
) -> None:
```
```python
def load_data(self) -> "TrophicFlowAnalysis":
    """Load the animal trophic interaction output.

    Load data from the output CSV file from Virtual Ecosystem
    into a dataframe.

    Returns:
        A dataframe.
    """
    print("loading data...")
    self.df = pd.read_csv(self.file_path, parse_dates=["time"])
    print(f"  Loaded {len(self.df)} rows, {len(self.df.columns)} columns")
    return self
```
Suggested change:

```python
def load_data(self) -> None:
    """Load the animal trophic interaction output.

    Load data from the output CSV file from Virtual Ecosystem
    into a dataframe.
    """
    print("loading data...")
    self.df = pd.read_csv(self.file_path, parse_dates=["time"])
    print(f"  Loaded {len(self.df)} rows, {len(self.df.columns)} columns")
```
```python
def convert_units(
    self, columns: list[str] | None = None, multiplier: float = 1000
) -> "TrophicFlowAnalysis":
    """Convert measurement units (default: from kg to g).

    Args:
        columns: columns to multiply.
        multiplier: multiplier factor.

    Returns:
        A dataframe with units converted.
    """
    cols = columns or self.config["column_values"]
    if self.config["convert_to_grams"]:
        print(f"Converting {cols} by {multiplier} to grams")
        self.df[cols] = self.df[cols] * multiplier
    return self
```
Suggested change:

```python
def convert_units(
    self, columns: list[str] | None = None, multiplier: float = 1000
) -> None:
    """Convert measurement units (default: from kg to g).

    Args:
        columns: columns to multiply.
        multiplier: multiplier factor.
    """
    cols = columns or self.config["column_values"]
    if self.config["convert_to_grams"]:
        print(f"Converting {cols} by {multiplier} to grams")
        self.df[cols] = self.df[cols] * multiplier
```
This PR adds an analytical tool to visualise the animal trophic interaction output from Virtual Ecosystem and to calculate the trophic mass flow in terms of carbon, nitrogen or phosphorus.

Currently, by default, it tracks the summed carbon in each resource pool (AKA resource kind) across all consumer cohorts at each time step, and outputs a faceted plot for visualisation purposes.

I envisage that this will be extended in the future to create multiple graphical outputs for analysing trophic interactions.
@davidorme Is there a way we can mimic the pytest pathing in Virtual Ecosystem so that it also works similarly in the ve_data_science repo? At the moment I have placed the test file in the same folder, which is not ideal.
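One possible approach (an assumption about repo layout, not something tested against ve_data_science) is to configure pytest at the repository root so that `pytest` finds the tests wherever they are moved to, e.g. via `pyproject.toml`:

```toml
# pyproject.toml (sketch): the "tests" path is hypothetical
[tool.pytest.ini_options]
testpaths = ["tests"]
```

With something like this in place, the test file could live in a dedicated folder rather than next to the tool, and a plain `pytest` from the repo root would still collect it.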