
Synchronizing time information in outputs#1258

Merged
davidorme merged 13 commits intodevelopfrom
1257-time-index-woes-part-one
Jan 19, 2026

Conversation

@davidorme
Collaborator

@davidorme davidorme commented Jan 13, 2026

Description

Feedback from the data science team on the time information in the outputs (#1257) is best summarized as: try harder.

We know there are broader problems with the timing (see #1246) but for the time being this PR:

  • Adds a sequence of approximate dates for the start of each update (approximate because we're not incrementing by a whole number of days, thanks to pint) to the ModelTiming object.
  • Writes those dates to the NetCDF outputs.
  • Fixes an off-by-one error that made the time_index values differ between the final_state.nc and the continuous data output.
  • Updates the plant community data export to use the ModelTiming dates and also adds the matching index.
  • Updates the animal data exporter to provide the same info.
  • @TaranRallings For the animal exporter, I've updated the exporter class so that it always outputs the time, cohort and cell id data. Those fields can no longer be deselected in the config - I can't see any way that data exported without those fields would be usable.

Important

This change also revealed an annoying issue.

The test_cli_integration::pytest_ve_run test was recently changed to set the end date in the example data to give a truncated two month run, to reduce the runtime of the test suite. That causes a problem: the resulting time coordinates in the truncated model (2 steps, 2 months) no longer match the coordinates in the data sources (24 steps, 24 months), so xarray quite rightly fails because we have inconsistent coordinates for the same dimension name.

So... I've invented a new configuration option core.debug.truncate_run_at_update that leaves the definition of the time coordinates well alone but just stops the simulation at a given update.

I think this is a generally useful tool - users might have a dataset where they want to tinker and see how things run for the first N steps without having to continually truncate their time series and other inputs. With this, they can do that. I think we also want a core debug config section for other options as well (such as the logging level).
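In a model configuration, the new option might look like the sketch below (the `[core.debug]` table follows from the dotted option name; the exact syntax and surrounding layout are assumptions, not taken from the merged docs):

```toml
# Hypothetical sketch: stop the simulation after the second update
# without changing the definition of the time coordinates.
[core.debug]
truncate_run_at_update = 2
```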

Fixes #1257

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@davidorme davidorme linked an issue Jan 13, 2026 that may be closed by this pull request
@arne-exe
Collaborator

arne-exe commented Jan 14, 2026

  • Adds a sequence of approximate dates for the start of each update (approximate because we're not incrementing by whole number of days thanks to pint) to the ModelTiming object.
  • Writes those dates to the NetCDF outputs.
  • Fixes an off-by-one error that made the time_index values differ between the final_state.nc and the continuous data output.
  • Updates the plant community data export to use the ModelTiming dates and also adds the matching index.

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

Below I show part of the R script that generates the coordinates for the time dimension. At the moment, this approach is different from the one that the model uses (where 30.4 days = an average month).

# time_index
# The time_index depends on the intended runtime of the simulation
# For the Maliau site, use 11 years (2010-2020) with monthly intervals and
# express using days since origin (in this case 2010-01-01)
generate_monthly_timestamps <- function(
  start = "2010-01-01",
  end = "2020-12-31",
  origin = "2010-01-01"
) {
  time <- seq(as.Date(start), as.Date(end), by = "month")
  as.numeric(difftime(time, as.Date(origin), units = "days"))
}

time <- generate_monthly_timestamps()
time

time_index <- 0:(length(time) - 1)

Which gives:

> time
  [1]    0   31   59   90  120  151  181  212  243  273  304  334  365  396  424  455  485  516  546  577  608  638  669  699  730  761
 [27]  790  821  851  882  912  943  974 1004 1035 1065 1096 1127 1155 1186 1216 1247 1277 1308 1339 1369 1400 1430 1461 1492 1520 1551
 [53] 1581 1612 1642 1673 1704 1734 1765 1795 1826 1857 1885 1916 1946 1977 2007 2038 2069 2099 2130 2160 2191 2222 2251 2282 2312 2343
 [79] 2373 2404 2435 2465 2496 2526 2557 2588 2616 2647 2677 2708 2738 2769 2800 2830 2861 2891 2922 2953 2981 3012 3042 3073 3103 3134
[105] 3165 3195 3226 3256 3287 3318 3346 3377 3407 3438 3468 3499 3530 3560 3591 3621 3652 3683 3712 3743 3773 3804 3834 3865 3896 3926
[131] 3957 3987

I don't think my comment here needs to be solved immediately, and the desired approach may become apparent from further updates in #1246, but I'm putting it here as it's highly related. If anything, it's good to stress the link between the time_index and time variables in the input data.

I guess for now the approach in R above doesn't hurt (as the time variable is not being used yet by the model). If at some point in the future the time variable (i.e., the time coordinates) in the input data and the model need to match, then the R script for generating the input data will need to follow the approach of 30.4 days (or any other duration chosen at that time).

@davidorme do you think it would be better to prepare the input data using the 30.4 days as well (in the meantime until further decisions in #1246)? Or would this give issues due to 30.4 not being an integer value?

@codecov-commenter

codecov-commenter commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.05%. Comparing base (3913f2b) to head (dcabb12).
⚠️ Report is 8 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1258      +/-   ##
===========================================
+ Coverage    95.03%   95.05%   +0.01%     
===========================================
  Files           71       71              
  Lines         7276     7298      +22     
===========================================
+ Hits          6915     6937      +22     
  Misses         361      361              


@davidorme
Collaborator Author

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

Yup - I'd been thinking the same thing. Ultimately users need to be able to confirm what the timestamps will be in the model without having to run the model and test the failure mode!

Collaborator

@jacobcook1995 jacobcook1995 left a comment


This looks like a sensible change to me. (Can't really comment on the animal exporter stuff though)

@davidorme
Collaborator Author

davidorme commented Jan 16, 2026

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

@arne-exe I think the function below is the best match to the current Python code for calculating the updates, and the values written out to file by this PR.

generate_timestamps <- function(
  start = "2010-01-01",
  end = "2020-12-31",
  interval_in_days = 30.4375
) {

  # Get the start and end as datetime objects and find the runtime as a difftime
  start <- as.POSIXct(start)
  end <- as.POSIXct(end)
  interval <- as.difftime(interval_in_days, units="days")
  config_runtime <- (end - start)

  # Check the difftimes use the same units
  stopifnot(
    attr(config_runtime, 'units') == "days"  && 
    attr(interval, 'units') == "days"
  )

  # Get the time sequence, which can extend the actual runtime to fit the last iteration 
  n_updates <- ceiling(unclass(config_runtime) / unclass(interval)) # Edited to fix @arne-exe's bug spot.
  time_indices <- seq(0, n_updates-1)
  diffs <- interval * time_indices
  interval_starts <- start + diffs

  # Converts start datetimes to dates, which truncates to day
  interval_starts <- as.Date(interval_starts)

  return(list(interval_starts=interval_starts, time_indices=time_indices))

}

Which yields:

r$> generate_timestamps()
$interval_starts
  [1] "2010-01-01" "2010-01-31" "2010-03-02" "2010-04-02" "2010-05-02" "2010-06-02" "2010-07-02" "2010-08-02" "2010-09-01" "2010-10-01"
 [11] "2010-11-01" "2010-12-01" "2011-01-01" "2011-01-31" "2011-03-03" "2011-04-02" "2011-05-03" "2011-06-02" "2011-07-02" "2011-08-02"
 [21] "2011-09-01" "2011-10-02" "2011-11-01" "2011-12-02" "2012-01-01" "2012-01-31" "2012-03-02" "2012-04-01" "2012-05-02" "2012-06-01"
 [31] "2012-07-02" "2012-08-01" "2012-09-01" "2012-10-01" "2012-10-31" "2012-12-01" "2012-12-31" "2013-01-31" "2013-03-02" "2013-04-02"
 [41] "2013-05-02" "2013-06-01" "2013-07-02" "2013-08-01" "2013-09-01" "2013-10-01" "2013-11-01" "2013-12-01" "2014-01-01" "2014-01-31"
 [51] "2014-03-02" "2014-04-02" "2014-05-02" "2014-06-02" "2014-07-02" "2014-08-02" "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01"
 [61] "2015-01-01" "2015-01-31" "2015-03-03" "2015-04-02" "2015-05-03" "2015-06-02" "2015-07-02" "2015-08-02" "2015-09-01" "2015-10-02"
 [71] "2015-11-01" "2015-12-02" "2016-01-01" "2016-01-31" "2016-03-02" "2016-04-01" "2016-05-02" "2016-06-01" "2016-07-02" "2016-08-01"
 [81] "2016-09-01" "2016-10-01" "2016-10-31" "2016-12-01" "2016-12-31" "2017-01-31" "2017-03-02" "2017-04-02" "2017-05-02" "2017-06-01"
 [91] "2017-07-02" "2017-08-01" "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01" "2018-01-01" "2018-01-31" "2018-03-02" "2018-04-02"
[101] "2018-05-02" "2018-06-02" "2018-07-02" "2018-08-02" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01" "2019-01-31"
[111] "2019-03-03" "2019-04-02" "2019-05-03" "2019-06-02" "2019-07-02" "2019-08-02" "2019-09-01" "2019-10-02" "2019-11-01" "2019-12-02"
[121] "2020-01-01" "2020-01-31" "2020-03-02" "2020-04-01" "2020-05-02" "2020-06-01" "2020-07-02" "2020-08-01" "2020-09-01" "2020-10-01"
[131] "2020-10-31" "2020-12-01"

$time_indices
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
 [36]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
 [71]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104
[106] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
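For anyone preparing input data in Python rather than R, a minimal sketch of the same calculation (the function name and signature are mine for illustration; this mirrors the R function above, not the model's actual API):

```python
from datetime import datetime, timedelta
from math import ceil


def generate_timestamps(
    start="2010-01-01", end="2020-12-31", interval_in_days=30.4375
):
    """Return update start dates and matching zero-based time indices."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)

    # The sequence can extend past the configured end date to fit the
    # last update, hence the ceiling.
    n_updates = ceil((end_dt - start_dt) / timedelta(days=interval_in_days))
    time_indices = list(range(n_updates))

    # Truncate each update start datetime to a date, as the R version does.
    interval_starts = [
        (start_dt + timedelta(days=interval_in_days * i)).date()
        for i in time_indices
    ]
    return interval_starts, time_indices
```

Calling `generate_timestamps()` with the defaults reproduces the 132 dates shown above, starting at 2010-01-01 and ending at 2020-12-01.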

@arne-exe
Collaborator

@arne-exe I think the function below is the best match to the current Python code for calculating the updates, and the values written out to file by this PR.

Thanks @davidorme, I will update the plant input data script accordingly so that it matches the dates instead of the days since start, and uses the 30.4375 interval length.

The code below is pretty neat. I don't usually use checks like this but I can see why they are useful.

  # Check the difftimes use the same units
  stopifnot(
    attr(config_runtime, 'units') == "days"  && 
    attr(interval, 'units') == "days"
  )

A small adjustment for future reference, "r" and "d" are undefined in n_updates <- ceiling(unclass(r)/unclass(d)), and I think you meant to use ceiling(unclass(config_runtime) / unclass(interval)).

@davidorme
Collaborator Author

A small adjustment for future reference, "r" and "d" are undefined in n_updates <- ceiling(unclass(r)/unclass(d)), and I think you meant to use ceiling(unclass(config_runtime) / unclass(interval)).

I did - I have updated the function in place.

@davidorme
Collaborator Author

@TaranRallings Does the animal exporter change seem sensible?

Collaborator

@TaranRallings TaranRallings left a comment


LGTM. There will be some fiddling to align with my current work but that's best done by me anyway.

@davidorme
Collaborator Author

Thanks @TaranRallings.

@davidorme davidorme merged commit 8368bb1 into develop Jan 19, 2026
13 checks passed
@davidorme davidorme deleted the 1257-time-index-woes-part-one branch January 19, 2026 13:44
@nickwctan
Collaborator

Thanks @davidorme for adding the time index in the exporter!



Development

Successfully merging this pull request may close these issues.

Differences in time_index dimension of example output data

6 participants