
Synchronizing time information in outputs#1258

Merged
davidorme merged 13 commits intodevelopfrom
1257-time-index-woes-part-one
Jan 19, 2026

Conversation

@davidorme
Collaborator

@davidorme davidorme commented Jan 13, 2026

Description

Feedback from the data science team on the time information in the outputs (#1257) is best summarized as: try harder.

We know there are broader problems with the timing (see #1246) but for the time being this PR:

  • Adds a sequence of approximate dates for the start of each update (approximate because we're not incrementing by a whole number of days, thanks to pint) to the ModelTiming object.
  • Writes those dates to the NetCDF outputs.
  • Fixes an off-by-one error that made the time_index values differ between the final_state.nc and the continuous data output.
  • Updates the plant community data export to use the ModelTiming dates and also adds the matching index.
  • Updates the animal data exporter to provide the same info.
  • @TaranRallings For the animal exporter, I've updated the exporter class so that it always outputs the time, cohort and cell id data. Those fields can no longer be deselected in the config - I can't see any way that data exported without those fields would be usable.

Important

This change also revealed an annoying issue.

The test_cli_integration::pytest_ve_run test was recently changed to set the end date in the example data to give a truncated two month run, to reduce the runtime of the test suite. That causes a problem: the resulting time coordinates in the truncated model (2 steps, 2 months) no longer match the coordinates in the data sources (24 steps, 24 months), so xarray quite rightly fails because we have inconsistent coordinates for the same dimension name.

So... I've invented a new configuration option core.debug.truncate_run_at_update that leaves the definition of the time coordinates well alone but just stops the simulation at a given update.

I think this is a generally useful tool - users might have a dataset where they want to tinker and see how things run for the first N steps without having to continually truncate their time series and other inputs. With this, they can do that. I think we also want a core debug config section for other options as well (such as the logging level).
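In a model configuration, the new option might look like the sketch below (the `[core.debug]` table follows from the dotted option name; the exact syntax and surrounding layout are assumptions, not taken from the merged docs):

```toml
# Hypothetical sketch: stop the simulation after the second update
# without changing the definition of the time coordinates.
[core.debug]
truncate_run_at_update = 2
```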

Fixes #1257

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@davidorme davidorme linked an issue Jan 13, 2026 that may be closed by this pull request
@arne-exe
Collaborator

arne-exe commented Jan 14, 2026

  • Adds a sequence of approximate dates for the start of each update (approximate because we're not incrementing by whole number of days thanks to pint) to the ModelTiming object.
  • Writes those dates to the NetCDF outputs.
  • Fixes an off-by-one error that made the time_index values differ between the final_state.nc and the continuous data output.
  • Updates the plant community data export to use the ModelTiming dates and also adds the matching index.

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

Below I show part of the R script that generates the coordinates for the time dimension. At the moment, this approach is different from the one that the model uses (where 30.4 days = an average month).

# time_index
# The time_index depends on the intended runtime of the simulation
# For the Maliau site, use 11 years (2010-2020) with monthly intervals and
# express using days since origin (in this case 2010-01-01)
generate_monthly_timestamps <- function(
  start = "2010-01-01",
  end = "2020-12-31",
  origin = "2010-01-01"
) {
  time <- seq(as.Date(start), as.Date(end), by = "month")
  as.numeric(difftime(time, as.Date(origin), units = "days"))
}

time <- generate_monthly_timestamps()
time

time_index <- 0:(length(time) - 1)

Which gives:

> time
  [1]    0   31   59   90  120  151  181  212  243  273  304  334  365  396  424  455  485  516  546  577  608  638  669  699  730  761
 [27]  790  821  851  882  912  943  974 1004 1035 1065 1096 1127 1155 1186 1216 1247 1277 1308 1339 1369 1400 1430 1461 1492 1520 1551
 [53] 1581 1612 1642 1673 1704 1734 1765 1795 1826 1857 1885 1916 1946 1977 2007 2038 2069 2099 2130 2160 2191 2222 2251 2282 2312 2343
 [79] 2373 2404 2435 2465 2496 2526 2557 2588 2616 2647 2677 2708 2738 2769 2800 2830 2861 2891 2922 2953 2981 3012 3042 3073 3103 3134
[105] 3165 3195 3226 3256 3287 3318 3346 3377 3407 3438 3468 3499 3530 3560 3591 3621 3652 3683 3712 3743 3773 3804 3834 3865 3896 3926
[131] 3957 3987

I don't think my comment here needs to be solved immediately, and the desired approach may become apparent from further updates in #1246, but I'm putting it here as it's highly related. If anything, it's good to stress the link between the time_index and time variables in the input data.

I guess for now the approach in R above doesn't hurt (as the time variable is not being used yet by the model). If at some point in the future the time variable (i.e., the time coordinates) in the input data and the model need to match, then the R script for generating the input data will need to follow the approach of 30.4 days (or any other duration chosen at that time).

@davidorme do you think it would be better to prepare the input data using the 30.4 days as well (in the meantime until further decisions in #1246)? Or would this give issues due to 30.4 not being an integer value?

@codecov-commenter

codecov-commenter commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.05%. Comparing base (3913f2b) to head (dcabb12).
⚠️ Report is 8 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1258      +/-   ##
===========================================
+ Coverage    95.03%   95.05%   +0.01%     
===========================================
  Files           71       71              
  Lines         7276     7298      +22     
===========================================
+ Hits          6915     6937      +22     
  Misses         361      361              


@davidorme
Collaborator Author

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

Yup - I'd been thinking the same thing. Ultimately users need to be able to confirm what the timestamps will be in the model without having to run the model and test the failure mode!

Collaborator

@jacobcook1995 jacobcook1995 left a comment


This looks like a sensible change to me. (Can't really comment on the animal exporter stuff though)

@davidorme
Collaborator Author

davidorme commented Jan 16, 2026

Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above.

@arne-exe I think the function below is the best match to the current Python code for calculating the updates, and the values written out to file by this PR.

generate_timestamps <- function(
  start = "2010-01-01",
  end = "2020-12-31",
  interval_in_days = 30.4375
) {

  # Get the start and end as datetime objects and find the runtime as a difftime
  start <- as.POSIXct(start)
  end <- as.POSIXct(end)
  interval <- as.difftime(interval_in_days, units="days")
  config_runtime <- (end - start)

  # Check the difftimes use the same units
  stopifnot(
    attr(config_runtime, 'units') == "days"  && 
    attr(interval, 'units') == "days"
  )

  # Get the time sequence, which can extend the actual runtime to fit the last iteration 
  n_updates <- ceiling(unclass(config_runtime) / unclass(interval)) # Edited to fix @arne-exe's bug spot.
  time_indices <- seq(0, n_updates-1)
  diffs <- interval * time_indices
  interval_starts <- start + diffs

  # Converts start datetimes to dates, which truncates to day
  interval_starts <- as.Date(interval_starts)

  return(list(interval_starts=interval_starts, time_indices=time_indices))

}

Which yields:

r$> generate_timestamps()
$interval_starts
  [1] "2010-01-01" "2010-01-31" "2010-03-02" "2010-04-02" "2010-05-02" "2010-06-02" "2010-07-02" "2010-08-02" "2010-09-01" "2010-10-01"
 [11] "2010-11-01" "2010-12-01" "2011-01-01" "2011-01-31" "2011-03-03" "2011-04-02" "2011-05-03" "2011-06-02" "2011-07-02" "2011-08-02"
 [21] "2011-09-01" "2011-10-02" "2011-11-01" "2011-12-02" "2012-01-01" "2012-01-31" "2012-03-02" "2012-04-01" "2012-05-02" "2012-06-01"
 [31] "2012-07-02" "2012-08-01" "2012-09-01" "2012-10-01" "2012-10-31" "2012-12-01" "2012-12-31" "2013-01-31" "2013-03-02" "2013-04-02"
 [41] "2013-05-02" "2013-06-01" "2013-07-02" "2013-08-01" "2013-09-01" "2013-10-01" "2013-11-01" "2013-12-01" "2014-01-01" "2014-01-31"
 [51] "2014-03-02" "2014-04-02" "2014-05-02" "2014-06-02" "2014-07-02" "2014-08-02" "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01"
 [61] "2015-01-01" "2015-01-31" "2015-03-03" "2015-04-02" "2015-05-03" "2015-06-02" "2015-07-02" "2015-08-02" "2015-09-01" "2015-10-02"
 [71] "2015-11-01" "2015-12-02" "2016-01-01" "2016-01-31" "2016-03-02" "2016-04-01" "2016-05-02" "2016-06-01" "2016-07-02" "2016-08-01"
 [81] "2016-09-01" "2016-10-01" "2016-10-31" "2016-12-01" "2016-12-31" "2017-01-31" "2017-03-02" "2017-04-02" "2017-05-02" "2017-06-01"
 [91] "2017-07-02" "2017-08-01" "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01" "2018-01-01" "2018-01-31" "2018-03-02" "2018-04-02"
[101] "2018-05-02" "2018-06-02" "2018-07-02" "2018-08-02" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01" "2019-01-31"
[111] "2019-03-03" "2019-04-02" "2019-05-03" "2019-06-02" "2019-07-02" "2019-08-02" "2019-09-01" "2019-10-02" "2019-11-01" "2019-12-02"
[121] "2020-01-01" "2020-01-31" "2020-03-02" "2020-04-01" "2020-05-02" "2020-06-01" "2020-07-02" "2020-08-01" "2020-09-01" "2020-10-01"
[131] "2020-10-31" "2020-12-01"

$time_indices
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
 [36]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
 [71]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104
[106] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
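For anyone preparing input data in Python rather than R, a minimal sketch of the same calculation (the function name and signature are mine for illustration; this mirrors the R function above, not the model's actual API):

```python
from datetime import datetime, timedelta
from math import ceil


def generate_timestamps(
    start="2010-01-01", end="2020-12-31", interval_in_days=30.4375
):
    """Return update start dates and matching zero-based time indices."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)

    # The sequence can extend past the configured end date to fit the
    # last update, hence the ceiling.
    n_updates = ceil((end_dt - start_dt) / timedelta(days=interval_in_days))
    time_indices = list(range(n_updates))

    # Truncate each update start datetime to a date, as the R version does.
    interval_starts = [
        (start_dt + timedelta(days=interval_in_days * i)).date()
        for i in time_indices
    ]
    return interval_starts, time_indices
```

Calling `generate_timestamps()` with the defaults reproduces the 132 dates shown above, starting at 2010-01-01 and ending at 2020-12-01.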

@arne-exe
Collaborator

@arne-exe I think the function below is the best match to the current Python code for calculating the updates, and the values written out to file by this PR.

Thanks @davidorme, I will update the plant input data script accordingly so that it matches the dates instead of the days since start, and uses the 30.4375 interval length.

The code below is pretty neat. I don't usually use checks like this but I can see why they are useful.

  # Check the difftimes use the same units
  stopifnot(
    attr(config_runtime, 'units') == "days"  && 
    attr(interval, 'units') == "days"
  )

A small adjustment for future reference, "r" and "d" are undefined in n_updates <- ceiling(unclass(r)/unclass(d)), and I think you meant to use ceiling(unclass(config_runtime) / unclass(interval)).

@davidorme
Collaborator Author

A small adjustment for future reference, "r" and "d" are undefined in n_updates <- ceiling(unclass(r)/unclass(d)), and I think you meant to use ceiling(unclass(config_runtime) / unclass(interval)).

I did - I have updated the function in place.

@davidorme
Collaborator Author

@TaranRallings Does the animal exporter change seem sensible?

Collaborator

@TaranRallings TaranRallings left a comment


LGTM. There will be some fiddling to align with my current work but that's best done by me anyway.

@davidorme
Collaborator Author

Thanks @TaranRallings.

@davidorme davidorme merged commit 8368bb1 into develop Jan 19, 2026
13 checks passed
@davidorme davidorme deleted the 1257-time-index-woes-part-one branch January 19, 2026 13:44
@nickwctan
Collaborator

Thanks @davidorme for adding the time index in the exporter!



Development

Successfully merging this pull request may close these issues.

Differences in time_index dimension of example output data

6 participants