Synchronizing time information in outputs#1258
Conversation
Related to this, I'd be keen to hear any suggestions on how to streamline the input data being prepared with regards to the aspects above. Below I show part of the R script that generates the coordinates for the time dimension. At the moment, this approach is different from the one that the model uses (where 30.4 days = an average month). Which gives: I don't think my comment here needs to be solved immediately, and the desired approach may become apparent from further updates in #1246, but putting it here as it's highly related. If anything, it's good to stress the link with time_index and time variable in the input data. I guess for now the approach in R above doesn't hurt (as the time variable is not being used yet by the model). If at some point in the future the time variable (i.e., the time coordinates) in the input data and the model need to match, then the R script for generating the input data will need to follow the approach of 30.4 days (or any other duration chosen at that time). @davidorme do you think it would be better to prepare the input data using the 30.4 days as well (in the meantime until further decisions in #1246)? Or would this give issues due to 30.4 not being an integer value? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1258 +/- ##
===========================================
+ Coverage 95.03% 95.05% +0.01%
===========================================
Files 71 71
Lines 7276 7298 +22
===========================================
+ Hits 6915 6937 +22
Misses 361 361 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Yup - I'd been thinking the same thing. Ultimately users need to be able to confirm what the timestamps will be in the model without having to run the model and test the failure mode! |
jacobcook1995
left a comment
There was a problem hiding this comment.
This looks like a sensible change to me. (Can't really comment on the animal exporter stuff though)
@arne-exe I think the function below is the best match to the current Python code for calculating the updates, and the values written out to file by this PR. generate_timestamps <- function(
start = "2010-01-01",
end = "2020-12-31",
interval_in_days = 30.4375
) {
# Get the start and end as datetime objects and find the runtime as a difftime
start <- as.POSIXct(start)
end <- as.POSIXct(end)
interval <- as.difftime(interval_in_days, units="days")
config_runtime <- (end - start)
# Check the difftimes use the same units
stopifnot(
attr(config_runtime, 'units') == "days" &&
attr(interval, 'units') == "days"
)
# Get the time sequence, which can extend the actual runtime to fit the last iteration
n_updates <- ceiling(unclass(config_runtime) / unclass(interval)) # Edited to fix @arne-exe's bug spot.
time_indices <- seq(0, n_updates-1)
diffs <- interval * time_indices
interval_starts <- start + diffs
# Converts start datetimes to dates, which truncates to day
interval_starts <- as.Date(interval_starts)
return(list(interval_starts=interval_starts, time_indices=time_indices))
}Which yields: r$> generate_timestamps()
$interval_starts
[1] "2010-01-01" "2010-01-31" "2010-03-02" "2010-04-02" "2010-05-02" "2010-06-02" "2010-07-02" "2010-08-02" "2010-09-01" "2010-10-01"
[11] "2010-11-01" "2010-12-01" "2011-01-01" "2011-01-31" "2011-03-03" "2011-04-02" "2011-05-03" "2011-06-02" "2011-07-02" "2011-08-02"
[21] "2011-09-01" "2011-10-02" "2011-11-01" "2011-12-02" "2012-01-01" "2012-01-31" "2012-03-02" "2012-04-01" "2012-05-02" "2012-06-01"
[31] "2012-07-02" "2012-08-01" "2012-09-01" "2012-10-01" "2012-10-31" "2012-12-01" "2012-12-31" "2013-01-31" "2013-03-02" "2013-04-02"
[41] "2013-05-02" "2013-06-01" "2013-07-02" "2013-08-01" "2013-09-01" "2013-10-01" "2013-11-01" "2013-12-01" "2014-01-01" "2014-01-31"
[51] "2014-03-02" "2014-04-02" "2014-05-02" "2014-06-02" "2014-07-02" "2014-08-02" "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01"
[61] "2015-01-01" "2015-01-31" "2015-03-03" "2015-04-02" "2015-05-03" "2015-06-02" "2015-07-02" "2015-08-02" "2015-09-01" "2015-10-02"
[71] "2015-11-01" "2015-12-02" "2016-01-01" "2016-01-31" "2016-03-02" "2016-04-01" "2016-05-02" "2016-06-01" "2016-07-02" "2016-08-01"
[81] "2016-09-01" "2016-10-01" "2016-10-31" "2016-12-01" "2016-12-31" "2017-01-31" "2017-03-02" "2017-04-02" "2017-05-02" "2017-06-01"
[91] "2017-07-02" "2017-08-01" "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01" "2018-01-01" "2018-01-31" "2018-03-02" "2018-04-02"
[101] "2018-05-02" "2018-06-02" "2018-07-02" "2018-08-02" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01" "2019-01-31"
[111] "2019-03-03" "2019-04-02" "2019-05-03" "2019-06-02" "2019-07-02" "2019-08-02" "2019-09-01" "2019-10-02" "2019-11-01" "2019-12-02"
[121] "2020-01-01" "2020-01-31" "2020-03-02" "2020-04-01" "2020-05-02" "2020-06-01" "2020-07-02" "2020-08-01" "2020-09-01" "2020-10-01"
[131] "2020-10-31" "2020-12-01"
$time_indices
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[36] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
[71] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
[106] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 |
Thanks @davidorme, I will update the plant input data script accordingly so that it matches the dates instead of the days since start, and uses the 30.4375 interval length. The code below is pretty neat. I don't usually use checks like this but I can see why they are useful. A small adjustment for future reference, "r" and "d" are undefined in |
I did - I have updated the function in place. |
|
@TaranRallings Does the animal exporter change seem sensible? |
TaranRallings
left a comment
There was a problem hiding this comment.
LGTM. There will be some fiddling to align with my current work but that's best done by me anyway.
|
Thanks @TaranRallings. |
|
Thanks @davidorme for adding the time index in the exporter! |
Description
Feedback from the data science team on the time information in the outputs (#1257) is best summarized as: try harder.
We know there are broader problems with the timing (see #1246) but for the time being this PR:
pint) to theModelTimingobject.time_indexvalues differ between thefinal_state.ncand the continuous data output.ModelTimingdates and also adds the matching index.Important
There was also an annoying revealed issue.
The
test_cli_integration::pytest_ve_runtest was recently changed to set the end date in the example data to give a truncated two month run time to reduce the runtime of the test suite. That leads to a problem because the resulting coordinates for time in the truncated model (2 steps, 2 months) do not match the coordinates in the data sources (24 steps, 24 months) soxarrayquite rightly fails because we have inconsistent coordinates for the same dimension name.So... I've invented a new configuration option
core.debug.truncate_run_at_updatethat leaves the definition of the time coordinates well alone but just stops the simulation at a given update.I think this is a generally useful tool - users might have a dataset where they want to tinker and see how things run for the first N steps without having to continually truncate time series etc. With this, they can do that. I think we also want a core debug config for other options as well (like the logging level).
Fixes #1257
Type of change
Key checklist
pre-commitchecks:$ pre-commit run -a$ poetry run pytestFurther checks