
Conversation

@davxy
Owner

@davxy davxy commented Jun 19, 2025

✅ Please report your success processing these!


Protocol version 0.6.6 $^*$

  • Extended the fetch host call with new variants.
  • Updated numeric identifiers used in fetch.
  • Updated numeric identifiers for PVM errors.
  • PVM wrangled operands changed.
  • Removed the traces 000...000.bin/json step, as it was not a valid trace step and was intended to be handled specially for genesis. Since it shared the same format as regular trace steps, it could be ambiguous or misleading.
  • Introduced an explicit genesis.bin file containing the genesis state and header.
  • The authorizer trace field has been moved to the end of the accumulation operand encoding (C.29).

(*) WARNING: DEVIATIONS

The fetch host call for protocol parameters ($\omega_{10}=0$) has been implemented according to this (currently) unreleased change: gavofyork/graypaper#414

For the fetch host-call id we're still using 18, as per GP 0.6.6. The picked change only concerns the value returned for $\omega_{10}=0$.

[screenshot of the values returned by fetch for $\omega_{10}=0$]
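For implementers, here is a minimal sketch of how a node might dispatch the fetch host call on the $\omega_{10}$ selector, with the kind = 0 branch returning the pre-encoded protocol parameters per the linked graypaper PR. All names and the blob length are illustrative assumptions, not the jam-pvm-common API:

```rust
// Hypothetical host-side sketch, not the jam-pvm-common API: dispatch the
// fetch host call (id 18 in GP 0.6.6) on the selector passed in omega_10.
// Only the kind = 0 branch (protocol parameters, per gavofyork/graypaper#414)
// is spelled out; every other kind keeps its GP 0.6.6 meaning.
fn fetch(omega_10: u64, encoded_protocol_params: &[u8]) -> Option<Vec<u8>> {
    match omega_10 {
        // kind 0: return the pre-encoded protocol parameter blob, which the
        // guest then decodes (this is where host/guest encoding mismatches
        // surface as decode panics).
        0 => Some(encoded_protocol_params.to_vec()),
        // kind 1..: entropy, operands, transfers, ... (not covered here)
        _ => None,
    }
}

fn main() {
    // Stand-in blob; length and content are placeholders, not the real encoding.
    let params = vec![0u8; 8];
    assert_eq!(fetch(0, &params).map(|v| v.len()), Some(8));
    assert_eq!(fetch(7, &params), None);
}
```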


Closes: #53 #79

@sourabhniyogi

As we just did all our 0.6.6 + 0.6.7 changes together this week (where there are many more representation changes in 0.6.7 [C16, new elements of Service Accounts, storage keys read/write, etc.]), we hope to get in sync at 0.6.7.

@clearloop

After updating the test vectors, accumulate, reports_l0 and reports_l1 were affected. At the test test_accumulate_ready_queued_reports_1 we got this panic message:

     WARN program: Panic message: panicked at crates/jam-pvm-common/src/host_calls.rs:209:46:
    host call returns correct type; qed: Error 

which is confusing: if it returns the correct type, why does it panic? The test also breaks on a storage mismatch. It would help if you could publish jam-pvm-common-{version}-rc0 so we can see what happens in the inner logic of the program!
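For reference, that message has the shape of a Rust `.expect(...)` on a failed decode: the wrapper around the host call tries to decode the returned bytes and aborts when they don't match the expected type. A minimal sketch of that pattern (illustrative only, not the actual jam-pvm-common source):

```rust
// Minimal sketch of the guest-side pattern that yields a message like
// "host call returns correct type; qed: Error". Illustrative only.

#[derive(Debug)]
struct Error; // stand-in for the real decode error type

fn decode_params(buf: &[u8]) -> Result<u32, Error> {
    // Stand-in decoder: expects exactly 4 bytes.
    let bytes: [u8; 4] = buf.try_into().map_err(|_| Error)?;
    Ok(u32::from_le_bytes(bytes))
}

fn main() {
    // Pretend the host returned a blob of the wrong length (e.g. because
    // host and guest disagree on the parameter encoding).
    let returned = [0u8; 3];
    // .expect appends the Debug form of the error to the message, producing
    // "host call returns correct type; qed: Error" before the abort.
    let _params = decode_params(&returned).expect("host call returns correct type; qed");
}
```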

@danicuki

When we run traces with the new operand encoding, they fail. We need to use the 0.6.5 operand encoding to pass. In particular, I believe the ↕xo position in the vectors is not correct (it should be at the end of the encoding according to 0.6.6).

@davxy
Owner Author

davxy commented Jun 20, 2025

Our published jam crates are for 0.6.5, which means they can't work with the new vectors.

If you are using them, please ask in the channel for an update.

@davxy
Owner Author

davxy commented Jun 20, 2025

In particular, I believe the ↕xo position in the vectors is not correct (it should be at the end of the encoding according to 0.6.6).

@danicuki wdym here? There is no ↕xo in the new operands encoding; it has been replaced by |o|.

@davxy
Owner Author

davxy commented Jun 20, 2025

After updating the test vectors, accumulate, reports_l0 and reports_l1 were affected. At the test test_accumulate_ready_queued_reports_1 we got this panic message:

     WARN program: Panic message: panicked at crates/jam-pvm-common/src/host_calls.rs:209:46:
    host call returns correct type; qed: Error 

which is confusing: if it returns the correct type, why does it panic? The test also breaks on a storage mismatch. It would help if you could publish jam-pvm-common-{version}-rc0 so we can see what happens in the inner logic of the program!

You mean that the program embedded in the vector panics?

@clearloop

clearloop commented Jun 20, 2025

WARN program: Panic message: panicked at crates/jam-pvm-common/src/host_calls.rs:209:46:
host call returns correct type; qed: Error

You mean that the program embedded in the vector panics?

Yes, this logging message is emitted by host call 100. There could be logic in

https://github.com/davxy/jam-test-vectors/blob/v0.6.6-rc0/stf/accumulate/test-service/src/lib.rs#L23

calling host call 18 (the fetch call) that panics on decoding the returned value.

WARN: Fetch host call for protocol parameters (kind=0) has been implemented according to GP v0.6.7

I've checked that my kind 0 matches GP 0.6.7 (encoded_params_length=136), so I believe there could be some problem in the test service embedded in the test vectors. I need the source of jam-pvm-common to verify whether the parameters used in the test-service are the same as the ones defined in our implementation.

@davxy
Owner Author

davxy commented Jun 20, 2025

WARN program: Panic message: panicked at crates/jam-pvm-common/src/host_calls.rs:209:46:
host call returns correct type; qed: Error

You mean that the program embedded in the vector panics?

Yes, this logging message is emitted by host call 100. There could be logic in

https://github.com/davxy/jam-test-vectors/blob/v0.6.6-rc0/stf/accumulate/test-service/src/lib.rs#L23

calling host call 18 (the fetch call) that panics on decoding the returned value.

WARN: Fetch host call for protocol parameters (kind=0) has been implemented according to GP v0.6.7

I've checked that my kind 0 matches GP 0.6.7 (length=136), so I believe there could be some problem in the test service embedded in the test vectors. I need the source of jam-pvm-common to verify whether the parameters used in the test-service are the same as the ones defined in our implementation.

Mmm... those test service sources are outdated and are not the ones we're using to build the test service embedded within the test vectors. Are you using them to rebuild the binary or something?

@clearloop

clearloop commented Jun 20, 2025

Mmm... those test service sources are outdated and are not the ones we're using to build the test service embedded within the test vectors. Are you using them to rebuild the binary or something?

Hmm, no, I don't use or modify it any more, since I could pass all the provided accumulate tests without modifications weeks ago. The panic happens in the test service embedded in the test vectors in this PR. Using it as a template, I was trying to describe the root cause of the panic and why I need the source of jam-pvm-common (I want to verify how the test-service embedded in the test vectors calls the fetch host call, what type it expects, why my defined type triggers the panic, etc.).

@davxy
Owner Author

davxy commented Jun 20, 2025

why I need the source of jam-pvm-common (I want to verify how the test-service embedded in the test vectors calls the fetch host call, what type it expects, why my defined type triggers the panic, etc.)

Fair enough, I'll double check our side. I think our new crates need to be published. Unfortunately that part is not under my jurisdiction :-D

@danicuki

danicuki commented Jun 20, 2025

In particular, I believe the ↕xo position in the vectors is not correct (it should be at the end of the encoding according to 0.6.6).

@danicuki wdym here? There is no ↕xo in the new operands encoding; it has been replaced by |o|.

GP 0.6.6:
[screenshot of the 0.6.6 encoding]

GP 0.6.5:
[screenshot of the 0.6.5 encoding]

I was referring to formula C.29. But even on formula B.9, the traces here still pass when I use the 0.6.5 formula (↕o instead of |o|); if I change to |o|, the traces fail.

The accumulate vectors are still all failing here when comparing storage items

Just to be clear:

  • traces pass using 0.6.5 formulas related to operands
  • accumulate vectors fail on any configuration

@davxy
Owner Author

davxy commented Jun 20, 2025

@danicuki I discovered that we were using an outdated version of C.29 (the encoding for accumulation operands). Thanks for pointing that out.

Regarding B.9, I can confirm that we are using the v0.6.6 encoding, i.e.:

[screenshot of the v0.6.6 B.9 encoding]

NOTE: I've only updated the STF accumulate vectors for now. I'll update the traces once we’re aligned on the STF.

@clearloop perhaps this was the cause of your issue?

@jaymansfield

Tried the latest and I'm running into the same panic issue with the accumulate vectors as everyone else. My accumulate arguments, operands and fetch host call all seem to match the 0.6.6 spec.

@davxy
Owner Author

davxy commented Jun 20, 2025

@clearloop @danicuki @jaymansfield alright, let's start simple. Does the panic happen in https://github.com/davxy/jam-test-vectors/blob/master/stf/accumulate/tiny/process_one_immediate_report-1.json ?

My accumulate arguments, operands and fetch host call all seem to match the 0.6.6 spec.

Edit: just in case... we are using the fetch host call from 0.6.7. In particular, the variant with kind = 0 (i.e. w_10 = 0) returns different stuff and we are using that stuff!

@jaymansfield

@clearloop @danicuki @jaymansfield alright, let's start simple. Does the panic happen in https://github.com/davxy/jam-test-vectors/blob/master/stf/accumulate/tiny/process_one_immediate_report-1.json ?

My accumulate arguments, operands and fetch host call all seem to match the 0.6.6 spec.

Edit: just in case... we are using the fetch host call from 0.6.7. In particular, the variant with kind = 0 (i.e. w_10 = 0) returns different stuff and we are using that stuff!

Yes, it happens with process_one_immediate_report-1, for me at least.

Fetch with w_10=0 in 0.6.7 compared to 0.6.6 is just an updated Wb value, right, with the same encoding? I've tried looking it over a few times and can't see any other difference.

@davxy
Owner Author

davxy commented Jun 20, 2025

My bad! I just noticed that we're leveraging an unreleased change. If you want to process these accumulate vectors you need to apply this:

gavofyork/graypaper#414

I’ll update the README soon with a big, fat warning. Sorry if I made you lose time chasing this down 😅

@danicuki

@davxy I understand you are doing your best here, but these "mixed"-version vectors are driving us crazy. I tried hard to match the accumulate vectors, using the fetch host call from 0.6.6, 0.6.7, and even 0.6.8, unsuccessfully. This mixing with other 0.6.6 parts makes it hard to maintain and even to know where we broke the system. I feel like I'm randomly trying a mix of formulas from different GP versions to match the vectors.
My suggestion would be to make everything 0.6.6 in this branch, or give up on it and go straight to 0.6.8.

@davxy
Owner Author

davxy commented Jun 21, 2025

Hey @danicuki

using the fetch host call from all 0.6.6, 0.6.7 and even 0.6.8

The fetch we're using is the one from the linked PR; that should be the only deviation in question.

Please also note that this also impacts the PolkaJam nightly node, and we want to keep the nightly node coherent with the vectors.

Unfortunately we included that PR and it is there to stay, as we're already using the introduced K (maximum number of tickets) and N (number of tickets per validator). I don't want to maintain a separate branch of our codebase for vectors delivery :-D

Let's also wait for the other teams' feedback.

@jaymansfield

my bad! I just noticed that we're leveraging an unreleased change. If you want to process these accumulate vectors you need to apply this:

gavofyork/graypaper#414

I’ll update the README soon with a big, fat warning. Sorry if I made you lose time chasing this down 😅

I added in this change and still no luck on my side, the same panic still. Maybe a trace for one of them would help us see what is expected, to narrow down the difference?

@davxy
Owner Author

davxy commented Jun 21, 2025

Also note: in gavofyork/graypaper#414 I see that the fetch host-call id is 1.
We're still using 18, as per GP 0.6.6.

The change we introduced only concerns the values returned for w_10=0:

[screenshot of the values returned by fetch for w_10=0]

I'll share the https://github.com/davxy/jam-test-vectors/blob/master/stf/accumulate/tiny/process_one_immediate_report-1.json PVM trace on Monday if it can help.

@danicuki

danicuki commented Jun 21, 2025

Thanks @davxy - will try to figure out what's going on after you publish the PVM trace. One doubt: what value for WB are you using? The 0.6.6 one (12*2^20) or the 0.6.7 one (13,794,30)?

@clearloop

clearloop commented Jun 24, 2025

Oh, I thought that had already been addressed, sorry. I'll share our pvm trace tomorrow. It's important that we're 100% aligned.

Thanks! We got a trace from @danicuki @jamixir in the channels and found a problem where we handled the return status of the storage write incorrectly. Now we can pass all accumulate tests at 3cd379a.

We'd like to share our trace of process_one_immediate_report_1.log if it helps others.

@davxy
Owner Author

davxy commented Jun 24, 2025

Nice!

I've regenerated the stf/accumulate vectors and also aligned the traces vectors with the latest nightly we have...

Could you guys retry running stf/accumulate and also the new traces?
Once we're aligned across at least 3+ teams, I'll go ahead and merge this.
Then I'll start delivering the more complex and fun traces I already have ready 😄

Edit: It's important that we're aligned on everything before the merge - including gas consumption.
So if there is any discrepancy, please report it.

@clearloop

clearloop commented Jun 24, 2025

The problem

 program: Panic message: panicked at crates/jam-pvm-common/src/host_calls.rs:209:46:
    host call returns correct type; qed: Error    

is back in the traces (reports_l0 + reports_l1) tests. Hmm, before that we reached two fetch(0) calls; could it be that the parameters are not up to date in the generator of the report traces?


@jaymansfield Your analysis made it easy to spot the issue. Thank you! We had mistakenly encoded R and H as u32 instead of u16. Could you give the vectors another try?

NOTE: I only regenerated accumulate STF vectors. Once we're aligned, I'll regenerate the traces as well

I can confirm that using u32 again for R and H passes the type checks of fetch(0), but on fetch(14) we hit the type mismatch problem again. So basically, the Parameters (maybe + AccumulateItem) used in traces are not the same as in accumulate.
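To make the failure mode concrete, here is a small self-contained illustration of why a field encoded with one width on one side and another width on the other breaks decoding of everything that follows. The R/H names mirror the discussion above; the real parameter layout is an assumption:

```rust
// Illustration only: a fixed-width little-endian encoding where one side
// writes a field as u32 and the other expects u16. The field names (R, H)
// and their neighbours are assumptions, not the real parameter layout.

fn encode_u16_version(r: u16, h: u16, next: u64) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&r.to_le_bytes());    // 2 bytes
    out.extend_from_slice(&h.to_le_bytes());    // 2 bytes
    out.extend_from_slice(&next.to_le_bytes()); // 8 bytes
    out
}

fn encode_u32_version(r: u32, h: u32, next: u64) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&r.to_le_bytes());    // 4 bytes
    out.extend_from_slice(&h.to_le_bytes());    // 4 bytes
    out.extend_from_slice(&next.to_le_bytes()); // 8 bytes
    out
}

fn main() {
    let a = encode_u16_version(10, 8, 0xAABB);
    let b = encode_u32_version(10, 8, 0xAABB);
    // Same logical values, different total length and field offsets:
    // a decoder expecting one layout misreads every field after R/H
    // when handed the other, which then surfaces as a decode panic.
    assert_eq!(a.len(), 12);
    assert_eq!(b.len(), 16);
    println!("u16 layout: {a:?}\nu32 layout: {b:?}");
}
```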

@davxy
Owner Author

davxy commented Jun 24, 2025

I can confirm that using u32 again for R and H passes the type checks of fetch(0), but on fetch(14) we hit the type mismatch problem again. So basically, the Parameters (maybe + AccumulateItem) used in traces are not the same as in accumulate.

@clearloop
For sure the services are different! But AFAICT we call exactly the same fetch(14) implementation (the one in the new jam-pvm-common). I'm going to double check BTW...
Can you confirm that you are passing the new stf/accumulate? I.e. the ones shipped with 3cd379a

@clearloop

@clearloop For sure the services are different! But AFAICT we call exactly the same fetch(14) implementation (the one in the new jam-pvm-common). I'm going to double check BTW... Can you confirm that you are passing the new stf/accumulate? I.e. the ones shipped with 3cd379a

Sorry, my bad, my local test vectors were not updated successfully. We can still pass accumulate at 3cd379a, and the type check panics are gone in traces.

Now we are hitting some problems related to storage IO in traces; they could be bugs on our side, we will dig into them.

@arjanz

arjanz commented Jun 24, 2025

PyJAMaz is passing the accumulate vectors and the traces up to reports-l0. We encounter two state-root mismatches in reports-l1, which could be bugs on our end. We are looking into it.

@dakk

dakk commented Jun 24, 2025

Jampy is passing all traces tests except for l1 >= 51, and except for the fact that in some tests (also in accumulate) I'm consuming 2 more gas units; I'm trying to figure out why. I will debug l1 >= 51 after resolving the gas issue.

==== EDIT
All working except for some traces/l1, and traces/l1 > 51

@arjanz

arjanz commented Jun 24, 2025

We suspect the footprint items and bytes in the post state for service account 0 in trace 51 are incorrect. The solicit host call is being called for preimage 0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8 with length 0, and the preimage availability value is being updated from [6, 9] to [6, 9, 51].

In our opinion, this should not change the footprint items from 48 to 50 and bytes from 161453 to 161534, as the preimage is not yet deleted.
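For what it's worth, the deltas in the trace match exactly what one additional lookup (preimage-meta) entry would contribute, assuming the usual GP footprint rule of 2 items and 81 + z octets per lookup entry; whether a re-solicit of an already-available preimage should create such a contribution is exactly the open question:

$$
\Delta i = 50 - 48 = 2, \qquad \Delta o = 161534 - 161453 = 81 = 81 + z \quad (z = 0)
$$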

@arjanz

arjanz commented Jun 24, 2025

In trace 93 we get a panic after the second forget host call. The result of the forget is HUH in our case, and that leads to a panic with the following log message:
[screenshot of the panic log message]

We suspect the condition y < t - D is not applied in the trace; it also seems related to our findings in trace 51 (see previous comment). Just to be sure, in our settings D=6.

[screenshot]

@davxy
Owner Author

davxy commented Jun 24, 2025

@arjanz Why is D 6 in your settings?

@arjanz

arjanz commented Jun 24, 2025

@arjanz Why is D 6 in your settings?

This is a good question. Apparently we assumed D=6 was part of the tiny settings, but we cannot really find the source.

In polkajam, when we look at the parameters RPC call, we see 'min_turnaround_period'=28800. After changing D to 28800 the result is the same, though, and both issues remain.

@davxy
Owner Author

davxy commented Jun 24, 2025

In polkajam, when we look at the parameters RPC call, we see 'min_turnaround_period'=28800. After changing D to 28800 the result is the same, though, and both issues remain.

According to GP, with the full config you have D = 19,200, which corresponds to 32 × full_epoch_slots (where full E = 600).
So it may make sense to apply the same logic: 32 × tiny_epoch_slots (tiny E = 12), which gives D = 384 for the tiny config.

However, 384 seems too much to properly test the forget host call.

By the way, those reports_l1 vectors don’t look quite right. Don’t spend more time on them for now - I need to double-check.

@davxy
Owner Author

davxy commented Jun 24, 2025

I updated the reports-l1 vectors so they don't panic so easily. Please give them another shot.

I also added some details about the preimage expunge delay in the readme: https://github.com/davxy/jam-test-vectors/blob/v0.6.6-rc0/traces/README.md#preimage-expunge-delay

@dakk

dakk commented Jun 24, 2025

I updated the reports-l1 vectors so they don't panic so easily. Please give them another shot.

I also added some details about the preimage expunge delay in the readme: https://github.com/davxy/jam-test-vectors/blob/v0.6.6-rc0/traces/README.md#preimage-expunge-delay

With the updated tests and D value, Jampy is also passing all traces.

@clearloop

@spacejamapp can now pass all accumulate and traces at ae735b4.

@jaymansfield

All passing now for JavaJAM as well

@danicuki

We at @jamixir confirm passing all traces

@arjanz

arjanz commented Jun 24, 2025

I updated the reports-l1 vectors so they don't panic so easily. Please give them another shot.

I also added some details about the preimage expunge delay in the readme: https://github.com/davxy/jam-test-vectors/blob/v0.6.6-rc0/traces/README.md#preimage-expunge-delay

All good, passing all traces now!

@davxy
Owner Author

davxy commented Jun 24, 2025

Very cool. So we can proceed with something more ambitious :-D

@davxy davxy merged commit 449f0bb into master Jun 24, 2025
12 checks passed
@davxy davxy deleted the v0.6.6-rc0 branch June 24, 2025 15:31
@jimjbrettj

In accumulate::process_one_immediate_report_1 our implementation panics after the first ECALLI (Fetch). The reason is that w7 is 1, and RAM at address 1 is not writable, so the Fetch host call returns panic according to 0.6.6. Not sure if anyone encountered the same problem (since we upgraded from 0.6.4 to 0.6.6 and missed all 0.6.5 tests, it's likely we missed something), and a PVM trace would be helpful to debug.

@qiweiii did you ever get an answer on this? I am wondering about this myself

@qiweiii

qiweiii commented Jun 27, 2025

In accumulate::process_one_immediate_report_1 our implementation panics after the first ECALLI (Fetch). The reason is that w7 is 1, and RAM at address 1 is not writable, so the Fetch host call returns panic according to 0.6.6. Not sure if anyone encountered the same problem (since we upgraded from 0.6.4 to 0.6.6 and missed all 0.6.5 tests, it's likely we missed something), and a PVM trace would be helpful to debug.

@qiweiii did you ever get an answer on this? I am wondering about this myself

Yes, since the write byte length is 0 here, we skipped this write. It makes sense, since it's not writing anything.
I'm still not sure whether it should panic first or just skip the writable check on the address in this case, but to continue with the rest of the tests I skipped it for now.
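A minimal sketch of that workaround, i.e. only enforcing the writability check when there is actually something to write; whether GP intends the check to apply to zero-length writes is exactly the open question (all names here are illustrative, not from any real implementation):

```rust
// Illustrative host-side helper mirroring the workaround above: skip the
// writability check (and the resulting host-call panic/fault) when the
// requested write length is zero.

#[derive(Debug, PartialEq)]
enum MemError {
    NotWritable,
}

struct GuestRam {
    // Toy model: a flat byte array plus a per-byte writable flag.
    bytes: Vec<u8>,
    writable: Vec<bool>,
}

impl GuestRam {
    fn write(&mut self, addr: usize, data: &[u8]) -> Result<(), MemError> {
        // Zero-length write: nothing is touched, so (per the workaround)
        // don't require the target address to be writable.
        if data.is_empty() {
            return Ok(());
        }
        let end = addr + data.len();
        if end > self.bytes.len() || !self.writable[addr..end].iter().all(|w| *w) {
            return Err(MemError::NotWritable); // would surface as a host-call panic
        }
        self.bytes[addr..end].copy_from_slice(data);
        Ok(())
    }
}

fn main() {
    let mut ram = GuestRam { bytes: vec![0; 16], writable: vec![false; 16] };
    // Address 1 is not writable, but the write length is 0: the call succeeds.
    assert_eq!(ram.write(1, &[]), Ok(()));
    // A non-empty write to the same address is rejected.
    assert_eq!(ram.write(1, &[0xFF]), Err(MemError::NotWritable));
}
```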

@ascrivener

Jamzilla passes as well
