CSDS: add new config status ACKED#13121
Conversation
Signed-off-by: Lidi Zheng <lidiz@google.com> Signed-off-by: Lidi Zheng <lidiz@google.com>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
| // client has received the config but replied with NACK. | ||
| ERROR = 4; | ||
|
|
||
| // The client has received the config and repied with ACK. |
There was a problem hiding this comment.
I'm almost thinking we want separate status enum messages or very explicit field naming to help make clear which one (client vs. server status) we're talking about.
There was a problem hiding this comment.
Yeah, seeing these comments, I guess I do think that having a separate set of enums would be clearer. We can call them something like CLIENT_REQUESTED, CLIENT_ACKED, and CLIENT_NACKED.
We can either add these as a new values in this existing enum, or we can add a whole separate enum field for use in clients.
There was a problem hiding this comment.
Adding a new proto enum struct named ClientSideStatusConfig. It's a bit verbose, but I think it needs to have a very clear naming.
markdroth
left a comment
There was a problem hiding this comment.
Thanks for doing this, Lidi!
| // client has received the config but replied with NACK. | ||
| ERROR = 4; | ||
|
|
||
| // The client has received the config and repied with ACK. |
There was a problem hiding this comment.
Yeah, seeing these comments, I guess I do think that having a separate set of enums would be clearer. We can call them something like CLIENT_REQUESTED, CLIENT_ACKED, and CLIENT_NACKED.
We can either add these as a new values in this existing enum, or we can add a whole separate enum field for use in clients.
|
Let's split this out as a separate enum. I feel it's going to be necessary for any CSDS client to have some logic to make sense of whether they are interpreting a management or client status here. |
Signed-off-by: Lidi Zheng <lidiz@google.com>
lidizheng
left a comment
There was a problem hiding this comment.
@markdroth PTALA.
| // client has received the config but replied with NACK. | ||
| ERROR = 4; | ||
|
|
||
| // The client has received the config and repied with ACK. |
There was a problem hiding this comment.
Adding a new proto enum struct named ClientSideStatusConfig. It's a bit verbose, but I think it needs to have a very clear naming.
| // Client received the config and replied with ACK. | ||
| CLIENT_ACKED = 2; | ||
|
|
||
| // Client received the config and replied with NACK. |
There was a problem hiding this comment.
Let's explicitly document that the resource content that is returned in this case will be the most recent version accepted, if any, not the version that was NACKed.
There was a problem hiding this comment.
Documented the expected config dump behavior here and in the PerXdsConfig message.
| @@ -73,6 +89,7 @@ message PerXdsConfig { | |||
| "envoy.service.status.v2.PerXdsConfig"; | |||
|
|
|||
| ConfigStatus status = 1; | |||
There was a problem hiding this comment.
It's probably worth adding a comment here that status will be populated by xDS servers but client_side_config_status will be populated by xDS clients.
There was a problem hiding this comment.
(It's too bad that we can't use a oneof here, but that would be a breaking change for the go proto API.)
There was a problem hiding this comment.
Suggest adding (udpa.annotations.field_migrate).oneof_promotion annotations.
There was a problem hiding this comment.
Annotation added, TIL.
There was a problem hiding this comment.
@htuch Can you advice what should we use in v4alpha? The format suggests that the migration annotation is expected to be expanded into real oneof in v4alpha. But that will break backward compatibility.
There was a problem hiding this comment.
That's fine -- v4alpha will be a new major version (if it ever actually happens), so breaking changes are okay. This is actually the whole point of the annotation, to make sure this change happens when the next major version bump occurs.
There was a problem hiding this comment.
Oh, but it looks like this didn't actually work for some reason. The annotation was carried forward to v4alpha instead of being turned into a oneof. Maybe the annotation has the wrong syntax or something?
Signed-off-by: Lidi Zheng <lidiz@google.com>
lidizheng
left a comment
There was a problem hiding this comment.
@markdroth PTALAA.
| // Client received the config and replied with ACK. | ||
| CLIENT_ACKED = 2; | ||
|
|
||
| // Client received the config and replied with NACK. |
There was a problem hiding this comment.
Documented the expected config dump behavior here and in the PerXdsConfig message.
| @@ -73,6 +89,7 @@ message PerXdsConfig { | |||
| "envoy.service.status.v2.PerXdsConfig"; | |||
|
|
|||
| ConfigStatus status = 1; | |||
markdroth
left a comment
There was a problem hiding this comment.
Please also change the comment on line 23-26 to indicate that we are now going to use this on the xDS client side.
Thanks!
| } | ||
|
|
||
| // Config status from a client-side view. | ||
| enum ClientSideConfigStatus { |
There was a problem hiding this comment.
Suggest calling this ClientConfigStatus.
There was a problem hiding this comment.
Changed to ClientConfigStatus.
In this proto definition, there is the ClientConfig message. It might be a bit confusing to readers that why ClientConfigStatus is not the status for the ClientConfig message, but the client-side view of config status. On the other hand, the ClientSideConfigStatus is not much better.
Better naming is still welcomed. Picking ClientConfigStatus since it is simpler.
| @@ -73,6 +89,7 @@ message PerXdsConfig { | |||
| "envoy.service.status.v2.PerXdsConfig"; | |||
|
|
|||
| ConfigStatus status = 1; | |||
There was a problem hiding this comment.
(It's too bad that we can't use a oneof here, but that would be a breaking change for the go proto API.)
Signed-off-by: Lidi Zheng <lidiz@google.com>
lidizheng
left a comment
There was a problem hiding this comment.
@markdroth Thanks for the quick review! PTALAAA.
| } | ||
|
|
||
| // Config status from a client-side view. | ||
| enum ClientSideConfigStatus { |
There was a problem hiding this comment.
Changed to ClientConfigStatus.
In this proto definition, there is the ClientConfig message. It might be a bit confusing to readers that why ClientConfigStatus is not the status for the ClientConfig message, but the client-side view of config status. On the other hand, the ClientSideConfigStatus is not much better.
Better naming is still welcomed. Picking ClientConfigStatus since it is simpler.
|
Please also change the comment on line 23-26 to indicate that we are now going to use this on the xDS client side. Thanks! |
Signed-off-by: Lidi Zheng <lidiz@google.com>
Signed-off-by: Lidi Zheng <lidiz@google.com>
|
(Whoops -- I didn't realize that my approving the PR would satisfy the API approval requirement; I thought I had to say "/lgtm api". But since one of the maintainers has to review before merging anyway, that's probably okay.) |
| @@ -73,6 +89,7 @@ message PerXdsConfig { | |||
| "envoy.service.status.v2.PerXdsConfig"; | |||
|
|
|||
| ConfigStatus status = 1; | |||
There was a problem hiding this comment.
Suggest adding (udpa.annotations.field_migrate).oneof_promotion annotations.
|
|
||
| // Client received the config and replied with NACK. Notably, the attached | ||
| // config dump is not the NACKed version, but the most recent accepted one. If | ||
| // no config is accepted yet, the attached config dump will be empty. |
There was a problem hiding this comment.
Can we have a similar statement added about control plane behavior for configs with a negative status, i.e. "this is always the latest successful config?" Or maybe that's not even desired, we might want the latest successful and unsuccessful config? Would that also make sense for client? CC @fuqianggao
There was a problem hiding this comment.
This behavior wasn't clearly described in this proto definition before, or in other documentation. Could it be a control-plane specific detail?
Hi @fuqianggao, can you help to explain the designed behavior of config dump when client replied with NACK?
There was a problem hiding this comment.
For control plane, it will be the latest config that control plane has. If status is NACK, it means the config is sent to client, but rejected.
There was a problem hiding this comment.
Makes sense. I think the actionable thing is for @lidizheng to update the management server wording to reflect this. I'm still wondering in this discussion thread, is there value in having both active and failed configs reported?
There was a problem hiding this comment.
I suspect that for both xDS servers and xDS clients, there would be additional overhead in storing old versions of configs. In general, both sides will store only the latest version that they care about, so they don't have anything else to report here.
There was a problem hiding this comment.
And if all clients implement CSDS, the CLI can get the active config from the clients, and the most up-to-date config from the management server (if they are different).
There was a problem hiding this comment.
I have updated the comment for ConfigStatus.ERROR to note that the management server will dump the latest config in case of client replying NACK.
The behavior described by @menghanl is implemented by istioctl's proxy-status command.
There was a problem hiding this comment.
I like the idea of being able to query both client and server CSDS and effectively determine the failing and active config. @lidizheng can you please file an issue for Envoy to add support for CSDS? You don't have to do the implementation, but we should track this as an open feature request that we can share with the community.
There was a problem hiding this comment.
Filed as an issue: #13181. For Envoy, it overlaps with the admin config dump feature. Today, tools like istioctl is fetching xDS config from server via CSDS, and from client via admin config dump (HTTP).
Signed-off-by: Lidi Zheng <lidiz@google.com>
Signed-off-by: Lidi Zheng <lidiz@google.com>
| // the CSDS server is an xDS server. No matter what the client config status | ||
| // is, xDS clients should always dump the most recent accepted xDS config. | ||
| ClientConfigStatus client_status = 7; | ||
| ClientConfigStatus client_status = 7 |
There was a problem hiding this comment.
I think you might need to have a oneof existing. I suggest moving this field lexically before the existing status and wrapping it in a oneof status_config. That should allow the v4alpha generation to work.
There was a problem hiding this comment.
Lidi, I think you still need to address this comment.
There was a problem hiding this comment.
According to the style in the code base, the oneof_promotion annotation seems appear in pairs (example 1, example 2). Splitting the oneof options into different layers might break the consistency, and slightly weaken the semantic intention of the upcoming migration.
So, in my mind, keeping oneof_promotion in v3 and real oneof in v4alpha is the better way.
There was a problem hiding this comment.
Yes, that's what I meant.
The v4alpha file looks right now. The last time I looked, I thought it was still wrong, but maybe I was looking at an old snapshot.
Anyway, this looks great. Thanks!
htuch
left a comment
There was a problem hiding this comment.
LGTM modulo the thread on clarifying management server behavior on stale/nacked config. I'd like there to be consistency, and ideally we should have a plan for how to support reporting both last known good and bad config (at least on management server side), since both are useful, CC @fuqianggao
Signed-off-by: Lidi Zheng <lidiz@google.com>
Signed-off-by: Lidi Zheng <lidiz@google.com>
|
@htuch PTALAA. All tests are passing, and the mentioned behavior is documented. |
Signed-off-by: Lidi Zheng lidiz@google.com
Signed-off-by: Lidi Zheng lidiz@google.com
Commit Message: Add new config status for CSDS: ACKED.
Additional Description: As the CSDS service definition described, it has the potential to be used to expose xDS config from a client or proxy. gRPC wants to utilize this service to improve its debuggability. But the
ConfigStatusis designed from the control plane point of view. Especially, the client cannot predict if there is new config on its way, so it can't accurately claim any xDS config status as SYNCED. We need another config status to indicate the status that the client received the status and sent out ACK. The actual wording is up to debate.Risk Level: Low
Testing: Not needed
Docs Changes: Inline with proto definition
Release Notes: New config status for CSDS: ACKED.
CC @fuqianggao @markdroth @menghanl