Labels: Area: xDS, Type: Bug
Description
Here is a sequence of events that can cause the ADS stream-level flow control to block forever.
- T-1: A Listener resource has been subscribed to, and the request has been sent out.
- T0: `recv()` receives the Listener resource from the wire: `resources, url, version, nonce, err := s.recvMessage(stream)`
- T1: `recv()` sets the `pending` bit of the flow control to `true` and invokes the response handler, passing it the `onDone` callback: `resourceNames, nackErr = s.eventHandler.onResponse(resp, s.fc.onDone)`
- T2: The response handler runs and, as part of handling the update, subscribes to a RouteConfiguration resource. This results in `subscribe()` being called on the ADS stream, which queues the request: `s.requestCh.Put(typ)`
- T3: The response handler invokes the `onDone` callback to release flow control. This writes to `readyCh` to unblock goroutines waiting for flow control, but it has not yet set the `pending` bit to `false`: `case fc.readyCh <- struct{}{}:`
- T4: Meanwhile, the `send()` goroutine gets to run and processes the request for the RouteConfiguration. It calls `sendNew()` to send this request out: `if err := s.sendNew(stream, typ); err != nil {`
- T5: `sendNew()` checks the `pending` bit of the flow control, which the `onDone` callback has not yet set to `false`. It is about to buffer this request, but loses the CPU before doing so: `if s.fc.pending.Load() {`
- T6: Meanwhile, `recv()` is in the next iteration of its `for` loop and has been unblocked from its call to `fc.wait()`: `if !s.fc.wait(ctx) {`
- T7: `recv()` attempts to send out any buffered requests by calling `sendBuffered()`, but that method finds no buffered requests, because `sendNew()` has not yet written to the `bufferedRequests` channel: `func (s *adsStreamImpl) sendBuffered(stream clients.Stream) error {`
- T8: `sendNew()` now writes to the `bufferedRequests` channel.
- Any time after T5: the `onDone` callback sets the `pending` bit to `false`.
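The interleaving above can be replayed deterministically on a single goroutine to show why the request ends up stranded. This is a simplified model, not grpc-go code: `flowControl`, `stream`, and `replayInterleaving` are hypothetical stand-ins that keep only the `pending` bit, `readyCh`, and `bufferedRequests` channel named in the description, and the channels are given a buffer of one purely so that one goroutine can play every role.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// flowControl is a hypothetical, simplified model of the ADS stream's flow
// control: a pending bit plus a channel used to wake up waiters.
type flowControl struct {
	pending atomic.Bool
	readyCh chan struct{}
}

// stream keeps only the parts of the ADS stream that take part in the race.
type stream struct {
	fc               *flowControl
	bufferedRequests chan string
	sent             []string
}

// sendBuffered drains whatever is buffered right now, like recv() does at T7.
func (s *stream) sendBuffered() {
	for {
		select {
		case typ := <-s.bufferedRequests:
			s.sent = append(s.sent, typ)
		default:
			return
		}
	}
}

// replayInterleaving executes the timeline's steps in order on one goroutine
// and reports how many requests were sent and how many were left buffered.
func replayInterleaving() (sent, stranded int) {
	s := &stream{
		// Buffered channels (an assumption of this model) let a single
		// goroutine perform both the signalling and the waiting side.
		fc:               &flowControl{readyCh: make(chan struct{}, 1)},
		bufferedRequests: make(chan string, 1),
	}
	typ := "RouteConfiguration"

	// T1: recv() marks flow control as pending before calling the handler.
	s.fc.pending.Store(true)
	// T3: onDone signals readyCh but has not cleared pending yet.
	s.fc.readyCh <- struct{}{}
	// T5: sendNew() observes pending == true and decides to buffer.
	sawPending := s.fc.pending.Load()
	// T6: recv() is unblocked from fc.wait().
	<-s.fc.readyCh
	// T7: recv() flushes the buffer, which is still empty.
	s.sendBuffered()
	// T8: sendNew() finally buffers the request.
	if sawPending {
		s.bufferedRequests <- typ
	} else {
		s.sent = append(s.sent, typ)
	}
	// Any time after T5: onDone clears pending; nothing flushes the buffer.
	s.fc.pending.Store(false)

	return len(s.sent), len(s.bufferedRequests)
}

func main() {
	sent, stranded := replayInterleaving()
	fmt.Println("sent:", sent, "stranded:", stranded)
}
```

Running it prints `sent: 0 stranded: 1`: the RouteConfiguration request sits in the buffer with nothing left to drain it.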
But this request (buffered at T8) never gets sent out, because `recv()` is blocked waiting for a response from the management server, and no response will arrive: from the management server's point of view, the ADS stream has not requested any new resource. Eventually the RDS resource watch timer expires, and this is reported to the watcher as a resource-not-found error.
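One general way to close this window, sketched here under the assumption that the pending bit and the buffer can share a single lock, is to make "check pending, then buffer" and "clear pending, then drain the buffer" mutually exclusive. `lockedStream`, `sendNew`, `onDone`, and `runOnce` below are hypothetical and not necessarily how grpc-go addresses the issue; in particular, a real stream would still need the flush to happen on whichever goroutine owns writes to the stream.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedStream is a hypothetical variant in which the pending bit and the
// buffer are guarded by one mutex, so a request can never be buffered after
// the buffer was drained for the current flow-control cycle.
type lockedStream struct {
	mu       sync.Mutex
	pending  bool
	buffered []string
	sent     []string
}

// sendNew decides, under the lock, whether to send now or buffer.
func (s *lockedStream) sendNew(typ string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.pending {
		s.buffered = append(s.buffered, typ)
		return
	}
	s.sent = append(s.sent, typ)
}

// onDone clears pending and flushes the buffer in the same critical section.
func (s *lockedStream) onDone() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending = false
	s.sent = append(s.sent, s.buffered...)
	s.buffered = nil
}

// runOnce races one sendNew against one onDone, starting from pending == true.
func runOnce() (sent, stranded int) {
	s := &lockedStream{pending: true}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.sendNew("RouteConfiguration") }()
	go func() { defer wg.Done(); s.onDone() }()
	wg.Wait()
	return len(s.sent), len(s.buffered)
}

func main() {
	for i := 0; i < 1000; i++ {
		if sent, stranded := runOnce(); sent != 1 || stranded != 0 {
			fmt.Println("request stranded on iteration", i)
			return
		}
	}
	fmt.Println("all requests sent")
}
```

Whichever goroutine wins the lock, the request is sent exactly once: if `sendNew` runs first it buffers and `onDone` flushes, and if `onDone` runs first `sendNew` sees `pending == false` and sends directly.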
arjan-bal