-
Notifications
You must be signed in to change notification settings - Fork 820
Description
Describe the bug
Background
The scenario is:
- using a Feign client that is configured by name (e.g.
@FeignClient(name = "downstream-service")) rather than by URL - using Spring-Cloud Loadbalancer with retry enabled (i.e. we're using the
RetryableFeignBlockingLoadBalancerClient) - the downstream service instance(s) are taking longer to respond than the configured Feign client timeout used by the client application.
Correct behaviour
- Occurs when
spring.cloud.loadbalancer.retry.max-retries-on-same-service-instanceis set to 0 - The client application retries according to the configuration of
max-retries-on-next-service-instance, and then at the end logs an error like this:
14:28:45.068 [http-nio-1234-exec-2] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is feign.RetryableException: Read timed out executing GET http://downstream-service/api/something] with root cause
java.net.SocketTimeoutException: Read timed out
at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
...
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529)
at feign.Client$Default.convertResponse(Client.java:110)
at feign.Client$Default.execute(Client.java:106)
at org.springframework.cloud.openfeign.loadbalancer.LoadBalancerUtils.executeWithLoadBalancerLifecycleProcessing(LoadBalancerUtils.java:57)
at org.springframework.cloud.openfeign.loadbalancer.RetryableFeignBlockingLoadBalancerClient.lambda$execute$2(RetryableFeignBlockingLoadBalancerClient.java:168)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:329)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:225)
at org.springframework.cloud.openfeign.loadbalancer.RetryableFeignBlockingLoadBalancerClient.execute(RetryableFeignBlockingLoadBalancerClient.java:114)
at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:121)
at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:91)
at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:100)
at jdk.proxy2/jdk.proxy2.$Proxy107.getMeSomething(Unknown Source)
at com.oliverlockwood.retrybug.ClientController.hitTheDownstreamService(ClientController.java:20)
...
which, in my opinion, is perfectly reasonable.
Incorrect behaviour (i.e. the bug)
- Occurs when
spring.cloud.loadbalancer.retry.max-retries-on-same-service-instanceis set to 1 or higher - The client application retries according to the configuration of
max-retries-on-same-service-instance, andmax-retries-on-next-service-instance, and then at the end logs an error like this:
4:30:46.549 [http-nio-1234-exec-2] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is feign.RetryableException: downstream-service executing GET http://downstream-service/api/something] with root cause
java.net.UnknownHostException: downstream-service
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567)
...
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529)
at feign.Client$Default.convertResponse(Client.java:110)
at feign.Client$Default.execute(Client.java:106)
at org.springframework.cloud.openfeign.loadbalancer.LoadBalancerUtils.executeWithLoadBalancerLifecycleProcessing(LoadBalancerUtils.java:57)
at org.springframework.cloud.openfeign.loadbalancer.RetryableFeignBlockingLoadBalancerClient.lambda$execute$2(RetryableFeignBlockingLoadBalancerClient.java:168)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:329)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:225)
at org.springframework.cloud.openfeign.loadbalancer.RetryableFeignBlockingLoadBalancerClient.execute(RetryableFeignBlockingLoadBalancerClient.java:114)
at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:121)
at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:91)
at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:100)
at jdk.proxy2/jdk.proxy2.$Proxy107.getMeSomething(Unknown Source)
at com.oliverlockwood.retrybug.ClientController.hitTheDownstreamService(ClientController.java:20)
...
which, in my opinion, is not acceptable. The actual underlying issue is lost, and in its place there's now a spurious, misleading UnknownHostException, because for some reason the implementation tries to hit the literal URL http://downstream-service/api/something, abandoning the whole concept of service discovery.
Workaround
Always set spring.cloud.loadbalancer.retry.max-retries-on-same-service-instance to 0.
Affected versions
I have explicitly proven that this bug reproduces on the following versions:
- Spring-Boot
3.0.3& Spring-Cloud2022.0.1 - Spring-Boot
2.7.9& Spring-Cloud2021.0.6 - Spring-Boot
2.7.7& Spring-Cloud2021.0.5 - Spring-Boot
2.6.2& Spring-Cloud2021.0.0
However, I suspect this bug has existed ever since the historical migration from Netflix Ribbon Loadbalancer to Spring-Cloud Loadbalancer.
Note: This is effectively a re-raise of #650, which was opened in December 2021, but closed by the maintainers because a detailed reproduction example was not provided at that time.
Sample
This is demonstrated in https://github.com/oliverlockwood/spring-cloud-openfeign-retry-sameserver-bug. Follow the README therein.