Skip to content

track blocked thread event and report statistics in metrics endpoint#1649

Merged
ggrossetie merged 5 commits intoyuzutech:mainfrom
khanguyen88:main
Oct 12, 2023
Merged

track blocked thread event and report statistics in metrics endpoint#1649
ggrossetie merged 5 commits intoyuzutech:mainfrom
khanguyen88:main

Conversation

@khanguyen88
Copy link
Contributor

Vert.x monitors and reports blocked thread. This PR will include that info in the /healthcheck endpoint so we can use that info to restart Kroki when perfomance degrades too much due to many blocked threads.

@ggrossetie
Copy link
Member

I think that's a good idea.
Did you investigate to understand why threads are blocked? It would be better to fix the root cause rather than periodically restart the service.

@khanguyen88
Copy link
Contributor Author

I had quite a few outages due to blocked threads. So far the logs indicated that certain inputs will cause PlantUml/Ditaa to hang or go into infinite loop. I'm looking out for those bad inputs, but in the meantime, I'm adding as the first step to improve reliability of my server.

@ggrossetie
Copy link
Member

Are you using the latest version? We are now using single executable binaries (produced by GraalVM) for both PlantUML and Ditaa so it shouldn't block threads anymore.

@khanguyen88
Copy link
Contributor Author

yes, I'm on 0.22

Comment on lines +27 to +28
private final Map<String, Instant> eventLoopStats = new ConcurrentHashMap<>();
private final Map<String, Instant> workerStats = new ConcurrentHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned since we don't do eviction since is basically a memory leak. We could use something like https://github.com/ben-manes/caffeine to use a time-based eviction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I switched to Caffeine's Cache to store the stats.

I was reluctant to use Caffeine as it was not included in the project and I didn't expect the 2 maps to grow above 100 entries.

Comment on lines +50 to +53
if (blockedThreadChecker != null) {
data.put("blockedWorkerPercentage", blockedThreadChecker.blockedWorkerThreadPercentage());
data.put("blockedEventLoopPercentage", blockedThreadChecker.blockedEventLoopThreadPercentage());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should export data using Prometheus format on /metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use Prometheus on my server but I'm happy to help. Can you create another issue and assign to me? I'll work on it when I have the bandwidth

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've created an issue

@khanguyen88 khanguyen88 requested a review from ggrossetie October 2, 2023 07:43
@khanguyen88
Copy link
Contributor Author

Hi @ggrossetie could you please review and merge this PR?

@ggrossetie ggrossetie changed the title track blocked thread event and report statistics in healthcheck endpoint track blocked thread event and report statistics in metrics endpoint Oct 12, 2023
@ggrossetie ggrossetie merged commit cc0a6d1 into yuzutech:main Oct 12, 2023
@boris-moduscreate
Copy link

@ggrossetie Any eta on a release which includes this? ❤️

@ggrossetie
Copy link
Member

I will try to push a new release before my vacation (mid January)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants