WIP Introduce support for pinning specific analysis threads to a logical CPU core#137

Open
methodmissing wants to merge 1 commit into mozilla-services:main from Shopify:thread-affinity

Conversation


methodmissing commented Nov 28, 2017

References https://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading/ (which matches the hyperthreading behaviour seen in our environment) and http://man7.org/linux/man-pages/man2/sched_setaffinity.2.html

To flesh out:

  • Currently the logical CPU affinity == the assigned analysis thread id (0, 1, 2, etc.), and a cpu_affinity = true directive in the plugin config pins the thread to that core. Maybe that's not a great default.
  • The config for a "fatter" plugin that we want a dedicated core for, without context switching, could look like:
thread = 0 -- explicitly define this here to schedule on analysis thread 0
cpu_affinity = true -- always pin to logical core 0 as well

That way auxiliary / less important plugins can still migrate between remaining cores, but we guarantee resources for the most important ones.

@Shopify/moneyscale

methodmissing changed the title from Introduce support for pinning specific analysis threads to a logical CPU core to WIP Introduce support for pinning specific analysis threads to a logical CPU core on Nov 28, 2017
methodmissing force-pushed the thread-affinity branch 11 times, most recently from 51a336a to 699df4b on November 29, 2017
trink (Contributor) commented Nov 29, 2017

I have a number of comments/questions/concerns.

  • What specific performance bottleneck are you trying to address?
    • Can it not be solved by:
      • redistributing work with the current configuration settings?
      • optimizing the existing plugin?
  • I would like to see some test results that show throughput gains for a specific Hindsight use case.
    • How much can it hurt performance, given that the pinned thread won't be the only thing running on that core?
    • Does it add any value when several large plugins are assigned to one thread (as each will wipe out the others' cached data)?
  • Input and output threads usually have a heavy load and only run a single plugin; why wouldn't they be considered for thread affinity?

methodmissing (Author) commented

Apologies for not circling back earlier. I tracked the performance issues we've seen down to I/O throttling and crufty LPEG in an input plugin, which significantly stalled the rest of the pipeline for us.

  • redistributing work with the current configuration settings: we consume from multiple plugin sources (rdkafka) in different locations and initially tried to work around the issue by pinning specific topics to their own analysis plugins (spinning up more of them via config files, routing via inject_message's Type), but still couldn't utilise more CPU.

  • optimizing the existing plugin: as mentioned above :-)

  • Input and output plugins usually have heavy load and only run a single plugin, why wouldn't they be considered for thread affinity? I agree, but haven't tested this. Our inputs are pretty lopsided in terms of volume to begin with, and at least 2 could benefit significantly from affinity as they always work off a very high volume of data relative to the others. For such strong cases, where we know this ahead of time, I think affinity makes sense if there's CPU to spare.

A few things would have helped with debugging (happy to PR any of these if they make sense):

  • Support for the librdkafka stats_cb ( https://github.com/edenhill/librdkafka/wiki/Statistics ) as a configuration option (piped through to sandbox logger as with the debug rdkafka directive)
  • plugins.tsv currently includes the runtime overhead of process_message, the message matcher and timer_event, but columns for Decoding Avg (ns) and Decoding SD (ns) for input plugins would be very useful for identifying bottlenecks caused by configuration or less-than-optimal code in input plugins' decode callbacks
  • Having runtime stats from all 3 tiers would make it easier to reason about data flow through the system

Thoughts?

trink (Contributor) commented Dec 11, 2017

Thanks for the update.

  • I/O throttling will only be caused by slow analysis or output plugins (the plugins and utilization tsv should help here).
  • LPEG grammars can be tested and benchmarked at http://lpeg.trink.com/

Debugging comments

  1. stats_cb - there will be some Kafka updates coming to enhance debugging (the first sprint in January). I will include it with those (issue created and referenced above).
  2. input plugin statistics have always been an issue, as most of their execution is opaque to Hindsight. To profile work within the Lua code, the inputs/decoders would have to contain their own instrumentation. This can currently be done, but not in a standardized way (typically during debugging I use print() and the high-resolution timestamp of the message output for profiling). The best solution may be to standardize the log message/output and have tooling automatically report on that data; this would allow numerous profile points (e.g. parse, validation and transformation time, even in chained decoders/sub-decoders). It would also allow custom profiling of any type of plugin.

methodmissing (Author) commented

Sounds great. With I/O throttling I'm referring to platform throttling, not backpressure from Hindsight :-)

trink (Contributor) commented Jan 4, 2018

Having the cpu_affinity setting in the plugin configuration is misleading, as it is not a per-analysis-plugin setting. Also, since a known/expected thread workload is being pinned to a CPU, I would recommend excluding those threads from accepting dynamically loaded plugins.

This would produce a hindsight.cfg something like this (the plugin cfg would simply have an explicit thread specification):

-- assume 4 actual CPUs
analysis_threads = 64
analysis_cpu_affinity = {[0] = 0, [62] = 3, [63] = 3}
-- thread 0 on CPU0, threads 62, 63 on CPU3, analysis threads 1-61 will accept dynamically loaded
-- plugins and will run on any available CPU  

For the input/output plugin configuration we could add a cpu = N option (when specified it assigns CPU affinity).

I realize this is a lot more work, please feel free to open a feature request issue.

