WIP Introduce support for pinning specific analysis threads to a logical CPU core#137

Open
methodmissing wants to merge 1 commit into mozilla-services:main from Shopify:thread-affinity

Conversation


methodmissing commented Nov 28, 2017

References https://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading/ (which matches the hyperthreading behaviour seen in our environment) and http://man7.org/linux/man-pages/man2/sched_setaffinity.2.html

To flesh out:

  • Currently the logical CPU affinity == the assigned analysis thread id (0, 1, 2, etc.), and a cpu_affinity = true directive in the plugin config pins the thread to that core. Maybe that's not a great default.
  • The config for a "fatter" plugin that we want a dedicated core for, without context switching, could look like:
thread = 0 -- explicitly define this here to schedule on analysis thread 0
cpu_affinity = true -- always pin to logical core 0 as well

That way auxiliary / less important plugins can still migrate between remaining cores, but we guarantee resources for the most important ones.

@Shopify/moneyscale

methodmissing changed the title from Introduce support for pinning specific analysis threads to a logical CPU core to WIP Introduce support for pinning specific analysis threads to a logical CPU core on Nov 28, 2017
methodmissing force-pushed the thread-affinity branch 11 times, most recently from 51a336a to 699df4b on November 29, 2017
trink (Contributor) commented Nov 29, 2017

I have a number of comments/questions/concerns.

  • What specific performance bottleneck are you trying to address?
    • Can it not be solved by:
      • redistributing work with the current configuration settings?
      • optimizing the existing plugin?
  • I would like to see some test results that show throughput gains for a specific Hindsight use case.
    • How much can it hurt performance, given that the pinned thread won't be the only thing running on that core?
    • Does it add any value when several large plugins are assigned to one thread (as each will wipe out the others' cached data)?
  • Input and output threads usually have a heavy load and only run a single plugin; why wouldn't they be considered for thread affinity?

methodmissing (Author) commented

Apologies for not circling back earlier. I tracked the performance issues we've seen down to I/O throttling and crufty LPEG in an input plugin, which significantly stalled the rest of the pipeline for us.

  • redistributing work with the current configuration settings: we consume from multiple plugin sources (rdkafka) in different locations and initially tried to work around the issue by pinning specific topics to their own analysis plugins (spinning up more of them via config files, routing via inject_message's Type), but still couldn't utilise more CPU.

  • optimizing the existing plugin: as mentioned above :-)

  • Input and output plugins usually have heavy load and only run a single plugin, why wouldn't they be considered for thread affinity? I agree, but haven't tested this. Our inputs are pretty lopsided in terms of volume to begin with, and at least 2 could benefit significantly from affinity as they always work off a very high volume of data relative to the others. For such strong cases, where we know this ahead of time, I think affinity makes sense if there's CPU to spare.

A few things would have helped with debugging (happy to PR any of these if they make sense):

  • Support for the librdkafka stats_cb ( https://github.com/edenhill/librdkafka/wiki/Statistics ) as a configuration option (piped through to sandbox logger as with the debug rdkafka directive)
  • plugins.tsv currently includes the runtime overhead of process_message, the message matcher and timer_event, but columns for Decoding Avg (ns) and Decoding SD (ns) for input plugins would be very useful for identifying bottlenecks caused by configuration or less-than-optimal code in input plugins' decode callbacks
  • Having runtime stats from all 3 tiers would make it easier to reason about data flow through the system

Thoughts?

trink (Contributor) commented Dec 11, 2017

Thanks for the update.

  • I/O throttling will only be caused by slow analysis or output plugins (the plugins and utilization tsv should help here).
  • LPEG grammars can be tested and benchmarked at http://lpeg.trink.com/

Debugging comments

  1. stats_cb - there will be some Kafka updates coming to enhance debugging (the first sprint in January). I will include it with those (issue created and referenced above).
  2. input plugin statistics have always been an issue, as most of their execution is opaque to Hindsight. To profile work within the Lua code, the inputs/decoders would have to contain their own instrumentation. This can currently be done, but not in a standardized way (typically during debugging I use print() and the high-resolution timestamp of the message output for profiling). The best solution may be to standardize the log message/output and have tooling automatically report on that data; this would allow numerous profile points (e.g. parse, validation and transformation time, even in chained decoders/sub-decoders). It would also allow custom profiling of any type of plugin.

methodmissing (Author) commented

Sounds great. With I/O throttling I'm referring to platform throttling, not backpressure from Hindsight :-)

trink (Contributor) commented Jan 4, 2018

Having the cpu_affinity setting in the plugin configuration is misleading, as it is not a per-analysis-plugin setting. Also, since a known/expected thread workload is being pinned to a CPU, I would recommend excluding those threads from accepting dynamically loaded plugins.

This would produce a hindsight.cfg something like this (the plugin cfg would simply have an explicit thread specification):

-- assume 4 actual CPUs
analysis_threads = 64
analysis_cpu_affinity = {[0] = 0, [62] = 3, [63] = 3}
-- thread 0 on CPU0, threads 62, 63 on CPU3, analysis threads 1-61 will accept dynamically loaded
-- plugins and will run on any available CPU  

For the input/output plugin configuration we could add a cpu = N option (when specified it assigns CPU affinity).

I realize this is a lot more work, please feel free to open a feature request issue.

