Introduce a settable class-level cache for Version#1127
Introduce a settable class-level cache for Version#1127
Conversation
`Version._constructor_cache` is a contextvar (as a classvar) containing a mutable mapping of strings-to-Version-instances, or None. `Version.__new__` now checks the cache and will return the same instance of `Version` for the same string input if there is a cache hit. `Version.__init__` is updated ot check for an existing `_release` attribute in order to determine whether or not it's being called on an already initialized instance and short-circuit appropriately. In order to enable and control the cache, `Version` has a new public API, `Version.set_constructor_cache`, which accepts a mutable mapping or None (with `None` being the indicator for "disable the cache"). `set_constructor_cache` documents that the setting is a contextvar, which provides callers with sufficient information to make reasonable decisions about concurrency. No attempt is made to protect against data races if the same cache is used in multiple threads. However, it is valid to use a single cache in such use cases and to protectively lock around calls to `packaging` functionality. Failing to lock in such usages could result in data races and handling `Version` objects which are in the midst of being initialized in another thread, due to the short-circuiting check in `Version.__init__`. Policies around cache lifetimes and evictions, e.g., an LRU policy, are entirely external to the cache interface, which only requires that a cache is provided.
Because the overhead of contextvar-held caches is too high, this refactor to the settable constructor cache strategy treats the cache as global. Doing so includes new notes in the docstring for `set_constructor_cache` to explain that callers of `Version()` with caching may want to consider whether they want to use locks or a fancy `cache` type (e.g., one that is internally threadlocal/contextvar based). Included with this change, add a new unit test for the caching behavior plus a new dedicated benchmark for it.
|
I've rewritten to a global cache because the penalty paid for a contextvar was so big. The design remains the same, and the docstring for the cache setting method now notes that there is no locking around the cache because it's significant to proper usage. I also added a new benchmark, which runs a copy of the Version constructor benchmark with the cache enabled. Because there are already duplicates in Update: |
Rather than checking that the args are correct for `__new__`, when a cache is enabled it will be *assumed* that the caller is passing good args. Bad args will simply hard error.
|
I pushed a final commit with my last ditch effort on this, but I can already tell from local benchmarks that it's not going to work. I'm self-closing and will try to re-approach the issue without relying on |
|
Thanks a lot for trying, the dispatch to a Python |
Opened as a draft to enable discussion.
I took a crack at caching
Versioninstances using__new__, as suggested by @notatallshaw in some prior conversation.The initial results are promising but also raise questions.
First, there's the question of the interface proposed here.
Allowing the cache to be passed in from the outside has some nice benefits, in that any cache policy is fully in the control of the caller.
But it does force the
Versioncache to operate in terms of some external object -- the caching behavior is only as efficient as the choice of mapping type for the cache.Without caching enabled, there is a measurable slowdown of ~10% for several benchmarks.
With a
Version.set_constructor_cache({})call added, there's an improvement of as much as 80%.The added overhead seems quite high, arguably too high.
EDIT: the CI runs show an even more severe penalty than my local runs. It may be that using a contextvar is just too costly.
Overall, I'm curious where the maintainers think this might go as an idea, and about suggestions for ways to alter the approach.
Version._constructor_cacheis a contextvar (as a classvar) containinga mutable mapping of strings-to-Version-instances, or None.
Version.__new__now checks the cache and will return the same instanceof
Versionfor the same string input if there is a cache hit.Version.__init__is updated ot check for an existing_releaseattribute in order to determine whether or not it's being called on an
already initialized instance and short-circuit appropriately.
In order to enable and control the cache,
Versionhas a new public API,Version.set_constructor_cache, which accepts a mutable mapping or None(with
Nonebeing the indicator for "disable the cache").set_constructor_cachedocuments that the setting is a contextvar, whichprovides callers with sufficient information to make reasonable decisions
about concurrency.
No attempt is made to protect against data races if the same cache is
used in multiple threads. However, it is valid to use a single
cache in such use cases and to protectively lock around calls to
packagingfunctionality. Failing to lock in such usages could resultin data races and handling
Versionobjects which are in the midst ofbeing initialized in another thread, due to the short-circuiting check
in
Version.__init__.Policies around cache lifetimes and evictions, e.g., an LRU policy, are
entirely external to the cache interface, which only requires that a
cache is provided.