Skip to content

Cache lookups to IEnumerable<IYamlTypeConverter>#956

Merged
EdwardCooke merged 3 commits intoaaubry:masterfrom
MattKotsenas:refactor/visitor-allocs
Sep 1, 2024
Merged

Cache lookups to IEnumerable<IYamlTypeConverter>#956
EdwardCooke merged 3 commits intoaaubry:masterfrom
MattKotsenas:refactor/visitor-allocs

Conversation

@MattKotsenas
Copy link
Contributor

@MattKotsenas MattKotsenas commented Aug 16, 2024

Several types accept an IEnumerable<IYamlTypeConverter>, and then repeatedly traverse the list searching for a converter that accepts the type to be serialized. Traversing an IEnumerable with LINQ results in an enumerator (to walk the IEnumerable) and a Func (to filter the enumerable) allocation per visitor per object serialized. In the benchmark setup this results in 12 allocations per object serialized.

To avoid the repeated lookups and allocations, I introduced an internal TypeConverterCache that accepts the collection of type converters and maps the type to convert to the first IYamlTypeConverter that accepts the type (the same as the previous behavior). This eliminates all 12 allocations and has a small CPU time win as well.

To avoid a breaking change, I left the protected IEnumerable<IYamlTypeConverter> and added a private TypeConverterCache on PreProcessingPhaseObjectGraphVisitorSkeleton. If / when we want to remove cruft and can accept breaking changes we can remove this field.

Note that each type that previously was doing list traversal has its own cache. If it's OK to change the constructor signature on these types, we could instead construct a single cache in the builder (e.g. in BuilderSkeleton.BuildTypeConverters) and pass it in. However, I assume the builder is designed for public extension, so I avoided that in this change.

In the provided benchmark, this results in a 15% - 25% reduction in allocations:

Before

// * Summary *

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22635.4010)
Intel Core i9-10940X CPU 3.30GHz, 1 CPU, 28 logical and 14 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]                       : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
  MediumRun-.NET 8.0           : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
  MediumRun-.NET Framework 4.7 : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256

IterationCount=15  LaunchCount=2  WarmupCount=10

| Method     | Job                          | Runtime            | Mean      | Error    | StdDev   | Gen0      | Gen1     | Gen2     | Allocated |
|----------- |----------------------------- |------------------- |----------:|---------:|---------:|----------:|---------:|---------:|----------:|
| Serializer | MediumRun-.NET 8.0           | .NET 8.0           |  49.16 ms | 1.436 ms | 2.105 ms | 2000.0000 | 500.0000 |        - |  24.44 MB |
| Serializer | MediumRun-.NET Framework 4.7 | .NET Framework 4.7 | 109.79 ms | 2.546 ms | 3.569 ms | 7800.0000 | 600.0000 | 200.0000 |     48 MB |

// * Warnings *
MinIterationTime
  SerializationBenchmarks.Serializer: MediumRun-.NET 8.0 -> The minimum observed iteration time is 91.906ms which is very small. It's recommended to increase it to at least 100ms using more operations.

// * Hints *
Outliers
  SerializationBenchmarks.Serializer: MediumRun-.NET 8.0           -> 1 outlier  was  removed (57.76 ms)
  SerializationBenchmarks.Serializer: MediumRun-.NET Framework 4.7 -> 2 outliers were removed (121.75 ms, 123.43 ms)

After

// * Summary *

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22635.4010)
Intel Core i9-10940X CPU 3.30GHz, 1 CPU, 28 logical and 14 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]                       : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
  MediumRun-.NET 8.0           : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
  MediumRun-.NET Framework 4.7 : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256

IterationCount=15  LaunchCount=2  WarmupCount=10

| Method     | Job                          | Runtime            | Mean      | Error    | StdDev   | Gen0      | Gen1     | Gen2     | Allocated |
|----------- |----------------------------- |------------------- |----------:|---------:|---------:|----------:|---------:|---------:|----------:|
| Serializer | MediumRun-.NET 8.0           | .NET 8.0           |  46.67 ms | 1.620 ms | 2.425 ms | 1500.0000 | 500.0000 |        - |  17.95 MB |
| Serializer | MediumRun-.NET Framework 4.7 | .NET Framework 4.7 | 108.68 ms | 3.342 ms | 5.002 ms | 6600.0000 | 600.0000 | 200.0000 |  41.49 MB |

// * Warnings *
MultimodalDistribution
  SerializationBenchmarks.Serializer: MediumRun-.NET 8.0 -> It seems that the distribution can have several modes (mValue = 2.83)
MinIterationTime
  SerializationBenchmarks.Serializer: MediumRun-.NET 8.0 -> The minimum observed iteration time is 85.323ms which is very small. It's recommended to increase it to at least 100ms using more operations.

// * Hints *
Outliers
  SerializationBenchmarks.Serializer: MediumRun-.NET 8.0           -> 1 outlier  was  removed (52.89 ms)
  SerializationBenchmarks.Serializer: MediumRun-.NET Framework 4.7 -> 1 outlier  was  removed (122.13 ms)

@MattKotsenas
Copy link
Contributor Author

Note that this PR uses the same benchmark introduced in #953 and #954. So if / when one of these PRs merge, the other should be rebased on top of latest main to remove the duplicate commit.

I'm happy to do this housekeeping. I just mention it for clarity, and so that each PR can be reviewed independently, rather than forcing them to all be chained together.

@MattKotsenas
Copy link
Contributor Author

MattKotsenas commented Aug 16, 2024

Alternatively, there aren't many IYamlTypeConverters built-in, and the Accept() call is cheap, thus we can get similar benefit by changing the IEnumerable to be an array instead and manually unrolling the LINQ calls.

I didn't go that route initially because:

  1. It's as disruptive of a change
  2. It's less DRY, as each consumer (and any new consumers in the future) would need to also know to avoid LINQ

If you have a strong preference of either path let me know and I'm happy to make updates here.

@MattKotsenas MattKotsenas force-pushed the refactor/visitor-allocs branch from 97fbcfb to 657c784 Compare August 31, 2024 23:50
@EdwardCooke
Copy link
Collaborator

I'm happier with no breaking changes, even in the constructors, but yes, this is public so it can be extended. In fact I saw a consumer of yamldotnet the other day that inherited some of these low level classes and made minor changes. Like the kubernetes client.

@EdwardCooke EdwardCooke merged commit 3e15cee into aaubry:master Sep 1, 2024
internal sealed class TypeConverterCache
{
private readonly IYamlTypeConverter[] typeConverters;
private readonly Dictionary<Type, (bool HasMatch, IYamlTypeConverter TypeConverter)> cache = new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to make a PR to fix this, but this should have been a concurrent dictionary. It's failing on my laptop because it's fast enough to catch it where the build system is slow enought that it didn't. Found it while testing your other PR for the build properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, sorry that I missed this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants