Micro-optimization for IsPrimitiveType#2810
Conversation
chrisspre
left a comment
There was a problem hiding this comment.
The linear scan over a bunch of types (the lengthy chain of the short circuiting or's is suboptimal.
A lot of primitive CLR types have a so called TypeCode , I suggest to use a HashSet of these TypeCodes first.
| { | ||
| // Directly check the most common types for first | ||
| // for better performance. | ||
| return clrType == typeof(string) |
There was a problem hiding this comment.
instead of this "linear scan" over all bunch of types, I suggest to create a static HashSet of TypeCode and use GetTypeCode on the clrType
There was a problem hiding this comment.
Thanks @chrisspre. I'm running your suggestion as well as other variations I have thought about as a result of your suggestion under a benchmark (e.g. using an array map with the type codes as indices, using an int bitmap with the type codes as bit positions). I'll report back with my findings.
There was a problem hiding this comment.
I test the following approaches:
- using a pre-initialized
HashSet<TypeCode>lookup first then the fallback to the dictionary when the type code is not in the hashset:return typeCodeSet.Contains(typeCode) || primitiveMap.ContainsKey(type) ... - start with a lookup in a pre-initialized
bool[]array where the type code is the index then fallback to the dictionary if the typecode is not in the array:return typeCodeArray[typeCode] || primitiveMap.ContainsKey(type) ... - use an int bitmap instead of a bool array, where a bit is set to one when its position in the int matches a supported type code:
return typeCodeBitMap & (1 << typeCode) == (1 << typeCode) || primitiveMap.ContainsKey(type) ...
See full implementations here: https://github.com/habbes/odata.net/blob/entity-type-perf-tests/src/Microsoft.OData.Core/Metadata/EdmLibraryExtensions.cs
Find full benchmarks here: https://github.com/habbes/odata.net/blob/entity-type-perf-tests/test/PerformanceTests/ComponentTests/CoreLib/PrimitiveValueTests.cs
I found that the approach using HashSet<TypeCode> slower than my initial PR ("linear scan" using a chain of type comparisons) for the most common types (string, int, bool, Guid, DateTimeOffset), but slightly faster for the lesser common types (the ones looked up in the dictionary.
I found that the using bool array and int bitmap were much faster than using the HashSet on all counts. The array and int bitmap approaches had similar performances. For the bool array and int map approaches, I also found that it may be beneficial to add direct checks for DateTimeOffset (and Guid) which are common types for which don't have dedicated TypeCodes. It drastically speeds checks for DateTimeOffset and Guid while slightly slowing down types in the dictionary or spatial types.
The bool array and int bitmap are slightly slower than the original PR (using direct comparison chains) for strings and ints.
| Method | String | Int | Bool | DateTimeOffset | Guid | ByteArray | NullableInt | Uint64 | TimeOfDay | Point |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline (existing impl) | 23.318 ns | 22.829 ns | 22.224 ns | 22.544 ns | 22.688 ns | 24.875 ns | 78.282 ns | 6.486 ns | 22.577 ns | 34.164 ns |
| Original PR (chain of conditions) | 4.205 ns | 5.338 ns | 6.994 ns | 11.713 ns | 11.734 ns | 42.085 ns | 53.779 ns | 21.584 ns | 44.069 ns | 51.081 ns |
| HashSet | 17.417 ns | 16.077 ns | 17.013 ns | 15.116 ns | 18.298 ns | 36.603 ns | 64.565 ns | 16.758 ns | 35.429 ns | 48.322 ns |
| bool[] map | 6.454 ns | 6.377 ns | 6.846 ns | 20.962 ns | 21.927 ns | 22.911 ns | 56.591 ns | 6.085 ns | 22.108 ns | 32.827 ns |
| bool[] map with checks for DateTimeOffset and Guid | 6.832 ns | 6.477 ns | 6.25 ns | 7.846 ns | 10.712 ns | 26.788 ns | 55.688 ns | 6.252 ns | 28.462 ns | 40.168 ns |
| int bitmap | 6.203 ns | 6.015 ns | 6.013 ns | 20.36 ns | 24.865 ns | 22.623 ns | 61.293 ns | 7.822 ns | 20.267 ns | 36.182 ns |
| int bitmap with checks for DateTimeOffset and Guid | 6.982 ns | 6.859 ns | 6.25 ns | 7.069 ns | 8.218 ns | 24.432 ns | 50.542 ns | 6.191 ns | 24.601 ns | 39.070 ns |
There was a problem hiding this comment.
I also did a final variation of the int bitmap approach to do a direct comparison for strings first before the type code lookup. It slightly improved performance for string, at a slight cost to everything else:
| Method | String | Int | Bool | DateTimeOffset | Guid | ByteArray | NullableInt | Uint64 | TimeOfDay | Point |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline (existing impl) | 23.318 ns | 22.829 ns | 22.224 ns | 22.544 ns | 22.688 ns | 24.875 ns | 78.282 ns | 6.486 ns | 22.577 ns | 34.164 ns |
| Original PR (chain of conditions) | 4.205 ns | 5.338 ns | 6.994 ns | 11.713 ns | 11.734 ns | 42.085 ns | 53.779 ns | 21.584 ns | 44.069 ns | 51.081 ns |
| int bitmap | 6.203 ns | 6.015 ns | 6.013 ns | 20.36 ns | 24.865 ns | 22.623 ns | 61.293 ns | 7.822 ns | 20.267 ns | 36.182 ns |
| int bitmap with checks for DateTimeOffset and Guid | 6.982 ns | 6.859 ns | 6.25 ns | 7.069 ns | 8.218 ns | 24.432 ns | 50.542 ns | 6.191 ns | 24.601 ns | 39.070 ns |
| int bitmap with checks for strings, DateTimeOffset and guid | 4.565 ns | 7.974 ns | 7.391 ns | 8.165 ns | 10.618 ns | 26.414 ns | 58.034 ns | 7.105 ns | 26.250 ns | 37.706 ns |
There was a problem hiding this comment.
Based on these results, I've settled for the int bitmap approach with extra checks for Guid and DateTimeOffset. I think it has reasonable trade-offs. It could also consider the array approach for arguably better readability, but I thought we don't need that extra allocation.
|
|
||
| if (clrType == typeof(UInt16) || clrType == typeof(UInt32) || clrType == typeof(UInt64)) | ||
| { | ||
| // Directly check the most common types for first |
There was a problem hiding this comment.
| // Directly check the most common types for first | |
| // Directly check the most common types first | |
nit
|
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
Issues
*This pull request fixes #2806
Description
This PR improves the performance of creating an
ODataPrimitiveValueinstance for arguably the most common types (int, bool, string, dates, Guid). It does this by refactoringIsPrimitiveType. In the existing implementation,IsPrimitiveTypechecks whether the type is in thePrimitveTypeReferenceMapdictionary. This lookup is relatively costly. For most common types, I compare the type directly with the input CLR type using a compound||expression. Initially I had tried to put all the types in a long||expression, but what I found while benchmarking is that testing for types that were in the last predicates of the expression would take longer than looking them in the dictionary. So I tried to strike a balance by limiting the number of types I check outside the dictionary to only what I consider to be the most common ones. The Uint types are checked outside the dictionary regardless because they are not in that map. Maybe I should move them to the end of the expression since I suspect they are not that common. This "optimization" may make the dictionary look up more expensive for some types since it has do to more work before getting to the dictionary, but I think it's a reasonable trade off to have slightly lower perf for less frequent types in favour of the more frequent ones. I'm also open to suggestions on whether the list of types I've considered the most common is reasonable.Since this micro-optimization might be harder to assess in larger benchmarks, I created some micro-benchmarks here:
https://github.com/habbes/odata.net/blob/entity-type-perf-tests/test/PerformanceTests/ComponentTests/CoreLib/PrimitiveValueTests.cs
ODataPrimitiveValueperformance beforeODataPrimitiveValueperformance afterI compare runningd
IsPrimitiveTypeagainst different types. I tested three implementations of the method:IsPrimitiveType_methods use the original implementationIsPrimitiveTypeOptimized_do not use the dictionary, all the types are checked directly in a long expression with predicates combined by||IsPrimitiveTypeOptimized2_is hybrid implementation that checks the most common types directly and uses the dictionary for the rest. This is the version adopted in this PR.Here are the results:
Checklist (Uncheck if it is not completed)
Additional work necessary
If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.