Skip to content

Micro-optimization for IsPrimitiveType#2810

Merged
KenitoInc merged 3 commits intoOData:masterfrom
habbes:perf/2806-is-primitive-value-perf
Dec 8, 2023
Merged

Micro-optimization for IsPrimitiveType#2810
KenitoInc merged 3 commits intoOData:masterfrom
habbes:perf/2806-is-primitive-value-perf

Conversation

@habbes
Copy link
Copy Markdown
Contributor

@habbes habbes commented Nov 29, 2023

Issues

*This pull request fixes #2806

Description

This PR improves the performance of creating an ODataPrimitiveValue instance for arguably the most common types (int, bool, string, dates, Guid). It does this by refactoring IsPrimitiveType. In the existing implementation, IsPrimitiveType checks whether the type is in the PrimitveTypeReferenceMap dictionary. This lookup is relatively costly. For most common types, I compare the type directly with the input CLR type using a compound || expression. Initially I had tried to put all the types in a long || expression, but what I found while benchmarking is that testing for types that were in the last predicates of the expression would take longer than looking them in the dictionary. So I tried to strike a balance by limiting the number of types I check outside the dictionary to only what I consider to be the most common ones. The Uint types are checked outside the dictionary regardless because they are not in that map. Maybe I should move them to the end of the expression since I suspect they are not that common. This "optimization" may make the dictionary look up more expensive for some types since it has do to more work before getting to the dictionary, but I think it's a reasonable trade off to have slightly lower perf for less frequent types in favour of the more frequent ones. I'm also open to suggestions on whether the list of types I've considered the most common is reasonable.

Since this micro-optimization might be harder to assess in larger benchmarks, I created some micro-benchmarks here:
https://github.com/habbes/odata.net/blob/entity-type-perf-tests/test/PerformanceTests/ComponentTests/CoreLib/PrimitiveValueTests.cs

ODataPrimitiveValue performance before

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
CreateIntValue 40.711 ns 0.8791 ns 1.7146 ns 40.630 ns 0.0148 - - 64 B
CreateBoolValue 40.263 ns 0.7957 ns 0.6644 ns 40.101 ns 0.0148 - - 64 B
CreateStringValue 38.614 ns 0.8505 ns 1.7564 ns 38.873 ns 0.0092 - - 40 B
CreateDateValue 39.929 ns 0.5751 ns 0.5098 ns 39.754 ns 0.0167 - - 72 B
CreateByteArrayValue 38.615 ns 0.8148 ns 0.8003 ns 38.254 ns 0.0092 - - 40 B
CreateGuidValue 45.000 ns 0.9237 ns 1.4651 ns 45.196 ns 0.0167 - - 72 B
CreateNullableIntValue 96.605 ns 2.0524 ns 5.9545 ns 94.499 ns 0.0148 - - 64 B
CreateUint64Value 16.575 ns 0.4084 ns 0.6595 ns 16.511 ns 0.0148 - - 64 B
CreateTimeOfDayValue 40.141 ns 0.8394 ns 0.8981 ns 40.247 ns 0.0148 - - 64 B

ODataPrimitiveValue performance after

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
CreateIntValue 15.09 ns 0.151 ns 0.118 ns 15.08 ns 0.0148 - - 64 B
CreateBoolValue 17.41 ns 0.414 ns 0.632 ns 17.26 ns 0.0148 - - 64 B
CreateStringValue 11.11 ns 0.060 ns 0.047 ns 11.13 ns 0.0093 - - 40 B
CreateDateValue 21.31 ns 0.492 ns 0.567 ns 21.07 ns 0.0167 - - 72 B
CreateByteArrayValue 50.69 ns 0.647 ns 0.540 ns 50.59 ns 0.0092 - - 40 B
CreateGuidValue 23.20 ns 0.337 ns 0.388 ns 23.05 ns 0.0167 - - 72 B
CreateNullableIntValue 67.13 ns 1.408 ns 2.844 ns 65.68 ns 0.0148 - - 64 B
CreateUint64Value 30.49 ns 0.489 ns 0.458 ns 30.50 ns 0.0148 - - 64 B
CreateTimeOfDayValue 58.25 ns 1.211 ns 2.274 ns 57.89 ns 0.0148 - - 64 B

I compare runningd IsPrimitiveType against different types. I tested three implementations of the method:

  • The IsPrimitiveType_ methods use the original implementation
  • IsPrimitiveTypeOptimized_ do not use the dictionary, all the types are checked directly in a long expression with predicates combined by ||
  • IsPrimitiveTypeOptimized2_ is hybrid implementation that checks the most common types directly and uses the dictionary for the rest. This is the version adopted in this PR.

Here are the results:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
IsPrimitiveType_Int 22.601 ns 0.4719 ns 0.5969 ns 22.376 ns - - - -
IsPrimitiveType_String 22.289 ns 0.2468 ns 0.2308 ns 22.209 ns - - - -
IsPrimitiveType_Bool 21.033 ns 0.1256 ns 0.1114 ns 21.008 ns - - - -
IsPrimitiveType_Date 22.475 ns 0.4751 ns 0.6814 ns 22.361 ns - - - -
IsPrimitiveType_ByteArray 24.060 ns 0.4876 ns 0.4323 ns 24.266 ns - - - -
IsPrimitiveType_Guid 22.531 ns 0.4556 ns 0.4475 ns 22.513 ns - - - -
IsPrimitiveType_NullableInt 78.908 ns 1.6046 ns 2.9341 ns 78.545 ns 0.0055 - - 24 B
IsPrimitiveType_Uint64 5.689 ns 0.1445 ns 0.3111 ns 5.615 ns - - - -
IsPrimitiveType_TimeOfDay 22.624 ns 0.4729 ns 0.7221 ns 22.705 ns - - - -
IsPrimitiveTypeOptimized_Int 4.901 ns 0.1214 ns 0.1077 ns 4.882 ns - - - -
IsPrimitiveTypeOptimized_String 3.780 ns 0.0754 ns 0.0705 ns 3.759 ns - - - -
IsPrimitiveTypeOptimized_Bool 6.699 ns 0.1548 ns 0.1720 ns 6.682 ns - - - -
IsPrimitiveTypeOptimized_Date 8.810 ns 0.1957 ns 0.2743 ns 8.751 ns - - - -
IsPrimitiveTypeOptimized_ByteArray 29.467 ns 0.3959 ns 0.3510 ns 29.443 ns - - - -
IsPrimitiveTypeOptimized_Guid 21.046 ns 0.4198 ns 0.3721 ns 21.104 ns - - - -
IsPrimitiveTypeOptimized_NullableInt 51.598 ns 1.0570 ns 2.0864 ns 50.789 ns 0.0055 - - 24 B
IsPrimitiveTypeOptimized_Uint64 61.068 ns 0.4529 ns 0.4237 ns 61.058 ns - - - -
IsPrimitiveTypeOptimized_TimeOfDay 55.378 ns 1.1278 ns 1.2988 ns 54.893 ns - - - -
IsPrimitiveTypeOptimized2_Int 4.850 ns 0.0787 ns 0.0698 ns 4.862 ns - - - -
IsPrimitiveTypeOptimized2_String 3.784 ns 0.0403 ns 0.0377 ns 3.772 ns - - - -
IsPrimitiveTypeOptimized2_Bool 6.568 ns 0.1424 ns 0.1332 ns 6.532 ns - - - -
IsPrimitiveTypeOptimized2_Date 9.076 ns 0.1464 ns 0.1298 ns 9.123 ns - - - -
IsPrimitiveTypeOptimized2_ByteArray 41.829 ns 0.6950 ns 0.7137 ns 41.517 ns - - - -
IsPrimitiveTypeOptimized2_Guid 10.579 ns 0.2309 ns 0.5488 ns 10.348 ns - - - -
IsPrimitiveTypeOptimized2_NullableInt 55.496 ns 1.1484 ns 3.3499 ns 55.042 ns 0.0055 - - 24 B
IsPrimitiveTypeOptimized2_Uint64 19.755 ns 0.4191 ns 0.5301 ns 19.552 ns - - - -
IsPrimitiveTypeOptimized2_TimeOfDay 39.894 ns 0.6160 ns 0.5762 ns 39.639 ns - - - -

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

xuzhg
xuzhg previously approved these changes Nov 29, 2023
Copy link
Copy Markdown
Contributor

@chrisspre chrisspre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linear scan over a bunch of types (the lengthy chain of the short circuiting or's is suboptimal.
A lot of primitive CLR types have a so called TypeCode , I suggest to use a HashSet of these TypeCodes first.

{
// Directly check the most common types for first
// for better performance.
return clrType == typeof(string)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this "linear scan" over all bunch of types, I suggest to create a static HashSet of TypeCode and use GetTypeCode on the clrType

Copy link
Copy Markdown
Contributor Author

@habbes habbes Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrisspre. I'm running your suggestion as well as other variations I have thought about as a result of your suggestion under a benchmark (e.g. using an array map with the type codes as indices, using an int bitmap with the type codes as bit positions). I'll report back with my findings.

Copy link
Copy Markdown
Contributor Author

@habbes habbes Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I test the following approaches:

  • using a pre-initialized HashSet<TypeCode> lookup first then the fallback to the dictionary when the type code is not in the hashset: return typeCodeSet.Contains(typeCode) || primitiveMap.ContainsKey(type) ...
  • start with a lookup in a pre-initialized bool[] array where the type code is the index then fallback to the dictionary if the typecode is not in the array: return typeCodeArray[typeCode] || primitiveMap.ContainsKey(type) ...
  • use an int bitmap instead of a bool array, where a bit is set to one when its position in the int matches a supported type code: return typeCodeBitMap & (1 << typeCode) == (1 << typeCode) || primitiveMap.ContainsKey(type) ...

See full implementations here: https://github.com/habbes/odata.net/blob/entity-type-perf-tests/src/Microsoft.OData.Core/Metadata/EdmLibraryExtensions.cs

Find full benchmarks here: https://github.com/habbes/odata.net/blob/entity-type-perf-tests/test/PerformanceTests/ComponentTests/CoreLib/PrimitiveValueTests.cs

I found that the approach using HashSet<TypeCode> slower than my initial PR ("linear scan" using a chain of type comparisons) for the most common types (string, int, bool, Guid, DateTimeOffset), but slightly faster for the lesser common types (the ones looked up in the dictionary.

I found that the using bool array and int bitmap were much faster than using the HashSet on all counts. The array and int bitmap approaches had similar performances. For the bool array and int map approaches, I also found that it may be beneficial to add direct checks for DateTimeOffset (and Guid) which are common types for which don't have dedicated TypeCodes. It drastically speeds checks for DateTimeOffset and Guid while slightly slowing down types in the dictionary or spatial types.

The bool array and int bitmap are slightly slower than the original PR (using direct comparison chains) for strings and ints.

Method String Int Bool DateTimeOffset Guid ByteArray NullableInt Uint64 TimeOfDay Point
Baseline (existing impl) 23.318 ns 22.829 ns 22.224 ns 22.544 ns 22.688 ns 24.875 ns 78.282 ns 6.486 ns 22.577 ns 34.164 ns
Original PR (chain of conditions) 4.205 ns 5.338 ns 6.994 ns 11.713 ns 11.734 ns 42.085 ns 53.779 ns 21.584 ns 44.069 ns 51.081 ns
HashSet 17.417 ns 16.077 ns 17.013 ns 15.116 ns 18.298 ns 36.603 ns 64.565 ns 16.758 ns 35.429 ns 48.322 ns
bool[] map 6.454 ns 6.377 ns 6.846 ns 20.962 ns 21.927 ns 22.911 ns 56.591 ns 6.085 ns 22.108 ns 32.827 ns
bool[] map with checks for DateTimeOffset and Guid 6.832 ns 6.477 ns 6.25 ns 7.846 ns 10.712 ns 26.788 ns 55.688 ns 6.252 ns 28.462 ns 40.168 ns
int bitmap 6.203 ns 6.015 ns 6.013 ns 20.36 ns 24.865 ns 22.623 ns 61.293 ns 7.822 ns 20.267 ns 36.182 ns
int bitmap with checks for DateTimeOffset and Guid 6.982 ns 6.859 ns 6.25 ns 7.069 ns 8.218 ns 24.432 ns 50.542 ns 6.191 ns 24.601 ns 39.070 ns

Copy link
Copy Markdown
Contributor Author

@habbes habbes Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did a final variation of the int bitmap approach to do a direct comparison for strings first before the type code lookup. It slightly improved performance for string, at a slight cost to everything else:

Method String Int Bool DateTimeOffset Guid ByteArray NullableInt Uint64 TimeOfDay Point
Baseline (existing impl) 23.318 ns 22.829 ns 22.224 ns 22.544 ns 22.688 ns 24.875 ns 78.282 ns 6.486 ns 22.577 ns 34.164 ns
Original PR (chain of conditions) 4.205 ns 5.338 ns 6.994 ns 11.713 ns 11.734 ns 42.085 ns 53.779 ns 21.584 ns 44.069 ns 51.081 ns
int bitmap 6.203 ns 6.015 ns 6.013 ns 20.36 ns 24.865 ns 22.623 ns 61.293 ns 7.822 ns 20.267 ns 36.182 ns
int bitmap with checks for DateTimeOffset and Guid 6.982 ns 6.859 ns 6.25 ns 7.069 ns 8.218 ns 24.432 ns 50.542 ns 6.191 ns 24.601 ns 39.070 ns
int bitmap with checks for strings, DateTimeOffset and guid 4.565 ns 7.974 ns 7.391 ns 8.165 ns 10.618 ns 26.414 ns 58.034 ns 7.105 ns 26.250 ns 37.706 ns

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on these results, I've settled for the int bitmap approach with extra checks for Guid and DateTimeOffset. I think it has reasonable trade-offs. It could also consider the array approach for arguably better readability, but I thought we don't need that extra allocation.

gathogojr
gathogojr previously approved these changes Dec 1, 2023
Copy link
Copy Markdown
Contributor

@gathogojr gathogojr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


if (clrType == typeof(UInt16) || clrType == typeof(UInt32) || clrType == typeof(UInt64))
{
// Directly check the most common types for first
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Directly check the most common types for first
// Directly check the most common types first

nit

odero
odero previously approved these changes Dec 1, 2023
@habbes habbes dismissed stale reviews from odero, gathogojr, and xuzhg via b01543d December 1, 2023 14:24
@pull-request-quantifier-deprecated
Copy link
Copy Markdown

This PR has 36 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +33 -3
Percentile : 14.4%

Total files changed: 1

Change summary by file extension:
.cs : +33 -3

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

@KenitoInc KenitoInc merged commit 29c3fe3 into OData:master Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Perf: ODataPrimitiveValue constructor is relatively expensive

7 participants