You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the course of optimizing sorting short vectors in #12029, I noticed that the maximum run length that merge sort uses has an impact on performance. Increasing the length to 32 improved performance when sorting u64s, but sorting larger types incurred the cost of the additional writes that insertion sort does.
There is definitely room for tuning this parameter, possibly varying it with the size of the type being sorted. It should probably be based on more than what happens to run fast on my specific computer.
Below are benchmarks:
in-place insertion (32 elem max run):
In the course of optimizing sorting short vectors in #12029, I noticed that the maximum run length that merge sort uses has an impact on performance. Increasing the length to 32 improved performance when sorting u64s, but sorting larger types incurred the cost of the additional writes that insertion sort does.
There is definitely room for tuning this parameter, possibly varying it with the size of the type being sorted. It should probably be based on more than what happens to run fast on my specific computer.
Below are benchmarks:
in-place insertion (32 elem max run):
in-place insertion (16 elem max run)
in-place insertion (8 elem max run):
out-of-place (32 run insertion sort)
out-of-place (16 run insertion sort)
out-of-place (8 run insertion sort)
The text was updated successfully, but these errors were encountered: