-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Specialize slice::fill with Copy type and u8/i8/bool #81874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The job Click to see the possible cause of the failure (guessed by this bot)
|
for item in self.iter_mut() { | ||
*item = value; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of specializing separately on [u8]
below you can do a if mem::size_of::<T>() == 1
here and call memset for any type that's 1 byte wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's another option. Will do that! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
now has 1-byte size. ptr::write_bytes
only accepts u8 as second argument.
Should I transmute value
to u8 ? Is it sound ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if I want to transmute, the compiler forbids me: https://rust.godbolt.org/z/a3bsYs
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> <source>:38:33
|
38 | let value: u8 = std::mem::transmute(value);
| ^^^^^^^^^^^^^^^^^^^
|
= note: source type: `T` (this type does not have a fixed size)
= note: target type: `u8` (8 bits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transmute_copy
should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! That's really work! But I wonder if that those specific specialization for [u8] matters much with #81874 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantage is that it doesn't rely on the optimizer, so it'll work in debug mode too or with less powerful backends.
With the size_of
approach it only has to be implemented once instead of one specialization for each T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinion on this. I defer that to reviewer to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bjorn3. the8472 and I were discussing if other codegen backends could handle straight
for loop
to copy item and make it to memcpy when item is 1-byte size.
Do you think cranelift could be benefited from this optimization ^^?
c566a2e
to
010e194
Compare
} | ||
} | ||
|
||
impl SpecFill<u8> for [u8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference when not specialize for [u8]
is that compiler emits a check whether the slice is empty
before the call to memcpy.
https://rust.godbolt.org/z/z9fbcT.
Maybe it is not worth to keep those specs for u8/i8/bool slice.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 010e194 with merge 194efb9e2ffaa05d0d10f8cb6c97e5a83f763f5d... |
☀️ Try build successful - checks-actions |
Queued 194efb9e2ffaa05d0d10f8cb6c97e5a83f763f5d with parent b86674e, future comparison URL. |
Finished benchmarking try commit (194efb9e2ffaa05d0d10f8cb6c97e5a83f763f5d): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Note that |
Friendly ping. |
@bors r+ |
📌 Commit 010e194 has been approved by |
☀️ Test successful - checks-actions |
Thanks for the merge. But quick note when things slipped is that some comments are not resolved. |
I don't expect rustperf could measure any perf improvements with this changes
since
slice::fill
is newly added.Godbolt link for this change: https://rust.godbolt.org/z/r3fzee.
r? @matthewjasper since this patch added new specialization.