Skip to content

Extra null check on as_bytes() iteration with break #25306

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
raphlinus opened this issue May 11, 2015 · 5 comments
Closed

Extra null check on as_bytes() iteration with break #25306

raphlinus opened this issue May 11, 2015 · 5 comments
Labels
A-codegen Area: Code generation

Comments

@raphlinus
Copy link
Contributor

Code generation is less than ideal for a function which iterates over a string as_bytes() and has a break; there is a null check of the pointer to the bytes before loading.

Here's the function and the asm:

pub fn find_nl(s: &str) -> usize {
    let mut i = 0;
    for &c in s.as_bytes() {
        if c == b'\n' { break; }
        i += 1;
    }
    i
}
.LBB0_4:
    movq    %rdx, %rsi
    addq    %rax, %rsi
    je  .LBB0_7
    movzbl  (%rdx,%rax), %esi
    cmpl    $10, %esi
    je  .LBB0_7
    incq    %rax
    cmpq    %rax, %rcx
    jne .LBB0_4
.LBB0_7:

The function and two other versions (which generate better code) are at http://is.gd/E4i7z7. A version which uses enumerate doesn't have the null check, but has an extra movq. A version which uses position rather than writing the loop generates perfect code. Just for context, this issue arose while writing a Markdown parser. The current code uses position(), and the speedup was significant.

@bluss
Copy link
Member

bluss commented May 11, 2015

cc @huonw

@steveklabnik steveklabnik added the A-codegen Area: Code generation label May 13, 2015
@steveklabnik
Copy link
Member

Triage: today, this seems much better:

.LBB0_2:
	cmpb	$10, (%rdi,%rax)
	je	.LBB0_4
	incq	%rax
	cmpq	%rax, %rsi
	jne	.LBB0_2

The other two functions actually look worse, but I am not an expert here. @raphlinus what do you think?

@raphlinus
Copy link
Contributor Author

Yeah, all good today. I haven't done deep performance analysis, but from examination of the assembly code they all look good. There is a slight difference, in the first version the check against s.len() is at the bottom, so there needs to be an early exit in the empty case, in the other two (which are identical) it's the at top of the loop, but there's just a bit more register shuffling.

This function will become less important in the rewrite of pulldown-cmark. I'm toying with ideas to use SIMD to find "exceptional" bytes in the input, similar to bytecount.

@bluss
Copy link
Member

bluss commented Jan 3, 2017

Because the loop in Iterator::position is normally not unrolled by the optimizer, explicit unrolling like #37972 can help.

@shepmaster
Copy link
Member

I'm toying with ideas to use SIMD to find "exceptional" bytes in the input

@raphlinus perhaps https://github.com/shepmaster/jetscii would be of interest then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation
Projects
None yet
Development

No branches or pull requests

4 participants