Skip to content

make tidy-alphabetical use a natural sort #141311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

folkertdev
Copy link
Contributor

The idea here is that these lines should be correctly sorted, even though a naive string comparison would say they are not:

foo2
foo10

This is the "natural sort order".

There is more discussion in #t-compiler/help > tidy natural sort

Unfortunately, no standard sorting tools are smart enough to to this automatically (casting some doubt on whether we should make this change). Here are some sort outputs:

> cat foo.txt | sort
foo
foo1
foo10
foo2
mp
mp1e2
np", 
np1e2", 
> cat foo.txt | sort -n
foo
foo1
foo10
foo2
mp
mp1e2
np", 
np1e2", 
> cat foo.txt | sort -V
foo
foo1
foo2
foo10
mp
mp1e2
np1e2", 
np", 

Disappointingly, "numeric" sort does not actually have the behavior we want. It only sorts by numeric value if the line starts with a number. The "version" sort looks promising, but does something very unintuitive if you look at the final 4 values. None of the other options seem to have the desired behavior in all cases:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           shuffle, but group identical keys.  See shuf(1)
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text

r? @Noratrieb (it sounded like you know this code?)

@rustbot rustbot added A-tidy Area: The tidy tool S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 20, 2025
@joshtriplett
Copy link
Member

@folkertdev We have a lot of redundant versions of this sort. See the algorithm documented in the style guide; is that algorithm sufficient here? If so, that algorithm is implemented in rustfmt.

@folkertdev
Copy link
Contributor Author

Oh, right, https://github.com/rust-lang/rustfmt/blob/master/src/sort.rs is probably more robust (e.g. it handles leading/repeated zeros, which my version does not). Can we achieve code sharing in a reasonable way between tidy and rustfmt? Or should I just copy-paste that implementation?

@Noratrieb
Copy link
Member

Sharing the code is quite annoying due to how rustfmt is set up with its subtree, copying the code sounds better. It wouldn't be that bad if it diverges.

@folkertdev
Copy link
Contributor Author

I think because this algorithm was designed for rustfmt, it has some weird behavior for things that are not rust identifiers. For instance it sorts these examples like so

foo_bar
foo-bar

Contrary to alphabetical sorting; _ is sorted before all non-whitespace characters. This also causes this:

        rustc_ast_lowering = { path = \"../rustc_ast_lowering\" }
        rustc_ast = { path = \"../rustc_ast\" }

So, idk, this algorithm has some nice features, but I think it needs some tweaking for this use case.

@folkertdev folkertdev force-pushed the tidy-natural-sort branch from 1f057ed to f81c2f0 Compare May 21, 2025 11:32
@rustbot rustbot added the O-windows Operating system: Windows label May 21, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 21, 2025

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

@folkertdev
Copy link
Contributor Author

After looking at this a bit, I think for the sort of thing that tidy is used for, the segment-based sorting method is a poor fit. The method really overfits on sequences of rust identifiers, and in many cases produces an ordering that is different from standard string sort. Staying as close a possible to standard string sort is convenient (e.g. you can use your editor to sort lines).

So, I've updated the algorithm to 1) consider leading zeros and 2) perform a case-sensitive sort. In most cases that means your editor's sorting functionality will produce the correct result, except when there are numbers in the identifiers.

Comment on lines 298 to +303
// FEAT_SME_F8F32
("sme-f8f32", Unstable(sym::aarch64_unstable_target_feature), &["sme2", "fp8"]),
// FEAT_SME_F16F16
("sme-f16f16", Unstable(sym::aarch64_unstable_target_feature), &["sme2"]),
// FEAT_SME_F64F64
("sme-f64f64", Unstable(sym::aarch64_unstable_target_feature), &["sme"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement.

@saethlin
Copy link
Member

saethlin commented May 21, 2025

FWIW I don't use my editor to sort these things, I just do a decent enough job by eye then comply when tidy tells me to change something. So any change that looks more sensible and/or facilitates eyeball-binary-search makes me happy.

@folkertdev folkertdev force-pushed the tidy-natural-sort branch from f81c2f0 to 50f22f4 Compare May 22, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tidy Area: The tidy tool O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants