Skip to content

Reorganize the various encodings in libextra #8310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alexcrichton opened this issue Aug 5, 2013 · 6 comments
Closed

Reorganize the various encodings in libextra #8310

alexcrichton opened this issue Aug 5, 2013 · 6 comments

Comments

@alexcrichton
Copy link
Member

Currently (pending #8287), libextra will have the following encodings

  • json
  • ebml
  • base64
  • hex

And in theory we should add these in the future:

  • xml
  • base32
  • ASN.1

(or at least those are the ones I could think of off the top of my head). Right now they're all placed directly inside the libextra folder, but they would probably benefit from a better organization scheme.

Currently all of the "encodings" can generally fall into two categories. One is "bytes to strings and back" and the other is a "generic serialization format".

Right now I believe that the "generic serialization format" is captured nicely via the serialize::{Encoder, Decoder} traits, and it unifies json/ebml. @omasanori has brought up concerns about the naming scheme, and my proposal below might alleviate these concerns.

Otherwise, the "bytes to strings and back" type of encoding isn't very well unified at this point. There should be a trait (in possibly a new extra::encoding module) which unifies base64/base32/hex and whatever else we come across. In my opinion, the trait should look like:

trait Text {
  fn encode(&[u8]) -> ~str;
  fn decode(&str) -> ~[u8];
}

This makes it clear what the "encode" procedure is and what the "decode" procedure is. Additionally, it's explicitly clear that bytes live on one side, and strings live on the other, there's no intermingling between the two of them.

Finally, I think we should move everything around into:

  • extra::encoding::base64
  • extra::encoding::base32
  • extra::encoding::hex
  • extra::encoding::json
  • extra::encoding::ebml

And then encoding/mod.rs would define the traits

  • Text (is the name too short? if you reference it by encoding::Text I think it's clear but that's not always guaranteed)
  • Serializer (serialize::Encoder today)
  • Deserializer (serialize::Decoder today)

What do others think about this? If this looks good, I'd be more than happy to reorganize everything!

@sfackler
Copy link
Member

sfackler commented Aug 5, 2013

I'm probably missing something here, but how would the Text trait actually be used? If we implement it in extra::encoding::Base64, extra::encoding::Hex, etc, I would think that it would cause a duplicate trait implementation error, and even if it didn't, how would the user select the encoding they want? You could differentiate with a dummy type parameter, but I'm not sure why str.encode::<Base64Tag>(...) is any better than str.to_base64(...).

@alexcrichton
Copy link
Member Author

That is a good point yeah, I suppose it would look something like:

// src/libextra/encoding/base64.rs
struct Base64;
impl encoding::Text for Base64 { ... }

// src/libextra/encoding/hex.rs
struct Hex;
impl encoding::Text for Hex { ... }

// src/libextra/encoding/mod.rs

impl SomeHelperTrait for &'self str {
  fn decode<E: Text>(self) -> ~[u8] {
    Text::decode(self)
  }
}

impl SomeHelperTrait for &'self [u8] {
  fn encode<E: Text>(self) -> ~str {
    Text::encode(self)
  }
}

Albeit that is a bit roundabout, and to use those methods you'd have to both use encoding::Base64 and encoding::SomeHelperTrait. Perhaps this is a situation where a trait defining things just adds a burden to everyone using it instead of making things easier. When actually thinking it through and seeing the overhead, I think I'd be in favor of just some to_base64/to_hex methods (as is the current situation).

Regardless though I think the reorganization would be nice (into an encoding super-module)

@sfackler
Copy link
Member

sfackler commented Aug 5, 2013

Looking in src/libextra/extra.rs, there's a couple of other super-modules we could create: concurrency, collections and crypto, though crypto is already in it's own module but reexported in the base extra module.

I think there will need to be some changes made to rustdoc before this is a viable proposal. Submodules are currently only shown at the bottom of the table of contents of their super model (e.g. str::raw). For searchability's sake, I think we'd want a heirarchical display more like Go's package documentation. Rustdoc-ng may fix this already, though.

@omasanori
Copy link
Contributor

I doubt whether we should unify "bytes to strings and back" ones and "generic serialization format" ones into one modules, say extra::encoding.

(Summary: Base64 handles octets but JSON does numbers, strings, arrays, etc.)

On the one hand, both share the same essential concept and use case: transforming data and they are used for e.g. saving data in a file, sending data through the Internet, etc.
In fact, ASN.1 defines its serialization formats as encoding rules. (a half of ASN.1 defines an abstract syntax of various data and the other defines how we represent data using bits. I refer the latter parts here.)

On the other hand, "bytes to strings and back" ones assume that there is already a representation of data and transform it but "generic serialization format" ones handle objects in the environment where a program is running: of course they may be represented as sequences of bytes (or words) in the real world but in theory they are abstract objects.
We don't care how a JavaScript engine represents [1, 2, 3] when we serialize it into JSON. We do care whether a JavaScript engine receives 89 50 4e 47 0d 0a 1a 0a when we encode it using Base64 and send it to a browser.

@omasanori
Copy link
Contributor

The name encoding::Text is OK but I feel it's not the best name we could use.
If I'm new to Rust and install a new version of Rust including extra::encoding as in current draft, I would think extra::encoding::Text handles character encodings such as UTF-8. (it's not a joke; Haskell's Data.Text.Encoding does.)

@alexcrichton
Copy link
Member Author

Out of a lack of impetus and enthusiasm for this, I'm going to close this. We can revisit this if it becomes a problem later.

flip1995 pushed a commit to flip1995/rust that referenced this issue Jan 27, 2022
Remove unnecessary `Symbol` stringification

This should (slightly) improve performance and enhance code quality.

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants