Reorganize the various encodings in libextra #8310

alexcrichton · 2013-08-05T16:50:20Z

Currently (pending #8287), libextra will have the following encodings

json
ebml
base64
hex

And in theory we should add these in the future:

xml
base32
ASN.1

(or at least those are the ones I could think of off the top of my head). Right now they're all placed directly inside the libextra folder, but they would probably benefit from a better organization scheme.

Currently all of the "encodings" can generally fall into two categories. One is "bytes to strings and back" and the other is a "generic serialization format".

Right now I believe that the "generic serialization format" is captured nicely via the serialize::{Encoder, Decoder} traits, and it unifies json/ebml. @omasanori has brought up concerns about the naming scheme, and my proposal below might alleviate these concerns.

Otherwise, the "bytes to strings and back" type of encoding isn't very well unified at this point. There should be a trait (in possibly a new extra::encoding module) which unifies base64/base32/hex and whatever else we come across. In my opinion, the trait should look like:

trait Text {
  fn encode(&[u8]) -> ~str;
  fn decode(&str) -> ~[u8];
}

This makes it clear what the "encode" procedure is and what the "decode" procedure is. Additionally, it's explicitly clear that bytes live on one side, and strings live on the other, there's no intermingling between the two of them.

Finally, I think we should move everything around into:

extra::encoding::base64
extra::encoding::base32
extra::encoding::hex
extra::encoding::json
extra::encoding::ebml

And then encoding/mod.rs would define the traits

Text (is the name too short? if you reference it by encoding::Text I think it's clear but that's not always guaranteed)
Serializer (serialize::Encoder today)
Deserializer (serialize::Decoder today)

What do others think about this? If this looks good, I'd be more than happy to reorganize everything!

The text was updated successfully, but these errors were encountered:

sfackler · 2013-08-05T16:57:03Z

I'm probably missing something here, but how would the Text trait actually be used? If we implement it in extra::encoding::Base64, extra::encoding::Hex, etc, I would think that it would cause a duplicate trait implementation error, and even if it didn't, how would the user select the encoding they want? You could differentiate with a dummy type parameter, but I'm not sure why str.encode::<Base64Tag>(...) is any better than str.to_base64(...).

alexcrichton · 2013-08-05T17:14:09Z

That is a good point yeah, I suppose it would look something like:

// src/libextra/encoding/base64.rs
struct Base64;
impl encoding::Text for Base64 { ... }

// src/libextra/encoding/hex.rs
struct Hex;
impl encoding::Text for Hex { ... }

// src/libextra/encoding/mod.rs

impl SomeHelperTrait for &'self str {
  fn decode<E: Text>(self) -> ~[u8] {
    Text::decode(self)
  }
}

impl SomeHelperTrait for &'self [u8] {
  fn encode<E: Text>(self) -> ~str {
    Text::encode(self)
  }
}

Albeit that is a bit roundabout, and to use those methods you'd have to both use encoding::Base64 and encoding::SomeHelperTrait. Perhaps this is a situation where a trait defining things just adds a burden to everyone using it instead of making things easier. When actually thinking it through and seeing the overhead, I think I'd be in favor of just some to_base64/to_hex methods (as is the current situation).

Regardless though I think the reorganization would be nice (into an encoding super-module)

sfackler · 2013-08-05T17:21:45Z

Looking in src/libextra/extra.rs, there's a couple of other super-modules we could create: concurrency, collections and crypto, though crypto is already in it's own module but reexported in the base extra module.

I think there will need to be some changes made to rustdoc before this is a viable proposal. Submodules are currently only shown at the bottom of the table of contents of their super model (e.g. str::raw). For searchability's sake, I think we'd want a heirarchical display more like Go's package documentation. Rustdoc-ng may fix this already, though.

omasanori · 2013-08-05T22:16:17Z

I doubt whether we should unify "bytes to strings and back" ones and "generic serialization format" ones into one modules, say extra::encoding.

(Summary: Base64 handles octets but JSON does numbers, strings, arrays, etc.)

On the one hand, both share the same essential concept and use case: transforming data and they are used for e.g. saving data in a file, sending data through the Internet, etc.
In fact, ASN.1 defines its serialization formats as encoding rules. (a half of ASN.1 defines an abstract syntax of various data and the other defines how we represent data using bits. I refer the latter parts here.)

On the other hand, "bytes to strings and back" ones assume that there is already a representation of data and transform it but "generic serialization format" ones handle objects in the environment where a program is running: of course they may be represented as sequences of bytes (or words) in the real world but in theory they are abstract objects.
We don't care how a JavaScript engine represents [1, 2, 3] when we serialize it into JSON. We do care whether a JavaScript engine receives 89 50 4e 47 0d 0a 1a 0a when we encode it using Base64 and send it to a browser.

omasanori · 2013-08-05T22:25:36Z

The name encoding::Text is OK but I feel it's not the best name we could use.
If I'm new to Rust and install a new version of Rust including extra::encoding as in current draft, I would think extra::encoding::Text handles character encodings such as UTF-8. (it's not a joke; Haskell's Data.Text.Encoding does.)

alexcrichton · 2013-08-07T07:20:09Z

Out of a lack of impetus and enthusiasm for this, I'm going to close this. We can revisit this if it becomes a problem later.

Remove unnecessary `Symbol` stringification This should (slightly) improve performance and enhance code quality. changelog: none

alexcrichton mentioned this issue Aug 5, 2013

Added hexadecimal encoding module #8287

Closed

alexcrichton closed this as completed Aug 7, 2013

sfackler mentioned this issue Sep 15, 2013

RFC: make the stdlib more hierarchical #9208

Closed

flip1995 pushed a commit to flip1995/rust that referenced this issue Jan 27, 2022

Auto merge of rust-lang#8310 - camelid:rm-stringification, r=Manishearth

16032c5

Remove unnecessary `Symbol` stringification This should (slightly) improve performance and enhance code quality. changelog: none

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganize the various encodings in libextra #8310

Reorganize the various encodings in libextra #8310

alexcrichton commented Aug 5, 2013

sfackler commented Aug 5, 2013

alexcrichton commented Aug 5, 2013

sfackler commented Aug 5, 2013

omasanori commented Aug 5, 2013

omasanori commented Aug 5, 2013

alexcrichton commented Aug 7, 2013

Reorganize the various encodings in libextra #8310

Reorganize the various encodings in libextra #8310

Comments

alexcrichton commented Aug 5, 2013

sfackler commented Aug 5, 2013

alexcrichton commented Aug 5, 2013

sfackler commented Aug 5, 2013

omasanori commented Aug 5, 2013

omasanori commented Aug 5, 2013

alexcrichton commented Aug 7, 2013