Struct bytekey::Encoder [−] [src]

pub struct Encoder<W> where W: Write {
    // some fields omitted
}

An encoder for serializing data to a byte format that preserves lexicographic sort order.

The byte format is designed with a few goals:

Supported Data Types

Unsigned Integers

u8, u16, u32, and u64 are encoded into 1, 2, 4, and 8 bytes of output, respectively. Order is preserved by encoding the bytes in big-endian (most-significant bytes first) format.

usize is variable-length encoded into between 1 and 9 bytes. Smaller magnitude values (closer to 0) will encode into fewer bytes. See emit_var_u64 for details on serialization size and format.

Signed Integers

i8, i16, i32, and i64 are encoded into 1, 2, 4, and 8 bytes of output, respectively. Order is preserved by taking the bitwise complement of the value, and encoding the resulting bytes in big-endian format.

isize is variable-length encoded into between 1 and 9 bytes. Smaller magnitude values (closer to 0) will encode into fewer bytes. See emit_var_i64 for details on serialization size and format.

Floating Point Numbers

f32 and f64 are encoded into 4 and 8 bytes of output, respectively. Order is preserved by encoding the value, or the bitwise complement of the value if negative, into bytes in big-endian format. NAN values will sort after all other values. In general, it is unwise to use IEEE 754 floating point values in keys, because rounding errors are pervasive. It is typically hard or impossible to use an approximate 'epsilon' approach when using keys for lookup.

Characters

Characters are serialized into between 1 and 4 bytes of output.

Booleans

Booleans are serialized into a single byte of output. false values will sort before true values.

Strings

Strings are encoded into their natural UTF8 representation plus a single null byte suffix. In general, strings should not contain null bytes. The encoder will not check for null bytes, however their presence will break lexicographic sorting. The only exception to this rule is the case where the string is the final (or only) component of the key. If the string field is the final component of a tuple, enum-struct, or struct, then it may contain null bytes without breaking sort order.

Options

An optional wrapper type adds a 1 byte overhead to the wrapped data type. None values will sort before Some values.

Structs & Tuples

Structs and tuples are encoded by serializing their consituent fields in order with no prefix, suffix, or padding bytes.

Enums

Enums are encoded with a variable-length unsigned-integer variant tag, plus the consituent fields in the case of an enum-struct. The tag adds an overhead of between 1 and 9 bytes (it will be a single byte for up to 16 variants). This encoding allows more enum variants to be added in a backwards-compatible manner, as long as variants are not removed and the variant order does not change.

Unsupported Data Types

Sequences and maps are unsupported at this time. Sequences and maps could probably be implemented with a single byte overhead per item, key, and value, but these types are not typically useful in keys.

Raw byte arrays are unsupported. The Rust Encoder/Decoder mechanism makes no distinction between byte arrays and sequences, and thus the overhead for encoding a raw byte array would be 1 byte per input byte. The theoretical best-case overhead for serializing a raw (null containing) byte array in order-preserving format is 1 bit per byte, or 9 bytes of output for every 8 bytes of input.

Methods

impl<W> Encoder<W> where W: Write

fn new(writer: W) -> Encoder<W>

Creates a new ordered bytes encoder whose output will be written to the provided writer.

fn emit_var_u64(&mut self, val: u64) -> Result<()>

Encode a u64 into a variable number of bytes.

The variable-length encoding scheme uses between 1 and 9 bytes depending on the value. Smaller magnitude (closer to 0) u64s will encode to fewer bytes.

Encoding

The encoding uses the first 4 bits to store the number of trailing bytes, between 0 and 8. Subsequent bits are the input value in big-endian format with leading 0 bytes removed.

Encoded Size
range size (bytes)
[0, 24) 1
[24, 212) 2
[212, 220) 3
[220, 228) 4
[228, 236) 5
[236, 244) 6
[244, 252) 7
[252, 260) 8
[260, 264) 9

fn emit_var_i64(&mut self, v: i64) -> Result<()>

Encode an i64 into a variable number of bytes.

The variable-length encoding scheme uses between 1 and 9 bytes depending on the value. Smaller magnitude (closer to 0) i64s will encode to fewer bytes.

Encoding

The encoding uses the first bit to encode the sign: 0 for negative values and 1 for positive values. The following 4 bits store the number of trailing bytes, between 0 and 8. Subsequent bits are the absolute value of the input value in big-endian format with leading 0 bytes removed. If the original value was negative, than 1 is subtracted from the absolute value before encoding. Finally, if the value is negative, all bits except the sign bit are flipped (1s become 0s and 0s become 1s).

Encoded Size
negative range positive range size (bytes)
[-23, 0) [0, 23) 1
[-211, -23) [23, 211) 2
[-219, -211) [211, 219) 3
[-227, -219) [219, 227) 4
[-235, -227) [227, 235) 5
[-243, -235) [235, 243) 6
[-251, -243) [243, 251) 7
[-259, -251) [251, 259) 8
[-263, -259) [259, 263) 9

Trait Implementations

impl<W> Encoder for Encoder<W> where W: Write

type Error = Error

fn emit_nil(&mut self) -> Result<()>

fn emit_u8(&mut self, v: u8) -> Result<()>

fn emit_u16(&mut self, v: u16) -> Result<()>

fn emit_u32(&mut self, v: u32) -> Result<()>

fn emit_u64(&mut self, v: u64) -> Result<()>

fn emit_usize(&mut self, v: usize) -> Result<()>

fn emit_i8(&mut self, v: i8) -> Result<()>

fn emit_i16(&mut self, v: i16) -> Result<()>

fn emit_i32(&mut self, v: i32) -> Result<()>

fn emit_i64(&mut self, v: i64) -> Result<()>

fn emit_isize(&mut self, v: isize) -> Result<()>

fn emit_bool(&mut self, v: bool) -> Result<()>

fn emit_f32(&mut self, v: f32) -> Result<()>

fn emit_f64(&mut self, v: f64) -> Result<()>

fn emit_char(&mut self, v: char) -> Result<()>

fn emit_str(&mut self, v: &str) -> Result<()>

fn emit_enum<F>(&mut self, _name: &str, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_enum_variant<F>(&mut self, _name: &str, id: usize, _len: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_enum_variant_arg<F>(&mut self, _idx: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_enum_struct_variant<F>(&mut self, _name: &str, id: usize, _len: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_enum_struct_variant_field<F>(&mut self, _name: &str, _idx: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_struct<F>(&mut self, _name: &str, _len: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_struct_field<F>(&mut self, _name: &str, _idx: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_tuple<F>(&mut self, _len: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_tuple_arg<F>(&mut self, _idx: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_tuple_struct<F>(&mut self, name: &str, len: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_tuple_struct_arg<F>(&mut self, idx: usize, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_option<F>(&mut self, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_option_none(&mut self) -> Result<()>

fn emit_option_some<F>(&mut self, f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_seq<F>(&mut self, _len: usize, _f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_seq_elt<F>(&mut self, _idx: usize, _f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_map<F>(&mut self, _len: usize, _f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>

fn emit_map_elt_key<F>(&mut self, _idx: usize, _f: F) -> Result<()>

fn emit_map_elt_val<F>(&mut self, _idx: usize, _f: F) -> Result<()> where F: FnOnce(&mut Self) -> Result<()>