Describe the bug
Bug Report: vlang/msgpack — Minimal Integer Encoding & Fixint Decoding
Repository: https://github.com/vlang/msgpack
Affects: All versions as of commit 55d09a0
Severity: High. Integer encodings differ byte-for-byte from those of spec-following MessagePack implementations, and their single-byte fixint encodings cannot be decoded.
Summary
The vlang/msgpack library has three related bugs that together break interoperability with spec-compliant MessagePack implementations (Python, Rust, Go, etc.):
- encode.v: The encoder always uses a fixed-width format chosen by the V type, never the smallest one. encode(0) produces 5 bytes (d2 00 00 00 00) instead of 1 byte (00), contrary to the MessagePack specification, which says serializers SHOULD use the smallest format.
- config.v: positive_int_unsigned defaults to false. Once integers are routed through encode_int, non-negative integers in the range 128–255 are encoded as 3-byte signed int16 instead of 2-byte unsigned uint8.
- decode.v: The decoder ignores positive and negative fixint bytes. decode_integer and decode_to_json have no match arm for mp_pos_fix_int (0x00–0x7f) or mp_neg_fix_int (0xe0–0xff), so integers from -32 to 127 encoded by any conforming library cannot be decoded by V's library.
The encoder bug was even acknowledged in the original source with a // TODO comment that was never completed:
// TODO: if int encode_int, if uint encode_uint
// instead of needing to check each type, also
// then we will be using the smallest storage
Background: The MessagePack Integer Format Family
The MessagePack specification defines the following integer formats, with each family (fixint, unsigned, signed) listed from smallest to largest:
| Format | Byte(s) | Range |
|---|---|---|
| positive fixint | 0xxxxxxx (1 byte, 0x00–0x7f) | 0 to 127 |
| negative fixint | 111xxxxx (1 byte, 0xe0–0xff) | -32 to -1 |
| uint 8 | 0xcc + 1 byte | 0 to 255 |
| uint 16 | 0xcd + 2 bytes (big-endian) | 0 to 65535 |
| uint 32 | 0xce + 4 bytes (big-endian) | 0 to 2^32−1 |
| uint 64 | 0xcf + 8 bytes (big-endian) | 0 to 2^64−1 |
| int 8 | 0xd0 + 1 byte | -128 to 127 |
| int 16 | 0xd1 + 2 bytes (big-endian) | -32768 to 32767 |
| int 32 | 0xd2 + 4 bytes (big-endian) | -2147483648 to 2147483647 |
| int 64 | 0xd3 + 8 bytes (big-endian) | -2^63 to 2^63−1 |
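The specification's serialization rules state that when an object can be represented in multiple formats, serializers SHOULD use the one with the fewest bytes. Mainstream encoders such as msgpack-python follow this, so using int 32 (0xd2) to encode 0 when positive fixint (0x00) is available produces different bytes from theirs, even though both decode to the same value.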
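To make the selection rule concrete, here is a minimal Python sketch of an encoder that picks the smallest format from the table. pack_int is a hypothetical helper, not part of vlang/msgpack or msgpack-python, and it assumes the unsigned family for every non-negative value above 127, as the fix for Bug 2 proposes.

```python
import struct


def pack_int(n: int) -> bytes:
    """Encode n with the smallest MessagePack integer format (sketch)."""
    if 0 <= n <= 0x7F:
        return bytes([n])                      # positive fixint: byte is the value
    if -32 <= n < 0:
        return struct.pack(">b", n)            # negative fixint: 0xe0..0xff
    if n > 0:
        # Non-negative values above 127 use the unsigned family.
        for marker, fmt, limit in ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF),
                                   (0xCE, ">I", 0xFFFFFFFF),
                                   (0xCF, ">Q", 0xFFFFFFFFFFFFFFFF)):
            if n <= limit:
                return bytes([marker]) + struct.pack(fmt, n)
    else:
        # Negative values below -32 use the signed family.
        for marker, fmt, low in ((0xD0, ">b", -(1 << 7)), (0xD1, ">h", -(1 << 15)),
                                 (0xD2, ">i", -(1 << 31)), (0xD3, ">q", -(1 << 63))):
            if n >= low:
                return bytes([marker]) + struct.pack(fmt, n)
    raise OverflowError(f"{n} does not fit in a MessagePack integer")


for value in (0, 42, -5, -123, 200, 70000):
    print(value, pack_int(value).hex())
# 0 00
# 42 2a
# -5 fb
# -123 d085
# 200 ccc8
# 70000 ce00011170
```

For comparison, the unfixed V encoder's output for an int corresponds to b"\xd2" + struct.pack(">i", n), which is always 5 bytes.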
How This Was Discovered
These bugs were discovered while building a trading bot in V that communicates with the Hyperliquid perpetuals exchange. Hyperliquid requires order actions to be serialized with MessagePack before being hashed for EIP-712 Ethereum signing:
order_bytes = msgpack.encode(action)
action_hash = keccak256(order_bytes + nonce_bytes + vault_byte)
signature = secp256k1_sign(eip712_hash(action_hash))
The exchange's server re-serializes the same action with a spec-compliant library and computes the hash independently. If the bytes differ, even by a single byte in a single field, the hashes differ and the signature is invalid. Once the builder field (f: 0, a zero integer) was present in the serialized action, the mismatch became apparent: V encoded 0 as d2 00 00 00 00 (5 bytes) while the server expected 00 (1 byte), producing a completely different hash and an HTTP 422 rejection on every request.
The same failure will occur with any V program communicating with Python's msgpack, Rust's rmp/rmp-serde, Go's vmihailenco/msgpack, or any other spec-compliant implementation.
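The hash sensitivity can be illustrated with a small Python sketch. It hand-assembles a one-key map {"f": 0} both ways (fixmap 0x81, fixstr "f" as a1 66, then the value) and appends a nonce and vault byte as in the pseudocode. Python's hashlib has no Keccak-256, so sha3_256 stands in for it; the nonce value, its 8-byte big-endian layout and the single 0x00 vault byte are assumptions for illustration, not taken from Hyperliquid's documentation.

```python
import hashlib

# {"f": 0} as fixmap(1) + fixstr "f" + value, in both encodings of 0.
v_payload = bytes.fromhex("81a166") + bytes.fromhex("d200000000")  # unfixed V
spec_payload = bytes.fromhex("81a166") + bytes.fromhex("00")       # minimal encoders

NONCE = 1_700_000_000_000                 # hypothetical nonce
nonce_bytes = NONCE.to_bytes(8, "big")    # assumed 8-byte big-endian layout
vault_byte = b"\x00"                      # assumed "no vault" marker


def action_digest(payload: bytes) -> str:
    # sha3_256 is only a stand-in: Keccak-256 differs in padding and is not in hashlib.
    return hashlib.sha3_256(payload + nonce_bytes + vault_byte).hexdigest()


print(v_payload.hex(), spec_payload.hex())                      # 81a166d200000000 81a16600
print(action_digest(v_payload) == action_digest(spec_payload))  # False
```

Both payloads decode to the same map, which is why the mismatch only surfaces once the bytes are hashed and signed.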
Reproduction Steps
Bug 1: Encoder — Fixed-Width Instead of Minimal Encoding
File: encode.v
Reproduction Steps
import msgpack
import encoding.hex
fn main() {
// Encode a plain integer
result := msgpack.encode(0)
println(result.hex()) // prints: d200000000
// should print: 00
result2 := msgpack.encode(42)
println(result2.hex()) // prints: d20000002a
// should print: 2a
// Encode a struct with an integer field
result3 := msgpack.encode(struct{ age int }{age: 30})
println(result3.hex()) // field value encoded as int32 (d2 00 00 00 1e)
// should be positive fixint (1e)
}
Current Behaviour
The encode[T]() generic function dispatches on V's compile-time type and calls a fixed-width helper directly, completely bypassing the existing encode_int and encode_uint functions that already implement minimal encoding:
// Current (buggy) code in encode.v
$else $if T is i8 {
e.encode_i8(data) // always writes 0xd0 + 1 byte (2 bytes total), even for value 0
} $else $if T is i16 {
e.encode_i16(data) // always writes 0xd1 + 2 bytes (3 bytes total)
} $else $if T is int {
e.encode_i32(data) // always writes 0xd2 + 4 bytes (5 bytes total)
} $else $if T is i64 {
e.encode_i64(data) // always writes 0xd3 + 8 bytes (9 bytes total)
} $else $if T is u8 {
e.encode_u8(data) // always writes 0xcc + 1 byte (2 bytes total)
} $else $if T is u16 {
e.encode_u16(data) // always writes 0xcd + 2 bytes (3 bytes total)
} $else $if T is u32 {
e.encode_u32(data) // always writes 0xce + 4 bytes (5 bytes total)
} $else $if T is u64 {
e.encode_u64(data) // always writes 0xcf + 8 bytes (9 bytes total)
}
Actual byte output:
encode(0) → d2 00 00 00 00 (5 bytes: int32 format)
encode(42) → d2 00 00 00 2a (5 bytes: int32 format)
encode(-123) → d2 ff ff ff 85 (5 bytes: int32 format)
encode(1) → d2 00 00 00 01 (5 bytes: int32 format)
Note also that i32 has no case at all — it silently falls through to an unhandled branch.
Expected Behaviour
Following the MessagePack specification, which says serializers SHOULD use the smallest format that can represent the value, the output should be:
encode(0) → 00 (1 byte: positive fixint)
encode(42) → 2a (1 byte: positive fixint)
encode(-123) → d0 85 (2 bytes: int8 format)
encode(1) → 01 (1 byte: positive fixint)
The encode_int and encode_uint functions in the same file already implement this logic correctly — they just are not being called.
Possible Solution
Route all integer types through encode_int or encode_uint, which already select the minimal format, and add the missing i32 case:
$else $if T is i8 {
e.encode_int(i64(data))
} $else $if T is i16 {
e.encode_int(i64(data))
} $else $if T is int {
e.encode_int(i64(data))
} $else $if T is i32 {
e.encode_int(i64(data)) // previously missing case
} $else $if T is i64 {
e.encode_int(data)
} $else $if T is u8 {
e.encode_uint(u64(data))
} $else $if T is u16 {
e.encode_uint(u64(data))
} $else $if T is u32 {
e.encode_uint(u64(data))
} $else $if T is u64 {
e.encode_uint(data)
}
The // TODO comment acknowledging this fix should be removed at the same time.
Bug 2: Config — positive_int_unsigned Defaults to false
File: config.v
Reproduction Steps
import msgpack
fn main() {
// Value in the range 128-255 (fits in u8, not in i8)
result := msgpack.encode(200)
println(result.hex()) // prints: d100c8 (3 bytes: signed int16) once Bug 1 is fixed; before that, d2000000c8
// should print: ccc8 (2 bytes: unsigned uint8)
}
Current Behaviour
The default_config() function returns positive_int_unsigned: false. The encode_int function has a branch that checks this flag:
pub fn (mut e Encoder) encode_int(i i64) {
if e.config.positive_int_unsigned && i >= 0 {
e.encode_uint(u64(i)) // compact unsigned path — NOT TAKEN by default
} else if i > max_i8 { // 127
if i <= max_i16 { // 32767
e.encode_i16(i16(i)) // 200 ends up here: 3 bytes as signed int16
}
...
}
}
With the flag false, the non-negative value 200 is larger than max_i8 (127), so it takes the signed path and is encoded as a 3-byte signed int16 (d1 00 c8), even though it fits in a 2-byte unsigned uint8 (cc c8). Values from 0 to 127 are unaffected: they skip the i > max_i8 branch and reach the later fixint branch (i >= -32), which writes a single byte. This masks the bug for the most common values.
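The effect of the flag can be modelled in Python. This sketch mirrors the encode_int excerpt; the branches hidden behind `...` there (int32/int64 and the negative ranges) are assumptions filled in to make it runnable, and encode_uint is assumed to pick the smallest unsigned format.

```python
import struct

MAX_I8, MAX_I16, MAX_I32 = 127, 32767, 2**31 - 1


def encode_uint(u: int) -> bytes:
    # Assumed behaviour: smallest unsigned format (fixint, uint 8/16/32/64).
    if u <= 0x7F:
        return bytes([u])
    for marker, fmt, limit in ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF),
                               (0xCE, ">I", 0xFFFFFFFF)):
        if u <= limit:
            return bytes([marker]) + struct.pack(fmt, u)
    return b"\xcf" + struct.pack(">Q", u)


def encode_int(i: int, positive_int_unsigned: bool) -> bytes:
    if positive_int_unsigned and i >= 0:
        return encode_uint(i)                      # compact unsigned path
    if i > MAX_I8:                                 # 128 and above: signed family
        if i <= MAX_I16:
            return b"\xd1" + struct.pack(">h", i)
        if i <= MAX_I32:
            return b"\xd2" + struct.pack(">i", i)
        return b"\xd3" + struct.pack(">q", i)
    if i >= -32:                                   # -32..127: one fixint byte
        return bytes([i & 0xFF])
    if i >= -(2**7):
        return b"\xd0" + struct.pack(">b", i)
    if i >= -(2**15):
        return b"\xd1" + struct.pack(">h", i)
    if i >= -(2**31):
        return b"\xd2" + struct.pack(">i", i)
    return b"\xd3" + struct.pack(">q", i)


print(encode_int(200, positive_int_unsigned=False).hex())  # d100c8
print(encode_int(200, positive_int_unsigned=True).hex())   # ccc8
print(encode_int(42, positive_int_unsigned=False).hex())   # 2a
```

The 42 case shows why the bug goes unnoticed: the fixint branch produces the same single byte regardless of the flag.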
Expected Behaviour
Any non-negative integer should be encoded using the unsigned format family, which is always more compact than or equal in size to the signed family for non-negative values. The value 200 fits in uint 8 and should produce 2 bytes: cc c8.
Possible Solution
Enable positive_int_unsigned in the default config:
pub fn default_config() Config {
return Config{
write_ext: true
positive_int_unsigned: true // non-negative integers always use unsigned format
}
}
Bug 3: Decoder — Fixint Bytes Not Handled
File: decode.v
Reproduction Steps
import msgpack
fn main() {
// Encode with any compliant library, or just construct the bytes manually.
// Positive fixint for value 42 is a single byte: 0x2a
data := [u8(0x2a)]
mut val := 0
mut decoder := msgpack.new_decoder()
decoder.decode(data, mut val) or { println('error: ${err}') }
println(val)
// Prints: error (or panics) — should print 42
// Same problem for negative fixint: -5 is 0xfb
data2 := [u8(0xfb)]
mut val2 := 0
decoder.decode(data2, mut val2) or { println('error: ${err}') }
println(val2)
// Prints: error — should print -5
}
This also means that after Bug 1 is fixed, V cannot even decode its own output: msgpack.encode(0) produces [0x00], and decode_integer has no handler for 0x00.
Expected Behaviour
For a positive fixint byte (0x00–0x7f), the format byte itself IS the integer value — no additional bytes follow. For a negative fixint byte (0xe0–0xff), the format byte reinterpreted as a signed i8 IS the value. Both ranges must be handled.
decode([0x00]) → 0
decode([0x2a]) → 42
decode([0x7f]) → 127
decode([0xe0]) → -32
decode([0xfb]) → -5
decode([0xff]) → -1
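A Python sketch of this decoding rule, covering both fixint ranges plus the eight explicit integer formats from the table. decode_int is a hypothetical helper, not the library's API; it returns the value and the offset of the next byte, loosely mirroring how the V decoder advances d.pos.

```python
import struct

# Format byte -> (struct format, payload length) for the explicit integer formats.
INT_FORMATS = {
    0xCC: (">B", 1), 0xCD: (">H", 2), 0xCE: (">I", 4), 0xCF: (">Q", 8),
    0xD0: (">b", 1), 0xD1: (">h", 2), 0xD2: (">i", 4), 0xD3: (">q", 8),
}


def decode_int(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one MessagePack integer at data[pos]; return (value, next_pos)."""
    bd = data[pos]
    if bd <= 0x7F:                     # positive fixint: the byte is the value
        return bd, pos + 1
    if bd >= 0xE0:                     # negative fixint: byte read as signed int8
        return bd - 0x100, pos + 1
    if bd in INT_FORMATS:
        fmt, size = INT_FORMATS[bd]
        (value,) = struct.unpack_from(fmt, data, pos + 1)
        return value, pos + 1 + size
    raise ValueError(f"0x{bd:02x} is not an integer format byte")


for hex_str in ("00", "2a", "7f", "e0", "fb", "ff", "d085", "d200000000"):
    print(hex_str, decode_int(bytes.fromhex(hex_str))[0])
```

Because the fixint checks are added alongside the existing formats rather than replacing them, the same function still accepts the 5-byte output of the unfixed V encoder, so old and new data both decode.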
Current Behaviour
Both decode_integer and decode_to_json use a match on the format byte d.bd. Neither has arms for the positive fixint range (0x00–0x7f) or the negative fixint range (0xe0–0xff):
// Current (buggy) decode_integer
pub fn (mut d Decoder) decode_integer[T](mut val T) ! {
data := d.buffer
match d.bd {
// 0x00–0x7f: positive fixint — NO HANDLER, falls to else/error
// 0xe0–0xff: negative fixint — NO HANDLER, falls to else/error
mp_u8 { val = data[d.pos]; d.pos++ }
mp_u16 { ... }
// ...
}
}
// Current (buggy) decode_to_json
match d.bd {
// 0x00–0x7f: positive fixint — NO HANDLER
// 0xe0–0xff: negative fixint — NO HANDLER
mp_u8, mp_u16, mp_u32, mp_u64, mp_i8, mp_i16, mp_i32, mp_i64 {
...
}
}
Any integer value in the range -32 to 127, when encoded by a compliant library (including the fixed V encoder), produces a single-byte fixint. That byte is completely unrecognised by V's decoder.
Possible Solution
Add match arms for both fixint ranges in both decode_integer and decode_to_json:
// Fixed decode_integer
pub fn (mut d Decoder) decode_integer[T](mut val T) ! {
data := d.buffer
match d.bd {
mp_pos_fix_int_min...mp_pos_fix_int_max {
val = d.bd // format byte IS the value (0–127), no extra bytes
}
mp_neg_fix_int_min...mp_neg_fix_int_max {
val = i8(d.bd) // format byte reinterpreted as signed (-32 to -1)
}
mp_u8 { val = data[d.pos]; d.pos++ }
// ... rest unchanged
}
}
// Fixed decode_to_json
match d.bd {
mp_pos_fix_int_min...mp_pos_fix_int_max {
int_val := int(d.bd)
unsafe { result.push_many(int_val.str().str, int_val.str().len) }
}
mp_neg_fix_int_min...mp_neg_fix_int_max {
int_val := int(i8(d.bd))
unsafe { result.push_many(int_val.str().str, int_val.str().len) }
}
mp_u8, mp_u16, mp_u32, mp_u64, mp_i8, mp_i16, mp_i32, mp_i64 {
// ... unchanged
}
}
Test Expectations Must Be Updated
File: encode_test.v
The existing test assertions were written to match the buggy output. They must be corrected:
// BEFORE — asserting the buggy, non-spec-compliant output
assert msgpack.encode(0) == hex.decode('d200000000')! // int32: 5 bytes
assert msgpack.encode(42) == hex.decode('d20000002a')! // int32: 5 bytes
assert msgpack.encode(-123) == hex.decode('d2ffffff85')! // int32: 5 bytes
assert msgpack.encode([0]) == hex.decode('91d200000000')!
assert msgpack.encode([1, 2, 3]) == hex.decode('93d200000001d200000002d200000003')!
assert msgpack.encode(Struct{'John', 30}) == hex.decode('82a161a44a6f686ea162d20000001e')!
// AFTER — asserting correct, spec-compliant minimal encoding
assert msgpack.encode(0) == hex.decode('00')! // positive fixint: 1 byte
assert msgpack.encode(42) == hex.decode('2a')! // positive fixint: 1 byte
assert msgpack.encode(-123) == hex.decode('d085')! // int8: 2 bytes
assert msgpack.encode([0]) == hex.decode('9100')! // array + fixint
assert msgpack.encode([1, 2, 3]) == hex.decode('93010203')! // 3× fixint
assert msgpack.encode(Struct{'John', 30}) == hex.decode('82a161a44a6f686ea1621e')!
Complete Diff
config.v
pub fn default_config() Config {
return Config{
- write_ext: true
+ write_ext: true
+ positive_int_unsigned: true
}
}
encode.v
- // TODO: if int encode_int, if uint encode_uint
- // instead of needing to check each type, also
- // then we will be using the smallest storage
$else $if T is i8 {
- e.encode_i8(data)
+ e.encode_int(i64(data))
} $else $if T is i16 {
- e.encode_i16(data)
+ e.encode_int(i64(data))
} $else $if T is int {
- e.encode_i32(data)
+ e.encode_int(i64(data))
+ } $else $if T is i32 {
+ e.encode_int(i64(data))
} $else $if T is i64 {
- e.encode_i64(data)
+ e.encode_int(data)
} $else $if T is u8 {
- e.encode_u8(data)
+ e.encode_uint(u64(data))
} $else $if T is u16 {
- e.encode_u16(data)
+ e.encode_uint(u64(data))
} $else $if T is u32 {
- e.encode_u32(data)
+ e.encode_uint(u64(data))
} $else $if T is u64 {
- e.encode_u64(data)
+ e.encode_uint(data)
}
decode.v
+ mp_pos_fix_int_min...mp_pos_fix_int_max {
+ int_val := int(d.bd)
+ unsafe { result.push_many(int_val.str().str, int_val.str().len) }
+ }
+ mp_neg_fix_int_min...mp_neg_fix_int_max {
+ int_val := int(i8(d.bd))
+ unsafe { result.push_many(int_val.str().str, int_val.str().len) }
+ }
mp_u8, mp_u16, mp_u32, mp_u64, mp_i8, mp_i16, mp_i32, mp_i64 {
pub fn (mut d Decoder) decode_integer[T](mut val T) ! {
data := d.buffer
match d.bd {
+ mp_pos_fix_int_min...mp_pos_fix_int_max {
+ val = d.bd
+ }
+ mp_neg_fix_int_min...mp_neg_fix_int_max {
+ val = i8(d.bd)
+ }
mp_u8 {
encode_test.v
- assert msgpack.encode(0) == hex.decode('d200000000')!
- assert msgpack.encode(42) == hex.decode('d20000002a')!
- assert msgpack.encode(-123) == hex.decode('d2ffffff85')!
+ assert msgpack.encode(0) == hex.decode('00')!
+ assert msgpack.encode(42) == hex.decode('2a')!
+ assert msgpack.encode(-123) == hex.decode('d085')!
- assert msgpack.encode([0]) == hex.decode('91d200000000')!
+ assert msgpack.encode([0]) == hex.decode('9100')!
- assert msgpack.encode([1, 2, 3]) == hex.decode('93d200000001d200000002d200000003')! // REVIEW
+ assert msgpack.encode([1, 2, 3]) == hex.decode('93010203')!
- assert msgpack.encode(Struct{'John', 30}) == hex.decode('82a161a44a6f686ea162d20000001e')!
+ assert msgpack.encode(Struct{'John', 30}) == hex.decode('82a161a44a6f686ea1621e')!
Verification
After applying all fixes, all three existing test suites pass with no modifications beyond the corrected assertions in encode_test.v:
---- Testing... ----------------------------------------------------------------
OK [1/3] decode_test.v
OK [2/3] encode_test.v
OK [3/3] decode_to_json_test.v
--------------------------------------------------------------------------------
Summary: 3 passed, 3 total.
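The corrected array and struct expectations in encode_test.v can also be checked by assembling the bytes by hand. In this Python sketch, fixint and fixstr are hypothetical helpers; the struct's field names a and b are inferred from the a1 61 and a1 62 bytes in the expected hex, since the test's Struct definition is not shown.

```python
def fixint(n: int) -> bytes:
    if not 0 <= n <= 0x7F:
        raise ValueError("positive fixint holds 0..127")
    return bytes([n])


def fixstr(s: str) -> bytes:
    raw = s.encode("utf-8")
    if len(raw) > 31:
        raise ValueError("fixstr holds at most 31 bytes")
    return bytes([0xA0 | len(raw)]) + raw          # 101xxxxx + UTF-8 bytes


array_0 = bytes([0x90 | 1]) + fixint(0)            # fixarray of length 1
array_123 = bytes([0x90 | 3]) + fixint(1) + fixint(2) + fixint(3)
struct_john = (bytes([0x80 | 2])                   # fixmap with 2 entries
               + fixstr("a") + fixstr("John")
               + fixstr("b") + fixint(30))

print(array_0.hex())      # 9100
print(array_123.hex())    # 93010203
print(struct_john.hex())  # 82a161a44a6f686ea1621e
```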
Cross-Language Validation
The fixed V output matches Python's msgpack library byte-for-byte:
import msgpack
assert msgpack.packb(0) == b'\x00'
assert msgpack.packb(42) == b'\x2a'
assert msgpack.packb(-123) == b'\xd0\x85'
assert msgpack.packb([0]) == b'\x91\x00'
assert msgpack.packb([1,2,3]) == b'\x93\x01\x02\x03'
References
- MessagePack specification: https://github.com/msgpack/msgpack/blob/master/spec.md#int-format-family
- Original // TODO comment in encode.v: commit 55d09a0 (and prior)
- Python reference implementation: https://github.com/msgpack/msgpack-python
Additional Information/Context
No response
V version
0.5
Environment details (OS name and version, etc.)
CachyOS