Commit dacb93f

Remove irrelevant footer
1 parent 8c881df commit dacb93f

7 files changed

+138
-147
lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Generated file; diff not rendered.

SPEC.md

Lines changed: 18 additions & 29 deletions
@@ -748,45 +748,35 @@ The FST maps full paths (using `\x1F` unit separator between components, as desc

 ```
 +------------------+ Offset = trailer_end
-| Header           | 16 bytes
+| Header           | 24 bytes
 +------------------+
 | Node Index       | node_count × 8 bytes
 +------------------+
 | Hot Section      | Variable size (compact node headers)
 +------------------+
 | Cold Section     | Variable size (edge data)
-+------------------+
-| Footer           | 16 bytes
 +------------------+ EOF
 ```

 ### 14.3 Header

-The FST Header is 16 bytes and located at the start of the FST section.
+The FST Header is 24 bytes and located at the start of the FST section.

 | Offset | Size | Field | Description |
 |--------|------|-------|-------------|
 | 0x00 | 4 | `magic` | Magic bytes: `BFST` (0x42 0x46 0x53 0x54) |
 | 0x04 | 1 | `version` | Format version (currently 1) |
-| 0x05 | 3 | `reserved` | Reserved bytes (must be 0) |
-| 0x08 | 8 | `entry_count` | Number of paths indexed (`u64`) |
-
-Implementations MUST reject FST data where magic bytes do not match or version is unsupported.
-
-### 14.4 Footer
+| 0x05 | 1 | `flags` | Reserved flags (must be 0) |
+| 0x06 | 2 | `reserved` | Reserved bytes (must be 0) |
+| 0x08 | 4 | `node_count` | Total number of nodes (`u32`) |
+| 0x0C | 8 | `entry_count` | Number of paths indexed (`u64`) |
+| 0x14 | 4 | `cold_offset` | Cold section start offset (`u32`) |

-The FST Footer is 16 bytes and located at the end of the FST section (last 16 bytes of the file).
+The Hot Section starts at offset `24 + node_count × 8` (immediately after the Node Index). The root node is always node 0.

-| Offset | Size | Field | Description |
-|--------|------|-------|-------------|
-| 0x00 | 4 | `root_id` | Root node ID (`u32`, always 0) |
-| 0x04 | 4 | `node_count` | Total number of nodes (`u32`) |
-| 0x08 | 4 | `hot_offset` | Hot section start offset (`u32`) |
-| 0x0C | 4 | `cold_offset` | Cold section start offset (`u32`) |
-
-The `hot_offset` and `cold_offset` are absolute byte offsets from the start of the FST section.
+Implementations MUST reject FST data where magic bytes do not match or version is unsupported.

-### 14.5 Node Index
+### 14.4 Node Index

 The Node Index is an array of `node_count` entries, with each entry being 8 bytes.

@@ -799,7 +789,7 @@ The Node Index is an array of `node_count` entries, with each entry being 8 byte

 Offsets are relative to the start of their respective sections. Node IDs are indices into this array (node 0 = first entry, node 1 = second entry, etc.).

-### 14.6 Hot Section
+### 14.5 Hot Section

 The Hot Section contains compact node headers optimized for cache efficiency. Each node's hot data has the following structure:

@@ -830,7 +820,7 @@ The lookup data format depends on the `INDEXED` flag:

 Array of `edge_count` entries, each a `u16`. Each offset points to the corresponding edge's data within this node's cold section data block.

-### 14.7 Cold Section
+### 14.6 Cold Section

 The Cold Section contains edge data and final output values. Each node's cold data has the following structure:

@@ -858,7 +848,7 @@ If IS_FINAL flag is set:

 If the `IS_FINAL` flag is set, the final output value is stored at the end of the node's cold data block. This value is added to the accumulated output when the traversal terminates at this node.

-### 14.8 Path Encoding in FST
+### 14.7 Path Encoding in FST

 Paths stored in the FST use the same encoding as BoxPath (Section 9):

@@ -872,7 +862,7 @@ Paths stored in the FST use the same encoding as BoxPath (Section 9):
 - Duplicate keys are not permitted
 - Values are `u64` record indices (as returned by `RecordIndex.get()`)

-### 14.9 Output Value Accumulation
+### 14.8 Output Value Accumulation

 The FST stores output values along edges and at final nodes. The final lookup value is computed by:

@@ -882,7 +872,7 @@ The FST stores output values along edges and at final nodes. The final lookup va

 The result is the `RecordIndex` value for the path.

-### 14.10 Implementation Notes
+### 14.9 Implementation Notes

 **Adaptive Node Format:**

@@ -903,13 +893,12 @@ Implementations MAY use SIMD instructions for compact node lookups:
 - x86_64: SSE2 `_mm_cmpeq_epi8` for 16-byte parallel comparison
 - aarch64: NEON `vceqq_u8` for similar parallel comparison

-### 14.11 Constants
+### 14.10 Constants

 | Constant | Value | Description |
 |----------|-------|-------------|
 | Magic | `BFST` | FST section identifier |
-| Version | 3 | Current format version |
-| Header size | 16 bytes | Fixed header size |
-| Footer size | 16 bytes | Fixed footer size |
+| Version | 1 | Current format version |
+| Header size | 24 bytes | Fixed header size |
 | Index entry size | 8 bytes | Per-node index entry |
 | Indexed threshold | 17 | Minimum edges for indexed format |
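
As a reading aid for the revised 24-byte header table, the following is a minimal sketch of a parser for it. It is not the crate's `read_header`: the names `parse_header` and `HeaderError` are hypothetical, and little-endian byte order is an assumption, since the excerpt shown here does not state the integer encoding.

```rust
/// Fixed header size from the spec table (24 bytes).
const HEADER_SIZE: usize = 24;
/// Size of one Node Index entry (8 bytes).
const INDEX_ENTRY_SIZE: usize = 8;

#[derive(Debug, PartialEq)]
struct Header {
    node_count: u32,
    entry_count: u64,
    cold_offset: u32,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u8),
}

impl Header {
    /// Hot Section start: `24 + node_count × 8`, right after the Node Index.
    fn hot_offset(&self) -> usize {
        HEADER_SIZE + self.node_count as usize * INDEX_ENTRY_SIZE
    }
}

// ASSUMPTION: integers are little-endian; the diff does not say.
fn le_u32(b: &[u8], at: usize) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[at..at + 4]);
    u32::from_le_bytes(a)
}

fn le_u64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_le_bytes(a)
}

/// Parse the header, rejecting a bad magic or an unsupported version.
fn parse_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < HEADER_SIZE {
        return Err(HeaderError::TooShort);
    }
    if &bytes[0x00..0x04] != b"BFST" {
        return Err(HeaderError::BadMagic);
    }
    if bytes[0x04] != 1 {
        return Err(HeaderError::UnsupportedVersion(bytes[0x04]));
    }
    // `flags` (0x05) and `reserved` (0x06..0x08) are not inspected here.
    Ok(Header {
        node_count: le_u32(bytes, 0x08),
        entry_count: le_u64(bytes, 0x0C),
        cold_offset: le_u32(bytes, 0x14),
    })
}
```

Because the Hot Section offset is derived from `node_count`, the header no longer needs a stored `hot_offset` field, which is what lets the footer go away.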

crates/fst/src/builder.rs

Lines changed: 12 additions & 17 deletions
@@ -1,7 +1,7 @@
 use crate::error::BuildError;
 use crate::node::{
-    EdgeData, FOOTER_SIZE, Footer, HEADER_SIZE, INDEX_ENTRY_SIZE, INDEXED_THRESHOLD, NodeData,
-    NodeIndex, write_footer, write_header, write_node_cold, write_node_hot,
+    EdgeData, HEADER_SIZE, Header, INDEX_ENTRY_SIZE, INDEXED_THRESHOLD, NodeData, NodeIndex,
+    write_header, write_node_cold, write_node_hot,
 };
 use fastvlq::encode_vu64;

@@ -74,14 +74,13 @@ impl FstBuilder {

         // Step 1: Assign node IDs and collect NodeData (depth-first)
         let mut nodes: Vec<NodeData> = Vec::new();
-        let root_id = self.collect_nodes(&self.root, &mut nodes);
+        let _root_id = self.collect_nodes(&self.root, &mut nodes);

         let node_count = nodes.len() as u32;

         // Step 2: Calculate layout
-        let index_start = HEADER_SIZE;
         let index_size = node_count as usize * INDEX_ENTRY_SIZE;
-        let hot_start = index_start + index_size;
+        let hot_start = HEADER_SIZE + index_size;

         // Step 3: Write hot section, tracking offsets
         let mut hot_buf = Vec::new();
@@ -119,11 +118,16 @@ impl FstBuilder {
         }

         // Step 5: Assemble final buffer
-        let total_size = HEADER_SIZE + index_size + hot_buf.len() + cold_buf.len() + FOOTER_SIZE;
+        let total_size = HEADER_SIZE + index_size + hot_buf.len() + cold_buf.len();
         let mut buf = Vec::with_capacity(total_size);

-        // Header
-        write_header(self.len, &mut buf);
+        // Header (includes node_count and cold_offset)
+        let header = Header {
+            entry_count: self.len,
+            node_count,
+            cold_offset: cold_start as u32,
+        };
+        write_header(&header, &mut buf);

         // Index
         for i in 0..nodes.len() {
@@ -140,15 +144,6 @@ impl FstBuilder {
         // Cold section
         buf.extend_from_slice(&cold_buf);

-        // Footer
-        let footer = Footer {
-            root_node_id: root_id,
-            node_count,
-            hot_start: hot_start as u32,
-            cold_start: cold_start as u32,
-        };
-        write_footer(&footer, &mut buf);
-
         Ok(buf)
     }
 }
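
The builder change above amounts to a new buffer layout: header, index, hot section, cold section, with nothing after the cold data. The following is a minimal sketch of that assembly step in isolation. The `assemble` function is hypothetical, and this `write_header` is a stand-in for the crate's function of the same name. Three details are assumptions not visible in the diff: little-endian encoding, an index entry made of two `u32` offsets (hot, then cold), and `cold_start = hot_start + hot_buf.len()`.

```rust
const HEADER_SIZE: usize = 24;
const INDEX_ENTRY_SIZE: usize = 8;

struct Header {
    entry_count: u64,
    node_count: u32,
    cold_offset: u32,
}

/// Stand-in for the crate's `write_header`; field order follows the spec table.
fn write_header(h: &Header, buf: &mut Vec<u8>) {
    buf.extend_from_slice(b"BFST"); // magic
    buf.push(1); // version
    buf.push(0); // flags (must be 0)
    buf.extend_from_slice(&[0, 0]); // reserved (must be 0)
    buf.extend_from_slice(&h.node_count.to_le_bytes()); // ASSUMPTION: little-endian
    buf.extend_from_slice(&h.entry_count.to_le_bytes());
    buf.extend_from_slice(&h.cold_offset.to_le_bytes());
}

/// Hypothetical helper: lay out header, index, hot, and cold sections.
/// `index` holds per-node (hot_offset, cold_offset) pairs, each relative
/// to the start of its own section.
fn assemble(entry_count: u64, index: &[(u32, u32)], hot_buf: &[u8], cold_buf: &[u8]) -> Vec<u8> {
    let node_count = index.len() as u32;
    let index_size = index.len() * INDEX_ENTRY_SIZE;
    let hot_start = HEADER_SIZE + index_size;
    // ASSUMPTION: the cold section directly follows the hot section.
    let cold_start = hot_start + hot_buf.len();

    let mut buf = Vec::with_capacity(cold_start + cold_buf.len());
    let header = Header {
        entry_count,
        node_count,
        cold_offset: cold_start as u32,
    };
    write_header(&header, &mut buf);
    for &(hot, cold) in index {
        buf.extend_from_slice(&hot.to_le_bytes());
        buf.extend_from_slice(&cold.to_le_bytes());
    }
    buf.extend_from_slice(hot_buf);
    buf.extend_from_slice(cold_buf);
    buf
}
```

With two nodes, five hot bytes, and four cold bytes, the output is 24 + 16 + 5 + 4 = 49 bytes and the stored `cold_offset` is 45.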

crates/fst/src/fst.rs

Lines changed: 25 additions & 28 deletions
@@ -1,8 +1,5 @@
 use crate::error::FstError;
-use crate::node::{
-    FOOTER_SIZE, Footer, HEADER_SIZE, INDEX_ENTRY_SIZE, NodeIndex, NodeRef, read_footer,
-    read_header,
-};
+use crate::node::{HEADER_SIZE, Header, INDEX_ENTRY_SIZE, NodeIndex, NodeRef, read_header};

 /// Prefetch memory for read.
 #[inline(always)]
@@ -105,23 +102,21 @@ struct EachState {
 /// Generic over `D: AsRef<[u8]>` to support zero-copy from memory-mapped data.
 pub struct Fst<D> {
     data: D,
-    footer: Footer,
-    len: u64,
+    header: Header,
 }

 impl<D: AsRef<[u8]>> Fst<D> {
     /// Open an FST from bytes.
     pub fn new(data: D) -> Result<Self, FstError> {
         let bytes = data.as_ref();

-        if bytes.len() < HEADER_SIZE + FOOTER_SIZE {
+        if bytes.len() < HEADER_SIZE {
             return Err(FstError::TooShort);
         }

-        let len = read_header(bytes)?;
-        let footer = read_footer(bytes).ok_or(FstError::TooShort)?;
+        let header = read_header(bytes)?;

-        Ok(Self { data, footer, len })
+        Ok(Self { data, header })
     }

     /// Get node index entry by node ID.
@@ -138,8 +133,8 @@ impl<D: AsRef<[u8]>> Fst<D> {
         let bytes = self.data.as_ref();
         let idx = self.get_node_index(node_id);

-        let hot_start = self.footer.hot_start as usize + idx.hot_offset as usize;
-        let cold_start = self.footer.cold_start as usize + idx.cold_offset as usize;
+        let hot_start = self.header.hot_offset() + idx.hot_offset as usize;
+        let cold_start = self.header.cold_offset as usize + idx.cold_offset as usize;

         let hot_data = &bytes[hot_start..];
         let cold_data = &bytes[cold_start..];
@@ -150,7 +145,7 @@ impl<D: AsRef<[u8]>> Fst<D> {
     /// Get the value for an exact key match.
     pub fn get(&self, key: &[u8]) -> Option<u64> {
         let bytes = self.data.as_ref();
-        let mut current_node_id = self.footer.root_node_id;
+        let mut current_node_id = 0u32; // Root is always node 0
         let mut remaining = key;
         let mut output_sum: u64 = 0;

@@ -165,7 +160,7 @@ impl<D: AsRef<[u8]>> Fst<D> {
             prefetch_read(
                 bytes
                     .as_ptr()
-                    .wrapping_add(self.footer.hot_start as usize + next_idx.hot_offset as usize),
+                    .wrapping_add(self.header.hot_offset() + next_idx.hot_offset as usize),
             );

             // Check if edge label matches
@@ -198,12 +193,12 @@ impl<D: AsRef<[u8]>> Fst<D> {

     /// Number of entries in the FST.
     pub fn len(&self) -> u64 {
-        self.len
+        self.header.entry_count
     }

     /// Check if the FST is empty.
     pub fn is_empty(&self) -> bool {
-        self.len == 0
+        self.header.entry_count == 0
     }

     /// Iterate all entries with a given prefix.
@@ -269,9 +264,9 @@ impl<D: AsRef<[u8]>> Fst<D> {
                 // Prefetch child node
                 let child_idx = self.get_node_index(child_node_id);
                 prefetch_read(
-                    bytes.as_ptr().wrapping_add(
-                        self.footer.hot_start as usize + child_idx.hot_offset as usize,
-                    ),
+                    bytes
+                        .as_ptr()
+                        .wrapping_add(self.header.hot_offset() + child_idx.hot_offset as usize),
                 );

                 key_buffer.extend_from_slice(edge.label);
@@ -296,7 +291,7 @@ impl<D: AsRef<[u8]>> Fst<D> {
     /// Navigate to prefix node.
     fn navigate_to_prefix(&self, prefix: &[u8], key_buffer: &mut Vec<u8>) -> Option<(u32, u64)> {
         let bytes = self.data.as_ref();
-        let mut current_node_id = self.footer.root_node_id;
+        let mut current_node_id = 0u32; // Root is always node 0
         let mut remaining = prefix;
         let mut output_sum: u64 = 0;

@@ -324,7 +319,7 @@ impl<D: AsRef<[u8]>> Fst<D> {
             prefetch_read(
                 bytes
                     .as_ptr()
-                    .wrapping_add(self.footer.hot_start as usize + next_idx.hot_offset as usize),
+                    .wrapping_add(self.header.hot_offset() + next_idx.hot_offset as usize),
             );

             if match_len >= remaining.len() {
@@ -387,7 +382,7 @@ impl<'a, D: AsRef<[u8]>> PrefixIter<'a, D> {

     fn navigate_to_prefix(&mut self, prefix: &[u8]) -> Option<(u32, u64)> {
         let bytes = self.fst.data.as_ref();
-        let mut current_node_id = self.fst.footer.root_node_id;
+        let mut current_node_id = 0u32; // Root is always node 0
         let mut remaining = prefix;
         let mut output_sum: u64 = 0;

@@ -413,9 +408,9 @@ impl<'a, D: AsRef<[u8]>> PrefixIter<'a, D> {
             // Prefetch next node
             let next_idx = self.fst.get_node_index(edge.target_node_id);
             prefetch_read(
-                bytes.as_ptr().wrapping_add(
-                    self.fst.footer.hot_start as usize + next_idx.hot_offset as usize,
-                ),
+                bytes
+                    .as_ptr()
+                    .wrapping_add(self.fst.header.hot_offset() + next_idx.hot_offset as usize),
             );

             if match_len >= remaining.len() {
@@ -467,9 +462,11 @@ impl<D: AsRef<[u8]>> Iterator for PrefixIter<'_, D> {

                 // Prefetch child node
                 let child_idx = self.fst.get_node_index(child_node_id);
-                prefetch_read(bytes.as_ptr().wrapping_add(
-                    self.fst.footer.hot_start as usize + child_idx.hot_offset as usize,
-                ));
+                prefetch_read(
+                    bytes
+                        .as_ptr()
+                        .wrapping_add(self.fst.header.hot_offset() + child_idx.hot_offset as usize),
+                );

                 self.key_buffer.extend_from_slice(edge.label);
                 let child_key_len = self.key_buffer.len();
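
On the reader side, the diff replaces the stored `footer.hot_start` with a computed `header.hot_offset()` and keeps `cold_offset` as a stored field. The only length check visible in `Fst::new` is `bytes.len() < HEADER_SIZE`; whether `read_header` validates more is not shown. The sketch below illustrates how a node's absolute positions are derived and how a reader could bounds-check them before slicing. `locate_node` is a hypothetical helper, not part of the crate, and it assumes the cold section follows the hot section.

```rust
const HEADER_SIZE: usize = 24;
const INDEX_ENTRY_SIZE: usize = 8;

struct Header {
    node_count: u32,
    cold_offset: u32,
}

impl Header {
    /// Hot Section start, derived rather than stored: `24 + node_count × 8`.
    fn hot_offset(&self) -> usize {
        HEADER_SIZE + self.node_count as usize * INDEX_ENTRY_SIZE
    }
}

/// Per-node offsets, each relative to the start of its own section.
struct NodeIndex {
    hot_offset: u32,
    cold_offset: u32,
}

/// Hypothetical helper: absolute (hot, cold) start positions for a node,
/// or `None` if either would fall outside its section or the buffer.
/// ASSUMPTION: the cold section directly follows the hot section.
fn locate_node(data_len: usize, header: &Header, idx: &NodeIndex) -> Option<(usize, usize)> {
    let cold_base = header.cold_offset as usize;
    let hot = header.hot_offset().checked_add(idx.hot_offset as usize)?;
    let cold = cold_base.checked_add(idx.cold_offset as usize)?;
    if hot > cold_base || cold > data_len {
        return None;
    }
    Some((hot, cold))
}
```

A check of this kind would turn a corrupt offset into a recoverable `None` rather than a slice-index panic at `&bytes[hot_start..]`.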
