
Conversation

@MilkBlock
Contributor

Optimize DashMap<Value, IndexSet<Value>> by allowing lazy insertion on read

@MilkBlock MilkBlock requested a review from a team as a code owner October 14, 2025 01:53
@MilkBlock MilkBlock requested review from oflatt and removed request for a team October 14, 2025 01:53
@codspeed-hq

codspeed-hq bot commented Oct 14, 2025

CodSpeed Performance Report

Merging #708 will improve performance by ×2.9

Comparing MilkBlock:im_main (8d6167d) with main (5678c6c)

Summary

⚡ 2 improvements
✅ 18 untouched
⏩ 190 skipped1

Benchmarks breakdown

| Mode       | Benchmark                  | BASE       | HEAD     | Change  |
|------------|----------------------------|------------|----------|---------|
| WallTime   | tests[repro-665-set-union] | 1,031.8 ms | 358.1 ms | ×2.9    |
| Simulation | tests[repro-665-set-union] | 781.7 ms   | 541 ms   | +44.49% |

Footnotes

  1. 190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@MilkBlock
Contributor Author

@saulshanabrook

@saulshanabrook
Member

Thanks for this change! Looks like it does improve performance significantly. @ezrosent said he can take a look, he is more familiar with this code.

@MilkBlock
Contributor Author

MilkBlock commented Oct 14, 2025

I don't quite understand what happened to [test]map. I simply refuse lazy insertion when the number of keys is too small.

https://codspeed.io/egraphs-good/egglog/runs/compare/68edafb692213580f5f86323..68edc1970851b28d201a68e7

@MilkBlock
Contributor Author

No idea why string_quotes got slower. It's not even relevant to containers.

@saulshanabrook
Member

I am going to go back to hiding all the fast running benchmarks. They generally have had too much uncertainty to be that helpful to us in the past.

/// Flushes all pending lazy insertions to the underlying map.
fn flush_pending_operations_for_key(&self, key: &Value) {
let mut pending_ops = self.pending_operations.lock().unwrap();
if !pending_ops.is_empty() {
Collaborator

Is this if statement redundant?

Collaborator

Currently, pending_operations never shrinks, even when the actual val_index is small.

We can maybe do a pending_ops.retain(|(keys, op)| !keys.is_empty()) when we find there are too many empty sets during the enumeration
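The retain-based pruning suggested above can be sketched with plain std types. This is a minimal, hypothetical sketch: `u64` stands in for `Value`, and the `Vec<(HashSet, ..)>` loosely mirrors the PR's `pending_operations` field.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a pending-operation entry: the value to apply,
// paired with the set of keys still awaiting it (`u64` stands in for `Value`).
pub enum InsertOrRemove {
    Insert(u64),
    Remove(u64),
}

// Drop entries whose key set was fully drained by per-key flushes, so the
// pending list shrinks instead of growing without bound.
pub fn prune_pending(pending_ops: &mut Vec<(HashSet<u64>, InsertOrRemove)>) {
    pending_ops.retain(|(keys, _)| !keys.is_empty());
}
```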

Contributor Author

Plausible, added

index.swap_remove(&old_val);
index.insert(result);
}
self.val_index
Collaborator

swap_remove is always used together with an insert, so maybe we can have an update_for_all_keys that takes both the old value and the new value. This way you don't need to materialize the container twice.
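A combined update along these lines might look like the following sketch. It uses std types only: `HashMap`/`HashSet` and `u64` stand in for the PR's DashMap-backed index, `IndexSet`, and `Value`; the function name comes from the suggestion above, not from the codebase.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical combined update: replace `old_val` with `new_val` in every
// key's index set in one pass, so the caller materializes the key collection
// once instead of once for the remove and once for the insert.
pub fn update_for_all_keys(
    index: &mut HashMap<u64, HashSet<u64>>,
    keys: &[u64],
    old_val: u64,
    new_val: u64,
) {
    for key in keys {
        if let Some(set) = index.get_mut(key) {
            set.remove(&old_val); // the IndexSet version would use swap_remove
            set.insert(new_val);
        }
    }
}
```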

Collaborator

As another optimization: currently we always materialize the container (container.iter().collect()), but when the container has fewer entries than LAZY_BOUND, the materialized container is immediately discarded. Can we make insert/remove_for_all_keys take an iterator instead of an IndexSet?

Contributor Author

@MilkBlock MilkBlock Oct 23, 2025

There are two ways to implement this:

pub trait ContainerValue: Hash + Eq + Clone + Send + Sync + 'static {
    /// Rebuild an additional container in place according to the given [`Rebuilder`].
    ///
    /// If this method returns `false` then the container must not have been modified (i.e. it must
    /// hash to the same value, and compare equal to a copy of itself before the call).
    fn rebuild_contents(&mut self, rebuilder: &dyn Rebuilder) -> bool;

    /// Iterate over the contents of the container.
    ///
    /// Note that containers can be more structured than just a sequence of values. This iterator
    /// is used to populate an index that in turn is used to speed up rebuilds. If a value in the
    /// container is eligible for a rebuild and it is not mentioned by this iterator, the outer
    /// [`Containers`] registry may skip rebuilding this container.
    fn iter(&self) -> impl Iterator<Item = Value> + '_;
}
  1. Trait revision: add a fn len() to this trait.
  2. Make the iterator an exact-size iterator.

Maybe we could consider this in the next PR; it may require many changes.

Collaborator

Can we just change the interface of iter to return an ExactSizeIterator?
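For reference, `ExactSizeIterator` (the standard-library trait presumably meant here) can be returned directly from `iter` via return-position `impl Trait` in traits (stable since Rust 1.75). A minimal sketch with a simplified trait; `u64` stands in for `Value`, and `VecContainer` is a hypothetical example type:

```rust
// Simplified stand-in for the PR's ContainerValue trait: `iter` now promises
// an exact size, so callers can compare the element count against LAZY_BOUND
// without collecting into an IndexSet first.
pub trait ContainerValue {
    fn iter(&self) -> impl ExactSizeIterator<Item = u64> + '_;
}

pub struct VecContainer(pub Vec<u64>);

impl ContainerValue for VecContainer {
    fn iter(&self) -> impl ExactSizeIterator<Item = u64> + '_ {
        // Slice iterators already know their length, so this is free.
        self.0.iter().copied()
    }
}
```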

// keys and the value to insert
// If the user wants to insert the same value for all keys in an IndexSet<Value>, LazyMap puts the
// operation in pending_operations, then performs the insertion for a single key (and removes that
// key from the pending entry) when the user reads the LazyMap.
pending_operations: Arc<Mutex<Vec<(IndexSet<Value>, InsertOrRemove)>>>,
Collaborator

nit: this does not need to be an IndexSet; a HashSet should be good enough?

val_index: LazyMapOfIndexSet,
}
#[derive(Clone)]
struct LazyMapOfIndexSet {
Collaborator

nit: consider renaming to just LazyValIndex or LazyContainerIndex?


const LAZY_BOUND: usize = 30;
use dashmap::mapref::one::{Ref, RefMut};
#[allow(dead_code)]
Collaborator

I think we should be able to remove the dead code? Unused functions can be re-implemented later easily if we want to.

@yihozhang
Collaborator

Also, are you sure this PR makes #665 faster?

➜  egglog git:(main) git checkout im_main
branch 'im_main' set up to track 'milkblock/im_main'.
Switched to a new branch 'im_main'
➜  egglog git:(im_main) cargo build --release
   Compiling egglog v1.0.0 (/home/yz489/egglog)
   Compiling egglog-core-relations v1.0.0 (/home/yz489/egglog/core-relations)
   Compiling egglog-bridge v1.0.0 (/home/yz489/egglog/egglog-bridge)
    Finished `release` profile [optimized] target(s) in 37.08s
➜  egglog git:(im_main) time target/release/egglog tests/repro-665-set-union.egg
target/release/egglog tests/repro-665-set-union.egg  0.53s user 0.02s system 99% cpu 0.554 total
➜  egglog git:(im_main) gco main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
➜  egglog git:(main) cargo build --release
   Compiling egglog v1.0.0 (/home/yz489/egglog)
   Compiling egglog-numeric-id v1.0.0 (/home/yz489/egglog/numeric-id)
   Compiling egraph-serialize v0.3.0
   Compiling egglog-union-find v1.0.0 (/home/yz489/egglog/union-find)
   Compiling egglog-core-relations v1.0.0 (/home/yz489/egglog/core-relations)
   Compiling egglog-bridge v1.0.0 (/home/yz489/egglog/egglog-bridge)
    Finished `release` profile [optimized] target(s) in 38.57s
➜  egglog git:(main) time target/release/egglog tests/repro-665-set-union.egg
target/release/egglog tests/repro-665-set-union.egg  0.37s user 0.02s system 99% cpu 0.390 total

If you look at the flamegraph on codspeed, most of the time is spent in dropping the E-graph. But in CLI mode the E-graph is passed to mem::forget, so that cost is avoided.

@MilkBlock
Contributor Author

MilkBlock commented Oct 23, 2025

➜  egglog git:(im_main) ✗ gco im_main                                                 
Already on 'im_main'
Your branch is up to date with 'myfork/im_main'.
➜  egglog git:(im_main) ✗ cargo build --release                                       
    Finished `release` profile [optimized] target(s) in 0.14s
➜  egglog git:(im_main) ✗ time ./target/release/egglog ./tests/repro-665-set-union.egg
./target/release/egglog ./tests/repro-665-set-union.egg  0.07s user 0.01s system 85% cpu 0.090 total
➜  egglog git:(im_main) ✗ gco main                                                    
Switched to branch 'main'
Your branch is up to date with 'myfork/main'.
➜  egglog git:(main) ✗ cargo build --release                                       
   Compiling egglog-core-relations v1.0.0 (/Users/mineralsteins/Repos/stable/egglog/core-relations)
   Compiling egglog v1.0.0 (/Users/mineralsteins/Repos/stable/egglog)
   Compiling egglog-bridge v1.0.0 (/Users/mineralsteins/Repos/stable/egglog/egglog-bridge)
    Finished `release` profile [optimized] target(s) in 22.64s
➜  egglog git:(main) ✗ time ./target/release/egglog ./tests/repro-665-set-union.egg
./target/release/egglog ./tests/repro-665-set-union.egg  0.12s user 0.01s system 92% cpu 0.149 total

Hmm, the results seem different on my machine. I'll take a look.

main branch: d3c80a4 (HEAD -> main, myfork/main) container_to_value in top level EGraph and comment revision
im_main branch: d10d939 (HEAD -> im_main, myfork/im_main) nit

[flamegraph screenshot] Also, in this graph you can see that the performance is improved in run_schedule, not in the E-graph struct's memory drop.

@MilkBlock
Contributor Author

MilkBlock commented Oct 23, 2025

The graph you saw might be a function-name display bug on codspeed. You can see the drop_in_place function taking 80% of the time, which is hard to believe.

I think the performance also depends on the number of cores and this is my hardware.

  Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: Mac15,9
      Model Number: MUW63CH/A
      Chip: Apple M3 Max
      Total Number of Cores: 16 (12 performance and 4 efficiency)
      Memory: 48 GB

I don't quite understand the performance regression on your machine; maybe you could profile it with perf?

@saulshanabrook
Member

But in CLI mode the E-graph is passed to mem::forget, so that cost is avoided.

I opened #718 to track forgetting it in the benchmark as well, to get more comparable performance.

@saulshanabrook
Member

The graph you saw might be a function-name display bug on codspeed. You can see the drop_in_place function taking 80% of the time, which is hard to believe.

I opened a support request for this in the codspeed discord.

@yihozhang
Collaborator

I can confirm similar speedups after rebasing from main. I think it's because of #709.


/// Lazily removes a value for all keys in the given index set.
pub fn remove_for_all_keys(&self, keys: HashSet<Value>, value: Value) {
if keys.len() < LAZY_BOUND {
Collaborator

@yihozhang yihozhang Oct 23, 2025

Need to flush all the updates before eagerly removing?

Update: I think you also need to do flush_pending_operations_for_key for eager insertion, not just eager removal.
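The ordering hazard flagged here can be shown with a single-threaded sketch (hypothetical names and std types; `u64` stands in for `Value`): if an eager remove runs before an earlier lazy insert of the same value is flushed, a later flush would resurrect the removed value, so the eager path must flush first.

```rust
use std::collections::{HashMap, HashSet};

// Minimal single-threaded model of the lazy index; names loosely mirror the PR.
pub struct LazyIndex {
    pub map: HashMap<u64, HashSet<u64>>,
    pub pending_inserts: Vec<(HashSet<u64>, u64)>,
}

impl LazyIndex {
    // Apply (and drain) every pending insert that mentions `key`.
    pub fn flush_for_key(&mut self, key: u64) {
        for (keys, val) in &mut self.pending_inserts {
            if keys.remove(&key) {
                self.map.entry(key).or_default().insert(*val);
            }
        }
    }

    // Eager path: flush first, so no pending insert of `value` can land
    // after (and thereby undo) this remove.
    pub fn eager_remove(&mut self, key: u64, value: u64) {
        self.flush_for_key(key);
        if let Some(set) = self.map.get_mut(&key) {
            set.remove(&value);
        }
    }
}
```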

// keys and the value to insert
// If the user wants to insert the same value for all keys in a HashSet<Value>, LazyMap puts the
// operation in pending_operations, then performs the insertion for a single key (and removes that
// key from the pending entry) when the user reads the LazyMap.
pending_operations: Arc<Mutex<Vec<(HashSet<Value>, InsertOrRemove)>>>,
Collaborator

This will be very slow when many threads are contending to insert to the index

Collaborator

Eli suggested crossbeam_queue for concurrent access.
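The suggestion above is crossbeam_queue (e.g. its SegQueue); since that is an external crate, this std-only sketch uses an mpsc channel to illustrate the same shape: writer threads enqueue pending operations without contending on the map-wide Mutex, and one reader drains the queue at flush time. All names here are illustrative, and `u64` stands in for `Value`.

```rust
use std::sync::mpsc;
use std::thread;

// Writers enqueue without a shared lock; the receiver drains at "flush" time.
pub fn concurrent_enqueue(n_threads: u64, per_thread: u64) -> usize {
    let (tx, rx) = mpsc::channel::<u64>();
    let handles: Vec<_> = (0..n_threads)
        .map(|t| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_thread {
                    // Sends never block the other writers on a shared Mutex.
                    tx.send(t * per_thread + i).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // close the channel once all writer clones are gone
    for h in handles {
        h.join().unwrap();
    }
    rx.into_iter().count() // drain, as a flush would
}
```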

@saulshanabrook
Member

From codspeed on Discord, FYI:

Hey, indeed there is an ongoing issue on our side with walltime flamegraphs. We are currently fixing. I will let you know when it is fixed!

@MilkBlock MilkBlock marked this pull request as draft November 7, 2025 01:32
