Skip to content

Verify heterogeneous keys hash to the same value in set/map#6950

Open
dwblaikie wants to merge 1 commit intocarbon-language:trunkfrom
dwblaikie:map_hash_matching
Open

Verify heterogeneous keys hash to the same value in set/map#6950
dwblaikie wants to merge 1 commit intocarbon-language:trunkfrom
dwblaikie:map_hash_matching

Conversation

@dwblaikie
Copy link
Contributor

This doesn't feel like the nicest way to do it (adding the check in 2
different places per lookup function) but since it has to wait until
after the insertion callback has constructed the key, with the current
structure I don't see a great alternative.

The return operation could be restructured into a function that DCHECKs
and then constructs the key value result type, and some of this code
could be commoned into an intermediate base (between BaseImpl and
MapBase/SetBase)?

This only tests one of the 6 checks introduced in this patch, because
death tests are especially expensive (partly innate, and partly due to
the cost of symbolizing failures - if we figure out how to disable the
symbolizer in death tests (llvm has this problem too/could benefit from
a solution) then maybe we could more comprehensively test these paths)

(the test could be made more explicit by using custom types that
intentionally hash differently - but that seems more verbose/perhaps
harder to follow that relying on the pointer conversion that was where
the problem was initially observed)

Discovered while working on #6940

This doesn't feel like the nicest way to do it (adding the check in 2
different places per lookup function) but since it has to wait until
after the insertion callback has constructed the key, with the current
structure I don't see a great alternative.

The return operation could be restructured into a function that DCHECKs
and then constructs the key value result type, and some of this code
could be commoned into an intermediate base (between BaseImpl and
MapBase/SetBase)?

This only tests one of the 6 checks introduced in this patch, because
death tests are especially expensive (partly innate, and partly due to
the cost of symbolizing failures - if we figure out how to disable the
symbolizer in death tests (llvm has this problem too/could benefit from
a solution) then maybe we could more comprehensively test these paths)

(the test could be made more explicit by using custom types that
intentionally hash differently - but that seems more verbose/perhaps
harder to follow that relying on the pointer conversion that was where
the problem was initially observed)

Discovered while working on carbon-language#6940
@dwblaikie dwblaikie requested a review from a team as a code owner March 23, 2026 06:46
@dwblaikie dwblaikie requested review from geoffromer and removed request for a team March 23, 2026 06:46
@danakj danakj changed the title Verify heterogeneous keys has to the same value in set/map Verify heterogeneous keys hash to the same value in set/map Mar 23, 2026
static_cast<void*>(base_pointer));

Map<Base2*, int> m;
ASSERT_DEATH(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will fail in opt builds. I'm not sure we need this test anyway; we generally don't test the behavior of violating an invariant, and I'm not sure this invariant is so critical that it needs to be an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair point indeed.

(I'm a bit more on the fence when it comes to invariant violation checking in core/reusable components where its ability to not be misused is a feature, etc - compared to more opportunistic assertions/checks we might litter throughout the codebase - but happy to go with your leaning here)

CARBON_DCHECK(entry, "Should always result in a valid index.");

if (LLVM_LIKELY(!inserted)) {
CARBON_DCHECK(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the performance impact of this. I know it's disabled in opt mode, but inefficiency in fastbuild is still a problem if it slows down the edit-compile-test loop. I think we can and should solve this statically rather than dynamically.

I think the derived/base pointer issue shows that heterogeneous pointer == is "not suitable for hash tables", so the prescribed fix is to add a CarbonHashtableEq overload for heterogeneous pointer types, which presumably casts to void* before performing the comparison.

However, I think we chose the wrong default here: we shouldn't be supporting heterogeneous lookup between arbitrary types just because they're equality-comparable; instead, we should only support it for pairs of types where we affirmatively know that equal values will have equal hashes. In other words, we should get rid of the generic heterogeneous overload of CarbonHashtableEq, and replace it with a generic homogeneous overload, plus heterogeneous overload pairs for whatever types we need to support heterogeneous lookup on (I think that's just StringRef/std::string and ArrayRef/MutableArrayRef).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair points all. I'll look into that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants