@alanshaw commented Jun 13, 2025

Proposes using a separate, non-size-bounded cache when claims are published.

📚 Preview

@alanshaw alanshaw requested a review from a team June 13, 2025 13:01

@hannahhoward left a comment


Reviewing this and #59 together (#57 is more of a no-brainer).

I'm open to both of these suggestions, but I feel there are issues we need to understand better about the performance problems. I think charging ahead without this understanding is putting the cart before the horse, and may end up simply hiding underlying perf issues or causing them to show up in different, unexpected ways.

So, for example:
The indexer cache being full at 18,000 keys -- something is weird here. Each cache in prod is 13GB. That would make for an average key size of ~720KB -- that seems off. Why is that?
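
For reference, the average-entry arithmetic can be checked directly (the 13GB and 18,000-key figures are the ones quoted above; decimal gigabytes are assumed):

```go
package main

import "fmt"

func main() {
	const cacheBytes = 13_000_000_000 // 13GB per prod cache (decimal GB, assumed)
	const keys = 18_000               // key count at which the cache reported full
	avg := cacheBytes / keys          // integer average bytes per cached entry
	fmt.Printf("average entry: %d bytes (~%d KB)\n", avg, avg/1000)
}
```

That works out to roughly 722KB per entry, which is indeed surprisingly large for index entries.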

Why is a localized IPNI in the same region with almost no latency falling way behind on indexing? Why would multiple chains solve this problem? What's our volume of uploads?

I would like to dig a little deeper with tracing on the indexer and our IPNI.

Also, even with these changes it would be helpful to do a staged rollout and look at the different perf issues. My suggestion, if it's not terrible, would be to go forward with the original suggestion: publish just the location claims to IPNI from the storage node for now, while writing the other claims (index/equals) to content claims. We can then set up a percentage feature flag to gradually move these over to the indexer, while using tracing to dig into what is happening in each case.

Anyway, I think this RFC (#58) is ultimately the right thing, so I'm OK to implement it, but I would like to pair it with:

  1. tracing in IPNI
  2. tracing on the indexer write side
  3. #57
  4. deploying the upload service to prod with index/equals still publishing to content claims
  5. setting up a ramp-up parameter to slowly move over to using the indexer service
  6. digging into what's happening with traces and making further optimizations until we get to 100 percent.
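
The ramp-up parameter in steps 5–6 could look something like a deterministic hash-based flag (a minimal sketch; the function and parameter names here are illustrative, not part of any actual service API):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// publishToIndexer decides, per claim, whether to publish to the new
// indexer service or fall back to content claims, based on a rollout
// percentage in [0, 100]. Hashing the claim key keeps the decision
// stable across retries, so a given claim always takes the same path.
func publishToIndexer(claimKey string, rolloutPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(claimKey))
	return h.Sum32()%100 < rolloutPercent
}

func main() {
	// At 0% nothing routes to the indexer; at 100% everything does.
	fmt.Println(publishToIndexer("example-claim-key", 0))
	fmt.Println(publishToIndexer("example-claim-key", 100))
}
```

Bumping the percentage while watching the traces from steps 1–2 would isolate where the write path falls behind.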
