@alanshaw commented Jun 13, 2025

Proposes using a separate, non-size-bounded cache when claims are published.

📚 Preview

@alanshaw alanshaw requested a review from a team June 13, 2025 13:01

@hannahhoward left a comment


Reviewing this and #59 together (#57 is more of a no-brainer).

I'm open to both of these suggestions, but I feel there are issues we need to understand better about the performance problems. I think charging ahead without this understanding is putting the cart before the horse, and may end up simply hiding underlying perf issues or causing them to show up in different, unexpected ways.

So, for example:
The indexer cache being full at 18,000 keys -- something is weird here. Each cache in prod is 13GB. That would make for an average key size of ~720KB -- that seems off. Why is that?
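
For reference, the average-entry arithmetic can be checked directly (the 13GB and 18,000-key figures are the ones quoted above; decimal gigabytes are assumed):

```go
package main

import "fmt"

func main() {
	const cacheBytes = 13_000_000_000 // 13GB per prod cache (decimal GB, assumed)
	const keys = 18_000               // key count at which the cache reported full
	avg := cacheBytes / keys          // integer average bytes per cached entry
	fmt.Printf("average entry: %d bytes (~%d KB)\n", avg, avg/1000)
}
```

That works out to roughly 722KB per entry, which is indeed surprisingly large for index entries.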

Why is a localized IPNI in the same region with almost no latency falling way behind on indexing? Why would multiple chains solve this problem? What's our volume of uploads?

I would like to dig a little deeper with tracing on the indexer and our IPNI.

Also, even with these changes it would be helpful to do a staged rollout and look at the different perf issues. My suggestion, if it's not terrible, would be to go forward with the original suggestion: publish just the location claims to IPNI from the storage node for now, while writing the other claims (index/equals) to content claims. We can then set up a percentage feature flag to gradually move these over to the indexer, while using tracing to dig into what is happening in each case.

Anyway, I think this RFC (#58) is ultimately the right thing, so I'm OK to implement it, but I would like to pair it with:

  1. tracing in IPNI
  2. tracing on the indexer write side
  3. #57
  4. deploying the upload service to prod with index/equals still publishing to content claims
  5. setting up a ramp-up parameter to slowly move over to using the indexer service
  6. digging into what's happening with traces and making further optimizations until we get to 100 percent.
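
The ramp-up parameter in steps 5–6 could look something like a deterministic hash-based flag (a minimal sketch; the function and parameter names here are illustrative, not part of any actual service API):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// publishToIndexer decides, per claim, whether to publish to the new
// indexer service or fall back to content claims, based on a rollout
// percentage in [0, 100]. Hashing the claim key keeps the decision
// stable across retries, so a given claim always takes the same path.
func publishToIndexer(claimKey string, rolloutPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(claimKey))
	return h.Sum32()%100 < rolloutPercent
}

func main() {
	// At 0% nothing routes to the indexer; at 100% everything does.
	fmt.Println(publishToIndexer("example-claim-key", 0))
	fmt.Println(publishToIndexer("example-claim-key", 100))
}
```

Bumping the percentage while watching the traces from steps 1–2 would isolate where the write path falls behind.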
