Hi,
I am interested in producing a clustering similar to the one you did with arXiv. I'm working with a set of web pages from Common Crawl (~6M URLs), which I have reduced to embeddings using this. For the arXiv project, how did you decide on the values of node_embedding_dim, neighbor_scale, and n_neighbors? Or, at least, what are reasonable ranges so that I can search within them? Currently ~65% of points end up as noise, assigned to no cluster, even with noise_level=0.
Thanks