
feat(LLMO-3074): add offsite-brand-presence audit #2002

Open
rarescheseli wants to merge 17 commits into main from rcheseli/offsite-brand-presence


@rarescheseli (Contributor) commented Feb 17, 2026

Main changes:

  • adds a new audit that fetches query-index.json, parses it, and fetches the latest brand presence files
  • parses the brand presence files and extracts Reddit, YouTube, and Wikipedia URLs from the prompts' sources, counting how many times each URL is cited
  • adds the top 100 most cited URLs (per platform) to the URL store and sends them to DRS for scraping
  • sends the top 100 other URLs (excluding Reddit, YouTube, and Wikipedia) to the URL store as well (they will also be sent to DRS once it can handle them)
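The extraction and counting steps listed above could look roughly like the minimal sketch below. It is not the PR's implementation: it assumes each row's `Sources` field holds absolute URLs separated by whitespace, commas, or semicolons (the real brand presence file format is not shown here, and a URL containing a comma would be split incorrectly), and `PLATFORM_HOSTS`, `classifyHost`, and `countCitations` are hypothetical names.

```javascript
// Hypothetical sketch: classify cited URLs by platform and count citations.
// ASSUMPTION: `Sources` holds absolute URLs separated by whitespace, commas
// or semicolons; the actual brand presence file format may differ.
const PLATFORM_HOSTS = {
  reddit: ['reddit.com'],
  youtube: ['youtube.com', 'youtu.be'],
  wikipedia: ['wikipedia.org'],
};

// Returns the platform key for a hostname, or 'other' if none matches.
function classifyHost(hostname) {
  const host = hostname.toLowerCase();
  for (const [platform, domains] of Object.entries(PLATFORM_HOSTS)) {
    if (domains.some((d) => host === d || host.endsWith(`.${d}`))) {
      return platform;
    }
  }
  return 'other';
}

// Builds { platform: Map<url, citationCount> } from brand presence rows.
function countCitations(rows) {
  const counts = {
    reddit: new Map(),
    youtube: new Map(),
    wikipedia: new Map(),
    other: new Map(),
  };
  for (const row of rows) {
    const sources = row.Sources?.trim();
    if (!sources) continue; // rows without sources contribute nothing
    for (const raw of sources.split(/[\s,;]+/)) {
      let url;
      try {
        url = new URL(raw);
      } catch {
        continue; // skip tokens that are not absolute URLs
      }
      const map = counts[classifyHost(url.hostname)];
      map.set(url.href, (map.get(url.href) ?? 0) + 1);
    }
  }
  return counts;
}
```

Matching on hostname suffixes means subdomains such as `en.wikipedia.org` or `www.reddit.com` fall into the right bucket without listing each one.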

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging or squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes
  • If data sources for any opportunity have been updated or added, please update the wiki for that opportunity.

Related Issues

Thanks for contributing!

@github-actions

This PR will trigger no release when merged.

@codecov

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@rarescheseli marked this pull request as ready for review February 18, 2026 09:47
Comment on lines +538 to +543
for (let i = 0; i < matchedFiles.length; i += FETCH_CONCURRENCY) {
  const batch = matchedFiles.slice(i, i + FETCH_CONCURRENCY);
  // eslint-disable-next-line no-await-in-loop
  const fetchResults = await Promise.allSettled(
    batch.map((filePath) => fetchBrandPresenceData(siteId, filePath, env, log)),
  );
Contributor Author

@rarescheseli Feb 19, 2026

Does it make sense to do parallel fetches, or is sequential fetching enough? Sequential fetching code would be a bit easier to read and maintain. Maybe we could add a small delay between requests so we don't overload the API.
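For comparison, the batched approach in the excerpt can be isolated into a small generic helper. This is a hypothetical sketch, not code from the PR: `fetchInBatches` and `fetchFn` are illustrative names. At most `concurrency` requests are in flight at a time, and `Promise.allSettled` keeps one failed fetch from aborting the rest of its batch.

```javascript
// Hypothetical sketch of batched parallel fetching: at most `concurrency`
// calls run at once; failures are recorded per item instead of thrown.
async function fetchInBatches(items, fetchFn, concurrency) {
  const results = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    // Wait for the whole batch before starting the next one.
    // eslint-disable-next-line no-await-in-loop
    const settled = await Promise.allSettled(batch.map((item) => fetchFn(item)));
    settled.forEach((outcome, j) => {
      results.push(outcome.status === 'fulfilled'
        ? { item: batch[j], data: outcome.value }
        : { item: batch[j], error: outcome.reason });
    });
  }
  return results;
}
```

The trade-off raised in the comment is visible here: the batching logic and per-item result bookkeeping are the extra complexity that a sequential loop avoids.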

Contributor Author

Went ahead with sequential fetching since the code is more readable and easier to maintain. The speedup from parallel fetches isn't that significant here anyway.
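A sequential loop with the small delay suggested earlier might look like this minimal sketch. `fetchSequentially`, `sleep`, and the `delayMs` knob are hypothetical names and assumptions, not the PR's code; a failed fetch is recorded and the loop moves on.

```javascript
// Hypothetical sketch: fetch items one at a time, pausing `delayMs`
// between requests so the upstream API is not hammered.
const sleep = (ms) => new Promise((resolve) => { setTimeout(resolve, ms); });

async function fetchSequentially(items, fetchFn, delayMs = 0) {
  const results = [];
  for (const [index, item] of items.entries()) {
    if (index > 0 && delayMs > 0) {
      // Throttle: wait before every request after the first one.
      // eslint-disable-next-line no-await-in-loop
      await sleep(delayMs);
    }
    try {
      // eslint-disable-next-line no-await-in-loop
      results.push({ item, data: await fetchFn(item) });
    } catch (error) {
      results.push({ item, error }); // record the failure and keep going
    }
  }
  return results;
}
```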

Contributor Author

The name of the audit is subject to change.

const rows = data.data;
for (const row of rows) {
  const sources = row.Sources?.trim();
  if (sources && row.Region === 'US' && row.Mentions === 'true' && row.Citations === 'true') {

Contributor Author

It would be nice if the region were configurable (maybe configured by the client somewhere?).
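A configurable region filter could be sketched as below. Where the allowed regions would come from (site config, client settings) is still open, so `filterRows`, `allowedRegions`, and `requireMentionsAndCitations` are hypothetical. The flag covers both filter variants seen in this PR: one that also checks `Mentions`/`Citations` and one that only checks the region.

```javascript
// Hypothetical sketch: replace the hard-coded 'US' check with a
// configurable set of regions. Defaults reproduce the current behaviour.
function filterRows(rows, { allowedRegions = ['US'], requireMentionsAndCitations = true } = {}) {
  const regions = new Set(allowedRegions);
  return rows.filter((row) => {
    const sources = row.Sources?.trim();
    if (!sources || !regions.has(row.Region)) return false;
    // The excerpt compares against the string 'true', so this does too.
    return !requireMentionsAndCitations
      || (row.Mentions === 'true' && row.Citations === 'true');
  });
}
```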

 * governing permissions and limitations under the License.
 */

export const DRS_TOP_URLS_LIMIT = 100;

Contributor Author

TBD whether we should increase or decrease this.
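Since the limit may change, it helps if the top-N selection takes it as a parameter. A minimal sketch, assuming per-platform counts are kept in a `Map<url, count>`; `topCitedUrls` is a hypothetical name, and breaking ties by URL (so the selection is deterministic) is an assumption rather than something the PR specifies.

```javascript
// Hypothetical sketch: pick the `limit` most cited URLs from a Map.
const DRS_TOP_URLS_LIMIT = 100;

function topCitedUrls(counts, limit = DRS_TOP_URLS_LIMIT) {
  return [...counts.entries()]
    // Highest citation count first; equal counts ordered by URL.
    .sort(([urlA, a], [urlB, b]) => b - a || urlA.localeCompare(urlB))
    .slice(0, limit)
    .map(([url, citations]) => ({ url, citations }));
}
```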

const rows = data.data;
for (const row of rows) {
  const sources = row.Sources?.trim();
  if (sources && row.Region === 'US') {

Contributor Author

Does this make sense? We're limited to English-only content at the moment anyway.


return {
  auditResult: {
    success: true,

Contributor

If all the URLs fail to be saved to the DB, the audit is still marked as successful, so we can't tell whether the URLs were actually saved.
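One way to address this is to derive `success` from the outcome of the writes. The sketch below is hypothetical: `saveUrlsAndSummarize` and `saveFn` stand in for whatever the audit uses to persist URLs, and it assumes saves can be issued individually. Following the comment, it fails the audit only when every save failed, and reports partial failures through the extra fields.

```javascript
// Hypothetical sketch: base the audit's success flag on the URL store
// writes instead of always returning `success: true`.
async function saveUrlsAndSummarize(urls, saveFn) {
  const settled = await Promise.allSettled(urls.map((url) => saveFn(url)));
  const failedUrls = urls.filter((_, i) => settled[i].status === 'rejected');
  const savedCount = urls.length - failedUrls.length;
  return {
    auditResult: {
      // Fail only if there was something to save and nothing was saved.
      success: urls.length === 0 || savedCount > 0,
      savedCount,
      failedUrls,
    },
  };
}
```

Whether an empty URL list should count as success, or whether partial failures should also fail the audit, is a policy choice this sketch does not settle.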

 * @param {object} site - The site being audited
 * @returns {Promise<object>} Audit result
 */
export async function offsiteBrandPresenceRunner(finalUrl, context, site) {

Contributor

The runner signature in runner-audit.js has 4 params, so we can add auditContext = {} here.

Contributor Author

It breaks the linter since it would be unused.
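If the fourth parameter is wanted for signature parity, an inline directive is one option. ESLint's `no-unused-vars` rule, with its default `args: 'after-used'` setting, reports unused parameters that come after the last used one, so a trailing `auditContext` would be flagged. In this hypothetical sketch the body is a placeholder, not the PR's implementation.

```javascript
// Hypothetical sketch: accept all four runner arguments without tripping
// no-unused-vars. The body is a placeholder for illustration only.
// eslint-disable-next-line no-unused-vars
async function offsiteBrandPresenceRunner(finalUrl, context, site, auditContext = {}) {
  return { auditResult: { success: true, finalUrl } };
}
```

Because `auditContext` has a default, `offsiteBrandPresenceRunner.length` is 3 and existing three-argument calls keep working. Naming the parameter `_auditContext` and configuring `argsIgnorePattern: '^_'` is an alternative if the project's ESLint config permits it.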

@HollywoodTonight
Contributor

I suggest we rename it to offsite-brand-presence-analysis


3 participants