Spotify: Batch Spotify spotifysync API calls #6485
arsaboo wants to merge 4 commits into beetbox:master
Conversation
Pull request overview
This PR makes the Spotify `spotifysync` command issue fewer HTTP calls. It batches track metadata and audio-features fetches, so large libraries are less likely to hit rate limits.
Changes:
- Add batch fetching for `/v1/tracks` (50 IDs per request) and `/v1/audio-features` (100 IDs per request).
- Deduplicate repeated Spotify track IDs within one run of `_fetch_info`.
- Add tests for chunking, endpoint usage, dedupe, and 403-disable behavior; add a changelog entry.
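The chunking that drives both batch endpoints can be sketched as follows (a minimal standalone illustration; the helper name and the `*_BATCH_SIZE` constants are assumptions for this sketch, not the plugin's actual identifiers):

```python
from collections.abc import Sequence


def chunked(ids: Sequence[str], chunk_size: int) -> list[list[str]]:
    """Split IDs into fixed-size chunks for Spotify's batch endpoints."""
    return [list(ids[i : i + chunk_size]) for i in range(0, len(ids), chunk_size)]


# Per the PR description, batch sizes differ per endpoint:
# 50 IDs for /v1/tracks, 100 IDs for /v1/audio-features.
TRACKS_BATCH_SIZE = 50
AUDIO_FEATURES_BATCH_SIZE = 100

ids = [f"id{n}" for n in range(120)]
print([len(c) for c in chunked(ids, TRACKS_BATCH_SIZE)])  # [50, 50, 20]
```

Each chunk then becomes a single request with a comma-joined `ids` query parameter, instead of one request per track.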
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `beetsplug/spotify.py` | Implement batch endpoints + ID chunking/dedupe and shared disable logic for audio features. |
| `test/plugins/test_spotify.py` | Add response mocks + assertions for batching/dedupe/403 behavior. |
| `docs/changelog.rst` | Add an Unreleased changelog entry for the Spotify batching change. |
beetsplug/spotify.py
Outdated
```python
    @staticmethod
    def _chunked(ids: Sequence[str], chunk_size: int) -> list[list[str]]:
        """Split IDs into deterministic chunks for Spotify batch endpoints."""
        return [ids[i : i + chunk_size] for i in range(0, len(ids), chunk_size)]
```
grug see type lie. _chunked say it return list[list[str]], but slice of Sequence[str] is Sequence[str] (often tuple). mypy likely yell. make _chunked build real list for each chunk (e.g., wrap slice with list(...)) or change return type to list[Sequence[str]] and adjust callers.
```diff
-        return [ids[i : i + chunk_size] for i in range(0, len(ids), chunk_size)]
+        return [
+            list(ids[i : i + chunk_size])
+            for i in range(0, len(ids), chunk_size)
+        ]
```
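To see why the `list(...)` wrap matters: slicing a sequence returns the concrete type of the underlying container, so a tuple input produces tuple chunks and the `list[list[str]]` annotation would be wrong. A quick standalone check:

```python
ids_list = ["a", "b", "c", "d"]
ids_tuple = ("a", "b", "c", "d")

# A slice preserves the concrete sequence type:
assert isinstance(ids_list[0:2], list)
assert isinstance(ids_tuple[0:2], tuple)

# Wrapping in list() normalizes both cases, matching list[list[str]]:
assert list(ids_tuple[0:2]) == ["a", "b"]
print("slice types behave as described")
```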
beetsplug/spotify.py
Outdated
```python
        try:
            for chunk in self._chunked(track_ids, 100):
                features_data = self._handle_response(
                    "get",
                    self.audio_features_batch_url,
                    params={"ids": ",".join(chunk)},
                )

                for idx, feature_data in enumerate(
                    features_data.get("audio_features", [])
                ):
                    if feature_data is None:
                        continue
                    track_id = feature_data.get("id") or chunk[idx]
                    features_by_id[track_id] = feature_data
            return features_by_id
        except AudioFeaturesUnavailableError:
            self._disable_audio_features()
            return {}
        except APIError as e:
            self._log.debug("Spotify API error: {}", e)
            return {}
```
grug think batch audio feature fetch change behavior on error. old per-track track_audio_features() catch APIError and just skip that one track. new track_audio_features_batch() catch APIError and return {} for whole run, also drop any features already fetched for earlier chunks. better: keep features_by_id so far and continue next chunk (or at least return partial dict) so one bad request no nuke all tracks.
```diff
-        try:
-            for chunk in self._chunked(track_ids, 100):
-                features_data = self._handle_response(
-                    "get",
-                    self.audio_features_batch_url,
-                    params={"ids": ",".join(chunk)},
-                )
-                for idx, feature_data in enumerate(
-                    features_data.get("audio_features", [])
-                ):
-                    if feature_data is None:
-                        continue
-                    track_id = feature_data.get("id") or chunk[idx]
-                    features_by_id[track_id] = feature_data
-            return features_by_id
-        except AudioFeaturesUnavailableError:
-            self._disable_audio_features()
-            return {}
-        except APIError as e:
-            self._log.debug("Spotify API error: {}", e)
-            return {}
+        for chunk in self._chunked(track_ids, 100):
+            try:
+                features_data = self._handle_response(
+                    "get",
+                    self.audio_features_batch_url,
+                    params={"ids": ",".join(chunk)},
+                )
+            except AudioFeaturesUnavailableError:
+                # audio features API now unavailable; stop further requests
+                self._disable_audio_features()
+                break
+            except APIError as e:
+                # one bad request should not nuke all tracks; log and continue
+                self._log.debug("Spotify API error: {}", e)
+                continue
+            for idx, feature_data in enumerate(
+                features_data.get("audio_features", [])
+            ):
+                if feature_data is None:
+                    continue
+                track_id = feature_data.get("id") or chunk[idx]
+                features_by_id[track_id] = feature_data
+        return features_by_id
```
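The behavioral difference the comment describes can be demonstrated with a small standalone sketch (`fetch` here is a stand-in for `_handle_response`, and `APIError` is redefined locally; neither is the plugin's real wiring):

```python
class APIError(Exception):
    """Stand-in for the plugin's APIError."""


def fetch_features(chunks, fetch):
    """Per-chunk error handling: a failing chunk is skipped instead of
    discarding features already gathered from earlier chunks."""
    features_by_id = {}
    for chunk in chunks:
        try:
            data = fetch(chunk)
        except APIError:
            continue  # skip only this chunk
        for track_id, feature in data.items():
            features_by_id[track_id] = feature
    return features_by_id


def flaky_fetch(chunk):
    if "bad" in chunk:
        raise APIError("400 Bad Request")
    return {tid: {"id": tid} for tid in chunk}


result = fetch_features([["a", "b"], ["bad"], ["c"]], flaky_fetch)
print(sorted(result))  # ['a', 'b', 'c']; the bad chunk did not wipe the rest
```

With the try/except wrapped around the whole loop instead, the same failing chunk would discard `a` and `b` as well.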
docs/changelog.rst
Outdated
```rst
Other changes
~~~~~~~~~~~~~

- :doc:`plugins/spotify`: Batch ``spotifysync`` track and audio-features API
  requests and deduplicate repeated Spotify track IDs within a run.
```
grug see changelog entry sit under Bug fixes because Other changes heading still commented out (..). batching API calls not bug fix, and heading not show in built docs. please uncomment the Other changes subsection (or move entry to right Unreleased subsection) so changelog structure correct.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #6485      +/-   ##
==========================================
+ Coverage   69.75%   70.20%     +0.44%
==========================================
  Files         145      146         +1
  Lines       18508    18558        +50
  Branches     3011     3022        +11
==========================================
+ Hits        12911    13029       +118
+ Misses       4969     4884        -85
- Partials      628      645        +17
```
This changes the Spotify plugin to batch `spotifysync` lookups instead of making per-track API calls. The previous implementation made one metadata request and one audio-features request per track, which was inefficient for larger libraries and increased the chance of hitting rate limits. Batching reduces request volume substantially while keeping the stored fields and user-facing behavior the same.
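Deduplicating repeated track IDs within a run, while preserving first-seen order, can be done with `dict.fromkeys` (a generic sketch of the technique; the plugin may implement it differently):

```python
track_ids = ["t1", "t2", "t1", "t3", "t2"]

# dicts preserve insertion order (Python 3.7+), so this drops
# duplicates while keeping the first occurrence of each ID in place.
unique_ids = list(dict.fromkeys(track_ids))
print(unique_ids)  # ['t1', 't2', 't3']
```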
(Moved the entry in `docs/changelog.rst` to the bottom of one of the lists near the top of the document.)