Conversation
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Pull request overview
grug see PR add genre ignorelist to lastgenre plugin, so wrong Last.fm tags can be blocked global (*) or per artist. this fit plugin job: pick good genre tags and keep user control.
Changes:
- add ignorelist file parsing + regex/literal compile, plus shared
is_ignored()helper - apply ignorelist both right after Last.fm fetch and again during genre resolve/whitelist/c14n
- add docs + tests for ignorelist matching and file format/errors
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
beetsplug/lastgenre/__init__.py |
load ignorelist from config file; thread artist through resolve path; filter ignored tags during c14n + final validation |
beetsplug/lastgenre/client.py |
filter ignored tags immediately after Last.fm lookup (while keeping cache) |
beetsplug/lastgenre/utils.py |
new shared is_ignored() helper + ignorelist typing |
docs/plugins/lastgenre.rst |
document ignorelist feature + config option |
test/plugins/test_lastgenre.py |
add ignorelist tests (resolve behavior, matcher behavior, file format + errors) |
Comments suppressed due to low confidence (1)
test/plugins/test_lastgenre.py:293
- grug see this
configfixture just return upstream module-scoped config. it not reset state per test, soconfig['lastgenre']['ignorelist'](and other keys) can leak between tests and even breakLastGenrePlugin()init when prior test set ignorelist path then delete file. grug suggest make this fixture actually reset needed keys (ex: setlastgenre.ignorelist = Falsehere) or use function-scoped fresh config object.
@pytest.fixture
def config(config):
"""Provide a fresh beets configuration for every test/parameterize call
This is necessary to prevent the following parameterized test to bleed
config test state in between test cases.
"""
return config
1400171 to
c17eebb
Compare
|
@snejus this is now ready for a first (human) review except the copilot comments I will address. Thanks in advance :-) |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6449 +/- ##
==========================================
+ Coverage 70.05% 70.19% +0.13%
==========================================
Files 147 148 +1
Lines 18560 18612 +52
Branches 3026 3035 +9
==========================================
+ Hits 13003 13064 +61
+ Misses 4924 4913 -11
- Partials 633 635 +2
🚀 New features to boost your workflow:
|
|
Just had a quick look at the description and wanted to suggest that ideally we want to stay within the beets configuration YAML for the regex patterns. See my configuration: #6269 (comment), specifically What specific issue YAML poses in your case? |
|
Cant recall specifically, it was a year ago already. Loads of loads of excaping this and that. regexes bacame tedious to write and even read. Will take a look at how you implemented regex patterns within yaml. What oposes not a separate format? whitelist and c14n is something else anyway. I'd say it's simpler to use. Main reason. I gave up back then trying to use ini or yaml for regexes. It was simply painful to use 🤷 this new format I will use as-is for genre aliases too btw. also a reason why I think it's the best way forward for the project. |
Personally it's about the convenience - I want all my configuration to live in a single file. And if it's getting unwieldy, I can move, say, For example, my directory: /mnt/music/Music
library: ~/.music/beets/library.db
include:
- replacements.yaml |
|
Also I don't see how a separate file improve the regex definitions. Note that artwork:
pattern: |
\b(?i:art(?:work)?|cover)\b
[^|:,.\n-]*?
(?:[-:]\ ?|by:?\ )
(
https:[^ ]+
| (
[\w@]
(?:[^|:,.\n-]|[.]\w)+
[^|:,.\n -]
)+
)
([|:,.\n \"-]|$)Then when you read such configuration in python, make sure to specify |
|
Please don't get me wrong: I understand that yaml is our config format and my first versions of this feature when I started about a year ago was of course trying to make it work with yaml. And yes I am certainly aware of yaml includes and I do use them too :-) Problems I had were mainly escaping issues, yaml treating regexes that work elsewhere in python do not work when put into yaml. I went to escaping hell and back. I also tried ini as an alternative and that was a little better but still painful. At some point I decided that it is not worth ot and won't ever be userfriendly, so I put thought and effort into finding the most simple solution from a user perspective. Open file - add regex from regex101 or self constructed - see if it compiles - and done. No special treatment so yaml accepts it. That is user friendly and overrules "having everything in one file", which you say for yourself you don't do anyway when stuff gets too big. For last.fm genre ignores, believe me you will have a separate file because it gets big. So your argument is not too strong here. Anyway I'm not at home/at computer these days and I won't argue further here, so if you insist that yaml is the better way to go, I will do it. It will take some time and it's a bit of a bummer that sets me back from finally getting this over with... If you help me with escaping my regexes for yaml I will be grateful I see weird things in your config, what is this for example? |
multiline regexes dont solve the escaping problems, also I don't really need them for my usecase. What are you trying to say here? |
|
And just so we know what we are actually talking about, this format is really straight forward and a pleasure to read and edit. Not even any quoting required ❤️ |
|
To further give you the full picture of my reasoning: I took quite some ideas for the aliases and iignores features from discontinued whatlastgwnre 3rd party plugin. It had curated lists of very useful patterns for the aliases features. They worked well for me back then when I used the plugin regularly, so I wanted to continue use them and also include them in my lastgenre implementation of these feature ideas. I still plan to ship a curated list of aliases with beets! Stuff that works for everyone. Shipping in a sep file is easy. I tried hard to make them work in yaml and failed. It was a never ending story escaping everything so yaml accepts it: I'm giving you all this so you might better understand that I did my due diligence, made my homework and had a long and thoughtful path until I decided that a custom format is the best and most of all userfriendly way. If you still insist that yaml is the best way I will start doing it. But please do note that I put a lot of thought, work and experimentation into my decisions and it's not something I just decided out of the blue. |
I want to understand this better. Can you give me an example where this is an issue? |
66a7d98 to
1785dab
Compare
|
Hi again @snejus, I refactored this to have everything right within the beets config. I added some details to docs and docstrings user's should be cautious about when putting there regex into yaml. I won't be able to recall excatly errors from when I first started with this feature. In the end it doesnt matter. I can only suppose that I might have tried to put regex into double quotes which is a very bad idea. Also I might have had a lot of errors with ini as well and all that added up to the conclusion that yaml and ini are bad -> Invent something that just works and most of all is more user friendly. I still find my first version the most user friendly and it didn't require any "caution don't do this, don't do that" stuff in the docs ;-) But I now learned more about it and it is not too much to worry about. In most cases with regex in yaml it is just best to not use any single or double quotes at all and most stuff will work. So @snejus the 3 things I'd appreciate if you took special attention to in another review are:
Currently existing genres check happens in resolve_genre, and it is applied possibly more than once. I'm just running through that again to see if there is a redundant check in there. |
1785dab to
c5a14d3
Compare
You can use them as long as you use yaml literal (or scalar) style blocks: configuration: |
you can use
any
characters
'"
you likeWill review this tomorrow! |
snejus
left a comment
There was a problem hiding this comment.
Added a couple of comments and will have another look once these are addressed!
You misunderstood and I worded poorly: I mean: It is actually quite good / best to not try to use any "" or '' for trying to escape something. Leaving them out completely in one-line-yaml-entries works surprisingly well and that is also what I'd like to transport to users via the docs: Best not use them for escaping :-) |
9d13053 to
4731a1c
Compare
ba99310 to
1cfe0c9
Compare
|
Hi @snejus thanks for the feedback, I tried to address it. I think it's time for another look. I'm not happy with all of it yet but maybe need guidance if I'm on the right track. Delegating config check to confuseI have two versions: This first version in 636aca7 throws a weird error when an It's hard to guess what the problem is for a user with that error so I tried to handle that here with a slightly more complex version: 1cfe0c9 Generally, I am not sure if catching those errors and then raising a I just realized that `UserError" alsod doesnt actually exit beets but that's what I actually want to prevent "surprising" results since the error message gets washed away quickly by other beets output! After all I'd like to be consisten with how behave elsewhere. If that means: huge tracebacks and a tiny error message that is relevant, I'm ok with it. Version 1 error looks like this: Version 2 now accepts a numeric artist but with a config like this would clearly notify about the actual issue:
|
3866f73 to
d2b0e00
Compare
- Test file format (valid and error cases) - Test regex pattern matching (_is_ignored) - Test _resolve_genres: ignored genres filtered - Test _resolve_genres: c14n ancestry walk blocked for ignored tags - Test _resolve_genres: without whitelist, oldest ancestor is kept - Test _resolve_genres: whithout whitelist, ignored oldest ancestor is removed
- Prevents wrong last.fm genres based on a per artist (or global) list of regex
patterns that should be ignored.
- Genre _ignoring_ happens in two places: Right after fetching the
last.fm genre and in _resolve_genres.
- As a fallback literal string matching can be used instead of
supplying a regex pattern
- Includes docs for the new feature.
and fix fallback types
and compile regex inline. extra_debug is now better readable
6536937 to
46bde5e
Compare
snejus
left a comment
There was a problem hiding this comment.
Mostly looks good, added several comments
| genre | ||
| for genre in cleaned | ||
| if not self.whitelist or genre.lower() in self.whitelist |
There was a problem hiding this comment.
Keep the simpler variable g to slightly reduce verbosity here.
| return ( | ||
| getattr(obj, "artist", None) | ||
| if isinstance(obj, library.Item) | ||
| else getattr(obj, "albumartist", None) | ||
| or getattr(obj, "artist", None) | ||
| ) |
There was a problem hiding this comment.
I think this should be just
| return ( | |
| getattr(obj, "artist", None) | |
| if isinstance(obj, library.Item) | |
| else getattr(obj, "albumartist", None) | |
| or getattr(obj, "artist", None) | |
| ) | |
| return ( | |
| obj.artist | |
| if isinstance(obj, library.Item) | |
| else obj.albumartist or obj.artist | |
| ) |
| return [ | ||
| genre | ||
| for genre in genres | ||
| if not is_ignored(logger, ignorelist, genre, artist) | ||
| ] |
There was a problem hiding this comment.
| return [ | |
| genre | |
| for genre in genres | |
| if not is_ignored(logger, ignorelist, genre, artist) | |
| ] | |
| return [ | |
| g | |
| for g in genres | |
| if not is_ignored(logger, ignorelist, g, artist) | |
| ] |
| if not ignorelist: | ||
| return False |
There was a problem hiding this comment.
No need for this
| if not ignorelist: | |
| return False |
|
|
||
| from beets.logging import BeetsLogger | ||
|
|
||
| Ignorelist = dict[str, list[re.Pattern[str]]] |
There was a problem hiding this comment.
I think the name Ignorelist is a bit misleading given the shape of it. It should rather be called IgnoreMap: it maps artists to their respective ignorelists.
Probably worth renaming the configuration to ignoremap as well, potentially even artist_ignoremap.
| Filtering is done in two places: when fetching genres from Last.fm and when | ||
| resolving to a final genre list (during canonicalization and whitelisting). | ||
|
|
||
| This means that existing genres are also filtered when the ``force`` and |
There was a problem hiding this comment.
| This means that existing genres are also filtered when the ``force`` and | |
| This means that existing genres are also filtered when ``force`` and |
| ``keep_existing`` options are enabled (or ``cleanup_existing`` is enabled with | ||
| ``force: no``). |
There was a problem hiding this comment.
For consistency:
| ``keep_existing`` options are enabled (or ``cleanup_existing`` is enabled with | |
| ``force: no``). | |
| ``keep_existing`` options are enabled (or ``cleanup_existing`` is enabled with | |
| ``force`` disabled). |
| key applies globally to all artists — use it to block genres you never want, | ||
| regardless of artist. Patterns are matched against the full genre string, so a | ||
| plain ``metal`` will not match ``heavy metal`` unless you write a regex like | ||
| ``.*metal.*``. |
There was a problem hiding this comment.
| ``.*metal.*``. | |
| ``.*metal``. |
|
|
||
| - The global key ``'*'`` **must** be surrounded by single quotes so that | ||
| YAML does not interpret it as an anchor. | ||
| - Any regex pattern that starts with special YAML characters (especially |
There was a problem hiding this comment.
| - Any regex pattern that starts with special YAML characters (especially | |
| - Any regex pattern that starts with a special YAML character (especially |
| # Ignorelist pattern matching tests for _is_ignored, independent of _resolve_genres | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Place these ignorelist tests under a class, e.g. TestIgnoreList
Description
Adds a global and artist-specific genre ignorelist to lastgenre. Ignorelist entries can use regex patterns or literal genre names and are configurable per artist or globally. For config examples see submitted docs and
_load_ignorelist()docstring.Additional minor refactoring
.get()._artist_for_helperFurther notes
My personal roadmap for the
lastgenreplugin: #5915Feature idea originated from: #5721 (comment).
To Do