Thanks for this. I was just checking the docs, which say "All file names within the same directory MUST be unique following Unicode canonical normalization and then full case folding". I am not that good with Unicode and will have to read a bit more about it, but do you know what the "full case folding" they mention means?
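For reference, "full case folding" is the Unicode operation used for caseless comparison in which a single character may fold to several (for example, 'ß' folds to 'ss'), unlike a simple one-to-one lowercase mapping. Below is a minimal sketch of the uniqueness rule quoted above, assuming Python's `unicodedata.normalize` and `str.casefold` are acceptable stand-ins for the two operations and that NFC is the canonical form used. `uniqueness_key` and `find_collisions` are hypothetical helper names, not part of ebooklib.

```python
import unicodedata


def uniqueness_key(name: str) -> str:
    """Hypothetical helper: canonical normalization, then full case folding."""
    normalized = unicodedata.normalize("NFC", name)
    # str.casefold() applies full case folding, so one character may become several.
    folded = normalized.casefold()
    # Folding can leave the string unnormalized, so normalize once more.
    return unicodedata.normalize("NFC", folded)


def find_collisions(names):
    """Return groups of names from one directory that share a uniqueness key."""
    groups = {}
    for name in names:
        groups.setdefault(uniqueness_key(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Under this sketch, `Straße.xhtml` and `STRASSE.xhtml` would count as the same name, so they could not both live in one directory of the archive.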
@aerkalov Thank you for your interest and reply. Allow me to explain the reasons behind the changes I made to the code.
I used the ebooklib library to process Arabic EPUBs: to extract essential information from the OPF file (the spine, manifest, and publisher name) and to read the content of each chapter. However, I ran into a problem with one file (chapter) name, نهائي_الخبر_الرشيد. The library requires that the file name used to access an item in the EPUB archive match the actual file name stored in the archive.
The error was caused by Arabic characters that need normalization, such as 'ئ' and 'ئ', which render identically but can be encoded differently (as a single precomposed code point, or as a base letter followed by a combining Hamza). Therefore, I implemented normalization for Arabic letters so that these characters are handled consistently in file names.
In Arabic, characters that carry diacritics such as Hamza and Madda can be represented in more than one way, which can lead to inconsistencies in file names. The normalization process converts these characters to one consistent form, the base letter with its specific diacritic, so that the file names are standardized.
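To make the two representations concrete, here is a small sketch using only Python's standard `unicodedata` module. It compares the precomposed YEH WITH HAMZA ABOVE (U+0626) with the sequence YEH (U+064A) plus combining HAMZA ABOVE (U+0654). `same_after_normalization` is a hypothetical helper name, and the choice of NFC as the default form is an assumption; either canonical form works as long as both sides use the same one.

```python
import unicodedata

composed = "\u0626"          # YEH WITH HAMZA ABOVE as a single code point
decomposed = "\u064A\u0654"  # YEH followed by combining HAMZA ABOVE


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after canonical normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ even though they render the same glyph.
print(composed == decomposed)
# After normalization (NFC or NFD) they compare equal.
print(same_after_normalization(composed, decomposed))
print(same_after_normalization(composed, decomposed, form="NFD"))
```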
By normalizing the file names, I was able to resolve the error encountered while accessing items in the EPUB archive. This solution ensures that the specified file name in the code matches the actual file name in the archive, thus enabling smooth processing of Arabic EPUBs with accurate and consistent file names.
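The lookup idea can be sketched with the standard library alone. This is not the code from this PR and does not use ebooklib's API; `resolve_archive_name` is a hypothetical helper, the archive is built in memory purely for illustration, and NFC is an assumed choice of normalization form.

```python
import io
import unicodedata
import zipfile


def resolve_archive_name(archive: zipfile.ZipFile, wanted: str):
    """Hypothetical helper: return the stored name equal to `wanted` after NFC."""
    target = unicodedata.normalize("NFC", wanted)
    for stored in archive.namelist():
        if unicodedata.normalize("NFC", stored) == target:
            return stored
    return None


# Illustration: the archive stores the decomposed spelling (YEH + combining HAMZA),
# while the caller asks for the precomposed spelling (U+0626).
stored_name = "OEBPS/\u0646\u0647\u0627\u064A\u0654\u064A.xhtml"
wanted_name = "OEBPS/\u0646\u0647\u0627\u0626\u064A.xhtml"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(stored_name, "<html/>")

with zipfile.ZipFile(buffer) as zf:
    match = resolve_archive_name(zf, wanted_name)
    # Read with the name exactly as stored, not the name that was requested.
    content = zf.read(match) if match is not None else None
```

Returning the stored name, rather than rewriting it, keeps the archive untouched and leaves the exact-match requirement of the underlying zip lookup satisfied.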
solve normalization issue