symbols.py for Arabic letters

I built my own dataset (~9.5 hours) for the Bahraini dialect of Arabic.
My validation loss is around 1.5.
I think this is partly due to how I defined the Arabic symbols.
Is my implementation correct?
Could someone please help?

_pad = '_'
_punctuation = '.!,؟*: '
_special = '-'

# Phonemes
_vowels = 'واي'
_non_pulmonic_consonants = ''
_pulmonic_consonants = 'لإإلأابتثجحخدذرزسشصضطظعغفقكلمنهويءؤآ'
_suprasegmentals = 'ˈˌːˑ'
_other_symbols = ''
_diacrilics = 'ّ'
_extra_phons = []  # some extra symbols that I found in Wiktionary IPA annotations
#_extra_phons = ['g', 'ɝ', '̃', '̍', '̥', '̩', '̯', '͡']  # some extra symbols that I found in Wiktionary IPA annotations

phonemes = list(
   _pad + _punctuation + _special + _vowels + _non_pulmonic_consonants
   + _pulmonic_consonants + _suprasegmentals + _other_symbols + _diacrilics) + _extra_phons

phonemes_set = set(phonemes)
silent_phonemes_indices = [i for i, p in enumerate(phonemes) if p in _pad + _punctuation]
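
One thing that may be worth checking (a sketch, not a confirmed diagnosis): the strings above appear to repeat some characters, for example `ل` within `_pulmonic_consonants`, and `و`, `ا`, `ي` in both `_vowels` and `_pulmonic_consonants`. If so, `phonemes` would contain duplicate entries. Whether that affects the loss depends on how the training code maps symbols to IDs, which is not shown here. The helper names `find_duplicates` and `dedupe_keep_order` below are hypothetical, and the short strings are stand-ins for the real definitions.

```python
from collections import Counter


def find_duplicates(symbols):
    """Return the symbols that occur more than once, in first-seen order."""
    counts = Counter(symbols)
    return [s for s in counts if counts[s] > 1]


def dedupe_keep_order(symbols):
    """Drop repeated symbols, keeping the first occurrence of each."""
    return list(dict.fromkeys(symbols))


# Stand-in strings (assumption): replace with the definitions from symbols.py.
_vowels = 'واي'
_consonants = 'لبتلوي'

phonemes = list(_vowels + _consonants)

# Symbols listed more than once across the concatenated strings.
print(find_duplicates(phonemes))

# A duplicate-free list in which every symbol has exactly one index.
unique_phonemes = dedupe_keep_order(phonemes)
assert len(unique_phonemes) == len(set(unique_phonemes))
```

Running `find_duplicates` on the real `phonemes` list would show whether any symbol ends up with more than one index.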
