Refactor parsing of numeric ASCII lists-of-lists #772
RamogninoF wants to merge 1 commit into gerlero:main
@RamogninoF thanks! I'll take a look. In principle this shouldn't be possible with regular expressions alone (which foamlib's parser uses to be fast enough), but I'll look at the code to see what it's doing.
    + _SKIP.pattern
    + rb")?\)"
_SUB_LIST_LIKE = re.compile(
    rb"(?:" + _POSSIBLE_INTEGER.pattern + rb")(?:" + _SKIP.pattern + rb")?\([^()]*?\)"
@RamogninoF the problem, to me, is that this doesn't actually check that the sublist is well-formed. E.g., it will readily accept a list with a wrong count, like 2 (1 2 3)...
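To make the concern concrete, here is a minimal sketch that rebuilds a pattern of the same shape as `_SUB_LIST_LIKE` from the diff. The `POSSIBLE_INTEGER` and `SKIP` definitions are simplified stand-ins assumed for illustration, not foamlib's actual ones.

```python
import re

# Simplified stand-ins (assumptions), not foamlib's real definitions.
POSSIBLE_INTEGER = re.compile(rb"-?\d+")
SKIP = re.compile(rb"\s+")

# Same shape as the pattern in the diff: a count, optional skip, then
# anything without parentheses between "(" and ")".
SUB_LIST_LIKE = re.compile(
    rb"(?:" + POSSIBLE_INTEGER.pattern + rb")(?:" + SKIP.pattern + rb")?\([^()]*?\)"
)

well_formed = b"3(1 2 3)"
wrong_count = b"2 (1 2 3)"  # prefix says 2, but the sub-list holds 3 items

# The pattern never relates the prefix to the number of items,
# so both inputs are accepted.
matches_well_formed = SUB_LIST_LIKE.fullmatch(well_formed) is not None
matches_wrong_count = SUB_LIST_LIKE.fullmatch(wrong_count) is not None
```

Both flags come out true, so under these assumptions the pattern only checks that something looks like a counted sub-list, not that the count is right.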
Unfortunately I have no background in parsing logic, and all this is far beyond my capabilities; I just hope this can be a useful starting point for you. In the current state, parsing meshes in ASCII format is simply impossible because of the time required (I gave up even on a 10k-cell mesh after it had spent more than 10 minutes parsing the faces file). I would like to be able to handle ASCII meshes as well, rather than only binary ones.
A simple alternative could be to just add hardcoded parsers for faces of up to 10 vertices or so, which I think would be more than enough for most cases.
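A rough sketch of what the hardcoded approach could look like: one regular expression is generated per supported length, so each pattern can check its own count. The names `fixed_length_pattern`, `PATTERNS` and `is_well_formed` are hypothetical, and the integer and whitespace sub-patterns are simplified assumptions rather than foamlib's own.

```python
import re

INTEGER = rb"-?\d+"  # simplified integer pattern (assumption)


def fixed_length_pattern(n: int) -> "re.Pattern[bytes]":
    """Build a pattern matching `n(i1 i2 ... in)` with exactly n integers."""
    body = rb"\s+".join([INTEGER] * n)
    return re.compile(str(n).encode() + rb"\s*\(\s*" + body + rb"\s*\)")


# One pattern per supported face size, here 3 to 10 vertices.
PATTERNS = {n: fixed_length_pattern(n) for n in range(3, 11)}


def is_well_formed(sub_list: bytes) -> bool:
    """Return True if sub_list matches one of the fixed-length patterns."""
    return any(p.fullmatch(sub_list) for p in PATTERNS.values())
```

Because the count is baked into each pattern, `is_well_formed(b"5(0 1 2 3 4)")` holds while `is_well_formed(b"2 (1 2 3)")` does not. The trade-off is that faces with more vertices than the largest hardcoded length would still need another parsing path.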
Or what if the string is parsed twice via regex, once to retrieve the list and once to get the prefix marking its length, and these two quantities are compared to validate the parsed data before returning?
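The idea of validating the prefix against the parsed items could be sketched as follows. This is only an illustration under stated assumptions: `_SUB_LIST` and `parse_sub_lists` are hypothetical names, the pattern is simplified, and a single regex with two capture groups stands in for the two separate passes.

```python
import re

# Hypothetical pattern: group 1 is the count prefix, group 2 the raw items.
_SUB_LIST = re.compile(rb"(\d+)\s*\(([^()]*)\)")


def parse_sub_lists(data: bytes) -> list:
    """Parse whitespace-separated `n(i1 ... in)` entries, checking each count."""
    result = []
    pos = 0
    for match in _SUB_LIST.finditer(data):
        # Reject anything other than whitespace between consecutive sub-lists.
        if data[pos:match.start()].strip():
            raise ValueError("unexpected content between sub-lists")
        count = int(match.group(1))
        values = [int(v) for v in match.group(2).split()]
        # Compare the declared length with the number of parsed items.
        if len(values) != count:
            raise ValueError(f"expected {count} items, got {len(values)}")
        result.append(values)
        pos = match.end()
    if data[pos:].strip():
        raise ValueError("unexpected trailing content")
    return result
```

With this sketch, `parse_sub_lists(b"3(0 1 2) 5(4 5 6 7 8)")` returns the two sub-lists, while `parse_sub_lists(b"2 (1 2 3)")` raises `ValueError`. The per-item Python loop is likely slower than a pure-regex fast path, so whether it is fast enough for large meshes would need measuring.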
Updated parsing of numeric ASCII lists-of-lists to support sub-lists of arbitrary length (instead of the hardcoded lengths 3 and 4).
This is required to parse faceLists stored in ASCII format such as below, whose faces can have an arbitrary number of vertices for poly meshes (often 5+). These would otherwise fall back to the default parser, leading to very long parsing times even for very small meshes.