Format mismatch in print_examples() prevents example collection for lexical and syntactic grammars #14

@cameron-morin

The print_examples() function fails to find examples for lexical and syntactic grammars because of a format mismatch between the construction learning phase and the example collection phase.

During learning (process_grammar()):

Lexical grammar: converts enriched data to [(lex, -1, -1), ...]
Syntactic grammar: converts to [(-1, syn, -1), ...]
Full grammar: keeps as [(lex, syn, sem), ...]

During example collection (print_examples()):

All grammar types: uses the unmodified enriched format [(lex, syn, sem), ...], so masked lexical and syntactic constructions never match it
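
To make the mismatch concrete, here is a minimal sketch with toy data. The helper names (mask_line, contains), the integer IDs, and the contiguous-span matching are assumptions for illustration only; the actual process_grammar() and print_examples() internals may differ.

```python
# Hypothetical helper: mimics the masking described for process_grammar().
def mask_line(line, grammar_type):
    """Replace the slots a grammar type ignores with -1."""
    if grammar_type == "lex":
        return [(lex, -1, -1) for lex, syn, sem in line]
    if grammar_type == "syn":
        return [(-1, syn, -1) for lex, syn, sem in line]
    return list(line)  # full grammar keeps all three slots


# Hypothetical matcher: looks for the construction as a contiguous span.
def contains(line, construction):
    n = len(construction)
    return any(
        tuple(line[i:i + n]) == construction
        for i in range(len(line) - n + 1)
    )


# Toy enriched line of (lex, syn, sem) IDs; the values are made up.
enriched = [(101, 7, 33), (205, 4, 12)]

# A lexical construction as it would look when learned from masked data.
construction = tuple(mask_line(enriched, "lex"))

# Searching the unmasked line (what the issue describes) finds nothing.
found_unmasked = contains(enriched, construction)

# Searching the masked line finds the construction.
found_masked = contains(mask_line(enriched, "lex"), construction)
```

In this toy setup found_unmasked is False and found_masked is True, which is the behaviour reported above for lexical and syntactic grammars.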

With no examples to compare, get_token_similarity() fails (it requires 75% character overlap between examples), so all constructions collapse into token cluster "1" regardless of their actual similarity.

Suggested fix: apply the same format masking used in process_grammar() inside print_examples() before matching.
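
A rough sketch of what that could look like, assuming example collection can be told which grammar type it is working with. The KEEP table, the collect_examples() name and signature, and the choice to return the unmasked span are all hypothetical and are not taken from the library.

```python
# Which of the (lex, syn, sem) slots each grammar type keeps.
# Keys and layout are illustrative, not the library's own.
KEEP = {
    "lex": (True, False, False),
    "syn": (False, True, False),
    "full": (True, True, True),
}


def collect_examples(lines, construction, grammar_type):
    """Return the raw spans whose masked form equals the construction."""
    keep = KEEP[grammar_type]
    target = tuple(construction)
    n = len(target)
    examples = []
    for line in lines:
        # Apply the same masking used during learning before comparing.
        masked = [
            tuple(value if kept else -1 for value, kept in zip(unit, keep))
            for unit in line
        ]
        for i in range(len(masked) - n + 1):
            if tuple(masked[i:i + n]) == target:
                # Keep the unmasked span so the tokens can still be shown.
                examples.append(line[i:i + n])
    return examples


# Toy data: two enriched lines that share the syntactic pattern 7, 4.
lines = [
    [(101, 7, 33), (205, 4, 12)],
    [(300, 7, 5), (410, 4, 9)],
]
syn_construction = ((-1, 7, -1), (-1, 4, -1))

examples = collect_examples(lines, syn_construction, "syn")
no_examples = collect_examples(lines, syn_construction, "full")
```

With masking matched to the grammar type, both toy lines are collected as examples; treating the same construction as a full grammar reproduces the empty result. Returning the unmasked span is a design guess, on the assumption that the original tokens are needed later for the character-overlap comparison.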
