
xmed-lab/SemKey


Beyond LLM Priors: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

📌 Overview


Architecture of the SemKey framework

SemKey is a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. By combining these semantic attributes with encoded EEG signals, we achieve state-of-the-art (SOTA) performance in EEG-to-text generation.

🛠️ Installation & Setup

🖥️ Environment Setup

Tip

You can find all required packages in ./environment.yml

# Create environment
conda env create -f environment.yml
# Activate environment
conda activate semkey
# (Optional) Remove the environment when it is no longer needed
conda env remove -n semkey

📊 Data Preparation

1. Download ZuCo Dataset

Please download ZuCo 1.0 and 2.0 from their official sites:

ZuCo1: link
ZuCo2: link

Important

Please rename the ZuCo2 directories so that they follow the ZuCo1 task naming:
"task1 - NR" -> "task2-NR"
"task2 - TSR" -> "task3-TSR"

Please also remove extra spaces from directory names (e.g. "task1- SR" -> "task1-SR") and rename "Matlab files" -> "Matlab_files"

Please manually check for CSV errors in ZuCo1/task_materials/*.csv and put the corrected files in ZuCo1/revised_csv, or copy the provided folder from ./preprocess/resource/revised_csv
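The renaming rules above can be scripted. The following is a minimal sketch, not part of the repository: the function names and the `dry_run` flag are hypothetical, and it assumes the extra spaces only occur in directories whose names start with "task". Pass `ZUCO2_TASK_RENAMES` when processing ZuCo2 and an empty dict when processing ZuCo1.

```python
from pathlib import Path

# Explicit ZuCo2 renames taken from the note above (ZuCo2 -> ZuCo1 task naming).
ZUCO2_TASK_RENAMES = {
    "task1 - NR": "task2-NR",
    "task2 - TSR": "task3-TSR",
}


def normalized_name(name, task_renames):
    """Return the directory name this README expects for `name`."""
    if name in task_renames:
        return task_renames[name]
    if name == "Matlab files":
        return "Matlab_files"
    if name.startswith("task"):
        # Assumption: stray spaces only appear in task folders, e.g. "task1- SR".
        return name.replace(" ", "")
    return name


def rename_dirs(root, task_renames, dry_run=True):
    """Rename directories under `root` and return the list of (old, new) moves."""
    root = Path(root)
    # Deepest paths first, so renaming a parent never invalidates a child path.
    dirs = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    moves = []
    for path in dirs:
        target = path.with_name(normalized_name(path.name, task_renames))
        if target != path:
            moves.append((path, target))
            if not dry_run:
                path.rename(target)
    return moves
```

Calling `rename_dirs("datasets/ZuCo/ZuCo2", ZUCO2_TASK_RENAMES)` with the default `dry_run=True` only lists the planned moves, which makes it easy to review them before renaming anything.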

Please place necessary files under the following tree structure:

SemKey
└── datasets
    └── ZuCo
        ├── ZuCo1
        │    ├── revised_csv
        │    ├── task1-SR
        │    ├── task2-NR
        │    └── task3-TSR
        └── ZuCo2
             ├── task_materials
             ├── task2-NR
             └── task3-TSR
...
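Before preprocessing, it can help to confirm that the layout matches the tree above. This is a small sketch under stated assumptions: `EXPECTED_DIRS` lists only the directories shown in the tree (anything elided by the trailing "..." is not checked), and `missing_dirs` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Directories copied from the tree above, relative to the project root (SemKey/).
EXPECTED_DIRS = [
    "datasets/ZuCo/ZuCo1/revised_csv",
    "datasets/ZuCo/ZuCo1/task1-SR",
    "datasets/ZuCo/ZuCo1/task2-NR",
    "datasets/ZuCo/ZuCo1/task3-TSR",
    "datasets/ZuCo/ZuCo2/task_materials",
    "datasets/ZuCo/ZuCo2/task2-NR",
    "datasets/ZuCo/ZuCo2/task3-TSR",
]


def missing_dirs(project_root="."):
    """Return the expected directories that do not exist under `project_root`."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list from `missing_dirs(".")`, run from the project root, means every directory shown in the tree is in place.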

2. Preprocess

Please run the following steps in order to set up the datasets for SemKey stage 1 (parallel) training

Tip

Please run from project's root directory (i.e. SemKey/ )

Parse ZuCo sentences
Run ./preprocess/preprocess_label.py

Generate topic/sentiment/length/surprisal labels
Run ./label_generation/generate_all_labels.py

Load EEG data
Run ./preprocess/preprocess_mat.py

Merge EEG with labels
Run ./preprocess/preprocess_merge.py

Merge MTV
Copy ./preprocess/resource/zuco_label_8variants.df to ./data/zuco_preprocessed_dataframe
Run ./preprocess/preprocess_merge_MTV.py
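The five steps above can be chained in one driver script. The sketch below is illustrative only and rests on two assumptions that the steps do not confirm: each script runs without command-line arguments, and ./data/zuco_preprocessed_dataframe is a directory into which the label file is copied. `run_preprocessing` is a hypothetical name.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Scripts in the order listed above, relative to the project root.
PREPROCESS_SCRIPTS = [
    "preprocess/preprocess_label.py",           # parse ZuCo sentences
    "label_generation/generate_all_labels.py",  # topic/sentiment/length/surprisal labels
    "preprocess/preprocess_mat.py",             # load EEG data
    "preprocess/preprocess_merge.py",           # merge EEG with labels
]
MTV_LABELS = "preprocess/resource/zuco_label_8variants.df"
MTV_DEST_DIR = "data/zuco_preprocessed_dataframe"  # assumed to be a directory
MTV_SCRIPT = "preprocess/preprocess_merge_MTV.py"


def run_preprocessing(project_root="."):
    """Run every preprocessing step in order, stopping at the first failure."""
    root = Path(project_root)
    for script in PREPROCESS_SCRIPTS:
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run([sys.executable, script], cwd=root, check=True)
    # Copy the provided label variants before the final MTV merge.
    dest = root / MTV_DEST_DIR
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(root / MTV_LABELS, dest)
    subprocess.run([sys.executable, MTV_SCRIPT], cwd=root, check=True)
```

Running each script with `cwd` set to the project root mirrors the tip above about launching everything from SemKey/.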

🔄 Upgrade package: Transformers

Please run the command below once label generation has finished. This upgrade provides the cosine learning-rate schedule function.
If you use this version from the start, you will encounter a safetensors warning during label generation.

pip install --upgrade transformers==4.57.6
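To confirm which version is active before training, the installed package metadata can be compared with the pin. This is a minimal sketch using only the standard library; `installed_version` and `matches_pin` are hypothetical helper names, and the required version is simply copied from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned by the pip command above.
REQUIRED_TRANSFORMERS = "4.57.6"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed, required=REQUIRED_TRANSFORMERS):
    """True only when a version is installed and equals the pinned one."""
    return installed is not None and installed == required
```

For example, `matches_pin(installed_version("transformers"))` returns False both when the package is missing and when a different version is installed.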

🔥 Training

Tip

Please run from project's root directory (i.e. SemKey/ )

Stage 1 (SemKey Parallel)

Configure ./run_script/run_parallel.sh
Run ./run_script/run_parallel.sh

Prepare data for Stage 2

Configure ./inference/predict_semkey_parallel_and_pack.sh
-> Specify the path to the stage 1 (SemKey Parallel) checkpoint
Run ./inference/predict_semkey_parallel_and_pack.sh

Stage 2 (SemKey E2E | end-to-end training)

Configure ./run_script/run_e2e.sh
-> Specify the path to the stage 1 (SemKey Parallel) checkpoint
-> Specify the path to the stage 2 dataset (generated by ./inference/predict_semkey_parallel_and_pack.sh)
Run ./run_script/run_e2e.sh

📈 Evaluation

Tip

Please run from project's root directory (i.e. SemKey/ )

Configure ./run_script/run_evaluation_csv.sh

CSV_FILE_PATH: path to the CSV file generated when training the SemKey End-to-End (SemKey E2E) model

Run ./run_script/run_evaluation_csv.sh

Tip

The results will be saved as a JSON file next to the CSV file.
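Since the name of the JSON file is not stated here, one way to pick up the results is to read every JSON file in the directory that holds the CSV. The sketch below does that; `load_results` is a hypothetical helper, and the structure of the loaded JSON is whatever the evaluation script wrote.

```python
import json
from pathlib import Path


def load_results(csv_file_path):
    """Load every JSON file located in the same directory as the CSV file."""
    csv_path = Path(csv_file_path)
    results = {}
    for json_path in sorted(csv_path.parent.glob("*.json")):
        with json_path.open(encoding="utf-8") as f:
            results[json_path.name] = json.load(f)
    return results
```

The returned dict maps each JSON file name to its parsed content, so unrelated JSON files in the same directory will also appear and may need to be filtered out.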
