Beyond LLM Priors: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

📌 Overview

Architecture of the SemKey framework

SemKey is a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. By combining these semantic attributes with encoded EEG signals, we achieve state-of-the-art (SOTA) performance in EEG-to-text generation.

🛠️ Installation & Setup

🖥️ Environment Setup

Tip

You can find all required packages in ./environment.yml

# Create environment
conda env create -f environment.yml
# Activate environment
conda activate semkey

# Optional: remove the environment when it is no longer needed
conda env remove -n semkey

📊 Data Preparation

1. Download ZuCo Dataset

Please download ZuCo 1.0 and 2.0 from their official site:

ZuCo1: link
ZuCo2: link

Important

Rename the ZuCo2 task directories so that they follow the ZuCo1 task naming:
"task1 - NR" -> "task2-NR"
"task2 - TSR" -> "task3-TSR"

Also remove extra spaces from directory names (e.g. "task1- SR" -> "task1-SR") and rename "Matlab files" -> "Matlab_files".
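
The renames described above can be scripted. The following is a minimal sketch for a POSIX shell; the function names (rename_zuco2_tasks, strip_task_spaces, fix_matlab_dirs) are hypothetical helpers and not part of this repository, and the sketch assumes that none of the target names exist yet.

```shell
# Hypothetical helpers (not shipped with SemKey) for the renames above.

# Map the ZuCo2 task names onto the ZuCo1 numbering. $1 = path to ZuCo2.
rename_zuco2_tasks() {
    if [ -d "$1/task1 - NR" ]; then mv "$1/task1 - NR" "$1/task2-NR"; fi
    if [ -d "$1/task2 - TSR" ]; then mv "$1/task2 - TSR" "$1/task3-TSR"; fi
}

# Remove spaces from task* directory names directly under $1,
# e.g. "task1- SR" -> "task1-SR".
strip_task_spaces() {
    for d in "$1"/task*; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        clean=$(printf '%s' "$name" | tr -d ' ')
        [ "$name" = "$clean" ] || mv "$d" "$1/$clean"
    done
}

# Rename every "Matlab files" directory below $1 to "Matlab_files".
# -depth visits children first, so nested matches stay reachable.
fix_matlab_dirs() {
    find "$1" -depth -type d -name "Matlab files" | while IFS= read -r d; do
        mv "$d" "$(dirname "$d")/Matlab_files"
    done
}
```

A possible call order is rename_zuco2_tasks on the ZuCo2 folder first, then strip_task_spaces on the ZuCo1 folder, then fix_matlab_dirs on their common parent. Running strip_task_spaces on ZuCo2 before the task renames would produce "task1-NR" rather than "task2-NR", so the order matters.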

Manually check the CSV files in ZuCo1/task_materials/*.csv for errors and put the corrected files in ZuCo1/revised_csv, or copy the provided folder from ./preprocess/resource/revised_csv.

Please place necessary files under the following tree structure:

SemKey
└── datasets
    └── ZuCo
        ├── ZuCo1
        │    ├── revised_csv
        │    ├── task1-SR
        │    ├── task2-NR
        │    └── task3-TSR
        └── ZuCo2
             ├── task_materials
             ├── task2-NR
             └── task3-TSR
...
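
Before preprocessing, it can help to confirm that the directories shown in the tree are in place. The check below is a minimal sketch, assuming it is run from the project root so that datasets/ZuCo is the default base path; check_zuco_layout is a hypothetical helper, and it only covers the directories listed in the tree, not anything elided by "...".

```shell
# Hypothetical helper (not part of SemKey): report missing dataset folders.
# $1 = base directory, defaults to datasets/ZuCo relative to the project root.
check_zuco_layout() {
    base="${1:-datasets/ZuCo}"
    missing=0
    for rel in \
        ZuCo1/revised_csv ZuCo1/task1-SR ZuCo1/task2-NR ZuCo1/task3-TSR \
        ZuCo2/task_materials ZuCo2/task2-NR ZuCo2/task3-TSR; do
        if [ ! -d "$base/$rel" ]; then
            echo "missing: $base/$rel"
            missing=1
        fi
    done
    # Exit status 0 means every listed directory exists.
    return "$missing"
}
```

Calling check_zuco_layout with no argument prints one line per missing directory and returns a non-zero status if any are absent.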

2. Preprocess

Run the following steps in order to set up the datasets for SemKey stage 1 (parallel) training.

Tip

Run from the project's root directory (i.e. SemKey/).

Parse ZuCo sentences
Run ./preprocess/preprocess_label.py

Generate topic/sentiment/length/surprisal labels
Run ./label_generation/generate_all_labels.py

Load EEG data
Run ./preprocess/preprocess_mat.py

Merge EEG with labels
Run ./preprocess/preprocess_merge.py

Merge MTV
Copy ./preprocess/resource/zuco_label_8variants.df to ./data/zuco_preprocessed_dataframe
Run ./preprocess/preprocess_merge_MTV.py
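
The five steps above can be chained so that a failure stops the sequence. This is a minimal sketch under several assumptions: the scripts take no command-line arguments, ./data/zuco_preprocessed_dataframe already exists as a directory by the time the copy runs, and the interpreter is called python. The run_preprocess function and the PYTHON override variable are hypothetical and not part of the repository.

```shell
# Hypothetical wrapper (not part of SemKey); run from the project root.
run_preprocess() {
    # PYTHON may be overridden, e.g. to point at a specific interpreter.
    py="${PYTHON:-python}"

    # Steps 1-4: parse sentences, generate labels, load EEG, merge.
    for script in \
        ./preprocess/preprocess_label.py \
        ./label_generation/generate_all_labels.py \
        ./preprocess/preprocess_mat.py \
        ./preprocess/preprocess_merge.py; do
        "$py" "$script" || return 1
    done

    # Step 5: copy the provided label variants, then merge MTV.
    # Assumes the destination directory was created by an earlier step.
    cp ./preprocess/resource/zuco_label_8variants.df \
        ./data/zuco_preprocessed_dataframe || return 1
    "$py" ./preprocess/preprocess_merge_MTV.py
}
```

Setting PYTHON=echo before calling run_preprocess prints the script paths in order instead of executing them, which gives a quick dry run of the sequence (the copy step still runs).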

🔄 Upgrade package: `Transformers`

Run the command below. The upgrade provides the cosine learning-rate schedule function.
If you use this version from the start instead of upgrading at this point, you will encounter a safetensors warning during label generation.

pip install --upgrade transformers==4.57.6

🔥 Training

Tip

Run from the project's root directory (i.e. SemKey/).

Stage 1 (SemKey Parallel)

Configure ./run_script/run_parallel.sh
Run ./run_script/run_parallel.sh

Prepare data for Stage 2

Configure ./inference/predict_semkey_parallel_and_pack.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
Run ./inference/predict_semkey_parallel_and_pack.sh

Stage 2 (SemKey E2E | end-to-end training)

Configure ./run_script/run_e2e.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
-> You need to specify path-to-stage2dataset (generated by ./inference/predict_semkey_parallel_and_pack.sh)
Run ./run_script/run_e2e.sh
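
If the checkpoint and dataset paths are known in advance and all three scripts are already configured, the stages can be run back to back. The sketch below is one way to do that, assuming the scripts can be launched with bash; run_stage, run_semkey_training, and the RUNNER variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper (not part of SemKey); run from the project root.

# Run one stage and report a failure on stderr.
# $1 = label for log lines, $2 = script path.
run_stage() {
    echo "[$1] starting $2"
    "${RUNNER:-bash}" "$2" || { echo "[$1] failed" >&2; return 1; }
}

# Stage 1 -> pack Stage 2 data -> Stage 2, stopping at the first failure.
run_semkey_training() {
    run_stage "stage1" ./run_script/run_parallel.sh &&
    run_stage "pack" ./inference/predict_semkey_parallel_and_pack.sh &&
    run_stage "stage2" ./run_script/run_e2e.sh
}
```

In practice the Stage 1 checkpoint path is often only known once Stage 1 has finished, in which case running the three scripts by hand, as listed above, remains the simpler route.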

📈 Evaluation

Tip

Run from the project's root directory (i.e. SemKey/).

Configure ./run_script/run_evaluation_csv.sh

CSV_FILE_PATH: path to the CSV file generated when training the SemKey End-to-End (SemKey E2E) model

Run ./run_script/run_evaluation_csv.sh

Tip

The results are saved as JSON next to the CSV file.
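
To locate the output after evaluation, one option is to list the JSON files in the directory that holds the CSV. This is a minimal sketch: list_eval_json is a hypothetical helper, and because the exact JSON file name is not stated here, it simply prints every .json file found in that directory.

```shell
# Hypothetical helper (not part of SemKey).
# $1 = the same path that was used for CSV_FILE_PATH.
list_eval_json() {
    csv="$1"
    if [ ! -f "$csv" ]; then
        echo "CSV not found: $csv" >&2
        return 1
    fi
    # Print each JSON file that sits in the same directory as the CSV.
    for f in "$(dirname "$csv")"/*.json; do
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

For example, list_eval_json "$CSV_FILE_PATH" prints the candidate result files, or an error if the CSV path is wrong.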
