📚 Table of Contents

A program that automates extraction, cleaning, analysis, and scoring of sentence productivity. This tool is designed to process .cha and .txt transcript files, extract [+rr] utterances, and compute productivity metrics.

📚 Table of Contents

Prerequisites
Setup
Running the Pipeline
Command-Line Interface
Python Interface
Project Structure

🧾 Prerequisites

Python 3.10+
stanza NLP library (pip install stanza)
ffmpeg if processing audio transcripts in future
Optional: GPU for faster NLP processing

⚙️ Setup

Create a virtual enviroment of your choice (conda/venv), then clone repository:

git clone https://github.com/pyraxal/automatic_productivity_scorer
cd automatic_productivity_scorer
pip install -r requirements.txt

Stanza models are automatically downloaded when first running analysis.

🛠 Running the Pipeline

Command-Line Interface (CLI)

Drop your .cha or .txt transcript files in the input/ folder next to main.py. Then run:

python main.py

Processed CSVs are saved in output/
Original files are moved to done/
You can also specify a single file or folder: python main.py -p input/myfile.cha python main.py -p input_folder/

Python Interactive Interface

You can also call the analysis pipeline from Python:

from interface import run_full_pipeline

with open("input/sample.cha", "r", encoding="utf-8") as f:
    text = f.read()

output_csv = "output/sample_results.csv"
run_full_pipeline(text, output_csv)

📁 Project Structure

├── main.py                 # Command-line script
├── interface.py            # Python interface to run full pipeline
├── extract_clean.py        # Extracts [+rr] utterances and cleans text
├── analyze.py              # NLP analysis for articles, auxiliaries, and progressive forms
├── score.py                # Scoring and CSV output
├── input/                  # Place raw transcript files here
    ├── something.cha  
├── output/                 # CSV results are saved here
    ├── something.csv 
├── processed/              # Processed files are moved here
    ├── something.cha 
└── README.txt

⚠️ Notes / WIP

Currently focused on English transcripts
Scoring metrics and technical definitions are a work in progress with SLP collaboration.
Future updates will integrate api access + restructure of cs
Better comments and documentation (speed release/prototype for testing, finalization later)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📚 Table of Contents

🧾 Prerequisites

⚙️ Setup

🛠 Running the Pipeline

Command-Line Interface (CLI)

Python Interactive Interface

📁 Project Structure

⚠️ Notes / WIP

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
LICENSE		LICENSE
analyze.py		analyze.py
compact.py		compact.py
extract_clean.py		extract_clean.py
interface.py		interface.py
main.py		main.py
readme.md		readme.md
requirements.txt		requirements.txt
score.py		score.py

Folders and files

Latest commit

History

Repository files navigation

📚 Table of Contents

🧾 Prerequisites

⚙️ Setup

🛠 Running the Pipeline

Command-Line Interface (CLI)

Python Interactive Interface

📁 Project Structure

⚠️ Notes / WIP

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages