OCR-Kanji

A hobby project: a lightweight CNN that detects Japanese kanji and classifies them by difficulty level (N5, N4, etc.). It includes a training data generator that creates Japanese newspaper-like images with position and class labels.

Training data generation

First, review the parameters in the file below, then run it to generate training images and labels (about 1000 images are recommended).

generate_data.py

You can look at the generated labels for an example image by running:

visualise.py
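
For orientation, here is a minimal sketch of what position and class labels for a newspaper-like page could look like. It is not the code in generate_data.py: the `KanjiLabel` fields, the `layout_page` function, the page and glyph sizes, and the sample characters with their levels are all illustrative assumptions. The sketch only lays out character boxes in vertical columns read right to left; drawing the glyphs onto an image is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class KanjiLabel:
    char: str   # the character placed in this box
    level: str  # hypothetical class name, e.g. "N5"
    x: int      # left edge of the box, in pixels
    y: int      # top edge of the box, in pixels
    size: int   # side length of the square box, in pixels


def layout_page(chars_by_level, width=512, height=512, glyph=32, margin=16, seed=0):
    """Place random characters in vertical columns, filled from right to left."""
    rng = random.Random(seed)
    levels = sorted(chars_by_level)
    labels = []
    x = width - margin - glyph
    while x >= margin:
        y = margin
        while y + glyph <= height - margin:
            level = rng.choice(levels)
            char = rng.choice(chars_by_level[level])
            labels.append(KanjiLabel(char, level, x, y, glyph))
            y += glyph
        x -= glyph + glyph // 4  # leave a small gap between columns
    return labels


# Placeholder characters and levels, for illustration only.
sample_chars = {"N5": ["日", "本", "人"], "N4": ["会", "同", "事"]}
labels = layout_page(sample_chars)
```

Each label carries both the box position and the class, which is the information a detector of this kind needs per character.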

Training the model

The model architecture is defined in:

model1.py

Run:

train.py

The training script uses a data generator defined in:

dataset.py
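
As a rough illustration of the data generator idea, the sketch below yields shuffled batches of (image, label) pairs indefinitely. It is framework-agnostic and is not the interface of dataset.py: the `batch_generator` name, its parameters, and the in-memory `samples` list are assumptions made for this example.

```python
import random


def batch_generator(samples, batch_size, shuffle=True, seed=0):
    """Yield (images, labels) batches forever, reshuffling at each epoch."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    while True:
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            images = [samples[i][0] for i in idx]
            labels = [samples[i][1] for i in idx]
            yield images, labels


# Placeholder samples: strings stand in for image arrays, ints for labels.
samples = [(f"img{i}", i) for i in range(5)]
gen = batch_generator(samples, batch_size=2, shuffle=False)
first_images, first_labels = next(gen)
```

A real generator would load the images from disk and convert each batch to the array type the training framework expects.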

Predict

Run:

evaluate.py

The script takes the image specified in the file, splits it into patches to feed to the network, and then predicts the location and difficulty level of each kanji it finds. The results are not perfect.
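
The patch-splitting step can be sketched as follows. This is an assumed approach, not the code in evaluate.py: the function names, the patch size of 128, and the stride of 96 are hypothetical. The sketch computes the top-left corner of each patch so that the whole image is covered, and shifts a box predicted inside a patch back into full-image coordinates.

```python
def patch_origins(length, patch, stride):
    """Start offsets covering [0, length), with the last patch flush to the edge."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)
    return origins


def split_into_patches(width, height, patch=128, stride=96):
    """Return the (x, y) top-left corner of every patch, row by row."""
    return [
        (x, y)
        for y in patch_origins(height, patch, stride)
        for x in patch_origins(width, patch, stride)
    ]


def to_image_coords(box, origin):
    """Shift a patch-local (x, y, w, h) box into full-image coordinates."""
    x, y, w, h = box
    ox, oy = origin
    return (x + ox, y + oy, w, h)


corners = split_into_patches(300, 128)
global_box = to_image_coords((5, 6, 10, 10), corners[1])
```

With a stride smaller than the patch size, neighbouring patches overlap, so the same kanji may be detected more than once and the duplicates would need to be merged afterwards.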

Improvements

It would probably be a good idea to add katakana and hiragana classes. The training data generator should also include larger characters, different fonts, and more diverse backgrounds.
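
One possible way to assign the extra classes when generating labels is to look at each character's Unicode block. The `script_class` helper below is a hypothetical sketch, not part of the project; it relies on the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks.

```python
def script_class(ch):
    """Return a coarse script class for a single character."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


classes = [script_class(c) for c in "あア日A"]
```

Kanji would still need a separate lookup to map each character to its difficulty level.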
