JLPT Vocabulary in JSON and CSV Formats

The original files from https://www.tanos.co.uk/jlpt/ are distributed in hard-to-read formats such as .anki, .mem, .doc, and .pdf. This project aims to provide the same data in a clean, lossless, and easy-to-use format.

You can download the files here: Latest release

The project files include two parsed versions of the PDF files: a raw parse with minimal manipulation of the data, and a more heavily filtered version named "..._cleaned". The cleaned files use a stricter filter and include the following manual changes:

Manual Changes:
n1_vocab_cleaned.csv:
Removed: "対立,たいりつ". Reason: Already defined in N2.
n2_vocab_cleaned.csv:
Updated: "=立" to "対立". Reason: Likely a conversion bug.
Removed: "あげる (=やる),あげる (=やる)". Reason: Does not correspond to any word that is not already defined.
n3_vocab_cleaned.csv:
Removed: "暖かい,あたたか(い)". Reason: Already defined in N5.
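Duplicates like the ones removed above can be spotted mechanically. The following is a minimal sketch, not the method used by this project: it assumes each CSV row starts with the word followed by its reading and that there is no header row, neither of which is confirmed here. The function name `cross_level_duplicates` and the in-memory sample rows are illustrative only.

```python
import csv
import io


def cross_level_duplicates(sources):
    """Find (word, reading) pairs that appear in more than one level.

    `sources` maps a JLPT level number to an iterable of CSV lines
    (an open file works). Assumption: column 0 is the word and
    column 1 is the reading, with no header row.
    """
    seen = {}
    for level, lines in sources.items():
        for row in csv.reader(lines):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            seen.setdefault((row[0], row[1]), set()).add(level)
    # Keep only the pairs recorded under two or more levels
    return {pair: sorted(levels) for pair, levels in seen.items() if len(levels) > 1}


# Illustrative sample rows, standing in for the real CSV files
sample = {
    1: io.StringIO("対立,たいりつ\n"),
    2: io.StringIO("対立,たいりつ\n"),
    3: io.StringIO("挨拶,あいさつ\n"),
}
print(cross_level_duplicates(sample))
```

With real data, the `io.StringIO` objects would be replaced by files opened with `encoding="utf-8"` and `newline=""`.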

NOTE: NEW IN v1.4:

With the new data structure it is now possible to check the reading of a word. This addresses problems with entries such as "年", which can be read as "とし", "ねん", or "とせ", not to mention all the readings of 生.

Kanji and readings with multiple levels are preserved:

    "挨拶": [
        {
            "reading": "あいさつ",
            "level": 3
        }
    ],
    "あいさつ": [
        {
            "reading": "あいさつ",
            "level": 4
        }
    ],
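As a rough illustration of how this structure can be queried, the sketch below looks up the levels recorded for a word and optionally restricts the result to one reading. It assumes the top-level JSON value is an object keyed by word, as in the fragment above. The helper name `find_levels` is hypothetical, and the data is inlined rather than loaded from a release file.

```python
import json

# Inline sample in the shape shown above (not the full data set)
SAMPLE = json.loads("""
{
    "挨拶": [{"reading": "あいさつ", "level": 3}],
    "あいさつ": [{"reading": "あいさつ", "level": 4}]
}
""")


def find_levels(vocab, word, reading=None):
    """Return the levels recorded for `word`.

    If `reading` is given, only entries with that exact reading count.
    Unknown words yield an empty list.
    """
    entries = vocab.get(word, [])
    return [e["level"] for e in entries if reading is None or e["reading"] == reading]


print(find_levels(SAMPLE, "挨拶"))                       # all levels for the kanji form
print(find_levels(SAMPLE, "あいさつ", reading="あいさつ"))  # levels for the kana form
```

Because every word maps to a list of entries, a word with several readings or levels simply has more than one element in its list.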

When the data is used together with JMdict, the following words are not found:
"依", "伊井", "お八", "僅", "巨", "佐", "しいんと", "働", "伐", "不山戯る", "倣", "藍褸", "あひら", "いっていらっしゃい", "おげんきで", "おまちください", "ごぞんじですか", "滑れる", "Ͼ立", "×", "ぺん", "よると"
