Kim's local files for the 2013 computational linguistics Multi-University Research Initiative at Carnegie Mellon University. This repository contains the custom files used to crawl the internet for Kinyarwanda text with the Apache Nutch web crawler. The crawl yielded 5.5 million tokens across 2.5k types. Summer 2013, Carnegie Mellon University, Language Technologies Institute.
spasarok/kinyarwandaCrawler