Dead project? An older version in another GitHub repo still works; here is a how-to for the record. #31

In case anyone is interested in running this: as of today, I was able to run the earlier version of this project, released by the same author back in the day and documented here:

> https://github.com/StanfordHCI/termite/blob/master/README.old 

While I was looking for a way to make the current project run, I found that this .txt file works as example input:

> https://github.com/YingHsuan/termite_data_server/blob/master/apps/mobile_payment_mallet/data/corpus.txt

It is a good example that matches the input format of the old version.

The old README does a better job of explaining how to run the code. One thing to look out for: the install script throws an error when it reaches `compiler-latest.zip`, saying it couldn't move the file. As of today (I was surprised to see all the download links still working four years later!), the .zip contains a jar file named with the expected name plus a version number. Simply extract the closure<version>.jar file and rename it to `closure.jar` inside the `lib` folder. Re-running the script will then rename it to the intended name and finish the installation.
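If you would rather script that fix than do it by hand, here is a minimal Python sketch. It assumes the zip has already been downloaded to a path you can point at and that it contains a single jar whose file name starts with `closure`; `install_closure_jar` is a hypothetical helper, not part of the project's install script.

```python
import shutil
import zipfile
from pathlib import Path


def install_closure_jar(zip_path: str, lib_dir: str = "lib") -> Path:
    """Extract the versioned Closure jar from the downloaded zip and save it as <lib_dir>/closure.jar.

    Assumption: the archive holds one jar whose file name starts with "closure"
    (e.g. closure<version>.jar); adjust the match if your download differs.
    """
    lib = Path(lib_dir)
    lib.mkdir(parents=True, exist_ok=True)
    target = lib / "closure.jar"
    with zipfile.ZipFile(zip_path) as archive:
        # Look for the versioned jar anywhere inside the archive.
        candidates = [
            name for name in archive.namelist()
            if Path(name).name.startswith("closure") and name.endswith(".jar")
        ]
        if not candidates:
            raise FileNotFoundError(f"no closure*.jar found in {zip_path}")
        # Copy the jar's bytes straight to lib/closure.jar, dropping any inner folder.
        with archive.open(candidates[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

For example, `install_closure_jar("path/to/compiler-latest.zip")` writes `lib/closure.jar` relative to the current directory; after that, re-run the install script as described above.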

When running the script, I had some issues with the config file path, but the script lets you pass the three paths explicitly:

```shell
./execute.py --corpus-path ~/Desktop/finance_corpus.txt carlos_lda.cfg --model-path example-project2/topic-model/ --data-path example-project2/
```

It will create the folders for you, or overwrite them if they already exist. From left to right, the paths in the command above correspond to the paths listed in the `.cfg` file from top to bottom. Provide the corpus.txt from the link above (or any file that follows the format `doc-id\ttext`) and it should work. In practice, I ran into errors somewhere along the pipeline, whereas this corpus, which is already tokenized, ran like a breeze, so I imagine it is best to tokenize your text with some other library before feeding it in.
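Here is a minimal Python sketch of preparing a corpus in the `doc-id\ttext` format, tokenizing it before it reaches the pipeline. The regex tokenizer is a crude stand-in assumed for illustration (not what the linked corpus was built with), and `write_corpus` is a hypothetical helper; a proper NLP tokenizer is likely a better choice.

```python
import re
from typing import Iterable, Tuple

# Crude stand-in tokenizer (an assumption, not the linked corpus's method):
# lowercase, then keep runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> str:
    """Return the text as space-separated tokens (never contains tabs or newlines)."""
    return " ".join(TOKEN_RE.findall(text.lower()))


def write_corpus(docs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (doc_id, raw_text) pairs as one `doc-id<TAB>tokens` line each; return lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for doc_id, text in docs:
            if "\t" in doc_id or "\n" in doc_id:
                raise ValueError(f"doc id must not contain tabs or newlines: {doc_id!r}")
            tokens = tokenize(text)
            if not tokens:
                continue  # skip documents that are empty after tokenization
            out.write(f"{doc_id}\t{tokens}\n")
            count += 1
    return count
```

Skipping empty documents is a design choice here: an empty line after the tab seems more likely to trip up a downstream step than to help.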

Finally, in an issue on the old project, the author of the code pointed out that a small corpus may throw an error, apparently from running out of vocabulary or something similar.
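The old project doesn't state a minimum corpus size, but a quick check before a long run can at least show how small the corpus and its vocabulary are. This is a hypothetical Python helper assuming the same `doc-id\ttext` format; it only reports counts and makes no claim about which threshold triggers the error.

```python
from collections import Counter
from typing import Dict


def corpus_stats(path: str) -> Dict[str, int]:
    """Count documents, tokens and distinct terms in a `doc-id<TAB>text` corpus file."""
    docs = 0
    vocab: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            doc_id, sep, text = line.partition("\t")
            if not sep:
                raise ValueError(f"line {line_no}: missing tab after doc id {doc_id!r}")
            docs += 1
            vocab.update(text.split())
    return {"documents": docs, "tokens": sum(vocab.values()), "vocabulary": len(vocab)}
```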

Building the visualization for this file took about half an hour on a 2016 MacBook with 16 GB of RAM. By contrast, running LDA with the R `topicmodels` package takes about 3 minutes, plus about 1 minute to load the results into another visualization project that cites this one ([LDAVis](https://github.com/cpsievert/LDAvis) on GitHub).

I wish the visualization didn't try to run the entire process from the start, but instead took the model output as input, as the LDAvis authors did (i.e. the matrices and a few vectors). That would make it much more reusable.
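For reference, LDAvis's R function `createJSON()` takes that kind of input: `phi` (topics x terms), `theta` (documents x topics), `doc.length`, `vocab` and `term.frequency`. The Python sketch below, with hypothetical helper names, derives the three corpus-level vectors from a tokenized corpus and checks that the model's matrices line up with them; fitting the model itself (e.g. with `topicmodels`) is out of scope here.

```python
from collections import Counter
from typing import Dict, List


def ldavis_inputs(docs: List[List[str]]) -> Dict[str, list]:
    """Build vocab, term_frequency and doc_length from tokenized documents.

    phi and theta come from the fitted model and are not computed here.
    """
    term_counts = Counter(tok for doc in docs for tok in doc)
    vocab = sorted(term_counts)  # phi's columns must follow this order
    return {
        "vocab": vocab,
        "term_frequency": [term_counts[t] for t in vocab],
        "doc_length": [len(doc) for doc in docs],
    }


def check_shapes(phi: List[List[float]], theta: List[List[float]], inputs: Dict[str, list]) -> None:
    """Raise ValueError unless phi is topics x vocab, theta is docs x topics, and rows sum to 1."""
    n_terms = len(inputs["vocab"])
    n_docs = len(inputs["doc_length"])
    n_topics = len(phi)
    if any(len(row) != n_terms for row in phi):
        raise ValueError("phi must have one column per vocabulary term")
    if len(theta) != n_docs or any(len(row) != n_topics for row in theta):
        raise ValueError("theta must be documents x topics")
    # Every topic-term row and every document-topic row is a probability distribution.
    if any(abs(sum(row) - 1.0) > 1e-6 for row in phi + theta):
        raise ValueError("rows of phi and theta must each sum to 1")
```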

If anyone is interested in what the final output looks like, here it is:

<img width="1396" alt="screenshot 2017-07-07 22 16 56" src="https://user-images.githubusercontent.com/17270563/27983854-d13aeb68-6362-11e7-8a80-1252b0ab81ce.png">

You can also select multiple topics. Sadly, the old version does not include the document view pane, and the project now seems abandoned.