geekydevu/boolean-retrieval-model
A simple information retrieval engine, written in Python, that answers boolean queries. A boolean query combines search terms with the operators AND, OR and NOT (written as NEGATION(term) in this model).
Requirements:
[1.] IPython notebook installed
[2.] Corpus for building the boolean IR system
Notes on providing input:
-> When the IPython notebook is run, it asks the user for the number of documents in the training corpus. The documents must be named "doc"+str(doc_id)+".txt". For example, if there are 10 documents, their names should be "doc1.txt","doc2.txt","doc3.txt","doc4.txt","doc5.txt","doc6.txt","doc7.txt","doc8.txt","doc9.txt","doc10.txt".
-> The documents must be placed in a folder named 'local_corpus'. The 'local_corpus' folder and the IPython notebook file must be in the same directory.
-> The basic model supports queries built from a single operator type: AND, OR or NEGATION
e.g. 1.) machine AND learning
2.) politics AND football
3.) world AND wide AND web AND internet
4.) politics OR football
5.) information OR retrieval OR search OR engines
6.) NEGATION(supervised)
7.) NEGATION(regression)
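The notes above can be illustrated with a minimal sketch of a boolean retrieval pipeline. This is not the notebook's actual code: the function names load_corpus, build_index and evaluate, the tokenization rule, the case-insensitive handling of operators and the three sample documents are all assumptions made for illustration. The sketch reads files named as described above, builds an inverted index (term to set of document ids), and answers the three supported query forms with set intersection, union and complement.

```python
import os
import re


def load_corpus(num_docs, folder="local_corpus"):
    """Read doc1.txt .. doc<num_docs>.txt from `folder` into {doc_id: text}.

    Hypothetical helper: it follows the naming scheme from the notes above.
    """
    docs = {}
    for doc_id in range(1, num_docs + 1):
        path = os.path.join(folder, "doc" + str(doc_id) + ".txt")
        with open(path, encoding="utf-8") as f:
            docs[doc_id] = f.read()
    return docs


def build_index(docs):
    """Build an inverted index: lowercased term -> set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        # Assumed tokenization: runs of ASCII letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(term, set()).add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate 'a AND b ...', 'a OR b ...', a single term, or 'NEGATION(a)'."""
    query = query.strip()
    negation = re.fullmatch(r"NEGATION\((\w+)\)", query, flags=re.IGNORECASE)
    if negation:
        # Complement of the term's posting set within the whole corpus.
        return all_ids - index.get(negation.group(1).lower(), set())

    tokens = query.split()
    if len(tokens) % 2 == 0:
        raise ValueError("malformed query: " + query)
    terms = tokens[0::2]
    # Assumption of this sketch: operators are matched case-insensitively.
    operators = {op.upper() for op in tokens[1::2]}
    postings = [index.get(term.lower(), set()) for term in terms]

    if not operators:
        return set(postings[0])
    if operators == {"AND"}:
        return set.intersection(*postings)
    if operators == {"OR"}:
        return set.union(*postings)
    raise ValueError("unsupported or mixed operators: " + query)


# In-memory sample documents (made up for this example), so the sketch
# runs without a 'local_corpus' folder on disk.
docs = {
    1: "Machine learning covers supervised and unsupervised methods.",
    2: "Politics and football dominate the news.",
    3: "Linear regression is a supervised learning technique.",
}
index = build_index(docs)
all_ids = set(docs)

print(evaluate("machine AND learning", index, all_ids))
print(evaluate("politics OR football", index, all_ids))
print(evaluate("NEGATION(supervised)", index, all_ids))
```

To run the same sketch against files on disk, replace the sample dictionary with docs = load_corpus(n), where n is the number of documents in 'local_corpus'. Keeping each posting list as a Python set makes the three operators map directly onto intersection, union and difference, at the cost of not supporting mixed or nested expressions.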