several bug fixes to twsi_evaluation.py script (see comments near changes) by mpelevina · Pull Request #2 · uhh-lt/context-eval

mpelevina · 2016-01-21T17:20:18Z

Please review my suggested bug fixes for the evaluation script.

alexanderpanchenko · 2016-01-21T17:33:46Z

why python engine?

mpelevina · 2016-01-21T23:20:29Z

The original command triggers the following error:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  substitutions = read_csv(INVENTORY_FILE, '/\t+/', encoding='utf8', header=None)
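For reference, the warning can be reproduced on a small in-memory sample. The sketch below assumes a tab-separated three-column layout; the sample rows and the helper `read_with` are made up for illustration and are not taken from the repository. As far as I can tell, pandas interprets any separator longer than one character as a regular expression, so '/\t+/' means "a slash, one or more tabs, a slash" rather than "one or more tabs", and it never matches a plain tab-separated row.

```python
import io
import warnings

import pandas as pd

# Hypothetical two-row sample standing in for INVENTORY_FILE.
SAMPLE = "bank\t1\tmoney:2,finance:1\nbank\t2\triver:3,shore:1\n"


def read_with(sep, **kwargs):
    """Parse the sample with the given separator (illustrative helper)."""
    return pd.read_csv(io.StringIO(SAMPLE), sep=sep, header=None, **kwargs)


# Default engine plus a regex separator: pandas falls back to the
# 'python' engine and emits a ParserWarning, which we record here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    regex_default = read_with('/\t+/')
warned = any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)

# engine='python' avoids the fallback warning, but the pattern still does
# not match a bare tab, so every row stays in a single column.
regex_python = read_with('/\t+/', engine='python')

# A plain tab separator splits each row into three columns.
plain_tab = read_with('\t')

print(warned, regex_python.shape, plain_tab.shape)
```

If this reading is right, specifying engine='python' only silences the warning; it does not make the separator split anything.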

alexanderpanchenko · 2016-01-22T12:00:01Z

The proper fix would rather be:

  substitutions = read_csv(INVENTORY_FILE, sep='\t', encoding='utf8', header=None)

But we must be sure that INVENTORY_FILE contains no runs of multiple tabs (which would itself be a bug).

Can you update it, check that it works, and make another pull request?
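A quick way to check for repeated tabs before switching to a single-tab separator might look like the sketch below. The function name `rows_with_repeated_tabs` and the sample rows are assumptions for illustration; in practice the function would be given the opened INVENTORY_FILE instead of the in-memory sample.

```python
import io


def rows_with_repeated_tabs(lines):
    """Return 1-based line numbers of rows that contain consecutive tabs."""
    return [n for n, line in enumerate(lines, start=1) if "\t\t" in line]


# Hypothetical sample: the second row has a doubled tab.
sample = io.StringIO("bank\t1\tmoney:2\nbank\t\t2\triver:3\n")
bad_rows = rows_with_repeated_tabs(sample)
print(bad_rows)  # [2]
```

With a single-tab separator, a doubled tab produces an empty field and shifts the remaining values one column to the right, so any row reported here would need fixing in the data.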

mpelevina · 2016-01-22T12:21:18Z

This (using '\t') is indeed what I tried at first, but using it would require rewriting parts of this script.
The thing is, the read_csv command doesn't actually separate rows into the expected columns (word, sense id, substitutions/related_words). As you can see on line 103, the actual separation of a row with '\t' takes place there:

word, t_id, subs = s[0].split('\t')

The same applies to all three calls of read_csv (substitutions, inventory, predictions). It did seem strange to me, but I've decided not to change it without knowing why it had been implemented like this. I think it might be brought up as an "issue".
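One possible shape for such a rewrite is to let read_csv produce the three columns directly, so the manual split is no longer needed. This is only a sketch: the column names and sample rows are hypothetical, and the real files may need additional options.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for one of the input files.
SAMPLE = "bank\t1\tmoney:2,finance:1\nbank\t2\triver:3,shore:1\n"

# Assumed column names, following the description above.
COLUMNS = ["word", "sense_id", "related_words"]

inventory = pd.read_csv(
    io.StringIO(SAMPLE),
    sep="\t",        # single-character separator, handled by the C engine
    header=None,
    names=COLUMNS,
    dtype=str,       # keep sense ids as strings rather than integers
)

# Each row now unpacks into three values without s[0].split('\t').
for word, sense_id, related_words in inventory.itertuples(index=False):
    print(word, sense_id, related_words.split(","))
```

Reading with dtype=str is a design choice made here so that the values match what the manual split would have produced.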

alexanderpanchenko · 2016-01-22T12:42:35Z

word, t_id, subs = s[0].split('\t')

This is a really strange way to do it indeed.

It would be great if you could fix it by changing the corresponding code. Take your time. We will need this evaluation code a lot, so it is better to get it right than to monkey-patch it.

If you need to modify the format of the input files, go ahead.

mpelevina · 2016-01-22T14:11:54Z

Ok, I've put it on my todo list.

alexanderpanchenko · 2016-01-22T16:27:52Z

ok. thank you

several bug fixes (see comments near changes)

a32a33c

mpelevina wants to merge 1 commit into uhh-lt:master from mpelevina:master

2 participants
