Distributed LDA: checking the length of docs instead of the boolean value #1191

saparina · 2017-03-07T22:34:45Z

Possibly solves issue #911.

tmylk · 2017-03-07T23:13:40Z

Could you please set up the distributed workers on your box and check whether this actually solves #911? Have you been able to reproduce #911?

saparina · 2017-03-08T08:07:39Z

@tmylk Yes, I reproduced #911 the way it is described in the issue:

import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

from gensim import models, corpora
corpus = corpora.MmCorpus('deerwester.mm') # load a corpus of nine documents, from the Tutorials
id2word = corpora.Dictionary.load('deerwester.dict')

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=100, distributed=True)

I got the same error:

2017-03-08 10:40:51,078 : INFO : loaded corpus index from deerwester.mm.index
2017-03-08 10:40:51,078 : INFO : initializing corpus reader from deerwester.mm
2017-03-08 10:40:51,078 : INFO : accepted corpus with 9 documents, 12 features, 28 non-zero entries
2017-03-08 10:40:51,078 : INFO : loading Dictionary object from deerwester.dict
2017-03-08 10:40:51,078 : INFO : loaded deerwester.dict
2017-03-08 10:40:51,079 : INFO : using symmetric alpha at 0.01
2017-03-08 10:40:51,079 : INFO : using symmetric eta at 0.08333333333333333
2017-03-08 10:40:51,147 : INFO : using distributed version with 2 workers
2017-03-08 10:40:51,163 : INFO : running online LDA training, 100 topics, 1 passes over the supplied corpus of 9 documents, updating model once every 9 documents, evaluating perplexity every 9 documents, iterating 50x with a convergence threshold of 0.001000
2017-03-08 10:40:51,163 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
2017-03-08 10:40:51,163 : INFO : initializing 2 workers
Traceback (most recent call last):
  File "LDA+issue.py", line 9, in <module>
    lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=100, distributed=True)
  File "/home/irina/GSoC/gensim/gensim/models/ldamodel.py", line 334, in __init__
    self.update(corpus, chunks_as_numpy=use_numpy)
  File "/home/irina/GSoC/gensim/gensim/models/ldamodel.py", line 635, in update
    self.log_perplexity(chunk, total_docs=lencorpus)
  File "/home/irina/GSoC/gensim/gensim/models/ldamodel.py", line 526, in log_perplexity
    perwordbound = self.bound(chunk, subsample_ratio=subsample_ratio) / (subsample_ratio * corpus_words)
  File "/home/irina/GSoC/gensim/gensim/models/ldamodel.py", line 727, in bound
    gammad, _ = self.inference([doc])
  File "/home/irina/GSoC/gensim/gensim/models/ldamodel.py", line 428, in inference
    if doc and not isinstance(doc[0][0], six.integer_types):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In distributed mode, chunks are kept as NumPy arrays, and a truth test such as doc == True is invalid when doc is a NumPy array with more than one element.
I checked the fix on one machine with two workers, and it now works in distributed mode.
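To illustrate the failure and the length-based check, here is a minimal sketch. The helper names truth_test_is_ambiguous and has_tokens are hypothetical and are not part of gensim; the sketch only assumes that a document arrives either as a list of (id, count) pairs or as a 2-D NumPy array, as in the traceback above.

```python
import numpy as np


def truth_test_is_ambiguous(doc):
    """Return True if using `doc` in a boolean context raises ValueError."""
    try:
        bool(doc)
    except ValueError:
        return True
    return False


def has_tokens(doc):
    """Length-based emptiness check; works for lists and NumPy arrays alike."""
    return len(doc) > 0


# The same two-token document in both representations (illustrative data).
list_doc = [(0, 1.0), (3, 2.0)]
array_doc = np.array(list_doc)

print(truth_test_is_ambiguous(list_doc))   # False: a list has a plain truth value
print(truth_test_is_ambiguous(array_doc))  # True: this is the error from #911
print(has_tokens(array_doc))               # True: len() is well defined for arrays
```

Checking len(doc) rather than the truth value of doc keeps the empty-document guard working regardless of which representation the chunk uses.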

tmylk · 2017-03-08T20:20:47Z

Thanks for the PR!

Distributed LDA: checking the length of docs instead of the boolean value

e6563a4

tmylk merged commit ed757df into piskvorky:develop Mar 8, 2017

tmylk mentioned this pull request Mar 8, 2017

Distributed LDA "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" #911

Closed

pranaydeeps pushed a commit to pranaydeeps/gensim that referenced this pull request Mar 21, 2017

Distributed LDA: checking the length of docs instead of the boolean value, plus int index conversion (piskvorky#1191)

3ad90d8
