Question about reproducibility #2212
Dear Maarten, Thanks so much for developing this great tool. I have recently been using BERTopic on some abstracts downloaded from PubMed, and I noticed that the number of topics differs between computers. On my work laptop, a Dell Windows machine, only 3 topics were detected. When I ran the same code on my own machine, a MacBook Pro with an M1 chip, I got 18 topics. The code was as follows: `umap_model = UMAP(n_neighbors=5, n_components=3, min_dist=0.0, metric='cosine', random_state=42)`; the vectorizer, `vectorizer_model = CountVectorizer(stop_words="english")`; text generation with ChatGPT, `client = openai.OpenAI(api_key=token)`; and all representation models, `representation_model = { )`. I guess the different computer architectures may have caused this. May I know your opinion? Thank you.
Replies: 1 comment 1 reply
Ah, a different OS will indeed give different results. I'm not entirely sure, but I remember that this was caused by either HDBSCAN or UMAP, and not by any of the other models, I believe.
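If it helps to narrow down which step is responsible, one option is to dump the intermediate outputs on each machine and compare them stage by stage. The following is only a rough sketch using the standard library: the helper names and the stage names (`embeddings`, `umap`, `hdbscan_labels`) are hypothetical and not part of BERTopic, and the tolerance is an arbitrary assumption.

```python
import json
import math


def save_stage(path, values):
    """Write one stage's output (nested lists of numbers) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh)


def load_stage(path):
    """Read a stage's output back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def stages_match(a, b, abs_tol=1e-6):
    """Return True if two nested lists have the same shape and close values."""
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            stages_match(x, y, abs_tol) for x, y in zip(a, b)
        )
    if isinstance(a, list) or isinstance(b, list):
        return False  # one side is a list, the other a number
    return math.isclose(a, b, abs_tol=abs_tol)


def first_divergent_stage(run_a, run_b,
                          order=("embeddings", "umap", "hdbscan_labels")):
    """Return the name of the first stage whose outputs differ, or None.

    The stage names are hypothetical labels chosen for this sketch.
    """
    for name in order:
        if not stages_match(run_a[name], run_b[name]):
            return name
    return None
```

On each machine you would convert the arrays to plain lists (for example with NumPy's `.tolist()`) before calling `save_stage`, then load both sets on one machine and call `first_divergent_stage`. If the embeddings match but the UMAP output does not, UMAP is the likely source; if only the labels differ, that points at HDBSCAN.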