Question about reproducibility #2212

powerhorse1986 · 2024-11-12T21:09:28Z

Dear Maarten,

Thanks so much for developing this great tool. I have recently been using BERTopic on some abstracts downloaded from PubMed, and I noticed that the number of topics differs between computers. On my work laptop, a Dell Windows machine, only 3 topics were detected. However, when I ran the same code on my own machine, a MacBook Pro with an M1 chip, I got 18 topics. The code is as follows:

`import openai
from bertopic import BERTopic
from bertopic.representation import OpenAI
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer
from umap import UMAP

# embedding_model, prompt, and token are defined elsewhere (not shown in this post)

# Dimensionality reduction and clustering
umap_model = UMAP(n_neighbors=5, n_components=3, min_dist=0.0, metric='cosine', random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=15, metric='euclidean', cluster_selection_method='eom',
                        prediction_data=True)

# vectorizer
vectorizer_model = CountVectorizer(stop_words="english")

# Text generation with chatGPT
client = openai.OpenAI(api_key=token)
# chatgpt = OpenAI(client, model='gpt-3.5-turbo', prompt=prompt, chat=True)
chatgpt = OpenAI(client, model='gpt-4-turbo', prompt=prompt, chat=True)

# All representation models
representation_model = {
    # 'KeyBERT': keybert,
    'ChatGPT4_Turbo': chatgpt
    # 'MMR': mmr
}

topic_model = BERTopic(
    # Vectorizer
    vectorizer_model=vectorizer_model,

    # Sub-models
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    representation_model=representation_model,
    calculate_probabilities=True,

    # Hyperparameters
    top_n_words=100,
    verbose=True
)`

My guess is that the different computer architectures cause this difference. What is your opinion on this? Thank you.
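When comparing runs across two machines like this, it can help to first record what actually differs between the environments. The following is a minimal sketch using only the Python standard library; the function name `environment_report` and the package list are assumptions for illustration, not part of BERTopic.

```python
import json
import platform
import sys
from importlib import metadata

# Distribution names as published on PyPI; adjust the list to your own setup.
DEFAULT_PACKAGES = ("bertopic", "umap-learn", "hdbscan", "numba", "numpy", "scikit-learn")


def environment_report(packages=DEFAULT_PACKAGES):
    """Collect OS, CPU architecture, Python version, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "system": platform.system(),    # e.g. "Windows" or "Darwin"
        "machine": platform.machine(),  # CPU architecture string reported by the OS
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    # Run this on each machine and compare the two outputs side by side.
    print(json.dumps(environment_report(), indent=2))
```

If the package versions already differ between the two machines, aligning them is a cheap first step before attributing the difference to the OS or the CPU architecture.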

Answered by MaartenGr

MaartenGr · 2024-11-13T14:07:38Z

Maintainer

Ah, a different OS will indeed give different results. I'm not entirely sure, but I remember it being caused by either HDBSCAN or UMAP, not by any of the other models, I believe.
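One way to check which stage diverges is to run the stages separately on each machine and compare a short fingerprint of every intermediate result. The sketch below is an illustration under assumptions: `fingerprint` is a hypothetical helper (not a BERTopic, UMAP, or HDBSCAN function), and rounding before hashing only tolerates small floating-point noise, so values that sit right on a rounding boundary can still produce different hashes.

```python
import hashlib
import json


def fingerprint(rows, decimals=4):
    """Hash a 2-D sequence of numbers after rounding, so tiny float noise is ignored."""
    # Adding 0.0 turns a rounded -0.0 into 0.0 so both serialize identically.
    rounded = [[round(float(v), decimals) + 0.0 for v in row] for row in rows]
    payload = json.dumps(rounded).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


# Hypothetical usage on each machine, with the models from the question:
#   embeddings = ...                                 # output of your embedding model
#   reduced = umap_model.fit_transform(embeddings)   # UMAP stage
#   labels = hdbscan_model.fit_predict(reduced)      # HDBSCAN stage
#   print(fingerprint(embeddings), fingerprint(reduced), fingerprint([labels]))
```

The first value that differs between the two machines points at the stage where the results start to diverge. If the embeddings already differ, the cause lies upstream of both UMAP and HDBSCAN.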

1 reply

powerhorse1986 Nov 14, 2024
Author

Hi Maarten,

Thank you so much for your response.
