Skip to content

Conversation

@Delacrobix
Copy link
Contributor

No description provided.

@gitnotebooks
Copy link

gitnotebooks bot commented Sep 18, 2025



## Stats
✅ Indexed 5 documents in 250ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the indexing differing between models? I would expect indexing is a one-off operation independent to the model. This doesn't make sense to me. Should it be removed or clarified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure why it differs between tests; I think it’s better to remove it. I did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this example to something generic such as "Why is the sky blue?" or something else. As a developer this comes across as quite cringy to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. I added a new file showing the output response. Related commit.

from openai import OpenAI

ES_URL = "http://localhost:9200"
ES_API_KEY = "your-api-key-here"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the URL, API key and LOCAL_AI_URL values to environment variables that are loaded via something like dotenv and a local .env file. While this is fine for local development, when developers try to move this to production they need to tidy it up. So lets set the example now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! Added dotenv support! Related commit.

ai_client = OpenAI(base_url=LOCAL_AI_URL, api_key="sk-x")


def build_documents(dataset_folder, index_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called load_documents as it's opening text files. It's not really building the documents from scratch which is misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Method renamed!

if filename.endswith(".txt"):
filepath = os.path.join(dataset_folder, filename)

with open(filepath, "r", encoding="utf-8") as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment explaining why you've used utf-8 encoding here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added!

try:
start_time = time.time()

success, _ = helpers.bulk(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add the index creation code either here based on the condition that the index doesn't exist, or in a separate utility function. For semantic text you'll need to specify that mapping when creating the index, and that step is missing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the index creation as a method in the script. I also added a verification step before bulk-indexing the data.

Copy link
Contributor

@carlyrichmond carlyrichmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants