-
Notifications
You must be signed in to change notification settings - Fork 258
Supporting blog content - Local rag with lightweight Elasticsearch #488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Supporting blog content - Local rag with lightweight Elasticsearch #488
Conversation
|
Review these changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/488 |
.../local-rag-with-lightweight-elasticsearch/app-logs/llama-smoltalk-3.2-1b-instruct_results.md
Show resolved
Hide resolved
.../local-rag-with-lightweight-elasticsearch/app-logs/llama-smoltalk-3.2-1b-instruct_results.md
Show resolved
Hide resolved
|
|
||
|
|
||
| ## Stats | ||
| ✅ Indexed 5 documents in 250ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the indexing differing between models? I would expect indexing is a one-off operation independent to the model. This doesn't make sense to me. Should it be removed or clarified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure why it differs between tests; I think it’s better to remove it. I did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change this example to something generic such as "Why is the sky blue?" or something else. As a developer this comes across as quite cringy to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. I added a new file showing the output response. Related commit.
| from openai import OpenAI | ||
|
|
||
| ES_URL = "http://localhost:9200" | ||
| ES_API_KEY = "your-api-key-here" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change the URL, API key and LOCAL_AI_URL values to environment variables that are loaded via something like dotenv and a local .env file. While this is fine for local development, when developers try to move this to production they need to tidy it up. So lets set the example now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Added dotenv support! Related commit.
| ai_client = OpenAI(base_url=LOCAL_AI_URL, api_key="sk-x") | ||
|
|
||
|
|
||
| def build_documents(dataset_folder, index_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called load_documents as it's opening text files. It's not really building the documents from scratch which is misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Method renamed!
| if filename.endswith(".txt"): | ||
| filepath = os.path.join(dataset_folder, filename) | ||
|
|
||
| with open(filepath, "r", encoding="utf-8") as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment explaining why you've used utf-8 encoding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added!
supporting-blog-content/local-rag-with-lightweight-elasticsearch/script.py
Show resolved
Hide resolved
supporting-blog-content/local-rag-with-lightweight-elasticsearch/script.py
Show resolved
Hide resolved
| try: | ||
| start_time = time.time() | ||
|
|
||
| success, _ = helpers.bulk( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add the index creation code either here based on the condition that the index doesn't exist, or in a separate utility function. For semantic text you'll need to specify that mapping when creating the index, and that step is missing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the index creation as a method in the script. I also added a verification step before bulk-indexing the data.
ulations
carlyrichmond
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.