This repository contains my hands‑on exploration of LangChain and various vector database integrations.
I experimented with data ingestion, text splitting, embeddings, and vector stores (FAISS, ChromaDB) to understand how LLM applications are built end‑to‑end.
- dataingestion.ipynb → Notebook for loading and preparing raw data.
- Hospital.pdf → Sample document used for ingestion.
- speech.txt → Sample text file for testing.
- Transformers & Splitters:
  - CharacterTextSplitter.ipynb
  - HTMLtextSplitter.ipynb
  - JsonSplitter.ipynb
  - text_splitter.ipynb
- HuggingFace.ipynb → Using Hugging Face models for embeddings.
- ollamaembedding.ipynb → Generating embeddings with Ollama models.
- FAISS
  - Faiss.ipynb → Working with the FAISS vector store.
  - faiss_index/, index.faiss, index.pkl → Saved FAISS index files.
- ChromaDB
  - chroma.ipynb → Working with the ChromaDB vector store.
  - chroma_db/62c68d8c-5165-454a-9838-3417bce0a066/ → Local Chroma database folder.
- .gitignore → Ensures venv, indexes, and DB files aren’t committed unnecessarily.
- requirements.txt → Python dependencies for running notebooks and scripts.
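
The flow these notebooks explore (split, embed, store, search) can be sketched in plain Python with no dependencies. This is only a rough illustration under stated assumptions, not the LangChain, FAISS, or Chroma API: `split_text`, `embed`, `cosine`, and `TinyVectorStore` are hypothetical names, and the word-count "embedding" is a toy stand-in for a Hugging Face or Ollama model.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Cut text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory store that ranks texts by a linear scan over all vectors."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


# Hypothetical sample text; the notebooks use Hospital.pdf and speech.txt instead.
sample = (
    "Vector stores keep one embedding per chunk of text. "
    "A query is embedded the same way and compared against every stored chunk."
)
store = TinyVectorStore()
store.add_texts(split_text(sample, chunk_size=60, chunk_overlap=15))
print(store.similarity_search("how is a query compared", k=1))
```

In the notebooks themselves, the embedding step is handled by Hugging Face or Ollama models and the storage step by FAISS or ChromaDB; the sketch only mirrors the order of the steps.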