Overview
A Retrieval-Augmented Generation (RAG) application that lets users upload documents and query them in natural language. Built with Streamlit, LangChain, ChromaDB, and Groq, it returns answers backed by source attribution.
Key Features
- Supported formats: PDF, CSV, and TXT files
- Adjustable text chunking with configurable overlap
- ChromaDB vector store integration
- Embeddings powered by Sentence Transformers
- Answer generation with Groq's Gemma2-9b-it model
- Source attribution for every answer
RAG Pipeline Architecture
The system processes documents through a sophisticated pipeline that converts text into embeddings, stores them in a vector database, performs similarity searches, and generates contextual responses using a large language model.
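For orientation, here is a minimal, self-contained sketch of that flow using ChromaDB, Sentence Transformers, and the Groq SDK. It is illustrative only, not the project's rag_pipeline implementation, and the embedding model name is an assumed choice:

# Sketch of the embed -> store -> retrieve -> generate flow (not the project's actual code).
import chromadb
from sentence_transformers import SentenceTransformer
from groq import Groq

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
store = chromadb.Client().get_or_create_collection("docs")
llm = Groq()  # reads GROQ_API_KEY from the environment

def index(chunks):
    # Convert text chunks into embeddings and store them in the vector database
    store.add(
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        ids=[f"chunk-{i}" for i in range(len(chunks))],
    )

def answer(question, k=3):
    # Similarity search: retrieve the k chunks closest to the question embedding
    hits = store.query(query_embeddings=embedder.encode([question]).tolist(), n_results=k)
    context = "\n\n".join(hits["documents"][0])
    # Generate a contextual response, grounding the model in the retrieved chunks
    reply = llm.chat.completions.create(
        model="gemma2-9b-it",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return reply.choices[0].message.content, hits["documents"][0]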
Technology Stack
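- Streamlit: web interface for uploads and queries
- LangChain: document loading and text chunking
- Sentence Transformers: embedding generation
- ChromaDB: vector storage and similarity search
- Groq (Gemma2-9b-it): answer generation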
Installation
# Clone the repository
git clone https://github.com/lakshya1410/RAG_for_your_document.git
cd RAG_for_your_document
# Install dependencies
pip install -r requirements.txt
# Create .env file with your Groq API key
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
Note: Get your free Groq API key from console.groq.com
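The application is expected to read this key at startup. A minimal sketch of that lookup with python-dotenv follows; the app's actual loading code may differ:

# Sketch: loading the Groq API key from the .env file created above (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY not found - check your .env file")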
Usage
streamlit run main.py
- Upload documents (PDF, CSV, TXT) via the sidebar
- Configure chunk size (default: 1000) and overlap (default: 100); the sketch after this list shows what these settings control
- Click "Submit & Process"
- Ask questions and get AI-generated answers with source attribution
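As a rough illustration of the chunking settings, here is how they map onto LangChain's recursive character splitter; whether the app uses this exact splitter class is an assumption:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Each chunk holds up to 1000 characters; consecutive chunks share 100
# characters so that sentences spanning a chunk boundary are not lost.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("document.txt", encoding="utf-8").read())
print(len(chunks), "chunks")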
Programmatic Usage
from rag_pipeline import process_files, ask_question
# Process documents
with open('document.pdf', 'rb') as f:
    process_files([f], chunk_size=1000, chunk_overlap=100)
# Ask questions
answer, sources = ask_question("What is the main topic?", k=3)
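The returned sources let you verify where each answer came from. Assuming sources is an iterable of source references (its exact structure is not documented here), a quick check looks like:

print(answer)
for source in sources:
    print("source:", source)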
Troubleshooting
- "GROQ_API_KEY not found": create a .env file with a valid API key
- "Vector store is empty": upload and process documents first
- "Error processing file": ensure the file is a supported format (PDF, CSV, TXT) with valid encoding