📄 RAG (For Your Document)

Advanced Retrieval-Augmented Generation System

Overview

A Retrieval-Augmented Generation (RAG) application that lets users upload documents and query them in natural language. It is built with Streamlit, LangChain, and Groq AI, and returns answers together with the sources they were drawn from.

Key Features

  • Multi-format Support: PDF, CSV, and TXT files
  • Configurable Chunking: Adjustable text chunking with overlap
  • Local Vector Storage: ChromaDB integration
  • Semantic Search: Powered by Sentence Transformers
  • AI-Powered Responses: Groq's Gemma2-9b-it model
  • Source Attribution: Track answer sources

RAG Pipeline Architecture

Document Upload → Text Splitting → Embeddings → Vector Store → Similarity Search → LLM Context

The system splits each uploaded document into chunks, converts the chunks into embeddings, and stores them in a vector database. At query time it retrieves the chunks most similar to the question and passes them as context to a large language model, which generates the answer.
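The similarity-search step can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the `embed` function here is a toy word-count vector standing in for a Sentence Transformers model, and a plain list stands in for ChromaDB. The names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Illustrative chunks; a real store would hold chunks of the uploaded documents.
chunks = [
    "chromadb stores embeddings on local disk",
    "streamlit renders the upload sidebar",
    "groq serves the language model",
]
results = top_k("where are embeddings stored on disk", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

A real embedding model would also match paraphrases that share no words with the query, which this word-count stand-in cannot do.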

Technology Stack

Streamlit, LangChain, Groq AI, Gemma2-9b-it, Sentence Transformers, ChromaDB, Python

Installation

# Clone the repository
git clone https://github.com/lakshya1410/RAG_for_your_document.git
cd RAG_for_your_document

# Install dependencies
pip install -r requirements.txt

# Create .env file with your Groq API key
echo "GROQ_API_KEY=your_groq_api_key_here" > .env

Note: Get your free Groq API key from console.groq.com
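Before launching the app, it can help to confirm that the key is actually visible. The sketch below is a standard-library check, not part of the project: `read_env_file` and `groq_key_present` are hypothetical helpers, and the parser only handles simple `KEY=VALUE` lines. How the app itself loads `.env` is not shown in this README.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_present(path: str = ".env") -> bool:
    """True if GROQ_API_KEY is set in the environment or in the .env file."""
    key = os.environ.get("GROQ_API_KEY") or read_env_file(path).get("GROQ_API_KEY", "")
    # The placeholder from the install step does not count as a real key.
    return bool(key) and key != "your_groq_api_key_here"


if __name__ == "__main__":
    print("GROQ_API_KEY found" if groq_key_present() else "GROQ_API_KEY not found")
```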

Usage

streamlit run main.py

  1. Upload documents (PDF, CSV, TXT) via the sidebar
  2. Configure chunk size (default: 1000) and overlap (default: 100)
  3. Click "Submit & Process"
  4. Ask questions and get AI-generated answers with source attribution
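To show how the chunk size and overlap settings in step 2 interact, here is a minimal sliding-window sketch. It assumes both values are counted in characters, which this README does not state, and `split_text` is a hypothetical function. Library splitters such as those in LangChain typically also try to break on separators like paragraphs or sentences, which this sketch does not.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# With the defaults, a 2500-character text yields windows starting at 0, 900, and 1800.
lengths = [len(c) for c in split_text("x" * 2500)]
print(lengths)

# A small example makes the overlap visible: adjacent chunks share one character.
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A larger overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing more near-duplicate text.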

Programmatic Usage

from rag_pipeline import process_files, ask_question

# Process documents
with open('document.pdf', 'rb') as f:
    process_files([f], chunk_size=1000, chunk_overlap=100)

# Ask questions
answer, sources = ask_question("What is the main topic?", k=3)
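
# The internals of ask_question are not shown in this README. As a rough
# illustration of the "LLM Context" step, and assuming k is the number of
# retrieved chunks, one common way to pass them to a model while keeping
# source attribution is to number each passage in the prompt. build_prompt
# and the (source, text) pair layout below are hypothetical.

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Format (source, chunk_text) pairs into a numbered context block plus the question."""
    context_lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite the bracketed numbers of the passages you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder passages for illustration only.
retrieved = [
    ("document.pdf", "First retrieved passage."),
    ("notes.txt", "Second retrieved passage."),
]
prompt = build_prompt("What is the main topic?", retrieved)
print(prompt)
```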

Troubleshooting

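  • "GROQ_API_KEY not found": Create a .env file in the repository root containing a valid Groq API key.
  • "Vector store is empty": Upload and process at least one document before asking questions.
  • "Error processing file": Check that the file is in a supported format (PDF, CSV, or TXT) and uses a valid text encoding.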
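
For the last item, a quick local check can narrow down which file is at fault before uploading. This is a sketch under assumptions: `check_file` is a hypothetical helper, the README does not say which encoding the app expects (UTF-8 is assumed here), and the PDF test only looks for the standard `%PDF-` header rather than validating the whole file.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".pdf", ".csv", ".txt"}


def check_file(path: str) -> Optional[str]:
    """Return None if the file looks processable, otherwise a short reason."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {p.suffix or '(no extension)'}"
    data = p.read_bytes()
    if not data:
        return "file is empty"
    if suffix == ".pdf":
        # PDF files normally begin with the "%PDF-" marker.
        return None if data.startswith(b"%PDF-") else "missing PDF header"
    try:
        data.decode("utf-8")  # assumption: CSV and TXT files should be UTF-8
    except UnicodeDecodeError:
        return "text is not valid UTF-8"
    return None
```

Calling `check_file("document.pdf")` returns `None` when nothing looks wrong, or a string describing the first problem found.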