A cutting-edge Retrieval-Augmented Generation system combining LangChain, Ollama, and modern full-stack architecture to deliver intelligent, context-aware responses with real-time streaming capabilities.
A comprehensive AI system with advanced retrieval strategies, real-time capabilities, and production-ready architecture
- 4 retrieval strategies (Semantic, Hybrid, Multi-Query, Decomposition) with cross-encoder re-ranking for 89% precision
- WebSocket-based streaming responses via Socket.IO for instant, progressive delivery of AI-generated content
- React + TypeScript frontend, Flask API gateway, Express backend, and containerized deployment with Docker
- ChromaDB for persistent embeddings, FAISS for fast retrieval, and a BM25 index for keyword search
- Dynamic entity extraction with regex patterns, intelligent API routing, and data enrichment from external sources
- Session-based memory management that maintains context across turns for coherent multi-turn dialogues
- Modular, scalable architecture with clear separation of concerns and production-ready design patterns
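The regex-based entity-extraction step can be illustrated with a minimal sketch. The patterns and entity categories below are hypothetical stand-ins for illustration, not the project's actual rules:

```python
import re

# Hypothetical patterns; the real system's regexes and categories may differ
ENTITY_PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "person": re.compile(r"\b(?:who is|about)\s+([A-Z][a-z]+\s+[A-Z][a-z]+)"),
    "sector": re.compile(r"\b(fintech|healthcare|cybersecurity)\b", re.IGNORECASE),
}

def extract_entities(query: str) -> dict[str, list[str]]:
    """Return every entity category matched in the query."""
    found = {}
    for label, pattern in ENTITY_PATTERNS.items():
        matches = pattern.findall(query)
        if matches:
            found[label] = matches
    return found

print(extract_entities("Tell me about Jane Doe and fintech: https://example.com/a"))
```

Each matched category then maps to an enrichment call (person → team API, URL → scrape API, and so on), and the aggregated results are merged into the LLM prompt.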
```mermaid
graph TB
    subgraph "Client Layer"
        A[React Web App<br/>Port 3000]
        B[CLI Interface]
        C[Jupyter Notebook]
        D[HTTP/WebSocket Client]
    end
    subgraph "API Gateway Layer"
        E[Flask + SocketIO<br/>Port 5000]
        F[REST Endpoints]
        G[WebSocket Handler]
    end
    subgraph "Processing Layer"
        H[Advanced RAG Engine]
        I[4 Retrieval Strategies]
        J[Cross-Encoder Reranker]
        K[Memory Manager]
    end
    subgraph "AI/ML Layer"
        L[Ollama LLM<br/>llama2]
        M[HuggingFace Embeddings<br/>all-MiniLM-L6-v2]
        N[Vector Store<br/>ChromaDB + FAISS]
    end
    subgraph "Backend Services"
        O[Express API<br/>Port 3456]
        P[MongoDB]
        Q[Swagger Docs]
    end
    A --> E
    B --> H
    C --> H
    D --> E
    E --> F
    E --> G
    F --> H
    G --> H
    H --> I
    H --> J
    H --> K
    H --> L
    H --> M
    H --> N
    H --> O
    O --> P
    O --> Q
    style A fill:#2196f3,color:#fff
    style E fill:#4caf50,color:#fff
    style H fill:#ff9800,color:#fff
    style L fill:#f44336,color:#fff
    style O fill:#00bcd4,color:#fff
```
```mermaid
graph TB
    A[User Query] --> B{Query Type Detection}
    B -->|Greeting| C[Return Preset Response]
    B -->|Complex Query| D[Document Retrieval]
    D --> E[Vector Search<br/>Top-K Similarity]
    E --> F[Entity Extraction]
    F --> G{Extract Entities}
    G -->|Person| H[Call Team API]
    G -->|Company| I[Call Investments API]
    G -->|Sector| J[Call Sectors API]
    G -->|URL| K[Call Scrape API]
    H --> L[Aggregate API Data]
    I --> L
    J --> L
    K --> L
    L --> M[Build LLM Prompt]
    E --> M
    N[Conversation History] --> M
    M --> O[Ollama LLM<br/>Generate Response]
    O --> P[Update History]
    P --> Q[Return Response]
    C --> Q
    style A fill:#e3f2fd,color:#000
    style G fill:#fff3e0,color:#000
    style O fill:#e8f5e9,color:#000
    style Q fill:#f3e5f5,color:#000
```
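The routing decision at the top of this flow — a preset reply for greetings, the full retrieval pipeline for everything else — can be sketched as follows. The greeting list, canned reply, and pipeline stub are illustrative, not the project's actual values:

```python
GREETINGS = {"hi", "hello", "hey", "good morning", "good evening"}  # illustrative

def route_query(query: str) -> str:
    """Classify a query: 'greeting' gets a preset reply, 'rag' goes through retrieval."""
    normalized = query.lower().strip(" \t!.?,")
    return "greeting" if normalized in GREETINGS else "rag"

def answer(query: str) -> str:
    if route_query(query) == "greeting":
        # Cheap path: canned response, no retrieval or LLM call
        return "Hello! Ask me anything about the portfolio."
    # Expensive path (stubbed here): retrieve documents, enrich via APIs, prompt the LLM
    return f"[RAG pipeline would handle: {query!r}]"

print(answer("Hello!"))
print(answer("What sectors does the firm invest in?"))
```

Short-circuiting small talk this way avoids a vector search and an LLM round-trip for queries that never need them.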
```mermaid
graph TD
    START[User Query] --> STRATEGY{Select Strategy}
    STRATEGY -->|Semantic| S1[Vector Similarity<br/>ChromaDB Search]
    STRATEGY -->|Hybrid| S2[Ensemble<br/>50% Vector + 50% BM25]
    STRATEGY -->|Multi-Query| S3[Generate Variations<br/>Retrieve All]
    STRATEGY -->|Decomposed| S4[Break into Sub-Queries<br/>Retrieve Each]
    S1 --> DOCS1[Retrieved Documents]
    S2 --> DOCS2[Retrieved Documents]
    S3 --> DOCS3[Retrieved Documents]
    S4 --> DOCS4[Retrieved Documents]
    DOCS1 --> RERANK[Cross-Encoder<br/>Re-Ranking]
    DOCS2 --> RERANK
    DOCS3 --> RERANK
    DOCS4 --> RERANK
    RERANK --> TOP[Top-K Documents<br/>Sorted by Relevance]
    TOP --> LLM[Generate Response<br/>with Citations]
    style START fill:#e3f2fd,color:#000
    style STRATEGY fill:#fff3e0,color:#000
    style RERANK fill:#fce4ec,color:#000
    style LLM fill:#e8f5e9,color:#000
```
| Strategy | Best For | Description |
|----------|----------|-------------|
| Semantic | General Purpose | Best for conceptual questions and finding semantically similar content |
| Hybrid | Recommended | Combines semantic + keyword (BM25) retrieval for balanced results |
| Multi-Query | Exploratory | Generates query variations for comprehensive coverage |
| Decomposition | Complex Queries | Breaks complex questions into sub-queries |
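The Hybrid strategy's 50/50 blend of vector and BM25 scores can be illustrated with a pure-Python score-fusion sketch. The real engine delegates to LangChain's retriever machinery and a cross-encoder re-ranker, so treat this only as the underlying idea, with made-up scores:

```python
def normalize(scores):
    """Min-max scale scores to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(docs, vector_scores, bm25_scores, top_k=3, alpha=0.5):
    """Fuse semantic and keyword scores (alpha = vector weight), return top-k docs."""
    v, b = normalize(vector_scores), normalize(bm25_scores)
    fused = [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]
    ranked = sorted(zip(docs, fused), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

docs = ["doc-a", "doc-b", "doc-c", "doc-d"]
print(hybrid_rank(docs, vector_scores=[0.9, 0.2, 0.5, 0.1],
                  bm25_scores=[1.0, 8.0, 3.0, 0.5]))
```

Normalizing before fusing matters because cosine similarities and BM25 scores live on different scales; without it one signal would dominate the blend.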
```mermaid
mindmap
  root((RAG AI System))
    Frontend
      React 18
      TypeScript 5.7
      Material-UI 6
      Socket.IO Client
      Vite
    Backend API
      Flask 3.1
      Flask-SocketIO
      Flask-CORS
      Python 3.10+
    RAG Engine
      LangChain 0.3
      ChromaDB
      Sentence Transformers
      Cross-Encoders
      BM25Okapi
    AI/ML
      Ollama
      llama2 Model
      all-MiniLM-L6-v2
      ms-marco-MiniLM
    Services
      Express.js
      MongoDB
      Swagger
      Node.js
    DevOps
      Docker
      Docker Compose
      Nginx
      Redis
```
Built with cutting-edge technologies and industry-standard tools
Get up and running in minutes with Docker Compose or local development setup
```bash
git clone https://github.com/hoangsonww/RAG-AI-System-Portfolio-Support.git
cd RAG-AI-System-Portfolio-Support

# Start all services with Docker Compose
docker-compose up -d

# Wait for services to initialize (~2 minutes)

# View logs
docker-compose logs -f

# Test Flask API health
curl http://localhost:5000/health

# Send a chat message
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What are PeakSpan MasterClasses about?", "strategy": "hybrid"}'
```
```bash
# Install Ollama (macOS/Linux)
curl https://ollama.ai/install.sh | sh

# Start Ollama server
ollama serve &

# Pull llama2 model
ollama pull llama2

# Express backend
cd backend
npm install

# Create .env file
echo "MONGO_URI=mongodb://localhost:27017/rag_db" > .env
echo "PORT=3456" >> .env

# Start backend
npm start
cd ..

# Flask RAG API
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py

# React frontend
cd frontend
npm install
npm run dev
```
Open the notebook in Google Colab with GPU runtime (T4 recommended)
Open in Colab

```python
# Install colab-xterm for terminal access
!pip install colab-xterm
%load_ext colabxterm

# Launch terminal
%xterm
```

```bash
# In the XTerm terminal
curl https://ollama.ai/install.sh | sh
ollama serve &
ollama pull llama2
```

```python
# Install Python packages
!pip install langchain_community faiss-cpu sentence-transformers requests flask pyngrok

# Run the RAG script (provided in the notebook); it downloads documents,
# builds the vector store, and starts an interactive chat
```
Try the live system or explore example queries and responses
```bash
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What investment strategies does PeakSpan use?",
    "strategy": "hybrid",
    "session_id": "your-session-id"
  }'
```
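The `session_id` field keys per-session conversation memory. A minimal sketch of how session-scoped history might be kept and trimmed — hypothetical, since the project's actual memory manager may differ:

```python
from collections import defaultdict

class SessionMemory:
    """Keep the last `max_turns` (user, assistant) pairs per session_id."""

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self._history = defaultdict(list)

    def add_turn(self, session_id: str, user: str, assistant: str) -> None:
        turns = self._history[session_id]
        turns.append((user, assistant))
        # Trim the oldest turns so prompts stay within the LLM context window
        del turns[:-self.max_turns]

    def as_prompt(self, session_id: str) -> str:
        """Render the session's history in a form suitable for the LLM prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}"
                         for u, a in self._history[session_id])

memory = SessionMemory(max_turns=2)
memory.add_turn("s1", "Hi", "Hello!")
memory.add_turn("s1", "What is PeakSpan?", "A growth equity firm.")
print(memory.as_prompt("s1"))
```

Keying history by session ID is what lets multiple concurrent clients hold coherent multi-turn dialogues without leaking context into each other.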
```javascript
import io from 'socket.io-client';

const socket = io('http://localhost:5000');

socket.on('connect', () => {
  socket.emit('chat_message', {
    query: 'Tell me about portfolio companies',
    strategy: 'hybrid'
  });
});

socket.on('response_chunk', (data) => {
  console.log(data.chunk); // Streaming response
});

socket.on('response_complete', (data) => {
  console.log('Sources:', data.sources);
});
```
```python
from advanced_rag_engine import AdvancedRAGEngine, RAGConfig, RetrievalStrategy

# Initialize engine
engine = AdvancedRAGEngine(RAGConfig())
engine.initialize_from_api()

# Query with hybrid search
result = engine.query(
    "What are the key leadership topics?",
    strategy=RetrievalStrategy.HYBRID,
)
print(f"Response: {result['response']}")
print(f"Sources: {result['sources']}")
```
Explore the live demo or deploy your own instance
Comprehensive documentation, guides, and learning materials
Complete project documentation with setup instructions and usage examples
Detailed system architecture, design patterns, and scalability considerations
Fast-track guide to get the system running in minutes
Interactive Google Colab notebook with GPU support
Swagger/OpenAPI documentation for all backend endpoints
Full source code, issues, and contributions
Additional materials to learn AI/ML concepts and techniques: