Production-Ready AI System

Advanced RAG AI System for Portfolio Support

A cutting-edge Retrieval-Augmented Generation system combining LangChain, Ollama, and modern full-stack architecture to deliver intelligent, context-aware responses with real-time streaming capabilities.

  • 4 Retrieval Strategies
  • 89% Precision @5
  • Real-time WebSocket Streaming
  • Full-Stack: React + Flask + Express

Key Features

A comprehensive AI system with advanced retrieval strategies, real-time capabilities, and production-ready architecture

Advanced RAG Techniques

4 retrieval strategies (Semantic, Hybrid, Multi-Query, Decomposition) with cross-encoder re-ranking for 89% precision

  • Semantic search with embeddings
  • Hybrid BM25 + Vector search
  • Multi-query expansion
  • Query decomposition
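As an illustration of how the hybrid strategy fuses scores (the system weights vector and BM25 results 50/50), here is a minimal, dependency-free Python sketch. The toy corpus, the term-overlap keyword score, and the count-vector cosine similarity are stand-ins for the real ChromaDB embeddings and BM25Okapi index:

```python
import math

# Toy corpus; in the real system these come from the portfolio documents.
DOCS = {
    "doc_a": "PeakSpan MasterClasses cover leadership and go-to-market strategy",
    "doc_b": "Docker Compose starts the Flask gateway and Express backend",
    "doc_c": "Cross-encoder re-ranking sorts retrieved documents by relevance",
}

def keyword_score(query: str, text: str) -> float:
    """Simplified BM25 stand-in: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def vector_score(query: str, text: str) -> float:
    """Toy 'embedding' similarity: cosine over term-count vectors."""
    vocab = set(query.lower().split()) | set(text.lower().split())
    q = [query.lower().split().count(t) for t in vocab]
    d = [text.lower().split().count(t) for t in vocab]
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, top_k: int = 2):
    """50/50 fusion of vector and keyword scores, as in the Hybrid strategy."""
    scored = {
        doc_id: 0.5 * vector_score(query, text) + 0.5 * keyword_score(query, text)
        for doc_id, text in DOCS.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

print(hybrid_search("MasterClasses leadership topics"))
```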

Real-Time Streaming

WebSocket-based streaming responses with Socket.IO for instant, progressive AI-generated content delivery

  • Live response streaming
  • Session management
  • Multi-user support
  • Connection resilience
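The streaming flow can be sketched as a generator that yields chunks plus an emitter that fires the `response_chunk` / `response_complete` events shown in the WebSocket example further down this page. The function names and chunk size below are illustrative, not the actual handler:

```python
from typing import Callable, Iterator

def stream_tokens(response: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield the response progressively, as an LLM token stream would."""
    for i in range(0, len(response), chunk_size):
        yield response[i:i + chunk_size]

def stream_to_client(response: str, emit: Callable[[str, dict], None]) -> None:
    """Emit one 'response_chunk' event per chunk, then 'response_complete',
    mirroring the Socket.IO event names used by the client example."""
    for chunk in stream_tokens(response):
        emit("response_chunk", {"chunk": chunk})
    emit("response_complete", {"sources": []})

# Stand-in for socketio.emit: collect events instead of sending them.
events = []
stream_to_client("Hello from the RAG engine",
                 lambda name, data: events.append((name, data)))
assembled = "".join(d["chunk"] for n, d in events if n == "response_chunk")
print(assembled)
```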

Modern Full-Stack Architecture

React + TypeScript frontend, Flask API gateway, Express backend, and containerized deployment with Docker

  • React 18 + Material-UI
  • Flask + SocketIO
  • Express + MongoDB
  • Docker Compose

Persistent Vector Storage

ChromaDB for persistent embeddings, FAISS for fast retrieval, and BM25 index for keyword-based search

  • ChromaDB persistence
  • FAISS similarity search
  • BM25 keyword search
  • Hybrid retrieval

API Integration & Chaining

Dynamic entity extraction with regex patterns, intelligent API routing, and data enrichment from external sources

  • Entity extraction
  • API chaining
  • Data enrichment
  • Context aggregation
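A simplified sketch of the regex extraction and routing step. The patterns and endpoint paths below are hypothetical placeholders; the real system's regexes and API routes are project-specific:

```python
import re

# Hypothetical patterns; the production regexes are more robust.
ENTITY_PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "person": re.compile(r"\b(?:about|is)\s+([A-Z][a-z]+\s[A-Z][a-z]+)"),
}

# Entity type -> backend endpoint, mirroring the routing in the pipeline
# diagram (Team API, Scrape API). Paths here are illustrative.
API_ROUTES = {"url": "/api/scrape", "person": "/api/team"}

def extract_and_route(query: str):
    """Extract entities with regex and decide which APIs to chain."""
    calls = []
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.findall(query):
            calls.append((API_ROUTES[entity_type], match))
    return calls

print(extract_and_route("Tell me about Scott Varner and https://peakspan.com"))
```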

Conversation Memory

Maintains context across conversations with session-based memory management for coherent multi-turn dialogues

  • Session persistence
  • Context preservation
  • Follow-up handling
  • History management
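Session-based memory can be sketched as a bounded per-session history that is flattened into a prompt prefix for the LLM. The class name and window size are illustrative assumptions, not the actual memory manager:

```python
from collections import defaultdict, deque

class SessionMemory:
    """Session-scoped conversation history with a bounded window,
    illustrating multi-turn context handling."""
    def __init__(self, max_turns: int = 5):
        self.sessions = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, session_id: str, user: str, assistant: str) -> None:
        self.sessions[session_id].append((user, assistant))

    def context(self, session_id: str) -> str:
        """Flatten recent turns into a prompt prefix for the LLM."""
        return "\n".join(
            f"User: {u}\nAI: {a}" for u, a in self.sessions[session_id]
        )

memory = SessionMemory(max_turns=2)
memory.add_turn("s1", "What are MasterClasses?", "...")
memory.add_turn("s1", "Who runs them?", "...")
memory.add_turn("s1", "When is the next one?", "...")  # oldest turn evicted
print(memory.context("s1"))
```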

System Architecture

Modular, scalable architecture with clear separation of concerns and production-ready design patterns

High-Level Architecture

graph TB
    subgraph "Client Layer"
        A[React Web App<br/>Port 3000]
        B[CLI Interface]
        C[Jupyter Notebook]
        D[HTTP/WebSocket Client]
    end
    subgraph "API Gateway Layer"
        E[Flask + SocketIO<br/>Port 5000]
        F[REST Endpoints]
        G[WebSocket Handler]
    end
    subgraph "Processing Layer"
        H[Advanced RAG Engine]
        I[4 Retrieval Strategies]
        J[Cross-Encoder Reranker]
        K[Memory Manager]
    end
    subgraph "AI/ML Layer"
        L[Ollama LLM<br/>llama2]
        M[HuggingFace Embeddings<br/>all-MiniLM-L6-v2]
        N[Vector Store<br/>ChromaDB + FAISS]
    end
    subgraph "Backend Services"
        O[Express API<br/>Port 3456]
        P[MongoDB]
        Q[Swagger Docs]
    end
    A --> E
    B --> H
    C --> H
    D --> E
    E --> F
    E --> G
    F --> H
    G --> H
    H --> I
    H --> J
    H --> K
    H --> L
    H --> M
    H --> N
    H --> O
    O --> P
    O --> Q
    style A fill:#2196f3,color:#fff
    style E fill:#4caf50,color:#fff
    style H fill:#ff9800,color:#fff
    style L fill:#f44336,color:#fff
    style O fill:#00bcd4,color:#fff

RAG Processing Pipeline

graph TB
    A[User Query] --> B{Query Type Detection}
    
    B -->|Greeting| C[Return Preset Response]
    B -->|Complex Query| D[Document Retrieval]
    
    D --> E[Vector Search<br/>Top-K Similarity]
    E --> F[Entity Extraction]
    F --> G{Extract Entities}
    G -->|Person| H[Call Team API]
    G -->|Company| I[Call Investments API]
    G -->|Sector| J[Call Sectors API]
    G -->|URL| K[Call Scrape API]
    H --> L[Aggregate API Data]
    I --> L
    J --> L
    K --> L
    L --> M[Build LLM Prompt]
    E --> M
    N[Conversation History] --> M
    M --> O[Ollama LLM<br/>Generate Response]
    O --> P[Update History]
    P --> Q[Return Response]
    C --> Q
    style A fill:#e3f2fd,color:#000
    style G fill:#fff3e0,color:#000
    style O fill:#e8f5e9,color:#000
    style Q fill:#f3e5f5,color:#000

Multi-Strategy Retrieval System

graph TD
    START[User Query] --> STRATEGY{Select Strategy}
    
    STRATEGY -->|Semantic| S1[Vector Similarity<br/>ChromaDB Search]
    STRATEGY -->|Hybrid| S2[Ensemble<br/>50% Vector + 50% BM25]
    STRATEGY -->|Multi-Query| S3[Generate Variations<br/>Retrieve All]
    STRATEGY -->|Decomposed| S4[Break into Sub-Queries<br/>Retrieve Each]
    S1 --> DOCS1[Retrieved Documents]
    S2 --> DOCS2[Retrieved Documents]
    S3 --> DOCS3[Retrieved Documents]
    S4 --> DOCS4[Retrieved Documents]
    DOCS1 --> RERANK[Cross-Encoder<br/>Re-Ranking]
    DOCS2 --> RERANK
    DOCS3 --> RERANK
    DOCS4 --> RERANK
    RERANK --> TOP[Top-K Documents<br/>Sorted by Relevance]
    TOP --> LLM[Generate Response<br/>with Citations]
    style START fill:#e3f2fd,color:#000
    style STRATEGY fill:#fff3e0,color:#000
    style RERANK fill:#fce4ec,color:#000
    style LLM fill:#e8f5e9,color:#000
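The re-ranking stage in the diagram above can be sketched as a pluggable scorer applied to (query, document) pairs. In production the scorer is a cross-encoder (ms-marco-MiniLM in this stack); the term-overlap scorer below is only a stand-in so the example runs without model downloads:

```python
from typing import Callable, List, Tuple

def rerank(query: str,
           docs: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score each (query, doc) pair and keep the top_k by relevance,
    the same flow a cross-encoder re-ranker performs."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Trivial stand-in scorer: shared-term count. A real cross-encoder runs
# both texts through one transformer and outputs a relevance score.
def overlap_scorer(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

candidates = [
    "Ollama serves the llama2 model locally",
    "MasterClasses cover leadership topics",
    "MongoDB stores backend records",
]
print(rerank("leadership topics in MasterClasses", candidates, overlap_scorer, top_k=2))
```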

Semantic Search

Best for conceptual questions and finding semantically similar content

General Purpose

Hybrid Search

Combines semantic + keyword (BM25) for balanced results

Recommended

Multi-Query

Generates query variations for comprehensive coverage

Exploratory

Decomposition

Breaks complex questions into sub-queries

Complex Queries
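The decomposition strategy follows a split → retrieve-each → merge flow. In the real system the LLM generates the sub-queries; the conjunction splitter and tiny in-memory index below are illustrative stand-ins:

```python
import re

def decompose(query: str):
    """Naive stand-in for LLM decomposition: split on conjunctions and '?'."""
    parts = re.split(r"\band\b|\?|;", query)
    return [p.strip() for p in parts if p.strip()]

def retrieve(sub_query: str):
    """Hypothetical per-sub-query retrieval; returns doc ids from a toy index."""
    index = {
        "team": ["Team_Bios.txt"],
        "investment": ["Investment_Strategy.txt"],
    }
    return [doc for key, docs in index.items()
            if key in sub_query.lower() for doc in docs]

def decomposed_search(query: str):
    """Retrieve for each sub-query and merge the results, de-duplicated."""
    merged = []
    for sub in decompose(query):
        for doc in retrieve(sub):
            if doc not in merged:
                merged.append(doc)
    return merged

print(decomposed_search("Who is on the team and what is the investment strategy?"))
```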

Technology Stack

mindmap
  root((RAG AI System))
    Frontend
      React 18
      TypeScript 5.7
      Material-UI 6
      Socket.IO Client
      Vite
    Backend API
      Flask 3.1
      Flask-SocketIO
      Flask-CORS
      Python 3.10+
    RAG Engine
      LangChain 0.3
      ChromaDB
      Sentence Transformers
      Cross-Encoders
      BM25Okapi
    AI/ML
      Ollama
      llama2 Model
      all-MiniLM-L6-v2
      ms-marco-MiniLM
    Services
      Express.js
      MongoDB
      Swagger
      Node.js
    DevOps
      Docker
      Docker Compose
      Nginx
      Redis
                    

Technologies & Stack

Built with cutting-edge technologies and industry-standard tools

Frontend

React 18 TypeScript 5.7 Material-UI 6 Vite Socket.IO Client

Backend

Python 3.10+ Flask 3.1 Node.js Express.js MongoDB

AI & ML

LangChain 0.3 Ollama HuggingFace ChromaDB FAISS Sentence Transformers

DevOps & Deployment

Docker Docker Compose Nginx Redis Swagger

Quick Start Guide

Get up and running in minutes with Docker Compose or local development setup

1. Clone Repository

git clone https://github.com/hoangsonww/RAG-AI-System-Portfolio-Support.git
cd RAG-AI-System-Portfolio-Support
2. Start All Services

# Start all services with Docker Compose
docker-compose up -d

# Wait for services to initialize (~2 minutes)
# View logs
docker-compose logs -f
3. Access the Application

  • Frontend: http://localhost:3000
  • Flask API gateway: http://localhost:5000
  • Express API (Swagger docs): http://localhost:3456

4. Test the System

# Test Flask API health
curl http://localhost:5000/health

# Send a chat message
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What are PeakSpan MasterClasses about?", "strategy": "hybrid"}'
Local Development Setup

1. Install Ollama

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama server
ollama serve &

# Pull llama2 model
ollama pull llama2
2. Set Up the Backend (Express)

cd backend
npm install

# Create .env file
echo "MONGO_URI=mongodb://localhost:27017/rag_db" > .env
echo "PORT=3456" >> .env

# Start backend
npm start
3. Set Up the RAG Application (Python)

cd ..
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
4. Set Up the Frontend (React)

cd frontend
npm install
npm run dev
Google Colab Setup

1. Open Google Colab Notebook

Open the notebook in Google Colab with GPU runtime (T4 recommended)

2. Install Dependencies

# Install colab-xterm for terminal access
!pip install colab-xterm
%load_ext colabxterm

# Launch terminal
%xterm
3. Install Ollama in Terminal

# In the XTerm terminal
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
ollama pull llama2
4. Run the RAG System

# Install Python packages
!pip install langchain_community faiss-cpu sentence-transformers requests flask pyngrok

# Run the RAG script (provided in notebook)
# The script will download documents, build vector store, and start interactive chat

Live Demo & Examples

Try the live system or explore example queries and responses

Example Conversations

User: What are PeakSpan MasterClasses about?
AI: PeakSpan MasterClasses are educational sessions focused on helping B2B SaaS companies scale. They cover topics like leadership, go-to-market strategies, product development, and operational excellence. The program brings together industry experts and portfolio companies to share insights and best practices.
Sources: MasterClass_Overview.txt (0.92) Leadership_Topics.txt (0.88)
User: Tell me about Scott Varner
AI: Scott Varner is a Managing Partner at PeakSpan Capital. He focuses on investments in the technology and software sectors, bringing over 20 years of industry experience. He has held leadership positions at companies like Microsoft, IBM, and Oracle, and is known for identifying and supporting high-growth startups. He's also committed to diversity and inclusion in tech.
API Data: Team API Insights API

API Examples

REST API - Chat Endpoint

curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What investment strategies does PeakSpan use?",
    "strategy": "hybrid",
    "session_id": "your-session-id"
  }'

WebSocket - Real-time Streaming

import io from 'socket.io-client';

const socket = io('http://localhost:5000');

socket.on('connect', () => {
  socket.emit('chat_message', {
    query: 'Tell me about portfolio companies',
    strategy: 'hybrid'
  });
});

socket.on('response_chunk', (data) => {
  console.log(data.chunk); // Streaming response
});

socket.on('response_complete', (data) => {
  console.log('Sources:', data.sources);
});

Python SDK Example

from advanced_rag_engine import AdvancedRAGEngine, RAGConfig, RetrievalStrategy

# Initialize engine
engine = AdvancedRAGEngine(RAGConfig())
engine.initialize_from_api()

# Query with hybrid search
result = engine.query(
    "What are the key leadership topics?",
    strategy=RetrievalStrategy.HYBRID
)

print(f"Response: {result['response']}")
print(f"Sources: {result['sources']}")

Ready to Try It?

Explore the live demo or deploy your own instance

Documentation & Resources

Comprehensive documentation, guides, and learning materials

Learning Resources

Additional materials to learn AI/ML concepts and techniques:

Deep Learning & Neural Networks
LLM & Mining CX on Social Media
AI Agents & Assistants
Retrieval-Augmented Generation
Data Science Pipeline
Textual Analysis