DocuThinker-AI-App

DocuThinker - AI-Powered Document Analysis and Summarization App

Welcome to DocuThinker! This is a full-stack application that integrates an AI-powered document processing backend, blue/green & canary deployment on an AWS infrastructure, and a React-based frontend. The app allows users to upload documents for summarization, generate key insights, chat with an AI, and do even more with the document's content.

📚 Table of Contents

📖 Overview

The DocuThinker app is designed to provide users with a simple, AI-powered document management tool. Users can upload PDFs or Word documents and receive summaries, key insights, and discussion points. Additionally, users can chat with an AI using the document's content for further clarification.

DocuThinker is created using the FERN-Stack architecture, which stands for Firebase, Express, React, and Node.js. The backend is built with Node.js and Express, integrating Firebase for user authentication and MongoDB for data storage. The frontend is built with React and Material-UI, providing a responsive and user-friendly interface.

graph LR
    U[Client's Browser] -->|HTTPS| N[NGINX - SSL, Routing, Caching]
N -->|static calls| A[React Frontend]
N -->|/api/* proxy| B[Express Backend]
A -->|REST API calls| N

B --> C[Firebase Auth]
B --> D[Firestore]
B --> E[MongoDB]
B --> F[Redis Cache]
B --> G[AI/ML Services]

A --> H[Material-UI]
A --> I[React Router]

G --> J[Google Cloud APIs]
G --> K[LangChain]

Feel free to explore the app, upload documents, and interact with the AI! For architecture details, setup instructions, and more, please refer to the sections below, as well as the ARCHITECTURE.md file.

🚀 Live Deployments

[!TIP] Access the live app at https://docuthinker.vercel.app/ by clicking on the link or copying it into your browser! 🚀

We have deployed the entire app on Vercel and AWS. You can access the live app here.

[!IMPORTANT] The backend server may take a few seconds to wake up if it has been inactive for a while. The first API call may take a bit longer to respond. Subsequent calls should be faster as the server warms up.
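Because the first request may be slow or fail while the server wakes up, a client can retry with a growing delay. The following is a minimal sketch under stated assumptions: `backoffDelayMs` and `retryWithBackoff` are hypothetical helpers that are not part of the DocuThinker codebase, and the delay values are illustrative.

```javascript
// Hypothetical helpers: not part of the DocuThinker codebase.
// Delay before retry number `attempt` (0-based), doubling each time up to a cap.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Calls `fn` until it resolves or `maxAttempts` is reached.
// `sleep` is injectable so the helper can be exercised without real waiting.
async function retryWithBackoff(
  fn,
  maxAttempts = 4,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A caller could wrap its first API request, for example `retryWithBackoff(() => fetch(url))`. Note that `fetch` only rejects on network failures, so a caller that also wants to retry on HTTP error statuses has to throw on those itself.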

✨ Features

DocuThinker offers a wide range of features to help users manage and analyze their documents effectively. Here are some of the key features of the app:

⚙️ Technologies

DocuThinker is built with 120+ technologies spanning frontend, backend, AI/ML, mobile, infrastructure, and DevOps. Below is the complete technology stack.

For a comprehensive deep-dive into the AI/ML architecture with visual diagrams, see AI_ML.md.

JavaScript TypeScript Python Node.js HTML5 CSS3 React Material UI Tailwind CSS Emotion React Router Axios Webpack Craco Babel React Markdown KaTeX PDF.js React Dropzone React Helmet Dropbox Google Drive Vercel Analytics React Native Expo React Navigation React Native Web React Native Reanimated Express GraphQL Firebase Firebase Auth JWT RabbitMQ Multer Nodemon Anthropic SDK MCP Zod FastAPI Uvicorn LangChain LangGraph CrewAI OpenAI Claude Gemini PyTorch HuggingFace ONNX Sentence Transformers Optuna Pandas Matplotlib RAG Google Cloud NLP Google Speech-to-Text PostgreSQL MongoDB Firestore Redis Neo4j FAISS ChromaDB Mongoose Flyway Docker Docker Compose Kubernetes Helm Terraform ArgoCD Devcontainer AWS EKS ECS Fargate S3 CloudFront RDS ElastiCache WAF CloudWatch Secrets Manager IAM Istio Envoy NGINX Kiali cert-manager Prometheus Grafana Jaeger Zipkin Loki OpenTelemetry Elasticsearch Logstash Kibana AlertManager Coralogix Vault External Secrets Falco OPA Gatekeeper Trivy SonarQube Snyk Flagger KEDA Velero Litmus Chaos GitHub Actions GitLab CI CircleCI Jenkins GHCR Vercel Render Netlify Swagger OpenAPI REST API Postman Jest React Testing Library pytest k6 Supertest ESLint Prettier VS Code Extension Dotenv

🖼️ User Interface

DocuThinker features a clean and intuitive user interface designed to provide a seamless experience for users. The app supports both light and dark themes, responsive design, and easy navigation. Here are some screenshots of the app:

Landing Page

Document Upload Page

Document Upload Page - Dark Mode

Document Upload Page - Document Uploaded

Google Drive Document Selection

Home Page

Home Page - Dark Mode

Chat Modal

Chat Modal - Dark Mode

Document Analytics

Documents Page

Documents Page - Dark Mode

Document Page - Search Results

Profile Page

Profile Page - Dark Mode

How To Use Page

Login Page

Registration Page

Forgot Password Page

Mobile App's View

Responsive Design

Navigation Drawer

📂 Complete File Structure

The DocuThinker app is organized into separate subdirectories for the frontend, backend, and mobile app. Each directory contains the necessary files and folders for the respective components of the app. Here is the complete file structure of the app:

DocuThinker-AI-App/
├── .beads/                           # Beads task coordination system
│   ├── .status.json                  # Agent reservations & active bead tracking
│   ├── README.md                     # Beads workflow quick-reference
│   ├── active/                       # Beads available for agents to pick up
│   ├── completed/                    # Archive of finished beads
│   └── templates/
│       └── feature-bead.md           # Template for new feature beads
├── .agent-sessions/                  # Agent session history & coordination
│   ├── README.md                     # Session management guide
│   ├── SCHEMA.md                     # Session data structure specification
│   ├── config.json                   # Session configuration
│   ├── active/                       # Sessions currently in progress
│   ├── completed/                    # Archived finished sessions
│   └── templates/
│       ├── session-log.md            # Standard session log template
│       ├── handoff-report.md         # Agent-to-agent handoff template
│       └── escalation-report.md      # Conflict / blocker escalation template
├── .claude/                          # Claude Code workspace settings
├── .mcp.json                         # MCP server configuration
├── AGENTS.md                         # Agent behavior instructions
├── CLAUDE.md                         # Claude Code project instructions
├── ai_ml/                            # AI/ML pipelines & services directory (Python)
├── orchestrator/                     # Agentic orchestration layer (Node.js)
│   ├── core/
│   │   ├── supervisor.js             # Intent classification, decomposition, dispatch
│   │   ├── circuit-breaker.js        # Per-provider circuit breaker state machine
│   │   ├── agent-loop.js             # Iterative tool-use agent loop
│   │   ├── handoff.js                # Cross-agent context transfer
│   │   ├── batch-processor.js        # Concurrent batch document processing
│   │   ├── cost-tracker.js           # Token cost tracking with budget limits
│   │   ├── dlq.js                    # Dead letter queue with retry logic
│   │   ├── python-bridge.js          # HTTP bridge to Python AI/ML service
│   │   ├── providers.js              # Unified LLM client (Claude + Gemini)
│   │   └── tool-registry.js          # Tool registration and dispatch
│   ├── context/
│   │   ├── token-budget.js           # Context window management
│   │   ├── conversation-store.js     # Auto-summarizing conversation memory
│   │   ├── observability.js          # OTel-compatible context metrics
│   │   └── hybrid-rag.js             # Keyword + semantic search with RRF
│   ├── prompts/
│   │   ├── system-prompts.js         # 14 versioned system prompts
│   │   └── cache-strategy.js         # 3-layer Anthropic prompt caching
│   ├── schemas/
│   │   └── ai-outputs.js             # 12 Zod validation schemas
│   ├── mcp/
│   │   ├── server.js                 # MCP server exposing 13 tools
│   │   └── client.js                 # MCP client for external servers
│   ├── __tests__/
│   │   └── orchestrator.test.js      # Integration tests (Jest)
│   ├── Dockerfile                    # Production container (node:20-alpine)
│   ├── package.json                  # Dependencies and scripts
│   └── index.js                      # Express server entry point (port 4000)
│
├── backend/
│   ├── middleware/
│   │   └── jwt.js                    # Authentication middleware with JWT for the app's backend
│   ├── controllers/
│   │   └── controllers.js            # Controls the flow of data and logic
│   ├── graphql/
│   │   ├── resolvers.js              # Resolvers for querying data from the database
│   │   └── schema.js                 # GraphQL schema for querying data from the database
│   ├── models/
│   │   └── models.js                 # Data models for interacting with the database
│   ├── services/
│   │   └── services.js               # Models for interacting with database and AI/ML services
│   ├── views/
│   │   └── views.js                  # Output formatting for success and error responses
│   ├── redis/
│   │   └── redisClient.js            # Redis client for caching data in-memory
│   ├── swagger/
│   │   └── swagger.js                # Swagger documentation for API endpoints
│   ├── .env                          # Environment variables (git-ignored)
│   ├── firebase-admin-sdk.json       # Firebase Admin SDK credentials (git-ignored)
│   ├── index.js                      # Main entry point for the server
│   ├── Dockerfile                    # Docker configuration file
│   ├── manage_server.sh              # Shell script to manage and start the backend server
│   └── README.md                     # Backend README file
│
├── frontend/
│   ├── public/
│   │   ├── index.html                # Main HTML template
│   │   └── manifest.json             # Manifest for PWA settings
│   ├── src/
│   │   ├── assets/                   # Static assets like images and fonts
│   │   │   └── logo.png              # App logo or images
│   │   ├── components/
│   │   │   ├── ChatModal.js          # Chat modal component
│   │   │   ├── Spinner.js            # Loading spinner component
│   │   │   ├── UploadModal.js        # Document upload modal component
│   │   │   ├── Navbar.js             # Navigation bar component
│   │   │   ├── Footer.js             # Footer component
│   │   │   └── GoogleAnalytics.js    # Google Analytics integration component
│   │   ├── pages/
│   │   │   ├── Home.js               # Home page where documents are uploaded
│   │   │   ├── LandingPage.js        # Welcome and information page
│   │   │   ├── Login.js              # Login page
│   │   │   ├── Register.js           # Registration page
│   │   │   ├── ForgotPassword.js     # Forgot password page
│   │   │   └── HowToUse.js           # Page explaining how to use the app
│   │   ├── App.js                    # Main App component
│   │   ├── index.js                  # Entry point for the React app
│   │   ├── App.css                   # Global CSS 1
│   │   ├── index.css                 # Global CSS 2
│   │   ├── reportWebVitals.js        # Web Vitals reporting
│   │   ├── styles.css                # Custom styles for different components
│   │   └── config.js                 # Configuration file for environment variables
│   ├── .env                          # Environment variables file (e.g., REACT_APP_BACKEND_URL)
│   ├── package.json                  # Project dependencies and scripts
│   ├── craco.config.js               # Craco configuration file
│   ├── Dockerfile                    # Docker configuration file
│   ├── manage_frontend.sh            # Shell script for managing and starting the frontend
│   ├── README.md                     # Frontend README file
│   └── package.lock                  # Lock file for dependencies
│
├── mobile-app/                       # Mobile app directory
│   ├── app/                          # React Native app directory
│   ├── .env                          # Environment variables file for the mobile app
│   ├── app.json                      # Expo configuration file
│   ├── components/                   # Reusable components for the mobile app
│   ├── assets/                       # Static assets for the mobile app
│   ├── constants/                    # Constants for the mobile app
│   ├── hooks/                        # Custom hooks for the mobile app
│   ├── scripts/                      # Scripts for the mobile app
│   ├── babel.config.js               # Babel configuration file
│   ├── package.json                  # Project dependencies and scripts
│   └── tsconfig.json                 # TypeScript configuration file
│
├── aws/                              # AWS deployment assets (ECR/ECS/CloudFormation/CDK)
│   ├── README.md
│   ├── cloudformation/
│   │   └── fargate-service.yaml      # Reference Fargate stack for backend + ai_ml services
│   ├── infrastructure/
│   │   ├── cdk-app.ts                # CDK entrypoint
│   │   └── lib/docuthinker-stack.ts  # CDK stack definition
│   └── scripts/
│       └── local-env.sh              # Helper to mirror production env vars locally
│
├── kubernetes/                       # Kubernetes configuration files
│   ├── manifests/                    # Kubernetes manifests for deployment, service, and ingress
│   ├── backend-deployment.yaml       # Deployment configuration for the backend
│   ├── backend-service.yaml          # Service configuration for the backend
│   ├── frontend-deployment.yaml      # Deployment configuration for the frontend
│   ├── frontend-service.yaml         # Service configuration for the frontend
│   ├── firebase-deployment.yaml      # Deployment configuration for Firebase
│   ├── firebase-service.yaml         # Service configuration for Firebase
│   └── configmap.yaml                # ConfigMap configuration for environment variables
│
├── nginx/
│   ├── nginx.conf                    # NGINX configuration file for load balancing and caching
│   └── Dockerfile                    # Docker configuration file for NGINX
│
├── images/                           # Images for the README
├── .env                              # Environment variables file for the whole app
├── docker-compose.yml                # Docker Compose file for containerization
├── jsconfig.json                     # JavaScript configuration file
├── package.json                      # Project dependencies and scripts
├── package-lock.json                 # Lock file for dependencies
├── postcss.config.js                 # PostCSS configuration file
├── tailwind.config.js                # Tailwind CSS configuration file
├── render.yaml                       # Render configuration file
├── vercel.json                       # Vercel configuration file
├── openapi.yaml                      # OpenAPI specification for API documentation
├── manage_docuthinker.sh             # Shell script for managing and starting the app (both frontend & backend)
├── .gitignore                        # Git ignore file
├── LICENSE.md                        # License file for the project
├── README.md                         # Comprehensive README for the whole app
└── (and many more files...)          # Additional files and directories not listed here

🛠️ Getting Started

Prerequisites

Ensure you have the following tools installed:

Additionally, basic full-stack development knowledge and familiarity with AI/ML concepts are recommended to understand the app's architecture and functionality.

Frontend Installation

  1. Clone the repository:

    git clone https://github.com/hoangsonww/DocuThinker-AI-App.git
    cd DocuThinker-AI-App
    
  2. Navigate to the frontend directory:

    cd frontend
    
  3. Install dependencies:

    npm install
    

    Or npm install --legacy-peer-deps if you face any peer dependency issues.

  4. Start the Frontend React app:
    npm start
    
  5. Build the Frontend React app (for production):

    npm run build
    
  6. Alternatively, you can use yarn to install dependencies and run the app:
    yarn install
    yarn start
    
  7. Or, for your convenience, if you have already installed the dependencies, you can directly run the app in the root directory using:

    npm run frontend
    

    This way, you don't have to navigate to the frontend directory every time you want to run the app.

  8. The app's frontend will run on http://localhost:3000. You can now access it in your browser.

Backend Installation

[!NOTE] This step is optional, since the backend is deployed on Render. However, you can (and should) run the backend locally for development purposes.

  1. Navigate to the backend directory:
    cd backend
    
  2. Install dependencies:
    npm install
    

    Or npm install --legacy-peer-deps if you face any peer dependency issues.

  3. Start the backend server:
    npm run server
    
  4. The backend server will run on http://localhost:3000. You can access the API endpoints in your browser or Postman.
  5. Additionally, the backend code is in the backend directory. Feel free to explore the API endpoints and controllers.

[!CAUTION] Be sure to use Node.js v20 or earlier to avoid compatibility issues with the Firebase Admin SDK.

Running the Mobile App

  1. Navigate to the mobile app directory:
    cd mobile-app
    
  2. Install dependencies:
     npm install
    
  3. Start the Expo server:
    npx expo start
    
  4. Run the app on an emulator or physical device: Follow the instructions in the terminal to run the app on an emulator or physical device.

📋 API Endpoints

The backend of DocuThinker provides several API endpoints for user authentication, document management, and AI-powered insights. These endpoints are used by the frontend to interact with the backend server:

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /register | Register a new user in Firebase Authentication and Firestore, saving their email and creation date. |
| POST | /login | Log in a user and return a custom token along with the user ID. |
| POST | /upload | Upload a document for summarization. If the user is logged in, the document is saved in Firestore. |
| POST | /generate-key-ideas | Generate key ideas from the document text. |
| POST | /generate-discussion-points | Generate discussion points from the document text. |
| POST | /chat | Chat with AI using the original document text as context. |
| POST | /forgot-password | Reset a user's password in Firebase Authentication. |
| POST | /verify-email | Verify if a user's email exists in Firestore. |
| GET | /documents/{userId} | Retrieve all documents associated with the given userId. |
| GET | /documents/{userId}/{docId} | Retrieve a specific document by userId and docId. |
| GET | /document-details/{userId}/{docId} | Retrieve document details (title, original text, summary) by userId and docId. |
| DELETE | /delete-document/{userId}/{docId} | Delete a specific document by userId and docId. |
| DELETE | /delete-all-documents/{userId} | Delete all documents associated with the given userId. |
| POST | /update-email | Update a user's email in both Firebase Authentication and Firestore. |
| POST | /update-password | Update a user's password in Firebase Authentication. |
| GET | /days-since-joined/{userId} | Get the number of days since the user associated with userId joined the service. |
| GET | /document-count/{userId} | Retrieve the number of documents associated with the given userId. |
| GET | /user-email/{userId} | Retrieve the email of a user associated with userId. |
| POST | /update-document-title | Update the title of a document in Firestore. |
| PUT | /update-theme | Update the theme of the app. |
| GET | /user-joined-date/{userId} | Get the date when the user associated with userId joined the service. |
| GET | /social-media/{userId} | Get the social media links of the user associated with userId. |
| POST | /update-social-media | Update the social media links of the user associated with userId. |
| POST | /update-profile | Update the user's profile information. |
| POST | /update-document/{userId}/{docId} | Update the document details in Firestore. |
| POST | /update-document-summary | Update the summary of a document in Firestore. |
| POST | /sentiment-analysis | Analyzes the sentiment of the provided document text. |
| POST | /bullet-summary | Generates a summary of the document text in bullet points. |
| POST | /summary-in-language | Generates a summary in the specified language. |
| POST | /content-rewriting | Rewrites or rephrases the provided document text based on a style. |
| POST | /actionable-recommendations | Generates actionable recommendations based on the document text. |
| GET | /graphql | GraphQL endpoint for querying data from the database. |

More API endpoints will be added in the future to enhance the functionality of the app. Feel free to explore the existing endpoints and test them using Postman or Insomnia.

[!NOTE] This list is not exhaustive. For a complete list of API endpoints, please refer to the Swagger or Redoc documentation of the backend server.

API Documentation

For example, our API endpoints documentation looks like this:

Swagger Documentation

We also support generating API client files from the OpenAPI specification. For example:

npx openapi-generator-cli generate -i http://localhost:5000/api-docs -g typescript-fetch -o ./api

This will generate TypeScript files for the API endpoints in the api directory. Feel free to replace or modify the command as needed.

API Architecture

API Testing

Example Request to Register a User:

curl --location --request POST 'http://localhost:3000/register' \
--header 'Content-Type: application/json' \
--data-raw '{
    "email": "test@example.com",
    "password": "password123"
}'

Example Request to Upload a Document:

curl --location --request POST 'http://localhost:3000/upload' \
--header 'Authorization: Bearer <your-token>' \
--form 'File=@"/path/to/your/file.pdf"'
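The registration request above can also be issued from JavaScript. This is a minimal sketch, assuming the global `fetch` available in Node.js 18+ and in browsers; `buildRegisterRequest` and `registerUser` are hypothetical helper names, and the shape of the success response is not specified here, so it is simply returned as parsed JSON.

```javascript
// Hypothetical helper mirroring the curl example for POST /register.
// It only builds the request; nothing is sent until fetch() is called.
function buildRegisterRequest(baseUrl, email, password) {
  return {
    url: `${baseUrl}/register`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email, password }),
    },
  };
}

// Sends the request and returns the parsed JSON body.
// Throws if the server answers with a non-2xx status.
async function registerUser(baseUrl, email, password) {
  const { url, options } = buildRegisterRequest(baseUrl, email, password);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Register failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running backend.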

Error Handling

The backend API uses centralized error handling to capture and log errors. Failed requests return an appropriate HTTP status code and a JSON body containing an error message:

{
  "error": "An internal error occurred",
  "details": "Error details go here"
}
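A centralized handler of this kind is typically written as an Express error-handling middleware, which takes four arguments. The sketch below is illustrative rather than the backend's actual code: the status selection and the "Request failed" message are assumptions, while the response shape follows the JSON example above.

```javascript
// Sketch of an Express-style error-handling middleware (four arguments).
// The status logic and the non-500 message are assumptions for illustration.
function errorHandler(err, req, res, next) {
  // Use the error's own status when it is a valid HTTP error code, else 500.
  const status =
    Number.isInteger(err.status) && err.status >= 400 ? err.status : 500;

  // Log centrally so every failed request leaves a trace.
  console.error(`[${req.method} ${req.originalUrl}]`, err.message);

  res.status(status).json({
    error: status === 500 ? "An internal error occurred" : "Request failed",
    details: err.message,
  });
}
```

In an Express app, such a handler is registered after all routes with `app.use(errorHandler)`, so that errors passed to `next(err)` end up in one place.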

🤖 AI/ML Agentic Platform

DocuThinker employs a two-layer agentic architecture that separates orchestration concerns (Node.js) from AI/ML execution (Python), connected by a resilient bridge with circuit breakers, cost controls, and full observability.

Architecture Overview

| Layer | Technology | Port | Responsibility |
| --- | --- | --- | --- |
| Orchestrator | Node.js 18+ / Express | 4000 | Supervisor routing, agent loops, tool dispatch, cost tracking, MCP |
| AI/ML Backend | Python / FastAPI | 8000 | LLM inference, RAG pipelines, NER, CrewAI multi-agent, vector/graph stores |

graph TB
    subgraph "Clients"
        WEB[React Frontend]
        EXT[External Agents / MCP]
    end

    subgraph "Orchestrator :4000"
        SUP[Supervisor<br/>classify / decompose / dispatch]
        AL[Agent Loop<br/>tool-use cycle up to 10 iters]
        CB[Circuit Breaker<br/>CLOSED / OPEN / HALF_OPEN]
        CT[Cost Tracker<br/>daily + monthly budgets]
        BP[Batch Processor<br/>concurrent doc processing]
        DLQ[Dead Letter Queue<br/>retry + DLQ]
        HO[Handoff Manager<br/>cross-agent context transfer]
        TR[Tool Registry<br/>local + Python-bridge tools]
        TB[Token Budget Manager<br/>context window guard]
        CS[Conversation Store<br/>auto-summarizing history]
        OBS[Context Observability<br/>OTel-compatible metrics]
        PC[Prompt Cache Strategy<br/>3-layer Anthropic caching]
        MCP_S[MCP Server<br/>13 tools over stdio]
        MCP_C[MCP Client<br/>connect to external servers]
    end

    subgraph "AI/ML Backend :8000"
        PY_SVC[DocumentIntelligenceService]
        RAG[Agentic RAG Pipeline]
        CREW[CrewAI Multi-Agent]
        NLP[SpaCy NER / Sentiment]
        VEC[ChromaDB Vectors]
        KG[Neo4j Knowledge Graph]
    end

    subgraph "LLM Providers"
        CLAUDE[Anthropic Claude]
        GEMINI[Google Gemini]
    end

    WEB -->|REST| SUP
    EXT -->|MCP stdio| MCP_S
    SUP --> AL
    SUP --> BP
    AL --> TR
    TR -->|Python Bridge| PY_SVC
    AL --> CB
    CB --> CLAUDE
    CB --> GEMINI
    CT -.->|budget check| SUP
    TB -.->|token check| SUP
    DLQ -.->|retry| SUP
    HO -.->|context| AL
    CS -.->|history| AL
    OBS -.->|metrics| CT
    PC -.->|cache hints| AL
    PY_SVC --> RAG
    PY_SVC --> CREW
    PY_SVC --> NLP
    RAG --> VEC
    RAG --> KG
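
The circuit breaker in the diagram cycles through the CLOSED, OPEN, and HALF_OPEN states. The sketch below shows one common way to implement that state machine; the class, its thresholds, and the injectable clock are illustrative assumptions and not the contents of orchestrator/core/circuit-breaker.js.

```javascript
// Minimal circuit breaker sketch. Names and thresholds are illustrative
// assumptions, not the values used by the DocuThinker orchestrator.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, useful for testing
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  // Returns true if a call to the provider may be attempted right now.
  canRequest() {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN"; // cool-down elapsed: let trial calls through
    }
    return this.state !== "OPEN";
  }

  // A successful call closes the circuit and clears the failure count.
  recordSuccess() {
    this.failures = 0;
    this.state = "CLOSED";
  }

  // A failed trial call, or too many failures, opens the circuit.
  recordFailure() {
    this.failures += 1;
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }
}
```

With one breaker per provider, a caller checks `canRequest()` before each LLM call and reports the outcome with `recordSuccess()` or `recordFailure()`, falling back to another provider while a circuit is open.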

Orchestrator Components

The orchestrator (orchestrator/) is a standalone Node.js service providing:

Context Management

Prompt Engineering

MCP Integration

Orchestrator API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | System health with circuit breaker, cost, cache, DLQ, and provider status |
| GET | /api/costs | Cost usage report by provider and intent |
| GET | /api/circuits | Circuit breaker state for all providers |
| GET | /api/context-metrics | Context utilization and cache hit rate metrics |
| GET | /api/dlq | Dead letter queue stats and recent messages |
| GET | /api/tools | Registered tool definitions and count |
| POST | /api/tools/execute | Execute a registered tool by name |
| POST | /api/token-check | Check token budget for a given model/prompt/messages |
| POST | /api/supervisor/process | Route a request through the supervisor pipeline |
| POST | /api/agent/run | Run the agentic tool-use loop with a message and context |
| POST | /api/batch/process | Batch process multiple documents (summarize, keyIdeas, sentiment) |
| POST | /api/conversations/:userId/:documentId/message | Add a message to a conversation |
| GET | /api/conversations/:userId/:documentId | Retrieve conversation history |
| DELETE | /api/conversations/:userId/:documentId | Clear a conversation |

[!TIP] Visit the orchestrator/README.md for full API request/response examples and the ai_ml/README.md for the Python AI/ML layer.

🧩 Beads Task Coordination

DocuThinker uses a Beads sub-architecture to coordinate work across multiple AI agents and human developers operating on the same codebase. A bead is a self-contained, dependency-aware task unit that any agent can pick up, execute, and complete, enabling safe parallel development without merge conflicts.

Why Beads?

When several AI agents (or human developers) work concurrently, they risk editing the same files and producing conflicting changes. Beads solve this with:

Bead Lifecycle

stateDiagram-v2
    [*] --> Authored: Bead created from template
    Authored --> Claimed: Agent reserves files via .status.json
    Claimed --> InProgress: Agent begins implementation
    InProgress --> Testing: Code changes complete
    Testing --> Done: Acceptance criteria pass
    Testing --> InProgress: Tests fail - iterate
    Done --> [*]: Reservations released
    InProgress --> Blocked: Dependency not met
    Blocked --> InProgress: Dependency resolved
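
The lifecycle above can be encoded as a transition table so that tooling can reject invalid state changes. This is a minimal sketch: the state names come from the diagram, while `BEAD_TRANSITIONS` and `canTransition` are hypothetical names introduced for illustration.

```javascript
// Allowed transitions, copied from the state diagram above.
const BEAD_TRANSITIONS = new Map([
  ["Authored", ["Claimed"]],
  ["Claimed", ["InProgress"]],
  ["InProgress", ["Testing", "Blocked"]],
  ["Testing", ["Done", "InProgress"]],
  ["Blocked", ["InProgress"]],
  ["Done", []],
]);

// Hypothetical helper: returns true if a bead may move from `from` to `to`.
// Unknown states have no allowed transitions.
function canTransition(from, to) {
  return (BEAD_TRANSITIONS.get(from) ?? []).includes(to);
}
```

For example, `canTransition("Testing", "InProgress")` is allowed because failing tests send a bead back for another iteration, whereas `canTransition("Authored", "Done")` is not.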

Directory Structure

.beads/
├── .status.json          # Live agent reservations & bead counters
├── README.md             # Quick-start guide for the beads workflow
└── templates/
    └── feature-bead.md   # Canonical bead template

Status Tracking (.beads/.status.json)

The status file is the single source of truth for agent coordination:

{
  "version": "1.0.0",
  "agents": {},
  "reservations": {},
  "lastUpdated": null,
  "beadsCompleted": 0,
  "beadsActive": 0
}

| Field | Purpose |
| --- | --- |
| agents | Map of active agent IDs to their metadata (name, start time, current bead) |
| reservations | Map of file paths to the agent ID that holds the reservation |
| beadsCompleted | Counter of successfully finished beads |
| beadsActive | Counter of beads currently in progress |

Bead Template

Every bead follows a structured template (.beads/templates/feature-bead.md):

| Section | Description |
| --- | --- |
| Background | Why the work exists |
| Current State | Files to read before starting |
| Desired Outcome | Specific, testable result |
| Files to Touch | Explicit list of files to read, enhance, or create |
| Dependencies | Upstream beads that must finish first and downstream beads this unblocks |
| Acceptance Criteria | Checklist including "all existing tests still pass" |

Conflict Zones vs. Safe Parallel Zones

Certain files are single-agent only: only one agent may hold a reservation at a time.

| Conflict Zone File | Reason |
| --- | --- |
| docker-compose.yml | Shared service definitions |
| ai_ml/services/orchestrator.py | Central AI/ML entry point |
| ai_ml/providers/registry.py | LLM provider configuration |
| orchestrator/index.js | Orchestrator entry point |
| Shared config files | Cross-service settings |

Safe parallel zones (multiple agents can work simultaneously):

Agent Communication Protocol

sequenceDiagram
    participant A as Agent
    participant S as .status.json
    participant C as Codebase

    A->>S: 1. Check for conflicts
    S-->>A: No reservation on target files
    A->>S: 2. Post reservation (agent ID + file list)
    A->>C: 3. Implement bead instructions
    A->>C: 4. Run tests (acceptance criteria)
    A->>S: 5. Release reservations
    A->>S: 6. Increment beadsCompleted

Agents must:

  1. Check .beads/.status.json before starting any work.
  2. Reserve files by posting their agent ID and claimed file paths.
  3. Update status every 30 minutes while actively working.
  4. Release all reservations upon completion or failure.
  5. Use branch naming: agent/<agent-name>/<bead-id>.
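
The check, reserve, and release steps above can be sketched against an in-memory copy of .beads/.status.json. Everything here is an assumption made for illustration: the function names are hypothetical, file reading and writing is left out, and the way `beadsActive` and `lastUpdated` are maintained is a guess, since only the field meanings are documented.

```javascript
// Sketch of the reservation protocol on an in-memory status object.
// Function names and counter handling are illustrative assumptions.

// Step 1: list the requested files already reserved by another agent.
function findConflicts(status, agentId, files) {
  return files.filter(
    (file) =>
      Object.hasOwn(status.reservations, file) &&
      status.reservations[file] !== agentId
  );
}

// Step 2: reserve the files, or throw if any of them is already taken.
function reserveFiles(status, agentId, files, now = new Date()) {
  const conflicts = findConflicts(status, agentId, files);
  if (conflicts.length > 0) {
    throw new Error(`Files already reserved: ${conflicts.join(", ")}`);
  }
  for (const file of files) status.reservations[file] = agentId;
  status.beadsActive += 1;
  status.lastUpdated = now.toISOString();
  return status;
}

// Steps 5 and 6: release the agent's reservations and update the counters.
function completeBead(status, agentId, now = new Date()) {
  for (const [file, owner] of Object.entries(status.reservations)) {
    if (owner === agentId) delete status.reservations[file];
  }
  status.beadsActive = Math.max(0, status.beadsActive - 1);
  status.beadsCompleted += 1;
  status.lastUpdated = now.toISOString();
  return status;
}

// Branch name following the agent/<agent-name>/<bead-id> convention.
function branchName(agentName, beadId) {
  return `agent/${agentName}/${beadId}`;
}
```

A real implementation would also need to guard the read-modify-write cycle on the status file, since two agents checking at the same moment could otherwise both see no conflict.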

[!NOTE] For the full agent coordination protocol including conflict resolution and escalation, see AGENTS.md. For how beads integrate with the AI/ML pipeline, see AI_ML.md.

🧰 GraphQL Integration

Introduction to GraphQL in Our Application

Our application supports a fully-featured GraphQL API that allows clients to interact with the backend using flexible queries and mutations. This API provides powerful features for retrieving and managing data such as users, documents, and related information.

Key Features of the GraphQL API

Getting Started

  1. GraphQL Endpoint:
    The GraphQL endpoint is available at:
    https://docuthinker-app-backend-api.vercel.app/graphql
    

    Or, if you are running the backend locally, the endpoint will be:

    http://localhost:3000/graphql
    
  2. Testing the API:
    You can use the built-in GraphiQL Interface to test queries and mutations. Simply visit the endpoint in your browser. You should see the following interface:

    GraphiQL Interface

    Now you can start querying the API using the available fields and mutations. Examples are below for your reference.

Example Queries and Mutations

1. Fetch a User and Their Documents

This query retrieves a user's email and their documents, including titles and summaries:

query GetUser {
  getUser(id: "USER_ID") {
    id
    email
    documents {
      id
      title
      summary
    }
  }
}

2. Fetch a Specific Document

Retrieve details of a document by its ID:

query GetDocument {
  getDocument(userId: "USER_ID", docId: "DOCUMENT_ID") {
    id
    title
    summary
    originalText
  }
}

3. Create a New User

Create a user with an email and password:

mutation CreateUser {
  createUser(email: "example@domain.com", password: "password123") {
    id
    email
  }
}

4. Update a Document Title

Change the title of a specific document:

mutation UpdateDocumentTitle {
  updateDocumentTitle(userId: "USER_ID", docId: "DOCUMENT_ID", title: "Updated Title.pdf") {
    id
    title
  }
}

5. Delete a Document

Delete a document from a user’s account:

mutation DeleteDocument {
  deleteDocument(userId: "USER_ID", docId: "DOCUMENT_ID")
}
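
Outside GraphiQL, the same queries can be sent from code. The sketch below assumes the endpoint accepts the conventional GraphQL-over-HTTP request (a JSON `POST` with `query` and `variables` fields) and that no extra authentication header is required; neither is confirmed here. The helper names `buildGraphQLRequest` and `runQuery` are made up for illustration, and the global `fetch` requires Node.js 18 or later.

```javascript
// Local endpoint from the Getting Started section above.
const GRAPHQL_URL = "http://localhost:3000/graphql";

// Build the fetch options for a GraphQL request (hypothetical helper).
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// Send the query and return the "data" field, surfacing GraphQL errors.
async function runQuery(query, variables = {}, url = GRAPHQL_URL) {
  const response = await fetch(url, buildGraphQLRequest(query, variables));
  const payload = await response.json();
  if (payload.errors) {
    throw new Error(payload.errors.map((e) => e.message).join("; "));
  }
  return payload.data;
}

// The "Fetch a User and Their Documents" query from above, as a string.
const GET_USER = `
  query GetUser {
    getUser(id: "USER_ID") {
      id
      email
      documents { id title summary }
    }
  }
`;
```

Calling `runQuery(GET_USER)` against a running backend should resolve to an object with a `getUser` field, provided `USER_ID` is replaced with a real ID.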

Advanced Tips

For more information about GraphQL, visit the official documentation. If you encounter any issues or have questions, feel free to open an issue in our repository.

📱 Mobile App

The DocuThinker mobile app is built using React Native and Expo. It provides a mobile-friendly interface for users to upload documents, generate summaries, and chat with an AI. The mobile app integrates with the backend API to provide a seamless experience across devices.

Currently, it is in development and will be released soon on both the App Store and Google Play Store.

Stay tuned for the release of the DocuThinker mobile app!

Below is a screenshot of the mobile app (in development):

Mobile App

📦 Containerization

The DocuThinker app can be containerized using Docker for easy deployment and scaling. The docker-compose.yml defines all services including the new agentic orchestrator.

  1. Run the following command to build and start all services:
    docker compose up --build
    
  2. All services will start on their respective ports (see table below).

You can also view the image in the Docker Hub repository here.

Docker Compose Services

| Service | Container | Port | Description |
|---|---|---|---|
| frontend | docuthinker-frontend | 3001 | React frontend |
| backend | docuthinker-backend | 3000 | Express API server |
| orchestrator | docuthinker-orchestrator | 4000 | Agentic orchestration layer (Node.js) |
| ai-ml | docuthinker-ai-ml | 8000 | Python AI/ML services (FastAPI) |
| redis | docuthinker-redis | 6379 | In-memory cache (Redis 7 Alpine) |
| firebase | firebase | – | Firebase emulator |

The orchestrator container includes a health check (/health), runs as a non-root user, and depends on Redis being healthy before starting.
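
After `docker compose up`, a script can wait for the orchestrator before sending it work. The following is a minimal sketch, assuming port 4000 is published on the host and that `/health` returns a 2xx status when the service is ready. The function names, retry count, and delay are illustrative choices, not part of the project.

```javascript
// Port 4000 and the /health path come from the table and note above;
// the host name and retry settings are assumptions for local use.
function orchestratorHealthUrl(host = "localhost", port = 4000) {
  return `http://${host}:${port}/health`;
}

// Poll until the endpoint answers with a 2xx status or attempts run out.
// fetchFn is injectable so the loop can be exercised without a network.
async function waitForHealthy(
  url,
  { attempts = 10, delayMs = 1000, fetchFn = fetch } = {}
) {
  for (let i = 0; i < attempts; i += 1) {
    try {
      const response = await fetchFn(url);
      if (response.ok) return true;
    } catch {
      // A refused connection usually means the container is still starting.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

A caller would typically run `await waitForHealthy(orchestratorHealthUrl())` and stop with an error if it resolves to `false`.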

graph TB
    A[Docker Compose] --> B[Frontend Container]
    A --> C[Backend Container]
    A --> O[Orchestrator Container]
    A --> ML[AI/ML Container]
    A --> D[Redis Container]
    A --> F[Firebase Container]
    B -->|Port 3001| G[React App]
    C -->|Port 3000| H[Express Server]
    O -->|Port 4000| I[Agentic Orchestrator]
    ML -->|Port 8000| J[FastAPI AI/ML]
    D -->|Port 6379| K[Redis Cache]
    I -->|Python Bridge| J
    I -->|Circuit Breaker| L[Claude / Gemini]
    H -->|REST| I

🚧 Deployment

DocuThinker now ships primarily via Kubernetes with blue/green promotion plus weighted canaries driven by the updated Jenkinsfile. Vercel/Render remain as backup endpoints, and AWS ECS Fargate is still available as an alternative target.

graph TB
    GIT[GitHub Repo] --> JENKINS[Jenkins Pipeline]
    JENKINS --> TEST[Install + Lint + Tests]
    TEST --> BUILD[Containerize Frontend + Backend]
    BUILD --> REG[Push Images to Registry]
    REG --> CANARY[Canary Deploy - 10% weight]
    CANARY --> BG[Promote to Blue/Green]
    BG --> USERS[Live Traffic]
    JENKINS --> VERCEL[Vercel Fallback Deploy]
    VERCEL --> USERS

Production Rollouts (Kubernetes blue/green + canary)

See kubernetes/README.md for the full rollout flow, ingress weighting, and rollback commands.
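
To make the idea of a weighted canary concrete, the sketch below models a 10% split in plain JavaScript. It is only a conceptual illustration: in the real rollout the weighting is handled by the ingress configuration described in kubernetes/README.md, and the names `pickTrack` and `simulateSplit` are invented for this example.

```javascript
// Decide which track serves a request. "roll" is a number in [0, 1);
// it defaults to Math.random() but can be passed in for repeatable tests.
function pickTrack(canaryWeightPercent, roll = Math.random()) {
  if (canaryWeightPercent < 0 || canaryWeightPercent > 100) {
    throw new RangeError("canaryWeightPercent must be between 0 and 100");
  }
  return roll * 100 < canaryWeightPercent ? "canary" : "stable";
}

// Count how many of N simulated requests land on each track.
function simulateSplit(canaryWeightPercent, requests) {
  const counts = { canary: 0, stable: 0 };
  for (let i = 0; i < requests; i += 1) {
    counts[pickTrack(canaryWeightPercent)] += 1;
  }
  return counts;
}
```

With a weight of 10, `simulateSplit(10, 10000)` should report roughly 1,000 canary requests, though the exact count varies from run to run. Promotion then amounts to raising the weight to 100 once the canary looks healthy.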

Frontend Deployment (Vercel)

Backend & AI/ML Deployment

⚖️ Load Balancing & Caching

🔗 Jenkins Integration

If successful, you should see the Jenkins pipeline running tests, pushing images, rolling out the canary, and promoting blue/green automatically whenever changes are merged. Example dashboard:

Jenkins Pipeline

🛠️ GitHub Actions Integration

In addition to Jenkins, we also have a GitHub Actions workflow set up for CI/CD. The workflow is defined in the .github/workflows/ci.yml file.

The GitHub Actions workflow includes the following steps:

GitHub Actions Workflow

🧪 Testing

DocuThinker includes a comprehensive suite of tests to ensure the reliability and correctness of the application. The tests cover various aspects of the app, including:

Backend Unit & Integration Testing

To run the backend tests, follow these steps:

  1. Navigate to the backend directory:
    cd backend
    
  2. Install the necessary dependencies:
    npm install
    
  3. Run the tests in the mode you need:
    # Run the tests in default mode
    npm run test
       
    # Run the tests in watch mode
    npm run test:watch
       
    # Run the tests with coverage report
    npm run test:coverage
    

This will run the unit tests and integration tests for the backend app using Jest and Supertest.

Frontend Unit & E2E Testing

To run the frontend tests, follow these steps:

  1. Navigate to the frontend directory:
    cd frontend
    
  2. Install the necessary dependencies:
    npm install
    
  3. Run the tests in the mode you need:
    # Run the tests in default mode
    npm run test
       
    # Run the tests in watch mode
    npm run test:watch
       
    # Run the tests with coverage report
    npm run test:coverage
    

This will run the unit tests and end-to-end tests for the frontend app using Jest and React Testing Library.

🚒 Kubernetes Integration

graph TB
    A[Kubernetes Cluster] --> B[Ingress Controller]
    B --> C[Frontend Service]
    B --> D[Backend Service]
    C --> E[Frontend Pods]
    D --> F[Backend Pods]
    E --> G[Pod 1]
    E --> H[Pod 2]
    E --> I[Pod 3]
    F --> J[Pod 1]
    F --> K[Pod 2]
    F --> L[Pod 3]
    D --> M[ConfigMap]
    D --> N[Secrets]
    D --> O[Persistent Volume]
    O --> P[MongoDB]
    O --> Q[Redis]

⚛️ VS Code Extension

The DocuThinker Viewer extension brings your document upload, summarization, and insight-extraction workflow right into VS Code.

Key Features

To install the extension, follow these steps:

  1. Open VS Code.
  2. Go to Extensions (Ctrl+Shift+X).
  3. Search for “DocuThinker Viewer”.
  4. Click Install.
  5. Open the Command Palette (Ctrl+Shift+P on Windows or Cmd+Shift+P on macOS) and type “DocuThinker”. Then select “DocuThinker: Open Document Panel” to open the extension panel.
  6. Start using the app normally!
  7. To configure the extension further, open the settings (Ctrl+,) and search for “DocuThinker”, or click the gear icon next to the extension in the Extensions panel.

VSCode Extension

For full install and development steps, configuration options, and troubleshooting, see extension/README.md.

🔧 Contributing

We welcome contributions from the community! Follow these steps to contribute:

  1. Fork the repository.

  2. Create a new branch:
    git checkout -b feature/your-feature
    
  3. Commit your changes:
    git commit -m "Add your feature"
    
  4. Push the changes:
    git push origin feature/your-feature
    
  5. Submit a pull request: Please submit a pull request from your forked repository to the main repository. I will review your changes and merge them into the main branch shortly.

Thank you for contributing to DocuThinker! 🎉

📝 License

This project is licensed under the Creative Commons Attribution-NonCommercial License. See the LICENSE file for details.

[!IMPORTANT] The DocuThinker open-source project is for educational purposes only and should not be used for commercial applications. Feel free to use it for learning and personal projects!

📚 Additional Documentation

For more information on the DocuThinker app, please refer to the following resources:

However, this README file should already provide a comprehensive overview of the project.

👨‍💻 Author

Here is some information about me, the project’s humble creator:


Happy Coding and Analyzing! 🚀

Created with ❤️ by Son Nguyen in 2024-2025. Licensed under the Creative Commons Attribution-NonCommercial License.


🔝 Back to Top