RAG-LangChain-AI-System

LangChain & RAG System for Portfolio Support in Google Colab

This repository contains a Google Colab Notebook that implements a Retrieval-Augmented Generation (RAG) system for portfolio support. The system integrates document retrieval, dynamic entity extraction, and external API calls to generate informed, context-aware responses using a Hugging Face language model via Ollama. The entire application runs in Google Colab and is designed to work on both Linux and Windows environments.

This repository also contains a Flask app that serves as an API endpoint for the RAG system. The app lets users interact with the RAG system via HTTP POST requests, enabling seamless integration with other applications and services.

It also includes a sample backend Express API for interacting with the RAG system. The RAG system can (and should) query this API for additional data to generate better-informed responses.


This project was done as part of the PeakSpan Capital technical assessment & portfolio support project. The Google Colab notebook is at RAG_LangChain_AI_System.ipynb.

Table of Contents

Key Features

Key Technologies Used

Python Flask Hugging Face Ollama FAISS Express MongoDB ngrok Postman Google Colab Google Python Style Guide

Note: The live sample Express API is hosted on Render.com and can be accessed at https://rag-langchain-ai-system.onrender.com. The API provides endpoints for team profiles, investments, sectors, consultations, and more. You can use these endpoints to retrieve data and enrich the responses generated by the RAG system. However, please note that it will spin down after 15 minutes of inactivity, so it may need some time to spin up again if it has been inactive for a while.

Embedding and Language Models Used

Note: In this project, the “external APIs” are simulated using a sample backend Express API. The actual APIs can be substituted with real endpoints to access live data of your choice. Visit the Sample Backend Express API for more details.

Strategies for Retrieval Accuracy and Persistent Memory

API Tool Integration Methodology

How to Deploy / Use the Code

1. Set Up Your Colab Environment

Note: Please use a Google Colab instance with a GPU (e.g., a T4 GPU) for better performance. All code is tested and optimized for Google Colab only!

a. Install the Colab XTerm extension (for command-line support):

!pip install colab-xterm
%load_ext colabxterm

b. Launch an XTerm terminal within Colab:

%xterm

This opens a full-screen terminal window within your notebook.

c. Install and serve Ollama:

In the XTerm terminal, run:

curl https://ollama.ai/install.sh | sh
ollama serve &

d. Pull an AI Model (Example using llama2):

ollama pull llama2

This downloads the model for use.

e. Verify the Ollama Installation:

!ollama --version

If you see the version number, your Ollama server is running correctly.

2. Install Required Python Packages

Run the following cell in Colab:

!pip install langchain_community faiss-cpu sentence-transformers requests flask pyngrok

3. Run the RAG Script

Copy the full RAG system script (provided in the notebook) into a new cell and run it. The script will:

  1. Download and extract MasterClass documents.
  2. Build a FAISS vector store from the document contents.
  3. Initialize the Ollama language model.
  4. Start an interactive conversation loop where you can type queries.
  5. Retrieve relevant document context and external API data to generate responses.
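The five steps above can be sketched end to end in a few lines. This is a minimal, self-contained illustration only: a toy word-overlap scorer stands in for the FAISS + sentence-transformers index, and a stubbed function stands in for the Ollama call. The names `retrieve` and `generate` are illustrative, not the notebook's actual API.

```python
# Minimal sketch of the notebook's RAG loop. A toy word-overlap scorer
# stands in for FAISS + sentence-transformers, and a stub stands in for
# the Ollama model call. Names are illustrative.

DOCS = [
    "MasterClass: pricing strategy for growth-stage SaaS companies.",
    "Team profile: partners focused on B2B software investments.",
    "Sector overview: supply chain and logistics software.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy scorer)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stub for the Ollama call: real code would prompt the LLM with context."""
    return f"Answer to {query!r} using context: {context[0]}"

question = "Which MasterClass covers pricing strategy?"
context = retrieve(question, DOCS)
print(generate(question, context))
```

In the notebook, `retrieve` corresponds to a FAISS similarity search over embedded document chunks, and `generate` corresponds to prompting the Ollama model with the retrieved context plus conversation history.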


4. Running the Flask App

  1. Set Up ngrok for Colab:

    • Install pyngrok and Flask if not already installed:
      !pip install flask pyngrok
      
    • Set your ngrok authtoken (replace "YOUR_NGROK_AUTH_TOKEN" with your actual token):
      from pyngrok import ngrok
      ngrok.set_auth_token("YOUR_NGROK_AUTH_TOKEN")
      
  2. Run the Flask App Cell:

    Execute the cell containing the Flask app code. Once the documents are loaded and indexed, the app will start on port 5000 and ngrok will create a public tunnel. The output will display a public URL (e.g., https://your-ngrok-url.ngrok-free.app).
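If you are assembling the Flask cell yourself, its minimal shape might look like the following. This is a sketch, not the notebook's exact code: `rag_answer` is a hypothetical placeholder for the retrieval + Ollama logic, and the ngrok lines are commented out since they only apply inside a live Colab session.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def rag_answer(query: str) -> str:
    """Placeholder: the notebook's version retrieves context and calls Ollama."""
    return f"(stub) response for: {query}"

@app.route("/chat", methods=["POST"])
def chat():
    # Expect a JSON body like {"query": "hello"}.
    payload = request.get_json(silent=True) or {}
    query = payload.get("query", "")
    if not query:
        return jsonify({"error": "missing 'query' field"}), 400
    return jsonify({"response": rag_answer(query)})

# In Colab, open the tunnel and start the server on port 5000:
# from pyngrok import ngrok
# print(ngrok.connect(5000))
# app.run(port=5000)
```

The `/chat` route name and the `{"query": ...}` payload match the curl example shown below in this section.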

  3. Test the Flask Endpoint:

    Send a POST request to the /chat endpoint using the public URL. For example, in a new Colab cell:

    !curl -X POST "https://your-ngrok-url.ngrok-free.app/chat" -H "Content-Type: application/json" -d '{"query": "hello"}'
    

    Replace https://your-ngrok-url.ngrok-free.app with the actual URL printed by ngrok.
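If you prefer testing from Python rather than cURL, the same request can be built with the standard library. The URL below is a placeholder for the actual ngrok URL, so the send step is left commented out:

```python
import json
import urllib.request

# Placeholder: substitute the actual URL printed by ngrok.
url = "https://your-ngrok-url.ngrok-free.app/chat"

payload = json.dumps({"query": "hello"}).encode("utf-8")
req = urllib.request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the tunnel is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```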

Running the Sample Backend Express API

  1. Install Required Packages:

    Navigate to the backend directory and install the required packages:

    cd backend
    npm install
    

    Also, don’t forget to set up the .env file with the following content:

     MONGO_URI=<your-mongo-uri>
     PORT=3456
    
  2. Start the Express Server:

    Start the Express server:

    npm start
    

    The server will run on http://localhost:3456. Visiting it in your browser should show the API documentation in Swagger UI.

  3. Test the API Endpoints:

    You can now test the API endpoints using tools like Postman or cURL. For example:

     curl http://localhost:3456/api/team
    
  4. Integrate with the RAG System:

    Update the Flask app to query the sample backend API endpoints for additional data. You can modify the /chat endpoint in the Flask app to call the sample backend API and enrich the responses with relevant information. Also, feel free to make changes to the API as needed if you want it to return different data or support more operations.
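One way to structure that enrichment is to fetch the backend data and fold it into the prompt context before calling the model. This is a sketch only: the `/api/team` path comes from the example above, but the JSON shape (`name`, `role` fields) and the helper names are assumptions, not the actual code.

```python
import json
import urllib.request

BACKEND_URL = "http://localhost:3456"  # or the hosted Render URL

def fetch_team(base_url: str = BACKEND_URL) -> list[dict]:
    """Fetch team profiles from the sample Express API (assumed JSON list)."""
    with urllib.request.urlopen(f"{base_url}/api/team") as resp:
        return json.loads(resp.read())

def build_context(query: str, docs: list[str], team: list[dict]) -> str:
    """Fold retrieved documents and API data into one prompt context string."""
    team_lines = [f"- {m.get('name', '?')}: {m.get('role', '?')}" for m in team]
    return (
        f"Question: {query}\n"
        "Documents:\n" + "\n".join(docs) + "\n"
        "Team:\n" + "\n".join(team_lines)
    )

# Example with stubbed data (no network or running backend needed):
ctx = build_context(
    "Who leads the SaaS practice?",
    ["Doc: SaaS practice overview."],
    [{"name": "A. Partner", "role": "Managing Director"}],
)
print(ctx)
```

In the Flask app, the enriched context string would then be passed to the Ollama model in place of the document-only context.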

Demonstration Examples

Below are some example interactions from the notebook (which you can also verify by viewing the console output in the notebook - under the “RAG System” code section).

Why Use Google Colab

Google Colab provides a free, cloud-based Jupyter notebook environment with GPU support, making it an ideal platform for running AI models, training neural networks, and executing complex computations that would be slow on a typical local machine.

Thus, I elected to use Google Colab for this project to leverage its free GPU access, easy setup, and seamless integration with external APIs and services.

I have also tested the code locally on macOS and Windows machines, where it works with minor adjustments. However, performance was quite poor compared to Google Colab, so I recommend Google Colab for the best experience.

Code Sharing

Additional Resources

DO YOU WANT TO LEARN MORE ABOUT AI/ML?

This repository also contains additional resources that you can utilize to teach yourself and learn AI/ML! Feel free to explore the resources directory for more information. Resources include:

These resources cover a wide range of topics, from textual analysis and data science pipelines to deep learning, neural networks, and representation learning for recommender systems. You can use these resources to enhance your knowledge and skills in AI/ML and apply them to real-world projects and applications.

Feel free to also check out my other GitHub projects for more AI/ML resources, tutorials, and code samples, or read my blog posts for insights on AI, machine learning, and SWE topics. I hope you find these resources helpful and informative as you continue your learning journey in AI/ML! 🚀

Conclusion

This RAG system for portfolio support in Google Colab demonstrates the integration of document retrieval, dynamic entity extraction, and external API calls to generate context-aware responses using a Hugging Face language model via Ollama. The system is designed to provide accurate and informative responses based on user queries and conversation history. By leveraging the power of AI models and external data sources, the system can assist users in accessing relevant information about PeakSpan MasterClasses, team profiles, investments, sectors, and more.

The system’s ability to maintain persistent memory, handle follow-up questions, and enrich responses with external API data makes it a valuable tool for portfolio management and information retrieval tasks. By combining document context, dynamic entity extraction, and API chaining, the system can generate comprehensive and context-aware responses that address user queries effectively.


Thank you for checking out this project today! 🙏 Happy coding! 🚀