Install Sage

First, make sure that pipx is installed in your environment. Then install Sage:

pipx install git+https://github.com/Storia-AI/sage.git@main
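
If pipx is not yet available, one common way to install it (assuming a recent Python 3 with pip) is:

python3 -m pip install --user pipx
python3 -m pipx ensurepath

After running ensurepath, open a new terminal so the PATH change takes effect.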

Install Ollama

To run any open-source LLM directly on your machine, use Ollama:

  1. Go to ollama.com and download the appropriate binary for your machine.
  2. Open a new terminal window.
  3. Pull the desired model, for instance: ollama pull llama3.1 (you can verify the pull as shown below).
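
To confirm the model was downloaded, list the models available locally:

ollama list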

Optional: Index the codebase with Marqo

If your codebase is small (fewer than 100 files), you can skip this section.

For larger codebases, we cannot afford to use the default LLM-based retrieval. Instead, we need to use vector-based retrieval, which requires chunking, embedding, and storing the codebase in a vector store. To achieve this locally, we use Marqo.

Install Marqo

First, make sure you have Docker installed. Then run:

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

This starts a persistent Marqo container with an attached console. Startup should take 2-3 minutes on a fresh install.
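
Before indexing, you can sanity-check that Marqo is up. The commands below assume the container name and port from the docker run invocation above; listing indexes via GET /indexes is part of Marqo's REST API:

docker ps --filter name=marqo

curl http://localhost:8882/indexes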

Index the codebase

To chunk, embed, and store your codebase in the Marqo vector store, run the following command (replacing huggingface/transformers with your desired GitHub repository):

sage-index huggingface/transformers --mode=local

Chat with any codebase locally

You are now ready to chat with your codebase. Run the following command, making sure to replace huggingface/transformers with your desired GitHub repository:

sage-chat huggingface/transformers --mode=local

You should now have a Gradio app running on localhost. Happy (local) chatting!
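
Gradio serves on port 7860 by default, so unless Sage overrides it, the app should be reachable at http://localhost:7860 (the exact URL is printed in the terminal when the app starts). On macOS, for instance, you can open it straight from the command line:

open http://localhost:7860

(On Linux, xdg-open http://localhost:7860 does the same.)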

Customizations

You can customize your chat experience by passing any of the following flags to the sage-chat command:

--llm-model (string)

You can select any LLM from the Ollama library. By default, we use llama3.1.

--embedding-model (string)

The embedding model is responsible for converting text to dense vectors. You can select any embedding model from Hugging Face (the MTEB leaderboard might be helpful). By default, we use hf/e5-base-v2.

--reranker-model
string

After retrieving the top 25 relevant files from the vector database, we use a reranker to narrow it down to top 5. You can choose any reranker from Hugging Face (check out the “Reranking” tab under the MTEB leaderboard). By default, we use the cross-encoder/ms-marco-MiniLM-L-6-v2.