Codebases can contain a lot of irrelevant files that might trip up the LLM. To control which files are added to the retrieval set, you can specify an inclusion or exclusion file in the following format:

# This is a comment
ext:.my-ext-1
ext:.my-ext-2
ext:.my-ext-3
dir:my-dir-1
dir:my-dir-2
dir:my-dir-3
file:my-file-1.md
file:my-file-2.py
file:my-file-3.cpp

where:

  • ext specifies a file extension
  • dir specifies a directory. This is not a full path. For instance, if you specify dir:tests in an exclusion file, then /path/to/my/tests/file.py will be ignored.
  • file specifies a file name. This is also not a full path. For instance, if you specify file:__init__.py in an exclusion file, then /path/to/my/__init__.py will be ignored.
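
To make the matching rules above concrete, here is a minimal Python sketch of how such a file could be parsed and applied to a path. It is an illustration only, not the tool's actual implementation: the helper names parse_rules and matches are hypothetical, and details such as multi-part extensions (for example .tar.gz) or case sensitivity are assumptions that may differ from what sage-index really does.

```python
from pathlib import PurePosixPath


def parse_rules(text):
    """Parse ext:/dir:/file: lines into sets. Hypothetical helper, not sage's API."""
    rules = {"ext": set(), "dir": set(), "file": set()}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        kind, sep, value = line.partition(":")
        if not sep or kind not in rules:
            raise ValueError(f"Unrecognized rule: {line!r}")
        rules[kind].add(value.strip())
    return rules


def matches(path, rules):
    """Return True if the path is selected by any rule (assumed semantics)."""
    p = PurePosixPath(path)
    # ext rules compare against the final suffix only, e.g. ".py".
    if p.suffix in rules["ext"]:
        return True
    # file rules compare against the bare file name, not the full path.
    if p.name in rules["file"]:
        return True
    # dir rules match any directory component anywhere in the path.
    return any(part in rules["dir"] for part in p.parts[:-1])


rules = parse_rules("# This is a comment\next:.png\ndir:tests\nfile:__init__.py\n")
print(matches("/path/to/my/tests/file.py", rules))
print(matches("/path/to/my/src/main.py", rules))
```

With an exclusion file, a path for which matches returns True would be skipped; with an inclusion file, only such paths would be indexed.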

To specify an inclusion file (i.e. index only the specified files):

sage-index $GITHUB_REPO --include=/path/to/inclusion/file

To specify an exclusion file (i.e. index all files, except for the ones specified):

sage-index $GITHUB_REPO --exclude=/path/to/exclusion/file

By default, we use the exclusion file sample-exclude.txt.