CodeNav: Beyond tool-use to using real-world codebases with LLM agents 🚀

Visualization of the CodeNav agent. A user query is processed by an agent that interfaces with several environments to write code to answer the query.

CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require "registration" of all relevant tools via manual descriptions within the LLM context, CodeNav automatically indexes and searches over code blocks in the target codebase, finds relevant code snippets, imports them, and uses them to iteratively generate a solution with execution feedback.

Getting Started 🛠️

You can use CodeNav as a command line tool or programmatically as a Python module. In either case, you'll first want to install CodeNav:

pip install git+https://github.com/allenai/codenav

CodeNav as a command line tool

After installing codenav, you can use it as a command line tool by running:

codenav init # Downloads/starts the Elasticsearch search index that CodeNav depends on to search for code snippets

and then

codenav query \
  --code_dir /PATH/TO/CODEBASE/YOU/WANT/CODENAV/TO/USE \
  --playground_dir /WORKING/DIRECTORY/FOR/CODENAV/AGENT \
  --query "Query you want CodeNav to answer using the above codebase"

You can find other command line options by running codenav --help. For example, you might run something like

codenav query \
  --code_dir /PATH/TO/THIS/REPO/codenav \
  --playground_dir /PATH/TO/THIS/REPO/playground \
  --query "Write a google-style documentation string for the DoneEnv class and save it to DoneEnv.py"

Running the above results in the CodeNav agent saving a file DoneEnv.py with contents:


class DoneEnv(CodeNavEnv):
    """
    DoneEnv is an environment class that handles the 'done' action in the CodeNav framework.

    Methods:
        check_action_validity(action: CodeNavAction) -> Tuple[bool, str]:
            Checks if the given action is valid for the 'done' action.

        step(action: CodeNavAction) -> None:
            Executes the 'done' action.
    """
    def check_action_validity(self, action: CodeNavAction) -> Tuple[bool, str]:
        """
        Checks if the given action is valid for the 'done' action.

        Args:
            action (CodeNavAction): The action to be validated.

        Returns:
            Tuple[bool, str]: A tuple containing a boolean indicating validity and an error message if invalid.
        """
        assert action.content is not None

        if action.content.strip().lower() in ["true", "false"]:
            return True, ""
        else:
            return (
                False,
                "When executing the done action, the content must be either 'True' or 'False'",
            )

    def step(self, action: CodeNavAction) -> None:
        """
        Executes the 'done' action.

        Args:
            action (CodeNavAction): The action to be executed.
        """
        return None

Note: the codenav command line tool is simply an alias for running codenav_run.py, so you can replace codenav ... with python -m codenav.codenav_run ... or python /path/to/codenav/codenav_run.py ... and obtain the same results.
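Because the two invocations are equivalent, a script can call the module form through the current Python interpreter. The sketch below only assembles the argument list and does not run CodeNav. The helper name `build_query_command` is hypothetical, and it uses only the `--code_dir`, `--playground_dir`, and `--query` flags from the example commands in this section.

```python
import sys
from typing import List


def build_query_command(code_dir: str, playground_dir: str, query: str) -> List[str]:
    """Assemble the module-form equivalent of `codenav query ...`.

    Hypothetical helper for illustration; it is not part of CodeNav.
    """
    return [
        sys.executable, "-m", "codenav.codenav_run", "query",
        "--code_dir", code_dir,
        "--playground_dir", playground_dir,
        "--query", query,
    ]


if __name__ == "__main__":
    cmd = build_query_command(
        "/PATH/TO/CODEBASE",
        "/PATH/TO/PLAYGROUND",
        "Describe the DoneEnv class",
    )
    print(cmd)
    # To actually launch the agent (requires CodeNav to be installed and
    # `codenav init` to have been run), pass the list to subprocess:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the query contains spaces or quotes.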

Here's a more detailed description of the arguments you can pass to codenav query or python -m codenav.codenav_run query:

Argument	Type	Description
`--code_dir`	str	The path to the codebase you want CodeNav to use. By default, all files in this directory will be indexed with relative file paths. For instance, if you set `--code_dir /Users/tanmay/codebase` and that directory contains a `computer_vision/tool.py` file, then this file will be indexed with the relative path `computer_vision/tool.py`
`--force_subdir`	str	If you wish to index only a subdirectory within the code_dir, set this to the name of the subdirectory
`--module`	str	If you have a module installed, e.g. via `pip install transformers`, and you want CodeNav to use this module, you can simply set `--module transformers` instead of providing `--code_dir`
`--repo_description_path`	str	If you have a README file or a file with a description of the codebase you are using, you can provide the path to this file here. You may use this file to point out to CodeNav the high-level purpose and structure of the codebase (e.g. highlight important directories, files, classes or functions)
`--force_reindex`	bool	Set this flag if you want to force CodeNav to reindex the codebase. Otherwise, CodeNav will reuse an existing index if it exists or create one if it doesn't
`--playground_dir`	str	The path specified here will work as the current directory for CodeNav's execution environment
`--query`	str	The query you want CodeNav to solve using the codebase
`--query_file`	str	If your query is long, you may want to save it to a txt file and provide the path to the text file here
`--max_steps`	int	The maximum number of interactions to allow between CodeNav agent and environments
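To make the relative-path convention described for `--code_dir` concrete, here is a minimal sketch that lists the files under a directory by their path relative to that directory. It is not CodeNav's indexer: the function name `indexed_relative_paths` is hypothetical, and the assumption that `--force_subdir` keeps paths relative to `code_dir` is ours, not something the table states.

```python
import os
import tempfile
from typing import List, Optional


def indexed_relative_paths(code_dir: str, force_subdir: Optional[str] = None) -> List[str]:
    """List files under code_dir (or one subdirectory of it) by relative path.

    Illustration only. Paths are reported relative to code_dir even when
    force_subdir is given, which is an assumption about CodeNav's behavior.
    """
    root = os.path.join(code_dir, force_subdir) if force_subdir else code_dir
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel = os.path.relpath(full_path, start=code_dir)
            rel_paths.append(rel.replace(os.sep, "/"))  # normalize separators
    return sorted(rel_paths)


if __name__ == "__main__":
    # Build a throwaway codebase that mirrors the example in the table.
    with tempfile.TemporaryDirectory() as code_dir:
        os.makedirs(os.path.join(code_dir, "computer_vision"))
        open(os.path.join(code_dir, "computer_vision", "tool.py"), "w").close()
        print(indexed_relative_paths(code_dir))  # ['computer_vision/tool.py']
```

If a file you expect is missing from CodeNav's search results, comparing a listing like this against the paths stored in the index can help narrow down whether the path itself is the problem.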

CodeNav as a library

If you'd like to use CodeNav programmatically, you can do so by importing the codenav module and using the various functions/classes we provide. To get a sense of how this is done, we provide a number of example scripts under the codenav_examples directory:

create_index.py: Creates an Elasticsearch index for this codebase and then uses the RetrievalEnv environment to search for a code snippet.
create_episode.py: Creates an OpenAICodeNavAgent agent and then uses it to generate a solution for the query "Find the DoneEnv and instantiate it" on this codebase (i.e. executes a CodeNav agent on the CodeNav codebase). Be sure to run the create_index.py script above to generate the index before running this script.
create_code_env.py: Creates a PythonCodeEnv object and then executes a given code string in this environment.
create_prompt.py: Creates a custom prompt and instantiates a CodeNav agent with that prompt.
parallel_evaluation.py: Demonstrates how to run multiple CodeNav agents in parallel. This is useful for evaluating on a dataset of queries using multiple processes. The EvalSpec abstraction also helps you organize the code a little better!

Note: You will still need to launch the Elasticsearch server before running any of the above. To do so, run

python -m codenav.codenav_run init

Elasticsearch & Indexing Gotchas 🤔

When running CodeNav you must start an Elasticsearch index on your machine (e.g. by running codenav init) and once you run a query on a given codebase, CodeNav will index that codebase exactly once. This process means there are two things you should keep in mind:

You must manually shut off the Elasticsearch index once you are done with it. You can do this by running codenav stop.
If you modify/update the codebase you are asking CodeNav to use, the Elasticsearch index will not automatically update, and CodeNav will write code using stale information. In this case, add the --force_reindex flag when running codenav query; this forces CodeNav to reindex the codebase.
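One way to avoid forgetting the flag is to detect changes yourself before each run. The sketch below is not a CodeNav feature: it hashes the codebase, compares the result with the fingerprint saved on the previous run, and reports whether `--force_reindex` should be added. The function names and the fingerprint file are hypothetical, and the fingerprint file should live outside the code directory so that it does not change the hash.

```python
import hashlib
import os


def codebase_fingerprint(code_dir: str) -> str:
    """Hash the relative path and contents of every file under code_dir."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(code_dir):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(full_path, code_dir).encode("utf-8"))
            with open(full_path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()


def needs_reindex(code_dir: str, fingerprint_file: str) -> bool:
    """Return True if code_dir changed since the last call, then save the new fingerprint."""
    current = codebase_fingerprint(code_dir)
    previous = None
    if os.path.exists(fingerprint_file):
        with open(fingerprint_file, "r", encoding="utf-8") as f:
            previous = f.read().strip()
    with open(fingerprint_file, "w", encoding="utf-8") as f:
        f.write(current)
    return previous != current


if __name__ == "__main__":
    # Placeholder paths; replace them with your own.
    code_dir = "/PATH/TO/CODEBASE"
    if os.path.isdir(code_dir):
        extra_args = ["--force_reindex"] if needs_reindex(code_dir, "/tmp/codenav_fingerprint.txt") else []
        print(extra_args)
```

The first call always returns True because no fingerprint has been saved yet. That costs one redundant reindex at most.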
If you run CodeNav and find that it is unable to search for a file, you may want to make sure the file was indexed correctly. You can inspect all indexed files using Elasticsearch's Kibana interface at http://localhost:5601/. To view all the indices created by CodeNav, go to http://localhost:5601/app/management/data/index_management. Then click on the index you want to inspect and then click on "Discover Index" on the top-right side of the page. This will show you all the code blocks stored in this index. You can now use the UI to run queries against this index and see whether the file you are looking for is present in the index and whether it has the correct file path.
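If the Kibana page does not load at all, it helps to check first whether anything is listening on the port. The following sketch uses only Python's standard `socket` module; the function name `port_is_open` is hypothetical, and port 5601 is taken from the Kibana URL above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5601, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Kibana port 5601 is reachable")
    else:
        print("Nothing is listening on port 5601; try running `codenav init`")
```

An open port only shows that a process is accepting connections. It does not confirm that the index for your codebase exists or is up to date.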

Warning ⚠️

CodeNav is a research project and may make errors. As CodeNav can potentially execute ANY code it wants, it is not suitable for security sensitive applications. We strongly recommend that you run CodeNav in a sandboxed environment where data loss or security breaches are not a concern.

Authors ✍️

Tanmay Gupta, Luca Weihs, and Aniruddha Kembhavi

License 📄

This project is licensed under the Apache 2.0 License.

Citation

@misc{gupta2024codenavtooluseusingrealworld,
  title={CodeNav: Beyond tool-use to using real-world codebases with LLM agents}, 
  author={Tanmay Gupta and Luca Weihs and Aniruddha Kembhavi},
  year={2024},
  eprint={2406.12276},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2406.12276}, 
}

CodeNav builds along the research direction we started exploring with VisProg (CVPR 2023 Best Paper). For more context please visit https://github.com/allenai/visprog/blob/main/README.md.
