RAG over code | 🦜🔗 Langchain

At a glance

The post asks whether there is a tutorial on RAG (Retrieval-Augmented Generation) over code for LlamaIndex, similar to the one in the LangChain docs. The first reply suggests LlamaIndex's CodeSplitter. A community member points out that the most useful part of the LangChain demo is its splitting strategy, which loads each top-level function and class into a separate document, and that splitting by a fixed number of lines is unlikely to give good results. Community members also discuss the need for a better code splitter in LlamaIndex and share a code example that wraps LangChain's RecursiveCharacterTextSplitter in the LangchainNodeParser. No comment is explicitly marked as the answer.

richard1861

Hello, is there a tutorial to do RAG over code in llamaindex? similar to this one in langchain: https://js.langchain.com/docs/use_cases/rag/code_understanding

5 comments

Teemu

Have you tried the CodeSplitter? https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules.html#codesplitter

rawwerks

@Teemu - the part of the langchain demo that seems particularly useful is their splitting strategy, which:
Loads each top-level function and class in the code into a separate document.
Puts the remaining code into a separate document.
Retains metadata about where each split comes from.

just doing n # of lines of code is not likely to produce good results.
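
For anyone who wants to see what that strategy looks like in practice, here is a minimal sketch using only Python's standard-library ast module. It is not LangChain's or LlamaIndex's implementation. The function name split_python_source, the dict layout, and the metadata keys are all made up for illustration, and it only handles Python source that parses cleanly.

```python
import ast


def split_python_source(source, path="<memory>"):
    """Return one chunk per top-level function/class, plus one remainder chunk.

    Hypothetical helper for illustration: each chunk is a dict with the
    chunk text and metadata describing where the split came from.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    covered = set()  # 1-based line numbers already assigned to a chunk

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start from the first one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # available on Python 3.8+
            chunks.append({
                "text": "\n".join(lines[start - 1:end]),
                "metadata": {
                    "source": path,
                    "kind": type(node).__name__,
                    "name": node.name,
                    "start_line": start,
                    "end_line": end,
                },
            })
            covered.update(range(start, end + 1))

    # Everything not inside a top-level function or class (imports,
    # constants, script code) goes into a single remainder chunk.
    remainder = "\n".join(
        line for number, line in enumerate(lines, start=1) if number not in covered
    ).strip()
    if remainder:
        chunks.append({
            "text": remainder,
            "metadata": {"source": path, "kind": "remainder", "name": None},
        })
    return chunks


if __name__ == "__main__":
    demo = "import os\n\ndef foo():\n    return 1\n\nX = foo()\n"
    for chunk in split_python_source(demo, "demo.py"):
        print(chunk["metadata"])
```

Each chunk could then be turned into whatever document or node type your framework expects, with the metadata carried along so retrieved chunks can be traced back to a file and line range.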

Logan M

Yea we haven't spent much time focusing on RAG for code. Would love to have a better code splitter contributed ❤️

Someone started a PR that's similar to what @rawwerks described, but the approach was not great, and the PR was also never completed 😅

rawwerks

i got this advice (use langchain w/ llamaindex) from Laurie Voss in preparation for my award-winning hackathon submission. sharing here to help others:

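from langchain.text_splitter import RecursiveCharacterTextSplitter
# In llama-index 0.10 and later this import lives in llama_index.core.node_parser
from llama_index.node_parser import LangchainNodeParser

# Build a splitter that uses Python-aware separators (class and def boundaries first)
python_splitter = RecursiveCharacterTextSplitter.from_language(
    'python', chunk_size=50, chunk_overlap=0
)
# Wrap the LangChain splitter so it can be used as a LlamaIndex node parser
parser = LangchainNodeParser(python_splitter)
# `documents` is assumed to be a list of LlamaIndex Document objects loaded earlier
nodes = parser.get_nodes_from_documents(documents)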

Logan M

(Which Laurie got from me, hehehe)

It's a good approach 🙂 Someday we will add a better code splitter to llama-index
