
Updated 6 months ago

RAG over code | 🦜🔗 Langchain

At a glance

The post asks if there is a tutorial on how to use RAG (Retrieval Augmented Generation) over code in the LlamaIndex library, similar to the one in LangChain. The comments suggest using the CodeSplitter in LlamaIndex, which can split code into separate documents based on top-level functions and classes. Community members also discuss the need for a better code splitter in LlamaIndex and share a code example using LangChain's RecursiveCharacterTextSplitter with the LangchainNodeParser. However, there is no explicitly marked answer in the comments.

Useful resources
Hello, is there a tutorial to do RAG over code in llamaindex? similar to this one in langchain: https://js.langchain.com/docs/use_cases/rag/code_understanding
5 comments
@Teemu - the part of the langchain demo that seems particularly useful is their splitting strategy, which:
Keeps each top-level function and class in the code in its own document.
Puts the remaining code into a separate document.
Retains metadata about where each split comes from.

just doing n # of lines of code is not likely to produce good results.
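
The splitting strategy described above can be sketched with Python's standard `ast` module. This is only a minimal illustration for Python source, not the LangChain or LlamaIndex implementation; the names `CodeChunk` and `split_top_level` and the metadata keys are hypothetical.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class CodeChunk:
    """Hypothetical container for one split and its provenance."""
    text: str
    metadata: dict


def split_top_level(source: str, path: str = "<unknown>") -> List[CodeChunk]:
    """Put each top-level function/class in its own chunk, the rest in one more."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    covered = set()  # 1-based line numbers already assigned to a chunk

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # available in Python 3.8+
            chunks.append(CodeChunk(
                text="\n".join(lines[start - 1:end]),
                metadata={
                    "path": path,
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": start,
                    "end_line": end,
                },
            ))
            covered.update(range(start, end + 1))

    # Everything else (imports, constants, module-level code) goes in one chunk.
    rest = [line for i, line in enumerate(lines, start=1) if i not in covered]
    rest_text = "\n".join(rest).strip()
    if rest_text:
        chunks.append(CodeChunk(
            text=rest_text,
            metadata={"path": path, "name": None, "kind": "remainder"},
        ))
    return chunks


sample = "import os\n\ndef f():\n    return 1\n\nclass A:\n    pass\n\nX = 2\n"
for chunk in split_top_level(sample, "sample.py"):
    print(chunk.metadata["kind"], chunk.metadata["name"])
```

Each chunk's text and metadata could then be wrapped in whatever document type the indexing library expects. Very large classes would still need a second, size-based split.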
Yea we haven't spent much time focusing on RAG for code. Would love to have a better code splitter contributed ❤️

Someone started a PR that's similar to what @rawwerks described, but the approach was not great, and the PR was also never completed 😅
i got this advice (use langchain w/ llamaindex) from Laurie Voss in preparation for my award-winning hackathon submission. sharing here to help others:

python
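# Import paths as posted; newer llama-index releases (0.10+) expose
# LangchainNodeParser under llama_index.core.node_parser instead.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from llama_index.node_parser import LangchainNodeParser

# Language-aware splitter that prefers Python boundaries (class, def, ...).
# chunk_size is measured in characters.
python_splitter = RecursiveCharacterTextSplitter.from_language(
    'python', chunk_size=50, chunk_overlap=0
)
# Wrap the LangChain splitter so LlamaIndex can use it as a node parser.
parser = LangchainNodeParser(python_splitter)
# `documents` is assumed to be a list of LlamaIndex documents loaded elsewhere.
nodes = parser.get_nodes_from_documents(documents)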
(Which Laurie got from me, hehehe)

It's a good approach 🙂 Someday we will add a better code splitter to llama-index