
Updated 3 months ago

RAG over code | 🦜🔗 Langchain

Hello, is there a tutorial for doing RAG over code in LlamaIndex, similar to this one in LangChain: https://js.langchain.com/docs/use_cases/rag/code_understanding
5 comments
@Teemu - the part of the langchain demo that seems particularly useful is their splitting strategy, which:
Loads each top-level function and class in the code into its own document.
Puts the remaining code into a separate document.
Retains metadata about where each split comes from.
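
For Python sources, the three points above can be sketched with the standard-library `ast` module. This is only a minimal illustration of the idea, not the LangChain loader's actual implementation; the function name `split_top_level` and the metadata keys are made up for this example.

```python
import ast


def split_top_level(source: str, path: str = "<unknown>") -> list[dict]:
    """Return one chunk per top-level function/class, plus one chunk for the rest."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    used = set()  # 0-based indices of lines claimed by a function or class

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so decorators stay with their definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            used.update(range(start - 1, end))
            chunks.append({
                "text": "\n".join(lines[start - 1:end]),
                # Hypothetical metadata keys recording where the split came from.
                "metadata": {
                    "path": path,
                    "kind": type(node).__name__,
                    "name": node.name,
                    "start_line": start,
                    "end_line": end,
                },
            })

    # Everything not claimed above (imports, module-level statements) goes into one chunk.
    rest = "\n".join(line for i, line in enumerate(lines) if i not in used).strip()
    if rest:
        chunks.append({"text": rest, "metadata": {"path": path, "kind": "remainder"}})
    return chunks


example = '''import os

def greet(name):
    return f"hi {name}"

class Greeter:
    def run(self):
        return greet("x")

print(greet("world"))
'''

chunks = split_top_level(example, path="example.py")
for chunk in chunks:
    print(chunk["metadata"])
```

Each chunk's text and metadata could then be wrapped in whatever document type the indexing library expects.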

Just splitting every n lines of code is unlikely to produce good results.
Yea we haven't spent much time focusing on RAG for code. Would love to have a better code splitter contributed ❤️

Someone started a PR that's similar to what @rawwerks described, but the approach was not great, and the PR was also never completed 😅
i got this advice (use langchain w/ llamaindex) from Laurie Voss in preparation for my award-winning hackathon submission. sharing here to help others:

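from langchain.text_splitter import RecursiveCharacterTextSplitter
from llama_index.node_parser import LangchainNodeParser

# Language-aware splitter: prefers Python boundaries such as class/def over arbitrary cuts
python_splitter = RecursiveCharacterTextSplitter.from_language(
    'python', chunk_size=50, chunk_overlap=0
)
# Wrap the LangChain splitter so it produces LlamaIndex nodes
parser = LangchainNodeParser(python_splitter)
# `documents` is a list of LlamaIndex Document objects that you have loaded beforehand
nodes = parser.get_nodes_from_documents(documents)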
(Which Laurie got from me, hehehe)

It's a good approach 🙂 Someday we will add a better code splitter to llama-index
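
For anyone curious what a language-aware recursive splitter like the one in the snippet above is doing conceptually, here is a simplified standalone sketch: try separators in priority order and only recurse into pieces that are still too large. It is an illustration under assumptions, not LangChain's code. The function `recursive_split` and the separator list are hypothetical, and unlike the real splitter it does not merge small pieces back together.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split at the highest-priority separator, recursing only into oversized pieces."""
    if len(text) <= chunk_size:
        # Small enough already; drop pieces that are only whitespace.
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard cuts of chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, remaining, chunk_size)

    pieces = text.split(sep)
    # Re-attach the separator to the front of later pieces so "def"/"class" stays with its body.
    pieces = [pieces[0]] + [sep + p for p in pieces[1:]]

    chunks = []
    for piece in pieces:
        chunks.extend(recursive_split(piece, remaining, chunk_size))
    return chunks


# Assumed priority order for Python: class boundaries, then functions, then blank lines, then lines.
PY_SEPARATORS = ["\nclass ", "\ndef ", "\n\n", "\n"]

code = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
chunks = recursive_split(code, PY_SEPARATORS, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

With a very small `chunk_size`, function bodies get cut on line boundaries, so in practice a larger value usually keeps whole functions together.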