
Updated 3 months ago

Code

Hi, I am loading GitHub repos into LlamaIndex and indexing them with GPTVectorStoreIndex. I am trying to get GPT to help me answer questions about the code. The results are okay, but they could be a lot better. It works well when the answer to a question is covered by the documentation, but it does a poor job on code that isn't documented.

Would you happen to know if there is a better way of doing this? 🙏

I thought ListIndex might be better, but it takes 5+ minutes to get an answer, and I am also running into context limit errors. I would probably have to use a larger model, for example one with a 32K context window, but that would get quite expensive.
6 comments
Hmm there shouldn't be context window errors, at least with default settings

In any case, I'm pretty sure the GitHub reader doesn't actually load source code, only text files like markdown

Code is super tricky to work with tbh. You have to be careful to not chunk functions in half. LlamaIndex takes a lot of work to work well with code from what I've seen 🤔
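To make the "don't chunk functions in half" point concrete, here is a minimal sketch of one way to split a Python file along function and class boundaries using the standard library `ast` module. It is not something LlamaIndex does for you; the helper name `chunk_python_source` and the sample source are made up for illustration, and it only handles Python files.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Return one chunk per top-level function or class (hypothetical helper)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start at the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            # end_lineno (Python 3.8+) is the last line of the definition.
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    # Note: module-level statements (imports, constants) are skipped here.
    return chunks


example = '''import os

def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''

chunks = chunk_python_source(example)
for chunk in chunks:
    print(chunk)
    print("---")
```

Each chunk could then be wrapped in its own document before indexing, so a function body never gets cut at an arbitrary character limit. Very long functions or classes would still need further splitting.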
Hmm there shouldn't be context window errors, at least with default settings

I guess I must have done something wrong then 😄 Am I correct in believing that ListIndex would work best for this use case? If yes, I might have to give it another try.

In any case, I'm pretty sure the GitHub reader doesn't actually load source code, only text files like markdown

Wait, github reader doesn't load source code files like .py, .js etc? Are you sure? It appears that it can do that from the image I sent.
I haven't specified the parameter filter_file_extensions = ([".py"], GithubRepositoryReader.FilterType.INCLUDE) in my code when building the documents though, so I wonder if by default it reads only text files like you said? 🤔
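For readers unfamiliar with this kind of parameter, the sketch below shows in plain Python what an include/exclude filter on file extensions means. It is only an illustration of the idea: `FilterType` and `keep_file` here are hypothetical stand-ins, not the reader's actual classes or implementation, and the sketch says nothing about what the reader loads when no filter is given.

```python
from enum import Enum
from pathlib import PurePosixPath


class FilterType(Enum):
    """Hypothetical stand-in for the reader's filter type, for illustration only."""
    INCLUDE = "include"
    EXCLUDE = "exclude"


def keep_file(path: str, extensions: list[str], mode: FilterType) -> bool:
    """Decide whether a file passes an extension filter (hypothetical helper)."""
    matches = PurePosixPath(path).suffix in extensions
    # INCLUDE keeps only matching files; EXCLUDE keeps everything else.
    return matches if mode is FilterType.INCLUDE else not matches


paths = ["README.md", "src/app.py", "web/index.js"]
only_python = [p for p in paths if keep_file(p, [".py"], FilterType.INCLUDE)]
no_python = [p for p in paths if keep_file(p, [".py"], FilterType.EXCLUDE)]
print(only_python)
print(no_python)
```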

Code is super tricky to work with tbh. You have to be careful to not chunk functions in half. LlamaIndex takes a lot of work to work well with code from what I've seen :thinking:

I see, well I'm probably not smart enough yet to make it work haha 😄
Attachment
Snimek_obrazovky_2023-07-12_173534.png
ah nvm, you are correct about the file extensions, whoops lol
I still don't think a list index is a good application for this though. The ideal solution is probably a vector index, with some complicated custom retriever code lol
A list index will read every file in the index -- which as you said is very slow
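The difference described above can be sketched in a few lines: a list-style index passes every chunk to the model, while a vector-style retriever scores the chunks against the question and passes along only the top k. The example below is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, and `retrieve_top_k` is a hypothetical helper, not LlamaIndex's retriever API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


code_chunks = [
    "def load_config(path): return open(path).read()",
    "def add(a, b): return a + b",
    "def parse_config(text): return dict(line.split('=') for line in text.splitlines())",
]

question = "where do we parse the config text"
top = retrieve_top_k(question, code_chunks, k=1)
print(top)
```

With k chunks selected up front, the model only ever sees a small slice of the repo per question, which is why this approach stays fast and within the context window. A custom retriever for code would mostly be about improving the scoring step, for example by also matching on function names or file paths.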
oh well, i guess i will have to wait for someone smarter than me to do it hehe 😅 appreciate your help, thank you 🙂