Code

Hi, I am loading GitHub repos into LlamaIndex and indexing them with GPTVectorStoreIndex. I am trying to get GPT to answer questions about the code. The results are okay, but they could be a lot better. It works well when the answer is covered by the documentation, but for undocumented code it doesn't do a good job.

Would you happen to know if there is a better way of doing this? 🙏

I thought ListIndex might be better, but it takes 5+ minutes to get an answer and I am also running into context limit errors. I would probably have to use a larger model, for example one with a 32K context, but that would get quite expensive.

6 comments

Logan M

Hmm there shouldn't be context window errors, at least with default settings

In any case, I'm pretty sure the GitHub reader doesn't actually load source code, only text files like markdown

Code is super tricky to work with tbh. You have to be careful to not chunk functions in half. LlamaIndex takes a lot of work to work well with code from what I've seen 🤔
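To illustrate the point about not chunking functions in half, here is a minimal sketch that splits Python source along top-level statement boundaries using the standard library `ast` module. It is not LlamaIndex's splitter; `chunk_python_source` and `SAMPLE` are hypothetical names made up for this example, and it only handles Python files.

```python
import ast
from typing import List


def chunk_python_source(source: str) -> List[str]:
    """Return one chunk per top-level statement (import, function, class, ...).

    Each function or class stays whole, so no definition is cut in half.
    Comments and blank lines between top-level statements are dropped.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        start = node.lineno
        # Decorators sit above the `def`/`class` line, so start there if present.
        decorators = getattr(node, "decorator_list", [])
        if decorators:
            start = min(d.lineno for d in decorators)
        # end_lineno is available on Python 3.8+.
        chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


# Hypothetical sample input, only for illustration.
SAMPLE = '''import os

def add(a, b):
    total = a + b
    return total

class Greeter:
    def hello(self):
        return "hi"
'''

chunks = chunk_python_source(SAMPLE)
for chunk in chunks:
    print(chunk)
    print("-----")
```

Each chunk could then be wrapped in a document and indexed on its own. Very long functions or classes would still need a secondary split, which this sketch does not attempt.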

Maker

> Hmm there shouldn't be context window errors, at least with default settings

I guess I must have done something wrong then 😄 Am I correct in believing that ListIndex would work best for this use case? If yes, I might have to give it another try.

> In any case, I'm pretty sure the GitHub reader doesn't actually load source code, only text files like markdown

Wait, the GitHub reader doesn't load source code files like .py, .js etc.? Are you sure? From the image I sent, it appears that it can.
I haven't specified the parameter filter_file_extensions = ([".py"], GithubRepositoryReader.FilterType.INCLUDE) in my code when building the documents though, so I wonder if by default it reads only text files like you said? 🤔

> Code is super tricky to work with tbh. You have to be careful to not chunk functions in half. LlamaIndex takes a lot of work to work well with code from what I've seen 🤔

I see, well I'm probably not smart enough yet to make it work haha 😄

Attachment

Logan M

ah nvm, you are correct about the file extensions, whoops lol

Logan M

I still don't think a list index is a good application for this though. The ideal solution is probably a vector index, with some complicated custom retriever code lol

Logan M

A list index will read every file in the index -- which as you said is very slow
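To make the difference concrete: a list index passes every chunk to the model, while a vector-style index scores the chunks against the question and only passes on the top few. The toy sketch below shows that retrieval step with a bag-of-words cosine similarity standing in for real embeddings. It does not use the LlamaIndex API; `retrieve_top_k`, `CHUNKS`, and `QUERY` are hypothetical names, and a real custom retriever would need much better scoring.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter,
    # so "parse_config" becomes ["parse", "config"].
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Score every chunk against the query, then keep only the k best.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hypothetical code chunks, only for illustration.
CHUNKS = [
    "def load_repo(url): return clone(url)",
    "def parse_config(path): return json.load(open(path))",
    "def send_email(to, body): smtp.send(to, body)",
]
QUERY = "where is the config path parsed"

best = retrieve_top_k(QUERY, CHUNKS, k=1)
print(best)
```

Only the selected chunks would go into the prompt, which is why this approach stays fast and within the context window, whereas reading every file does not.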

Maker

oh well, i guess i will have to wait for someone smarter than me to do it hehe😅 appreciate your help, thank you 🙂
