Updated 2 years ago

has anybody created an index over github

At a glance

The post asks if anyone has created an index over GitHub repositories and what the best way to do that is. Community members suggest using a GH data loader from a website called LlamaHub, and provide some code examples to help with the implementation. However, one community member is having trouble getting the data loader to work, encountering an error when trying to import the necessary modules. Another community member provides a potential solution, and the original community member confirms that it worked. But they then run into an issue where the indexer only seems to be indexing plain text files, not code files. The community members continue discussing potential solutions and troubleshooting the issue, but there is no explicitly marked answer.

has anybody created an index over github repos? If so, what was the best way to do that?
10 comments
you can try using our GH data loader! https://llamahub.ai/l/github_repo
also cc @HAL 9000
(of course the actual index to use will be somewhat dependent on your use case, would love to hear your feedback there)
ah very cool, thanks!
hey @Max 🙂 did you manage to use the github_repo data loader? i can't seem to get it working, i get the error: ModuleNotFoundError: No module named 'gpt_index' when trying to run the file to create the index. seems like I can't import the GithubClient or the GithubRepositoryReader...
Attachment: image.png
could you try something like this? from llama_index.readers.llamahub_modules import GithubRepositoryReader
@jerryjliu0 yes this worked: from llama_index.readers.llamahub_modules import GithubClient, GithubRepositoryReader, thank you so much 😄
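
For anyone hitting the same ModuleNotFoundError: the message suggests the loader file imports from the package name gpt_index while only llama_index is installed. A quick way to check that, sketched with the standard library only (the helper name installed_packages is made up for this example and is not part of either package):

```python
import importlib.util


def installed_packages(names):
    """Map each top-level package name to True if it can be found, else False.

    Only pass top-level names (no dots): find_spec on a dotted name imports
    the parent package and can itself raise ModuleNotFoundError.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


# 'gpt_index' is the name from the error message, 'llama_index' is the name
# used in the import that worked in this thread.
status = installed_packages(["gpt_index", "llama_index"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'not installed'}")
```

If gpt_index shows as not installed and llama_index as installed, switching the import to the llama_index path quoted above is the likely fix.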

I have managed to index the files, but it seems that it is only indexing plain text files, not code. this is my implementation:

is filter_directories recursive? meaning that every folder inside the top folder I specify will be indexed? when trying to index this folder with only .cs files, the json result file is empty (using 0 tokens when indexing).

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 0 tokens

maybe I am misunderstanding the filtering...
Attachment: image.png
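
One way to reason about the filtering question is to model the filter locally and run it over a list of repo paths. The sketch below is NOT the reader's actual code. It assumes an include filter that matches directories recursively and compares file extensions with the leading dot. The function would_include and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def would_include(path, include_dirs, include_exts):
    """Assumed model of an include filter (not the library's implementation).

    A path passes if it sits under one of include_dirs at any depth and its
    extension, including the leading dot, is listed in include_exts.
    """
    p = PurePosixPath(path)
    under_dir = any(PurePosixPath(d) in p.parents for d in include_dirs)
    return under_dir and p.suffix in include_exts


# Hypothetical repo layout, for illustration only.
paths = [
    "README.md",
    "src/Game/Player.cs",
    "src/Game/Core/Enemy.cs",
    "src/Tools/build.py",
]

# Extension written with the leading dot: nested .cs files pass the model.
kept = [p for p in paths if would_include(p, ["src/Game"], [".cs"])]
print(kept)  # ['src/Game/Player.cs', 'src/Game/Core/Enemy.cs']

# Extension written without the dot: nothing passes under this model.
kept_no_dot = [p for p in paths if would_include(p, ["src/Game"], ["cs"])]
print(kept_no_dot)  # []
```

If the real reader behaves differently from this model, comparing its output against kept for the same inputs should show where the assumption breaks.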
cc @HAL 9000 for more info on the github reader!
@lars yeah given the output it doesn't look like it's indexing anything atm
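
A total of 0 tokens is what you would expect if the loader handed back an empty list, since there would be nothing to embed. A small guard before building the index makes that case fail loudly instead of silently. This is a sketch only. require_documents is a hypothetical helper, not a llama_index function.

```python
def require_documents(documents, source="loader"):
    """Return the documents as a list, or raise if the load came back empty."""
    docs = list(documents)
    if not docs:
        raise ValueError(
            f"{source} returned no documents; check the directory and "
            "extension filters before building the index"
        )
    return docs


# Usage idea: wrap whatever the reader returned before passing it on.
try:
    require_documents([], source="github reader")
except ValueError as exc:
    print(exc)
```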
i created a thread called and cc'd him. keep up the good work. 🙌