The post asks if anyone has created an index over GitHub repositories and what the best way to do that is. Community members suggest using a GH data loader from a website called LlamaHub, and provide some code examples to help with the implementation. However, one community member is having trouble getting the data loader to work, encountering an error when trying to import the necessary modules. Another community member provides a potential solution, and the original community member confirms that it worked. But they then run into an issue where the indexer only seems to be indexing plain text files, not code files. The community members continue discussing potential solutions and troubleshooting the issue, but there is no explicitly marked answer.
hey @Max, did you manage to use the github_repo data loader? i can't seem to get it working, i get the error: ModuleNotFoundError: No module named 'gpt_index' when trying to run the file to create the index. seems like I can't import the GithubClient or the GithubRepositoryReader...
@jerryjliu0 yes this worked: from llama_index.readers.llamahub_modules import GithubClient, GithubRepositoryReader. thank you so much!
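For anyone hitting the same ModuleNotFoundError, here is a minimal sketch of an import fallback. The first module path is the one reported to work in this thread; the second (gpt_index.readers.llamahub_modules) is only an assumption based on the package name in the error message and may not exist in any release. The helper first_importable is hypothetical, not part of the library.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first module path that imports cleanly."""
    errors = {}
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors[name] = str(exc)
    raise ImportError(f"none of the candidates could be imported: {errors}")


# Candidate module paths, tried in order.
# 1st: the path confirmed in this thread.
# 2nd: ASSUMPTION, derived from the 'gpt_index' name in the error message.
CANDIDATES = [
    "llama_index.readers.llamahub_modules",
    "gpt_index.readers.llamahub_modules",
]

if __name__ == "__main__":
    try:
        name, module = first_importable(CANDIDATES)
        GithubClient = module.GithubClient
        GithubRepositoryReader = module.GithubRepositoryReader
        print(f"imported loader classes from {name}")
    except (ImportError, AttributeError) as exc:
        # Neither path is available in the installed version.
        print(f"could not import the GitHub loader: {exc}")
```

If neither path imports, checking which of the two packages is actually installed (for example with pip list) is probably the quickest next step.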
I have managed to index the files, but it seems that it's only indexing plain text files, not code. this is my implementation:
is the filter_directories recursive? meaning that every folder inside the top folder I specify will be indexed? when trying to index this folder with only .cs files, the json result file is empty (using 0 tokens when indexing).
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 0 tokens
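A token count of 0 is consistent with no documents reaching the index builder, so it may help to check what the filters would select before indexing. The sketch below does not reproduce GithubRepositoryReader's own logic; it is a standalone check on a plain list of paths, assuming a directory filter that matches at any depth plus an extension filter. The function names and the example paths are hypothetical. Whether filter_directories is actually recursive still needs to be confirmed in the loader's source.

```python
from pathlib import PurePosixPath


def matches_filters(path, include_dirs, include_exts):
    """True if path sits under one of include_dirs (any depth) and has an allowed extension."""
    p = PurePosixPath(path)
    # p.parents holds every ancestor directory, so nested folders match too.
    in_dir = any(PurePosixPath(d) in p.parents for d in include_dirs)
    return in_dir and p.suffix.lower() in include_exts


def summarize(paths, include_dirs, include_exts):
    """Split paths into (kept, skipped) under the assumed filter semantics."""
    kept = [p for p in paths if matches_filters(p, include_dirs, include_exts)]
    skipped = [p for p in paths if p not in kept]
    return kept, skipped


if __name__ == "__main__":
    # Hypothetical repository layout, for illustration only.
    example_paths = [
        "README.md",
        "Assets/Scripts/Player.cs",
        "Assets/Scripts/AI/Enemy.cs",
        "docs/notes.txt",
    ]
    kept, skipped = summarize(example_paths, ["Assets/Scripts"], {".cs"})
    print("kept:", kept)
    print("skipped:", skipped)
    if not kept:
        print("nothing matched: the index would be built from zero documents")
```

Running the same check with only text extensions (for example {".md", ".txt"}) against a folder of .cs files leaves kept empty, which would match the empty result described above if the loader's extension filter excludes .cs.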