I'm trying to index an entire codebase

At a glance

A community member is trying to index a large codebase (200+ repositories) with LlamaIndex, but parsing documents into nodes takes a very long time. They suspect the splitter is slow because it runs serially and ask whether a parallelized implementation exists. The replies say there is none yet and suggest parsing each repository manually in separate threads. Another reply links to the github_repo page on llamahub.ai. The original poster adds that reading the documents is faster in a single thread, and that the indexing step is what takes most of the time.

oole

I'm trying to index an entire codebase (200+ repositories) with LlamaIndex, and parsing documents into nodes takes forever. I believe the splitter is slow because it runs serially. Is there a parallelised implementation available?

4 comments

Logan M

not yet, although that's good feedback. You could parse/thread each repo manually though
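
A minimal sketch of the "parse each repo manually" idea, using only the Python standard library. The names `split_text`, `parse_repo`, and `parse_repos_parallel` are hypothetical, and the fixed-size character splitter is only a stand-in for whatever node parser you actually use; none of this is LlamaIndex API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Placeholder splitter (assumption): fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def parse_repo(repo_dir: str) -> list[str]:
    """Read every .py file in one repository and split it into chunks."""
    chunks: list[str] = []
    for path in sorted(Path(repo_dir).rglob("*.py")):
        chunks.extend(split_text(path.read_text(errors="ignore")))
    return chunks


def parse_repos_parallel(repo_dirs: list[str], max_workers: int = 4) -> list[str]:
    """Parse one repository per worker and merge the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_repo = pool.map(parse_repo, repo_dirs)  # map preserves input order
        return [chunk for repo_chunks in per_repo for chunk in repo_chunks]
```

Threads mostly help while workers wait on file I/O. If the splitting itself is CPU-bound, swapping in `concurrent.futures.ProcessPoolExecutor` (with the worker function defined at module level) is likely the better fit.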

Emanuel Ferreira

Maybe this will help you:

https://llamahub.ai/l/github_repo

oole

Yes. What I've seen is that reading the documents is faster in a single thread, but indexing is what's taking a lot of time.

oole

Could you please let me know how this is helpful?
