
Updated 4 months ago

bigcode/starencoder · Hugging Face

At a glance

The community member is considering building a RAG (Retrieval-Augmented Generation) plugin for VS Code and is seeking feedback on the best embedding choice and retrieval strategy for code. Another community member suggests using LlamaIndex's CodeSplitter for better code chunking, but notes that querying a codebase can be quite difficult. The discussion also mentions a blog post from Sweep on chunking improvements, which the community member plans to explore further.

Hi there!
I am thinking of building a RAG plugin for VS Code. That raises the question of which embedding model and retrieval strategy to use for code.
Do you guys have good feedback on using RAG for code, and which model and retrieval approach did you use? I saw that HF has an embedding model for code, for instance, but I'm wondering if there are other, smaller options: https://huggingface.co/bigcode/starencoder
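To make the retrieval half of the question concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The `embed` function is a deliberately crude stand-in (token counts) and is an assumption for illustration only: it is not starencoder or any other learned model, and the names `embed`, `cosine`, and `top_k` are hypothetical. In a real plugin, `embed` would be swapped for whichever code embedding model is chosen.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase token counts (NOT a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical code chunks, as they might come out of a chunking step.
chunks = [
    "def read_config(path): return json.load(open(path))",
    "def send_email(to, subject, body): smtp.send(to, subject, body)",
    "class UserRepository: def find_user(self, user_id): ...",
]
best = top_k("where do we read the config file", chunks, k=1)
print(best[0][1])
```

The ranking logic stays the same whatever model produces the vectors, so this part can be prototyped before committing to an embedding.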
8 comments
Is there an end to end example of querying a codebase with it?
Not that I've seen actually. Chunking is one step, but querying a code base can be quite difficult from what I've seen 🤔
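For the chunking step, a minimal sketch of structure-aware splitting might look like the following. It uses only Python's standard `ast` module and emits one chunk per top-level function or class. This is an illustrative assumption, not how LlamaIndex's CodeSplitter works internally, and it only handles Python source; the function name `chunk_python_source` and the chunk dictionary layout are hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Split a Python module into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # inclusive, available on Python 3.8+
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


example = '''import os

def load(path):
    with open(path) as f:
        return f.read()

class Cache:
    def get(self, key):
        return None
'''

for chunk in chunk_python_source(example):
    print(chunk["name"], chunk["start_line"], chunk["end_line"])
```

Keeping the name and line range with each chunk means a retrieved hit can be mapped back to a location in the editor. Module-level statements such as imports are skipped here, which a real tool would probably want to handle.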
I see! I think I can get something quick and dirty running and see how it works 😄
Have you seen any good resources on the topic?
Sweep has some interesting blogs on the subject

Here's one example

https://docs.sweep.dev/blogs/chunking-improvements
Super cool! Will try to hack something this weekend
Thanks @Logan M 🙂