Code

Meanwhile, copying and pasting the code into ChatGPT directly leads to a way better result
Yea llama index is not optimized for code

I'm pretty sure ChatGPT is not the same model as gpt-3.5-turbo, if you were curious

Code is tough for a bunch of reasons. Sure, you can show the model a small function and ask what the problem is

But how do you index a codebase? The LLM needs to be aware of the entire namespace (function names, variables) and what they all do. But in most cases, that isn't going to all fit into one text chunk

And then if you do need to break code into nodes, you can't just randomly split a function in half; you need to keep everything complete.
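
One rough way to keep definitions intact (just a sketch using Python's ast module, not anything built into llama index; the file name and Document usage are placeholders and may vary by version) is to chunk a source file on function/class boundaries before building Documents:

Plain Text
import ast

from llama_index import Document

def split_on_definitions(source):
    # one chunk per top-level function/class, so no definition gets cut in half
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

# "my_module.py" is just a placeholder file name
with open("my_module.py") as f:
    documents = [Document(text=chunk) for chunk in split_on_definitions(f.read())]
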
I tried to feed llamaindex technical documentation, and sometimes the response is somewhat random, like making up a nonexistent url. Is there any way I can optimize llamaindex so that it keeps information as complete as possible, so the chunks or nodes are not split up too small? @Logan M
Was the documentation just from one big file?

I know something that usually helps is splitting large documents into chapters or sections before indexing. If these sections are different enough, they could even be their own indexes and used in something like a router query engine.
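
A minimal router setup looks roughly like this (just a sketch; the two sections, their descriptions, and the exact import paths are placeholders and can vary a bit between llama index versions):

Plain Text
from llama_index import GPTVectorStoreIndex
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool

# assume install_docs and api_docs are two lists of Documents, one per section
install_index = GPTVectorStoreIndex.from_documents(install_docs)
api_index = GPTVectorStoreIndex.from_documents(api_docs)

tools = [
    QueryEngineTool.from_defaults(
        query_engine=install_index.as_query_engine(),
        description="Installation and setup documentation",
    ),
    QueryEngineTool.from_defaults(
        query_engine=api_index.as_query_engine(),
        description="API reference documentation",
    ),
]

# the selector reads the descriptions and routes each query to one sub-index
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
response = query_engine.query("How do I install the package?")
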

You can also customize the chunking logic a bit. By default, it will split into chunks of 1024 tokens, with some overlap.

Plain Text
from llama_index import GPTVectorStoreIndex, ServiceContext

# documents is assumed to be loaded already (e.g. with SimpleDirectoryReader)
service_context = ServiceContext.from_defaults(chunk_size=1500)
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)


You can also increase the top_k a bit, instead of larger chunks (the default is 2)

Plain Text
query_engine = index.as_query_engine(similarity_top_k=3)
Also for reference, if you are using gpt-3.5, that model isn't that great sometimes
is there an example of implementing a router query engine?
I use davinci for now, though
is there any documentation or reference on how to come up with the best chunk size or top_k, or how to preprocess certain documents so that they produce the best results?
@Logan M I would love to understand the underlying concept so I can make the best out of llamaindex
Not really any docs for this; a lot of it comes down to experimentation 😅 Generally the default chunk size (1024) works well

The top_k can take some tweaking, but the higher it goes, the higher the latency may become
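
If you want to experiment, a quick brute-force loop over a few settings (purely a sketch, assuming you already have documents loaded and using a made-up test question) usually tells you a lot:

Plain Text
from llama_index import GPTVectorStoreIndex, ServiceContext

# try a few chunk sizes and top_k values and eyeball the answers
for chunk_size in (512, 1024, 2048):
    service_context = ServiceContext.from_defaults(chunk_size=chunk_size)
    index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
    for top_k in (2, 3, 5):
        query_engine = index.as_query_engine(similarity_top_k=top_k)
        response = query_engine.query("What URL does the install guide point to?")
        print(f"chunk_size={chunk_size} top_k={top_k}: {str(response)[:200]}")
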
by latency do you mean execution time?
yup! The biggest bottleneck to execution time is LLM calls

By default, llama index uses something called "compact" response mode, where all the text from the top_k nodes is stuffed into a single LLM call. If the text is too long though, it has to refine the answer across multiple LLM calls, which is when the execution time goes up

With a top k of 3 and chunk size of 1024, it should only ever make 1 LLM call
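
If you ever want to set it explicitly, the response mode can be passed when building the query engine (this just spells out the defaults described above):

Plain Text
# "compact" stuffs the retrieved text into as few LLM calls as possible
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)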