import os

from langchain.chat_models import ChatOpenAI
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
)
from llama_index.indices.query.schema import QueryMode

# api_key, documents, query_text, and QA_PROMPT are defined earlier in my script

# define llm predictor
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(
        model_name="gpt-3.5-turbo",
        max_tokens=1024,
        openai_api_key=api_key,
        temperature=0.2,
        streaming=False,
    )
)
# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=2048,
)
# load the index from disk if it exists, otherwise build it and persist it
index_file = 'indices/index.json'
os.makedirs('indices', exist_ok=True)
if os.path.exists(index_file):
    index = GPTSimpleVectorIndex.load_from_disk(index_file, service_context=service_context)
else:
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
    index.save_to_disk(index_file)
result = index.query(
    query_text,
    text_qa_template=QA_PROMPT,
    response_mode="tree_summarize",
    similarity_top_k=4,
    mode=QueryMode.EMBEDDING,
    streaming=False,
)
This is the code I use to load the index and run queries. Each query takes more than ten seconds to return an answer, even though the answers ChatGPT produces are not very long. Am I setting some parameters improperly? Sorry for taking up your time.
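I suspect response_mode="tree_summarize" might be part of the problem, since as I understand it that mode makes several LLM calls per query rather than one. Would changes along these lines be the right way to speed things up? This is only a sketch against the same 0.5-era llama_index API, and I haven't verified that print_response_stream() is the correct call for a streaming response:

# streaming-enabled predictor with a smaller completion budget
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(
        model_name="gpt-3.5-turbo",
        max_tokens=512,        # shorter completions finish sooner
        openai_api_key=api_key,
        temperature=0.2,
        streaming=True,        # stream tokens as they are generated
    )
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=2048,
)
# reload the index so it picks up the new service context
index = GPTSimpleVectorIndex.load_from_disk(index_file, service_context=service_context)
response = index.query(
    query_text,
    text_qa_template=QA_PROMPT,
    response_mode="compact",   # single synthesis call instead of a summary tree
    similarity_top_k=2,        # fewer retrieved chunks -> smaller prompt
    mode=QueryMode.EMBEDDING,
    streaming=True,
)
response.print_response_stream()  # print tokens as soon as they arrive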