```py
# Imports added for completeness; these paths match the pre-0.10 llama_index / langchain APIs used here
from llama_index import (
    LangchainEmbedding,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceEmbeddings

# Local GGML model served through llama.cpp
llm = LlamaCpp(
    model_path=r'C:\Users\UserAdmin\Desktop\vicuna\Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin',
    verbose=False,
    n_ctx=2048,
    n_gpu_layers=55,
    n_batch=512,
    n_threads=11,
    temperature=0.65,
)

# Local sentence-transformers model for embeddings
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(
        model_name=r".\all-mpnet-base-v2",
        model_kwargs={'device': 'cuda'},
    )
)

llm_predictor = LLMPredictor(llm=llm)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    chunk_size=200,
)

# Load the PDFs, embed them, and build a vector index
documents = SimpleDirectoryReader(r'.\data\pdfs').load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# QA_TEMPLATE is a custom prompt template defined elsewhere
query_engine = index.as_query_engine(text_qa_template=QA_TEMPLATE)
```
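
For reference, the engine then gets queried like this (just a sketch; the question string is a placeholder):

```py
# Hypothetical query -- replace with a real question about the PDFs
response = query_engine.query("What is the main topic of these documents?")
print(response)
```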
9 comments
It uses CUDA by default anyway, so specifying the device won't do much 😅

Assuming you have the room, you can try increasing the embedding batch size (the default is 10):

```py
LangchainEmbedding(HuggingFaceEmbeddings(...), embed_batch_size=20)
```


I don't actually know for sure if this works with huggingface embeddings, but worth a shot!
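
In the setup above, that would look roughly like this (just a sketch, and 20 is an arbitrary value to experiment with):

```py
# Sketch: embed_batch_size is passed to the LangchainEmbedding wrapper,
# not to HuggingFaceEmbeddings itself
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(
        model_name=r".\all-mpnet-base-v2",
        model_kwargs={'device': 'cuda'},
    ),
    embed_batch_size=20,  # default is 10
)
```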
Yeah, I'm not sure there are any noticeable positive changes from that.
Somehow loading the documents took 10% longer hahaha
Thank you so much though
I think there's just no way to make it more efficient, I guess?
I noticed an appreciable decrease in loading time for documents AND indexing if I increase the chunk size. Are there any negative repercussions for that?
I'm planning to do semantic search over a set of documents.
The only issue I can think of is that if I increase the chunk size, it might go beyond the context size?
llama-index will try to ensure that doesn't happen, but I wouldn't push the chunk size any higher than ~3000

Larger chunk sizes will also mean longer response times though πŸ€”
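
If you do want to try a bigger chunk size, it's just the chunk_size argument on the service context, something like this (the value is only an example):

```py
# Sketch: bigger chunks mean fewer embedding calls, but each retrieved chunk
# takes up more of the 2048-token llama.cpp context window at query time
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    chunk_size=1024,  # example value; per the advice above, stay under ~3000
)
```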