Find answers from the community

Varrek

Hey team
Is it possible to get documents from an existing index?
I want to use them together with:
Plain Text
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    service_context=gpt_4_context,
    num_questions_per_chunk=2,  # set the number of questions per node
    show_progress=True,
)

But when I do it with index.docstore.docs I get this error: 'str' object has no attribute 'id'.
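If it helps, here is a minimal sketch of pulling the stored nodes out of an existing index and feeding them to the generator. It assumes your llama_index version's RagDatasetGenerator constructor accepts a list of nodes (from_documents builds nodes internally), so check the signature. The error above is consistent with iterating the docstore dict itself, which yields id strings rather than node objects:
Plain Text
# index.docstore.docs is a dict of {node_id: node}; iterating it directly
# yields the id strings, which is where "'str' object has no attribute 'id'"
# comes from. Take the values instead.
nodes = list(index.docstore.docs.values())

# Assumption: the constructor takes nodes directly -- verify against your
# installed version's signature before relying on this.
dataset_generator = RagDatasetGenerator(
    nodes,
    service_context=gpt_4_context,
    num_questions_per_chunk=2,
    show_progress=True,
)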
1 comment
Varrek

Hey team
I use the chat engine in context mode to do Q&A over my context. Even when I specify that the model should use only the context, it still answers some general questions and adds information to specific answers that is not available in the context.
Is there a way to make it use only the context? I have tried different prompts.
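One pattern that sometimes helps (a sketch, not a guaranteed fix, since the model can still fall back on its own knowledge) is to pass a refusal-style system prompt when building the context chat engine; the prompt wording below is only an illustration:
Plain Text
# Sketch: context chat engine with a system prompt that tells the model to
# answer strictly from the retrieved context.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=(
        "You are an assistant that answers ONLY from the provided context. "
        "If the answer is not in the context, reply that you don't know."
    ),
)
response = chat_engine.chat("What does the context say about this topic?")
print(response)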
8 comments
And one more question: I have data in Dutch, but the customer wants to chat with the chatbot in 4 different languages. Has anyone used a multilingual model for the embeddings? Which one performs best? Anything else to consider?
1 comment
Varrek

Hey team
I used a custom model to create multilingual embeddings for a DocumentSummaryIndex:
Plain Text
from langchain.embeddings import SentenceTransformerEmbeddings
from llama_index import ServiceContext, get_response_synthesizer

# Multilingual embedding model (produces 1024-dimensional vectors).
embed_model = SentenceTransformerEmbeddings(model_name="intfloat/multilingual-e5-large")
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, llm=llm, embed_model=embed_model
)

# Load the source documents from the URLs listed in the spreadsheet.
urls = get_urls_from_xlsx_file(filepath)
documents = reader.load_data(urls)

# Build the document summary index and persist it to disk.
response_synthesizer = get_response_synthesizer(use_async=True)
index = GPTDocumentSummaryIndex.from_documents(
    documents,
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
index.storage_context.persist(save_index_path)

However, when I tried to query it, I got this error:
Plain Text
product = np.dot(embedding1, embedding2)
ValueError: shapes (1536,) and (1024,) not aligned: 1536 (dim 0) != 1024 (dim 0)

Query code:
Plain Text
from langchain.embeddings import SentenceTransformerEmbeddings
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

# Same multilingual embedding model (1024-dim) as at index-build time.
embed_model = SentenceTransformerEmbeddings(model_name="intfloat/multilingual-e5-large")
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, llm=llm, embed_model=embed_model
)

# load_index is a local helper that loads the persisted index from index_path.
index = load_index(index_path)
query_engine = index.as_query_engine(
    verbose=True,
    retriever_mode="embedding",
    service_context=service_context,
)

while True:
    prompt = input("Type prompt...")
    response = query_engine.query(prompt)
    print(response)

Can someone please point me to the root cause of the issue?
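A likely culprit for the shape mismatch: 1536 is the dimension of OpenAI's default text-embedding-ada-002, while multilingual-e5-large produces 1024-dim vectors, so one side of the comparison is still using the default embed model. If load_index does not pass the service_context, the index is loaded with default settings. A minimal sketch of loading the persisted index with the same ServiceContext, assuming it was persisted to index_path:
Plain Text
from llama_index import StorageContext, load_index_from_storage

# Rebuild the storage context from the persisted directory and load the index
# with the SAME service_context (and therefore the same 1024-dim embed model)
# that was used to build it; otherwise the defaults (OpenAI, 1536-dim) apply.
storage_context = StorageContext.from_defaults(persist_dir=index_path)
index = load_index_from_storage(storage_context, service_context=service_context)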
2 comments
Varrek

Hey team,
I am trying to use DocumentSummaryIndex and experiencing a very slow response time (up to 1 minute). I noticed that in the index, which I saved locally, the "embedding" field is empty. How can I use this field or improve the performance of the Document Summary index? Creation of index and inference below.


Thanks!
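Not the code from this post, but as a point of reference, a sketch of one commonly suggested tweak: the LLM-based retriever mode of a DocumentSummaryIndex spends an extra LLM call per query choosing documents, while retriever_mode="embedding" ranks the stored document summaries by embedding similarity instead. The similarity_top_k value below is an illustrative assumption.
Plain Text
# Sketch: embedding-based retrieval over the document summaries, avoiding the
# per-query LLM call made by the LLM-based retriever mode.
query_engine = index.as_query_engine(
    retriever_mode="embedding",   # rank summaries by embedding similarity
    similarity_top_k=1,           # assumed value: keep only the best-matching document
    service_context=service_context,
)
response = query_engine.query("What is this document about?")
print(response)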
9 comments