
Updated 2 months ago

🆘 HELP!! Does anyone have experience

🆘 HELP!! Does anyone have experience with non-Latin documents and data in llama-index, especially Arabic script? Are llama-index's default tokenizer and embeddings a good fit for Arabic documents? Any ideas or experience in this area?
4 comments
The default OpenAI embeddings should be fine for multilingual/non-English data.
Thanks Logan. I am asking because I have indexed a corpus of non-English data, but when I query the index I don't get the results I expect. It fails to retrieve some context that I am certain exists in the indexed data. Maybe I have to tweak the retriever, or some other settings for the prompt and the query.
@Logan M Any idea how I could get the embedding values for my prompt in llama-index?
from llama_index.embeddings import OpenAIEmbedding
embed_model = OpenAIEmbedding()

# Embed a document chunk and a query; each call returns a list of floats
doc_embed = embed_model.get_text_embedding("my doc")
query_embeds = embed_model.get_query_embedding("My query")
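Once you have both vectors, one way to check why an expected chunk is not being retrieved is to score the query embedding against the embeddings of the chunks yourself and see where that chunk ranks. The following is only a minimal sketch in plain Python: `cosine_similarity` and `rank_chunks` are hypothetical helper names (not part of llama-index), and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration)
query_vec = [1.0, 0.0, 0.0]
chunk_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_chunks(query_vec, chunk_vecs)
```

In practice you would pass the output of `get_query_embedding` as `query_vec` and the outputs of `get_text_embedding` for your Arabic chunks as `chunk_vecs`. If the chunk you expect scores well but still is not returned, the issue is more likely in the retriever settings than in the embeddings.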