Find answers from the community

valu · Jina

I don't know if I'm doing something wrong.
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load the Jina v2 base English embedding model from Hugging Face
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
)

# Two toy nodes; the first is an exact copy of the query below
nodes = [
    TextNode(text="first question to match", id_="1"),
    TextNode(text="this is a simulation", id_="2"),
]

# Build an in-memory vector index and retrieve the top 10 matches
index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=True)
vector_retriever = index.as_retriever(similarity_top_k=10)
matches = vector_retriever.retrieve("first question to match")
for node in matches:
    print(node.get_score())
    print(node.get_text())

Any advice on how to improve this? I find BM25 retrieval does better with real content, so I'm trying hybrid search, but I'm quite disappointed with the semantic search results.
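A minimal hybrid-search sketch building on the code above, assuming the llama-index-retrievers-bm25 package is installed; num_queries=1 turns off LLM query rewriting (depending on your version, QueryFusionRetriever may still expect an llm argument):

Plain Text
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Lexical retriever over the same nodes the vector index was built from
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

# Fuse the BM25 and vector rankings with reciprocal rank fusion
hybrid_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=1,  # no LLM-generated query variants
    mode="reciprocal_rerank",
    use_async=False,
)
matches = hybrid_retriever.retrieve("first question to match")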
5 comments

valu · Cache

Plain Text
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
)

Is there a way to cache the model locally?
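One sketch, assuming a llama-index-embeddings-huggingface version where HuggingFaceEmbedding accepts a cache_folder argument; setting Hugging Face's HF_HOME environment variable before the first download is an alternative:

Plain Text
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# First run downloads into ./models; later runs load from disk.
# cache_folder is an assumption about the installed version.
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
    cache_folder="./models",
)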
3 comments

valu · Questions

Hey, if I have an array of a thousand questions and I want to search for similarity to a specific question, what's the best way to approach this? Add the bank to a ChromaDB collection with metadata and then do a search?
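A minimal sketch of that approach using the chromadb client directly; the collection name, metadata, and question list are placeholders:

Plain Text
import chromadb

questions = ["first question to match", "this is a simulation"]  # your bank of ~1,000 questions

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("question_bank")

# Store each question with an id and whatever metadata you need to filter on
collection.add(
    ids=[str(i) for i in range(len(questions))],
    documents=questions,
    metadatas=[{"position": i} for i in range(len(questions))],
)

# query_texts is embedded with the collection's default embedding function
results = collection.query(query_texts=["first question to match"], n_results=5)
print(results["documents"])
print(results["distances"])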
13 comments

If you have research papers, what's the best way to extract and chunk the data? I want the LLM to reference the journal article when providing answers.
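One sketch of a common approach, assuming PDFs in a ./papers directory: parse with SimpleDirectoryReader, chunk with SentenceSplitter, and rely on the file_name metadata so answers can cite the originating paper (the query text and directory are placeholders):

Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Parse the PDFs; each document's metadata includes its file_name
documents = SimpleDirectoryReader("./papers").load_data()

# Sentence-aware chunks with overlap; nodes inherit document metadata
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

# Uses the embedding model and LLM configured in Settings
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("What method does the paper propose?")

# source_nodes carry the metadata, so the source article can be cited
for src in response.source_nodes:
    print(src.metadata.get("file_name"), src.score)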
4 comments