Find answers from the community

valu · Jina

I don't know if I'm doing something wrong.
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load the Jina v2 base English embedding model from Hugging Face
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
)

# Two toy nodes; the first is an exact copy of the query below
nodes = [
    TextNode(text="first question to match", id_="1"),
    TextNode(text="this is a simulation", id_="2"),
]

# Build an in-memory vector index and retrieve the top 10 matches
index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=True)
vector_retriever = index.as_retriever(similarity_top_k=10)
matches = vector_retriever.retrieve("first question to match")
for node in matches:
    print(node.get_score())
    print(node.get_text())

Any advice on how to improve this? I find BM25 retrieval does better with real content, so I'm trying hybrid search, but I'm quite disappointed with the semantic search results.
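A minimal hybrid-search sketch building on the code above, assuming the llama-index-retrievers-bm25 package is installed; num_queries=1 turns off LLM query rewriting (depending on your version, QueryFusionRetriever may still expect an llm argument):

Plain Text
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Lexical retriever over the same nodes the vector index was built from
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

# Fuse the BM25 and vector rankings with reciprocal rank fusion
hybrid_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=1,  # no LLM-generated query variants
    mode="reciprocal_rerank",
    use_async=False,
)
matches = hybrid_retriever.retrieve("first question to match")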
5 comments

valu · Cache

Plain Text
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
)

Is there a way to cache the model locally?
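One sketch, assuming a llama-index-embeddings-huggingface version where HuggingFaceEmbedding accepts a cache_folder argument; setting Hugging Face's HF_HOME environment variable before the first download is an alternative:

Plain Text
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# First run downloads into ./models; later runs load from disk.
# cache_folder is an assumption about the installed version.
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-en",
    cache_folder="./models",
)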
3 comments

valu · Questions

Hey, if I have an array of a thousand questions and I want to search for similarity to a specific question, what's the best way to approach this? Add the bank to a ChromaDB collection with metadata and then do a search?
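A minimal sketch of that approach using the chromadb client directly; the collection name, metadata, and question list are placeholders:

Plain Text
import chromadb

questions = ["first question to match", "this is a simulation"]  # your bank of ~1,000 questions

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("question_bank")

# Store each question with an id and whatever metadata you need to filter on
collection.add(
    ids=[str(i) for i in range(len(questions))],
    documents=questions,
    metadatas=[{"position": i} for i in range(len(questions))],
)

# query_texts is embedded with the collection's default embedding function
results = collection.query(query_texts=["first question to match"], n_results=5)
print(results["documents"])
print(results["distances"])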
13 comments

If you have research papers, what's the best way to extract and chunk the data? I want the LLM to reference the journal article when providing answers.
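One sketch of a common approach, assuming PDFs in a ./papers directory: parse with SimpleDirectoryReader, chunk with SentenceSplitter, and rely on the file_name metadata so answers can cite the originating paper (the query text and directory are placeholders):

Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Parse the PDFs; each document's metadata includes its file_name
documents = SimpleDirectoryReader("./papers").load_data()

# Sentence-aware chunks with overlap; nodes inherit document metadata
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

# Uses the embedding model and LLM configured in Settings
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("What method does the paper propose?")

# source_nodes carry the metadata, so the source article can be cited
for src in response.source_nodes:
    print(src.metadata.get("file_name"), src.score)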
4 comments