Are the InstructorEmbeddings not working right now?

I have tried multiple examples, including
https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface.html#huggingfaceembedding
and
https://docs.llamaindex.ai/en/stable/examples/embeddings/custom_embeddings.html

Both result in the error:

Plain Text
/usr/local/lib/python3.10/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder, trust_remote_code, revision, token, use_auth_token)
    192 
    193             if is_sentence_transformer_model(model_name_or_path, token, cache_folder=cache_folder, revision=revision):
--> 194                 modules = self._load_sbert_model(
    195                     model_name_or_path,
    196                     token=token,

TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'
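
For anyone hitting the same thing: the traceback suggests a version mismatch, since the InstructorEmbedding package overrides _load_sbert_model with the older (pre-2.3) sentence-transformers signature, which has no token keyword. A possible workaround (untested sketch; assumes the llama-index-embeddings-instructor integration package is installed):

Plain Text
# Pin sentence-transformers below 2.3 so its signature matches what
# the INSTRUCTOR class expects:
#   pip install "sentence-transformers<2.3" InstructorEmbedding

from llama_index.embeddings.instructor import InstructorEmbedding

embed_model = InstructorEmbedding(model_name="hkunlp/instructor-large")
embedding = embed_model.get_text_embedding("Hello world")
print(len(embedding))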
19 comments

Question regarding node postprocessing and window:


1) Can the node parser's "window" function be applied to nodes, or only to documents? I have run it on nodes, but the resulting "window" only includes text from the current node. (Would this require a custom parser?)
2) When running consecutive postprocessing functions, can rerankers consider the 'window' text rather than the original 'text'? (See the sketch after the code block below.)


i.e., I would like to process as follows:

Indexing:
1) docsplitter = CustomJSONNodeParser (which results in one node per segment, with the text / start / end / speaker)
2) WindowNodeParser = include x "text" entries from the surrounding nodes

Querying:
1) Retrieve the top-k 10 nodes
2) Consider each node's text to be its "window" metadata
3) GPT rerank based on the "window" metadata

Plain Text
Example of how the nodes are restructured now, using:

from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="text",
)

base_nodes = node_parser.get_nodes_from_documents(md_nodes)
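
Regarding 2), a rough sketch of what I have in mind (untested; assumes MetadataReplacementPostProcessor and LLMRerank from the core postprocessor module, plus an already-built index): swap each retrieved node's text for its "window" metadata before the reranker runs.

Plain Text
from llama_index.core.postprocessor import (
    LLMRerank,
    MetadataReplacementPostProcessor,
)

# Postprocessors run in order: first replace each node's text with its
# "window" metadata, then rerank, so the reranker sees the window text.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window"),
        LLMRerank(top_n=3),  # illustrative reranker choice
    ],
)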
13 comments
Is there a LlamaIndex function to update embed_keys_to_exclude / llm_keys_to_exclude for all nodes, or for specific nodes?
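
The closest thing I'm aware of is setting the exclusion lists directly on each node (an untested sketch; the field names follow the core node schema, and the metadata key names are illustrative):

Plain Text
# Assumes `nodes` is a list of already-parsed TextNode objects.
for node in nodes:
    node.excluded_embed_metadata_keys = ["speaker", "timestamp"]
    node.excluded_llm_metadata_keys = ["confidence"]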
1 comment
When processing a JSON file (per https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules.html):

Can JSONNodeParser be combined with SentenceWindowNodeParser or SemanticSplitterNodeParser?

I'd like to maintain the metadata of the original JSON structure and be able to Vector Query at a low level, but return a larger contextual window for processing with the LLM.
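
A sketch of one way to chain them (untested; assumes both parsers can be used as IngestionPipeline transformations, and note the window may still be built per incoming node rather than across the JSON-parsed boundaries):

Plain Text
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import JSONNodeParser, SentenceWindowNodeParser

# Each transformation consumes the previous stage's nodes: first split the
# JSON into nodes, then attach sentence windows as metadata.
pipeline = IngestionPipeline(
    transformations=[
        JSONNodeParser(),
        SentenceWindowNodeParser.from_defaults(window_size=3),
    ]
)
nodes = pipeline.run(documents=documents)  # `documents` assumed loaded via a JSON reader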
2 comments
Can anyone recommend an embedding model for Swedish? I am currently using https://huggingface.co/intfloat/multilingual-e5-large but am not getting great results with RAG. It works, but retrieval seems oddly fixated on specific chunks for no clear reason.
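
One thing worth checking before swapping models: the E5 model cards require prefixing inputs with "query: " and "passage: ", and skipping the prefixes is a common cause of odd retrieval behaviour. A sketch (the instruction kwargs are my assumption about the HuggingFace embedding wrapper):

Plain Text
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Prepend the prefixes the E5 models were trained with.
embed_model = HuggingFaceEmbedding(
    model_name="intfloat/multilingual-e5-large",
    query_instruction="query: ",
    text_instruction="passage: ",
)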
11 comments
@kapa.ai How do I use Langchain document loaders with Llamaindex?
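
A minimal sketch of one approach (untested; assumes Document.from_langchain_format exists on the core schema, and the loader/path here are illustrative):

Plain Text
from langchain_community.document_loaders import TextLoader
from llama_index.core import Document, VectorStoreIndex

# Load with a LangChain loader, convert each LangChain document into a
# LlamaIndex Document, then index as usual.
lc_docs = TextLoader("data/notes.txt").load()
docs = [Document.from_langchain_format(d) for d in lc_docs]
index = VectorStoreIndex.from_documents(docs)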
5 comments
@kapa.ai Summarize the implementation of StreamlitChatPack to build a chatbot using streamlit and llama index. Provide links to example code when possible.
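
For reference, llama packs can be pulled down as source so the implementation is inspectable; a sketch (assumes the download_llama_pack helper; the target directory is illustrative):

Plain Text
from llama_index.core.llama_pack import download_llama_pack

# Downloads the pack's source into the given directory; the pack wires a
# Streamlit chat UI to a LlamaIndex chat engine.
StreamlitChatPack = download_llama_pack("StreamlitChatPack", "./streamlit_chat_pack")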
2 comments
@kapa.ai How do I find the original prompt templates for the various response synthesizers? How do I know which ones to modify, and how are they named? For example: text_qa_template, refine_template.
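
A sketch of the usual discovery loop (untested; assumes the get_prompts/update_prompts methods on query engines and an already-built query_engine):

Plain Text
from llama_index.core import PromptTemplate

# List the prompts attached to the engine; the dict keys (e.g.
# "response_synthesizer:text_qa_template") are the names to override.
prompts = query_engine.get_prompts()
print(list(prompts.keys()))

# Override one by key; {context_str} and {query_str} are the variables
# the text QA template expects.
query_engine.update_prompts(
    {
        "response_synthesizer:text_qa_template": PromptTemplate(
            "Context:\n{context_str}\nAnswer the question: {query_str}\n"
        )
    }
)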
10 comments
@kapa.ai Within the LlamaIndex framework, how do I see the complete query sent to the LLM, in order to monitor prompt formatting? Is there a verbose mode for the query engines?
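
One option I'm aware of (a sketch, not the only observability route): the "simple" global handler prints each LLM input/output to stdout, which includes the fully formatted prompt.

Plain Text
from llama_index.core import set_global_handler

# Prints every LLM call's formatted prompt and the raw response.
set_global_handler("simple")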
6 comments
Question regarding the JSON loader / Whisper transcripts

The goal may be something as simple as retrieving segments (chunks) of the interview/transcript based on thematic codes (Drivers, Solutions, Barriers), summarization tasks, or additional Q&A.

The question revolves around the best document loading / chunking / embedding strategy given the structure of Whisper transcripts. If one wanted to maintain metadata at the document (segment) level, such as "speaker", "confidence", and "timestamps", how would one structure the chunks and embeddings to maintain semantic cohesion?

e.g., we may have 15 lines from a 5,000-line interview (Whisper JSON file) that should be grouped together:
...
[
Speaker 1: asks a question
Speaker 1: continues the same question
Speaker 1: filler word
Speaker 2: asks a clarifying question
Speaker 1: quick answer
Speaker 2: begins answering...
Speaker 2: continues...
Speaker 2: continues...
Speaker 1: interrupts with a quick clarifier
Speaker 2: continues...
(end of answer)
]
...

What are some methods for isolating these high-level question-answer pairs from a Whisper transcript? How can the JSON loader be employed, and are there best practices around Whisper-transcript RAG in general?
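
For context, a rough sketch of the direction I'm imagining (untested; assumes a diarized Whisper-style JSON where each segment carries speaker/start/end/text fields, and the file path is illustrative): merge consecutive segments from the same speaker into turns, keep timings as metadata, and exclude them from the embedded text.

Plain Text
import json
from llama_index.core import Document

with open("interview.json") as f:
    segments = json.load(f)["segments"]

# Merge consecutive segments by the same speaker into one "turn".
turns, current = [], None
for seg in segments:
    if current and seg["speaker"] == current["speaker"]:
        current["text"] += " " + seg["text"]
        current["end"] = seg["end"]
    else:
        if current:
            turns.append(current)
        current = {"speaker": seg["speaker"], "text": seg["text"],
                   "start": seg["start"], "end": seg["end"]}
if current:
    turns.append(current)

# One Document per turn; timing metadata is kept but not embedded.
documents = [
    Document(
        text=t["text"],
        metadata={"speaker": t["speaker"], "start": t["start"], "end": t["end"]},
        excluded_embed_metadata_keys=["start", "end"],
    )
    for t in turns
]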

Thanks 🙂👀
5 comments