denen99
Offline, last seen 3 months ago
Joined September 25, 2024

Token counting

I am trying to figure out how to get an accurate token count when using AWS Bedrock. The callbacks do not seem to match what Bedrock is returning, and I need to capture this accurately for billing reasons. Is there a way to solve this?

Bedrock - {'X-Amzn-Bedrock-Output-Token-Count': '247', 'X-Amzn-Bedrock-Input-Token-Count': '836'}

My Callback -
Embedding Tokens: 10
LLM Prompt Tokens: 1771
LLM Completion Tokens: 484
Total LLM Token Count: 2255
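
For reference, the raw Bedrock numbers above come straight off the response headers, roughly like this with boto3 (the model id and request body here are just placeholders; the real app goes through LlamaIndex):

Plain Text
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": "\n\nHuman: hello\n\nAssistant:", "max_tokens_to_sample": 256}),
)

# boto3 lower-cases the header names in ResponseMetadata
headers = response["ResponseMetadata"]["HTTPHeaders"]
input_tokens = int(headers["x-amzn-bedrock-input-token-count"])
output_tokens = int(headers["x-amzn-bedrock-output-token-count"])
print(f"input={input_tokens} output={output_tokens}")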
7 comments
I am trying to see if there is a way to run my own node_postprocessor over an HTTP call. The reason for this is that the main LlamaIndex RAG app is a Lambda, and the startup cost of loading a BGE re-ranker is high, so I was going to have it pre-loaded behind a mini Flask app. Are there any guides on how to make a HuggingFace reranker model like BGE or Jina work behind an API, the way Cohere and others do? Would I have to write my own postprocessor implementation that wraps this?
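
This is roughly the wrapper I have in mind (just a sketch: the /rerank endpoint and its response shape are assumptions about the mini Flask app, and the imports are 0.9-style so the exact paths may differ on 0.10):

Plain Text
from typing import List, Optional

import requests
from llama_index.postprocessor.types import BaseNodePostprocessor
from llama_index.schema import NodeWithScore, QueryBundle


class RemoteRerankPostprocessor(BaseNodePostprocessor):
    """Defer reranking to a small HTTP service that keeps the BGE model loaded."""

    url: str = "http://localhost:5000/rerank"
    top_n: int = 5

    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        payload = {
            "query": query_bundle.query_str if query_bundle else "",
            "texts": [n.node.get_content() for n in nodes],
        }
        # Assumed response shape: a list of {"index": i, "score": s}, best first
        results = requests.post(self.url, json=payload, timeout=30).json()
        reranked = []
        for item in results[: self.top_n]:
            node = nodes[item["index"]]
            node.score = item["score"]
            reranked.append(node)
        return reranked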
3 comments

Metadata

What is the idiomatic way to add custom metadata to a document? I need to keep track of an "external_id" for potential future deletion and am curious about the right way to do this. Should I use the file_metadata callback in the load_file() method when loading a new file?
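
To make that concrete, these are the two options I'm weighing (the id value is made up; as far as I can tell file_metadata replaces the default per-file metadata, so I re-add file_path myself):

Plain Text
from llama_index import SimpleDirectoryReader

# Option 1: stamp the id on every Document as it is loaded
reader = SimpleDirectoryReader(
    input_files=["report.pdf"],
    file_metadata=lambda path: {"external_id": "abc-123", "file_path": path},
)
docs = reader.load_data()

# Option 2: set it after loading, which seems to amount to the same thing
for doc in docs:
    doc.metadata["external_id"] = "abc-123"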
3 comments
I am running into a weird issue when trying to parse a CSV file: I hit the OpenAI token limit when trying to generate embeddings using text-embedding-3-large. Does anything stand out in this code that would cause the issue?

Plain Text
from llama_index import SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.schema import MetadataMode

embedding = OpenAIEmbedding(api_key="XXX", model="text-embedding-3-large")

node_parser = SentenceWindowNodeParser.from_defaults(window_size=3)

# tmpfile and external_id come from the surrounding handler
dir_reader = SimpleDirectoryReader(input_files=[tmpfile])
docs = dir_reader.load_data(show_progress=True)
for doc in docs:
    doc.metadata["external_id"] = external_id

nodes = node_parser.get_nodes_from_documents(docs, show_progress=True)

print("Getting batched embeddings for nodes from embedding " + embedding.model_name + "..")
text_chunks = [node.get_content(metadata_mode=MetadataMode.EMBED) for node in nodes]
embeddings = embedding.get_text_embedding_batch(text_chunks, show_progress=True)


This then errors out with

Plain Text
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 71420 tokens (71420 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
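
My current guess (unconfirmed) is that the CSV has almost no sentence punctuation, so SentenceWindowNodeParser treats a huge run of rows as one "sentence" and that single node blows past the 8192-token embedding limit. A token-based splitter would cap the chunk size instead; continuing from the snippet above, with a placeholder chunk_size:

Plain Text
from llama_index.node_parser import TokenTextSplitter

splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=20)
nodes = splitter.get_nodes_from_documents(docs, show_progress=True)

text_chunks = [node.get_content(metadata_mode=MetadataMode.EMBED) for node in nodes]
embeddings = embedding.get_text_embedding_batch(text_chunks, show_progress=True)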
12 comments
I am looking to see how I can increase the speed of generating embeddings; currently a large file takes several minutes. Should this be moved to a pipeline? This is running on AWS Lambda with LlamaIndex 0.9 (still working on the 0.10 upgrade). The embedding is OpenAI text-embedding-3-large.

Plain Text
    def add_nodes(self, nodes):
        return self.vector_store.add(nodes)

    def add_nodes_from_file(
        self, tmpfile, external_id: str, node_parser: NodeParser, embedding: HuggingFaceEmbedding
    ):
        dir_reader = SimpleDirectoryReader(input_files=[tmpfile])
        docs = dir_reader.load_data()
        for doc in docs:
            doc.metadata["external_id"] = external_id

        nodes = node_parser.get_nodes_from_documents(docs)

        # one embedding API call per node
        for node in nodes:
            node_embeddings = embedding.get_text_embedding(
                node.get_content(metadata_mode="all")
            )
            node.embedding = node_embeddings

        res = self.add_nodes(nodes)
        print("Result from add nodes: " + str(res))
        return res
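
A sketch of the batched variant I'm considering (reusing the names from the method above; one batched call per group of texts instead of one call per node, which I'd expect to matter more than moving to a full ingestion pipeline, but that's an assumption):

Plain Text
        nodes = node_parser.get_nodes_from_documents(docs)

        texts = [node.get_content(metadata_mode="all") for node in nodes]
        # single batched request, chunked internally by the embedding's embed_batch_size
        embeddings = embedding.get_text_embedding_batch(texts, show_progress=True)
        for node, emb in zip(nodes, embeddings):
            node.embedding = emb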
22 comments
Is it possible to store a DocumentSummaryIndex in a Chroma vector store? VectorStoreIndex has a .from_vector_store() method, but DocumentSummaryIndex does not. When I try load_index_from_storage(storage_context=storage_context, service_context=service_context) I get an error about no persist_dir, as follows: ValueError: No index in storage context, check if you specified the right persist_dir. It requires me to .persist() the DocumentSummaryIndex to a file.

here is how i load the index:

Plain Text
 
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection, persist_dir="./chroma_db")
    storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./chroma_db")
    service_context = ServiceContext.from_defaults(
        llm=chatgpt, 
        transformations=extractors,
        embed_model=embedding, 
        system_prompt=system_prompt)

    doc_summary_index = DocumentSummaryIndex.from_documents(documents=docs, 
                                                            storage_context=storage_context,
                                                            service_context=service_context, 
                                                            show_progress=True)
    doc_summary_index.storage_context.persist(persist_dir="./chroma_db")


And then loading it back after

Plain Text
    doc_summary_index = load_index_from_storage(
        storage_context=storage_context, service_context=service_context
    )

    query_engine = doc_summary_index.as_query_engine(
        response_mode="tree_summarize", use_async=True, service_context=service_context
    )
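
One thing I might try next (just a guess, not a confirmed fix): persist LlamaIndex's own docstore/index store into a directory separate from Chroma's files, and point persist_dir at that directory when rebuilding the storage context:

Plain Text
    doc_summary_index.storage_context.persist(persist_dir="./summary_index_storage")

    # later, when loading
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store, persist_dir="./summary_index_storage"
    )
    doc_summary_index = load_index_from_storage(
        storage_context=storage_context, service_context=service_context
    )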
7 comments
Getting a very weird AssertionError with no info when trying to query with the chat engine (using Bedrock, running on Lambda). Perms are fine, I triple-checked, but has anyone ever seen this before?

Plain Text
vector_index = VectorStoreIndex.from_vector_store(
    vector_store=store.vector_store, service_context=service_context
)

# Create the chat engine
chat_engine = vector_index.as_chat_engine(**chat_engine_params)

Then
response = chat_engine.chat(query) returns AssertionError() and nothing else.

Any ideas?
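
One thing I'm planning to try, just to rule it out (no idea if it's actually related): pinning chat_mode explicitly instead of letting as_chat_engine pick a default:

Plain Text
chat_engine = vector_index.as_chat_engine(chat_mode="condense_question")
response = chat_engine.chat(query)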
25 comments