Find answers from the community

Tsovak
Joined September 25, 2024
Do you have any idea why CBEventType.SUB_QUESTION is not called?
The code is in the thread.

Plain Text
**********
Trace: chat
    |_CBEventType.AGENT_STEP -> 5.045546 seconds
      |_CBEventType.LLM -> 0.980289 seconds
      |_CBEventType.FUNCTION_CALL -> 3.450443 seconds
      |_CBEventType.LLM -> 0.0 seconds
**********
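
For context, CBEventType.SUB_QUESTION is, as far as I can tell, only emitted when a SubQuestionQueryEngine runs; a plain agent chat loop only produces AGENT_STEP, FUNCTION_CALL, and LLM events, which matches the trace above. A minimal sketch, assuming the llama_index 0.10+ package layout (the index and the question are placeholders):

Plain Text
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug])

# wrap an existing index's query engine as a tool (index is assumed to exist)
tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="docs",
    description="Answers questions about the documents.",
)
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[tool])
response = engine.query("Compare topic A and topic B")

# SUB_QUESTION events should now show up in the trace
print(debug.get_event_pairs(CBEventType.SUB_QUESTION))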
6 comments
Can you post to all channels? Otherwise, we might miss it.
3 comments
I'm trying to run https://github.com/run-llama/sec-insights locally, but I got an error.

I was following the README, and the last command, make seed_db_local, which should download the dataset, fails.
It downloads files into a temp folder.
Plain Text
Downloading SEC filings
Downloading filings to "/var/folders/nx/4fnfp6zd20d7y9zvtg_rs89m0000gn/T/tmpxcqrq4rb"
File Types: ['10-K']

Next comes "Copying downloaded SEC filings to S3", and then the seeding step fails:
Plain Text
Seeding storage with DB documents:   0%|                                                                                                      | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/xxxxx/repo/github/examples/sec-insights/backend/app/chat/engine.py", line 154, in build_doc_id_to_index_map
    indices = load_indices_from_storage(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/anaconda3/envs/sec-insights/lib/python3.11/site-packages/llama_index/indices/loading.py", line 71, in load_indices_from_storage
    raise ValueError(f"Failed to load index with ID {index_id}")
ValueError: Failed to load index with ID ab1dad14-a061-4f25-a9b6-88d02d649a36


Has anybody managed to fix it?
P.S. The full output is in the thread.
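
For what it's worth, load_indices_from_storage raises this ValueError when the requested index IDs are not present in the index store, so one workaround is a load-or-create fallback. A hedged sketch, not the actual sec-insights code (imports follow the pre-0.10 layout seen in the traceback; storage_context, index_ids, and documents are placeholders):

Plain Text
from llama_index import VectorStoreIndex, load_indices_from_storage

try:
    indices = load_indices_from_storage(storage_context, index_ids=index_ids)
except ValueError:
    # build fresh indices when the stored IDs are missing, then persist them
    indices = [
        VectorStoreIndex.from_documents([doc], storage_context=storage_context)
        for doc in documents
    ]
    storage_context.persist()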
2 comments
Tsovak
Plain Text
# imports assume the llama_index 0.10+ package layout
from qdrant_client import QdrantClient
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.extractors import (
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient()
vector_store = QdrantVectorStore(client=client, collection_name="my_collection", batch_size=1)

self.pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=Settings.chunk_size, chunk_overlap=Settings.chunk_overlap),
        TitleExtractor(),
        SummaryExtractor(metadata_mode=MetadataMode.ALL, summaries=["prev", "self", "next"]),
        QuestionsAnsweredExtractor(),
        KeywordExtractor(),
        Settings.embed_model,
    ],
    vector_store=vector_store,
    docstore=SimpleDocumentStore(),  # an in-memory store for Document and Node objects
    # a local cache is used by default; RedisCache, MongoDBCache, or FirestoreCache also work
    # and save time on subsequent runs over the same data
    # cache=IngestionCache(),
)

# ingest directly into the vector db
self.nodes = self.pipeline.run(documents=self.load_my_pdf())
# create the index
self.index = VectorStoreIndex(nodes=self.nodes, show_progress=True)


I use Qdrant, and the pipeline extracts most of the metadata that can be extracted, but the query engine doesn't take the metadata into account at all. I mean, when the user asks a question, nothing is drawn from the metadata. For example:
User: some question
Agent: some answer
User: What is the file name or document name?
Agent: A file_name is a placeholder for the actual name of a file.

What am I doing wrong?
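
One way to check whether the extracted metadata is actually reaching the LLM: print a retrieved node with MetadataMode.LLM, which shows the node text plus whatever metadata the LLM would see. A minimal sketch with a placeholder question:

Plain Text
from llama_index.core.schema import MetadataMode

retriever = self.index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("what is the document name?"):
    # node text plus any metadata keys not excluded for the LLM
    print(node_with_score.node.get_content(metadata_mode=MetadataMode.LLM))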
7 comments
Do you have an example chatbot system with message history, conversations, and internal documents?
Every example I have seen is just two lines of code that everybody copy-pasted from each other.
I want to understand how smart people design a database for persisting conversations and users, and how they combine chat history to ask a question on top of a document.
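
For illustration, a hedged sketch of one common design using llama_index's chat store abstraction, where each (user, conversation) pair gets its own key; the store choice, key scheme, and index name are assumptions:

Plain Text
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()  # swap for a Redis- or DB-backed store in production
memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user_42/conversation_7",  # one key per (user, conversation)
)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("What does the contract say about termination?")
chat_store.persist(persist_path="chat_store.json")  # keep history across restarts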
1 comment
I use index.as_chat_engine and memory. How do I get the sources that were used?
Plain Text
streaming_response = as_chat_engine_var.stream_chat()

streaming_response.sources is empty
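
For reference, a minimal sketch of the pattern I'd expect (the question text is a placeholder): with chat engines the retrieved chunks usually land on source_nodes rather than sources, and the stream generally has to be consumed before they are fully populated.

Plain Text
streaming_response = as_chat_engine_var.stream_chat("some question")
for token in streaming_response.response_gen:  # consume the stream first
    print(token, end="")

# .sources holds raw tool outputs; .source_nodes holds the retrieved chunks
for node_with_score in streaming_response.source_nodes:
    print(node_with_score.node.metadata, node_with_score.score)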
13 comments