
Updated 10 months ago


Question regarding JSON LOADER / WHISPER TRANSCRIPTS

The goal may be something as simple as retrieving segments (chunks) of the interview/transcript based on thematic codes (Drivers, Solutions, Barriers), summarization tasks, or additional Q&A.

The question revolves around the best document loading / chunking / embedding strategy given the structure of Whisper transcripts. If one wanted to maintain metadata at the document (segment) level, such as "speaker", "confidence", and "timestamps", how would one then structure the chunks and embeddings to maintain semantic cohesion?
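One way to carry segment-level metadata through chunking is to group consecutive segments and roll their metadata up onto each chunk before building nodes. A rough sketch in plain Python (the `start`/`end`/`avg_logprob` fields follow Whisper's verbose JSON output; the `speaker` field is an assumption — vanilla Whisper doesn't diarize, so it would come from a separate diarization step):

```python
def chunk_segments(segments, max_chars=1000):
    """Group consecutive Whisper segments into chunks no larger than
    max_chars, carrying per-segment metadata (speaker, timestamps,
    confidence) up to the chunk level."""
    groups, current, size = [], [], 0
    for seg in segments:
        if current and size + len(seg["text"]) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg["text"])
    if current:
        groups.append(current)

    # Flatten each group into text + metadata, ready to feed into
    # e.g. a llama_index TextNode(text=..., metadata=...)
    return [
        {
            "text": " ".join(s["text"].strip() for s in group),
            "metadata": {
                "speakers": sorted({s.get("speaker", "unknown") for s in group}),
                "start": group[0]["start"],
                "end": group[-1]["end"],
                "min_confidence": min(s.get("avg_logprob", 0.0) for s in group),
            },
        }
        for group in groups
    ]
```

Keeping `start`/`end` on every chunk means a retrieved chunk can always be traced back to its place in the audio, regardless of how the text was split.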

i.e., we may have 15 lines from a 5,000-line interview (Whisper JSON file) that should be grouped together:
...
[
Speaker1: Asking a question
Speaker1: Continuing the same question
Speaker1: filler word
Speaker2: Asks clarifying question
Speaker1: quick answer
Speaker2: begins answering...
Speaker2: continues...
Speaker2: continues...
Speaker1: interrupts with quick clarifier
Speaker2: continues...
(end of answer)
]
...

What are some methods to isolate these high-level question/answer pairs from a Whisper transcript? How can the JSON loader be used, and are there best practices for Whisper-transcript RAG in general?
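A simple non-LLM baseline is a speaker-turn heuristic: start a new group each time the interviewer retakes the floor from the answerer. A minimal sketch, assuming `lines` is a list of `(speaker, text)` tuples and that one known speaker (here `"Speaker1"`, an assumption) drives the interview:

```python
def split_on_question_turns(lines, questioner="Speaker1"):
    """Heuristic Q&A grouping: begin a new group whenever the
    questioner speaks after the other speaker has held the floor.
    lines: list of (speaker, text) tuples in transcript order."""
    groups, current = [], []
    prev_speaker = None
    for speaker, text in lines:
        # The questioner retaking the floor signals a new question
        if current and speaker == questioner and prev_speaker != questioner:
            groups.append(current)
            current = []
        current.append((speaker, text))
        prev_speaker = speaker
    if current:
        groups.append(current)
    return groups
```

The obvious failure mode is the "interrupts with quick clarifier" line from the example above: a short mid-answer interjection by the questioner would wrongly start a new group. Filtering out very short questioner turns helps, but that ambiguity is exactly why the comments below suggest an LLM pass for boundary detection.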

Thanks πŸ™‚πŸ‘€
5 comments
To isolate question/answer pairs, you basically need to use an LLM to process the transcript, I think
Maybe I'm an idiot, but I'm having a hard time figuring out how nodes/documents can be built without using the abstracted "loaders" and "parsers", and the loaders and parsers aren't working the way I'd like them to for my use case.
```python
from llama_index.core.schema import TextNode

node = TextNode(text="text", metadata={"key": "val"})
```
Documents and nodes have basically the same API; the difference is just semantic, in what they represent