ShubyZ
Joined December 24, 2024
Hello, I'm using DocumentSummaryIndex to create summaries of all the documents that I have. I have code that uses a class extending BaseExtractor to create extra metadata, and more code that loops through the nodes and appends that metadata.

However, I came across some code suggesting this can be done by passing an extractor to the DocumentSummaryIndex builder:
summary_index = DocumentSummaryIndex.from_documents(
    documents=documents,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    extractors=[sentiment_extractor],
)
The above code doesn't work; the extractor is never called. Is there a way to do this? It would be cleaner than the approach I currently have.
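One thing that may be worth trying, on the assumption that extractors are ordinary transformation components in your LlamaIndex version: list the extractor in `transformations` after the splitter (that is, `transformations=[splitter, sentiment_extractor]`) instead of passing a separate `extractors=` argument, which appears to be silently ignored. The sketch below is library-free and only illustrates the idea of an ordered transformation list; `Node`, `split_sentences`, `add_sentiment`, and `run_transformations` are hypothetical stand-ins, not LlamaIndex APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a text node that carries metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_sentences(nodes: List[Node]) -> List[Node]:
    """Stand-in splitter: emit one node per non-empty sentence."""
    out = []
    for node in nodes:
        for part in node.text.split("."):
            if part.strip():
                out.append(Node(text=part.strip(), metadata=dict(node.metadata)))
    return out


def add_sentiment(nodes: List[Node]) -> List[Node]:
    """Stand-in extractor: attach a placeholder sentiment to every node."""
    for node in nodes:
        node.metadata["sentiment"] = "Positive"  # placeholder, no LLM call
    return nodes


def run_transformations(
    nodes: List[Node],
    transformations: List[Callable[[List[Node]], List[Node]]],
) -> List[Node]:
    """Apply each transformation in list order, feeding the output forward."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


documents = [Node("Great product. Fast shipping.")]
# The extractor runs because it is part of the same ordered list as the splitter.
nodes = run_transformations(documents, [split_sentences, add_sentiment])
```

Order matters in this arrangement: the extractor sits after the splitter so it sees the chunks, not the whole documents.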
Help: I'm trying to use an Extractor like so:

class SentimentExtractor(BaseExtractor):
    def __init__(self, me, str):
        print("here")
        self.me = me
        self.str = str

    async def aextract(self, nodes: Sequence) -> List[Dict]:
        metadata_list = []
        for node in nodes:
            generated_sentiment = {"sentiment": "Positive"}  # Replace with actual LLM call
            metadata_list.append(generated_sentiment)
        return metadata_list

When I instantiate the class:
sentiment_extractor = SentimentExtractor(me = "zsdfsadf", str="hello")

I get this error:
ValueError: "SentimentExtractor" object has no field "me"

If I define my own BaseExtractor as a drop-in replacement, I don't get the error.

What I'm actually trying to do is pass in an LLM and a custom prompt, extract the sentiment from each node, and add it to the metadata, but I seem to be running into something odd. Could this be related to a dependency?
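A likely cause, assuming BaseExtractor is a Pydantic model (which the "has no field" wording suggests): attributes can only be set if they are declared as fields on the class, so `self.me = me` in `__init__` is rejected. The sketch below imitates that behavior without any dependency so it runs on its own. `StrictModel`, `BrokenExtractor`, and `FixedExtractor` are hypothetical stand-ins; in real code the base class would be BaseExtractor.

```python
from typing import get_type_hints


class StrictModel:
    """Hypothetical stand-in for a Pydantic-style base: only declared fields can be set."""

    def __init__(self, **data):
        for name, value in data.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # Reject any attribute that is not declared as a class-level annotation.
        if name not in get_type_hints(type(self)):
            raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')
        object.__setattr__(self, name, value)


class BrokenExtractor(StrictModel):
    def __init__(self, me, prompt):
        self.me = me  # fails: "me" was never declared as a field
        self.prompt = prompt


class FixedExtractor(StrictModel):
    me: str      # declare fields as class-level annotations
    prompt: str  # named "prompt" rather than "str" to avoid shadowing the built-in

    def __init__(self, me: str, prompt: str):
        super().__init__(me=me, prompt=prompt)  # let the base class assign them


try:
    BrokenExtractor(me="zsdfsadf", prompt="hello")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

fixed = FixedExtractor(me="zsdfsadf", prompt="hello")
```

If that assumption holds, the same two changes would apply to SentimentExtractor: declare `me` (and the LLM and prompt) as annotated class attributes, and hand them to `super().__init__(...)` instead of assigning them on `self` first.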
Hello, I have my main index built. I created a query engine, ran a query, and then checked the retrieved nodes in the response object. I thought they would be the list of nodes used to build the response. Instead, the number of nodes always matches the top_k I set, even when the query has nothing to do with the contents of the nodes. Is there a way to get only the nodes that were actually used to build the response?
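For context, a top_k retriever returns the k closest nodes whether or not they are actually relevant, and as far as I know the response object does not record which of them the LLM drew on. A common approximation is to drop nodes whose similarity score falls below a cutoff; in LlamaIndex I believe `SimilarityPostprocessor` with `similarity_cutoff` serves that purpose. The sketch below is library-free: `ScoredNode` and `filter_by_score` are hypothetical stand-ins, and the scores and the 0.5 cutoff are made-up illustration values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node plus its similarity score."""
    text: str
    score: Optional[float]


def filter_by_score(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Keep only nodes that have a score and meet the cutoff."""
    return [n for n in nodes if n.score is not None and n.score >= cutoff]


# Pretend top_k=3 returned these nodes; only the first is a strong match.
retrieved = [
    ScoredNode("Refund policy details", 0.83),
    ScoredNode("Shipping times", 0.41),
    ScoredNode("Company history", 0.12),
]

relevant = filter_by_score(retrieved, cutoff=0.5)
```

A sensible cutoff depends on the embedding model, so it is worth printing the scores for a few known-good and known-bad queries before choosing one.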