
Hi, I am new to LlamaIndex and I'm currently trying to extract metadata for each node. I'm following the tutorial in the docs and using CustomExtractor as instructed. However, I get the error below. I have searched the internet for solutions, but found nothing helpful. Please help me solve this.
Attachment: image_2023-12-13_180041885.png
Hi, can you show the CustomExtractor code?

Also, are you following this tutorial? https://docs.llamaindex.ai/en/stable/examples/metadata_extraction/MetadataExtractionSEC.html
Ah, we switched the base class to be async-first.

We need to update that example.

Also, implement the aextract method; if there's nothing async about your code, just have it call self.extract().

https://github.com/run-llama/llama_index/blob/fadef5f31ef6acd9b39b72103931b5eb62f98585/llama_index/extractors/interface.py#L74
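To see why this error appears, here is a simplified, plain-Python analogue of an async-first base class. It is not LlamaIndex's actual implementation; the class names and the Node stand-in are hypothetical. The abstract method is aextract(), and the sync extract() only drives it with asyncio.run(). A subclass that overrides just extract() still leaves aextract() abstract, so instantiating it raises TypeError. Delegating aextract() to extract(), as suggested above, removes the error.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class Node:
    """Hypothetical stand-in for a LlamaIndex node: just a metadata dict."""

    def __init__(self, metadata: Dict[str, str]) -> None:
        self.metadata = metadata


class AsyncFirstExtractor(ABC):
    """Simplified analogue of an async-first extractor base class."""

    @abstractmethod
    async def aextract(self, nodes: Sequence[Node]) -> List[Dict]:
        """Subclasses must implement the async method."""

    def extract(self, nodes: Sequence[Node]) -> List[Dict]:
        # The sync entry point just runs the async one to completion.
        return asyncio.run(self.aextract(nodes))


class SyncOnlyExtractor(AsyncFirstExtractor):
    # Overrides only extract(); aextract() is still abstract,
    # so this class cannot be instantiated.
    def extract(self, nodes: Sequence[Node]) -> List[Dict]:
        return [{"custom": n.metadata["document_title"]} for n in nodes]


class FixedExtractor(AsyncFirstExtractor):
    async def aextract(self, nodes: Sequence[Node]) -> List[Dict]:
        # Nothing async to do, so delegate to the sync implementation.
        return self.extract(nodes)

    def extract(self, nodes: Sequence[Node]) -> List[Dict]:
        return [{"custom": n.metadata["document_title"]} for n in nodes]


if __name__ == "__main__":
    try:
        SyncOnlyExtractor()
    except TypeError as err:
        print("TypeError:", err)

    nodes = [Node({"document_title": "Report"})]
    print(FixedExtractor().extract(nodes))
    print(asyncio.run(FixedExtractor().aextract(nodes)))
```

The exact wording of the TypeError message differs between Python versions, but in every version it names the missing abstract method.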
Yes, I'm following that tutorial step by step and got this error. My CustomExtractor class is exactly the same as the one in the tutorial.
Thank you @Logan M. I will try your suggestion, and I look forward to an update to that example.
Hi @WhiteFang_Jr, here is my code and the TypeError it raises.
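Plain Text
# imports assumed for llama_index 0.9.x (the release line current in Dec 2023)
from llama_index import SimpleDirectoryReader
from llama_index.extractors import (
    BaseExtractor,
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.ingestion import IngestionPipeline
from llama_index.llms import OpenAI
from llama_index.text_splitter import TokenTextSplitter

# read documents
documents = SimpleDirectoryReader(
    input_files=["data/impact-of-large-language-models-in-business--09:10:2023.txt"]
).load_data()

# define llm
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

# define text splitter
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)


class CustomExtractor(BaseExtractor):
    # only extract() is defined here; the base class is async-first and
    # expects aextract(), which is the cause of the error in this thread
    def extract(self, nodes):
        metadata_list = [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]
        return metadata_list


extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm),
    SummaryExtractor(summaries=["prev", "self"], llm=llm),
    KeywordExtractor(keywords=10, llm=llm),
    CustomExtractor(),
]

transformations = [text_splitter] + extractors

pipeline = IngestionPipeline(transformations=transformations)

nodes = pipeline.run(documents=documents, show_progress=True)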
Attachment: image_2023-12-15_104721859.png
Hey!
Did you try @Logan M's suggestion?


Plain Text
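# imports assumed for llama_index 0.9.x
from typing import Dict, List, Sequence

from llama_index.extractors import BaseExtractor
from llama_index.schema import BaseNode


class CustomExtractor(BaseExtractor):
    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict]:
        """Extracts metadata for a sequence of nodes, returning a list of
        metadata dictionaries corresponding to each node.

        Args:
            nodes (Sequence[BaseNode]): nodes to extract metadata from

        """
        # nothing async happens here, so delegate to the sync implementation
        return self.extract(nodes)

    def extract(self, nodes: Sequence[BaseNode]) -> List[Dict]:
        # relies on TitleExtractor and KeywordExtractor running earlier in the
        # pipeline, since they populate these two metadata keys
        metadata_list = [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]
        return metadata_list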
Yes, it worked. You're my lucky charm haha 🥰
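Delegating to self.extract() is fine when, as in this thread, the extractor only reads metadata that earlier extractors already produced. If an extractor instead does real async work per node (for example, awaiting an LLM request), aextract() can run those calls concurrently. The sketch below is plain Python, not LlamaIndex's API; fake_llm_call() and AsyncWorkExtractor are hypothetical names, and plain strings stand in for nodes.

```python
import asyncio
from typing import Dict, List, Sequence


async def fake_llm_call(text: str) -> str:
    """Hypothetical async call standing in for, e.g., an LLM request."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return text.upper()


class AsyncWorkExtractor:
    """Hypothetical extractor whose per-node work is genuinely async."""

    async def aextract(self, texts: Sequence[str]) -> List[Dict[str, str]]:
        # Start one coroutine per input and await them all concurrently;
        # asyncio.gather returns results in the same order as its inputs.
        results = await asyncio.gather(*(fake_llm_call(t) for t in texts))
        return [{"custom": r} for r in results]

    def extract(self, texts: Sequence[str]) -> List[Dict[str, str]]:
        # Sync wrapper around the async implementation.
        return asyncio.run(self.aextract(texts))


if __name__ == "__main__":
    print(AsyncWorkExtractor().extract(["a", "b"]))
```

Note that asyncio.run() raises RuntimeError if it is called while another event loop is already running in the same thread, so the sync wrapper is only suitable outside async code.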