```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    pprint_response,
)
```
```
ImportError                               Traceback (most recent call last)
<ipython-input-14-25efe78ab2a7> in <cell line: 23>()
     21
     22 from pprint import pprint
---> 23 from llama_index.core import (
     24     VectorStoreIndex,
     25     SimpleKeywordTableIndex,

ImportError: cannot import name 'pprint_response' from 'llama_index.core' (/usr/local/lib/python3.10/dist-packages/llama_index/core/__init__.py)
```
`pprint_response` is not a valid import from `llama_index.core`. Based on the knowledge sources provided, it seems like `pprint_response` is not a part of the LlamaIndex library. Here are some valid imports from `llama_index.core` that you can use:

```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
```
```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)
```

```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    SummaryIndex,
)
```

```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    SimpleKeywordTableIndex,
    SummaryIndex,
)
```
If you are looking for a way to pretty-print a response in a notebook, you can use the `display_response` function from `llama_index.core.response.notebook_utils`:

```python
from llama_index.core.response.notebook_utils import display_response
```
There is no function named `pprint_response` available in the LlamaIndex library. It seems like there might be a misunderstanding or a typo. You can use the built-in `print` function in Python, or the `display_response` function from `llama_index.core.response.notebook_utils` if it's available in your version of LlamaIndex:

```python
from llama_index.core.response.notebook_utils import display_response

response = query_engine.query("Your query here")
display_response(response)
```
To import `BaseRetriever`, `VectorIndexRetriever`, and `KGTableRetriever` from the LlamaIndex library, you can use the following code:

```python
from llama_index.core.retrievers import BaseRetriever, VectorIndexRetriever
```

However, please note that `KGTableRetriever` is not directly importable from `llama_index.core.retrievers`. It's mentioned in the Retriever Modes section of the LlamaIndex documentation, but no specific import path is provided.
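For illustration, here's a minimal sketch of using `VectorIndexRetriever` directly. It assumes you already have a `VectorStoreIndex` named `vector_index`; the index name and query string are placeholders:

```python
from llama_index.core.retrievers import VectorIndexRetriever

# Assumes `vector_index` is an existing VectorStoreIndex
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=2)
source_nodes = retriever.retrieve("Your query here")

# Each result is a NodeWithScore: a retrieved node plus its similarity score
for node_with_score in source_nodes:
    print(node_with_score.score, node_with_score.node.get_text()[:100])
```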
```python
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor,
    KeywordExtractor,
    BaseExtractor,
)
```
This code imports the `SummaryExtractor`, `QuestionsAnsweredExtractor`, `TitleExtractor`, `KeywordExtractor`, and `BaseExtractor` classes from the `llama_index.core.extractors` module. These classes are used to define metadata extractors in LlamaIndex: `SummaryExtractor` is used to automatically extract a summary over a set of Nodes, `QuestionsAnsweredExtractor` is used to extract a set of questions that each Node can answer, and so on.

```python
entity_extractor = EntityExtractor(
    prediction_threshold=0.5,
    label_entities=False,  # include the entity label in the metadata (can be erroneous)
    device="cpu",  # set to "cuda" if you have a GPU
)

extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm),
    SummaryExtractor(summaries=["prev", "self"]),
    KeywordExtractor(keywords=10),
    EntityExtractor(prediction_threshold=0.5),
    # KeywordExtractor(keywords=10, llm=llm),
    # CustomExtractor()
]

transformations = [[text_splitter] + extractors, entity_extractor]
```
There are a few issues with this code:

1. You've defined `entity_extractor` twice: once at the beginning and once in the `extractors` list. You should only define it once.
2. `SummaryExtractor` and `KeywordExtractor` in the `extractors` list are missing the `llm` argument, which is required according to the provided knowledge sources.
3. The `transformations` list should be a flat list of transformations, not a nested list.

Here's the corrected code:

```python
entity_extractor = EntityExtractor(
    prediction_threshold=0.5,
    label_entities=False,  # include the entity label in the metadata (can be erroneous)
    device="cpu",  # set to "cuda" if you have a GPU
)

extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm),
    SummaryExtractor(summaries=["prev", "self"], llm=llm),
    KeywordExtractor(keywords=10, llm=llm),
    entity_extractor,
]

transformations = [text_splitter] + extractors
```
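One way to actually run these transformations is LlamaIndex's `IngestionPipeline`. A minimal sketch, assuming `transformations` is the flat list built above and `docs` is a list of already-loaded Documents (e.g. from `SimpleDirectoryReader`):

```python
from llama_index.core.ingestion import IngestionPipeline

# Run the splitter and all extractors over the documents in order
pipeline = IngestionPipeline(transformations=transformations)
nodes = pipeline.run(documents=docs)

# Each resulting node now carries the extracted metadata fields
print(nodes[0].metadata)
```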
Please note that you need to define `text_splitter` and `llm` before using them in this code. You can find more details in the LlamaIndex Documentation and GitHub Repository.

```python
class CustomExtractor(BaseExtractor):
    def extract(self, nodes):
        metadata_list = [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]
        return metadata_list
```
To use the `CustomExtractor` within the `transformations`, you first need to define the `CustomExtractor` class. Then, you can add an instance of this class to the `extractors` list. Here's how you can do it:

```python
class CustomExtractor(BaseExtractor):
    def extract(self, nodes):
        metadata_list = [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]
        return metadata_list

# Define your other extractors here...
extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm),
    SummaryExtractor(summaries=["prev", "self"], llm=llm),
    KeywordExtractor(keywords=10, llm=llm),
    entity_extractor,
    CustomExtractor(),  # Add the custom extractor here
]

transformations = [text_splitter] + extractors
```
In this code, we first define the `CustomExtractor` class, which inherits from `BaseExtractor`. The `extract` method of `CustomExtractor` takes a list of nodes and returns a list of dictionaries, where each dictionary contains a "custom" key whose value is a combination of the node's document title and excerpt keywords. An instance of `CustomExtractor` is then added to the `extractors` list. Finally, the `extractors` list is combined with `text_splitter` to form the `transformations` list.

Please note that you need to define `text_splitter`, `llm`, and `entity_extractor` before using them in this code. You can find more details in the LlamaIndex Documentation and GitHub Repository.

```
TypeError                                 Traceback (most recent call last)
<ipython-input-47-ee3786c0ffe9> in <cell line: 41>()
     45     entity_extractor,
     46     # KeywordExtractor(keywords=10, llm=llm),
---> 47     CustomExtractor()
     48 ]
     49

TypeError: Can't instantiate abstract class CustomExtractor with abstract method aextract
```
This error occurs because the `CustomExtractor` class you've defined is missing an implementation of the `aextract` method. The `aextract` method is an abstract method in the `BaseExtractor` class, which means any class that inherits from `BaseExtractor` must provide an implementation for this method. Here's how you can modify your `CustomExtractor` class to include an `aextract` method:

```python
class CustomExtractor(BaseExtractor):
    def extract(self, nodes):
        metadata_list = [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]
        return metadata_list

    async def aextract(self, nodes):
        # Asynchronous version of the extract method
        return self.extract(nodes)
```
In this updated version, the `aextract` method simply calls the `extract` method. If your `extract` method involves IO-bound tasks (like making network requests or reading from the file system), you should make it asynchronous to improve performance.
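If the synchronous `extract` does blocking work, a hedged sketch of one way to keep the event loop responsive is to offload it to a worker thread with `asyncio.to_thread` (Python 3.9+); this is just one option, not a pattern required by the library:

```python
import asyncio

from llama_index.core.extractors import BaseExtractor

class CustomExtractor(BaseExtractor):
    def extract(self, nodes):
        # Same synchronous implementation as above
        return [
            {
                "custom": (
                    node.metadata["document_title"]
                    + "\n"
                    + node.metadata["excerpt_keywords"]
                )
            }
            for node in nodes
        ]

    async def aextract(self, nodes):
        # Run the blocking extract() in a thread so the event loop is not
        # blocked while the metadata is being computed
        return await asyncio.to_thread(self.extract, nodes)
```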
`LLMQuestionGenerator` is used in LlamaIndex to generate questions from your data. Here's a basic example of how to use it:

```python
from llama_index.core.question_gen.llm_generators import LLMQuestionGenerator
from llama_index.core.schema import QueryBundle
from llama_index.llms.openai import OpenAI

# create llm
llm = OpenAI(model="gpt-4", temperature=0.0)

# define generator
question_gen = LLMQuestionGenerator.from_defaults(llm=llm)

# use the generator to generate questions
questions = question_gen.generate(tools=tools, query=QueryBundle("Your query here"))
```
In this example, `LLMQuestionGenerator.from_defaults(llm=llm)` is used to create an instance of `LLMQuestionGenerator` with the specified language model (`llm`). The `generate` method is then used to generate questions.

Please note that you need to define `tools` before using this code: `tools` is a list of `ToolMetadata` instances, and `QueryBundle` is a class used to bundle a query with its context.
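For illustration, a hedged sketch of what `tools` might look like; the tool names and descriptions here are placeholders, not from the original conversation:

```python
from llama_index.core.tools import ToolMetadata

# Placeholder tool descriptions; adapt these to your own indices
tools = [
    ToolMetadata(
        name="uber_10k_2022",
        description="Provides information about Uber's 2022 SEC 10-K filing",
    ),
    ToolMetadata(
        name="lyft_10k_2022",
        description="Provides information about Lyft's 2022 SEC 10-K filing",
    ),
]
```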
`SubQuestionQueryEngine` is a component of the LlamaIndex library that allows you to decompose a complex query into subqueries, each answered by an individual vector index, and then synthesize the results to answer the overall query.

```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.llms.openai import OpenAI

# Define a QueryEngineTool for each vector index
individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_set[year].as_query_engine(),
        metadata=ToolMetadata(
            name=f"vector_index_{year}",
            description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
        ),
    )
    for year in years
]

# Create the SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    llm=OpenAI(model="gpt-3.5-turbo"),
)
```

In this code, `QueryEngineTool` is used to define a tool for each vector index. Each tool has a name and a description, which the LLM agent uses to decide which tool to choose. The `SubQuestionQueryEngine` is then created using the `from_defaults` method, passing in the `individual_query_engine_tools` and an `llm` that will be used to run the subqueries.

Please note that you need to define `index_set` and `years` before using this code. `index_set` is a dictionary where the keys are years and the values are the corresponding vector indices, and `years` is a list of years for which you have data.
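For example, a hedged sketch of building `index_set`; it assumes per-year document lists in a hypothetical `doc_sets` dictionary:

```python
from llama_index.core import VectorStoreIndex

# `doc_sets` is a hypothetical dict mapping each year to its loaded documents
years = [2019, 2020, 2021, 2022]
index_set = {
    year: VectorStoreIndex.from_documents(doc_sets[year])
    for year in years
}
```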
```python
from llama_index import ListIndex, ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=1024)
nodes = service_context.node_parser.get_nodes_from_documents(andrew_gina_docs)
list_index = ListIndex(nodes)
```
```python
# input_dir is used because "." is a directory; input_files expects file paths
docs = SimpleDirectoryReader(input_dir=".", filename_as_id=True).load_data()
vector_index = VectorStoreIndex.from_documents(docs)
```
To use a `ListIndex` with your current code, you first need to create a `ServiceContext` and use its `node_parser` to convert your documents into nodes. Then, you can create a `ListIndex` from these nodes. Here's how you can do it:

```python
from llama_index import ListIndex, ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Load documents
docs = SimpleDirectoryReader(input_dir=".", filename_as_id=True).load_data()

# Create a VectorStoreIndex
vector_index = VectorStoreIndex.from_documents(docs)

# Create a ServiceContext
service_context = ServiceContext.from_defaults(chunk_size=1024)

# Convert documents into nodes
nodes = service_context.node_parser.get_nodes_from_documents(docs)

# Create a ListIndex
list_index = ListIndex(nodes)
```

In this code, `ServiceContext.from_defaults(chunk_size=1024)` is used to create a `ServiceContext` with a chunk size of 1024. The `node_parser` of the `ServiceContext` is then used to convert the documents into nodes. Finally, a `ListIndex` is created from these nodes.
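As a follow-up sketch, assuming the code above has run, you can query the `ListIndex` the same way as the vector index; this mirrors the `as_query_engine(service_context=...)` pattern used later in this conversation:

```python
# Query the ListIndex (legacy ServiceContext-style API, matching the code above)
query_engine = list_index.as_query_engine(service_context=service_context)
response = query_engine.query("Your query here")
print(response)
```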
The `llama_index.core` module in LlamaIndex contains the core components of the library, such as `VectorStoreIndex`, `QueryEngine`, and `ServiceContext`. Here's an example of how to use these components to load data, create an index, and query the index:

```python
from llama_index.core import VectorStoreIndex, ServiceContext, SimpleDirectoryReader

# Load documents
docs = SimpleDirectoryReader(input_dir=".", filename_as_id=True).load_data()

# Create a VectorStoreIndex
vector_index = VectorStoreIndex.from_documents(docs)

# Create a ServiceContext
service_context = ServiceContext.from_defaults(chunk_size=1024)

# Create a QueryEngine
query_engine = vector_index.as_query_engine(service_context=service_context)

# Query the index
response = query_engine.query("Your query here")
```

In this code, `SimpleDirectoryReader` is used to load the documents from the current directory, and `VectorStoreIndex.from_documents(docs)` creates a `VectorStoreIndex` from them. `ServiceContext.from_defaults(chunk_size=1024)` creates a `ServiceContext` with a chunk size of 1024, and `vector_index.as_query_engine(service_context=service_context)` creates a `QueryEngine` from the `VectorStoreIndex` and `ServiceContext`. Finally, `query_engine.query("Your query here")` is used to query the index.
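If you want to see where the answer came from, here's a small hedged sketch, assuming the code above has run:

```python
# Print the synthesized answer and preview its source nodes
print(response)
for source in response.source_nodes:
    print(source.score, source.node.get_text()[:200])
```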