I'm getting some weird behaviour from SimpleDirectoryReader() with LlamaParse and wondering if it's intentional. When I load just one file, I end up with multiple Document objects.

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    verbose=True,
)
file_extractor = {".pdf": parser}
document = SimpleDirectoryReader(
    input_files=[pdf_path],  # pdf_path is ONE file path, i.e. './easy_data/example_file.pdf'
    file_extractor=file_extractor,
    filename_as_id=True,
).load_data(show_progress=True)

However, when I run len(document) I get a number > 1, which doesn't make sense. Any ideas what's going on?
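For what it's worth, LlamaParse may return more than one Document per file (e.g. splitting by page), which would explain the count. Below is a minimal sketch to inspect what came back and, if a single Document per file is wanted, merge the pieces manually; the merge step is only an assumption about the desired outcome, not something the API requires.

Plain Text
# Quick sanity check on what the reader returned (hypothetical
# inspection code, not part of the question above):
for i, doc in enumerate(document):
    print(i, doc.id_, doc.metadata.get("file_name"), len(doc.text))

# Optionally merge the parsed pieces into a single Document:
from llama_index.core import Document

merged = Document(
    text="\n\n".join(d.text for d in document),
    metadata=document[0].metadata,
)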
3 comments

Hi, I have a question about constructing a property graph index. Right now, the docs only mention this:
Plain Text
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[extractor1, extractor2, ...],
)

With the normal vector store index, I can generate the nodes first (i.e. per document):
Plain Text
nodes_A = pipeline.run(documents=[DocumentOne])
nodes_B = pipeline.run(documents=[DocumentTwo])
index = VectorStoreIndex(nodes_A + nodes_B)

Is there an analogous way to do the same thing with the property graph index, where you can process it document by document before constructing the full index?
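Here is a hedged sketch of one way this could work, assuming PropertyGraphIndex follows the same BaseIndex pattern as VectorStoreIndex (build from an initial set of documents, then insert the rest one at a time); this is not confirmed from the docs above.

Plain Text
# Sketch only: assumes PropertyGraphIndex supports insert() like
# other BaseIndex subclasses; check the docs before relying on it.
index = PropertyGraphIndex.from_documents(
    [DocumentOne],
    kg_extractors=[extractor1, extractor2],
)

# then add further documents one at a time
index.insert(DocumentTwo)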
1 comment
I am using Ollama embeddings and was wondering if there is a field like query_instruction, similar to HuggingFaceEmbedding:
Plain Text
from llama_index.embeddings.ollama import OllamaEmbedding

EMBED_MODEL = OllamaEmbedding(
    model_name=EMBED_MODEL_NAME,  # mxbai-embed-large:latest
    base_url="http://localhost:11435",
    ollama_additional_kwargs={"mirostat": 0},
    # ? add query_instruction?
)

For mxbai-embed-large:latest, the model expects queries to be prefixed with "Represent this sentence for searching relevant passages:".

If there isn't a built-in field, do I just need to manually add the prompt to the queries at retrieval time?
Plain Text
retriever.retrieve(f"Represent this sentence for searching relevant passages: {query}")
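If nothing built-in turns up, one option is to wrap the manual prefixing in a small helper so the instruction lives in one place. retrieve_with_instruction below is a made-up helper name for this sketch, not a LlamaIndex API.

Plain Text
# Hypothetical helper that prepends the mxbai query instruction;
# the function name is invented for this sketch.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def retrieve_with_instruction(retriever, query: str):
    return retriever.retrieve(QUERY_PREFIX + query)

nodes = retrieve_with_instruction(retriever, query)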
11 comments
I have a FastAPI app running on a remote server (with a better GPU than my personal computer) that returns a Pydantic model of the form:
Plain Text
from typing import List
from pydantic import BaseModel
from llama_index.core.schema import BaseNode

class PipelineResults(BaseModel):
    nodes: List[BaseNode]
The API returns a serialized form of PipelineResults, and in my PC script:
Plain Text
response = requests.post(pipeline_url, json=json_data.model_dump(), headers=headers) # response = serialized PipelineResults
result = PipelineResults.model_validate(response.json()) # raises err

I get the error:
TypeError: Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content

Which I assume is because BaseNode is an abstract class, so it can't be instantiated when deserializing the dict. Is there any solution for this?
Thanks all, I have loved using LlamaIndex so far 🙂
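One possible workaround (a sketch, assuming the server-side pipeline really produces TextNode instances) is to declare the concrete node class in the model, so Pydantic has something it can instantiate during validation:

Plain Text
# Sketch only: swaps the abstract BaseNode for the concrete TextNode,
# which assumes that is what the pipeline actually returns.
from typing import List
from pydantic import BaseModel
from llama_index.core.schema import TextNode

class PipelineResults(BaseModel):
    nodes: List[TextNode]

result = PipelineResults.model_validate(response.json())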
3 comments