I'm getting some weird behaviour from SimpleDirectoryReader() with LlamaParse and wondering if it's intentional. When I load just one file, I end up with multiple Document objects.

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    verbose=True,
)
file_extractor = {".pdf": parser}
document = SimpleDirectoryReader(
    input_files=[pdf_path],  # pdf_path is ONE file path, i.e. './easy_data/example_file.pdf'
    file_extractor=file_extractor,
    filename_as_id=True,
).load_data(show_progress=True)

However, when I run len(document) I get a number > 1, which doesn't make sense. Any ideas what's going on?
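For what it's worth, LlamaParse may return more than one Document per file (e.g. splitting by page), which would explain the count. Below is a minimal sketch to inspect what came back and, if a single Document per file is wanted, merge the pieces manually; the merge step is only an assumption about the desired outcome, not something the API requires.

Plain Text
# Quick sanity check on what the reader returned (hypothetical
# inspection code, not part of the question above):
for i, doc in enumerate(document):
    print(i, doc.id_, doc.metadata.get("file_name"), len(doc.text))

# Optionally merge the parsed pieces into a single Document:
from llama_index.core import Document

merged = Document(
    text="\n\n".join(d.text for d in document),
    metadata=document[0].metadata,
)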
3 comments

Hi, I have a question about constructing a property graph index. Right now, the docs only mention this:
Plain Text
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[extractor1, extractor2, ...],
)

With the normal vector store index, I can generate the nodes first (i.e. per document):
Plain Text
nodes_A = pipeline.run(documents=[DocumentOne])
nodes_B = pipeline.run(documents=[DocumentTwo])
index = VectorStoreIndex(nodes_A + nodes_B)

Is there an analogous way to do the same thing with the property graph index, where you can process it document by document before constructing the full index?
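Here is a hedged sketch of one way this could work, assuming PropertyGraphIndex follows the same BaseIndex pattern as VectorStoreIndex (build from an initial set of documents, then insert the rest one at a time); this is not confirmed from the docs above.

Plain Text
# Sketch only: assumes PropertyGraphIndex supports insert() like
# other BaseIndex subclasses; check the docs before relying on it.
index = PropertyGraphIndex.from_documents(
    [DocumentOne],
    kg_extractors=[extractor1, extractor2],
)

# then add further documents one at a time
index.insert(DocumentTwo)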
1 comment
I am using Ollama embeddings and was wondering if there is a field like query_instruction, similar to HuggingFaceEmbedding:
Plain Text
from llama_index.embeddings.ollama import OllamaEmbedding

EMBED_MODEL = OllamaEmbedding(
    model_name=EMBED_MODEL_NAME,  # mxbai-embed-large:latest
    base_url="http://localhost:11435",
    ollama_additional_kwargs={"mirostat": 0},
    # ? add query_instruction?
)

For mxbai-embed-large:latest, the model expects queries to be prefixed with "Represent this sentence for searching relevant passages:".

If there isn't a built-in field, do I just need to manually add the prompt to the queries at retrieval time?
Plain Text
retriever.retrieve(f"Represent this sentence for searching relevant passages: {query}")
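If nothing built-in turns up, one option is to wrap the manual prefixing in a small helper so the instruction lives in one place. retrieve_with_instruction below is a made-up helper name for this sketch, not a LlamaIndex API.

Plain Text
# Hypothetical helper that prepends the mxbai query instruction;
# the function name is invented for this sketch.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def retrieve_with_instruction(retriever, query: str):
    return retriever.retrieve(QUERY_PREFIX + query)

nodes = retrieve_with_instruction(retriever, query)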
11 comments
I have a FastAPI app running on a remote server (with a better GPU than my personal computer) that returns a Pydantic model of the form:
Plain Text
from typing import List
from pydantic import BaseModel
from llama_index.core.schema import BaseNode

class PipelineResults(BaseModel):
    nodes: List[BaseNode]
The API returns a serialized form of PipelineResults, and in my PC script:
Plain Text
response = requests.post(pipeline_url, json=json_data.model_dump(), headers=headers) # response = serialized PipelineResults
result = PipelineResults.model_validate(response.json()) # raises err

I get the error:
TypeError: Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content

Which I assume is because BaseNode is an abstract class, so it can't be instantiated when deserializing the dict. Is there any solution for this?
Thanks all, I have loved using LlamaIndex so far 🙂
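One possible workaround (a sketch, assuming the server-side pipeline really produces TextNode instances) is to declare the concrete node class in the model, so Pydantic has something it can instantiate during validation:

Plain Text
# Sketch only: swaps the abstract BaseNode for the concrete TextNode,
# which assumes that is what the pipeline actually returns.
from typing import List
from pydantic import BaseModel
from llama_index.core.schema import TextNode

class PipelineResults(BaseModel):
    nodes: List[TextNode]

result = PipelineResults.model_validate(response.json())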
3 comments