@Logan M, what seems strange to me is that SchemaLLMPathExtractor is hallucinating outright: it introduces invented nodes that have no relation to the documents.
From two documents it produces 3 entities, and on top of that they have nothing to do with the content. What could I be doing wrong?
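from typing import Literal
from llama_index.core import PromptTemplate
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

entities = Literal["MISSION", "ORGANIZATION", "VEHICLE", "TECHNOLOGY", "EVENT", "LOCATION"]
relations = Literal["CONDUCTS", "DEVELOPS", "OCCURS_AT", "IMPLEMENTS", "PARTICIPATES_IN", "EXPLORES", "HIGHLIGHTS", "SPONSORS", "TARGETS", "CELEBRATES", "HOSTS"]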
schema = [
    ('ORGANIZATION', 'CONDUCTS', 'MISSION'),
    ('ORGANIZATION', 'DEVELOPS', 'TECHNOLOGY'),
    ('EVENT', 'OCCURS_AT', 'LOCATION'),
    ('MISSION', 'IMPLEMENTS', 'TECHNOLOGY'),
    ('ORGANIZATION', 'PARTICIPATES_IN', 'EVENT'),
    ('MISSION', 'EXPLORES', 'LOCATION'),
    ('EVENT', 'HIGHLIGHTS', 'TECHNOLOGY'),
    ('ORGANIZATION', 'SPONSORS', 'EVENT'),
    ('MISSION', 'TARGETS', 'LOCATION'),
    ('EVENT', 'CELEBRATES', 'MISSION'),
    ('LOCATION', 'HOSTS', 'EVENT'),
]
extract_prompt = """
You are a data scientist working for a company that is interested in understanding the relationship between different entities in a knowledge graph.
You have been tasked with extracting the relationship between entities in a knowledge graph.
IMPORTANT: All entities must be extracted from the documents provided.
"""
kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    num_workers=4,
    max_triplets_per_chunk=10,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,
    extract_prompt=PromptTemplate(extract_prompt),
)
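
One thing that may be worth checking: the custom prompt above contains no `{text}` placeholder. Assuming the extractor fills the chunk into the template as `text` (together with `max_triplets_per_chunk`), as the library's default schema-extraction prompt appears to do, a template without that placeholder would never show the document to the LLM, which would be consistent with entities unrelated to the content. The sketch below uses only the standard library to check a template for those variable names; `missing_placeholders`, `REQUIRED_VARS`, and `patched_prompt` are hypothetical names for illustration, not part of LlamaIndex.

```python
from string import Formatter

# Assumption: these are the variables the extractor passes to the prompt.
REQUIRED_VARS = ("text", "max_triplets_per_chunk")


def missing_placeholders(template: str, required=REQUIRED_VARS) -> list:
    """Return the required names that never appear as {name} in the template."""
    present = {field for _, field, _, _ in Formatter().parse(template) if field}
    return [name for name in required if name not in present]


# The custom prompt from the message above, unchanged.
custom_prompt = """
You are a data scientist working for a company that is interested in understanding the relationship between different entities in a knowledge graph.
You have been tasked with extracting the relationship between entities in a knowledge graph.
IMPORTANT: All entities must be extracted from the documents provided.
"""

# Hypothetical variant that appends the two placeholders.
patched_prompt = custom_prompt + (
    "Limit the output to {max_triplets_per_chunk} extracted paths.\n"
    "-------\n"
    "{text}\n"
    "-------\n"
)

print(missing_placeholders(custom_prompt))   # ['text', 'max_triplets_per_chunk']
print(missing_placeholders(patched_prompt))  # []
```

If the first call reports `text` as missing, passing something like `patched_prompt` to `PromptTemplate` instead would be a cheap experiment before looking at the schema or the `strict` setting.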