
Novel Knowledge Graph implementation: use delta CRDTs as hypernodes

At a glance

The community member proposes a delta-CRDT based hypergraph where the canonical units represent change, not state. The idea is to encode full bi- or poly-directional relations using deltas, rather than mutating values on different objects. This "rhizomatic datastore" allows filtering and materializing the deltas into various representations. The community member suggests using a vector store to normalize relation and backpointer names, and using GraphQL for querying.

The comments discuss the potential challenges, such as reliably extracting entities and relationships, and the need for an ontology. One community member is working on a similar problem and has used Pinecone as a vector store to deduplicate extracted concepts, but notes this is not the full rhizomatic structure described in the post.

I have thoughts about knowledge graphs, I wonder if someone is willing to sanity check this:

  1. A delta-CRDT based hypergraph where the canonical units represent change, not state. A "delta" here establishes a relationship with specific semantics as of a point in time:
Plain Text
{
  id: 1,
  timestamp: ...,
  creator: user_1,
  relations: {
    employer: { reference: acme_inc, backpointer: employees },
    employee: { reference: joe_smith, backpointer: employer }
  }
}


  2. This is more complex than a basic triple, but not by much. The idea is that you are encoding a full bi- (or poly-)directional relation rather than mutating values on different objects. You just accumulate (and contextually replicate) deltas. Want to negate a given relationship? Don't delete it, append a new delta with the 'negates' relation targeting the old one, etc.
  3. State, then, becomes a question of filtering down to the deltas you care about and then reducing them into a representation that applies to your use case (see the sketch after this list).
  4. I call this a rhizomatic datastore, because the flat append-only list of deltas can be filtered and materialized into any number of representations. The rhizome is the set of all deltas, but you can maintain stateful views of the rhizome separately and trivially patch them with applicable deltas as they come in.
  5. In an agent model, I'm thinking I'd use a vector store to normalize relation and backpointer names. For instance, if I say 'joe smith is employed by acme' or 'joe smith works for acme' or 'joe smith's job is over at acme', those should all resolve to the same delta, which either reuses existing relation and backpointer names from the vector store or creates new ones.
  6. Querying this whole thing requires a root node and a schema to apply. I'm actually thinking GraphQL would probably be the simplest way to go.
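
To make point 3 concrete, here's the sketch mentioned there: a minimal TypeScript rendering of the delta shape above, plus a reducer that materializes one entity's state. The names (Delta, materialize) are made up for illustration; note that nothing is ever deleted, since 'negates' deltas just mask earlier ones at reduce time.

TypeScript
type Ref = { reference: string; backpointer: string };

interface Delta {
  id: number;
  timestamp: number;
  creator: string;
  relations: Record<string, Ref>;
}

// Fold every delta that references `entity` into a property map.
function materialize(deltas: Delta[], entity: string): Record<string, string[]> {
  // First pass: collect deltas masked by a 'negates' delta.
  const negated = new Set<string>();
  for (const d of deltas) {
    const neg = d.relations["negates"];
    if (neg) negated.add(neg.reference); // here `reference` names a prior delta id
  }
  // Second pass: reduce the surviving deltas in timestamp order.
  const view: Record<string, string[]> = {};
  for (const d of [...deltas].sort((a, b) => a.timestamp - b.timestamp)) {
    if (negated.has(String(d.id))) continue;
    for (const [name, ref] of Object.entries(d.relations)) {
      if (ref.reference !== entity) continue;
      // From this entity's side, the backpointer is the property name
      // and the delta's other references are its values.
      const others = Object.entries(d.relations)
        .filter(([n]) => n !== name)
        .map(([, r]) => r.reference);
      (view[ref.backpointer] ??= []).push(...others);
    }
  }
  return view;
}

Running materialize(allDeltas, "joe_smith") against the example delta above would yield { employer: ["acme_inc"] }.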
Putting it all together, if I'm talking to my AI agent about Joe Smith and his job, it doesn't need to pull in everything it knows about Joe Smith. We can create a schema that represents an EmployedPerson, which filters down to a limited set of relations. Those relations are then used to select applicable deltas, which get grouped together and injected into the prompt in a concise way.
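
For example, sticking with the Delta and materialize sketch above, an EmployedPerson "schema" could be as simple as a whitelist of relation names used to select deltas and then serialize the reduced view for the prompt. Everything here is hypothetical naming:

TypeScript
const EmployedPerson = {
  relations: new Set(["employer", "employee", "negates"]), // illustrative subset
};

// Select only the deltas this schema cares about for a given entity...
function selectDeltas(
  deltas: Delta[],
  schema: { relations: Set<string> },
  entity: string
): Delta[] {
  return deltas.filter((d) =>
    Object.entries(d.relations).some(
      ([name, ref]) => schema.relations.has(name) && ref.reference === entity
    )
  );
}

// ...then reduce them and serialize concise prior-knowledge lines.
function toPromptLines(deltas: Delta[], entity: string): string[] {
  const view = materialize(deltas, entity);
  return Object.entries(view).map(
    ([prop, refs]) => `${entity} ${prop}: ${refs.join(", ")}`
  );
}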

I feel like this approach gives us a lot more flexibility than simply using 'name' attributes on 'entity' and 'relationship' objects, you know? It also encodes temporality and provenance for every piece of information we have access to.

I'm in a position to potentially implement a proof of concept, and I'm working on a writeup to justify it. I'm posting here to ask y'all for your thoughts.
10 comments
Would love any feedback from anyone to whom the above makes sense. What this gives us, ultimately, is the atoms of information, which are innately relational.
hmm, it definitely sounds cool. But in the context of asking an LLM a question and doing retrieval, I wonder how exactly the filtering would happen

You'd also need your own version of text-2-sql 🤔
So for instance, I tell an AI that Joe got a new job working for EvilCorp.

The pre-query code would parse that to extract entities and relationships, and then it would query against a vector store to see if we have any pre-defined relation or backpointer names that map to what we're learning. In this case, we'd pull up "employer" and "employee". Then we can isolate the deltas that use those relations to refer to Joe, which will surface the example delta above. That would be serialized into the prompt. Your AI call could look something like:

Plain Text
Prior knowledge:
  As of <timestamp>, Joe works for Acme.

User statement:
  Joe got a new job working for EvilCorp!

Please generate any new deltas required to update the knowledge base with respect to this user statement.
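
For the "query against a vector store" step, here's a hand-rolled sketch of how the normalization could work, using plain cosine similarity over embeddings. The embed function, the in-memory list, and the 0.85 threshold are all stand-ins for whatever embedding API and vector store you'd actually use:

TypeScript
type NamedVector = { name: string; vector: number[] };

// Plain cosine similarity over raw embedding vectors.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Resolve an extracted phrase ("works for", "is employed by") to an
// existing canonical relation name, or mint a new one below the threshold.
async function resolveRelation(
  phrase: string,
  known: NamedVector[],
  embed: (text: string) => Promise<number[]>,
  threshold = 0.85 // made-up cutoff; would need tuning
): Promise<string> {
  const v = await embed(phrase);
  let best: NamedVector | undefined;
  let bestScore = -Infinity;
  for (const k of known) {
    const s = cosine(v, k.vector);
    if (s > bestScore) {
      bestScore = s;
      best = k;
    }
  }
  if (best && bestScore >= threshold) return best.name;
  known.push({ name: phrase, vector: v }); // becomes the new canonical name
  return phrase;
}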


For complex stuff you really want schemas, though. So like, an Employment schema could be a whole subgraph linking employer, employee, salary, reporting structure, job responsibility, and tracking job history over time, all in one compact summary.
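
For what it's worth, here's one hypothetical shape such an Employment schema could take in GraphQL SDL (per point 6 above); every type and field name is made up:

TypeScript
// Hypothetical GraphQL SDL for the Employment subgraph described above.
const employmentSchema = /* GraphQL */ `
  type Person { id: ID! name: String }
  type Organization { id: ID! name: String }

  type Employment {
    employer: Organization!
    employee: Person!
    salary: Float
    reportsTo: Person
    responsibilities: [String!]
    history: [EmploymentChange!] # derived from the delta timeline
  }

  type EmploymentChange {
    timestamp: String!
    creator: String!
    negated: Boolean
  }
`;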

There are open questions to address about how to best implement some of these vector components and the actual flow, but do you think there's a there there?
Tbh, I think the hardest part will be extracting entities and relationships reliably 😅 You could use an LLM, but that's slow. You could use Rebel, but that has its own limitations.

It also assumes you have some ontology built to work with already 🤔
(not trying to be pessimistic or anything, just thinking about finer details lol)
Assuming the above two are solved, I think it makes sense
Yeah, the experimental part here is whether a vector store will work to normalize them in the way I want.

The ontology is a hypergraph where the hyper-nodes are deltas at points in time that point to the domain nodes, which literally only exist as keys whose properties are computed at query-time by reducing all referencing deltas.

I think this will work.
@myk I realise this is digging up an old thread, but I'm curious if you ended up going down this route. I currently have a similar problem, I think. My ontology is already defined with 500+ classes. I'm trying to do something similar to what you described, where I build an existing state in a graph, then when a parsed user input is passed it will update any relevant relationships.

More specifically I want to extract knowledge from input using a defined large ontology to identify objects and relationships (structured/unstructured) and store it in a knowledge graph

Then if a user passes a new input that changes the relationship between two objects this will be updated.

If the ontology were smaller I could leverage ontology prompting for the extraction, but that's not possible here.
I'm working on this now. Using Pinecone as a vector store to deduplicate extracted concepts has worked in a toy example. Building out a more robust case now. See github.com/badass-courses/nerds-ai for my toy example.
This is not the full rhizomatic structure described above yet, this is just using vector store as a way to catch potential duplicate phrasing and converge on a canonical lexicon.
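
For anyone curious, the dedup step looks roughly like this. This is a sketch assuming the @pinecone-database/pinecone client and an embedding function of your choice; the index name, metadata shape, and 0.9 threshold are placeholders:

TypeScript
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index("concept-lexicon"); // placeholder index name

// Map an extracted concept onto an existing canonical entry if one is
// close enough, otherwise register it as a new canonical concept.
async function canonicalize(
  concept: string,
  embed: (text: string) => Promise<number[]>
): Promise<string> {
  const vector = await embed(concept);
  const res = await index.query({ vector, topK: 1, includeMetadata: true });
  const match = res.matches?.[0];
  if (match && (match.score ?? 0) >= 0.9) {
    // Close enough to an existing phrasing: converge on its canonical name.
    return (match.metadata?.name as string) ?? match.id;
  }
  await index.upsert([{ id: concept, values: vector, metadata: { name: concept } }]);
  return concept;
}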