What’s In A Datom?
Abstract
Datomic adopted the datom as the fundamental unit of data. Playing
with this notion, we observe that different communication contexts
call for slightly different types of datoms.
Recall the humble datom.
[e a v tx added?]
;; e - entity
;; a - attribute
;; v - value
;; tx - transaction id
;; added? - flag indicating addition / retraction
Establishing a denotational semantic domain like that is great fun, because it invites us to look at each constituent individually and to consider what other things could reasonably take its place, and what the resulting thing would mean.
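To make the five slots concrete before we start swapping them out, here is a minimal sketch that treats a datom as a plain Clojure vector. The helper `datom->map` and the example values are made up for illustration and are not part of Datomic's API.

```clojure
;; A datom as a plain five-element vector (illustrative values only).
(def example-datom [42 :person/name "Ada" 1001 true])

(defn datom->map
  "Names the five slots of a datom vector via positional destructuring."
  [[e a v tx added?]]
  {:e e :a a :v v :tx tx :added? added?})

(datom->map example-datom)
;; => {:e 42, :a :person/name, :v "Ada", :tx 1001, :added? true}
```

Each key in the resulting map corresponds to one of the slots we examine in the sections that follow.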
Entities
According to the Datomic glossary, e is “the first component of a
datom, specifying who or what the datom is about”. Within a system,
identifying entities by positive integers (eids) shouldn’t usually
leave much to be desired.
The question of what to put in the e-slot becomes more interesting
once we consider communication across system boundaries. Separate
systems might not use the same identification scheme. Even if they do,
they need to coordinate the assignment of identifiers so as to avoid
collisions.
A common example of this arises in communication between a web application and a server. Here, negative eids might indicate entities that have been created on the client but are not yet known to the server or other clients. Alternatively, clients might make use of a UUID scheme in order to avoid coordination entirely.
We can therefore add the new shape [uuid a v tx added?] to our
collection, for use in communication between separate eid domains.
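As a rough sketch of what crossing an eid domain could look like, the following rewrites client-local datoms with negative eids into the [uuid a v tx added?] shape. The helpers `assign-uuids` and `->uuid-datoms` are hypothetical names, and a real client would presumably persist the eid-to-UUID mapping rather than regenerate it.

```clojure
(defn assign-uuids
  "Returns a map from each distinct eid found in `datoms` to a fresh
  random UUID. Hypothetical helper, for illustration only."
  [datoms]
  (into {}
        (map (fn [e] [e (java.util.UUID/randomUUID)]))
        (distinct (map first datoms))))

(defn ->uuid-datoms
  "Rewrites the e-slot of every datom from a local eid to its UUID."
  [eid->uuid datoms]
  (mapv (fn [[e a v tx added?]]
          [(eid->uuid e) a v tx added?])
        datoms))

;; Two client-local entities, identified by negative eids.
(def local-datoms
  [[-1 :todo/title "write post" 7 true]
   [-1 :todo/done? false 7 true]
   [-2 :todo/title "proofread" 7 true]])

(def eid->uuid (assign-uuids local-datoms))
(def outgoing (->uuid-datoms eid->uuid local-datoms))
```

Datoms about the same local entity end up with the same UUID, so the receiving side can still group them without knowing anything about the sender's eids.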
Attributes and Values
At least in the Clojure world, strong, global names (in the form of
fully qualified keywords) are found in the a-slot. This should be
considered a great blessing and display of wisdom and kindness. It
will take a much less biased mind to even consider other things to
fill their place.
Similarly, values (numbers, strings, booleans, maybe instants, etc…) are well understood and liked. We do not mess around with those.
Time
To Clojure’s and Datomic’s eternal credit, immutable values and reified, logical system time are well established in our community. It took an outsider to teach me that timestamps can be so much more still. In particular, timestamps most certainly don’t have to be scalars. Non-scalar timestamps allow us to talk about multiple axes of time, manage speculative multi-user computations, and work with heterogeneous data sources.
We will therefore write the more general t for timestamp, when
talking about datoms.
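One possible reading of a non-scalar timestamp is a vector with one coordinate per time axis, compared coordinate-wise. The sketch below assumes exactly that (a product partial order over numeric vectors). The names `t<=` and `comparable?` and the two-axis layout are our own assumptions, not something prescribed by Datomic or any other system.

```clojure
(defn t<=
  "Product partial order: t1 is less than or equal to t2 when both have
  the same number of axes and every coordinate of t1 is <= that of t2."
  [t1 t2]
  (and (= (count t1) (count t2))
       (every? true? (map <= t1 t2))))

(defn comparable?
  "True when the two timestamps are ordered one way or the other."
  [t1 t2]
  (or (t<= t1 t2) (t<= t2 t1)))

;; Hypothetical two-axis timestamps: [system-time valid-time].
(t<= [1 5] [2 7])          ;; => true, both axes advance
(comparable? [1 7] [2 5])  ;; => false, the axes disagree
```

Unlike scalar transaction ids, such timestamps are only partially ordered: some pairs simply cannot be said to happen before or after one another.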
Multiplicity
Datomic has set semantics. Consequently, datoms can only have one out
of two possible multiplicities: 0 and 1. At any given logical point in
time, an additive datom allows us to go from 0 to 1, a retractive one
allows us to go from 1 to 0. The added?-slot of a datom allows us to
indicate whether it is meant to be additive or retractive.
Datomic comes with two transaction functions to create additive and
retractive datoms respectively: :db/add and :db/retract. If we allow
for a slight reformulation here, we could imagine the group
(#{0 1} add), with add being addition modulo 2, to govern the addition
and retraction of datoms under set semantics:
;; sets
'(#{0 1} add)
(= (add 0 0) 0)
(= (add 0 1) (add 1 0) 1)
(= (add 1 1) 0)
Written like this, we are naturally led to ask whether other groups could reasonably take its place.
;; multisets
'(integer? +)
(= (add 0 0) 0)
(= (add 0 1) (add 1 0) 1)
(= (add 1 1) 2)
(= (add -1 1) (add 1 -1) 0)
;; ...
;; probabilities
'([0..1] ???)
Again we might look at other systems for some inspiration. In any case
we might want to write the more general diff (difference, as in change
in multiplicity) in place of added?, when thinking about what a datom
can be.
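To see how the choice of group changes what a collection of datoms means, here is a small sketch that folds [e a v t diff] datoms into multiplicities. `add-set` is addition modulo 2, matching the set table above, and `add-multiset` is plain integer addition. The function `multiplicities` and the sample data are hypothetical.

```clojure
(defn add-set
  "Addition in the two-element group: addition modulo 2."
  [x y]
  (mod (+ x y) 2))

(def add-multiset
  "Addition in the group of integers."
  +)

(defn multiplicities
  "Folds [e a v t diff] datoms into a map from [e a v] to its accumulated
  multiplicity, combining diffs with `add` and starting from 0."
  [add datoms]
  (reduce (fn [acc [e a v _t diff]]
            (update acc [e a v] (fnil add 0) diff))
          {}
          datoms))

;; The same fact is added twice, at t = 10 and t = 11.
(def tagged
  [[1 :post/tag "clj" 10 1]
   [1 :post/tag "clj" 11 1]
   [2 :post/tag "db"  11 1]])

(multiplicities add-set tagged)
;; => {[1 :post/tag "clj"] 0, [2 :post/tag "db"] 1}
(multiplicities add-multiset tagged)
;; => {[1 :post/tag "clj"] 2, [2 :post/tag "db"] 1}
```

Read literally, the two-element group makes a second addition cancel the first, whereas the integers keep counting. Which of the two is desirable depends on the system consuming the datoms.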
Intent
Datoms do not record intent, they record facts. For completeness’ sake, we note that some communication requires preservation of intent. In particular, whenever we talk about a source-of-truth, we are referring to a system that has access to user intent and the authority to impose its interpretation. Other times, preservation of intent is outright dangerous, because it allows for diverging interpretations to creep in.
Datomic’s transaction data is a representation that preserves intent:
;; intent
[data-fn args*]
Datomic is therefore designed to act as a source-of-truth, because it transforms intent-aware inputs (which require interpretation in the form of transaction functions) into intent-less datoms. 3DF, in contrast, doesn’t know anything about the correct interpretation of user intent, and therefore expects datoms as input.
When replicating / propagating information, we want to be careful to only ever send intent-less data. As soon as more than one system has authority to impose interpretation, we are playing the game of distributed consensus, which is not a fun game at all.
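The difference between intent and its interpretation can be sketched in a few lines. The registry `data-fns`, the `:account/deposit` function, and the representation of state as a map from [e a] to v are all assumptions made for this example. Datomic's actual transaction functions run against a database value and are not modelled here.

```clojure
(def data-fns
  "Hypothetical registry of transaction functions. Each takes the current
  state (a map from [e a] to v), a timestamp t, and the args carried by
  the intent, and returns intent-less [e a v t diff] datoms."
  {:account/deposit
   (fn [state t e amount]
     (let [old (get state [e :account/balance])]
       (cond-> []
         ;; Retract the old balance only if there is one.
         (some? old) (conj [e :account/balance old t -1])
         ;; Always assert the new balance.
         true        (conj [e :account/balance (+ (or old 0) amount) t 1]))))})

(defn interpret
  "Expands one [data-fn args*] intent into datoms against `state`."
  [state t [data-fn & args]]
  (apply (data-fns data-fn) state t args))

(def intent [:account/deposit 17 25])

(interpret {[17 :account/balance] 100} 1001 intent)
;; => [[17 :account/balance 100 1001 -1] [17 :account/balance 125 1001 1]]
(interpret {[17 :account/balance] 40} 1001 intent)
;; => [[17 :account/balance 40 1001 -1] [17 :account/balance 65 1001 1]]
```

The same intent yields different datoms depending on the state it is interpreted against. This is why we would rather have a single authority do the interpreting and ship only the resulting datoms.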
Summary
We have seen a number of generalizations and tweaks to the humble datom. Let’s list the most important categories again and give them names.
[data-fn args*] ;; intent-preserving novelty
[e a v t diff] ;; controlled novelty
[uuid a v diff] ;; open novelty
Intent-preserving novelty is uniquely suited to record all user interactions with your system. All other forms of novelty should be derivable from a stored representation of user intent.
Systems that share entity identification and timestamping schemes can communicate controlled novelty amongst each other. Most commonly, these are database peers partaking in the replication of transactions.
Finally, some systems are interested in information exchange without a shared notion of entity ids and timestamps. Consider again a web-application that at any point in time maintains some local-only entities, some shared with server A, and some with server B. Communication must happen via open novelty now. In a fully peer-to-peer setting, intent-preserving novelty can be used in combination with a UUID scheme.
This concludes our little bestiary.