26 June 2018

A Query Language

Abstract
Datomic’s pull queries can be a convenient, batteries-included alternative to GraphQL. We show how. Then we make up an excuse to write a slightly extended version of the pull language, using core.spec.

Declarative query languages have found new popularity as a means of serving the highly varying needs of an increasingly diverse set of clients (TVs and game consoles, smartphones of manifold size, shape, and capability, desktop computers, VR headsets, and even watches). GraphQL is probably the best known effort in this space (and the only one known to me for that matter, besides Netflix’s Falcor). Let’s have a look at a GraphQL expression.

{
  human(id: 1002) {
    name
    starships {
      name
      class
    }
  }
}

This will fetch the name of the human with id 1002, as well as their associated starships. For every associated starship, its name and ship class will be contained in the response as well. Note that both GraphQL as well as pull queries must specify a set of root entities, on which to resolve the expression (the set of humans in this case). Translating the above into Clojure, we could end up with something like

{:human [:human/name
         {:human/starships [:ship/name 
                            :ship/class]}]}

This is the tiniest of additions on top of plain pull queries. Instead of passing a query directly, clients now pass a map from root collections to pull queries. The server can simply fetch all entity ids in the specified root collection, and run d/pull-many.

So far so good, but we have lost the capability to filter the human collection by id, like we could in the GraphQL example. Expressing such a filter could be done very directly:

{(human {:db/id 1002}) [:human/name
                        {:human/starships [:ship/name
                                           :ship/class]}]}

But in order to keep things fresh and interesting, we will do it in a bit more general way, that resembles Datalog’s query clauses more closely:

{:human [[:db/id 1002]
         :human/name
         {:human/starships [:ship/name 
                            :ship/class]}]}

This even allows us to filter on nested relations as well, something that (to my knowledge) GraphQL does not support at the moment:

{:human [[:db/id 1002]
         :human/name
         {:human/starships [[:ship/name "Anubis"]
                            :ship/class]}]}

Parsing

We can use core.spec to define a grammar for this language and get a parser for free — a party trick I’ve picked up from a great talk by Stuart Sierra.

;; a rough grammar

(s/def ::query (s/map-of keyword? ::pattern))
(s/def ::pattern (s/coll-of ::attr-spec))
(s/def ::attr-spec (s/or :attribute ::attr-name
                         :clause ::clause
                         :expand ::map-spec))
(s/def ::attr-name keyword?)
(s/def ::clause (s/or :data-pattern ::data-pattern))
(s/def ::data-pattern (s/tuple ::attr-name ::constant))
(s/def ::constant (constantly true)) ;; @TODO
(s/def ::map-spec (s/and (s/map-of ::attr-name ::pattern)
                         #(= (count %) 1)))
                         
;; a 'parser'

(defn parse [query]
  (let [conformed (s/conform ::query query)]
    (if (s/invalid? conformed)
      (throw (ex-info "Couldn't parse query" (s/explain-data ::query query)))
      conformed)))

If we try this out in a REPL, we obtain the following parse tree:

(parse '{:human [[:db/id 1002]
                  :human/name
                  {:starships [[:ship/name "Anubis"]
                               :ship/class]}]})
                               
;; => {:human
;;     [[:clause [:data-pattern [:db/id 1002]]]
;;      [:attribute :human/name]
;;      [:expand
;;       {:starships
;;        [[:clause [:data-pattern [:ship/name "Anubis"]]]
;;         [:attribute :ship/class]]}]]})

Interpretation

As easy as it was, parsing is only half of the game. We now need to interpret expressions in our new language and execute them atop a Datomic database. In its simplest form, the interpreter will keep track of some interpretation context as it walks the parse tree. We can specify the interpretation of the various patterns supported by our language, using a multi-method.

(defmulti impl (fn [ctx node] (first node)))

Our interpretation context is made up of a Datomic database value, a set of datoms storing query results, and finally the set of entity id’s that we’re interested in. The latter is initialized with the set of root entities, as discussed above.

(defn pull [db pattern eids]
  (let [ctx {:db     db
             :datoms #{}
             :eids   eids}]
    (impl ctx [:pattern pattern])))

The simplest construct is the :pattern, which from the grammar above we remember as simply a collection of other constructs. Therefore the interpretation of a :pattern is precisely the combined interpretation of all of its constituent constructs.

(defmethod impl :pattern [ctx [_ specs]]
  (reduce impl ctx specs))

For :attribute constructs, we must fetch the desired attribute for all of the entities that the client is interested in. This can be done efficiently by accessing Datomic’s indices directly, using the d/datoms function. For this lookup, we will access the :aevt index, which keeps datoms sorted by the attribute they specify.

(defmethod impl :attribute [{:keys [db eids] :as ctx} [_ attr]]
  (let [datoms (into #{}
                     (filter (fn [datom] (contains? eids (.-e datom))))
                     (d/datoms db :aevt attr))]
    (update ctx :datoms set/union datoms)))

Expand constructs traverse relations. Consequently, we must fetch the related entities and pull them recursively. We will then add them alongside the resulting datoms into our interepretation context:

(defmethod impl :expand [{:keys [db eids] :as ctx} [_ map-spec]]
  (let [[attr pattern] (first map-spec)
        children       (into #{}
                             (filter (fn [datom] (contains? eids (.-e datom))))
                             (d/datoms db :aevt attr))
        children-eids  (into #{} (map :v) children)
        children-ctx   (pull db pattern children-eids)]
    (-> ctx
        (update :eids set/union (:eids children-ctx))
        (update :datoms set/union (into #{} (filter (fn [datom]
                                                      (contains? (:eids children-ctx) (.-v datom))) children)))
        (update :datoms set/union (:datoms children-ctx)))))

The final supported construct is the :clause. It filters the set of entities that the client is interested in, to include only those matching the specified value. Again we make use of direct index access via d/datoms. For some, indexed attributes, Datomic will maintain an additional index called :avet, which keeps datoms sorted by attribute and value. For all other attributes, we will have to do the filtering ourselves.

(defmethod impl :clause [{:keys [db eids] :as ctx} [_ clause]]
  (let [[_ data-pattern] clause
        [attr v]         data-pattern
        indexed?         (:indexed (d/attribute db attr))
        matching-datoms  (if indexed?
                           (into #{} (d/datoms db :avet attr v))
                           (into #{}
                                 (filter (fn [datom] (= (.-v datom) v)))
                                 (d/datoms db :aevt attr)))]
    (-> ctx
        (update :datoms set/union matching-datoms)
        (update :eids set/intersection (into #{} (map :e) matching-datoms)))))

Putting it all together, we first need a way of resolving the specified set of root entities…

(defn resolve-root [db root]
  (case root
    :human (set (d/q '[:find [?e ...] :where [?e :human/name _]] db))))

…as well as a way of extracting the resulting set of datoms from the final interpretation context and collecting them into a tree.

(defn resolve-query [db query]
  (->> query
       (parse)
       (reduce-kv
        (fn [result root pattern]
          (let [root-eids (resolve-root db root)
                ctx       (pull db pattern root-eids)
                eids      (:eids ctx)
                entities  (->> (:datoms ctx)
                               (reduce
                                (fn [tree datom]
                                  (if-not (contains? eids (.-e datom))
                                    tree
                                    (let [attr (d/attribute db (.-a datom))
                                          ref? (= (:value-type attr) :db.type/ref)]
                                      (if ref?
                                        (update-in tree [(.-e datom) (:ident attr)] conj (.-v datom))
                                        (assoc-in tree [(.-e datom) (:ident attr)] (.-v datom)))))) {}))
                hydrate   (fn [eid]
                            (->> (get entities eid)
                                 (reduce-kv
                                  (fn [entity a v]
                                    (let [attr (d/attribute db a)
                                          ref? (= (:value-type attr) :db.type/ref)]
                                      (if-not ref?
                                        (assoc entity a v)
                                        (if (coll? v)
                                          (assoc entity a (mapv entities v))
                                          (assoc entity a (get entities v)))))) {})))
                ;; hydrate
                tree      (->> root-eids
                               (into [] (comp (map hydrate) (remove empty?))))]
            (assoc result root tree))) {})))

And at last, we can try it out in the REPL.

(resolve-query
  (d/db conn)
  '{:human [:human/name
            {:human/starships [:ship/name
                               [:ship/class :ship.class/fighter]]}]})

;; => {:human
;;     [#:human{:name "Naomi Nagata",
;;              :starships
;;              [#:ship{:name "Roci", :class :ship.class/fighter}]}
;;      #:human{:starships
;;              [#:ship{:name "Roci", :class :ship.class/fighter}],
;;              :name "Amos Burton"}]}

All code is available on Github.