25 March 2018

Derived Attributes For DataScript

Abstract
Many aspects of business logic can often be captured as attributes derived from more fundamental facts stored in DataScript. Exposing both through a unified Pull API makes this view explicit. The resulting code keeps logic consistent across the application and improves the self-documentation of component data-dependencies through declarative queries.

Consider the backend for an online shop, given by the following schema:

(def schema
  {:product/id                {:db/unique :db.unique/identity}
   :product/list-price        {}
   :product/premium-eligible? {}
   
   :customer/id       {:db/unique :db.unique/identity}
   :customer/purchase {:db/valueType :db.type/ref
                       :db/cardinality :db.cardinality/many}})

We want to give customers who’ve purchased more than CHF 1000 worth of products a premium status, thus offering them (along with many other benefits I’m sure) a discount on certain, premium-eligible products. This new concept must be reflected both in a customer’s profile view and on a product-detail page, in order to show the right price.

(component profile-view
  (query [customer-id]
    '[{:customer/purchase [:product/id 
                           :product/list-price]}])
  (render [props]
    (let [total (->> (:customer/purchase props)
                     (map :product/list-price)
                     (apply +))]
      (if (<= total 1000)
        [:h1 "Keep purchasing those goods!"]
        [:h1 "You're looking really premium today!"]))))
        
(component product-view
  (query [customer-id product-id]
    '{:product  [:product/list-price
                 :product/premium-eligible?]
      :customer [{:customer/purchase [:product/id
                                      :product/list-price]}]})
  (render [props]
    ;; purchases live under the :customer key of the query result
    (let [total      (->> (:customer/purchase (:customer props))
                          (map :product/list-price)
                          (apply +))
          product    (:product props)
          list-price (:product/list-price product)]
      (if (and (> total 1000)
               (:product/premium-eligible? product))
        [:h1 "Premium Price: " (* list-price 0.8)]
        [:h1 "Price: " list-price]))))

Note: All code examples in this post are somewhat pseudo-code-ish, to keep things simple. The concepts used are taken from great projects like rum and om.next. If you’ve had some exposure to the declarative-query approach to UI components, you will have no trouble following along. If not, the om.next wiki is a great place to start learning more.

This first sketch invites some obvious consistency problems, which we should fix before talking about anything else. We have duplicated the business logic, namely what we consider a premium customer and what prices to offer them, inside the two components.

(def PREMIUM_THRESHOLD 1000)
(def PREMIUM_DISCOUNT 0.8)

(defn is-premium? [db customer-id]
  (let [customer        (d/entity db customer-id)
        total-purchased (->> (:customer/purchase customer)
                             (map :product/list-price)
                             (apply +))]
    (> total-purchased PREMIUM_THRESHOLD)))
    
(defn get-price [db customer-id product-id]
  (let [product    (d/entity db product-id)
        list-price (:product/list-price product)]
    (if (and (is-premium? db customer-id)
             (:product/premium-eligible? product))
      (* list-price PREMIUM_DISCOUNT)
      list-price)))

(component profile-view
  (query [customer-id] 
    '[])
  (render [{:keys [db customer-id]}]
    (if (is-premium? db customer-id)
      [:h1 "You're looking really premium today!"]
      [:h1 "Keep purchasing those goods!"])))
        
(component product-view
  (query [customer-id product-id]
    '[])
  (render [{:keys [db customer-id product-id]}]
    (let [price (get-price db customer-id product-id)]
      [:h1 "Price: " price])))

This is better in the DRY sense, meaning that the important parts of the application’s logic are defined only once and can be re-used from multiple components, without the possibility of inconsistent interpretations. After all, a UI component is not the best place for pricing decisions to happen. Still, some less obvious problems persist.

First of all, we’ve lost information that was previously encoded in the component’s queries, harming self-documentation of the code. We also lose the ability to automatically determine the set of facts on which a component depends, which would’ve helped us in quickly determining whether a given component should be re-rendered.
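As a rough illustration of that optimization, the attributes a component depends on can be read straight off its query and compared against the attributes touched by a transaction. This is only a sketch: query-attrs and needs-render? are hypothetical helpers (not part of DataScript, rum or om.next), and the transaction data is simplified to plain [e a v] vectors.

```clojure
(require '[clojure.set :as set])

(defn query-attrs
  "Collects every attribute keyword mentioned in a pull pattern,
   descending into nested join maps."
  [pattern]
  (->> (tree-seq coll? seq pattern)
       (filter keyword?)
       set))

(defn needs-render?
  "True when a transaction touched an attribute the query mentions.
   `tx-data` is a seq of [e a v] vectors (a simplification)."
  [pattern tx-data]
  (let [changed (set (map second tx-data))]
    (boolean (seq (set/intersection (query-attrs pattern) changed)))))

(def profile-query
  '[{:customer/purchase [:product/id :product/list-price]}])

(needs-render? profile-query [[1 :product/list-price 20]]) ;; => true
(needs-render? profile-query [[1 :product/name "Mug"]])    ;; => false
;; An empty query declares no dependencies at all:
(needs-render? '[] [[1 :product/list-price 20]])           ;; => false
```

The last call shows what the empty queries above cost us: a price change is no longer recognised as relevant to the component.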

Worse, if we were in fact still using that optimization, we would have opened ourselves up to a much more subtle source of inconsistencies. Whenever the result of a query function depends on facts which are not included in the component query, the rendered UI will be out of sync with the application state. Finally, components must now be careful to pass query functions only the database snapshot from which they are rendering themselves; otherwise the component might end up in an inconsistent state.

We’ve also potentially caused some inefficiencies, because the same database queries might now get re-evaluated all over the place. We could memoize is-premium? and get-price, but we’d be wasting a lot of effort memoizing results we’ll never be interested in again. Intuitively, these computations need only be cached across the same render cycle.
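To make the idea of a cache that lives only for one render cycle concrete, here is a minimal sketch for Clojure on the JVM. The names *render-cache*, with-render-cycle and cached are invented for this example. It also assumes that a render cycle works against a single database snapshot, which is why the snapshot is not part of the cache key.

```clojure
;; Hypothetical per-render-cycle cache; not part of DataScript.
(def ^:dynamic *render-cache* nil)

(defmacro with-render-cycle
  "Runs body with a fresh, empty cache that is discarded afterwards."
  [& body]
  `(binding [*render-cache* (atom {})]
     ~@body))

(defn cached
  "Calls (f db & args), re-using a result computed earlier in the same
   render cycle. Outside of a cycle, f is simply called."
  [f db & args]
  (if-let [cache *render-cache*]
    (let [k [f (vec args)]]
      (if-let [hit (find @cache k)]   ; find also catches cached false/nil
        (val hit)
        (let [v (apply f db args)]
          (swap! cache assoc k v)
          v)))
    (apply f db args)))

;; Usage, with a plain map standing in for the database:
(def calls (atom 0))

(defn slow-premium? [db customer-id]   ; stand-in for is-premium?
  (swap! calls inc)
  (> (get-in db [customer-id :total]) 1000))

(with-render-cycle
  [(cached slow-premium? {1 {:total 1200}} 1)
   (cached slow-premium? {1 {:total 1200}} 1)])
;; => [true true], and @calls is 1
```

Every call site has to opt in through cached, which is part of why this remains a workaround rather than a solution.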

Peering Into The Tarpit

Out of the Tarpit, by Moseley and Marks, is a seminal work and a beloved staple of Clojure lore. In the tarpit sense, the premiumness of a customer and the final display price of an item are part of the non-essential, or accidental state of our application, because they are functionally determined by other, more fundamental attributes (such as purchase history and list price in this case).

Still, as we will see, exposing both kinds of state through a unified API will provide an elegant way to implement and use business logic consistently.

Disciples of the schools of Datomic and DataScript are accustomed to representing essential state as attributes, which in turn are stored as facts in the database. We will therefore refer to such attributes as reified. As it transpires, we can model non-essential state as computations on top of reified attributes and use them in queries, alongside their less fickle siblings. Such attributes, we will call derived.

Implementation

We can easily extend DataScript with the notion of derived attributes. First, we will have to indicate derived attributes as such in the schema.

(def schema
  {:product/id                {:db/unique :db.unique/identity}
   :product/list-price        {}
   :product/premium-eligible? {}
   :product/price             {:db/valueType :db.type/derived} ; new

   :customer/id       {:db/unique :db.unique/identity}
   :customer/purchase {:db/valueType :db.type/ref
                       :db/cardinality :db.cardinality/many}
   :customer/premium? {:db/valueType :db.type/derived}}) ; new

Of course, we will have to provide the actual derivations. Let’s call this function read, to pay tribute to and highlight a similarity in approach with om.next.

;; Dispatch on the attribute name. Every method also receives
;; that key as its first argument.
(defmulti read (fn [key db eid & args] key))

(defmethod read :customer/premium? [_ db customer-id]
  (let [customer        (d/entity db customer-id)
        total-purchased (->> (:customer/purchase customer)
                             (map :product/list-price)
                             (apply +))]
    (> total-purchased PREMIUM_THRESHOLD)))

(defmethod read :product/price [_ db product-id customer-id]
  (let [product    (d/entity db product-id)
        list-price (:product/list-price product)]
    (if (and (read :customer/premium? db customer-id)
             (:product/premium-eligible? product))
      (* list-price PREMIUM_DISCOUNT)
      list-price)))

You will note that nothing has changed implementation-wise, apart from coercing the various computations into a more uniform interface.

Next, we will have to extend the Pull grammar and consequently the query parser itself. In order to make derived attributes clearly distinguishable inside a query, they should be properly annotated. For this, we simply re-use the existing syntax for attribute expressions (limit and default):

[:customer/id (read :customer/premium?)]

Implementing support for this in the parser is straightforward and consists mostly of copy-pasting the code path for any of the existing attribute expressions.
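A minimal sketch of the classification step such a parser needs, assuming pull patterns arrive as quoted Clojure data. The function parse-attr-spec and its :kind tags are made up for illustration. They do not mirror DataScript’s internal parser, which handles many more cases (wildcards, joins, limit, default, recursion).

```clojure
(defn parse-attr-spec
  "Classifies one element of a pull pattern as a plain (reified)
   attribute or as a derived-attribute expression of the form
   (read attr & args). Anything else is passed through untouched."
  [spec]
  (cond
    (keyword? spec)
    {:kind :reified :attr spec}

    (and (seq? spec) (= 'read (first spec)))
    (let [[_ attr & args] spec]
      {:kind :derived :attr attr :args (vec args)})

    :else
    {:kind :other :spec spec}))

(mapv parse-attr-spec '[:customer/id (read :product/price 42)])
;; => [{:kind :reified, :attr :customer/id}
;;     {:kind :derived, :attr :product/price, :args [42]}]
```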

Finally, all that’s left to do is to extend DataScript’s Pull API to accept a polymorphic read implementation as an additional argument, which can then be called by the pull parser to resolve derived attributes in a query.
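Short of patching DataScript itself, the same behaviour can be approximated for top-level attributes with a thin wrapper around the stock d/pull. The following is a sketch under that assumption, not the patch described here: pull-with-read, derived-expr? and read-derived are hypothetical names, and nested patterns are not handled.

```clojure
(require '[datascript.core :as d])

(defn derived-expr? [spec]
  (and (seq? spec) (= 'read (first spec))))

(defn pull-with-read
  "Like d/pull, but resolves top-level (read attr & args) expressions
   by calling read-fn as (read-fn attr db eid & args)."
  [db read-fn pattern eid]
  (let [reified (vec (remove derived-expr? pattern))
        derived (filter derived-expr? pattern)
        ;; only hit the stock Pull API when there is something to pull
        base    (when (seq reified) (d/pull db reified eid))]
    (reduce (fn [result [_ attr & args]]
              (assoc result attr (apply read-fn attr db eid args)))
            (or base {})
            derived)))

;; Hypothetical derivation and sample data for trying it out.
(defmulti read-derived (fn [attr db eid & args] attr))

(defmethod read-derived :customer/premium? [_ db eid]
  (let [total (->> (:customer/purchase (d/entity db eid))
                   (map :product/list-price)
                   (apply +))]
    (> total 1000)))

(def db
  (d/db-with
    (d/empty-db {:product/id        {:db/unique :db.unique/identity}
                 :customer/id       {:db/unique :db.unique/identity}
                 :customer/purchase {:db/valueType   :db.type/ref
                                     :db/cardinality :db.cardinality/many}})
    [{:db/id -1 :product/id "p1" :product/list-price 1200}
     {:db/id -2 :customer/id "c1" :customer/purchase [-1]}]))

(pull-with-read db read-derived
                '[:customer/id (read :customer/premium?)]
                [:customer/id "c1"])
```

With the sample data above, the call should return the reified :customer/id and the derived :customer/premium? side by side in one map. A real integration would live inside the pull parser, so that derived attributes also work in nested patterns.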

Our UI components end up looking like this:

(component profile-view
  (query [customer-id]
    '[(read :customer/premium?)])
  (render [props]
    (if (:customer/premium? props)
      [:h1 "You're looking really premium today!"]
      [:h1 "Keep purchasing those goods!"])))
        
(component product-view
  (query [customer-id product-id]
    ;; ~'read keeps the symbol unqualified inside the syntax-quote
    `[(~'read :product/price ~customer-id)])
  (render [props]
    [:h1 "Price: " (:product/price props)]))

To make things even smoother, we can allow for derivations returning entity ids to be recursively pulled as would be expected:

[{(read :parent/derived-child) [:child/id]}]

We could then use this to display a list of a customer’s favourite products, together with their names and ratings:

(component favourites-view
  (query [customer-id] 
    '[{(read :customer/favourite) [:product/name 
                                   (read :product/rating)]}])
  (render [props] ...))

Note: If you are interested in actually using this, but are not sure how to make the necessary patches yourself, please send me an email and we can figure out a way to get this into a more re-usable package on top of DataScript.

Conclusions

Let us revisit the premise of this exploration. Business logic must live outside of components, in order to be applied consistently and without too much redundant work. We observed that doing so breaks the benefits provided by om.next-style declarative component queries, because components may now depend on state not declared in their query.

Capturing non-essential state via functions on DataScript values comes very naturally to the functional programmer, provides a single source of truth for business logic, and keeps UI components simple. Working with reified and derived attributes through a unified Pull API ensures consistent reads, while maintaining the self-documenting properties of declarative component queries. It also allows us to more accurately capture component dependencies, such that render optimizations based on component queries remain possible and safe.

On top of that, this approach solves caching in an elegant and generalized way: during a render cycle, reading a derived attribute from the query result is no different from reading a reified one. Some sub-queries will of course still be evaluated more often than strictly required. Should this ever become a serious problem, it would still be better addressed by caching one layer below, inside the query engine.

Extending support to regular queries and other parts of DataScript can be useful as well and will be covered in future posts. I’d love to hear your feedback and questions via mail or Twitter.