Post-process utils for pivot exports The shape returned by the pivot qp is not the same visually as what a pivot table looks like in the app. It's all of the same data, but some post-processing logic needs to run on the rows to be able to present them visually in the same way as in the app. | (ns metabase.query-processor.pivot.postprocess (:refer-clojure :exclude [run!]) (:require [clojure.set :as set] [clojure.string :as str] [flatland.ordered.map :as ordered-map] [flatland.ordered.set :as ordered-set] [metabase.query-processor.streaming.common :as common] [metabase.util.malli :as mu] [metabase.util.malli.registry :as mr] [metabase.util.performance :as perf :refer [run!]]) (:import (java.util ArrayList))) |
(set! *warn-on-reflection* true) | |
I'll do my best to concisely explain what's happening here. Some terms: - raw pivot rows -> the rows returned by the above 'pivot query processor machinery'. - pivot-cols/pivot-rows -> vectors of indices into the raw pivot rows where the final pivot row/col values come from the values from these indices are what make up the header row and header column labels - pivot-measures -> vector of indices into raw pivot rows where the aggregated value comes from. This the values from these indices (often just 1 idx) are what end up in the table's 'cells' (the stuff making up the bulk of the table) | |
an example of what a raw pivot row might look like, with header shown for clarity: {:Cat A "AA", :Cat B "BA", :Cat C "CA", :Cat D "DA", :pivot-grouping 0, :Sum of Measure 1} [Cat A Cat B Cat C Cat D pivot-grouping Sum of Measure] [ "AA" "BA" "CA" "DA" 0 1] | |
The 'pivot-grouping' is the giveaway. If you ever see that column, you know you're dealing with raw pivot rows. | |
Pivot query results have a 'pivot-grouping' column. Rows whose pivot-grouping value is 0 are expected results. Rows whose pivot-grouping values are greater than 0 represent subtotals, and should not be included in non-pivot result outputs. | (def NON_PIVOT_ROW_GROUP 0) |
Most of the post processing functions use a 'pivot-spec' map. | (mr/def ::pivot-spec
[:map
[:column-titles [:sequential [:string]]]
[:pivot-rows [:sequential [:int {:min 0}]]]
[:pivot-cols [:sequential [:int {:min 0}]]]
[:pivot-grouping-key {:optional true}
[:int {:min 0}]]
[:pivot-measures {:optional true}
[:sequential [:int {:min 0}]]]]) |
Get the index into the raw pivot rows for the 'pivot-grouping' column. | (defn pivot-grouping-key [column-titles] ;; a vector is kinda sorta a map of indices->values, so ;; we can use map-invert to create the map (get (set/map-invert (vec column-titles)) "pivot-grouping")) |
Get the indices into the raw pivot rows corresponding to the pivot table's measure(s). | (mu/defn- pivot-measures
[{:keys [pivot-rows pivot-cols column-titles]} :- ::pivot-spec]
(-> (set/difference
;; every possible idx is just the range over the count of cols
(set (range (count column-titles)))
;; we exclude indices already used in pivot rows and cols, and the pivot-grouping key
;; recall that a raw pivot row will always contain this 'pivot-grouping' column, which we don't actually need to use.
(set (concat pivot-rows pivot-cols [(pivot-grouping-key column-titles)])))
sort
vec)) |
(mu/defn add-pivot-measures :- ::pivot-spec
"Given a pivot-spec map without the `:pivot-measures` key, determine what key(s) the measures will be and assoc that value into `:pivot-measures`."
[{measure-indices :pivot-measures :as pivot-spec} :- ::pivot-spec]
(let [pivot-grouping-key (pivot-grouping-key (:column-titles pivot-spec))]
(cond-> pivot-spec
;; if pivot-measures don't already exist (from the pivot qp), we add them ourselves, assuming lowest ID -> highest ID sort order
(not (seq measure-indices)) (assoc :pivot-measures (pivot-measures pivot-spec))
;; otherwise, we modify indices to skip over whatever the pivot-grouping idx is, so we pull the correct values per row
(seq measure-indices) (update :pivot-measures (fn [indices]
(mapv (fn [idx]
(if (>= idx pivot-grouping-key)
(inc idx)
idx))
indices)))
true (assoc :pivot-grouping pivot-grouping-key)))) | |
(mu/defn add-totals-settings :- ::pivot-spec
"Given a pivot-spec map and `viz-settings`, add the `:row-totals?` and `:col-totals?` keys."
[pivot-spec :- ::pivot-spec viz-settings]
(let [row-totals (if (contains? viz-settings :pivot.show_row_totals)
(:pivot.show_row_totals viz-settings)
true)
col-totals (if (contains? viz-settings :pivot.show_column_totals)
(:pivot.show_column_totals viz-settings)
true)]
(-> pivot-spec
(assoc :row-totals? row-totals)
(assoc :col-totals? col-totals)))) | |
Initiate the pivot data structure. | (mu/defn init-pivot
[pivot-spec :- ::pivot-spec]
(let [{:keys [pivot-measures]} pivot-spec]
{:config pivot-spec
:data {}
;; A nested tree of ordered maps & sets, representing all combinations of row values in the data
:row-paths (ordered-map/ordered-map)
;; A nested tree of ordered maps & sets, representing all combinations of column values in the data
:col-paths (ordered-map/ordered-map)
:measure-values (zipmap pivot-measures (repeat (sorted-set)))})) |
Assocs a list of values in a path tree, which should consist of a hierarchy of ordered-maps, with leaf values stored in ordered-sets. | (defn- add-to-path-tree
[tree ks]
(let [step
(fn step [tree [k & ks]]
(if ks
(let [next-map (or (get tree k) (ordered-map/ordered-map))]
(assoc tree k (step next-map ks)))
(let [leaf-set (if (set? tree) tree (ordered-set/ordered-set))]
(conj leaf-set k))))]
(step tree ks))) |
Aggregators for the column totals | (defn- measure->agg-fn
[k]
(case k
(:sum :count :total)
(fn [prev v]
(if (number? v)
(-> (merge {:result 0} prev)
(update :result #(+ % v)))
v))
:avg
(fn [prev v]
(if (number? v)
(-> (merge {:total 0
:count 0}
prev)
(update :total #(+ % v))
(update :count inc))
v))
:min
(fn [prev v]
(if (number? v)
(update prev :min
(fn [x] (if x
(min x v)
v)))
v))
:max
(fn [prev v]
(if (number? v)
(update prev :max
(fn [x] (if x
(max x v)
v)))
v))
;; else
(fn [_prev v] v))) |
Update the given Measure aggregations is a map whose keys are each pivot-measure; often just 1 key, but could be several depending on how the user has set up their measures.
| (defn- update-aggregate
[measure-aggregations new-values agg-fns]
(into {}
(map
(fn [[measure-key agg]]
(let [agg-fn-key (get agg-fns measure-key :total)
new-v (get new-values measure-key)]
[measure-key (if new-v
(let [agg-fn (measure->agg-fn agg-fn-key)]
(agg-fn agg new-v))
agg)])))
measure-aggregations)) |
Aggregate the given | (defn add-row
[pivot row]
(let [{:keys [pivot-rows
pivot-cols
pivot-measures
measures]} (:config pivot)
row-path (mapv row pivot-rows)
col-path (mapv row pivot-cols)
measure-vals (select-keys row pivot-measures)
total-fn* (fn [m path]
(if (seq path)
(update-in m path
#(update-aggregate (or % (zipmap pivot-measures (repeat {}))) measure-vals measures))
m))
total-fn (fn [m paths]
(reduce total-fn* m paths))]
(-> pivot
(update :row-count (fn [v] (if v (inc v) 0)))
(update :data update-in (concat row-path col-path)
#(update-aggregate (or % (zipmap pivot-measures (repeat {}))) measure-vals measures))
(update :row-paths
#(when (seq row-path) (add-to-path-tree % row-path)))
(update :col-paths
#(when (seq col-path) (add-to-path-tree % col-path)))
(update :totals (fn [totals]
(-> totals
(total-fn [[:grand-total]
row-path
col-path
[:section-totals (first row-path)]])
(total-fn (map (fn [part]
;; here, the `:rows-part` and `:cols-part` keys exist to
;; force paths into the :totals map to be unique.
;; without this, it is possible that a path is already written to
;; if a pivot-col value by chance happens to be the same number
;; of an idx into the row, such as a product ID of 4 matching
;; the pivot-measure idx of 4 if 2 pivot-rows and 1 pivot-col are configured.
;; Previously, in such a case, the measure map (the second deepest 'nesting')
;; can be erroneously accessed when later aggregating
;; to try illustrate, let's say that earlier, these 2 steps occurred:
;; `(assoc-in totals-map [:column-totals "RowA"] {4 {:result 1}})`
;; `(assoc-in totals-map [:column-totals "RowA" 3] {4 {:result 1}})`
;; the result will look like:
;; {:column-totals {"RowA" {4 {:result 1}
;; 3 {4 {:result 1}}}}}
;; Now, you're attempting to (update-aggregate totals-map [:column-totals "RowA"])
;; but, you'll be operating on an unexpected map shape (the key 3 does not correspond to a measure)
;; This is why in issue #50207, when switching around the pivot-rows, things broke. It wasn't
;; the switching, but rather that the second pivot-row's values were IDs, thus the integer 4
;; was part of some totals paths, breaking aggregating in later steps.
(concat [:column-totals :rows-part] part [:cols-part] col-path))
(rest (reductions conj [] row-path)))))))))) |
Format a value using the provided formatter or identity function. | (defn- fmt
[formatter v-map]
(let [value (if (map? v-map)
(or (:result v-map)
(when (contains? v-map :total)
(/ (double (:total v-map)) (:count v-map)))
(:min v-map)
(:max v-map)
(seq v-map))
v-map)]
(when value
((or formatter identity) (common/format-value value))))) |
Build multi-level column headers. | (defn- build-column-headers
[{:keys [pivot-cols pivot-measures column-titles row-totals?]} col-combos col-formatters]
(perf/concat
(if (= 1 (count pivot-measures))
(mapv (fn [col-combo] (perf/mapv fmt col-formatters col-combo)) col-combos)
(into [] (mapcat (fn [col-combo]
(let [formatted (perf/mapv fmt col-formatters col-combo)
it (.iterator ^Iterable pivot-measures)]
(loop [acc (transient [])]
(if (.hasNext it)
(recur (conj! acc (conj formatted (get column-titles (.next it)))))
(persistent! acc))))))
col-combos))
(when row-totals?
(repeat (count pivot-measures)
(perf/concat
(when (and row-totals? (> (count pivot-cols) 0)) ["Row totals"])
(repeat (dec (count pivot-cols)) nil)
(when (and (seq pivot-cols) (> (count pivot-measures) 1)) [nil])))))) |
Combine row keys with column headers. | (defn- build-headers
[column-headers {:keys [pivot-cols pivot-rows column-titles]}]
(some->> (not-empty (filterv seq column-headers))
perf/transpose
(mapv (fn [h]
(perf/concat
(perf/mapv #(get column-titles %)
(if (and (seq pivot-cols) (empty? pivot-rows))
pivot-cols pivot-rows))
h))))) |
Build a single row of the pivot table. | (defn- build-row
[row-combo col-combos pivot-measures data totals row-totals? ordered-formatters row-formatters config]
;; This implementation is very unorthodox, but this function is incredibly hot, so it must avoid allocation at all
;; readability costs. Any iterator-based iteration allocates, but also constructing an internal mapping lambda that
;; encloses the outer value is so so expensive.
(let [row-path (vec row-combo)
row-data (get-in data row-path)
n (count col-combos)
m (count pivot-measures)
result (ArrayList. (* (max 1 n) m))]
(when-not (seq row-formatters)
(dotimes [_ (count pivot-measures)] (.add result nil)))
;; We first add the pivot row values before the actual row values. It is intentional that we don't format pivot
;; row values just yet, because they will be used as data later on during grouping.
(perf/run! #(.add result %) row-combo)
(if (seq col-combos)
(loop [i 0, j -1, col-combo nil, vals nil]
;; we need to lead with col-combo here so that each row will alternate between all of the measures, rather
;; than have all measures of one kind bunched together. That is, if you have a table with `count` and
;; `avg` the row must show count-val, avg-val, count-val, avg-val ... etc
(if (= j -1)
(when (< i n)
(let [col-combo (nth col-combos i)]
(recur i (inc j) col-combo (reduce get row-data col-combo))))
(if (< j m)
(let [measure-key (nth pivot-measures j)
formatter (get ordered-formatters measure-key)
formatted-val (fmt formatter (get vals measure-key))]
(.add result formatted-val)
(recur i (inc j) col-combo vals))
(recur (inc i) -1 nil nil))))
;; If there are no columns, we still fill in one column per measure value
(run! (fn [measure-key]
(let [formatter (get ordered-formatters measure-key)]
(.add result (fmt formatter (get row-data measure-key)))))
pivot-measures))
(when (and row-totals? (> (count (:pivot-cols config)) 0))
(let [row-totals (get-in totals row-path)]
(run! #(.add result (fmt (get ordered-formatters %) (get row-totals %)))
pivot-measures)))
result)) |
Build column totals for a section. | (defn- build-column-totals
[section-path col-combos pivot-measures totals row-totals? ordered-formatters pivot-rows pivot-cols]
(let [cols-part (get-in totals (concat [:column-totals :rows-part] section-path [:cols-part]))
totals-row (ArrayList. (* (count col-combos) (count pivot-measures)))]
(if (seq col-combos)
(run! (fn [col-combo]
(let [m (reduce get cols-part col-combo)]
(run! (fn [measure-key]
(.add totals-row (fmt (get ordered-formatters measure-key) (get m measure-key))))
pivot-measures)))
col-combos)
;; If there are no columns, we still fill in one column per measure value
(run! (fn [measure-key]
(.add totals-row (fmt (get ordered-formatters measure-key) (get cols-part measure-key))))
pivot-measures))
(perf/concat
[(format "Totals for %s" (fmt (get ordered-formatters (first pivot-rows)) (last section-path)))]
(repeat (dec (count pivot-rows)) nil)
totals-row
(when (and row-totals? (> (count pivot-cols) 0))
(let [totals' (-> totals :section-totals (get-in section-path))]
(mapv #(fmt (get ordered-formatters %) (get totals' %))
pivot-measures)))))) |
Build grand totals row. | (defn- build-grand-totals
[{:keys [pivot-cols pivot-rows pivot-measures]} col-combos totals row-totals? ordered-formatters]
(perf/concat
["Grand totals"]
(repeat (dec (count (if (and (seq pivot-cols) (not (seq pivot-rows)))
pivot-cols pivot-rows)))
nil)
(when (and row-totals? (> (count pivot-cols) 0))
(into [] (mapcat (fn [col-combo]
(let [m (reduce get totals col-combo)]
(perf/mapv #(fmt (get ordered-formatters %) (get m %)) pivot-measures))))
col-combos))
(for [measure-key pivot-measures]
(fmt (get ordered-formatters measure-key)
(get-in totals [:grand-total measure-key]))))) |
(defn- append-totals-to-subsections
[pivot section col-combos ordered-formatters]
(let [{:keys [config
totals]} pivot
{:keys [pivot-rows
pivot-cols
pivot-measures
row-totals?]} config]
(perf/concat
(reduce
(fn [section pivot-row-idx]
(mapcat
(fn [[k rows]]
(let [partial-path (take pivot-row-idx (first rows))
subtotal-path (concat partial-path [k])
total-row (vec (build-column-totals
subtotal-path
col-combos
pivot-measures
totals
row-totals?
ordered-formatters
pivot-rows
pivot-cols))
;; inside a subsection, we know that the 'parent' subsection values will all be the same
;; so we can just grab it from the first row
next-subsection-value (nth (first rows) (dec pivot-row-idx))]
(conj (vec rows)
;; assoc the next subsection's value into the row so it stays grouped in the next reduction
(if (<= (dec pivot-row-idx) 0)
total-row
(assoc total-row (dec pivot-row-idx) next-subsection-value)))))
(group-by (fn [r] (nth r pivot-row-idx)) section)))
section
(reverse (range 1 (dec (count pivot-rows)))))
[(vec (build-column-totals
[(ffirst section)]
col-combos
pivot-measures
totals
row-totals?
ordered-formatters
pivot-rows
pivot-cols))]))) | |
Takes a tree of row or column paths and returns a new tree with ordered-maps replaced as needed with sorted-maps, and
ordered-sets replaced with sorted-sets, based on the provided | (defn sort-path-tree
[tree [first-index & indices] sort-orders]
(let [sort-order (get sort-orders first-index)
compare-fn (case sort-order
:ascending compare
:descending #(compare %2 %1)
nil)]
(cond
(associative? tree)
(into (if compare-fn
(sorted-map-by compare-fn)
(ordered-map/ordered-map))
(for [[k v] tree]
[k (sort-path-tree v indices sort-orders)]))
(set? tree)
(if compare-fn
(into (sorted-set-by compare-fn) tree)
tree)
:else tree))) |
Enumerate all paths from the root to a leaf in a tree structure composed of maps and sets. | (defn enumerate-paths
[m]
(letfn [(enumerate [prefix m]
(if-not (associative? m)
(mapv #(conj prefix %) m)
(into [] (mapcat (fn [[k v]]
(enumerate (conj prefix k) v))
m))))]
(enumerate [] m))) |
In the almost final row, the pivot row cells still contain unformatted values. We need to format them. | (defn- format-pivot-row-cells
[row row-formatters pivot-rows-cnt]
(if (pos? pivot-rows-cnt)
(let [first-entry (first row)]
;; Exclude rows that begin with "Totals ..."
(if (and (string? first-entry) (str/starts-with? first-entry "Totals"))
row
;; This manual iterator-loop goes over the whole row but applies the formatting only to first
;; `pivot-rows-cnt` values, and leaves the rest of the values unchanged (they are already formatted).
(let [it (.iterator ^Iterable row)]
(loop [res (transient []), i 0]
(if (.hasNext it)
(recur (conj! res (if (< i pivot-rows-cnt)
(fmt (nth row-formatters i) (.next it))
(.next it)))
(unchecked-inc i))
(persistent! res))))))
row)) |
Arrange and format the aggregated | (defn build-pivot-output
[pivot ordered-formatters]
(let [{:keys [config
data
totals
row-paths
col-paths]} pivot
{:keys [pivot-rows
pivot-cols
pivot-measures
;; `column` here refers to columns in the original data, which can be pivot rows *or* columns
column-sort-order
column-titles
row-totals?
col-totals?]} config
row-formatters (mapv #(get ordered-formatters %) pivot-rows)
col-formatters (mapv #(get ordered-formatters %) pivot-cols)
sorted-row-paths (sort-path-tree row-paths pivot-rows column-sort-order)
sorted-col-paths (sort-path-tree col-paths pivot-rows column-sort-order)
sorted-row-combos (enumerate-paths sorted-row-paths)
sorted-col-combos (enumerate-paths sorted-col-paths)
column-headers (build-column-headers config sorted-col-combos col-formatters)
headers (or (not-empty (build-headers column-headers config))
[(mapv #(get column-titles %) (into (vec pivot-rows) pivot-measures))])]
(perf/concat
headers
(transduce
(remove empty?)
into []
(let [sections-rows
(mapv (fn [section-row-combos]
(mapv (fn [row-combo]
(build-row row-combo sorted-col-combos pivot-measures data totals
row-totals? ordered-formatters row-formatters config))
section-row-combos))
(partition-by first sorted-row-combos))
pivot-rows-cnt (count pivot-rows)]
(perf/mapv
(fn [section-rows]
(->>
section-rows
;; section rows are either enriched with column-totals rows or left as is
((fn [rows]
(if (and col-totals? (> (count pivot-rows) 1))
(append-totals-to-subsections pivot rows sorted-col-combos ordered-formatters)
rows)))
;; then, we apply the row-formatters to the pivot-rows portion of each row,
;; filtering out any rows that begin with "Totals ..."
(mapv #(format-pivot-row-cells % row-formatters pivot-rows-cnt))))
sections-rows)))
(when col-totals?
[(build-grand-totals config sorted-col-combos totals row-totals? ordered-formatters)])))) |