How does search scoring work?

This was written for a success engineer, but may be helpful here, too. Most of what you care about happens in the …. We have two sets of scorers. The first is based on the literal text matches and is defined in `match-based-scorers`.
These are all weighted: the exact-match scorer is responsible for 4/10 of the text-match score, the consecutivity one for 2/10, and so on. The second set of scorers is defined lower down, in `weights-and-scores`.
And there are two more for Enterprise, in the Enterprise version of `score-result` (`metabase-enterprise.search.scoring`).
These are easier to explain: you get points if the search result is pinned (yes or no), bookmarked (yes or no), how recently it was updated (a sliding value between 1 for edited just now and 0 for edited 180+ days ago), how many dashboards it appears in (a sliding value between 0 for zero dashboards and 1 for 50+ dashboards), and its type (…).
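Both sliding values are linear ramps with a cap. A minimal sketch, assuming the 180-day and 50-dashboard limits quoted above (in the code they come from `search.config/stale-time-in-days` and `search.config/dashboard-count-ceiling`) and taking the age as a plain number of days instead of computing it from `updated_at`; the namespace and the argument names are hypothetical:

```clojure
(ns example.sliding-scores)

;; Limits quoted in the prose above; the real values live in metabase.search.config.
(def stale-time-in-days 180)
(def dashboard-count-ceiling 50)

(defn recency-score
  "1 for something edited today, falling linearly to 0 at `stale-time-in-days`
  and staying at 0 after that."
  [days-ago]
  (/ (max (- stale-time-in-days days-ago) 0)
     stale-time-in-days))

(defn dashboard-count-score
  "0 for a card on no dashboards, rising linearly to 1 at the ceiling, capped at 1."
  [dashboard-count]
  (min (/ dashboard-count dashboard-count-ceiling) 1))

;; (recency-score 90)          ;=> 1/2
;; (dashboard-count-score 10)  ;=> 1/5
;; (dashboard-count-score 120) ;=> 1
```

Because Clojure divides integers into exact ratios, the scores come out as values like `1/2` rather than `0.5`.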
On the EE side, we also give points if something's an official collection and if it's verified.

Finally, what we actually search is defined in the search config, but the short answer is "the name and, if there is one, the description". We used to search raw SQL queries for cards, but that got turned off recently (though I've seen chat about turning it back on).

So, these 12 scorers are weighted and combined, and the grand total determines search order. If this sounds a little complicated… it is! It also means that it can be tricky to give a proper answer about why the search ranking is "wrong" (maybe you search for …).

Also, be aware that as of October 2023 there's a big epic under way to add filtering to search results, which should help people find what they're looking for (and spare us from having to make the above algorithm better).
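Put as a formula, the total for a result is the weighted average that `compute-normalized-score` takes over every score entry. In the open-source build the non-text weights from `weights-and-scores` add up to 7, and `force-weight` rescales any text matches to a combined weight of `text-scores-weight`, which is 10:

$$
\text{score} = \frac{\sum_i w_i s_i}{\sum_i w_i},
\qquad \sum_{i \in \text{text}} w_i = 10,
\qquad \sum_{i \in \text{base}} w_i = 2 + 2 + \tfrac{3}{2} + 1 + \tfrac{1}{2} = 7
$$

So whenever a result has a text match the denominator is 17, and text matching carries 10/17 (about 59%) of the total weight. With a blank search string the text entry gets weight 0 and the denominator is 7. The two Enterprise scorers add weights that are not shown here, so these totals only describe the open-source scorers.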
Computes a relevancy score for search results using the weighted average of various scorers. Scores are determined by various ways of comparing the text of the search string and the item's title or description, as well as by Metabase-specific features such as how many dashboards a card appears in or whether an item is pinned.

Get the score for a result with …. Some of the scorers can be tweaked with configuration in [[metabase.search.config]].

```clojure
(ns metabase.search.in-place.scoring
  (:require
   [clojure.string :as str]
   [java-time.api :as t]
   [metabase.premium-features.core :refer [defenterprise]]
   [metabase.search.config :as search.config]
   [metabase.search.in-place.util :as search.util]
   [metabase.util :as u]))
```
```clojure
(defn- matches? [search-token match-token]
  (str/includes? match-token search-token))

(defn- matches-in? [search-token match-tokens]
  (some #(matches? search-token %) match-tokens))

(defn- tokens->string [tokens abbreviate?]
  (let [->string (partial str/join " ")
        context  search.config/surrounding-match-context]
    (if (or (not abbreviate?)
            (<= (count tokens) (* 2 context)))
      (->string tokens)
      (str (->string (take context tokens))
           "…"
           (->string (take-last context tokens))))))
```
Breaks the matched text into match/no-match chunks and returns a seq of them in order. Each chunk is a map with keys `:is_match` (a boolean) and `:text` (a string).

```clojure
(defn- match-context [query-tokens match-tokens]
  (->> match-tokens
       (map (fn [match-token]
              {:text     match-token
               :is_match (boolean (some #(matches? % match-token) query-tokens))}))
       (partition-by :is_match)
       (map (fn [matches-or-misses-maps]
              (let [is-match    (:is_match (first matches-or-misses-maps))
                    text-tokens (map :text matches-or-misses-maps)]
                {:is_match is-match
                 :text     (tokens->string text-tokens (not is-match))})))))
```
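To see what `match-context` produces, here is a self-contained copy of it together with `matches?` and `tokens->string`, with `search.config/surrounding-match-context` replaced by a hypothetical value of 2 (the real setting may differ). Matching chunks are kept whole, while long non-matching chunks keep only `context` tokens at each end:

```clojure
(ns example.match-context
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for search.config/surrounding-match-context.
(def surrounding-match-context 2)

(defn- matches? [search-token match-token]
  (str/includes? match-token search-token))

(defn- tokens->string [tokens abbreviate?]
  (let [->string (partial str/join " ")
        context  surrounding-match-context]
    (if (or (not abbreviate?) (<= (count tokens) (* 2 context)))
      (->string tokens)
      (str (->string (take context tokens)) "…" (->string (take-last context tokens))))))

(defn match-context [query-tokens match-tokens]
  (->> match-tokens
       ;; Tag each result token with whether any query token occurs inside it.
       (map (fn [t] {:text t :is_match (boolean (some #(matches? % t) query-tokens))}))
       ;; Group runs of matches and runs of misses.
       (partition-by :is_match)
       (map (fn [chunk]
              (let [is-match (:is_match (first chunk))]
                {:is_match is-match
                 :text     (tokens->string (map :text chunk) (not is-match))})))))

;; (match-context ["revenue"]
;;                ["total" "monthly" "revenue" "by" "region" "and" "product" "line"])
;; ;=> ({:is_match false, :text "total monthly"}
;; ;    {:is_match true, :text "revenue"}
;; ;    {:is_match false, :text "by region…product line"})
```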
Scores a search result. Returns a vector of score maps, each containing `:score`, `:name`, `:weight`, and details of the text match (`:match`, `:match-context-thunk`, and `:column`). If no searchable column matches, returns `[{:score 0 :weight 0}]`.

```clojure
(defn- text-scores-with [search-native-query weighted-scorers query-tokens search-result]
  ;; TODO is pmap over search-result worth it?
  (let [scores (for [column (let [search-columns-fn (requiring-resolve 'metabase.search.in-place.legacy/searchable-columns)]
                              (search-columns-fn (:model search-result) search-native-query))
                     {:keys [scorer name weight]
                      :as   _ws} weighted-scorers
                     :let [matched-text (-> search-result
                                            (get column)
                                            (search.config/column->string (:model search-result) column))
                           match-tokens (some-> matched-text search.util/normalize search.util/tokenize)
                           raw-score    (scorer query-tokens match-tokens)]
                     :when (and matched-text (pos? raw-score))]
                 {:score               raw-score
                  :name                (str "text-" name)
                  :weight              weight
                  :match               matched-text
                  :match-context-thunk #(match-context query-tokens match-tokens)
                  :column              column})]
    (if (seq scores)
      (vec scores)
      [{:score 0 :weight 0}])))
```
```clojure
(defn- consecutivity-scorer [query-tokens match-tokens]
  (/ (search.util/largest-common-subseq-length
      matches?
      ;; See comment on largest-common-subseq-length re. its cache. This is a little conservative,
      ;; but better to under- than over-estimate
      (take 30 query-tokens)
      (take 30 match-tokens))
     (count query-tokens)))
```
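`consecutivity-scorer` rewards query tokens that appear next to each other, in order, in the result. `search.util/largest-common-subseq-length` is not shown here, so this sketch assumes it returns the length of the longest run of consecutive tokens shared by both lists (compared with `matches?`) and substitutes a plain longest-common-substring dynamic program for it. `longest-common-run` is a hypothetical name, and unlike the real helper it is not cached:

```clojure
(ns example.consecutivity
  (:require [clojure.string :as str]))

(defn- matches? [search-token match-token]
  (str/includes? match-token search-token))

(defn longest-common-run
  "Hypothetical stand-in for search.util/largest-common-subseq-length: the length
  of the longest run of consecutive elements of `xs` that match, pairwise and in
  order, a run of consecutive elements of `ys` under `same?`."
  [same? xs ys]
  (let [xs (vec xs)
        ys (vec ys)]
    ;; `prev` is the previous DP row: (prev j) is the length of the common run
    ;; ending at xs[i-1] and ys[j-1].
    (loop [i    0
           prev (vec (repeat (inc (count ys)) 0))
           best 0]
      (if (= i (count xs))
        best
        (let [row (reduce (fn [acc j]
                            (conj acc (if (same? (xs i) (ys j))
                                        (inc (prev j))
                                        0)))
                          [0]
                          (range (count ys)))]
          (recur (inc i) row (max best (apply max row))))))))

(defn consecutivity-scorer [query-tokens match-tokens]
  (/ (longest-common-run matches? (take 30 query-tokens) (take 30 match-tokens))
     (count query-tokens)))

;; (consecutivity-scorer ["monthly" "revenue"] ["total" "monthly" "revenue" "by" "region"]) ;=> 1
;; (consecutivity-scorer ["revenue" "monthly"] ["total" "monthly" "revenue" "by" "region"]) ;=> 1/2
```

Both queries find both of their tokens in the result, but only the first finds them adjacent and in the same order, so only it gets the full score.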
```clojure
(defn- occurrences [query-tokens match-tokens token-matches?]
  (reduce (fn [tally token]
            (if (token-matches? token match-tokens)
              (inc tally)
              tally))
          0
          query-tokens))
```
How many search tokens show up in the result?

```clojure
(defn- total-occurrences-scorer [query-tokens match-tokens]
  (/ (occurrences query-tokens match-tokens matches-in?)
     (count query-tokens)))
```
How many search tokens are exact matches (a perfect string match, not merely a substring match)?

```clojure
(defn- exact-match-scorer [query-tokens match-tokens]
  (/ (occurrences query-tokens match-tokens #(some (partial = %1) %2))
     (count query-tokens)))
```
How much of the result is covered by the search query?

```clojure
(defn fullness-scorer [query-tokens match-tokens]
  (let [match-token-count (count match-tokens)]
    (if (zero? match-token-count)
      0
      (/ (occurrences query-tokens match-tokens matches-in?)
         match-token-count))))
```
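The three occurrence-based scorers differ only in what counts as finding a query token and in what they divide by. A self-contained sketch that mirrors them, with `occurrences` written as `count`/`filter` instead of `reduce` (same count) and with hypothetical short names for the scorers:

```clojure
(ns example.occurrences
  (:require [clojure.string :as str]))

(defn- matches-in?
  "Substring match: the query token occurs inside some result token."
  [search-token match-tokens]
  (some #(str/includes? % search-token) match-tokens))

(defn- exact-in?
  "Exact match: some result token equals the query token."
  [search-token match-tokens]
  (some #(= search-token %) match-tokens))

(defn- occurrences
  "How many query tokens are found in `match-tokens` according to `found?`."
  [query-tokens match-tokens found?]
  (count (filter #(found? % match-tokens) query-tokens)))

(defn total-occurrences [query-tokens match-tokens]
  (/ (occurrences query-tokens match-tokens matches-in?) (count query-tokens)))

(defn exact-match [query-tokens match-tokens]
  (/ (occurrences query-tokens match-tokens exact-in?) (count query-tokens)))

(defn fullness [query-tokens match-tokens]
  (if (empty? match-tokens)
    0
    (/ (occurrences query-tokens match-tokens matches-in?) (count match-tokens))))

;; With query ["rev" "region"] and result ["revenue" "by" "region"]:
;; (total-occurrences ...) ;=> 1    "rev" is a substring of "revenue"
;; (exact-match ...)       ;=> 1/2  only "region" is an exact token
;; (fullness ...)          ;=> 2/3  2 found query tokens over 3 result tokens
```

Since `fullness` counts query tokens but divides by the number of result tokens, it is not capped at 1: `(fullness ["rev" "enue"] ["revenue"])` returns 2.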
```clojure
(defn- prefix-counter [query-string item-string]
  (reduce (fn [cnt [a b]]
            (if (= a b) (inc cnt) (reduced cnt)))
          0
          (map vector query-string item-string)))
```
`tokens` is a seq of strings, like `["abc" "def"]`.

```clojure
(defn- count-token-chars [tokens]
  (reduce (fn [cnt x] (+ cnt (count x))) 0 tokens))
```
How much does the search query match the beginning of the result?

```clojure
(defn prefix-scorer [query-tokens match-tokens]
  (let [query (u/lower-case-en (str/join " " query-tokens))
        match (u/lower-case-en (str/join " " match-tokens))]
    (/ (prefix-counter query match)
       (count-token-chars query-tokens))))
```
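A runnable copy of the prefix scorer, with `clojure.string/lower-case` standing in for Metabase's `u/lower-case-en`; the namespace name is made up. It compares the space-joined query and result character by character and stops at the first difference:

```clojure
(ns example.prefix
  (:require [clojure.string :as str]))

(defn- prefix-counter
  "Number of leading characters the two strings share."
  [query-string item-string]
  (reduce (fn [cnt [a b]] (if (= a b) (inc cnt) (reduced cnt)))
          0
          (map vector query-string item-string)))

(defn- count-token-chars [tokens]
  (reduce + 0 (map count tokens)))

(defn prefix-scorer [query-tokens match-tokens]
  ;; str/lower-case stands in for u/lower-case-en in this sketch.
  (let [query (str/lower-case (str/join " " query-tokens))
        match (str/lower-case (str/join " " match-tokens))]
    (/ (prefix-counter query match)
       (count-token-chars query-tokens))))

;; (prefix-scorer ["rev"] ["revenue" "by" "month"])          ;=> 1
;; (prefix-scorer ["region"] ["revenue"])                    ;=> 1/3
;; (prefix-scorer ["revenue" "by"] ["revenue" "by" "month"]) ;=> 10/9
```

The last call shows a quirk of the code as written: the numerator counts matching characters of the joined strings, including the separating spaces, while the denominator counts only the characters of the query tokens, so a multi-token query that matches the start of the result scores slightly above 1.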
```clojure
(def ^:private match-based-scorers
  [{:scorer exact-match-scorer       :name "exact-match"       :weight 4}
   {:scorer consecutivity-scorer     :name "consecutivity"     :weight 2}
   {:scorer total-occurrences-scorer :name "total-occurrences" :weight 2}
   {:scorer fullness-scorer          :name "fullness"          :weight 1}
   {:scorer prefix-scorer            :name "prefix"            :weight 1}])
```
```clojure
(def ^:private model->sort-position
  (zipmap (reverse search.config/models-search-order) (range)))

(defn- model-score [{:keys [model]}]
  (/ (or (model->sort-position model) 0)
     (count model->sort-position)))
```
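`model-score` ranks a result by how early its model appears in `search.config/models-search-order`. A sketch with a hypothetical five-entry order (the real list lives in `metabase.search.config` and may differ):

```clojure
(ns example.model-score)

;; Hypothetical order, most preferred model first; the real list is
;; search.config/models-search-order.
(def models-search-order ["dashboard" "card" "dataset" "collection" "table"])

;; Reversing gives the last model position 0 and the first model position n-1.
(def model->sort-position
  (zipmap (reverse models-search-order) (range)))

(defn model-score [{:keys [model]}]
  (/ (or (model->sort-position model) 0)
     (count model->sort-position)))

;; (model-score {:model "dashboard"}) ;=> 4/5
;; (model-score {:model "card"})      ;=> 3/5
;; (model-score {:model "table"})     ;=> 0
;; (model-score {:model "segment"})   ;=> 0, not in the list
```

Two consequences of dividing by the list length: the top model scores (n-1)/n rather than 1, and the last model in the list ties with unlisted models at 0.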
```clojure
(defn- text-scores-with-match [result {:keys [search-string search-native-query]}]
  (if (seq search-string)
    (text-scores-with search-native-query
                      match-based-scorers
                      (search.util/tokenize (search.util/normalize search-string))
                      result)
    [{:score 0 :weight 0}]))
```
```clojure
(defn- pinned-score [{:keys [model collection_position]}]
  ;; We experimented with favoring lower collection positions, but it wasn't good
  ;; So instead, just give a bonus for items that are pinned at all
  (if (and (#{"card" "dashboard"} model)
           ((fnil pos? 0) collection_position))
    1
    0))

(defn- bookmarked-score [{:keys [model bookmark]}]
  (if (and (#{"card" "collection" "dashboard"} model)
           bookmark)
    1
    0))

(defn- dashboard-count-score [{:keys [model dashboardcard_count]}]
  (if (= model "card")
    (min (/ dashboardcard_count
            search.config/dashboard-count-ceiling)
         1)
    0))

(defn- recency-score [{:keys [updated_at]}]
  (let [stale-time search.config/stale-time-in-days
        days-ago   (if updated_at
                     (t/time-between updated_at (t/offset-date-time) :days)
                     stale-time)]
    (/ (max (- stale-time days-ago) 0)
       stale-time)))
```
Default weights and scores for a given result.

```clojure
(defn weights-and-scores [result]
  [{:weight 2   :score (pinned-score result)          :name "pinned"}
   {:weight 2   :score (bookmarked-score result)      :name "bookmarked"}
   {:weight 3/2 :score (recency-score result)         :name "recency"}
   {:weight 1   :score (dashboard-count-score result) :name "dashboard"}
   {:weight 1/2 :score (model-score result)           :name "model"}])
```
Score a result, returning a collection of maps with score and weight. Should not include the text scoring, which is done separately. Should return a sequence of maps with `{:weight number, :score number, :name string}`.

```clojure
(defenterprise score-result
  metabase-enterprise.search.scoring
  [result]
  (weights-and-scores result))
```
```clojure
(defn- sum-weights [weights]
  (reduce (fn [acc {:keys [weight] :or {weight 0}}]
            (+ acc weight))
          0
          weights))

(defn- compute-normalized-score [scores]
  (let [weight-sum (sum-weights scores)]
    (if (zero? weight-sum)
      0
      (let [score-sum (reduce (fn [acc {:keys [weight score] :or {weight 0 score 0}}]
                                (+ acc (* score weight)))
                              0
                              scores)]
        (/ score-sum weight-sum)))))
```
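Reweight `scores` so that their weights sum to `total` while keeping their relative proportions. If the weights sum to zero, every weight becomes 0.

```clojure
(defn force-weight [scores total]
  (let [total-weight   (sum-weights scores)
        weight-calc-fn (if (contains? #{nil 0} total-weight)
                         (fn weight-calc-fn [_] 0)
                         (fn weight-calc-fn [weight] (* total (/ weight total-weight))))]
    (mapv #(update % :weight weight-calc-fn) scores)))
```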
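A self-contained sketch of the rescaling: say three text entries come back with weights 4, 2, and 2 (sum 8). Forcing them to a total of 10 multiplies each by 10/8 while keeping the 2:1:1 proportions. The names below are illustrative, and unlike the original this version also treats a missing `:weight` as 0:

```clojure
(ns example.force-weight)

(defn- sum-weights [scores]
  (reduce + 0 (map #(:weight % 0) scores)))

(defn force-weight
  "Rescale every :weight so the weights add up to `total`, keeping their
  proportions; if the weights add up to 0, every weight becomes 0."
  [scores total]
  (let [total-weight (sum-weights scores)]
    (mapv (fn [score]
            (update score :weight
                    (fn [w]
                      (if (zero? total-weight)
                        0
                        (* total (/ (or w 0) total-weight))))))
          scores)))

;; (map :weight (force-weight [{:weight 4} {:weight 2} {:weight 2}] 10)) ;=> (5 5/2 5/2)
;; (map :weight (force-weight [{:score 0 :weight 0}] 10))             ;=> (0)
```

The second call is the "no text match" placeholder: its weight stays 0, so it adds nothing to the denominator of the final weighted average.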
This is used to control the total weight of text-based scorers in [[score-and-result]].

```clojure
(def ^:const text-scores-weight 10)
```
Returns a map with the normalized, combined score as `:score`, and the result, annotated with `:all-scores` and `:relevant-scores` (the non-zero scores), as `:result`. The normalized score is computed over all scores, not only the relevant ones. Returns nil when the search string is non-blank and the result has no text match.

```clojure
(defn score-and-result [result {:keys [search-string search-native-query]}]
  (let [text-matches    (-> (text-scores-with-match result {:search-string       search-string
                                                            :search-native-query search-native-query})
                            (force-weight text-scores-weight))
        has-text-match? (some (comp pos? :score) text-matches)
        all-scores      (into (vec (score-result result)) text-matches)
        relevant-scores (remove (comp zero? :score) all-scores)
        total-score     (compute-normalized-score all-scores)]
    ;; Searches with a blank search string mean "show me everything, ranked";
    ;; see https://github.com/metabase/metabase/pull/15604 for archived search.
    ;; If the search string is non-blank, results with no text match have a score of zero.
    (when (or has-text-match? (str/blank? search-string))
      {:score  total-score
       :result (assoc result :all-scores all-scores :relevant-scores relevant-scores)})))
```
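A hypothetical, simplified version of `score-and-result` that shows the gate and the weighted average together. It assumes the base scores are passed in directly (standing in for `score-result`) and that the text scores are already force-weighted to a total of 10:

```clojure
(ns example.score-and-result
  (:require [clojure.string :as str]))

(defn- weighted-average
  "Same rule as compute-normalized-score: sum of score * weight over the sum of
  weights, or 0 when all weights are 0."
  [scores]
  (let [weight-sum (reduce + 0 (map #(:weight % 0) scores))]
    (if (zero? weight-sum)
      0
      (/ (reduce + 0 (map #(* (:score % 0) (:weight % 0)) scores))
         weight-sum))))

(defn score-and-result
  "Simplified sketch: `base-scores` stand in for score-result, and `text-scores`
  are assumed to be already force-weighted to a total of 10."
  [search-string base-scores text-scores]
  (let [has-text-match? (some (comp pos? :score) text-scores)
        all-scores      (into (vec base-scores) text-scores)]
    ;; A non-blank search with no text match drops the result entirely (nil).
    (when (or has-text-match? (str/blank? search-string))
      {:score (weighted-average all-scores)})))
```

For example, with the weights from `weights-and-scores`, a pinned, unbookmarked result updated half the stale window ago, on no dashboards, with a model score of 1/2 and a text score of 4/5 at weight 10 gets (2 + 3/4 + 1/4 + 8) / 17 = 11/17. The same result scores 3/7 for a blank search and is dropped (nil) for a non-blank search that does not match its text.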
Compare maps of scores and results. Must return -1, 0, or 1. The score is assumed to be a vector, and will be compared in order.

```clojure
(defn compare-score [{score-1 :score} {score-2 :score}]
  (compare score-1 score-2))
```
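Given a reducible collection (i.e., from …), a maximum number of results, and a transducer `xf` that turns each row into a map with `:score` and `:result`, returns the `:result`s of the top `max-results` entries as ranked by `compare-score`.

```clojure
(defn top-results [reducible-results max-results xf]
  (->> reducible-results
       (transduce xf (u/sorted-take max-results compare-score))
       rseq
       (map :result)))
```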
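`u/sorted-take` is not shown here; its name suggests it keeps only the best `max-results` entries while reducing, so memory stays bounded. The sketch below swaps that for a plain sort of every scored row, which is simpler but holds all rows in memory. The row keys `:name` and `:relevance` are made up for the example:

```clojure
(ns example.top-results)

(defn compare-score [{score-1 :score} {score-2 :score}]
  (compare score-1 score-2))

(defn top-results
  "Simplified stand-in for top-results: scores every row with `xf`, sorts the
  score maps from highest to lowest score, and keeps the first `max-results`."
  [rows max-results xf]
  (->> (into [] xf rows)
       (sort #(compare-score %2 %1)) ; arguments swapped => descending order
       (take max-results)
       (map :result)))

;; (top-results [{:name "Orders" :relevance 3/10}
;;               {:name "Revenue" :relevance 9/10}
;;               {:name "Returns" :relevance 1/2}]
;;              2
;;              (map (fn [row] {:score (:relevance row) :result (:name row)})))
;; ;=> ("Revenue" "Returns")
```

In the real pipeline, `xf` would be the step that calls `score-and-result` on each row and drops the nil (non-matching) entries before ranking.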