(ns metabase.lib.schema.metadata
  (:require
   [metabase.lib.schema.common :as lib.schema.common]
   [metabase.lib.schema.id :as lib.schema.id]
   [metabase.util.malli.registry :as mr]))
;;; Column vs Field?
;;;
;;; Lately I've been using
;;;
;;;    Column = any column returned by a query or stage of a query
;;;    Field  = a Column that is associated with a capital-F Field in the application database, i.e. has an `:id`
;;;
;;; All Fields are Columns, but not all Columns are Fields.
;;;
;;; Also worth a mention: we also have
(mr/def ::column-source
  [:enum
   ;; these are for things from some sort of source other than the current stage;
   ;; they must be referenced with string names rather than Field IDs
   :source/card
   :source/native
   :source/previous-stage
   ;; these are for things that were introduced by the current stage of the query; `:field` references should be
   ;; referenced with Field IDs if available.
   ;;
   ;; default columns returned by the `:source-table` for the current stage.
   :source/table-defaults
   ;; specifically introduced by the corresponding top-level clauses.
   :source/fields
   :source/aggregations
   :source/breakouts
   ;; introduced by a join, not necessarily ultimately returned.
   :source/joins
   ;; Introduced by `:expressions`; not necessarily ultimately returned.
   :source/expressions
   ;; Not even introduced, but 'visible' because this column is implicitly joinable.
   :source/implicitly-joinable])
;;; The way FieldValues/remapping works is hella confusing, because it involves the FieldValues table and Dimension
;;; table, and the
(def column-has-field-values-options
  "Possible options for column metadata `:has-field-values`."
;; AUTOMATICALLY-SET VALUES, SET DURING SYNC
;;
;; `nil` -- means infer which widget to use based on logic in [[metabase.lib.field/infer-has-field-values]]; this
;; will either return `:search` or `:none`.
;;
;; This is the default state for Fields not marked `auto-list`. Admins cannot explicitly mark a Field as
;; `has_field_values` `nil`. This value is also subject to automatically change in the future if the values of a
;; Field change in such a way that it can now be marked `auto-list`. Fields marked `nil` do *not* have FieldValues
;; objects.
;;
#{;; The other automatically-set option. Automatically marked as a 'List' Field based on cardinality and other factors
;; during sync. Store a FieldValues object; use the List Widget. If this Field goes over the distinct value
;; threshold in a future sync, the Field will get switched back to `has_field_values = nil`.
;;
;; Note that when this comes back from the REST API or [[metabase.lib.field/field-values-search-info]] we always
;; return this as `:list` instead of `:auto-list`; this is done by [[metabase.lib.field/infer-has-field-values]].
;; I guess this is because the FE isn't supposed to need to care about whether this is `:auto-list` vs `:list`;
;; those distinctions are only important for sync I guess.
:auto-list
;;
;; EXPLICITLY-SET VALUES, SET BY AN ADMIN
;;
;; Admin explicitly marked this as a 'Search' Field, which means we should *not* keep FieldValues, and should use
;; Search Widget.
:search
;; Admin explicitly marked this as a 'List' Field, which means we should keep FieldValues, and use the List
;; Widget. Unlike `auto-list`, if this Field grows past the normal cardinality constraints in the future, it will
;; remain `List` until explicitly marked otherwise.
:list
;; Admin explicitly marked that this Field shall always have a plain-text widget, neither allowing search, nor
;; showing a list of possible values. FieldValues not kept.
:none})
(mr/def ::column.has-field-values
  (into [:enum] (sort column-has-field-values-options)))
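;; A minimal sketch of what the `(into [:enum] (sort ...))` form above evaluates to. Sets have no guaranteed
;; iteration order, so sorting the options keeps the resulting schema vector stable. The set literal below simply
;; repeats the four options for illustration; nothing here is evaluated when the namespace loads.
(comment
  (into [:enum] (sort #{:auto-list :search :list :none}))
  ;; => [:enum :auto-list :list :none :search]
  )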
(mr/def ::column.remapping.external
  "External remapping (Dimension) for a column. From the [[metabase.models.dimension]] with `type = external` associated
  with a `Field` in the application database.
  See [[metabase.query-processor.middleware.add-dimension-projections]] for what this means."
  [:map
   [:lib/type [:= :metadata.column.remapping/external]]
   [:id ::lib.schema.id/dimension]
   ;; from `dimension.name`
   [:name ::lib.schema.common/non-blank-string]
   ;; `dimension.human_readable_field_id` in the application database. ID of the Field to get human-readable values
   ;; from. e.g. if the column in question is `venues.category-id`, then this would be the ID of `categories.name`
   [:field-id ::lib.schema.id/field]])
(mr/def ::column.remapping.internal
  "Internal remapping (FieldValues) for a column. From [[metabase.models.dimension]] with `type = internal`."
  [:map
   [:lib/type [:= :metadata.column.remapping/internal]]
   [:id ::lib.schema.id/dimension]
   ;; from `dimension.name`
   [:name ::lib.schema.common/non-blank-string]
   ;; From `metabase_fieldvalues.values`. Original values
   [:values [:sequential :any]]
   ;; From `metabase_fieldvalues.human_readable_values`. Human readable remaps for the values at the same indexes in
   ;; `:values`
   [:human-readable-values [:sequential :any]]])
(mr/def ::column
  "Malli schema for a valid map of column metadata, which can mean one of two things:

  1. Metadata about a particular Field in the application database. This will always have an `:id`

  2. Results metadata from a column in `data.cols` and/or `data.results_metadata.columns` in a Query Processor
     response, or saved in something like `Card.result_metadata`. These *may* have an `:id`, or may not -- columns
     coming back from native queries or things like `SELECT count(*)` aren't associated with any particular `Field`
     and thus will not have an `:id`.

  Now maybe these should be two different schemas, but `:id` being there or not is the only real difference; besides
  that they are largely compatible. So they're the same for now. We can revisit this in the future if we actually want
  to differentiate between the two versions."
[:map
{:error/message "Valid column metadata"}
[:lib/type [:= :metadata/column]]
;; column names are allowed to be empty strings in SQL Server :/
[:name :string]
;; TODO -- ignore `base_type` and make `effective_type` required; see #29707
[:base-type ::lib.schema.common/base-type]
;; This is nillable because internal remap columns have `:id nil`.
[:id {:optional true} [:maybe ::lib.schema.id/field]]
[:display-name {:optional true} [:maybe :string]]
[:effective-type {:optional true} [:maybe ::lib.schema.common/base-type]]
;; type of this column in the data warehouse, e.g. `TEXT` or `INTEGER`
[:database-type {:optional true} [:maybe :string]]
[:active {:optional true} :boolean]
;; if this is a field from another table (implicit join), this is the field in the current table that should be
;; used to perform the implicit join. e.g. if current table is `VENUES` and this field is `CATEGORIES.ID`, then the
;; `fk_field_id` would be `VENUES.CATEGORY_ID`. In a `:field` reference this is saved in the options map as
;; `:source-field`.
[:fk-field-id {:optional true} [:maybe ::lib.schema.id/field]]
;; `metabase_field.fk_target_field_id` in the application database; recorded during the sync process. This Field is
;; a foreign key, and points to this Field ID. This is mostly used to determine how to add implicit joins by
;; the [[metabase.query-processor.middleware.add-implicit-joins]] middleware.
[:fk-target-field-id {:optional true} [:maybe ::lib.schema.id/field]]
;; Join alias of the table we're joining against, if any. Not really 100% clear why we would need this on top
;; of [[metabase.lib.join/current-join-alias]], which stores the same info under a namespaced key. I think we can
;; remove it.
[:source-alias {:optional true} [:maybe ::lib.schema.common/non-blank-string]]
;; name of the expression where this column metadata came from. Should only be included for expressions introduced
;; at THIS STAGE of the query. If it's included elsewhere, that's an error. Thus this is the definitive way to know
;; if a column is "custom" in this stage (needs an `:expression` reference) or not.
[:lib/expression-name {:optional true} [:maybe ::lib.schema.common/non-blank-string]]
;; what top-level clause in the query this metadata originated from, if it is calculated (i.e., if this metadata
;; was generated by [[metabase.lib.metadata.calculation/metadata]])
[:lib/source {:optional true} [:ref ::column-source]]
;; ID of the Card this came from, if this came from Card results metadata. Mostly used for creating column groups.
[:lib/card-id {:optional true} [:maybe ::lib.schema.id/card]]
;;
;; this stuff is adapted from [[metabase.query-processor.util.add-alias-info]]. It is included in
;; the [[metabase.lib.metadata.calculation/metadata]]
;;
;; the alias that should be used to refer to this clause on the LHS of a `SELECT <lhs> AS <rhs>` or equivalent, i.e.
;; the name of this clause as exported by the previous stage, source table, or join.
[:lib/source-column-alias {:optional true} [:maybe ::lib.schema.common/non-blank-string]]
;; the name we should export this column as, i.e. the RHS of a `SELECT <lhs> AS <rhs>` or equivalent. This is
;; guaranteed to be unique in each stage of the query.
[:lib/desired-column-alias {:optional true} [:maybe [:string {:min 1, :max 60}]]]
;; when column metadata is returned by certain things
;; like [[metabase.lib.aggregation/selected-aggregation-operators]] or [[metabase.lib.field/fieldable-columns]], it
;; might include this key, which tells you whether or not that column is currently selected, e.g.
;; for [[metabase.lib.field/fieldable-columns]] it means it's already present in `:fields`
[:selected? {:optional true} :boolean]
;;
;; REMAPPING & FIELD VALUES
;;
;; See notes above for more info. `:has-field-values` comes from the application database and is used to decide
;; whether to sync FieldValues when running sync, and what certain FE QB widgets should
;; do. (See [[metabase.lib.field/field-values-search-info]]). Note that not all metadata providers return this
;; key. The JVM provider currently does not, since the QP doesn't need it for anything.
[:has-field-values {:optional true} [:maybe [:ref ::column.has-field-values]]]
;;
;; these next two keys are derived by looking at `FieldValues` and `Dimension` instances associated with a `Field`;
;; they are used by the Query Processor to add column remappings to query results. To see how this maps to stuff in
;; the application database, look at the implementation for fetching a `:metadata/column`
;; in [[metabase.lib.metadata.jvm]]. I don't think this is really needed on the FE, at any rate the JS metadata
;; provider doesn't add these keys.
[:lib/external-remap {:optional true} [:maybe [:ref ::column.remapping.external]]]
[:lib/internal-remap {:optional true} [:maybe [:ref ::column.remapping.internal]]]])
(mr/def ::persisted-info.definition
"Definition spec for a cached table."
[:map
[:table-name ::lib.schema.common/non-blank-string]
[:field-definitions [:maybe [:sequential
[:map
[:field-name ::lib.schema.common/non-blank-string]
;; TODO check (isa? :type/Integer :type/*)
[:base-type ::lib.schema.common/base-type]]]]]])
(mr/def ::persisted-info
"Persisted Info = Cached Table (?). See [[metabase.model-persistence.models.persisted-info]]"
[:map
[:active :boolean]
[:state ::lib.schema.common/non-blank-string]
[:table-name ::lib.schema.common/non-blank-string]
[:definition {:optional true} [:maybe [:ref ::persisted-info.definition]]]
[:query-hash {:optional true} [:maybe ::lib.schema.common/non-blank-string]]])

(mr/def ::card.type
  [:enum :question :model :metric])

(mr/def ::type
  [:enum :metadata/database :metadata/table :metadata/column :metadata/card :metadata/metric :metadata/segment])
(mr/def ::card
"Schema for metadata about a specific Saved Question (which may or may not be a Model). More or less the same as
a [[metabase.models.card]], but with kebab-case keys. Note that the `:dataset-query` is not necessarily converted to
pMBQL yet. Probably safe to assume it is normalized however. Likewise, `:result-metadata` is probably not quite
massaged into a sequence of [[::column]] metadata just yet. See [[metabase.lib.card/card-metadata-columns]] that
converts these as needed."
[:map
{:error/message "Valid Card metadata"}
[:lib/type [:= :metadata/card]]
[:id ::lib.schema.id/card]
[:name ::lib.schema.common/non-blank-string]
[:database-id ::lib.schema.id/database]
;; saved query. This is possibly still a legacy query, but should already be normalized.
;; Call [[metabase.lib.convert/->pMBQL]] on it as needed
[:dataset-query {:optional true} :map]
;; vector of column metadata maps; these are ALMOST the correct shape to be [[ColumnMetadata]], but they're
;; probably missing `:lib/type` and probably using `:snake_case` keys.
[:result-metadata {:optional true} [:maybe [:sequential :map]]]
;; what sort of saved query this is, e.g. a normal Saved Question or a Model or a V2 Metric.
[:type {:optional true} [:maybe [:ref ::card.type]]]
;; Table ID is nullable in the application database, because native queries are not necessarily associated with a
;; particular Table (unless they are against MongoDB)... for MBQL queries it should be populated however.
[:table-id {:optional true} [:maybe ::lib.schema.id/table]]
;;
;; PERSISTED INFO: This comes from the [[metabase.model-persistence.models.persisted-info]] model.
;;
[:lib/persisted-info {:optional true} [:maybe [:ref ::persisted-info]]]])
(mr/def ::segment
"More or less the same as a [[metabase.segments.models.segment]], but with kebab-case keys."
[:map
{:error/message "Valid Segment metadata"}
[:lib/type [:= :metadata/segment]]
[:id ::lib.schema.id/segment]
[:name ::lib.schema.common/non-blank-string]
[:table-id ::lib.schema.id/table]
;; the MBQL snippet defining this Segment; this may still be in legacy
;; format. [[metabase.lib.segment/segment-definition]] handles conversion to pMBQL if needed.
[:definition [:maybe :map]]
[:description {:optional true} [:maybe ::lib.schema.common/non-blank-string]]])
(mr/def ::metric
[:map
{:error/message "Valid metric metadata"}
[:lib/type [:= :metadata/metric]]
[:id ::lib.schema.id/metric]
[:name ::lib.schema.common/non-blank-string]
[:database-id ::lib.schema.id/database]
;; The definition.
[:dataset-query {:optional true} :map]
;; vector of column metadata maps; these are ALMOST the correct shape to be [[ColumnMetadata]], but they're
;; probably missing `:lib/type` and probably using `:snake_case` keys.
[:result-metadata {:optional true} [:maybe [:sequential :map]]]
;; what sort of saved query this is; for metric metadata this is always `:metric`.
[:type [:= :metric]]
;; Table ID is nullable in the application database, because native queries are not necessarily associated with a
;; particular Table (unless they are against MongoDB)... for MBQL queries it should be populated however.
[:table-id {:optional true} [:maybe ::lib.schema.id/table]]
;;
;; PERSISTED INFO: This comes from the [[metabase.model-persistence.models.persisted-info]] model.
;;
[:lib/persisted-info {:optional true} [:maybe [:ref ::persisted-info]]]
[:metabase.lib.join/join-alias {:optional true} ::lib.schema.common/non-blank-string]])
(mr/def ::table
"Schema for metadata about a specific [[metabase.models.table]]. More or less the same as a [[metabase.models.table]],
but with kebab-case keys."
[:map
{:error/message "Valid Table metadata"}
[:lib/type [:= :metadata/table]]
[:id ::lib.schema.id/table]
[:name ::lib.schema.common/non-blank-string]
[:display-name {:optional true} [:maybe ::lib.schema.common/non-blank-string]]
[:schema {:optional true} [:maybe ::lib.schema.common/non-blank-string]]])
(mr/def ::database
"Malli schema for the DatabaseMetadata as returned by `GET /api/database/:id/metadata` -- what should be available to
the frontend Query Builder."
[:map
{:error/message "Valid Database metadata"}
[:lib/type [:= :metadata/database]]
[:id ::lib.schema.id/database]
;; TODO -- this should validate against the driver features list in [[metabase.driver/features]] if we're in
;; Clj mode
[:dbms-version {:optional true} [:maybe :map]]
[:details {:optional true} :map]
[:engine {:optional true} :keyword]
[:features {:optional true} [:set :keyword]]
[:is-audit {:optional true} :boolean]
[:is-attached-dwh {:optional true} :boolean]
[:settings {:optional true} [:maybe :map]]])

(mr/def ::metadata-provider
  "Schema for something that satisfies the [[metabase.lib.metadata.protocols/MetadataProvider]] protocol."
  [:ref :metabase.lib.metadata.protocols/metadata-provider])

(mr/def ::metadata-providerable
  "Something that can be used to get a MetadataProvider. Either a MetadataProvider, or a map with a MetadataProvider in
  the key `:lib/metadata` (i.e., a query)."
  [:ref :metabase.lib.metadata.protocols/metadata-providerable])
(mr/def ::stage
  "Metadata about the columns returned by a particular stage of a pMBQL query. For example a single-stage native query
  like

    {:database 1
     :lib/type :mbql/query
     :stages   [{:lib/type :mbql.stage/native
                 :native   \"SELECT id, name FROM VENUES;\"}]}

  might have stage metadata like

    {:columns [{:name \"id\", :base-type :type/Integer}
               {:name \"name\", :base-type :type/Text}]}

  associated with the query's lone stage.

  At some point in the near future we will hopefully attach this metadata directly to each stage in a query, so a
  multi-stage query will have `:lib/stage-metadata` for each stage. The main goal is to facilitate things like
  returning lists of visible or filterable columns for a given stage of a query. This is TBD, see #28717 for a WIP
  implementation of this idea.

  This is the same format as the results metadata returned with QP results in `data.results_metadata`. The `:columns`
  portion of this (`data.results_metadata.columns`) is also saved as `Card.result_metadata` for Saved Questions.

  Note that queries currently actually come back with both `data.results_metadata` AND `data.cols`; it looks like the
  Frontend actually *merges* these together -- see `applyMetadataDiff` in
  `frontend/src/metabase/query_builder/selectors.js` -- but this is ridiculous. Let's try to merge anything missing in
  `results_metadata` into `cols` going forward so things don't need to be manually merged in the future."
[:map
[:lib/type [:= :metadata/results]]
[:columns [:sequential ::column]]])