Module zillion.report¶
BaseCombinedResult¶
zillion.report.BaseCombinedResult
(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)A combination of datasource query results
Parameters:
- warehouse - (Warehouse) A zillion warehouse
- ds_query_results - (list) A list of DataSourceQueryResult objects
- primary_ds_dimensions - (list) A list of dimensions that will be used to create the hash primary key of the combined result table
- adhoc_datasources - (list, optional) A list of FieldManagers specific to this combined result
add_warning
(self, msg, log=True)clean_up
(self)Clean up any resources that can/should be cleaned up
create_table
(self)Create the combined result table
get_conn
(self)Get a database connection to the combined result database
get_cursor
(self, conn)Get a cursor from a database connection
get_final_result
(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)Get the final result from the combined result table
load_table
(self)Load the combined result table
DataSourceQuery¶
Bases: zillion.report.ExecutionStateMixin, tlbx.logging_utils.PrintMixin
zillion.report.DataSourceQuery
(warehouse, metrics, dimensions, criteria, table_set)Build a query to run against a particular datasource
Parameters:
- warehouse - (Warehouse) A zillion warehouse
- metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
- dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
- criteria - (list) A list of criteria to be applied when querying. See the Report docs for more details.
- table_set - (TableSet) Build the query against this set of tables that supports the requested metrics and grain
add_metric
(self, metric)Add a metric to this query
Parameters:
- metric - (str) A metric name
covers_field
(self, field)Check whether a field is covered in this query
Parameters:
- field - (str) A field name
Returns:
(bool) - True if this field is covered in this query
covers_metric
(self, metric)Check whether a metric is covered in this query
Parameters:
- metric - (str) A metric name
Returns:
(bool) - True if this metric is covered in this query
execute
(self, timeout=None, label=None)Execute the datasource query
Parameters:
- timeout - (float, optional) A query timeout in seconds
- label - (str, optional) A label to apply to the SQL query
Returns:
(DataSourceQueryResult) - The result of the SQL query
get_conn
(self)Get a connection to this query's datasource
get_datasource
(self)Get a reference to the datasource for this query
get_datasource_name
(self)Get the name of the datasource used in this query
get_dialect_name
(self)Get the name of the datasource dialect
kill
(self, main_thread=None)Kill this datasource query
Parameters:
- main_thread - (Thread, optional) A reference to the thread that started the query. This is used as a backup for dialects that don't have a supported way to kill a query. An exception will be asynchronously raised in this thread. It is not guaranteed to actually interrupt the query.
DataSourceQueryResult¶
Bases: tlbx.logging_utils.PrintMixin
zillion.report.DataSourceQueryResult
(query, data, duration)The results for a DataSourceQuery
Parameters:
- query - (DataSourceQuery) The DataSourceQuery that was executed
- data - (iterable) The result rows
- duration - (float) The duration of the query execution in seconds
DataSourceQuerySummary¶
Bases: tlbx.logging_utils.PrintMixin
zillion.report.DataSourceQuerySummary
(query, data, duration)A summary of the execution results for a DataSourceQuery
Parameters:
- query - (DataSourceQuery) The DataSourceQuery that was executed
- data - (iterable) The result rows
- duration - (float) The duration of the query execution in seconds
format
(self)Return a formatted summary of the DataSourceQuery results
ExecutionStateMixin¶
zillion.report.ExecutionStateMixin
()A mixin to manage the state of a report or query
Report¶
Bases: zillion.report.ExecutionStateMixin
zillion.report.Report
(warehouse, metrics=None, dimensions=None, criteria=None, row_filters=None, rollup=None, pivot=None, order_by=None, limit=None, limit_first=False, adhoc_datasources=None, allow_partial=False)Build a report against a warehouse. On init DataSource queries are built, but nothing is executed.
Parameters:
- warehouse - (Warehouse) A zillion warehouse object to run the report against
- metrics - (list, optional) A list of metric names, or dicts in the case of AdHocMetrics. These will be the measures of your report, or the statistics you are interested in computing at the given dimension grain.
- dimensions - (list, optional) A list of dimension names to control the grain of the report. You can think of dimensions similarly to the "group by" in a SQL query.
-
criteria - (list, optional) A list of criteria to be applied when querying. Each criteria in the list is represented by a 3-item list or tuple. See
core.CRITERIA_OPERATIONS
for all supported operations. Note that some operations, such as "like", have varying behavior by datasource dialect. Some examples:- ["field_a", ">", 1]
- ["field_b", "=", "2020-04-01"]
- ["field_c", "like", "%example%"]
- ["field_d", "in", ["a", "b", "c"]]
-
row_filters - (list, optional) A list of criteria to apply at the final step (combined query layer) to filter which rows get returned. The format here is the same as for the criteria arg, though the operations are limited to the values of
core.ROW_FILTER_OPERATIONS
. -
rollup - (str or int, optional) Controls how metrics are rolled up / aggregated by dimension depth. If not passed no rollup will be computed. If the special value "totals" is passed, only a final tally rollup row will be added. If an int, then it controls the maximum depth to roll up the data, starting from the most granular (last) dimension of the report. Note that the rollup=3 case is like adding a totals row to the "=2" case, as a totals row is a rollup of all dimension levels. Setting rollup=len(dims) is equivalent to rollup="all". For example, if you ran a report with dimensions ["a", "b", "c"]:
- rollup="totals" - adds a single, final rollup row
- rollup="all" - rolls up all dimension levels
- rollup=1 - rolls up the first dimension only
- rollup=2 - rolls up the first two dimensions
- rollup=3 - rolls up all three dimensions
- Any other non-None value would raise an error
-
pivot - (list, optional) A list of dimensions to pivot to columns
- order_by - (list, optional) A list of (field, asc/desc) tuples that control the ordering of the returned result
- limit - (int, optional) A limit on the number of rows returned
- limit_first - (bool, optional) Whether to apply limits before rollups/ordering
- adhoc_datasources - (list, optional) A list of FieldManagers specific to this report
- allow_partial - (boolean, optional) Allow reports where only some metrics can meet the requested grain.
Notes:
The order_by and limit functionality is only applied on the final/combined result, NOT in your DataSource queries. In most cases when you are dealing with DataSource tables that are of a decent size you will want to make sure to include criteria that limit the scope of your query and/or take advantage of underlying table indexing. If you were to use order_by or limit without any criteria or dimensions, you would effectively select all rows from the underlying datasource table into memory (or at least try to).
delete
(warehouse, spec_id)Delete a saved report spec
Parameters:
- spec_id - (int) The ID of a ReportSpec to delete
execute
(self)Execute the datasource queries, combine the results, and do the final result selection. Save the ReportResult on the result attribute
from_params
(warehouse, params, adhoc_datasources=None)Build a report from a set of report params
Parameters:
- warehouse - (Warehouse) A zillion warehouse object
- params - (dict) A dict of Report params
- adhoc_datasources - (list, optional) A list of FieldManagers
get_dimension_grain
(self)Get the portion of the grain specific to request dimensions
get_grain
(self)Get the grain of this report, which accounts for dimension fields required in the requested dimensions, criteria, and formula-based fields.
get_json
(self)Get a JSON representation of the Report params
get_params
(self)Get a dict of params used to create the Report
kill
(self, soft=False, raise_if_failed=False)Kill a running report
Parameters:
- soft - (bool, optional) If true, set the report state to killed without attempting to kill any running datasource queries.
- raise_if_failed - (bool, optional) If true, raise FailedKillException if any exceptions occurred when trying to kill datasource queries. Otherwise a warning will be emitted.
load
(warehouse, spec_id, adhoc_datasources=None)Load a report from a spec ID
Parameters:
- warehouse - (Warehouse) A zillion warehouse object
- spec_id - (int) A ReportSpec ID
- adhoc_datasources - (list, optional) A list of FieldManagers
load_warehouse_id_for_report
(spec_id)Get the Warehouse ID for a particular report spec
Parameters:
- spec_id - (int) A ReportSpec ID
Returns:
(dict) - A Warehouse ID
save
(self, meta=None)Save the report spec and return the saved spec ID
Parameters:
- meta - (object, optional) A metadata object to be serialized as JSON and stored with the report
Returns:
(int) - The ID of the saved ReportSpec
ReportResult¶
Bases: tlbx.logging_utils.PrintMixin
zillion.report.ReportResult
(df, duration, query_summaries, metrics, dimensions, rollup, unsupported_grain_metrics=None, warnings=None)Encapsulates a report result as well as some additional helpers and summary statistics.
Parameters:
- df - (DataFrame) The DataFrame containing the final report result
- duration - (float) The report execution duration in seconds
- query_summaries - (list of DataSourceQuerySummary) Summaries of the underyling query results.
- metrics - (OrderedDict) A mapping of requested metrics to Metric objects
- dimensions - (OrderedDict) A mapping of requested dimensions to Dimension objects
- rollup - (str or int) See the Report docs for more details.
- unsupported_grain_metrics - (dict) A dictionary mapping metric names to reasons why they could not meet the requested grain of the Report.
SQLiteMemoryCombinedResult¶
Bases: zillion.report.BaseCombinedResult
zillion.report.SQLiteMemoryCombinedResult
(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)Combine query results in an in-memory SQLite database
add_warning
(self, msg, log=True)clean_up
(self)Clean up the SQLite combined result table
create_table
(self)Create a table in the SQLite database to store the combined result
get_conn
(self)Get a SQLite memory database connection
get_cursor
(self, conn)Get a SQLite cursor from the connection
get_final_result
(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)Get the final reseult from the combined result table
Parameters:
- metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
- dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
- row_filters - (list) A list of criteria to filter which rows get returned
- rollup - (str or int) Controls how metrics are rolled up / aggregated by dimension. See the Report docs for more details.
- pivot - (list) A list of dimensions to pivot to columns
- order_by - (list) A list of (field, asc/desc) tuples that control the ordering of the returned result
- limit - (int) A limit on the number of rows returned
- limit_first - (bool, optional) Whether to apply limits before rollups/ordering
Returns:
(DataFrame) - A DataFrame with the final report result
Notes:
The default ordering of operations is meant to roughly parallel that of MySQL's rollup, having, order by and limit behavior. The operations are applied in the following order: technicals, rollups, rounding, order_by, row_filters, limit, pivot. If you set limit_first=True
the the row_filter and limit operations are moved ahead of the rollups: technicals, row_filters, limit, rollups, rounding, order_by, pivot.
load_table
(self)Load the combined result table