Relational algebra

The BlazingSQL engine executes a relational algebra plan. Apache Calcite creates the initial plan from a SQL query. That plan is then converted into a physical plan: a modified version of the original relational algebra in which some steps are expanded into multiple steps. In the example below, the LogicalAggregate step becomes a ComputeAggregate followed by a MergeAggregate on a single GPU, with a DistributeAggregate between them on multiple GPUs.

SQL

select o_custkey, sum(o_totalprice) from orders where o_orderkey < 10 group by o_custkey

Relational Algebra

LogicalProject(o_custkey=[$0], EXPR$1=[CASE(=($2, 0), null:DOUBLE, $1)])
  LogicalAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
    LogicalProject(o_custkey=[$1], o_totalprice=[$2])
      BindableTableScan(table=[[main, orders]], filters=[[<($0, 10)]], projects=[[0, 1, 3]], aliases=[[$f0, o_custkey, o_totalprice]])

Physical Plan Single GPU

LogicalProject(o_custkey=[$0], EXPR$1=[CASE(=($2, 0), null:DOUBLE, $1)])
  MergeAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
    ComputeAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
      LogicalProject(o_custkey=[$1], o_totalprice=[$2])
        BindableTableScan(table=[[main, orders]], filters=[[<($0, 10)]], projects=[[0, 1, 3]], aliases=[[$f0, o_custkey, o_totalprice]])
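
To see why one aggregate step can be split in two, the following is a minimal Python sketch of a two-phase aggregation. It assumes, based only on the step names, that ComputeAggregate produces partial results per batch and MergeAggregate combines them; the function names and the sample rows are hypothetical and are not BlazingSQL code. The final projection mirrors the CASE expression in the plan, which returns NULL when COUNT is 0.

```python
def compute_aggregate(batch):
    """Partial aggregate for one batch: o_custkey -> [sum0, count]."""
    partial = {}
    for custkey, totalprice in batch:
        acc = partial.setdefault(custkey, [0.0, 0])
        if totalprice is not None:  # $SUM0 and COUNT both skip NULL values
            acc[0] += totalprice
            acc[1] += 1
    return partial


def merge_aggregate(partials):
    """Combine the per-batch partials into one result per key."""
    merged = {}
    for partial in partials:
        for custkey, (sum0, count) in partial.items():
            acc = merged.setdefault(custkey, [0.0, 0])
            acc[0] += sum0
            acc[1] += count
    return merged


def project(merged):
    """CASE(=($2, 0), null:DOUBLE, $1): the sum is NULL when nothing was counted."""
    return {key: (None if count == 0 else sum0)
            for key, (sum0, count) in merged.items()}


# Hypothetical (o_custkey, o_totalprice) rows, already filtered and split into batches.
batches = [
    [(1, 10.0), (2, 5.0)],
    [(1, 2.5), (3, None)],
]
result = project(merge_aggregate(compute_aggregate(b) for b in batches))
print(result)
```

Because sums and counts can be added across batches, the partial results can be computed independently before being merged.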

Physical Plan Multi GPU

LogicalProject(o_custkey=[$0], EXPR$1=[CASE(=($2, 0), null:DOUBLE, $1)])
  MergeAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
    DistributeAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
      ComputeAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])
        LogicalProject(o_custkey=[$1], o_totalprice=[$2])
          BindableTableScan(table=[[main, orders]], filters=[[<($0, 10)]], projects=[[0, 1, 3]], aliases=[[$f0, o_custkey, o_totalprice]])

The conversion of the relational algebra into the physical plan is performed by the function transform_json_tree, which is called by build_batch_graph.
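
The rewrite shown in the plans above can be sketched in Python as a small tree transformation. This is an illustration only, not the implementation of transform_json_tree: the nested-dict layout with "expr" and "children" keys and the helper names expand_aggregates and render are assumptions made for the example.

```python
def expand_aggregates(node, multi_gpu=False):
    """Return a copy of the plan with each LogicalAggregate expanded."""
    children = [expand_aggregates(c, multi_gpu) for c in node["children"]]
    expr = node["expr"]
    if not expr.startswith("LogicalAggregate"):
        return {"expr": expr, "children": children}
    args = expr[len("LogicalAggregate"):]  # keep the original "(group=...)" part
    steps = ["ComputeAggregate"]
    if multi_gpu:
        steps.append("DistributeAggregate")
    steps.append("MergeAggregate")
    for step in steps:  # wrap from the innermost step outward
        children = [{"expr": step + args, "children": children}]
    return children[0]


def render(node, depth=0):
    """Format the plan as indented lines, two spaces per level."""
    lines = ["  " * depth + node["expr"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


def leaf(expr, children=()):
    return {"expr": expr, "children": list(children)}


logical = leaf(
    "LogicalProject(o_custkey=[$0], EXPR$1=[CASE(=($2, 0), null:DOUBLE, $1)])",
    [leaf(
        "LogicalAggregate(group=[{0}], EXPR$1=[$SUM0($1)], agg#1=[COUNT($1)])",
        [leaf(
            "LogicalProject(o_custkey=[$1], o_totalprice=[$2])",
            [leaf("BindableTableScan(table=[[main, orders]], filters=[[<($0, 10)]], "
                  "projects=[[0, 1, 3]], aliases=[[$f0, o_custkey, o_totalprice]])")],
        )],
    )],
)

single_gpu = render(expand_aggregates(logical))
multi_gpu = render(expand_aggregates(logical, multi_gpu=True))
print("\n".join(multi_gpu))
```

With multi_gpu=True the printed plan has the same step order as the multi-GPU physical plan above.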

The physical plan is then converted into a graph: each node in the graph becomes an execution kernel, and each edge becomes a CacheMachine.

The graph is created by ral::batch::tree_processor, whose build_batch_graph function produces the actual graph object containing all the execution kernels and CacheMachines. The graph's execute() function starts the run() function of every execution kernel, each on its own thread.
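
The execution model described above can be approximated with a short Python sketch. It is a toy model under stated assumptions, not the engine's C++ code: the Kernel and Graph classes are hypothetical, a plain queue.Queue stands in for a CacheMachine, and the DONE sentinel is an invented way to signal the end of the batch stream.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


class Kernel:
    """A node of the graph: reads batches, transforms them, writes them out."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.input = None   # edge feeding this kernel
        self.output = None  # edge this kernel writes to

    def run(self):
        while True:
            batch = self.input.get()
            if batch is DONE:
                self.output.put(DONE)  # pass the marker downstream and stop
                return
            self.output.put(self.fn(batch))


class Graph:
    def __init__(self):
        self.kernels = []

    def add_kernel(self, kernel):
        self.kernels.append(kernel)
        return kernel

    def connect(self, src, dst):
        edge = queue.Queue()  # stand-in for a CacheMachine
        src.output = edge
        dst.input = edge

    def execute(self):
        """Start run() of every kernel, each on its own thread."""
        threads = [threading.Thread(target=k.run) for k in self.kernels]
        for thread in threads:
            thread.start()
        return threads


# Rows are (o_orderkey, o_custkey, o_totalprice); the values are made up.
graph = Graph()
filt = graph.add_kernel(Kernel("filter", lambda rows: [r for r in rows if r[0] < 10]))
proj = graph.add_kernel(Kernel("project", lambda rows: [(r[1], r[2]) for r in rows]))
graph.connect(filt, proj)
filt.input = queue.Queue()   # source edge, filled by hand below
proj.output = queue.Queue()  # sink edge, drained by hand below

for batch in ([(1, 7, 10.0), (12, 7, 3.0)], [(5, 8, 2.0)]):
    filt.input.put(batch)
filt.input.put(DONE)

for thread in graph.execute():
    thread.join()

results = []
while True:
    item = proj.output.get()
    if item is DONE:
        break
    results.append(item)
print(results)
```

Since every kernel only touches its own input and output edges, the kernels can run concurrently and batches flow through the graph as soon as they are available.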

BlazingSQL v0.19 documentation
