Description
This issue is meant to track the work going on on the Gen4 planner.
The Gen4 planner is a new planner in Vitess that explores many different join alternatives and uses a little bit of cost analysis to pick the cheapest plan. In contrast, the V3 planner merges and join tables from left to right, and this made it important for the user to list tables in a good order so that the planner could produce an efficient route.
The gen4 algorithm that we will start implementing is based on the GOO paper (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.737), but the infrastructure for it can be reused for other models.
This is a larger rewrite of the vtgate planner. It introduces new passes and intermediate representations of the query.
The old code used these passes over the query:
Pass | Struct transformation |
---|---|
Parsing | String -> AST |
Rewriting (normalization) | AST -> AST |
Planning | AST -> logicalPlan (builder) |
WireUp | logicalPlan -> engine.Primitive |
This refactored planner now uses the following passes:
Pass | Struct transformation |
---|---|
Parsing | String -> AST |
Rewriting (normalization) | AST -> AST |
Semantic Analysis | AST -> AST" |
Build Operator Tree | AST" -> OperatorTree |
Optimize/Merge | OperatorTree -> joinTree |
Horizon Planning | joinTree -> logicalPlan |
WireUp | logicalPlan -> engine.Primitive |
By splitting the planning process into smaller pieces, each part can be simplified and extended to do more.
Here follows a short description of each new pass.
Semantic Analysis
Responsibilities: Scoping, Binding
Walks the AST and does scoping and binding, so whenever a column name is found, the planner has information about which tables is being referenced. Tables are given a TableSet
identifier - a bitmask struct that allows the planner to quickly find what dependencies every expression has.
Extract Query Graph
Responsibilities: Extract Subqueries, Create Query Graph
The query graph is an intermediate representation that is designed to allow the route planner to quickly consider many different solutions for the query. Instead of keeping the query in the AST, which is limited by the tree structure it has, we produce a graphy representation with all used tables (nodes) in one list, and edges between them in a separate list.
In this pass, subqueries are extracted into a list of queries and the relationships between them. This makes it easier for later passes to plan fully without having to switch back and forth between passes - when doing route planning, we can do all of route planning in one go and don't have to wait for SELECT expressions to be considered before planning subqueries used in SELECT expressions.
Route planning
Responsibilities: Plan how to route the query - plan FROM and WHERE
This pass uses dynamic programming to consider all combinations of tables in order to find the optimal plan. Optimal here means minimal number of route primitives in the plan.
At the end of this stage, we have a tree structure that represents all the route primitives needed and how they should be joined.
Horizon planning
Responsibilities: Plan projections, aggregations, grouping and ordering
Once we have a plan for how to route queries, we plan what projections we need from each route, and how to do ORDER BY/GROUP BY/LIMIT
et al.
Positive outcomes from this refactoring.
Why do this non-trivial piece of work?
We still have a number of query types that are not supported. In order to be able to support more queries, we needed to extend the planner. Instead of adding to the legacy planner which is not very easy to work with, we felt that it was time to introduce this new design, which not only will allow us to support these queries, it also sets us up to be able to do more optimisations in the future.
Known tasks:
- Solve inner joins, no subqueries or derived tables, no aggregation, limit, order by or any fancy
SELECT
expressions Planner refactoring #7103 - Make it possible to evaluate
A join B
vsB join A
Gen4 Planner: AxB vs BxA #7274 - Subqueries Gen4: Handling subquery in query graph #7313
- Derived tables
- Horizon planing - Grouping, Aggregation, Order BY, SELECT expressions [Gen4] Initial Horizon Planning #7775 Gen4: Support for order by column not present in projection #8070
- Limit Gen4: Add Limit clause support #7312
- information_schema queries
- Query hints
- Generate
Clone()
methods for AST structs Update AST helper generation #7558 - Table aliases [Gen4] Implemented table alias functionality in the semantic analysis #7629
- Plan SelectIN queries gen4: plan more opcodes #8254
- Use fallback planner Gen4 fallback planning #7370
- Outer Join planning gen4: outer joins #8312
Activity