版别	日期	补白
1.0	2024.2.18	文章首发

本文的的源码剖析全部依据TiDB6.5来做剖析。

1.引子

假如让你做一个散布式数据库的优化器，面对以下的SQL，你会想到什么好的办法去履行他们呢？

SELECT id, name FROM person WHERE age >= 18 or height > 180 limit 100;：从条件上看，咱们看到条件其实是二选一的： age >= 18 or height > 180。依据这种状况，咱们肯定会去挑选有索引的数据，假如都有索引or都没有，那么肯定挑选扫描行数最少的数据。假如有一些算子在里边的话，则额外需求考虑数据的就近准则——有些算子在部分架构下可以充分利用MPP的才能，而有些则不行。
SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders LEFT JOIN customers ON orders.customer_id=customers.customer_id;散布式数据库中的join，最优的办法便是小表广播到大表数据所在的当地。那么首要咱们得知道谁是小表，谁是大表。

2.核算信息搜集

依据前面的两个比如，咱们可以发现——假如咱们需求找到SQL对应的最佳方案，咱们会需求一些表的元数据，或许说是核算信息。从常规的角度来说，以下核算信息是必须搜集的：

表的总行数
每列数据的平均巨细
每列数据的基数：即NDV（distinct value）
列的NULL值个数

假如是事务型的（行式存储），那么还要考虑索引平均长度、值的散布等等。

假如是剖析型的（列式存储），那么还需求考虑相关列的最大值、最小值等等。

而核算办法的搜集，会有多种办法。首要是考虑资源和精确性之间的Trade off。常见的有：

TopN：相关数据呈现次数前 n 的值。
直方图：用于描绘数据散布图的工具。依照数据的值巨细进行分桶，并用一些简略的数据来描绘每个桶，比方落在桶里的值的个数。
2D直方图：因为直方图无法反应列之间的关联，可以用2D直方图（联合散布）做到，但空间占用也比较多。
Count-Min Sketch：相似哈希表加上核算器的完成。可以用很少的数据来描绘全体数据的特性。
HyperLogLog：一种估算海量数据基数的办法。许多状况下，核算并不需求那么精确。工程方面要在使用资源和精确性里找平衡。因此有人提出用HLL，这是一种占用资源少，但能给出较为精确的近似结果的算法。

TiDB搜集的核算信息见：docs.pingcap.com/zh/tidb/v6.…

3.价值的评价

一个SQL真实的物理履行方案或许是有多个的。在以核算信息为基础之上，咱们还需求参加相应的权重，举个比如：

假如可以在内存中核算完结，就不用去反复建议本地IO
假如能在本地IO中完结，就不要去建议网络请求

等等…

这在TiDB的代码中也有对应的默许值。

DefOptCPUFactor                                = 3.0
DefOptCopCPUFactor                             = 3.0
DefOptTiFlashConcurrencyFactor                 = 24.0
DefOptNetworkFactor                            = 1.0
DefOptScanFactor                               = 1.5
DefOptDescScanFactor                           = 3.0
DefOptSeekFactor                               = 20.0
DefOptMemoryFactor                             = 0.001
DefOptDiskFactor                               = 1.5
DefOptConcurrencyFactor                        = 3.0

var defaultVer2Factors = costVer2Factors{
	TiDBTemp:      costVer2Factor{"tidb_temp_table_factor", 0.00},
	TiKVScan:      costVer2Factor{"tikv_scan_factor", 40.70},
	TiKVDescScan:  costVer2Factor{"tikv_desc_scan_factor", 61.05},
	TiFlashScan:   costVer2Factor{"tiflash_scan_factor", 11.60},
	TiDBCPU:       costVer2Factor{"tidb_cpu_factor", 49.90},
	TiKVCPU:       costVer2Factor{"tikv_cpu_factor", 49.90},
	TiFlashCPU:    costVer2Factor{"tiflash_cpu_factor", 2.40},
	TiDB2KVNet:    costVer2Factor{"tidb_kv_net_factor", 3.96},
	TiDB2FlashNet: costVer2Factor{"tidb_flash_net_factor", 2.20},
	TiFlashMPPNet: costVer2Factor{"tiflash_mpp_net_factor", 1.00},
	TiDBMem:       costVer2Factor{"tidb_mem_factor", 0.20},
	TiKVMem:       costVer2Factor{"tikv_mem_factor", 0.20},
	TiFlashMem:    costVer2Factor{"tiflash_mem_factor", 0.05},
	TiDBDisk:      costVer2Factor{"tidb_disk_factor", 200.00},
	TiDBRequest:   costVer2Factor{"tidb_request_factor", 6000000.00},
}

4.履行方案枚举与择优

当咱们可以评价出物理履行方案的价值时，将会枚举一切可以用上物理履行方案，并在其间挑选价值最小的物理履行方案。一般枚举分为两个门户：

自底向上：代表有System R。
自顶向下：代表有Cascade。

自底向上没法处理一个问题。当上层对下层的输出结果次序感兴趣时，那就不能只能从底层的视角来寻找部分最优。

举个比如，多表Join。一开始两个表Join或许HashJoin价值很低，可是Join下一个表时，用MergeJoin从全体来看价值更低。从这个case来看，自底向上来做方案取优并不适宜。

所以有了Cascade：

查找方案是自顶向下的。这意味着它可以防止部分最优而导致大局不优。从Operator Tree 自顶向下遍历时，可以做一系列改换：

- Implementation：逻辑算子可以转换成物理算子；例如Join转换成NestLoop或许HashJoin等
- Exploration：逻辑算子可以做等价改换；例如交流Inner Join的两个子节点，即可枚举Join次序

图片来自于：Cascades Optimizer——zhuanlan.zhihu.com/p/73545345

它是依据Volcano模型演进而来的。
用面向对象的办法进行建模，编写规矩时不需求关怀查找进程。相比传统优化器中面向进程去一条条的编写，的确是很大的改进。

5.TiDB的完成

大致的代码调用链为：

-- session/session.go
-- ExecuteStmt //SQL履行的进口
|-- executor/compiler.go
-- Compile //将SQL变成可履行的方案
|--planner/planner/optmize.go
-- Optimize //优化的进口
-- optimize //这儿有两个进口。一种是新的优化器进口，一种是老的优化器进口。依据装备来挑选。最终都会回来最优的物理履行方案。
    |-- planner/cascades/optmize.go
        --FindBestPlan 见5.1
        -- onPhasePreprocessing //见5.3
        -- implGroup
        |--planner/core/optmizer.go //见5.4
            -- DoOptimize
            -- physicalOptimize
            |--planner/core/find_best_task.go
            -- findBestTask
            -- enumeratePhysicalPlans4Task
            -- compareTaskCost
            -- getTaskPlanCost
            |-- planner/core/plan_cost_ver2.go
            -- getPlanCost

5.1 逻辑优化

中心进口为：


// FindBestPlan is the optimization entrance of the cascades planner. The
// optimization is composed of 3 phases: preprocessing, exploration and implementation.
//
// ------------------------------------------------------------------------------
// Phase 1: Preprocessing
// ------------------------------------------------------------------------------
//
// The target of this phase is to preprocess the plan tree by some heuristic
// rules which should always be beneficial, for example Column Pruning.
//
// ------------------------------------------------------------------------------
// Phase 2: Exploration
// ------------------------------------------------------------------------------
//
// The target of this phase is to explore all the logically equivalent
// expressions by exploring all the equivalent group expressions of each group.
//
// At the very beginning, there is only one group expression in a Group. After
// applying some transformation rules on certain expressions of the Group, all
// the equivalent expressions are found and stored in the Group. This procedure
// can be regarded as searching for a weak connected component in a directed
// graph, where nodes are expressions and directed edges are the transformation
// rules.
//
// ------------------------------------------------------------------------------
// Phase 3: Implementation
// ------------------------------------------------------------------------------
//
// The target of this phase is to search the best physical plan for a Group
// which satisfies a certain required physical property.
//
// In this phase, we need to enumerate all the applicable implementation rules
// for each expression in each group under the required physical property. A
// memo structure is used for a group to reduce the repeated search on the same
// required physical property.
func (opt *Optimizer) FindBestPlan(sctx sessionctx.Context, logical plannercore.LogicalPlan) (p plannercore.PhysicalPlan, cost float64, err error) {
	logical, err = opt.onPhasePreprocessing(sctx, logical)
	if err != nil {
		return nil, 0, err
	}
	rootGroup := memo.Convert2Group(logical)
	err = opt.onPhaseExploration(sctx, rootGroup)
	if err != nil {
		return nil, 0, err
	}
	p, cost, err = opt.onPhaseImplementation(sctx, rootGroup)
	if err != nil {
		return nil, 0, err
	}
	err = p.ResolveIndices()
	return p, cost, err
}

注释+代码很洁净，这儿总共做三件事

onPhasePreprocessing：注释很真实，说for example Column Pruning，结果里边真的只做了列裁剪。
onPhaseExploration：探究一切逻辑等价存在的或许
onPhaseImplementation：依据价值寻找最优的物理履行方案

这块官网的博客写的十分好，我就不重复了：cn.pingcap.com/blog/tidb-c…

5.2 核算信息搜集

这块代码首要会集在stats.go里，里边的中心数据结构是stats_info.go里的StatsInfo。调用链大致为：

|-- planner/cascades/optimizer.go
--fillGroupStats
|-- planner/core/stats.go
--DeriveStats

type LogicalPlan interface {
	Plan
    //......疏忽一些代码
	// DeriveStats derives statistic info for current plan node given child stats.
	// We need selfSchema, childSchema here because it makes this method can be used in
	// cascades planner, where LogicalPlan might not record its children or schema.
	DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, colGroups [][]*expression.Column) (*property.StatsInfo, error)
    //......疏忽一些代码
}

有许多结构体完成了这个办法：

搜集核算信息首要是analyze.go#Next办法中调用的#analyzeWorker。

5.3 新版别物理履行方案的挑选

代码进口为：

// implGroup finds the best Implementation which satisfies the required
// physical property for a Group. The best Implementation should have the
// lowest cost among all the applicable Implementations.
//
// g:			the Group to be implemented.
// reqPhysProp: the required physical property.
// costLimit:   the maximum cost of all the Implementations.
func (opt *Optimizer) implGroup(g *memo.Group, reqPhysProp *property.PhysicalProperty, costLimit float64) (memo.Implementation, error) {
	groupImpl := g.GetImpl(reqPhysProp)
	if groupImpl != nil {
		if groupImpl.GetCost() <= costLimit {
			return groupImpl, nil
		}
		return nil, nil
	}
	// Handle implementation rules for each equivalent GroupExpr.
	var childImpls []memo.Implementation
	err := opt.fillGroupStats(g)
	if err != nil {
		return nil, err
	}
	outCount := math.Min(g.Prop.Stats.RowCount, reqPhysProp.ExpectedCnt)
	for elem := g.Equivalents.Front(); elem != nil; elem = elem.Next() {
		curExpr := elem.Value.(*memo.GroupExpr)
		impls, err := opt.implGroupExpr(curExpr, reqPhysProp)
		if err != nil {
			return nil, err
		}
		for _, impl := range impls {
			childImpls = childImpls[:0]
			for i, childGroup := range curExpr.Children {
				childImpl, err := opt.implGroup(childGroup, impl.GetPlan().GetChildReqProps(i), impl.GetCostLimit(costLimit, childImpls...))
				if err != nil {
					return nil, err
				}
				if childImpl == nil {
					impl.SetCost(math.MaxFloat64)
					break
				}
				childImpls = append(childImpls, childImpl)
			}
			if impl.GetCost() == math.MaxFloat64 {
				continue
			}
			implCost := impl.CalcCost(outCount, childImpls...)
			if implCost > costLimit {
				continue
			}
			if groupImpl == nil || groupImpl.GetCost() > implCost {
				groupImpl = impl.AttachChildren(childImpls...)
				costLimit = implCost
			}
		}
	}
	// Handle enforcer rules for required physical property.
	for _, rule := range GetEnforcerRules(g, reqPhysProp) {
		newReqPhysProp := rule.NewProperty(reqPhysProp)
		enforceCost := rule.GetEnforceCost(g)
		childImpl, err := opt.implGroup(g, newReqPhysProp, costLimit-enforceCost)
		if err != nil {
			return nil, err
		}
		if childImpl == nil {
			continue
		}
		impl := rule.OnEnforce(reqPhysProp, childImpl)
		implCost := enforceCost + childImpl.GetCost()
		impl.SetCost(implCost)
		if groupImpl == nil || groupImpl.GetCost() > implCost {
			groupImpl = impl
			costLimit = implCost
		}
	}
	if groupImpl == nil || groupImpl.GetCost() == math.MaxFloat64 {
		return nil, nil
	}
	g.InsertImpl(reqPhysProp, groupImpl)
	return groupImpl, nil
}

这儿个函数会找出潜在符合条件的物理履行方案，并不断的查找。但它有一个剪枝逻辑——会记录当前最小的cost，假如一个履行方案自上向下查找时，假如超过了这个cost，则直接回来。这便是在第3节提到的自顶向下的优化。

接下来咱们看一下memo.Implementation的定义：


package memo
import (
	plannercore "github.com/pingcap/tidb/planner/core"
)
// Implementation defines the interface for cost of physical plan.
type Implementation interface {
	CalcCost(outCount float64, children ...Implementation) float64
	SetCost(cost float64)
	GetCost() float64
	GetPlan() plannercore.PhysicalPlan
	// AttachChildren is used to attach children implementations and returns it self.
	AttachChildren(children ...Implementation) Implementation
	// GetCostLimit gets the costLimit for implementing the next childGroup.
	GetCostLimit(costLimit float64, children ...Implementation) float64
}

其间CalcCost办法便是用来核算物理履行方案的价值。总共有这么多结构体完成了它：

5.3.1 价值的评价

咱们以最初的比如，讲解价值的评价。

select价值核算办法

// CalcCost calculates the cost of the table scan Implementation.
func (impl *TableScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {
	ts := impl.plan.(*plannercore.PhysicalTableScan)
	width := impl.tblColHists.GetTableAvgRowSize(impl.plan.SCtx(), impl.tblCols, kv.TiKV, true)
	sessVars := ts.SCtx().GetSessionVars()
	impl.cost = outCount * sessVars.GetScanFactor(ts.Table) * width
	if ts.Desc {
		impl.cost = outCount * sessVars.GetDescScanFactor(ts.Table) * width
	}
	return impl.cost
}
// GetScanFactor returns the session variable scanFactor
// returns 0 when tbl is a temporary table.
func (s *SessionVars) GetScanFactor(tbl *model.TableInfo) float64 {
	if tbl != nil {
		if tbl.TempTableType != model.TempTableNone {
			return 0
		}
	}
	return s.scanFactor 
}
// CalcCost implements Implementation interface.
func (impl *IndexScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {
	is := impl.plan.(*plannercore.PhysicalIndexScan)
	sessVars := is.SCtx().GetSessionVars()
	rowSize := impl.tblColHists.GetIndexAvgRowSize(is.SCtx(), is.Schema().Columns, is.Index.Unique)
	cost := outCount * rowSize * sessVars.GetScanFactor(is.Table)
	if is.Desc {
		cost = outCount * rowSize * sessVars.GetDescScanFactor(is.Table)
	}
	cost += float64(len(is.Ranges)) * sessVars.GetSeekFactor(is.Table)
	impl.cost = cost
	return impl.cost
}

这儿咱们以全表扫描表和射中索引的代码为比如。其间默许的scanFactor是1.5。这儿可以看到indexScan和tableScan少了一个因数：width。这个变量代表了所需列的平均巨细。这么看基本上是indexScan最优了。

这儿的代码笔者认为有点不高雅——当Desc时，其实之前的Cost是没必要算一遍的，糟蹋CPU资源。

join价值核算办法

// CalcCost implements Implementation CalcCost interface.
func (impl *HashJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {
	hashJoin := impl.plan.(*plannercore.PhysicalHashJoin)
	// The children here are only used to calculate the cost.
	hashJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
	selfCost := hashJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), false, 0, nil)
	impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
	return impl.cost
}
// CalcCost implements Implementation CalcCost interface.
func (impl *MergeJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {
	mergeJoin := impl.plan.(*plannercore.PhysicalMergeJoin)
	// The children here are only used to calculate the cost.
	mergeJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
	selfCost := mergeJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), 0)
	impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
	return impl.cost
}

具体的核算都在plan_cost_v1.go里：


// GetCost computes cost of hash join operator itself.
func (p *PhysicalHashJoin) GetCost(lCnt, rCnt float64, isMPP bool, costFlag uint64, op *physicalOptimizeOp) float64 {
	buildCnt, probeCnt := lCnt, rCnt
	build := p.children[0]
	// Taking the right as the inner for right join or using the outer to build a hash table.
	if (p.InnerChildIdx == 1 && !p.UseOuterToBuild) || (p.InnerChildIdx == 0 && p.UseOuterToBuild) {
		buildCnt, probeCnt = rCnt, lCnt
		build = p.children[1]
	}
	sessVars := p.ctx.GetSessionVars()
	oomUseTmpStorage := variable.EnableTmpStorageOnOOM.Load()
	memQuota := sessVars.MemTracker.GetBytesLimit() // sessVars.MemQuotaQuery && hint
	rowSize := getAvgRowSize(build.statsInfo(), build.Schema().Columns)
	spill := oomUseTmpStorage && memQuota > 0 && rowSize*buildCnt > float64(memQuota) && p.storeTp != kv.TiFlash
	// Cost of building hash table.
	cpuFactor := sessVars.GetCPUFactor()
	diskFactor := sessVars.GetDiskFactor()
	memoryFactor := sessVars.GetMemoryFactor()
	concurrencyFactor := sessVars.GetConcurrencyFactor()
	cpuCost := buildCnt * cpuFactor
	memoryCost := buildCnt * memoryFactor
	diskCost := buildCnt * diskFactor * rowSize
	// Number of matched row pairs regarding the equal join conditions.
	helper := &fullJoinRowCountHelper{
		sctx:            p.SCtx(),
		cartesian:       false,
		leftProfile:     p.children[0].statsInfo(),
		rightProfile:    p.children[1].statsInfo(),
		leftJoinKeys:    p.LeftJoinKeys,
		rightJoinKeys:   p.RightJoinKeys,
		leftSchema:      p.children[0].Schema(),
		rightSchema:     p.children[1].Schema(),
		leftNAJoinKeys:  p.LeftNAJoinKeys,
		rightNAJoinKeys: p.RightNAJoinKeys,
	}
	numPairs := helper.estimate()
	// For semi-join class, if `OtherConditions` is empty, we already know
	// the join results after querying hash table, otherwise, we have to
	// evaluate those resulted row pairs after querying hash table; if we
	// find one pair satisfying the `OtherConditions`, we then know the
	// join result for this given outer row, otherwise we have to iterate
	// to the end of those pairs; since we have no idea about when we can
	// terminate the iteration, we assume that we need to iterate half of
	// those pairs in average.
	if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
		p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {
		if len(p.OtherConditions) > 0 {
			numPairs *= 0.5
		} else {
			numPairs = 0
		}
	}
	if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {
		numPairs = getOperatorActRows(p)
	}
	// Cost of querying hash table is cheap actually, so we just compute the cost of
	// evaluating `OtherConditions` and joining row pairs.
	probeCost := numPairs * cpuFactor
	probeDiskCost := numPairs * diskFactor * rowSize
	// Cost of evaluating outer filter.
	if len(p.LeftConditions)+len(p.RightConditions) > 0 {
		// Input outer count for the above compution should be adjusted by SelectionFactor.
		probeCost *= SelectionFactor
		probeDiskCost *= SelectionFactor
		probeCost += probeCnt * cpuFactor
	}
	diskCost += probeDiskCost
	probeCost /= float64(p.Concurrency)
	// Cost of additional concurrent goroutines.
	cpuCost += probeCost + float64(p.Concurrency+1)*concurrencyFactor
	// Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
	if p.UseOuterToBuild {
		if spill {
			// It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
			cpuCost += buildCnt * cpuFactor
		} else {
			cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
		}
		diskCost += buildCnt * diskFactor * rowSize
	}
	if spill {
		memoryCost *= float64(memQuota) / (rowSize * buildCnt)
	} else {
		diskCost = 0
	}
	if op != nil {
		setPhysicalHashJoinCostDetail(p, op, spill, buildCnt, probeCnt, cpuFactor, rowSize, numPairs,
			cpuCost, probeCost, memoryCost, diskCost, probeDiskCost,
			diskFactor, memoryFactor, concurrencyFactor,
			memQuota)
	}
	return cpuCost + memoryCost + diskCost
}
// GetCost computes cost of merge join operator itself.
func (p *PhysicalMergeJoin) GetCost(lCnt, rCnt float64, costFlag uint64) float64 {
	outerCnt := lCnt
	innerCnt := rCnt
	innerKeys := p.RightJoinKeys
	innerSchema := p.children[1].Schema()
	innerStats := p.children[1].statsInfo()
	if p.JoinType == RightOuterJoin {
		outerCnt = rCnt
		innerCnt = lCnt
		innerKeys = p.LeftJoinKeys
		innerSchema = p.children[0].Schema()
		innerStats = p.children[0].statsInfo()
	}
	helper := &fullJoinRowCountHelper{
		sctx:          p.SCtx(),
		cartesian:     false,
		leftProfile:   p.children[0].statsInfo(),
		rightProfile:  p.children[1].statsInfo(),
		leftJoinKeys:  p.LeftJoinKeys,
		rightJoinKeys: p.RightJoinKeys,
		leftSchema:    p.children[0].Schema(),
		rightSchema:   p.children[1].Schema(),
	}
	numPairs := helper.estimate()
	if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
		p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {
		if len(p.OtherConditions) > 0 {
			numPairs *= 0.5
		} else {
			numPairs = 0
		}
	}
	if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {
		numPairs = getOperatorActRows(p)
	}
	sessVars := p.ctx.GetSessionVars()
	probeCost := numPairs * sessVars.GetCPUFactor()
	// Cost of evaluating outer filters.
	var cpuCost float64
	if len(p.LeftConditions)+len(p.RightConditions) > 0 {
		probeCost *= SelectionFactor
		cpuCost += outerCnt * sessVars.GetCPUFactor()
	}
	cpuCost += probeCost
	// For merge join, only one group of rows with same join key(not null) are cached,
	// we compute average memory cost using estimated group size.
	NDV, _ := getColsNDVWithMatchedLen(innerKeys, innerSchema, innerStats)
	memoryCost := (innerCnt / NDV) * sessVars.GetMemoryFactor()
	return cpuCost + memoryCost
}

HashJoin要考虑到内存不够的状况，因此在核算到数据不够时，会将对应的数据压入硬盘。但它对数据的次序并无要求，因此可以并发的去做。见：

	// Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
	if p.UseOuterToBuild {
		if spill {
			// It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
			cpuCost += buildCnt * cpuFactor
		} else {
			cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
		}
		diskCost += buildCnt * diskFactor * rowSize
	}

而MergeJoin的价值显然会更小，但可以选则到这个方案也有较高的要求：当两个关联表要 Join 的字段需求按排好的次序读取时，适用 Merge Join 算法。

5.4 老版别物理履行方案的挑选

5.4.1 价值的评价

这块代码首要是在plan_cost_ver1.go和plan_cost_ver2.go。v2对价值公式进行了更精确的回归校准，调整了部分价值公式，比此前版别的价值公式愈加精确。代码上也更为简练：v2只暴露出了一个公共办法，内部通过不同的类型做转发。

// GetPlanCost returns the cost of this plan.
func GetPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {
	return getPlanCost(p, taskType, option)
}
func getPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {
	if p.SCtx().GetSessionVars().CostModelVersion == modelVer2 {
		planCost, err := p.getPlanCostVer2(taskType, option)
		return planCost.cost, err
	}
	return p.getPlanCostVer1(taskType, option)
}

依据不同的PhysicalPlan类型，会找到不同绑定办法：

v1的部分办法展示：

select价值核算办法

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = child-cost + filter-cost
func (p *PhysicalSelection) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	inputRows := getCardinality(p.children[0], option.CostFlag)
	cpuFactor := getTaskCPUFactorVer2(p, taskType)
	filterCost := filterCostVer2(option, inputRows, p.Conditions, cpuFactor)
	childCost, err := p.children[0].getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	p.planCostVer2 = sumCostVer2(filterCost, childCost)
	p.planCostInit = true
	return p.planCostVer2, nil
}

这部分代码简略易读。价值便是子查询的价值+筛选的价值。

那么问题来了，中索引的和不中索引的价值应该是不一样的。这儿没有体现出来啊。仔细看childCost, err := p.children[0].getPlanCostVer2(taskType, option)，这儿是会去获取子物理履行方案的价值。

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
func (p *PointGetPlan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	if p.accessCols == nil { // from fast plan code path
		p.planCostVer2 = zeroCostVer2
		p.planCostInit = true
		return zeroCostVer2, nil
	}
	rowSize := getAvgRowSize(p.stats, p.schema.Columns)
	netFactor := getTaskNetFactorVer2(p, taskType)
	p.planCostVer2 = netCostVer2(option, 1, rowSize, netFactor)
	p.planCostInit = true
	return p.planCostVer2, nil
}
func netCostVer2(option *PlanCostOption, rows, rowSize float64, netFactor costVer2Factor) costVer2 {
	return newCostVer2(option, netFactor,
		rows*rowSize*netFactor.Value,
		func() string { return fmt.Sprintf("net(%v*rowsize(%v)*%v)", rows, rowSize, netFactor) })
}
// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = rows * log2(row-size) * scan-factor
// log2(row-size) is from experiments.
func (p *PhysicalTableScan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	rows := getCardinality(p, option.CostFlag)
	var rowSize float64
	if p.StoreType == kv.TiKV {
		rowSize = getAvgRowSize(p.stats, p.tblCols) // consider all columns if TiKV
	} else { // TiFlash
		rowSize = getAvgRowSize(p.stats, p.schema.Columns)
	}
	rowSize = math.Max(rowSize, 2.0)
	scanFactor := getTaskScanFactorVer2(p, p.StoreType, taskType)
	p.planCostVer2 = scanCostVer2(option, rows, rowSize, scanFactor)
	// give TiFlash a start-up cost to let the optimizer prefers to use TiKV to process small table scans.
	if p.StoreType == kv.TiFlash {
		p.planCostVer2 = sumCostVer2(p.planCostVer2, scanCostVer2(option, 10000, rowSize, scanFactor))
	}
	p.planCostInit = true
	return p.planCostVer2, nil
}
func scanCostVer2(option *PlanCostOption, rows, rowSize float64, scanFactor costVer2Factor) costVer2 {
	if rowSize < 1 {
		rowSize = 1
	}
	return newCostVer2(option, scanFactor,
		// rows * log(row-size) * scanFactor, log2 from experiments
		rows*math.Log2(rowSize)*scanFactor.Value,
		func() string { return fmt.Sprintf("scan(%v*logrowsize(%v)*%v)", rows, rowSize, scanFactor) })
}

scanFactor的价值默许是40.7，netFactor的价值默许是3.96。结合代码来看，射中索引的价值更优。

join价值核算办法

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = build-child-cost + build-filter-cost +
// (probe-cost + probe-filter-cost) / concurrency
// probe-cost = probe-child-cost * build-rows / batchRatio
func (p *PhysicalIndexJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 0)
}
func (p *PhysicalIndexHashJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 1)
}
func (p *PhysicalIndexMergeJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 2)
}
func (p *PhysicalIndexJoin) getIndexJoinCostVer2(taskType property.TaskType, option *PlanCostOption, indexJoinType int) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	build, probe := p.children[1-p.InnerChildIdx], p.children[p.InnerChildIdx]
	buildRows := getCardinality(build, option.CostFlag)
	buildRowSize := getAvgRowSize(build.Stats(), build.Schema().Columns)
	probeRowsOne := getCardinality(probe, option.CostFlag)
	probeRowsTot := probeRowsOne * buildRows
	probeRowSize := getAvgRowSize(probe.Stats(), probe.Schema().Columns)
	buildFilters, probeFilters := p.LeftConditions, p.RightConditions
	probeConcurrency := float64(p.ctx.GetSessionVars().IndexLookupJoinConcurrency())
	cpuFactor := getTaskCPUFactorVer2(p, taskType)
	memFactor := getTaskMemFactorVer2(p, taskType)
	requestFactor := getTaskRequestFactorVer2(p, taskType)
	buildFilterCost := filterCostVer2(option, buildRows, buildFilters, cpuFactor)
	buildChildCost, err := build.getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	buildTaskCost := newCostVer2(option, cpuFactor,
		buildRows*10*cpuFactor.Value,
		func() string { return fmt.Sprintf("cpu(%v*10*%v)", buildRows, cpuFactor) })
	startCost := newCostVer2(option, cpuFactor,
		10*3*cpuFactor.Value,
		func() string { return fmt.Sprintf("cpu(10*3*%v)", cpuFactor) })
	probeFilterCost := filterCostVer2(option, probeRowsTot, probeFilters, cpuFactor)
	probeChildCost, err := probe.getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	var hashTableCost costVer2
	switch indexJoinType {
	case 1: // IndexHashJoin
		hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
	case 2: // IndexMergeJoin
		hashTableCost = newZeroCostVer2(traceCost(option))
	default: // IndexJoin
		hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
	}
	// IndexJoin executes a batch of rows at a time, so the actual cost of this part should be
	//  `innerCostPerBatch * numberOfBatches` instead of `innerCostPerRow * numberOfOuterRow`.
	// Use an empirical value batchRatio to handle this now.
	// TODO: remove this empirical value.
	batchRatio := 6.0
	probeCost := divCostVer2(mulCostVer2(probeChildCost, buildRows), batchRatio)
	// Double Read Cost
	doubleReadCost := newZeroCostVer2(traceCost(option))
	if p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate > 0 {
		batchSize := float64(p.ctx.GetSessionVars().IndexJoinBatchSize)
		taskPerBatch := 1024.0 // TODO: remove this magic number
		doubleReadTasks := buildRows / batchSize * taskPerBatch
		doubleReadCost = doubleReadCostVer2(option, doubleReadTasks, requestFactor)
		doubleReadCost = mulCostVer2(doubleReadCost, p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate)
	}
	p.planCostVer2 = sumCostVer2(startCost, buildChildCost, buildFilterCost, buildTaskCost, divCostVer2(sumCostVer2(doubleReadCost, probeCost, probeFilterCost, hashTableCost), probeConcurrency))
	p.planCostInit = true
	return p.planCostVer2, nil
}

关键在于：

	switch indexJoinType {
	case 1: // IndexHashJoin
		hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
	case 2: // IndexMergeJoin
		hashTableCost = newZeroCostVer2(traceCost(option))
	default: // IndexJoin
		hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
	}

对应办法：

func hashBuildCostVer2(option *PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costVer2Factor) costVer2 {
	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
	hashKeyCost := newCostVer2(option, cpuFactor,
		buildRows*nKeys*cpuFactor.Value,
		func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })
	hashMemCost := newCostVer2(option, memFactor,
		buildRows*buildRowSize*memFactor.Value,
		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
	hashBuildCost := newCostVer2(option, cpuFactor,
		buildRows*cpuFactor.Value,
		func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })
	return sumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)
}
func newZeroCostVer2(trace bool) (ret costVer2) {
	if trace {
		ret.trace = &costTrace{make(map[string]float64), ""}
	}
	return
}

简略的看一下代码，咱们可以发现，从大多数的场景来看，依照价值从小到大来排，这几种Join是MergeJoin<HashJoin<IndexJoin。

5.4.2履行方案枚举与择优

总得来说这块代码较为简略，实质便是枚举一切符合条件的物理履行方案，并挑选出价值最小的履行方案，故不再列举代码。有兴趣的同学可以依据以下纲要自行翻阅：

|--planner/core/find_best_task.go
-- findBestTask
-- enumeratePhysicalPlans4Task
-- compareTaskCost
-- getTaskPlanCost
|-- planner/core/plan_cost_ver2.go
-- getPlanCost

6.其他

6.1 参阅与引用的文章

Cascades Optimizer：zhuanlan.zhihu.com/p/73545345
揭秘 TiDB 新优化器：Cascades Planner 原理解析：cn.pingcap.com/blog/tidb-c…
TiDB文档-核算信息简介：docs.pingcap.com/zh/tidb/v6.…
TiDB文档-过错索引的处理方案：docs.pingcap.com/zh/tidb/v6.…
The Volcano Optimizer Generator: Extensibility and Efficient Search：15721.courses.cs.cmu.edu/spring2019/…
The Cascades Framework for Query Optimization：15721.courses.cs.cmu.edu/spring2018/…

6.2 常识补充：code generation && vectorized execution

数据库引擎履行器中十分知名的两种优化办法，code generation和 vectorized execution。

code generation首要是依据上下文来生成一整段优化过的代码，这与那种嵌套大量if…else、for循环、虚办法的代码彻底相反，彻底面向功能考虑。

vectorized execution依据拉模型。相比于一次拉一个tuple来说，它的批量拉取减少了多次拉取的开销，一起还可以使用到SIMD。依据这种场景，vectorized execution的优化愈加适用于列式数据库。

声明：本站所有文章，如无特殊说明或标注，均为本站原创发布。任何个人或组织，在未征得本站同意时，禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益，可联系我们进行处理。

读TiDB源码聊设计：浅析HTAP的SQL优化器

1.引子

2.核算信息搜集

3.价值的评价

4.履行方案枚举与择优

5.TiDB的完成

5.1 逻辑优化

5.2 核算信息搜集

5.3 新版别物理履行方案的挑选

5.3.1 价值的评价

select价值核算办法

join价值核算办法

5.4 老版别物理履行方案的挑选

5.4.1 价值的评价

select价值核算办法

join价值核算办法

5.4.2履行方案枚举与择优

6.其他

6.1 参阅与引用的文章

6.2 常识补充：code generation && vectorized execution

近期文章

近期评论

读TiDB源码聊设计：浅析HTAP的SQL优化器

1.引子

2.核算信息搜集

3.价值的评价

4.履行方案枚举与择优

5.TiDB的完成

5.1 逻辑优化

5.2 核算信息搜集

5.3 新版别 物理履行方案的挑选

5.3.1 价值的评价

select价值核算办法

join价值核算办法

5.4 老版别 物理履行方案的挑选

5.4.1 价值的评价

select价值核算办法

join价值核算办法

5.4.2履行方案枚举与择优

6.其他

6.1 参阅与引用的文章

6.2 常识补充：code generation && vectorized execution

近期文章

近期评论

5.3 新版别物理履行方案的挑选

5.4 老版别物理履行方案的挑选