版别 日期 补白
1.0 2024.2.18 文章首发

本文的的源码剖析全部依据TiDB6.5来做剖析。

1.引子

假如让你做一个散布式数据库的优化器,面对以下的SQL,你会想到什么好的办法去履行他们呢?

  • SELECT id, name FROM person WHERE age >= 18 or height > 180 limit 100;:从条件上看,咱们看到条件其实是二选一的: age >= 18 or height > 180。依据这种状况,咱们肯定会去挑选有索引的数据,假如都有索引or都没有,那么肯定挑选扫描行数最少的数据。假如有一些算子在里边的话,则额外需求考虑数据的就近准则——有些算子在部分架构下可以充分利用MPP的才能,而有些则不行。
  • SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders LEFT JOIN customers ON orders.customer_id=customers.customer_id;散布式数据库中的join,最优的办法便是小表广播到大表数据所在的当地。那么首要咱们得知道谁是小表,谁是大表。

2.核算信息搜集

依据前面的两个比如,咱们可以发现——假如咱们需求找到SQL对应的最佳方案,咱们会需求一些表的元数据,或许说是核算信息。从常规的角度来说,以下核算信息是必须搜集的:

  • 表的总行数
  • 每列数据的平均巨细
  • 每列数据的基数:即NDV(distinct value)
  • 列的NULL值个数

假如是事务型的(行式存储),那么还要考虑索引平均长度、值的散布等等。

假如是剖析型的(列式存储),那么还需求考虑相关列的最大值、最小值等等。

而核算办法的搜集,会有多种办法。首要是考虑资源和精确性之间的Trade off。常见的有:

  • TopN:相关数据呈现次数前 n 的值。
  • 直方图:用于描绘数据散布图的工具。依照数据的值巨细进行分桶,并用一些简略的数据来描绘每个桶,比方落在桶里的值的个数。
  • 2D直方图:因为直方图无法反应列之间的关联,可以用2D直方图(联合散布)做到,但空间占用也比较多。
  • Count-Min Sketch:相似哈希表加上核算器的完成。可以用很少的数据来描绘全体数据的特性。
  • HyperLogLog:一种估算海量数据基数的办法。许多状况下,核算并不需求那么精确。工程方面要在使用资源和精确性里找平衡。因此有人提出用HLL,这是一种占用资源少,但能给出较为精确的近似结果的算法。

TiDB搜集的核算信息见:docs.pingcap.com/zh/tidb/v6.…

3.价值的评价

一个SQL真实的物理履行方案或许是有多个的。在以核算信息为基础之上,咱们还需求参加相应的权重,举个比如:

  1. 假如可以在内存中核算完结,就不用去反复建议本地IO
  2. 假如能在本地IO中完结,就不要去建议网络请求

等等…

这在TiDB的代码中也有对应的默许值。

DefOptCPUFactor                                = 3.0
DefOptCopCPUFactor                             = 3.0
DefOptTiFlashConcurrencyFactor                 = 24.0
DefOptNetworkFactor                            = 1.0
DefOptScanFactor                               = 1.5
DefOptDescScanFactor                           = 3.0
DefOptSeekFactor                               = 20.0
DefOptMemoryFactor                             = 0.001
DefOptDiskFactor                               = 1.5
DefOptConcurrencyFactor                        = 3.0
var defaultVer2Factors = costVer2Factors{
	TiDBTemp:      costVer2Factor{"tidb_temp_table_factor", 0.00},
	TiKVScan:      costVer2Factor{"tikv_scan_factor", 40.70},
	TiKVDescScan:  costVer2Factor{"tikv_desc_scan_factor", 61.05},
	TiFlashScan:   costVer2Factor{"tiflash_scan_factor", 11.60},
	TiDBCPU:       costVer2Factor{"tidb_cpu_factor", 49.90},
	TiKVCPU:       costVer2Factor{"tikv_cpu_factor", 49.90},
	TiFlashCPU:    costVer2Factor{"tiflash_cpu_factor", 2.40},
	TiDB2KVNet:    costVer2Factor{"tidb_kv_net_factor", 3.96},
	TiDB2FlashNet: costVer2Factor{"tidb_flash_net_factor", 2.20},
	TiFlashMPPNet: costVer2Factor{"tiflash_mpp_net_factor", 1.00},
	TiDBMem:       costVer2Factor{"tidb_mem_factor", 0.20},
	TiKVMem:       costVer2Factor{"tikv_mem_factor", 0.20},
	TiFlashMem:    costVer2Factor{"tiflash_mem_factor", 0.05},
	TiDBDisk:      costVer2Factor{"tidb_disk_factor", 200.00},
	TiDBRequest:   costVer2Factor{"tidb_request_factor", 6000000.00},
}

4.履行方案枚举与择优

当咱们可以评价出物理履行方案的价值时,将会枚举一切可以用上物理履行方案,并在其间挑选价值最小的物理履行方案。一般枚举分为两个门户:

  1. 自底向上:代表有System R。
  2. 自顶向下:代表有Cascade。

自底向上没法处理一个问题。当上层对下层的输出结果次序感兴趣时,那就不能只能从底层的视角来寻找部分最优。

举个比如,多表Join。一开始两个表Join或许HashJoin价值很低,可是Join下一个表时,用MergeJoin从全体来看价值更低。从这个case来看,自底向上来做方案取优并不适宜。

所以有了Cascade:

  1. 查找方案是自顶向下的。这意味着它可以防止部分最优而导致大局不优。从Operator Tree 自顶向下遍历时,可以做一系列改换:
    • Implementation:逻辑算子可以转换成物理算子;例如Join转换成NestLoop或许HashJoin等
    • Exploration:逻辑算子可以做等价改换;例如交流Inner Join的两个子节点,即可枚举Join次序

读TiDB源码聊规划:浅析HTAP的SQL优化器

图片来自于:Cascades Optimizer——zhuanlan.zhihu.com/p/73545345

  1. 它是依据Volcano模型演进而来的。
  2. 用面向对象的办法进行建模,编写规矩时不需求关怀查找进程。相比传统优化器中面向进程去一条条的编写,的确是很大的改进。

5.TiDB的完成

大致的代码调用链为:

-- session/session.go
-- ExecuteStmt //SQL履行的进口
|-- executor/compiler.go
-- Compile //将SQL变成可履行的方案
|--planner/planner/optmize.go
-- Optimize //优化的进口
-- optimize //这儿有两个进口。一种是新的优化器进口,一种是老的优化器进口。依据装备来挑选。最终都会回来最优的物理履行方案。
    |-- planner/cascades/optmize.go
        --FindBestPlan 见5.1
        -- onPhasePreprocessing //见5.3
        -- implGroup
        |--planner/core/optmizer.go //见5.4
            -- DoOptimize
            -- physicalOptimize
            |--planner/core/find_best_task.go
            -- findBestTask
            -- enumeratePhysicalPlans4Task
            -- compareTaskCost
            -- getTaskPlanCost
            |-- planner/core/plan_cost_ver2.go
            -- getPlanCost

5.1 逻辑优化

中心进口为:


// FindBestPlan is the optimization entrance of the cascades planner. The
// optimization is composed of 3 phases: preprocessing, exploration and implementation.
//
// ------------------------------------------------------------------------------
// Phase 1: Preprocessing
// ------------------------------------------------------------------------------
//
// The target of this phase is to preprocess the plan tree by some heuristic
// rules which should always be beneficial, for example Column Pruning.
//
// ------------------------------------------------------------------------------
// Phase 2: Exploration
// ------------------------------------------------------------------------------
//
// The target of this phase is to explore all the logically equivalent
// expressions by exploring all the equivalent group expressions of each group.
//
// At the very beginning, there is only one group expression in a Group. After
// applying some transformation rules on certain expressions of the Group, all
// the equivalent expressions are found and stored in the Group. This procedure
// can be regarded as searching for a weak connected component in a directed
// graph, where nodes are expressions and directed edges are the transformation
// rules.
//
// ------------------------------------------------------------------------------
// Phase 3: Implementation
// ------------------------------------------------------------------------------
//
// The target of this phase is to search the best physical plan for a Group
// which satisfies a certain required physical property.
//
// In this phase, we need to enumerate all the applicable implementation rules
// for each expression in each group under the required physical property. A
// memo structure is used for a group to reduce the repeated search on the same
// required physical property.
func (opt *Optimizer) FindBestPlan(sctx sessionctx.Context, logical plannercore.LogicalPlan) (p plannercore.PhysicalPlan, cost float64, err error) {
	logical, err = opt.onPhasePreprocessing(sctx, logical)
	if err != nil {
		return nil, 0, err
	}
	rootGroup := memo.Convert2Group(logical)
	err = opt.onPhaseExploration(sctx, rootGroup)
	if err != nil {
		return nil, 0, err
	}
	p, cost, err = opt.onPhaseImplementation(sctx, rootGroup)
	if err != nil {
		return nil, 0, err
	}
	err = p.ResolveIndices()
	return p, cost, err
}

注释+代码很洁净,这儿总共做三件事

  1. onPhasePreprocessing:注释很真实,说for example Column Pruning,结果里边真的只做了列裁剪。
  2. onPhaseExploration:探究一切逻辑等价存在的或许
  3. onPhaseImplementation:依据价值寻找最优的物理履行方案

这块官网的博客写的十分好,我就不重复了:cn.pingcap.com/blog/tidb-c…

5.2 核算信息搜集

这块代码首要会集在stats.go里,里边的中心数据结构是stats_info.go里的StatsInfo。调用链大致为:

|-- planner/cascades/optimizer.go
--fillGroupStats
|-- planner/core/stats.go
--DeriveStats
type LogicalPlan interface {
	Plan
    //......疏忽一些代码
	// DeriveStats derives statistic info for current plan node given child stats.
	// We need selfSchema, childSchema here because it makes this method can be used in
	// cascades planner, where LogicalPlan might not record its children or schema.
	DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, colGroups [][]*expression.Column) (*property.StatsInfo, error)
    //......疏忽一些代码
}

有许多结构体完成了这个办法:

读TiDB源码聊规划:浅析HTAP的SQL优化器

  1. 搜集核算信息首要是analyze.go#Next办法中调用的#analyzeWorker。

5.3 新版别 物理履行方案的挑选

代码进口为:

// implGroup finds the best Implementation which satisfies the required
// physical property for a Group. The best Implementation should have the
// lowest cost among all the applicable Implementations.
//
// g:			the Group to be implemented.
// reqPhysProp: the required physical property.
// costLimit:   the maximum cost of all the Implementations.
func (opt *Optimizer) implGroup(g *memo.Group, reqPhysProp *property.PhysicalProperty, costLimit float64) (memo.Implementation, error) {
	groupImpl := g.GetImpl(reqPhysProp)
	if groupImpl != nil {
		if groupImpl.GetCost() <= costLimit {
			return groupImpl, nil
		}
		return nil, nil
	}
	// Handle implementation rules for each equivalent GroupExpr.
	var childImpls []memo.Implementation
	err := opt.fillGroupStats(g)
	if err != nil {
		return nil, err
	}
	outCount := math.Min(g.Prop.Stats.RowCount, reqPhysProp.ExpectedCnt)
	for elem := g.Equivalents.Front(); elem != nil; elem = elem.Next() {
		curExpr := elem.Value.(*memo.GroupExpr)
		impls, err := opt.implGroupExpr(curExpr, reqPhysProp)
		if err != nil {
			return nil, err
		}
		for _, impl := range impls {
			childImpls = childImpls[:0]
			for i, childGroup := range curExpr.Children {
				childImpl, err := opt.implGroup(childGroup, impl.GetPlan().GetChildReqProps(i), impl.GetCostLimit(costLimit, childImpls...))
				if err != nil {
					return nil, err
				}
				if childImpl == nil {
					impl.SetCost(math.MaxFloat64)
					break
				}
				childImpls = append(childImpls, childImpl)
			}
			if impl.GetCost() == math.MaxFloat64 {
				continue
			}
			implCost := impl.CalcCost(outCount, childImpls...)
			if implCost > costLimit {
				continue
			}
			if groupImpl == nil || groupImpl.GetCost() > implCost {
				groupImpl = impl.AttachChildren(childImpls...)
				costLimit = implCost
			}
		}
	}
	// Handle enforcer rules for required physical property.
	for _, rule := range GetEnforcerRules(g, reqPhysProp) {
		newReqPhysProp := rule.NewProperty(reqPhysProp)
		enforceCost := rule.GetEnforceCost(g)
		childImpl, err := opt.implGroup(g, newReqPhysProp, costLimit-enforceCost)
		if err != nil {
			return nil, err
		}
		if childImpl == nil {
			continue
		}
		impl := rule.OnEnforce(reqPhysProp, childImpl)
		implCost := enforceCost + childImpl.GetCost()
		impl.SetCost(implCost)
		if groupImpl == nil || groupImpl.GetCost() > implCost {
			groupImpl = impl
			costLimit = implCost
		}
	}
	if groupImpl == nil || groupImpl.GetCost() == math.MaxFloat64 {
		return nil, nil
	}
	g.InsertImpl(reqPhysProp, groupImpl)
	return groupImpl, nil
}

这儿个函数会找出潜在符合条件的物理履行方案,并不断的查找。但它有一个剪枝逻辑——会记录当前最小的cost,假如一个履行方案自上向下查找时,假如超过了这个cost,则直接回来。这便是在第3节提到的自顶向下的优化。

接下来咱们看一下memo.Implementation的定义:


package memo
import (
	plannercore "github.com/pingcap/tidb/planner/core"
)
// Implementation defines the interface for cost of physical plan.
type Implementation interface {
	CalcCost(outCount float64, children ...Implementation) float64
	SetCost(cost float64)
	GetCost() float64
	GetPlan() plannercore.PhysicalPlan
	// AttachChildren is used to attach children implementations and returns it self.
	AttachChildren(children ...Implementation) Implementation
	// GetCostLimit gets the costLimit for implementing the next childGroup.
	GetCostLimit(costLimit float64, children ...Implementation) float64
}

其间CalcCost办法便是用来核算物理履行方案的价值。总共有这么多结构体完成了它:

读TiDB源码聊规划:浅析HTAP的SQL优化器

5.3.1 价值的评价

咱们以最初的比如,讲解价值的评价。

select价值核算办法

// CalcCost calculates the cost of the table scan Implementation.
func (impl *TableScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {
	ts := impl.plan.(*plannercore.PhysicalTableScan)
	width := impl.tblColHists.GetTableAvgRowSize(impl.plan.SCtx(), impl.tblCols, kv.TiKV, true)
	sessVars := ts.SCtx().GetSessionVars()
	impl.cost = outCount * sessVars.GetScanFactor(ts.Table) * width
	if ts.Desc {
		impl.cost = outCount * sessVars.GetDescScanFactor(ts.Table) * width
	}
	return impl.cost
}
// GetScanFactor returns the session variable scanFactor
// returns 0 when tbl is a temporary table.
func (s *SessionVars) GetScanFactor(tbl *model.TableInfo) float64 {
	if tbl != nil {
		if tbl.TempTableType != model.TempTableNone {
			return 0
		}
	}
	return s.scanFactor 
}
// CalcCost implements Implementation interface.
func (impl *IndexScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {
	is := impl.plan.(*plannercore.PhysicalIndexScan)
	sessVars := is.SCtx().GetSessionVars()
	rowSize := impl.tblColHists.GetIndexAvgRowSize(is.SCtx(), is.Schema().Columns, is.Index.Unique)
	cost := outCount * rowSize * sessVars.GetScanFactor(is.Table)
	if is.Desc {
		cost = outCount * rowSize * sessVars.GetDescScanFactor(is.Table)
	}
	cost += float64(len(is.Ranges)) * sessVars.GetSeekFactor(is.Table)
	impl.cost = cost
	return impl.cost
}

这儿咱们以全表扫描表和射中索引的代码为比如。其间默许的scanFactor是1.5。这儿可以看到indexScan和tableScan少了一个因数:width。这个变量代表了所需列的平均巨细。这么看基本上是indexScan最优了。

这儿的代码笔者认为有点不高雅——当Desc时,其实之前的Cost是没必要算一遍的,糟蹋CPU资源。

join价值核算办法

// CalcCost implements Implementation CalcCost interface.
func (impl *HashJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {
	hashJoin := impl.plan.(*plannercore.PhysicalHashJoin)
	// The children here are only used to calculate the cost.
	hashJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
	selfCost := hashJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), false, 0, nil)
	impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
	return impl.cost
}
// CalcCost implements Implementation CalcCost interface.
func (impl *MergeJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {
	mergeJoin := impl.plan.(*plannercore.PhysicalMergeJoin)
	// The children here are only used to calculate the cost.
	mergeJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
	selfCost := mergeJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), 0)
	impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
	return impl.cost
}

具体的核算都在plan_cost_v1.go里:


// GetCost computes cost of hash join operator itself.
func (p *PhysicalHashJoin) GetCost(lCnt, rCnt float64, isMPP bool, costFlag uint64, op *physicalOptimizeOp) float64 {
	buildCnt, probeCnt := lCnt, rCnt
	build := p.children[0]
	// Taking the right as the inner for right join or using the outer to build a hash table.
	if (p.InnerChildIdx == 1 && !p.UseOuterToBuild) || (p.InnerChildIdx == 0 && p.UseOuterToBuild) {
		buildCnt, probeCnt = rCnt, lCnt
		build = p.children[1]
	}
	sessVars := p.ctx.GetSessionVars()
	oomUseTmpStorage := variable.EnableTmpStorageOnOOM.Load()
	memQuota := sessVars.MemTracker.GetBytesLimit() // sessVars.MemQuotaQuery && hint
	rowSize := getAvgRowSize(build.statsInfo(), build.Schema().Columns)
	spill := oomUseTmpStorage && memQuota > 0 && rowSize*buildCnt > float64(memQuota) && p.storeTp != kv.TiFlash
	// Cost of building hash table.
	cpuFactor := sessVars.GetCPUFactor()
	diskFactor := sessVars.GetDiskFactor()
	memoryFactor := sessVars.GetMemoryFactor()
	concurrencyFactor := sessVars.GetConcurrencyFactor()
	cpuCost := buildCnt * cpuFactor
	memoryCost := buildCnt * memoryFactor
	diskCost := buildCnt * diskFactor * rowSize
	// Number of matched row pairs regarding the equal join conditions.
	helper := &fullJoinRowCountHelper{
		sctx:            p.SCtx(),
		cartesian:       false,
		leftProfile:     p.children[0].statsInfo(),
		rightProfile:    p.children[1].statsInfo(),
		leftJoinKeys:    p.LeftJoinKeys,
		rightJoinKeys:   p.RightJoinKeys,
		leftSchema:      p.children[0].Schema(),
		rightSchema:     p.children[1].Schema(),
		leftNAJoinKeys:  p.LeftNAJoinKeys,
		rightNAJoinKeys: p.RightNAJoinKeys,
	}
	numPairs := helper.estimate()
	// For semi-join class, if `OtherConditions` is empty, we already know
	// the join results after querying hash table, otherwise, we have to
	// evaluate those resulted row pairs after querying hash table; if we
	// find one pair satisfying the `OtherConditions`, we then know the
	// join result for this given outer row, otherwise we have to iterate
	// to the end of those pairs; since we have no idea about when we can
	// terminate the iteration, we assume that we need to iterate half of
	// those pairs in average.
	if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
		p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {
		if len(p.OtherConditions) > 0 {
			numPairs *= 0.5
		} else {
			numPairs = 0
		}
	}
	if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {
		numPairs = getOperatorActRows(p)
	}
	// Cost of querying hash table is cheap actually, so we just compute the cost of
	// evaluating `OtherConditions` and joining row pairs.
	probeCost := numPairs * cpuFactor
	probeDiskCost := numPairs * diskFactor * rowSize
	// Cost of evaluating outer filter.
	if len(p.LeftConditions)+len(p.RightConditions) > 0 {
		// Input outer count for the above compution should be adjusted by SelectionFactor.
		probeCost *= SelectionFactor
		probeDiskCost *= SelectionFactor
		probeCost += probeCnt * cpuFactor
	}
	diskCost += probeDiskCost
	probeCost /= float64(p.Concurrency)
	// Cost of additional concurrent goroutines.
	cpuCost += probeCost + float64(p.Concurrency+1)*concurrencyFactor
	// Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
	if p.UseOuterToBuild {
		if spill {
			// It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
			cpuCost += buildCnt * cpuFactor
		} else {
			cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
		}
		diskCost += buildCnt * diskFactor * rowSize
	}
	if spill {
		memoryCost *= float64(memQuota) / (rowSize * buildCnt)
	} else {
		diskCost = 0
	}
	if op != nil {
		setPhysicalHashJoinCostDetail(p, op, spill, buildCnt, probeCnt, cpuFactor, rowSize, numPairs,
			cpuCost, probeCost, memoryCost, diskCost, probeDiskCost,
			diskFactor, memoryFactor, concurrencyFactor,
			memQuota)
	}
	return cpuCost + memoryCost + diskCost
}
// GetCost computes cost of merge join operator itself.
func (p *PhysicalMergeJoin) GetCost(lCnt, rCnt float64, costFlag uint64) float64 {
	outerCnt := lCnt
	innerCnt := rCnt
	innerKeys := p.RightJoinKeys
	innerSchema := p.children[1].Schema()
	innerStats := p.children[1].statsInfo()
	if p.JoinType == RightOuterJoin {
		outerCnt = rCnt
		innerCnt = lCnt
		innerKeys = p.LeftJoinKeys
		innerSchema = p.children[0].Schema()
		innerStats = p.children[0].statsInfo()
	}
	helper := &fullJoinRowCountHelper{
		sctx:          p.SCtx(),
		cartesian:     false,
		leftProfile:   p.children[0].statsInfo(),
		rightProfile:  p.children[1].statsInfo(),
		leftJoinKeys:  p.LeftJoinKeys,
		rightJoinKeys: p.RightJoinKeys,
		leftSchema:    p.children[0].Schema(),
		rightSchema:   p.children[1].Schema(),
	}
	numPairs := helper.estimate()
	if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
		p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {
		if len(p.OtherConditions) > 0 {
			numPairs *= 0.5
		} else {
			numPairs = 0
		}
	}
	if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {
		numPairs = getOperatorActRows(p)
	}
	sessVars := p.ctx.GetSessionVars()
	probeCost := numPairs * sessVars.GetCPUFactor()
	// Cost of evaluating outer filters.
	var cpuCost float64
	if len(p.LeftConditions)+len(p.RightConditions) > 0 {
		probeCost *= SelectionFactor
		cpuCost += outerCnt * sessVars.GetCPUFactor()
	}
	cpuCost += probeCost
	// For merge join, only one group of rows with same join key(not null) are cached,
	// we compute average memory cost using estimated group size.
	NDV, _ := getColsNDVWithMatchedLen(innerKeys, innerSchema, innerStats)
	memoryCost := (innerCnt / NDV) * sessVars.GetMemoryFactor()
	return cpuCost + memoryCost
}

HashJoin要考虑到内存不够的状况,因此在核算到数据不够时,会将对应的数据压入硬盘。但它对数据的次序并无要求,因此可以并发的去做。见:

	// Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
	if p.UseOuterToBuild {
		if spill {
			// It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
			cpuCost += buildCnt * cpuFactor
		} else {
			cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
		}
		diskCost += buildCnt * diskFactor * rowSize
	}

而MergeJoin的价值显然会更小,但可以选则到这个方案也有较高的要求:当两个关联表要 Join 的字段需求按排好的次序读取时,适用 Merge Join 算法。

5.4 老版别 物理履行方案的挑选

5.4.1 价值的评价

这块代码首要是在plan_cost_ver1.goplan_cost_ver2.go。v2对价值公式进行了更精确的回归校准,调整了部分价值公式,比此前版别的价值公式愈加精确。代码上也更为简练:v2只暴露出了一个公共办法,内部通过不同的类型做转发。

// GetPlanCost returns the cost of this plan.
func GetPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {
	return getPlanCost(p, taskType, option)
}
func getPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {
	if p.SCtx().GetSessionVars().CostModelVersion == modelVer2 {
		planCost, err := p.getPlanCostVer2(taskType, option)
		return planCost.cost, err
	}
	return p.getPlanCostVer1(taskType, option)
}

依据不同的PhysicalPlan类型,会找到不同绑定办法:

读TiDB源码聊规划:浅析HTAP的SQL优化器

v1的部分办法展示:

读TiDB源码聊规划:浅析HTAP的SQL优化器

select价值核算办法

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = child-cost + filter-cost
func (p *PhysicalSelection) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	inputRows := getCardinality(p.children[0], option.CostFlag)
	cpuFactor := getTaskCPUFactorVer2(p, taskType)
	filterCost := filterCostVer2(option, inputRows, p.Conditions, cpuFactor)
	childCost, err := p.children[0].getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	p.planCostVer2 = sumCostVer2(filterCost, childCost)
	p.planCostInit = true
	return p.planCostVer2, nil
}

这部分代码简略易读。价值便是子查询的价值+筛选的价值。

那么问题来了,中索引的和不中索引的价值应该是不一样的。这儿没有体现出来啊。仔细看childCost, err := p.children[0].getPlanCostVer2(taskType, option),这儿是会去获取子物理履行方案的价值。

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
func (p *PointGetPlan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	if p.accessCols == nil { // from fast plan code path
		p.planCostVer2 = zeroCostVer2
		p.planCostInit = true
		return zeroCostVer2, nil
	}
	rowSize := getAvgRowSize(p.stats, p.schema.Columns)
	netFactor := getTaskNetFactorVer2(p, taskType)
	p.planCostVer2 = netCostVer2(option, 1, rowSize, netFactor)
	p.planCostInit = true
	return p.planCostVer2, nil
}
func netCostVer2(option *PlanCostOption, rows, rowSize float64, netFactor costVer2Factor) costVer2 {
	return newCostVer2(option, netFactor,
		rows*rowSize*netFactor.Value,
		func() string { return fmt.Sprintf("net(%v*rowsize(%v)*%v)", rows, rowSize, netFactor) })
}
// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = rows * log2(row-size) * scan-factor
// log2(row-size) is from experiments.
func (p *PhysicalTableScan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	rows := getCardinality(p, option.CostFlag)
	var rowSize float64
	if p.StoreType == kv.TiKV {
		rowSize = getAvgRowSize(p.stats, p.tblCols) // consider all columns if TiKV
	} else { // TiFlash
		rowSize = getAvgRowSize(p.stats, p.schema.Columns)
	}
	rowSize = math.Max(rowSize, 2.0)
	scanFactor := getTaskScanFactorVer2(p, p.StoreType, taskType)
	p.planCostVer2 = scanCostVer2(option, rows, rowSize, scanFactor)
	// give TiFlash a start-up cost to let the optimizer prefers to use TiKV to process small table scans.
	if p.StoreType == kv.TiFlash {
		p.planCostVer2 = sumCostVer2(p.planCostVer2, scanCostVer2(option, 10000, rowSize, scanFactor))
	}
	p.planCostInit = true
	return p.planCostVer2, nil
}
func scanCostVer2(option *PlanCostOption, rows, rowSize float64, scanFactor costVer2Factor) costVer2 {
	if rowSize < 1 {
		rowSize = 1
	}
	return newCostVer2(option, scanFactor,
		// rows * log(row-size) * scanFactor, log2 from experiments
		rows*math.Log2(rowSize)*scanFactor.Value,
		func() string { return fmt.Sprintf("scan(%v*logrowsize(%v)*%v)", rows, rowSize, scanFactor) })
}

scanFactor的价值默许是40.7,netFactor的价值默许是3.96。结合代码来看,射中索引的价值更优。

join价值核算办法

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = build-child-cost + build-filter-cost +
// (probe-cost + probe-filter-cost) / concurrency
// probe-cost = probe-child-cost * build-rows / batchRatio
func (p *PhysicalIndexJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 0)
}
func (p *PhysicalIndexHashJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 1)
}
func (p *PhysicalIndexMergeJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {
	return p.getIndexJoinCostVer2(taskType, option, 2)
}
func (p *PhysicalIndexJoin) getIndexJoinCostVer2(taskType property.TaskType, option *PlanCostOption, indexJoinType int) (costVer2, error) {
	if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {
		return p.planCostVer2, nil
	}
	build, probe := p.children[1-p.InnerChildIdx], p.children[p.InnerChildIdx]
	buildRows := getCardinality(build, option.CostFlag)
	buildRowSize := getAvgRowSize(build.Stats(), build.Schema().Columns)
	probeRowsOne := getCardinality(probe, option.CostFlag)
	probeRowsTot := probeRowsOne * buildRows
	probeRowSize := getAvgRowSize(probe.Stats(), probe.Schema().Columns)
	buildFilters, probeFilters := p.LeftConditions, p.RightConditions
	probeConcurrency := float64(p.ctx.GetSessionVars().IndexLookupJoinConcurrency())
	cpuFactor := getTaskCPUFactorVer2(p, taskType)
	memFactor := getTaskMemFactorVer2(p, taskType)
	requestFactor := getTaskRequestFactorVer2(p, taskType)
	buildFilterCost := filterCostVer2(option, buildRows, buildFilters, cpuFactor)
	buildChildCost, err := build.getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	buildTaskCost := newCostVer2(option, cpuFactor,
		buildRows*10*cpuFactor.Value,
		func() string { return fmt.Sprintf("cpu(%v*10*%v)", buildRows, cpuFactor) })
	startCost := newCostVer2(option, cpuFactor,
		10*3*cpuFactor.Value,
		func() string { return fmt.Sprintf("cpu(10*3*%v)", cpuFactor) })
	probeFilterCost := filterCostVer2(option, probeRowsTot, probeFilters, cpuFactor)
	probeChildCost, err := probe.getPlanCostVer2(taskType, option)
	if err != nil {
		return zeroCostVer2, err
	}
	var hashTableCost costVer2
	switch indexJoinType {
	case 1: // IndexHashJoin
		hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
	case 2: // IndexMergeJoin
		hashTableCost = newZeroCostVer2(traceCost(option))
	default: // IndexJoin
		hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
	}
	// IndexJoin executes a batch of rows at a time, so the actual cost of this part should be
	//  `innerCostPerBatch * numberOfBatches` instead of `innerCostPerRow * numberOfOuterRow`.
	// Use an empirical value batchRatio to handle this now.
	// TODO: remove this empirical value.
	batchRatio := 6.0
	probeCost := divCostVer2(mulCostVer2(probeChildCost, buildRows), batchRatio)
	// Double Read Cost
	doubleReadCost := newZeroCostVer2(traceCost(option))
	if p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate > 0 {
		batchSize := float64(p.ctx.GetSessionVars().IndexJoinBatchSize)
		taskPerBatch := 1024.0 // TODO: remove this magic number
		doubleReadTasks := buildRows / batchSize * taskPerBatch
		doubleReadCost = doubleReadCostVer2(option, doubleReadTasks, requestFactor)
		doubleReadCost = mulCostVer2(doubleReadCost, p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate)
	}
	p.planCostVer2 = sumCostVer2(startCost, buildChildCost, buildFilterCost, buildTaskCost, divCostVer2(sumCostVer2(doubleReadCost, probeCost, probeFilterCost, hashTableCost), probeConcurrency))
	p.planCostInit = true
	return p.planCostVer2, nil
}

关键在于:

	switch indexJoinType {
	case 1: // IndexHashJoin
		hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
	case 2: // IndexMergeJoin
		hashTableCost = newZeroCostVer2(traceCost(option))
	default: // IndexJoin
		hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
	}

对应办法:

func hashBuildCostVer2(option *PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costVer2Factor) costVer2 {
	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
	hashKeyCost := newCostVer2(option, cpuFactor,
		buildRows*nKeys*cpuFactor.Value,
		func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })
	hashMemCost := newCostVer2(option, memFactor,
		buildRows*buildRowSize*memFactor.Value,
		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
	hashBuildCost := newCostVer2(option, cpuFactor,
		buildRows*cpuFactor.Value,
		func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })
	return sumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)
}
func newZeroCostVer2(trace bool) (ret costVer2) {
	if trace {
		ret.trace = &costTrace{make(map[string]float64), ""}
	}
	return
}

简略的看一下代码,咱们可以发现,从大多数的场景来看,依照价值从小到大来排,这几种Join是MergeJoin<HashJoin<IndexJoin。

5.4.2履行方案枚举与择优

总得来说这块代码较为简略,实质便是枚举一切符合条件的物理履行方案,并挑选出价值最小的履行方案,故不再列举代码。有兴趣的同学可以依据以下纲要自行翻阅:

|--planner/core/find_best_task.go
-- findBestTask
-- enumeratePhysicalPlans4Task
-- compareTaskCost
-- getTaskPlanCost
|-- planner/core/plan_cost_ver2.go
-- getPlanCost

6.其他

6.1 参阅与引用的文章

6.2 常识补充:code generation && vectorized execution

数据库引擎履行器中十分知名的两种优化办法,code generation和 vectorized execution。

code generation首要是依据上下文来生成一整段优化过的代码,这与那种嵌套大量if…else、for循环、虚办法的代码彻底相反,彻底面向功能考虑。

vectorized execution依据拉模型。相比于一次拉一个tuple来说,它的批量拉取减少了多次拉取的开销,一起还可以使用到SIMD。依据这种场景,vectorized execution的优化愈加适用于列式数据库。

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。