1. Overview

Trino on Kubernetes combines the Trino query engine with the Kubernetes container orchestration platform, providing a solution for deploying, managing, and running Trino on a Kubernetes cluster.

Trino (formerly Presto SQL) is a high-performance distributed SQL query engine designed for large datasets and complex queries. Kubernetes is a popular open-source container orchestration platform that automates the deployment, scaling, and management of containers.

Deploying Trino on Kubernetes brings several advantages:

  • Elastic scaling: Kubernetes provides automated container scaling and can add or remove Trino instances based on workload. The cluster can therefore scale with changes in query load, improving performance and resource utilization.

  • High availability: Kubernetes offers fault tolerance and failure recovery. By running multiple Trino instances in a Kubernetes cluster, you get a highly available architecture: when one instance fails, the others take over its work and keep the system available.

  • Resource management: Kubernetes provides resource scheduling and management, controlling the compute, storage, and network resources Trino instances use. With appropriate resource requests and limits, you can effectively manage the resource consumption of Trino queries and avoid contention.

  • Simplified deployment and management: Kubernetes's declarative configuration and automated deployment mechanisms simplify operating Trino. Using standard Kubernetes tools and APIs, you can easily create, configure, and monitor Trino instances.

  • Ecosystem integration: Kubernetes has a rich ecosystem and integrates smoothly with other tools and platforms, such as storage systems (Hadoop HDFS, Amazon S3) and other data processing engines (Apache Spark), enabling seamless data access and processing.

Note that running Trino on Kubernetes requires proper configuration and tuning to ensure performance and reliability. For large-scale or complex query workloads, you may also need to consider optimizations such as data sharding, partitioning, and data locality.

In short, Trino on Kubernetes provides a flexible, scalable, and efficient way to deploy and manage the Trino query engine, making it a better fit for query workloads in big-data environments.
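The elastic-scaling advantage above maps to a concrete switch in the Helm chart used later in this article. A hedged sketch of turning it on in values.yaml (the keys follow the chart's `server.autoscaling` block shown below; the thresholds are illustrative, and the cluster needs metrics-server for the HPA to work):

```yaml
server:
  workers: 2                            # baseline number of worker replicas
  autoscaling:
    enabled: true                       # create a HorizontalPodAutoscaler for the workers
    maxReplicas: 5                      # upper bound under heavy query load
    targetCPUUtilizationPercentage: 50  # scale out when average CPU passes 50%
```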

This article only covers the deployment process. To learn more about Trino, see my earlier articles:

  • Big Data Hadoop series — Presto, an in-memory SQL query engine (Presto/Trino environment deployment)
  • [Big Data] Presto (Trino) advanced SQL syntax
  • [Big Data] Presto (Trino) REST API and execution plans
  • [Big Data] Presto (Trino) configuration parameters and SQL syntax

For a single-node container deployment, see my article: [Big Data] Quickly deploying Presto (Trino) with docker-compose — a step-by-step tutorial

Trino on k8s: orchestrated deployment, advanced edition

2. Kubernetes environment deployment

I won't repeat how to deploy the k8s environment here; the focus of this article is Trino on k8s. If you don't know how to set up a k8s environment, see these articles:

  • [Cloud Native] Quick k8s environment deployment (done within an hour)
  • [Cloud Native] Offline k8s deployment explained, with hands-on practice

3. Orchestrating and deploying Trino

1) Build the image (Dockerfile)

FROM registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
# Use ENV instead of `RUN export`, which does not persist across image layers
ENV LANG zh_CN.UTF-8
# Create the user and group; must match `user: 10000:10000` in the YAML orchestration
RUN groupadd --system --gid=10000 hadoop && useradd --system --home-dir /home/hadoop --uid=10000 --gid=hadoop hadoop -m
# Install sudo
RUN yum -y install sudo ; chmod 640 /etc/sudoers
# Grant the hadoop user passwordless sudo
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
RUN yum -y install net-tools telnet wget nc
RUN mkdir /opt/apache/
# Add and configure the JDK
ADD zulu20.30.11-ca-jdk20.0.1-linux_x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/zulu20.30.11-ca-jdk20.0.1-linux_x64
ENV PATH $JAVA_HOME/bin:$PATH
# Add and configure the Trino server
ENV TRINO_VERSION 416
ADD trino-server-${TRINO_VERSION}.tar.gz /opt/apache/
ENV TRINO_HOME /opt/apache/trino
RUN ln -s /opt/apache/trino-server-${TRINO_VERSION} $TRINO_HOME
# Create the config directory and the catalog directory for data sources
RUN mkdir -p ${TRINO_HOME}/etc/catalog
# Add the Trino CLI
COPY trino-cli-416-executable.jar $TRINO_HOME/bin/trino-cli
# Copy bootstrap.sh
COPY bootstrap.sh /opt/apache/
RUN chmod +x /opt/apache/bootstrap.sh ${TRINO_HOME}/bin/trino-cli
RUN chown -R hadoop:hadoop /opt/apache
WORKDIR $TRINO_HOME
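Every ADD/COPY instruction in the Dockerfile must resolve against the build context, so the following files need to sit next to the Dockerfile before building (names taken from the ADD/COPY lines above):

```shell
# Files the ADD/COPY instructions above expect in the build context:
context_files="Dockerfile
bootstrap.sh
zulu20.30.11-ca-jdk20.0.1-linux_x64.tar.gz
trino-server-416.tar.gz
trino-cli-416-executable.jar"
# If any of the five is missing, `docker build` fails with a "not found" error
printf '%s\n' "$context_files" | wc -l
```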

Contents of the bootstrap.sh script:

#!/usr/bin/env sh
# Optionally wait for a host:port (e.g. the coordinator) to accept connections before starting
wait_for() {
    if [ -n "$1" ] && [ -n "$2" ]; then
       echo "Waiting for $1 to listen on $2..."
       while ! nc -z "$1" "$2"; do echo waiting...; sleep 1; done
    fi
}
start_trino() {
   wait_for "$1" "$2"
   ${TRINO_HOME}/bin/launcher run --verbose
}
# $1 selects the role; $2/$3 are an optional host/port to wait for (e.g. the coordinator)
case $1 in
        trino-coordinator)
                start_trino "$2" "$3"
                ;;
        trino-worker)
                start_trino "$2" "$3"
                ;;
        *)
                echo "Usage: $0 {trino-coordinator|trino-worker} [wait_host wait_port]"
        ;;
esac
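With the dispatch above, the container's entrypoint picks the role to start. As a hedged sketch (the pod spec below is illustrative, not the chart's actual template), a worker pod might invoke the script like this so it waits for the coordinator before launching:

```yaml
# Illustrative container spec for a worker pod (names and paths follow this article's image)
containers:
  - name: trino-worker
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/trino-k8s:416
    command: ["/opt/apache/bootstrap.sh"]
    # first arg selects the role; the optional host/port tell the script what to wait for
    args: ["trino-worker", "trino-coordinator", "8080"]
```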

Build the image:

docker build -t registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/trino-k8s:416 . --no-cache
### Option reference
# -t: image name (and tag)
# . : build context is the current directory (the Dockerfile lives here)
# -f: path to a Dockerfile (not needed here, since the default name is used)
# --no-cache: do not use the build cache

2) values.yaml configuration

# Default values for trino.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
image:
  repository: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/trino-k8s
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart version.
  tag: 416
imagePullSecrets:
  - name: registry-credentials
server:
  workers: 1
  node:
    environment: production
    dataDir: /opt/apache/trino/data
    pluginDir: /opt/apache/trino/plugin
  log:
    trino:
      level: INFO
  config:
    path: /opt/apache/trino/etc
    http:
      port: 8080
    https:
      enabled: false
      port: 8443
      keystore:
        path: ""
    # Trino supports multiple authentication types: PASSWORD, CERTIFICATE, OAUTH2, JWT, KERBEROS
    # For more info: https://trino.io/docs/current/security/authentication-types.html
    authenticationType: ""
    query:
      maxMemory: "1GB"
      maxMemoryPerNode: "512MB"
    memory:
      heapHeadroomPerNode: "512MB"
  exchangeManager:
    name: "filesystem"
    baseDir: "/tmp/trino-local-file-system-exchange-manager"
  workerExtraConfig: ""
  coordinatorExtraConfig: ""
  autoscaling:
    enabled: false
    maxReplicas: 5
    targetCPUUtilizationPercentage: 50
accessControl: {}
  # type: configmap
  # refreshPeriod: 60s
  # # Rules file is mounted to /etc/trino/access-control
  # configFile: "rules.json"
  # rules:
  #   rules.json: |-
  #     {
  #       "catalogs": [
  #         {
  #           "user": "admin",
  #           "catalog": "(mysql|system)",
  #           "allow": "all"
  #         },
  #         {
  #           "group": "finance|human_resources",
  #           "catalog": "postgres",
  #           "allow": true
  #         },
  #         {
  #           "catalog": "hive",
  #           "allow": "all"
  #         },
  #         {
  #           "user": "alice",
  #           "catalog": "postgresql",
  #           "allow": "read-only"
  #         },
  #         {
  #           "catalog": "system",
  #           "allow": "none"
  #         }
  #       ],
  #       "schemas": [
  #         {
  #           "user": "admin",
  #           "schema": ".*",
  #           "owner": true
  #         },
  #         {
  #           "user": "guest",
  #           "owner": false
  #         },
  #         {
  #           "catalog": "default",
  #           "schema": "default",
  #           "owner": true
  #         }
  #       ]
  #     }
additionalNodeProperties: {}
additionalConfigProperties: {}
additionalLogProperties: {}
additionalExchangeManagerProperties: {}
eventListenerProperties: {}
#additionalCatalogs: {}
additionalCatalogs:
  mysql: |-
    connector.name=mysql
    connection-url=jdbc:mysql://mysql-primary.mysql:3306
    connection-user=root
    connection-password=WyfORdvwVm
  hive: |-
    connector.name=hive
    hive.metastore.uri=thrift://hadoop-hadoop-hive-metastore.hadoop:9083
    hive.allow-drop-table=true
    hive.allow-rename-table=true
    #hive.config.resources=/tmp/core-site.xml,/tmp/hdfs-site.xml
# Array of EnvVar (https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#envvar-v1-core)
env: []
initContainers: {}
  # coordinator:
  #   - name: init-coordinator
  #     image: busybox:1.28
  #     imagePullPolicy: IfNotPresent
  #     command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  # worker:
  #   - name: init-worker
  #     image: busybox:1.28
  #     command: ['sh', '-c', 'echo The worker is running! && sleep 3600']
securityContext:
  runAsUser: 10000
  runAsGroup: 10000
service:
  #type: ClusterIP
  type: NodePort
  port: 8080
  nodePort: 31880
nodeSelector: {}
tolerations: []
affinity: {}
auth: {}
  # Set username and password
  # https://trino.io/docs/current/security/password-file.html#file-format
  # passwordAuth: "username:encrypted-password-with-htpasswd"
serviceAccount:
  # Specifies whether a service account should be created
  create: false
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""
  # Annotations to add to the service account
  annotations: {}
secretMounts: []
coordinator:
  jvm:
    maxHeapSize: "2G"
    gcMethod:
      type: "UseG1GC"
      g1:
        heapRegionSize: "32M"
  additionalJVMConfig: {}
  resources: {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
worker:
  jvm:
    maxHeapSize: "2G"
    gcMethod:
      type: "UseG1GC"
      g1:
        heapRegionSize: "32M"
  additionalJVMConfig: {}
  resources: {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
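The memory values above are related: Trino requires query.max-memory-per-node to fit inside the JVM heap, with heap headroom reserved on top of it. A sketch of the per-node properties these values would translate into (the key names follow Trino's documented memory properties; the chart templates write the actual files):

```properties
# Per-node memory properties implied by the values above (sketch)
query.max-memory=1GB                 # server.config.query.maxMemory (cluster-wide per-query limit)
query.max-memory-per-node=512MB      # must stay well below jvm.maxHeapSize (2G here)
memory.heap-headroom-per-node=512MB  # heap reserved for allocations Trino does not track
```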

3) Trino catalog ConfigMap YAML

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "trino.catalog" . }}
  labels:
    app: {{ template "trino.name" . }}
    chart: {{ template "trino.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
{{- range $catalogName, $catalogProperties := .Values.additionalCatalogs }}
  {{ $catalogName }}.properties: |
    {{- $catalogProperties | nindent 4 }}
{{- end }}
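For the mysql entry defined in values.yaml above, the range loop renders an additional data key; you can check the result locally with `helm template .`. The rendered ConfigMap data would look roughly like this (a sketch of the template output, not captured from a live cluster):

```yaml
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  mysql.properties: |
    connector.name=mysql
    connection-url=jdbc:mysql://mysql-primary.mysql:3306
    connection-user=root
    connection-password=WyfORdvwVm
```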

Only the core deployment configuration is listed here; a git download link is provided at the end. If you have any questions, feel free to leave a comment or message me.

4) Install

cd trino-on-kubernetes
# Install
helm install trino ./ -n trino --create-namespace
# Upgrade
# helm upgrade trino ./ -n trino
# Uninstall
# helm uninstall trino -n trino
# Check
kubectl get pods,svc -n trino


5) Test and verify

coordinator_name=`kubectl get pods -n trino|grep coordinator|awk '{print $1}'`
# Log in
kubectl exec -it $coordinator_name -n trino -- /opt/apache/trino/bin/trino-cli --server http://trino-coordinator:8080 --catalog=hive --schema=default --user=hadoop
# List catalogs
show catalogs;
select * from system.runtime.nodes;
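The coordinator pod name is extracted with a grep/awk pipeline. A self-contained sketch of what that pipeline does, run against sample output (the pod names here are illustrative; real names come from `kubectl get pods -n trino`):

```shell
# Sample `kubectl get pods -n trino` output (illustrative pod names)
sample='NAME                                READY   STATUS    RESTARTS   AGE
trino-coordinator-6c54fd9d9-x2kqv   1/1     Running   0          5m
trino-worker-7b9c8d7f6b-h4n2p       1/1     Running   0          5m'
# Same pipeline as above: keep the coordinator line, print its first column
coordinator_name=$(printf '%s\n' "$sample" | grep coordinator | awk '{print $1}')
echo "$coordinator_name"   # trino-coordinator-6c54fd9d9-x2kqv
```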


4. Configuring the Hive data source on k8s

For Hive on k8s, see my article: Hadoop on k8s — a quick, streamlined deployment guide

Add the data sources in the trino-on-kubernetes/values.yaml file.


Update the configuration and restart the Trino pods:

helm upgrade trino ./ -n trino
# Restart: ConfigMap changes are not reloaded dynamically, so the pods must be restarted to take effect
kubectl delete pod -n trino `kubectl get pods -n trino|awk 'NR!=1{print $1}'`
coordinator_name=`kubectl get pods -n trino|grep coordinator|awk '{print $1}'`
# Log in
kubectl exec -it $coordinator_name -n trino -- /opt/apache/trino/bin/trino-cli --server http://trino-coordinator:8080 --catalog=hive --schema=default --user=hadoop
# List catalogs
show catalogs;
# List schemas in the hive catalog
show schemas from hive;
# List tables
show tables from hive.default;
create schema hive.test;
# Create a table
CREATE TABLE hive.test.movies (
  movie_id bigint,
  title varchar,
  rating real, -- real is similar to a float type
  genres varchar,
  release_year int
)
WITH (
  format = 'ORC',
  partitioned_by = ARRAY['release_year'] -- note: partition columns must come last in the column list above
);
# Load data into the Hive table
INSERT INTO hive.test.movies
VALUES 
(1, 'Toy Story', 8.3, 'Animation|Adventure|Comedy', 1995), 
(2, 'Jumanji', 6.9, 'Action|Adventure|Family', 1995), 
(3, 'Grumpier Old Men', 6.5, 'Comedy|Romance', 1995);
# Query the data
select * from hive.test.movies;


5. Quick deployment: core steps (jump here if you only care about deployment)

If you just want a quick deployment, you can skip everything above and run the steps below:

1) Install git

# Install git
yum -y install git

2) Download the Trino deployment package

git clone git@github.com:HBigdata/trino-on-kubernetes.git
cd trino-on-kubernetes

3) Configure the data sources

cat -n values.yaml


4) Configure resource requests and limits


5) Modify the Trino configuration

JVM memory configuration

6) Deploy

# git clone git@github.com:HBigdata/trino-on-kubernetes.git
# cd trino-on-kubernetes
# Install
helm install trino ./ -n trino --create-namespace
# Upgrade
helm upgrade trino ./ -n trino
# Uninstall
helm uninstall trino -n trino

7) Test and verify

coordinator_name=`kubectl get pods -n trino|grep coordinator|awk '{print $1}'`
# Log in
kubectl exec -it $coordinator_name -n trino -- /opt/apache/trino/bin/trino-cli --server http://trino-coordinator:8080 --catalog=hive --schema=default --user=hadoop
# List catalogs
show catalogs;
# List schemas in the hive catalog
show schemas from hive;
# List tables
show tables from hive.default;
create schema hive.test;
# Create a table
CREATE TABLE hive.test.movies (
  movie_id bigint,
  title varchar,
  rating real, -- real is similar to a float type
  genres varchar,
  release_year int
)
WITH (
  format = 'ORC',
  partitioned_by = ARRAY['release_year'] -- note: partition columns must come last in the column list above
);
# Load data into the Hive table
INSERT INTO hive.test.movies
VALUES 
(1, 'Toy Story', 8.3, 'Animation|Adventure|Comedy', 1995), 
(2, 'Jumanji', 6.9, 'Action|Adventure|Family', 1995), 
(3, 'Grumpier Old Men', 6.5, 'Comedy|Romance', 1995);
# Query the data
select * from hive.test.movies;


That completes the Trino on k8s deployment and the usability demo. If you have any questions, follow my public account 大数据与云原生技能共享 (Big Data & Cloud Native Sharing) to join the group, or message me directly. If this article helped you, please like, share, and bookmark~
