Preface

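This is a record of the pitfalls I hit while installing sealos. I could find almost no comparable, reliable material online, perhaps because the topic is too niche. I hope it helps anyone who finds this post.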

What is sealos

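Sealos is a cloud operating system distribution with Kubernetes as its kernel. A single-machine operating system such as a Linux distribution lets you install and use single-machine applications like PPT, Word, and Excel. A cloud operating system simply replaces those single-machine applications with cloud applications such as databases, object storage, and message queues, all of which are distributed and highly available. Sealos is a cloud operating system that can run all kinds of distributed applications. With Sealos, you have a cloud of your own.
For the main material, see Introduction | sealos; I will not repeat it here.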

Resources

  • Introduction | sealos
  • labring/sealos: Sealos is a Kubernetes distribution, a general-purpose Cloud Operating System designed for managing cloud-native applications. Demo: https://cloud.sealos.io (github.com)
  • sealerio/sealer: Build, Share and Run Both Your Kubernetes Cluster and Distributed Applications (Project under CNCF) (github.com)
  • The base images used can be found here: labring’s Profile | Docker Hub
  • Install version 4.1.7 github.com/labring/sea…
  • Install the crictl command github.com/kubernetes-…
  • Command reference: Kubernetes lifecycle management | sealos

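Architecture

  • The official material contains no architecture diagram, so it has to be distilled from the code
  • Read the code to understand the design patterns, the code architecture, and the basic operations and their implementation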

Installation

Official procedure

sealos 4.0

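# Read before installing
1. Currently only the root user is supported; non-root users and sudo are not
2. Currently the install command can only be run on a node inside the cluster
3. Uninstall any previously installed docker first
4. Offline k8s packages for version 3.0 cannot be installed with sealos 4.0
5. If the password passed to the run command contains special characters, wrap it in single quotes
6. Offline installation example (4.0):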
---
# Package the images; run on a machine with internet access
sealos pull labring/kubernetes:v1.24.0
sealos pull labring/calico:v3.22.1
sealos save -o kubernetes.tar labring/kubernetes:v1.24.0
sealos save -o calico.tar labring/calico:v3.22.1
---
# Load the images; run on the intranet machine
sealos load -i kubernetes.tar
sealos load -i calico.tar
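
The pull and save steps above repeat the same pattern for every image, so they can be scripted. The following is a minimal sketch, assuming each archive is named after the last path component of the image, as in the example above. `archive_name`, `save_images`, and the `SEALOS` override variable are hypothetical helpers, not part of sealos.

```shell
# Hypothetical helpers; only `sealos pull` and `sealos save -o`
# are taken from the example above.

# Derive an archive name from an image reference,
# e.g. labring/kubernetes:v1.24.0 -> kubernetes.tar
archive_name() {
    name=${1##*/}       # drop registry and namespace
    name=${name%%:*}    # drop the tag
    printf '%s.tar\n' "$name"
}

# Pull each image and save it to its own archive; stop at the first failure.
# SEALOS may be overridden for a dry run (assumption); it defaults to `sealos`.
save_images() {
    for image in "$@"; do
        "${SEALOS:-sealos}" pull "$image" &&
            "${SEALOS:-sealos}" save -o "$(archive_name "$image")" "$image" ||
            return 1
    done
}
```

On the machine with internet access this would be called as `save_images labring/kubernetes:v1.24.0 labring/calico:v3.22.1`.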

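Hosts

Host           Purpose
10.55.10.107   Planned as the sealos installation machine and the master node
10.55.10.106   Node 1
10.55.10.97    Node 2

Optionally set up passwordless SSH between the hosts to make troubleshooting easier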

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
vim ~/.ssh/authorized_keys # add the keys
vim /etc/ssh/sshd_config # allow root login: PermitRootLogin yes
systemctl restart sshd
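
Instead of editing authorized_keys by hand on every host, the public key can be pushed with the standard `ssh-copy-id` tool. This is a sketch only: `copy_key_to_hosts` and the `SSH_COPY_ID` override are hypothetical names, and it assumes root password login is still enabled on the target nodes.

```shell
# Hypothetical helper: push one public key to root@<host> for each host given.
# SSH_COPY_ID may be overridden for a dry run (assumption); default is ssh-copy-id.
copy_key_to_hosts() {
    key=$1
    shift
    for host in "$@"; do
        "${SSH_COPY_ID:-ssh-copy-id}" -i "$key" "root@$host" || return 1
    done
}
```

For the hosts listed above: `copy_key_to_hosts ~/.ssh/id_rsa.pub 10.55.10.106 10.55.10.97`.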

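Pre-checks and file preparation

# On this host only the mounted /data01 disk supports overlay, so the installation was never going to be as simple as the official docs above suggest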
[root@test-d-010055010107 data01]# xfs_info  /data01
meta-data=/dev/vdb               isize=512    agcount=4, agsize=5242880 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=20971520, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=10240, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
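
The field that matters in the xfs_info output above is `ftype`: overlay needs `ftype=1`. A small sketch of how that check could be automated follows. `xfs_ftype` and `supports_overlay` are hypothetical helper names, and the parsing assumes the output format shown above.

```shell
# Hypothetical helper: read xfs_info output on stdin and print the ftype value.
xfs_ftype() {
    sed -n 's/.*ftype=\([01]\).*/\1/p' | head -n 1
}

# Succeeds only if the XFS filesystem mounted at $1 was formatted with ftype=1.
supports_overlay() {
    [ "$(xfs_info "$1" 2>/dev/null | xfs_ftype)" = "1" ]
}
```

For example, `supports_overlay /data01 && echo "overlay ok"` could be run on each node before installing.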

# File preparation: pull from the remote registry, then save as image archives
ctr image import kubernetes.tar
ctr image import calico.tar
ctr images export calico.tar docker.io/labring/calico:v3.22.1
wget https://github.com/labring/sealos/releases/download/v4.1.4/sealos_4.1.4_linux_amd64.tar.gz \
   && tar zxvf sealos_4.1.4_linux_amd64.tar.gz sealos && chmod +x sealos && mv sealos /usr/bin

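# sealos_4.1.4 and sealos_4.1.7 differ in their Global Flags, and 4.1.4 has a bug that prevents a normal cluster deployment on this host, so 4.1.7 is required

Single-node installation

# Hit a filesystem format problem; the root directory has to be specified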
[root@test-d-010055010107 data01]# ./sealos run
Error: kernel does not support overlay fs: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type is not supported.: driver not supported
kernel does not support overlay fs: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type is not supported.: driver not supported

# Loading the image archive fails; the archive format has to be specified
[root@test-d-010055010107 data01]# sealos --root /data01/ --runroot /data01/ load -i kubernetes.tar
Error: loading index: open /var/tmp/oci1097864579/index.json: no such file or directory
loading index: open /var/tmp/oci1097864579/index.json: no such file or directory

# Common commands
mkdir /data01/sealos
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker load -i calico.tar -t docker-archive
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker load -i new-kubernetes.tar -t oci-archive
sealos load --help
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker run localhost/labring/kuberentes:v1.24 --single # running by image name fails, so use the image ID directly
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker run 133c6a0a0d5f --single
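
The `index.json` error above suggests the archive was being read as an OCI archive by default. Which `-t` value to pass can be guessed from the archive contents. The sketch below assumes that an OCI archive has `index.json` at its top level and a `docker save` style archive has `manifest.json`. `archive_transport` is a hypothetical helper, and an archive containing both files is treated as OCI here.

```shell
# Hypothetical helper: print the transport to pass to `sealos load -t`
# based on which metadata file sits at the top level of the tar archive.
archive_transport() {
    listing=$(tar -tf "$1") || return 1
    if printf '%s\n' "$listing" | grep -qx -e 'index.json' -e './index.json'; then
        echo oci-archive
    elif printf '%s\n' "$listing" | grep -qx -e 'manifest.json' -e './manifest.json'; then
        echo docker-archive
    else
        echo "cannot determine archive format: $1" >&2
        return 1
    fi
}
```

It could then be combined with the load command above, for example `sealos load -i calico.tar -t "$(archive_transport calico.tar)"`.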

# Reset the installation
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker reset

# Shorten the command
alias s="sealos --debug --root /data01/sealos --runroot /data01/sealos/docker "
s run 133c6a0a0d5f --single
[root@test-d-010055010107 sealos]# s images
REPOSITORY                     TAG       IMAGE ID       CREATED        SIZE
docker.io/labring/kubernetes   v1.24     133c6a0a0d5f   10 days ago    635 MB
docker.io/labring/helm         v3.8.2    1123e8b4b455   7 months ago   45.1 MB
docker.io/labring/calico       v3.22.1   29516dc98b4b   9 months ago   546 MB
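
An alias like the one above is convenient interactively, but bash does not expand aliases in non-interactive scripts unless `expand_aliases` is set. A shell function is a possible alternative. This is a sketch: the name `seal` and the `SEALOS` override are hypothetical, and the function should not be named `s` while the alias `s` is still defined.

```shell
# Hypothetical wrapper with the same fixed flags as the alias above.
# SEALOS may be overridden for a dry run (assumption); default is `sealos`.
seal() {
    "${SEALOS:-sealos}" --debug --root /data01/sealos --runroot /data01/sealos/docker "$@"
}
```

`seal images` or `seal reset` then behave like `s images` and `s reset`.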

# sealos version must >= v4.1.0
s reset
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b --single

# Started image-cri-shim manually; it still fails, and the output shows errors
/usr/bin/image-cri-shim -f /etc/image-cri-shim.yaml
fatal failed to setup image_shim, cri/shim: failed to register image service: falling using CRI v1 image API, please using other cri support v1 CRI API
fatal failed to setup image_shim, cri/shim: failed to register image service: falling using CRI v1alpha2 image API, please using other cri support v1alpha2 CRI API

# Investigated containerd and found error messages
[root@test-d-010055010107 sealos]# systemctl status containerd -l
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-03-27 17:49:48 CST; 16h ago
     Docs: https://containerd.io
 Main PID: 7077 (containerd)
   Memory: 13.9M
   CGroup: /system.slice/containerd.service
           └─7077 /usr/bin/containerd
Mar 27 17:49:48 test-d-010055010107 systemd[1]: Starting containerd container runtime...
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229104592+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.overlayfs" error="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs does not support d_type. If the backing filesystem is xfs, please reformat with ftype=1 to enable d_type support"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229191393+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229403283+08:00" level=warning msg="could not use snapshotter overlayfs in metadata plugin" error="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs does not support d_type. If the backing filesystem is xfs, please reformat with ftype=1 to enable d_type support"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229420619+08:00" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.238313538+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="failed to create CRI service: failed to find snapshotter \"overlayfs\""
Mar 27 17:49:48 test-d-010055010107 systemd[1]: Started containerd container runtime.

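# Suspected containerd was not installed correctly; installed the crictl command to check
tar zxvf crictl-v1.25.0-linux-amd64.tar.gz  -C /usr/local/bin

# Checked the info and confirmed this is the problem; tried to fix it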
[root@test-d-010055010107 sealos]# crictl info
E0328 10:07:11.802780   10291 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
FATA[0000] getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService

# Checked containerd's overlayfs configuration and moved its data directory
cp -r /var/lib/container* /data01/
vim /etc/containerd/config.toml # set root = "/data01/containerd"

# containerd and image-cri-shim now start successfully
systemctl restart containerd
systemctl restart image-cri-shim
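
The manual config.toml edit above can also be done non-interactively, which helps when several nodes need the same change. This is a minimal sketch, assuming the file has an unindented top-level `root = "..."` line as in the default containerd config. `set_containerd_root` is a hypothetical helper, and the new path must not contain `|` or `&`.

```shell
# Hypothetical helper: rewrite the top-level `root` setting of a containerd config.
# $1 = new root directory, $2 = path to config.toml
set_containerd_root() {
    new_root=$1
    config=$2
    # Refuse to continue if there is no top-level root line to replace.
    grep -q '^root[[:space:]]*=' "$config" || return 1
    tmp=$(mktemp) || return 1
    # Only unindented `root = ...` lines match, so nested plugin settings are untouched.
    sed "s|^root[[:space:]]*=.*|root = \"$new_root\"|" "$config" > "$tmp" &&
        cat "$tmp" > "$config"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would be `set_containerd_root /data01/containerd /etc/containerd/config.toml`, followed by the restarts shown above.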

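# Hit a problem where /root/.sealos/default/etc/admin.conf could not be found. According to the issue, an upgrade to 4.1.7 is required. That solved it, but then I found that re-running the installation has problems: it cannot resume the previous installation
s reset # start over

# However, the installed containerd still lives in /var/lib/containerd, so I needed a way to change this path. After reading the docs I guessed that setting the criData environment variable might work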

# Revised command
s run 133c6a0a0d5f --single --env criData=/data01/containerd

# It does work: containerd is installed to /data01/containerd, although the criData shown in /root/.sealos/default/Clusterfile is still /var/lib/containerd

# Installation succeeded

# But the node never becomes Ready
[root@test-d-010055010107 sealos]# kubectl get node
NAME                             STATUS     ROLES           AGE     VERSION
test-d-010055010107   NotReady   control-plane   8m56s   v1.24.0
KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
[root@test-d-010055010107 sealos]# crictl ps -a
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
5d3572591a876       77b49675beae1       12 minutes ago      Running             kube-proxy                0                   dc61529f47415       kube-proxy-vjjqv
9559b3a7d80ec       aebe758cef4cd       12 minutes ago      Running             etcd                      0                   1a1846fb97f25       etcd-test-d-010055010107
00a5f23d7d227       529072250ccc6       12 minutes ago      Running             kube-apiserver            0                   b65e60cdc8996       kube-apiserver-test-d-010055010107
91b737d89b72e       e3ed7dee73e93       12 minutes ago      Running             kube-scheduler            0                   e682c3fb7cc11       kube-scheduler-test-d-010055010107
dd3a2ea10b7c7       88784fb4ac2f6       12 minutes ago      Running             kube-controller-manager   0                   d3177bd65479c       kube-controller-manager-test-d-010055010107
[root@test-d-010055010107 sealos]# kubectl get pod -A
NAMESPACE     NAME                                                     READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d4b75cb6d-qfnf5                                 0/1     Pending   0          3h24m
kube-system   coredns-6d4b75cb6d-xzjz5                                 0/1     Pending   0          3h24m
kube-system   etcd-test-d-010055010107                      1/1     Running   0          3h24m
kube-system   kube-apiserver-test-d-010055010107            1/1     Running   0          3h24m
kube-system   kube-controller-manager-test-d-010055010107   1/1     Running   0          3h24m
kube-system   kube-proxy-vjjqv                                         1/1     Running   0          3h24m
kube-system   kube-scheduler-test-d-010055010107            1/1     Running   0          3h24m
[root@test-d-010055010107 sealos]# journalctl -xeu kubelet
Mar 28 11:43:40 test-d-010055010107 kubelet[20385]: E0328 11:43:40.678552   20385 kubelet.go:2344] "Container runtime network not ready" networkReady="NetworkReady=f
Mar 28 11:43:45 test-d-010055010107 kubelet[20385]: E0328 11:43:45.679314   20385 kubelet.go:2344] "Container runtime network not ready" networkReady="NetworkReady=f
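
The "cni plugin not initialized" message usually means the container runtime found no CNI network configuration. A quick check could look like the sketch below, assuming the default CNI config directory `/etc/cni/net.d`. `cni_configured` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the CNI config directory contains at least
# one network config file. $1 = directory (default: /etc/cni/net.d)
cni_configured() {
    dir=${1:-/etc/cni/net.d}
    for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
        # An unmatched glob stays literal and fails the -f test.
        [ -f "$f" ] && return 0
    done
    return 1
}
```

`cni_configured || echo "no CNI config found, is a network plugin such as calico installed?"` gives a faster hint than reading the kubelet log.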

# According to the issue, this is caused by calico not being installed; reinstall
s reset # does not delete /root/.sealos
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b --single --env criData=/data01/containerd

# Everything looks normal
[root@test-d-010055010107 sealos]# kubectl get pod -A
NAMESPACE         NAME                                                     READY   STATUS    RESTARTS   AGE
calico-system     calico-kube-controllers-6b44b54755-qsmkl                 0/1     Pending   0          115s
calico-system     calico-node-7grz7                                        1/1     Running   0          115s
calico-system     calico-typha-6f9598cfd9-2sr27                            1/1     Running   0          115s
kube-system       coredns-6d4b75cb6d-6fncr                                 1/1     Running   0          2m2s
kube-system       coredns-6d4b75cb6d-b8czk                                 1/1     Running   0          2m2s
kube-system       etcd-test-d-010055010107                      1/1     Running   1          2m16s
kube-system       kube-apiserver-test-d-010055010107            1/1     Running   1          2m18s
kube-system       kube-controller-manager-test-d-010055010107   1/1     Running   1          2m16s
kube-system       kube-proxy-wnp2g                                         1/1     Running   0          2m3s
kube-system       kube-scheduler-test-d-010055010107            1/1     Running   1          2m16s
tigera-operator   tigera-operator-d7957f5cc-5wfc4                          1/1     Running   0          2m2s
[root@test-d-010055010107 sealos]# kubectl get node
NAME                             STATUS   ROLES           AGE     VERSION
test-d-010055010107   Ready    control-plane   2m25s   v1.24.0
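
Checking node status by eye works for one node but gets tedious for a cluster. The sketch below filters the `kubectl get node` output shown above for nodes that are not Ready. `not_ready_nodes` is a hypothetical helper that assumes the default column layout (NAME first, STATUS second) and treats anything other than exactly `Ready`, including `Ready,SchedulingDisabled`, as not ready.

```shell
# Hypothetical helper: read `kubectl get node` output on stdin and print the
# names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Typical use is `kubectl get node | not_ready_nodes`; empty output means every node is Ready. `kubectl wait --for=condition=Ready node --all --timeout=300s` is a built-in alternative when blocking until the nodes are ready is acceptable.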

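Cluster installation

With the single-node experience behind me and the pitfalls already hit, I went straight to installing the cluster

# Try the cluster installation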
alias s="sealos --debug --root /data01/sealos --runroot /data01/sealos/docker "
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b -e defaultVIP=10.55.10.108 -e criData=/data01/containerd  --masters 10.55.10.107 --nodes 10.55.10.97,10.55.10.106 --passwd 112233
[root@test-d-010055010107 ~]# kubectl get node -o wide
NAME                             STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
test-d-010055010097   Ready    <none>          65s   v1.24.0   10.55.10.97    <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   containerd://1.7.0
test-d-010055010106   Ready    <none>          76s   v1.24.0   10.55.10.106   <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   containerd://1.7.0
test-d-010055010107   Ready    control-plane   95s   v1.24.0   10.55.10.107   <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   containerd://1.7.0

# Looks fine

Reference links used while solving the problems

  • unsupported graph driver: vfs Issue #1576 sealerio/sealer (github.com)
  • Overview | sealer (some problems may also require consulting this documentation)
  • Question: Can sealos load -i use docker save -o image.tar? Issue #2526 labring/sealos (github.com)
  • crictl installation – 小吉猫 – 博客园 (cnblogs.com)
  • Containerd installation process and pitfalls_/var/lib/containerd_Aisaka81's blog – CSDN
  • error Applied to cluster error: read admin.conf error in guest: open /root/.sealos/default/etc/admin.conf: no such file or directory Issue #2548 labring/sealos (github.com)
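  • sealos 4.0 first installation failed; re-running gives no prompt and the installation does not succeed Issue #1207 labring/sealos (github.com)
  • Single-node installation: API Server does not come up and kubelet cannot start Issue #2313 labring/sealos (github.com)
  • BUG: single-node deployment, node NotReady Issue #1663 labring/sealos (github.com)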
  • sealos NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized Issue #704 labring/sealos (github.com)
  • linux journalctl command – sparkdev – 博客园 (cnblogs.com): Linux command for viewing system logs

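Takeaways

  • Versions change frequently, command parameters change between them, and bugs are deeply hidden
  • Troubleshooting takes patience to peel problems apart layer by layer; it helps to install the commands that k8s troubleshooting depends on, such as ctr/crictl, ahead of time
  • I also joined the official DingTalk group, but questions and inquiries there mostly go unanswered
  • Follow the issues; they are the only reference material of real value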