Quickly building a GBase 8c V5 cluster in Docker by simulating multiple hosts
Tags: Original, High Availability, Installation and Deployment, GBase, Distributed Database, GBase 8c
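- Environment preparation
- Provisioning the environment
- Environment configuration
- Adding swap space and tuning swap settings
- Installing dependencies on all nodes
- Changing hostnames
- Tuning kernel parameters
- Creating the user on all nodes
- Configuring SSH mutual trust
- Extracting the installation package
- Installation
- Editing the cluster deployment file gbase.yml
- Running the installation script
- Installation log locations
- Checking cluster status
- Starting and stopping the database
- Stopping the database services
- Starting the database services
- Connection and SQL test
- Uninstalling the cluster
- Environment variables
- Changing the password
- Configuring remote login
- Commands for changing GBase 8c parameters
- Resolving installation errors
- gaussDB state is Coredump
- error while loading shared libraries: libcjson.so.1
- Current gtm center group num 1 is out of range [0, 0]
- Rpc request failed:dn1_1 save node info
- OPENSSL_1_1_1 not defined in file libcrypto.so.1.1
- Exception: Failed to obtain host name. The cmd is hostname
- install or upgrade dependency {'patch' : None} failed
- Inspection script
- Summary
- References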
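Environment preparation
Provisioning the environment
Host machine: 32 GB of RAM and 8 GB of swap. Every machine needs at least 4 GB of RAM plus 8 GB of swap; otherwise the installation fails.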
IP | Hostname | Role |
---|---|---|
172.72.3.30 | gbase8c_1 | gha_server (high-availability service), dcs (distributed configuration store), gtm (global transaction manager), coordinator |
172.72.3.31 | gbase8c_2 | datanode 1 |
172.72.3.32 | gbase8c_3 | datanode 2 |
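Term | Role | Function | Deployment mode |
---|---|---|---|
GHA Server | High-availability manager | Manages the high-availability state of every node in the cluster, similar to Patroni | Primary/standby HA architecture; replication between primary and standby can be synchronous or asynchronous |
DCS/HA Center | Cluster state manager | Stores the high-availability state of each node and determines the state of every node when a failure occurs | Uses the Raft replication protocol |
GTM | Global Transaction Manager | Generates and maintains the global timestamp that keeps cluster data consistent | Primary/standby HA architecture; replication between primary and standby can be synchronous or asynchronous |
CN/Coordinator | Coordinator | Provides the external interface: parses and optimizes SQL, generates execution plans, and coordinates the data nodes for queries and writes | Fully peer-to-peer deployment |
DN/Datanode | Data node | Stores and processes the node's own metadata and its shard of the business data | Primary/standby HA architecture; replication between primary and standby can be synchronous or asynchronous |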
Create the network and the three containers on the host, then copy the installation package and sshUserSetup.sh into gbase8c_1:

```shell
# Create a dedicated bridge network for the three containers
docker network create --subnet=172.72.0.0/16 lhrnw

# Node 1: gha_server, dcs, gtm, coordinator (host port 63330 -> coordinator port 5432)
# --privileged, the cgroup mount and /usr/sbin/init let systemd run inside the container
docker rm -f gbase8c_1
docker run -itd --name gbase8c_1 -h gbase8c_1 \
  --net=lhrnw --ip 172.72.3.30 \
  -p 63330:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true \
  --add-host='gbase8c_1:172.72.3.30' \
  --add-host='gbase8c_2:172.72.3.31' \
  --add-host='gbase8c_3:172.72.3.32' \
  lhrbest/lhrcentos76:9.0 \
  /usr/sbin/init

# Node 2: datanode 1
docker rm -f gbase8c_2
docker run -itd --name gbase8c_2 -h gbase8c_2 \
  --net=lhrnw --ip 172.72.3.31 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true \
  --add-host='gbase8c_1:172.72.3.30' \
  --add-host='gbase8c_2:172.72.3.31' \
  --add-host='gbase8c_3:172.72.3.32' \
  lhrbest/lhrcentos76:9.0 \
  /usr/sbin/init

# Node 3: datanode 2
docker rm -f gbase8c_3
docker run -itd --name gbase8c_3 -h gbase8c_3 \
  --net=lhrnw --ip 172.72.3.32 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true \
  --add-host='gbase8c_1:172.72.3.30' \
  --add-host='gbase8c_2:172.72.3.31' \
  --add-host='gbase8c_3:172.72.3.32' \
  lhrbest/lhrcentos76:9.0 \
  /usr/sbin/init

# Copy the installation package and the SSH trust script into node 1
docker cp GBase8cV5_S3.0.0B76_centos7.8_x86_64.tar.gz gbase8c_1:/soft/
docker cp sshUserSetup.sh gbase8c_1:/soft/
```
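The three docker run commands differ only in the container name, the IP address and the port mapping on node 1. As a sketch, a small bash loop can print all of them from the same values (network lhrnw, image lhrbest/lhrcentos76:9.0, IPs 172.72.3.30 to 172.72.3.32); the function name gbase_node_cmd is hypothetical. It only prints the commands, so they can be reviewed before being piped to bash. The docker network create and docker cp steps are not covered.

```shell
#!/usr/bin/env bash
# Sketch: print the docker rm/run commands for the three simulated hosts.
# gbase_node_cmd is a hypothetical helper; all values come from the commands above.

# Print the "docker run" command for node number $1 (1..3)
gbase_node_cmd() {
  local i=$1
  local ip="172.72.3.$((29 + i))"
  local port_opt=""
  if [ "$i" -eq 1 ]; then
    port_opt="-p 63330:5432 "   # only the coordinator host publishes a port
  fi
  printf 'docker run -itd --name gbase8c_%s -h gbase8c_%s --net=lhrnw --ip %s %s-v /sys/fs/cgroup:/sys/fs/cgroup --privileged=true' \
    "$i" "$i" "$ip" "$port_opt"
  local j
  for j in 1 2 3; do
    # every container gets /etc/hosts entries for all three nodes
    printf " --add-host='gbase8c_%s:172.72.3.%s'" "$j" "$((29 + j))"
  done
  printf ' lhrbest/lhrcentos76:9.0 /usr/sbin/init\n'
}

# Dry run: print the commands; append "| bash" to execute them
for i in 1 2 3; do
  echo "docker rm -f gbase8c_$i"
  gbase_node_cmd "$i"
done
```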
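Environment configuration
Adding swap space and tuning swap settings
If a machine has 4 GB of RAM or less, add swap space and set vm.swappiness; otherwise memory runs out, the system becomes very slow, and the cluster ends up in an abnormal state: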
```shell
# Create a swap file: bs=10M x count=3200 = 32000 MiB (about 31 GiB); adjust count to the size you need
dd if=/dev/zero of=/root/swapfile bs=10M count=3200
chmod 0600 /root/swapfile
mkswap /root/swapfile
swapon /root/swapfile
# Enable the swap file at boot
echo '/root/swapfile swap swap defaults 0 0' >> /etc/fstab
swapon -s

# Set swappiness for the running kernel and persist it
echo 20 > /proc/sys/vm/swappiness
echo 'vm.swappiness=20' >> /etc/sysctl.conf
cat /proc/sys/vm/swappiness
grep swappiness /etc/sysctl.conf
```
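A sketch for sizing the swap file: with bs=10M, count=3200 creates roughly 31 GiB, whereas the stated requirement is 8 GiB. The hypothetical helpers below compute the count needed to reach 8 GiB given the current SwapTotal. Keep in mind that a container normally sees the host's /proc/meminfo, because containers share the host kernel.

```shell
#!/usr/bin/env bash
# Sketch: compute the dd "count" (with bs=10M) needed to reach a target swap size.
# swap_dd_count and current_swap_kb are hypothetical helpers.

# $1 = target swap in GiB, $2 = current SwapTotal in kB (default 0)
swap_dd_count() {
  local target_kb=$(( $1 * 1024 * 1024 ))
  local current_kb=${2:-0}
  if [ "$current_kb" -ge "$target_kb" ]; then
    echo 0
    return
  fi
  # bs=10M is 10240 kB; round up so the new file never falls short
  echo $(( (target_kb - current_kb + 10239) / 10240 ))
}

# Print SwapTotal (kB) from a meminfo file, /proc/meminfo by default
current_swap_kb() {
  awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}" 2>/dev/null
}

count=$(swap_dd_count 8 "$(current_swap_kb)")
if [ "$count" -eq 0 ]; then
  echo "swap already >= 8 GiB"
else
  echo "dd if=/dev/zero of=/root/swapfile bs=10M count=${count}"
fi
```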
Each machine needs at least 4 GB of RAM and 8 GB of swap. Otherwise you will get errors such as "Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4496 Mbytes) is larger.", "could not create shared memory segment: Cannot allocate memory" and "This error usually means that openGauss's request for a shared memory segment exceeded available memory or swap space, or exceeded your kernel's SHMALL parameter. You can either reduce the request size or reconfigure the kernel with larger SHMALL. To reduce the request size (currently 1841225728 bytes), reduce openGauss's shared memory usage, perhaps by reducing shared_buffers."
As the last message suggests, reducing shared_buffers also helps, for example:
```
shared_buffers = 126MB
```
Alternatively, change the parameters in each instance's postgresql.conf with sed:
```shell
# Lower shared_buffers from 1GB to 256MB
sudo sed -i '/shared_buffers = 1GB/c shared_buffers = 256MB' /home/gbase/data/gtm/gtm1/postgresql.conf
sudo sed -i '/shared_buffers = 1GB/c shared_buffers = 256MB' /home/gbase/data/coord/cn1/postgresql.conf
sudo sed -i '/shared_buffers = 1GB/c shared_buffers = 256MB' /home/gbase/data/dn1/dn1_1/postgresql.conf
sudo sed -i '/shared_buffers = 1GB/c shared_buffers = 256MB' /home/gbase/data/dn2/dn2_1/postgresql.conf

# Lower cstore_buffers from 1GB to 32MB
sudo sed -i '/cstore_buffers = 1GB/c cstore_buffers = 32MB' /home/gbase/data/gtm/gtm1/postgresql.conf
sudo sed -i '/cstore_buffers = 1GB/c cstore_buffers = 32MB' /home/gbase/data/coord/cn1/postgresql.conf
sudo sed -i '/cstore_buffers = 1GB/c cstore_buffers = 32MB' /home/gbase/data/dn1/dn1_1/postgresql.conf
sudo sed -i '/cstore_buffers = 1GB/c cstore_buffers = 32MB' /home/gbase/data/dn2/dn2_1/postgresql.conf
```
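The eight sed commands can be folded into one loop. This is a sketch assuming the work_dir layout from gbase.yml in this article; shrink_buffers is a hypothetical helper. Since each postgresql.conf exists only on the node that hosts that instance (gtm1 and cn1 on gbase8c_1, dn1_1 on gbase8c_2, dn2_1 on gbase8c_3), the same loop can run on every node and simply skips files that are not there. Like the commands above it relies on GNU sed's one-line c command, and shared_buffers only changes once the instance is restarted.

```shell
#!/usr/bin/env bash
# Sketch: apply both buffer edits to every instance's postgresql.conf found on this node.
# shrink_buffers is a hypothetical helper; add sudo in front of sed if the files
# are not writable by the current user (the original commands use sudo).

shrink_buffers() {
  # Replace the whole matching line, like the "/pattern/c text" commands above
  sed -i -e '/shared_buffers = 1GB/c shared_buffers = 256MB' \
         -e '/cstore_buffers = 1GB/c cstore_buffers = 32MB' "$1"
}

for dir in /home/gbase/data/gtm/gtm1 /home/gbase/data/coord/cn1 \
           /home/gbase/data/dn1/dn1_1 /home/gbase/data/dn2/dn2_1; do
  conf="$dir/postgresql.conf"
  # Only the instances hosted on this node have a config file here
  if [ -f "$conf" ]; then
    shrink_buffers "$conf" && echo "updated $conf"
  fi
done
```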
Installing dependencies on all nodes
```shell
yum install -y libaio-devel flex bison ncurses-devel \
  glibc-devel patch readline-devel bzip2 firewalld \
  crontabs net-tools openssh-server openssh-clients which sshpass \
  ntp chrony

# Disable the firewall
systemctl disable firewalld
systemctl stop firewalld
systemctl status firewalld

# Disable SELinux: the config file change takes effect after an OS reboot,
# setenforce 0 switches to permissive mode immediately
sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config
setenforce 0
getenforce

# Install ntp or chrony so that time stays synchronized across all cluster nodes;
# here ntpd is used and chronyd is disabled
systemctl unmask ntpd
systemctl enable ntpd
systemctl start ntpd
systemctl status ntpd

systemctl disable chronyd
systemctl stop chronyd
systemctl status chronyd
```
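After the yum step, a quick way to confirm that nothing was skipped on a node is to query each package with rpm -q. This sketch assumes an RPM-based system such as the CentOS 7.6 image used here; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list which of the dependency packages above are not installed yet.
# pkg_installed and missing_pkgs are hypothetical helpers.
DEPS=(libaio-devel flex bison ncurses-devel glibc-devel patch readline-devel
      bzip2 firewalld crontabs net-tools openssh-server openssh-clients
      which sshpass ntp chrony)

# Succeeds if the package is installed according to the RPM database
pkg_installed() {
  rpm -q "$1" >/dev/null 2>&1
}

# Prints the missing package names (space separated); returns 1 if any are missing
missing_pkgs() {
  local p missing=()
  for p in "$@"; do
    pkg_installed "$p" || missing+=("$p")
  done
  echo "${missing[*]}"
  [ "${#missing[@]}" -eq 0 ]
}

if out=$(missing_pkgs "${DEPS[@]}"); then
  echo "all dependencies installed"
else
  echo "missing: $out"
fi
```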
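Changing hostnames
Configure the IP address of each of the three nodes (this walkthrough uses the three IPs from the table above) and set each node's hostname:
```shell
# node 1
hostnamectl set-hostname gbase8c_1.lhr.com
# node 2
hostnamectl set-hostname gbase8c_2.lhr.com
# node 3
hostnamectl set-hostname gbase8c_3.lhr.com
```
With Docker this step can be skipped, because docker run already sets each hostname with -h.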
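Tuning kernel parameters
```shell
cat >> /etc/sysctl.conf <<"EOF"
# Maximum size of a single shared memory segment, in bytes (4 TiB)
kernel.shmmax = 4398046511104
# Maximum number of shared memory segments system-wide
kernel.shmmni = 4096
# Total shared memory allowed, in pages
kernel.shmall = 4000000000
# SEMMSL SEMMNS SEMOPM SEMMNI
kernel.sem = 32000 1024000000 500 32000
EOF

sysctl -p
```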
If kernel.shmmax is set too small, startup fails with a shared memory error such as:
```
FATAL: could not create shared memory segment: Invalid argument
DETAIL: Failed system call was shmget(key=6666001, size=4714683328, 03600).
HINT: This error usually means that openGauss's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 4714683328 bytes), reduce openGauss's shared memory usage, perhaps by reducing shared_buffers.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The openGauss documentation contains more information about shared memory configuration.
```
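This error can be anticipated: the request size (size=4714683328 in the DETAIL line) must not exceed kernel.shmmax, which is in bytes, nor kernel.shmall multiplied by the page size, since shmall is counted in pages. The sketch below (hypothetical helper shm_fits) does that comparison in awk, because the very large defaults that newer kernels report (close to 2^64) overflow bash's signed 64-bit arithmetic.

```shell
#!/usr/bin/env bash
# Sketch: does a shared memory request fit the kernel limits?
# shm_fits is a hypothetical helper.
# Arguments: size_bytes [shmmax_bytes] [shmall_pages] [page_size]
# Missing limits are read from /proc/sys/kernel and getconf.
shm_fits() {
  local size=$1
  local shmmax=${2:-$(cat /proc/sys/kernel/shmmax)}
  local shmall=${3:-$(cat /proc/sys/kernel/shmall)}
  local pagesize=${4:-$(getconf PAGE_SIZE)}
  # awk compares in floating point, so huge limits do not overflow
  awk -v s="$size" -v m="$shmmax" -v a="$shmall" -v p="$pagesize" 'BEGIN {
    if (s + 0 > m + 0)       { print "exceeds kernel.shmmax"; exit 1 }
    if (s + 0 > (a + 0) * p) { print "exceeds kernel.shmall * page size"; exit 1 }
    print "fits"
  }'
}

# Usage: check the request from the error message against the running kernel
#   shm_fits 4714683328
```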
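Creating the user on all nodes
```shell
useradd gbase
# Set the password (lhr in this walkthrough)
echo "gbase:lhr" | chpasswd
# Grant gbase passwordless sudo
echo "gbase ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
```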
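Several steps in this article append to system files with >> (/etc/sudoers, /etc/fstab, /etc/sysctl.conf), so running them twice leaves duplicate lines. A small hypothetical helper, append_once, makes those appends idempotent; it is a sketch, not part of the GBase tooling.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if the file does not already contain it
# verbatim. append_once is a hypothetical helper; the file is created if missing.
append_once() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on a node (as root):
#   id gbase >/dev/null 2>&1 || useradd gbase
#   echo "gbase:lhr" | chpasswd
#   append_once "gbase ALL=(ALL) NOPASSWD: ALL" /etc/sudoers
#   visudo -c    # check the sudoers syntax after editing
#   append_once "vm.swappiness=20" /etc/sysctl.conf
```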
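Configuring SSH mutual trust
Run the following as root on the primary node only (the final chmod is run on all nodes):
```shell
cd /soft
./sshUserSetup.sh -user gbase -hosts "gbase8c_1 gbase8c_2 gbase8c_3" -advanced -exverify -confirm

# On all nodes
chmod 600 /home/gbase/.ssh/config
```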
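Before installing, it is worth confirming that passwordless SSH really works for gbase between all three nodes. This sketch uses standard OpenSSH options: BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout limits the wait. The function name check_trust is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify passwordless SSH from the current user (run it as gbase) to each node.
# check_trust is a hypothetical helper.
NODES=(gbase8c_1 gbase8c_2 gbase8c_3)

check_trust() {
  local h out failed=0
  for h in "$@"; do
    # BatchMode=yes: fail instead of asking for a password
    if out=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname 2>/dev/null); then
      echo "$h ok ($out)"
    else
      echo "$h FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (as gbase on gbase8c_1):
#   check_trust "${NODES[@]}"
```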
Extracting the installation package
On the primary node only, extract the installation package GBase8cV5_S3.0.0B76_centos7.8_x86_64.tar.gz:
```shell
su - gbase
mkdir -p /home/gbase/gbase_package
cp /soft/GBase8cV5_S3.0.0B76_centos7.8_x86_64.tar.gz /home/gbase/gbase_package
cd /home/gbase/gbase_package
# The outer archive contains the OM package, which is extracted in turn
tar -zxvf GBase8cV5_S3.0.0B76_centos7.8_x86_64.tar.gz
tar -zxf GBase8cV5_S3.0.0B76_CentOS_x86_64_om.tar.gz
```
Example output:
```
[gbase@gbase8c_1 gbase_package]$ ll -h
total 514M
drwxrwxr-x 5 gbase gbase 165 Feb 27 16:48 dependency
-rw-r--r-- 1 root root 257M Mar 17 16:36 GBase8cV5_S3.0.0B76_centos7.8_x86_64.tar.gz
-rw-rw-r-- 1 gbase gbase 65 Feb 27 16:48 GBase8cV5_S3.0.0B76_CentOS_x86_64_om.sha256
-rw-rw-r-- 1 gbase gbase 99M Feb 27 16:48 GBase8cV5_S3.0.0B76_CentOS_x86_64_om.tar.gz
-rw-rw-r-- 1 gbase gbase 1012K Feb 27 16:48 GBase8cV5_S3.0.0B76_CentOS_x86_64_pgpool.tar.gz
-rw-rw-r-- 1 gbase gbase 65 Feb 27 16:48 GBase8cV5_S3.0.0B76_CentOS_x86_64.sha256
-rw-rw-r-- 1 gbase gbase 158M Feb 27 16:48 GBase8cV5_S3.0.0B76_CentOS_x86_64.tar.bz2
-rw-rw-r-- 1 gbase gbase 2.6K Feb 27 16:48 gbase.yml
drwxrwxr-x 11 gbase gbase 4.0K Feb 27 16:48 gha
-rw-rw-r-- 1 gbase gbase 188 Feb 27 16:48 gha_ctl.ini
drwxrwxr-x 2 gbase gbase 96 Feb 27 16:48 lib
-rw-rw-r-- 1 gbase gbase 729 Feb 27 16:48 package_info.json
drwxr-xr-x 4 gbase gbase 28 Mar 16 2021 python3.8
drwxrwxr-x 10 gbase gbase 4.0K Feb 27 16:48 script
drwxrwxr-x 2 gbase gbase 330 Feb 27 16:48 simpleInstall
-rw-rw-r-- 1 gbase gbase 118 Feb 27 16:48 ubuntu_version.json
drwx------ 6 gbase gbase 87 Jul 2 2022 venv
-rw-rw-r-- 1 gbase gbase 36 Feb 27 16:48 version.cfg
[gbase@gbase8c_1 gbase_package]$
```
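The package directory also contains .sha256 files. Assuming each holds just the 64-character hex digest of the archive with the same base name (the 65-byte size in the listing fits a digest plus a newline), the archives can be verified as sketched below; verify_sha256 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: compare an archive's SHA-256 with the digest stored in a .sha256 file.
# Assumption: the .sha256 file contains only the hex digest (no file name).
verify_sha256() {
  local pkg=$1 sumfile=$2 expected actual
  expected=$(tr -d '[:space:]' < "$sumfile")
  actual=$(sha256sum "$pkg" | awk '{print $1}')
  if [ "$expected" = "$actual" ]; then
    echo "OK       $pkg"
  else
    echo "MISMATCH $pkg"
    return 1
  fi
}

pkg_dir=/home/gbase/gbase_package
if [ -d "$pkg_dir" ]; then
  # Assumed pairing by base name, following the file names in the listing
  verify_sha256 "$pkg_dir/GBase8cV5_S3.0.0B76_CentOS_x86_64.tar.bz2" \
                "$pkg_dir/GBase8cV5_S3.0.0B76_CentOS_x86_64.sha256"
  verify_sha256 "$pkg_dir/GBase8cV5_S3.0.0B76_CentOS_x86_64_om.tar.gz" \
                "$pkg_dir/GBase8cV5_S3.0.0B76_CentOS_x86_64_om.sha256"
fi
```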
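Installation
Perform these steps on the primary node only.
Editing the cluster deployment file gbase.yml
```shell
cd /home/gbase/gbase_package
# Keep the sample file as a backup
mv gbase.yml gbase.yml.bak
```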
- Write a new gbase.yml as follows:
```shell
cat > /home/gbase/gbase_package/gbase.yml <<"EOF"
gha_server:
  - gha_server1:
      host: 172.72.3.30
      port: 20001
dcs:
  - host: 172.72.3.30
    port: 2379
gtm:
  - gtm1:
      host: 172.72.3.30
      agent_host: 172.72.3.30
      role: primary
      port: 6666
      agent_port: 8001
      work_dir: /home/gbase/data/gtm/gtm1
coordinator:
  - cn1:
      host: 172.72.3.30
      agent_host: 172.72.3.30
      role: primary
      port: 5432
      agent_port: 8003
      work_dir: /home/gbase/data/coord/cn1
datanode:
  - dn1:
      - dn1_1:
          host: 172.72.3.31
          agent_host: 172.72.3.31
          role: primary
          port: 15432
          agent_port: 8005
          work_dir: /home/gbase/data/dn1/dn1_1
  - dn2:
      - dn2_1:
          host: 172.72.3.32
          agent_host: 172.72.3.32
          role: primary
          port: 20010
          agent_port: 8007
          work_dir: /home/gbase/data/dn2/dn2_1
env:
  # cluster_type allowed values: multiple-nodes, single-inst, default is multiple-nodes
  cluster_type: multiple-nodes
  pkg_path: /home/gbase/gbase_package # directory containing the installation package
  prefix: /home/gbase/gbase_db # runtime directory
  version: V5_S3.0.0B76 # must match the package version (GBase8cV5_S3.0.0B76)
  user: gbase
  port: 22
# constant:
#   virtual_ip: 172.72.3.36/24
EOF
```
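Port clashes are an easy mistake when several components share one host (node 1 here runs gha_server, dcs, gtm and cn1). The sketch below lists every host:port and agent_host:agent_port pair from gbase.yml and reports duplicates. It is a line-based awk scan that assumes the layout shown above, not a real YAML parser; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect duplicate host:port pairs in gbase.yml.
# list_endpoints and dup_endpoints are hypothetical helpers.

# Print host:port for each service port and agent_host:agent_port for each agent port
list_endpoints() {
  awk '
    /^env:/                 { exit }              # env.port is the SSH port, skip it
    /^[ \t-]*host:/         { host = $NF }
    /^[ \t-]*agent_host:/   { ahost = $NF }
    /^[ \t-]*port:/         { print host ":" $NF }
    /^[ \t-]*agent_port:/   { print ahost ":" $NF }
  ' "$1"
}

# Print duplicated pairs; return 1 if there are any
dup_endpoints() {
  local dups
  dups=$(list_endpoints "$1" | sort | uniq -d)
  [ -z "$dups" ] || { echo "$dups"; return 1; }
}

yml=/home/gbase/gbase_package/gbase.yml
if [ -f "$yml" ]; then
  dup_endpoints "$yml" && echo "no port conflicts in $yml"
fi
```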
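Running the installation script
```shell
su - gbase
cd /home/gbase/gbase_package/script
./gha_ctl install -c gbase -p /home/gbase/gbase_package -f
```
Notes:
- -c: database (cluster) name, default gbase
- -p: configuration file path, default /home/gbase
- -f: force the installation even if the cluster already exists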
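Installation log locations
```shell
# Log of the gha_ctl run
tail -f /tmp/gha_ctl/gha_ctl.log
# OM logs
cd /home/gbase/gbase_db/log/om/
```
The installation takes about 5 minutes. When it finishes, the script prints:
```
{
    "ret":0,
    "msg":"Success"
}
```
The cluster has been installed successfully.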
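When the installation is scripted, the {"ret":0,...} result can be checked automatically. This sketch assumes only the ret/msg shape shown above and treats ret 0 as success; the helper names are hypothetical. Whether gha_ctl prints anything besides the JSON is not covered here, so the helper simply looks for the first "ret" field in its input.

```shell
#!/usr/bin/env bash
# Sketch: extract "ret" from gha_ctl's JSON result and test it for 0.
# gha_ret and gha_ok are hypothetical helpers.

# Print the value of the first "ret" field read from stdin
gha_ret() {
  grep -oE '"ret"[[:space:]]*:[[:space:]]*-?[0-9]+' | head -n 1 | grep -oE -e '-?[0-9]+$'
}

# Succeed only if ret is 0
gha_ok() {
  local ret
  ret=$(gha_ret)
  [ "$ret" = "0" ]
}

# Usage (as gbase in /home/gbase/gbase_package/script):
#   ./gha_ctl install -c gbase -p /home/gbase/gbase_package -f | tee /tmp/gha_install.json
#   gha_ok < /tmp/gha_install.json && echo "install succeeded"
```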
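Checking cluster status
Run:
```shell
# JSON output
/home/gbase/gbase_package/script/gha_ctl monitor -l http://172.72.3.30:2379
# Table output (-H)
/home/gbase/gbase_package/script/gha_ctl monitor -l http://172.72.3.30:2379 -H
```
Output like the following shows that the cluster was installed correctly and the database services have started: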
```
[gbase@gbase8c_1 ~]$ /home/gbase/gbase_package/script/gha_ctl monitor -l http://172.72.3.30:2379 -H
+----+-------------+-------------+-------+---------+--------+
| No | name        | host        | port  | state   | leader |
+----+-------------+-------------+-------+---------+--------+
| 0  | gha_server1 | 172.72.3.30 | 20001 | running | True   |
+----+-------------+-------------+-------+---------+--------+
+----+------+-------------+------+---------------------------+---------+---------+
| No | name | host        | port | work_dir                  | state   | role    |
+----+------+-------------+------+---------------------------+---------+---------+
| 0  | gtm1 | 172.72.3.30 | 6666 | /home/gbase/data/gtm/gtm1 | running | primary |
+----+------+-------------+------+---------------------------+---------+---------+
+----+------+-------------+------+----------------------------+---------+---------+
| No | name | host        | port | work_dir                   | state   | role    |
+----+------+-------------+------+----------------------------+---------+---------+
| 0  | cn1  | 172.72.3.30 | 5432 | /home/gbase/data/coord/cn1 | running | primary |
+----+------+-------------+------+----------------------------+---------+---------+
+----+-------+-------+-------------+-------+----------------------------+---------+---------+
| No | group | name  | host        | port  | work_dir                   | state   | role    |
+----+-------+-------+-------------+-------+----------------------------+---------+---------+
| 0  | dn1   | dn1_1 | 172.72.3.31 | 15432 | /home/gbase/data/dn1/dn1_1 | running | primary |
| 1  | dn2   | dn2_1 | 172.72.3.32 | 20010 | /home/gbase/data/dn2/dn2_1 | running | primary |
+----+-------+-------+-------------+-------+----------------------------+---------+---------+
+----+-------------------------+--------+---------+----------+
| No | url                     | name   | state   | isLeader |
+----+-------------------------+--------+---------+----------+
| 0  | http://172.72.3.30:2379 | node_0 | healthy | True     |
+----+-------------------------+--------+---------+----------+
[gbase@gbase8c_1 ~]$ /home/gbase/gbase_package/script/gha_ctl monitor -l http://172.72.3.30:2379
{
    "cluster": "gbase",
    "version": "V5_S3.0.0B76",
    "server": [
        {
            "name": "gha_server1",
            "host": "172.72.3.30",
            "port": "20001",
            "state": "running",
            "isLeader": true
        }
    ],
    "gtm": [
        {
            "name": "gtm1",
            "host": "172.72.3.30",
            "port": "6666",
            "workDir": "/home/gbase/data/gtm/gtm1",
            "agentPort": "8001",
            "state": "running",
            "role": "primary",
            "agentHost": "172.72.3.30"
        }
    ],
    "coordinator": [
        {
            "name": "cn1",
            "host": "172.72.3.30",
            "port": "5432",
            "workDir": "/home/gbase/data/coord/cn1",
            "agentPort": "8003",
            "state": "running",
            "role": "primary",
            "agentHost": "172.72.3.30",
            "central": true
        }
    ],
    "datanode": {
        "dn1": [
            {
                "name": "dn1_1",
                "host": "172.72.3.31",
                "port": "15432",
                "workDir": "/home/gbase/data/dn1/dn1_1",
                "agentPort": "8005",
                "state": "running",
                "role": "primary",
                "agentHost": "172.72.3.31"
            }
        ],
        "dn2": [
            {
                "name": "dn2_1",
                "host": "172.72.3.32",
                "port": "20010",
                "workDir": "/home/gbase/data/dn2/dn2_1",
                "agentPort": "8007",
                "state": "running",
                "role": "primary",
                "agentHost": "172.72.3.32"
            }
        ]
    },
    "dcs": {
        "clusterState": "healthy",
        "members": [
            {
                "url": "http://172.72.3.30:2379",
                "id": "b337bb545c63ee23",
                "name": "node_0",
                "isLeader": true,
                "state": "healthy"
            }
        ]
    }
}
[gbase@gbase8c_1 ~]$
```
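The JSON form of gha_ctl monitor is convenient for scripts. In the sample output above every component reports a "state" (and the DCS a "clusterState"), with "running" and "healthy" as the good values. This sketch flags any other value; cluster_ok is a hypothetical helper that uses grep rather than a JSON parser, so it relies on the "key": "value" formatting shown.

```shell
#!/usr/bin/env bash
# Sketch: pass/fail check on the JSON printed by "gha_ctl monitor".
# bad_states and cluster_ok are hypothetical helpers.

# Print every state/clusterState entry whose value is not running or healthy
bad_states() {
  grep -oE '"(state|clusterState)": *"[^"]*"' | grep -vE '"(running|healthy)"$' || true
}

# Read monitor JSON on stdin; print problems and return 1 if any are found
cluster_ok() {
  local bad
  bad=$(bad_states)
  if [ -n "$bad" ]; then
    echo "$bad"
    return 1
  fi
  echo "all components running/healthy"
}

# Usage (as gbase on gbase8c_1):
#   /home/gbase/gbase_package/script/gha_ctl monitor -l http://172.72.3.30:2379 | cluster_ok
```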
Starting and stopping the database
Stopping the database services
```shell
su - gbase
/home/gbase/gbase_package/script/gha_ctl stop all -l http://172.72.3.30:2379

# Or stop the systemd units directly, on the node that hosts each unit
systemctl stop coordinator_gbase_cn1.service
systemctl stop datanode_gbase_dn1_1.service
systemctl stop gtm_gbase_gtm1.service
systemctl stop server_gbase_gha_server1.service
systemctl stop etcd.service
```
Starting the database services
```shell
# Start etcd (the DCS) first, then the cluster
systemctl start etcd.service
su - gbase
/home/gbase/gbase_package/script/gha_ctl start all -l http://172.72.3.30:2379
```
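After gha_ctl start all, you may want to wait until the coordinator accepts connections before running gsql. A minimal sketch for waiting on the cn1 port (5432 in gbase.yml) uses bash's /dev/tcp redirection; wait_for_port is a hypothetical helper, and an open port only shows that the process is listening, not that the whole cluster is healthy.

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port once per second until it accepts connections or a timeout expires.
# wait_for_port is a hypothetical helper. Arguments: host port [timeout_seconds]
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-120} waited=0
  # The subshell opens (and on exit closes) a TCP connection via bash's /dev/tcp
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timeout: ${host}:${port} not reachable after ${timeout}s"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${host}:${port} is accepting connections"
}

# Usage:
#   wait_for_port 172.72.3.30 5432 180 && gsql -d postgres -p 5432
```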
Connection and SQL test
On the primary node gbase8c_1, run `gsql -d postgres -p 5432`. When the `postgres=#` prompt appears, the gsql client has connected to GBase 8c successfully.
```
[gbase@gbase8c_1 script]$ gsql -d postgres -p 5432
gsql ((multiple_nodes GBase8cV5 3.0.0B76 build 47948f99) compiled at 2023-02-27 16:04:20 commit 0 last mr 1232 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

testdb=# select version();
 version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 (multiple_nodes GBase8cV5 3.0.0B76 build 47948f99) compiled at 2023-02-27 16:04:20 commit 0 last mr 1232 on x86_64-unknown-linux-gnu, compiled by g++ (GCC) 7.3.0, 64-bit
(1 row)

testdb=#
postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
Non-SSL connection (SSL connection is recommended when requiring high-security)
You are now connected to database "testdb" as user "gbase".
testdb=# create table student(ID int, Name varchar(10));
CREATE TABLE
testdb=# insert into student values(1, 'Mike'),(2,'John');
INSERT 0 2
testdb=# select * from student;
 id | name
----+------
  1 | Mike
  2 | John
(2 rows)

testdb=# \l
                         List of databases
   Name    | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+----------+---------+-------+-------------------
 postgres  | gbase | UTF8     | C       | C     |
 template0 | gbase | UTF8     | C       | C     | =c/gbase         +
           |       |          |         |       | gbase=CTc/gbase
 template1 | gbase | UTF8     | C       | C     | =c/gbase         +
           |       |          |         |       | gbase=CTc/gbase
 testdb    | gbase | UTF8     | C       | C     |
(4 rows)

testdb=# \d student
           Table "public.student"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 name   | character varying(10) |

testdb=#
```
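The interactive test above can also be run non-interactively, for example after each restart. This sketch assumes gsql accepts the psql-style -A (unaligned), -t (tuples only) and -c (single command) options, as openGauss's gsql documents; the table name smoke_test and the helper names are hypothetical. GSQL and PORT can be overridden through the environment.

```shell
#!/usr/bin/env bash
# Sketch: create a table, insert two rows, count them, drop the table.
# run_sql and smoke_test are hypothetical helpers; smoke_test is a hypothetical table.
GSQL=${GSQL:-gsql}
PORT=${PORT:-5432}

# Run one SQL statement against the postgres database, printing bare values
run_sql() {
  "$GSQL" -d postgres -p "$PORT" -At -c "$1"
}

smoke_test() {
  local n
  run_sql "drop table if exists smoke_test" >/dev/null || return 1
  run_sql "create table smoke_test(id int, name varchar(10))" >/dev/null || return 1
  run_sql "insert into smoke_test values (1,'Mike'),(2,'John')" >/dev/null || return 1
  n=$(run_sql "select count(*) from smoke_test") || return 1
  run_sql "drop table smoke_test" >/dev/null
  if [ "$n" = "2" ]; then
    echo "smoke test passed"
  else
    echo "unexpected count: $n"
    return 1
  fi
}

# Usage (as gbase on gbase8c_1):
#   smoke_test
```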
Uninstalling the cluster
Run the following commands on the primary node (172.72.3.30):
```shell
cd /home/gbase/gbase_package/script

# 1. Stop the cluster services on all nodes
./gha_ctl stop all -l http://172.72.3.30:2379

# 2. Uninstall the cluster programs
./gha_ctl uninstall -l http://172.72.3.30:2379

# 3. Remove the DCS cluster
./gha_ctl destroy dcs -l http://172.72.3.30:2379

# Check for leftover listeners and units
netstat -tulnp
systemctl list-units | grep gbase

# Stop and disable any remaining units
systemctl stop coordinator_gbase_cn1.service
systemctl stop datanode_gbase_dn1_1.service
systemctl stop gtm_gbase_gtm1.service
systemctl stop server_gbase_gha_server1.service
systemctl stop etcd.service

systemctl disable coordinator_gbase_cn1.service
systemctl disable datanode_gbase_dn1_1.service
systemctl disable gtm_gbase_gtm1.service
systemctl disable server_gbase_gha_server1.service
systemctl disable etcd.service

# Kill leftover processes (note: pkill -9 python3 kills every python3 process on the node)
pkill -9 python3
pkill -9 gaussdb

# Remove the data and runtime directories
rm -rf /home/gbase/data
rm -rf /home/gbase/gbase_db
```
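After the uninstall, a quick per-node check is whether any of the ports from gbase.yml are still being listened on. The sketch reads ss -tln output on stdin, takes the local address from column 4 and compares its port with the list; leftover_ports is a hypothetical helper, and the port list is copied from the gbase.yml used in this article.

```shell
#!/usr/bin/env bash
# Sketch: report which cluster ports are still listening after uninstalling.
# leftover_ports is a hypothetical helper; ports come from this article's gbase.yml.
CLUSTER_PORTS=(2379 20001 6666 8001 5432 8003 15432 8005 20010 8007)

# Read "ss -tln" output on stdin (header line first, local address in column 4)
leftover_ports() {
  local p listening
  # Keep only the port: the part after the last ":" (handles 0.0.0.0:5432 and [::]:5432)
  listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }' | sort -u)
  for p in "${CLUSTER_PORTS[@]}"; do
    if printf '%s\n' "$listening" | grep -qx "$p"; then
      echo "port $p still listening"
    fi
  done
}

# Usage on each node:
#   ss -tln | leftover_ports
```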