Greenplum 5.21.1 集群安装部署详述

简单来说GPDB是一个分布式数据库软件,其可以管理和处理分布在多个不同主机上的海量数据。对于GPDB来说,一个DB实例实际上是由多个独立的PostgreSQL实例组成的,它们分布在不同的物

    <p>简单来说GPDB是一个分布式数据库软件,其可以管理和处理分布在多个不同主机上的海量数据。对于GPDB来说,一个DB实例实际上是由多个独立的PostgreSQL实例组成的,它们分布在不同的物理主机上,协同工作,呈现给用户的是一个DB的效果。Master是GPDB系统的访问入口,其负责处理客户端的连接及SQL 命令、协调系统中的其他Instance(Segment)工作,Segment负责管理和处理用户数据。</p>

环境准备:操作系统:CentOS Linux release 7.6.1810 (Core) 64位master 1台(架构图中的主节点),Standby 1台(架构图中的从节点),Segment 2台。共4台服务器。

一、Master主机 Root 用户上操作

1. 修改/etc/hosts文件,添加下面内容(注:4台服务器相同的配置)

vim /etc/hosts

192.168.18.130 gp-master 192.168.18.131 gp-standby 192.168.18.132 gp-node1 192.168.18.133 gp-node2

2. 服务器关闭selinux,防火墙4台服务器相互开放,测试环境可以直接先关闭防火墙。(注:4台服务器相同的配置)

关闭Firewalld

systemctl stop firewalldsystemctl disable firewalld

永久关闭Selinux

vim /etc/selinux/conf

  1. This file controls the state of SELinux on the system.
  2. SELINUX= can take one of these three values:
  3. enforcing - SELinux security policy is enforced.
  4. permissive - SELinux prints warnings instead of enforcing.
  5. disabled - No SELinux policy is loaded. SELINUX=disabled
  6. SELINUXTYPE= can take one of three two values:
  7. targeted - Targeted processes are protected,
  8. minimum - Modification of targeted policy. Only selected processes are protected.
  9. mls - Multi Level Security protection. SELINUXTYPE=targeted

注:查看Selinux运行状态:getenforce,CLI界面非永久生效设置SeLinux:setenforce 0(0-1对应关闭和开启)

3. 操作系统参数设置

 vim /etc/sysctl.conf  (注:4台服务器相同的配置)

kernel.shmmax = 500000000 kernel.shmmni = 4096 kernel.shmall = 4000000000 kernel.sem = 250 512000 100 2048 kernel.sysrq = 1 kernel.core_uses_pid = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 net.ipv4.tcp_syncookies = 1 net.ipv4.ip_forward = 0 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.default.arp_filter = 1 net.core.netdev_max_backlog = 10000 vm.overcommit_memory = 2 kernel.msgmni = 2048 net.ipv4.ip_local_port_range = 1025 65535

 vim /etc/security/limits.conf  (注:4台服务器相同的配置)

  • soft nofile 65536
  • hard nofile 65536
  • soft nproc 131072
  • hard nproc 131072

磁盘预读参数及 deadline算法修改 (注:4台服务器相同的配置)

blockdev --setra 65536 /dev/sda echo deadline > /sys/block/sda/queue/scheduler

注:磁盘盘符sda需根据自己的实际情况进行配置

软件下载地址:https://network.pivotal.io/products/pivotal-gpdb,下载:greenplum-db-5.21.1-rhel7-x86_64.rpm

在Master主机上安装GP二进制文件,也就是主机名是mdw的服务器。(注:在master上安装即可,后面通过批量的方法安装剩下的服务器)

rpm -ivh greenplum-db-5.21.1-rhel7-x86_64.rpm

注:默认安装目录:/usr/local

 在Master上添加gpadmin用户

adduser gpadmin echo gpadmin | passwd --stdin gpadmin

 注:设置密码为了后面gpssh-exkeys -f hostfile_allhosts 使用

 在Master上给gpadmin用户提权

[root@gp-master ~]# visudo
gpadmin ALL=(ALL) ALL gpadmin ALL=(ALL) NOPASSWD:ALL

在Master主机上赋予gpadmin用户Greenplum文件夹的的权限

chown -R gpadmin.gpadmin /usr/local/greenplum-db*

 二、Master 主机 Gpadmin用户上操作

准备用于批量安装软件以及后续集群的初始化文件,hostfile_allhosts,hostfile_segments,hostfile_mshosts,存放到/home/gpadmin

su - gpadmin

vim hostfile_allhosts

gp-mastergp-standbygp-node1gp-node2

vim hostfile_segments

gp-node1 gp-node2

vim hostfile_mshosts

gp-master gp-standby

设置各主机之间免密登录

gpssh-exkeys -f hostfile_allhosts

注:需输入gpadmin用户的密码,此处为:gpadmin

设置用于安装Greenplum的文件夹权限

gpssh -f hostfile_allhosts => sudo chown gpadmin.gpadmin /usr/local => exit

创建及赋权master/standby主机元数据存储目录

gpssh -f hostfile_mshosts =>sudo mkdir /data/greenplum_data/gpmaster =>sudo chown -R gpadmin.gpadmin /data=>exit

创建及赋权Segments主机数据存储目录

gpssh -f hostfile_segments =>sudo mkdir /data/greenplum_data/{primary,mirror}=>sudo chown -R gpadmin.gpadmin /data=>exit

批量安装软件(GP)

cd /home/gpadmin/ source /usr/local/greenplum-db/greenplum_path.sh gpseginstall -f hostfile_allhosts -u gpadmin -p gpadmin

设置NTP同步

 Yum下载安装NTP服务器,已安装的可以略过

sudo yum install ntp -y

若出现如下报错,可看下一步解决方法

There was a problem importing one of the Python modules required to run yum. The error leading to this problem was:

No module named yum

Please install a package which provides this module, or verify that the module is installed correctly.

It's possible that the above module doesn't match the current version of Python, which is: 2.7.13 (r266:84292, Jan 22 2014, 09:37:14)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]

If you cannot solve this problem yourself, please go to
the yum faq at: http://yum.baseurl.org/wiki/Faq

View Code

解决方法:

unset PYTHONHOME unset PYTHONPATH unset LD_LIBRARY_PATH

 再进行yum安装之后,再修改回来,使得GP能正常使用

source /usr/local/greenplum-db/greenplum_path.sh

注:报错原因:在安装GP集群之后,会在master节点中的环境变量中会增加 PYTHONHOME,PYTHONPATH,LD_LIBRARY_PATH几项,并且会修改原本的path。

补充:LD_LIBRARY_PATH 该环境变量主要用于指定查找共享库(动态链接库)时除了默认路径之外的其他路径。

在每个Segment主机,编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个server参数指向Standby主机。如下面:

sudo vim /etc/ntp.conf

server gp-master prefer server gp-standby

在Standby主机,编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个参数指向数据中心的时间服务器。

sudo vim /etc/ntp.conf

server gp-master prefer

在Master主机,使用NTP守护进程同步所有Segment主机的系统时钟。例如,使用gpssh来完成:

gpssh -f hostfile_allhosts -v -e 'ntpd'

 输出如下代表成功:

[root@gp-master gpadmin]# gpssh -f all_hosts -v -e 'ntpd' [WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found Using delaybeforesend 0.05 and prompt_validation_timeout 1.0

[Reset ...] [INFO] login mdw [INFO] login smdw [INFO] login sdw1 [INFO] login sdw2 [ mdw] ntpd [smdw] ntpd [sdw1] ntpd [sdw2] ntpd [INFO] completed successfully

[Cleanup...]

配置Greenplum初始化文件

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config chmod 775 gpinitsystem_config

 相关配置如下:

[gpadmin@gp-master ~]$ cat gpinitsystem_config

  1. FILE NAME: gpinitsystem_config

  2. Configuration file needed by the gpinitsystem

################################################

REQUIRED PARAMETERS

################################################

Name of this Greenplum system enclosed in quotes.

ARRAY_NAME="Greenplum Data Platform"

Naming convention for utility-generated data directories.

SEG_PREFIX=gpseg

Base number by which primary segment port numbers

are calculated.

PORT_BASE=40000

File system location(s) where primary segment data directories

will be created. The number of locations in the list dictate

the number of primary segments that will get created per

physical host (if multiple addresses for a host are listed in

the hostfile, the number of segments will be spread evenly across

the specified interface addresses).

declare -a DATA_DIRECTORY=(/data/greenplum_data/primary)

OS-configured hostname or IP address of the master host.

MASTER_HOSTNAME=k8s-master

File system location where the master data directory

will be created.

MASTER_DIRECTORY=/data/greenplum_data/gpmaster

Port number for the master instance.

MASTER_PORT=5432

Shell utility used to connect to remote hosts.

TRUSTED_SHELL=ssh

Maximum log file segments between automatic WAL checkpoints.

CHECK_POINT_SEGMENTS=8

Default server-side character set encoding.

ENCODING=UTF-8

################################################

OPTIONAL MIRROR PARAMETERS

################################################

Base number by which mirror segment port numbers

are calculated.

MIRROR_PORT_BASE=43000

Base number by which primary file replication port

numbers are calculated.

REPLICATION_PORT_BASE=34000

Base number by which mirror file replication port

numbers are calculated.

MIRROR_REPLICATION_PORT_BASE=44000

File system location(s) where mirror segment data directories

will be created. The number of mirror locations must equal the

number of primary locations as specified in the

DATA_DIRECTORY parameter.

declare -a MIRROR_DATA_DIRECTORY=(/data/greenplum_data/mirror)

################################################

OTHER OPTIONAL PARAMETERS

################################################

Create a database of this name after initialization.

DATABASE_NAME=testDB

Specify the location of the host address file here instead of

with the the -h option of gpinitsystem.

MACHINE_LIST_FILE=/home/gpadmin/hostfile_segments

View Code

 运行初始化工具初始化数据库

source /usr/local/greenplum-db/greenplum_path.sh gpinitsystem -c gpinitsystem_config

初始化日志:

20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:------------------------------------------------------- 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------

现在只有1个master,2个segment,没有standby,那么接下来把standby加入集群。

在Master服务器上执行

gpinitstandby -s gp-standby

输出如下:

[gpadmin@mdw ~]$ gpinitstandby -s smdw 20160827:16:59:24:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization... 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg-1 on smdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname = mdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory = /data/master/gpseg-1 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port = 5432 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname = smdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port = 5432 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-1 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog = On 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-1 Do you want to continue with standby master initialization? Yy|Nn (default=N): > y 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent. 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog... 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully. 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file... 20160827:16:59:37:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully. 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files... 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully. 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw in directory: /data/master/gpseg-1 20160827:16:59:40:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files... 20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully. 20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on gp-standby

View Code

查看启动进程:

[gpadmin@gp-master ~]$ ps -ef | grep postgres gpadmin 10975 1 0 00:57 ? 00:00:00 /usr/local/greenplum-db-5.21.1/bin/postgres -D /data/greenplum_data/gpmaster/gpseg-1 -p 5432 --gp_dbid=1 --gp_num_contents_in_cluster=2 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -E gpadmin 10976 10975 0 00:57 ? 00:00:00 postgres: 5432, master logger process
gpadmin 10979 10975 0 00:57 ? 00:00:00 postgres: 5432, stats collector process
gpadmin 10980 10975 0 00:57 ? 00:00:01 postgres: 5432, writer process
gpadmin 10981 10975 0 00:57 ? 00:00:00 postgres: 5432, checkpointer process
gpadmin 10982 10975 0 00:57 ? 00:00:00 postgres: 5432, seqserver process
gpadmin 10983 10975 0 00:57 ? 00:00:00 postgres: 5432, ftsprobe process
gpadmin 10984 10975 0 00:57 ? 00:00:00 postgres: 5432, sweeper process
gpadmin 10985 10975 0 00:57 ? 00:00:05 postgres: 5432, stats sender process
gpadmin 10986 10975 0 00:57 ? 00:00:01 postgres: 5432, wal writer process
gpadmin 11279 10975 0 00:59 ? 00:00:00 postgres: 5432, wal sender process gpadmin 192.168.18.131(53573) streaming 0/C05A028 gpadmin 16800 16608 0 04:15 pts/0 00:00:00 grep --color=auto postgres

设置gpadmin用户环境变量,Master,Standby都需设置。 

vim /home/gpadmin/.bashrc

[gpadmin@gp-master ~]$ cat .bashrc

  1. .bashrc

  2. Source global definitions if [ -f /etc/bashrc ]; then . /etc/bashrc fi

  3. Uncomment the following line if you don't like systemctl's auto-paging feature:

  4. export SYSTEMD_PAGER=

  5. User specific aliases and functions

source /usr/local/greenplum-db/greenplum_path.sh

export MASTER_DATA_DIRECTORY=/data/greenplum_data/gpmaster/gpseg-1 export PGPRORT=5432 export PGDATABASE=testDB [gpadmin@gp-master ~]$ scp .bashrc gp-standby:pwd

启动和停止数据库测试是否能正常启动和关闭,命令如下

gpstart gpstop

到此 Greenplum 就部署完成了。下面进行一些简单的测试。

登录数据库:psql -d postgres

建表,插入,查询

postgres=# create table student ( no int primary key,student_name varchar(40),age int); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student" CREATE TABLE postgres=# insert into student values(1,'yayun',18); INSERT 0 1 postgres=# select * from student; no | student_name | age ----+--------------+----- 1 | yayun | 18 (1 row)