[OpenIndiana-discuss] problems for setting up a high availability cluster on two openindiana machines

Marc Lobelle marc.lobelle at uclouvain.be
Mon Feb 21 12:33:02 UTC 2022


Hello all,

I'm trying to use the OpenIndiana pacemaker package. As documentation
I am using "Clusters from Scratch", release 2.1.2, which assumes
CentOS (Linux) as the underlying OS, and a 2014 document called "Use
pacemaker and corosync on Illumos (OmniOS) to run a Ha active/passive
cluster.", starting from its section "Corosync configuration". The
cluster user is hacluster; it is allowed to use su and sudo, and it can
ssh to the other node without a password or passphrase. I attached the
corosync configuration file, the SMF manifest and the SMF startup script.

When I start the corosync service and look at what happened, I get this:

root@mosquito:~# cat `svcs -L corosync`
...

...

[ févr. 21 13:05:12 Leaving maintenance because disable requested. ]
[ févr. 21 13:05:12 Disabled. ]
[ févr. 21 13:05:23 Rereading configuration. ]
[ févr. 21 13:05:31 Enabled. ]
[ févr. 21 13:05:31 Executing start method ("/etc/smf/corosyncd start"). ]
*shell-init: error retrieving current directory: getcwd: cannot access 
parent directories: Permission denied*
Feb 21 13:05:32 notice  [MAIN  ] main.c:main:1352 Corosync Cluster 
Engine ('2.4.5'): started and ready to provide service.
Feb 21 13:05:32 info    [MAIN  ] main.c:main:1353 Corosync built-in 
features: testagents monitoring augeas xmlconf qdevices snmp bindnow
Feb 21 13:05:33 warning [MAIN  ] main.c:corosync_set_rr_scheduler:884 
*Could not set SCHED_RR at priority 59: Not owner (1)*
Feb 21 13:05:33 warning [MAIN  ] main.c:main:1438 *Could not set 
priority -2147483648: Permission denied (13)*
[ févr. 21 13:05:38 Method "start" exited with status 0. ]
shell-init: error retrieving current directory: getcwd: cannot access 
parent directories: Permission denied
cmap connection setup failed: CS_ERR_NOT_EXIST .  Retrying in 1s
cmap connection setup failed: CS_ERR_NOT_EXIST .  Retrying in 2s
cmap connection setup failed: CS_ERR_NOT_EXIST .  Retrying in 3s
cmap connection setup failed: CS_ERR_NOT_EXIST .  Retrying in 4s
cmap connection setup failed: CS_ERR_NOT_EXIST .  Retrying in 5s
Could not connect to Cluster Configuration Database API, error 12
[ févr. 21 13:05:53 Stopping because all processes in service exited. ]
[ févr. 21 13:05:53 Executing stop method ("/etc/smf/corosyncd stop"). ]
[ févr. 21 13:05:54 Method "stop" exited with status 0. ]
[ févr. 21 13:05:54 Executing start method ("/etc/smf/corosyncd start"). ]
shell-init: error retrieving current directory: getcwd: cannot access 
parent directories: Permission denied
Feb 21 13:05:56 notice  [MAIN  ] main.c:main:1352 Corosync Cluster 
Engine ('2.4.5'): started and ready to provide service.
Feb 21 13:05:56 info    [MAIN  ] main.c:main:1353 Corosync built-in 
features: testagents monitoring augeas xmlconf qdevices snmp bindnow
Feb 21 13:05:56 warning [MAIN  ] main.c:corosync_set_rr_scheduler:884 
Could not set SCHED_RR at priority 59: Not owner (1)
Feb 21 13:05:56 warning [MAIN  ] main.c:main:1438 Could not set priority 
-2147483648: Permission denied (13)
[ févr. 21 13:06:01 Method "start" exited with status 0. ]
shell-init: error retrieving current directory: getcwd: cannot access 
parent directories: Permission denied

...

root@mosquito:~#

I do not understand which parent directory cannot be accessed nor why 
priorities cannot be set.

Can anybody help me?

Thanks

Marc
-------------- next part --------------
# Please read the corosync.conf.5 manual page

# create /etc/corosync/authkey with corosync-keygen
# copy /etc/corosync/authkey to each node
# restrict access with chmod 400 /etc/corosync/authkey

totem {
  cluster_name: okapi_cluster
  version: 2

  interface {
    ringnumber: 0
#    bindnetaddr: 192.168.178.0
    broadcast: yes
    mcastport: 5405
    ttl: 1
  }
  transport: udpu
}

logging {
   fileline: on
   function_name: on
   to_stderr: on
   to_logfile: on
   to_syslog: off
   syslog_facility: local6
   logfile: /var/log/hacluster/corosync.log
   debug: trace
   logfile_priority: error
   syslog_priority: error
   tags: enter|leave|trace
   timestamp: on
#   logger_subsys {
#      subsys: QUORUM
#      debug: off
# }
}

# expected_votes only to start with one node in the cluster !
quorum {
  provider: corosync_votequorum
  expected_votes: 1
#  two_node: 1
}

# already defined in SMF
#qb {
#  ipc_type: socket
#}

nodelist {
  node {
    nodeid: 1
    ring0_addr: okapi1
  }
  node {
    nodeid: 2
    ring0_addr: okapi2
  }
}
-------------- next part --------------
#!/usr/bin/bash

. /lib/svc/share/smf_include.sh
 
## Tracing with debug version
# PCMK_trace_files=1
# PCMK_trace_functions=1
# PCMK_trace_formats=1
# PCMK_trace_tags=1
 
export PCMK_ipc_type=socket
PREFIX=/usr/
CLUSTER_USER=hacluster
COROSYNC=corosync
PACEMAKERD=pacemakerd
PACEMAKER_PROCESSES=pacemaker
APPPATH=${PREFIX}/sbin/
SLEEPINTERVALL=10
SLEEPCOUNT=5
SLEPT=0
 
 
killapp() {
   pid=`pgrep -f $1`
   if [ "x$pid" != "x" ]; then
      kill -9 $pid
   fi
   return 0
}
 
start() {
        stop
        su ${CLUSTER_USER} -c ${APPPATH}${COROSYNC}
        sleep $sleep0
        su ${CLUSTER_USER} -c ${APPPATH}${PACEMAKERD} &
        return 0
}
 
stop() {
# first try, graceful shutdown
        pid=`pgrep -U ${CLUSTER_USER} -f ${PACEMAKERD}`
        if [ "x$pid" != "x" ]; then
           ${APPPATH}${PACEMAKERD} --shutdown &
           sleep $SLEEPINTERVALL
        fi
# second try, kill the rest
        killapp ${APPPATH}${COROSYNC}
        sleep 1
        killapp ${PACEMAKER_PROCESSES}
        return 0
}
 
let sleep0=$SLEEPINTERVALL/2
case "$1" in
'start')
        start
        ;;
'restart')
        stop
        start
        ;;
'stop')
        stop
        ;;
*)
        echo "Usage: $0 { start | stop | restart }"
        exit 1
        ;;
esac
exit 0
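
A hedged note on the getcwd message seen in the log: bash prints it at
startup when it cannot resolve its current working directory. One
plausible cause here, which is an assumption and has not been verified
on this system, is that the start method runs from a directory that
hacluster cannot search (for instance root's home directory), and the
shells started by su inherit it. The sketch below uses a hypothetical
helper, list_ancestors, which is not part of the attached script, to
print the current directory and each of its parents so that their
owners and modes can be inspected.

```shell
# Hypothetical helper (not part of the attached script): print a directory
# and each of its parents, one per line, ending with "/" for absolute paths.
list_ancestors() {
    dir=$1
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        echo "$dir"
        dir=`dirname "$dir"`
    done
    echo "$dir"
}

# Show owner and mode of every ancestor of the current working directory.
# A directory without search (x) permission for the cluster user would be
# a candidate for the "cannot access parent directories" message.
list_ancestors "$PWD" | while read -r d; do
    ls -ld "$d"
done
```

To be meaningful, this would have to run from the same working directory
as the start method. If one of the listed directories turns out not to
be searchable by hacluster, changing to a directory it can read (for
example with cd /) before the su calls in start() might be enough; that
is also untested here.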

