[OpenIndiana-discuss] bmc-watchdog SMF dependencies

Jim Klimov jimklimov at cos.ru
Tue Jun 26 20:26:14 UTC 2012


Hello all, I've got a new small matter for general discussion:

   Some of my systems have hardware watchdogs, either as motherboard
features or in IPMI add-ons.

   A small intro for newcomers: Watchdogs include a timer that can be
started by BIOS or a watchdog driver in the OS, and the driver should
regularly restart this timer (a trivial driver is just a loop with
long sleeps and a write of a byte into a certain port's address).
If the computer freezes and the driver no longer functions, the
hardware watchdog issues a reset on the motherboard, or a similar
administrative action (if configurable).
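
A minimal sketch of that mechanism, as a pure simulation (no real
port I/O is done). The class and function names, and the one-second
tick, are made up for illustration and are not taken from any actual
driver:

```python
class SimulatedWatchdog:
    """Countdown timer that would reset the board if not kicked in time."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.deadline = None  # None means the timer is stopped

    def start(self, now):
        self.deadline = now + self.timeout_s

    def kick(self, now):
        # Restarting only has an effect while the timer is running.
        if self.deadline is not None:
            self.deadline = now + self.timeout_s

    def stop(self):
        self.deadline = None

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline


def run_driver(wd, kick_interval_s, freeze_at_s, horizon_s):
    """Simulate the kick loop second by second.

    The driver kicks every kick_interval_s seconds until the machine
    "freezes" at freeze_at_s. Returns the second at which the hardware
    reset would fire, or None if it never fires within horizon_s.
    """
    wd.start(0)
    for now in range(1, horizon_s + 1):
        if wd.expired(now):
            return now
        if now < freeze_at_s and now % kick_interval_s == 0:
            wd.kick(now)
    return None
```

With a 60 s timeout and a kick every 10 s, a freeze at second 35
means the last kick happened at second 30, so the simulated reset
fires at second 90; without a freeze it never fires.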

   Now, in OpenSolaris (and portable to OI) there is a bmc-watchdog
package for some proprietary hardware implementations, and a newly
ported open-source driver is also brewing. There is also an SMF
service to wrap the watchdog. From what I see, upon service startup
the HW timer is started, and for the duration of the service uptime
the timer-resets are regularly issued. Upon service shutdown there
are two possible approaches (as I tweaked the method script a lot,
I am not sure what was there originally): either the daemon for
regularly-restarting the timer is just killed (and the timer keeps
ticking), or the timer is also stopped. On my system it happened
to be the former, and during a shutdown which took longer than usual
to proceed (for valid reasons), the box got reset by the timer.

   Now I looked at the SMF manifest, and see that the service only
depends on filesystem/usr. In my practice this meant that upon OS
shutdown, the bmc-watchdog daemon was quickly killed (as nothing
depends on this service) and the timer ticked down to zero - boom!

   Question is: what is a valid way to avoid the watchdog killing
the system upon lengthy shutdowns?

   I came up with a few ideas:
1) Redefine the stop method to not kill the daemon - not good, for
    pedantic reasons at least ;)
2) Redefine the stop method to kill the daemon and stop the timer -
    not good because the box can potentially also freeze during
    shutdown, and in that case we would want it automagically reset;
3) Make milestone/single-user a dependency of bmc-watchdog (I also
    tried to redefine bmc-watchdog instance to have a dependent -
    but this did not get picked up properly) - in this case the
    daemon works until all heavy services in milestone/multi-user
    get shut down properly, and only gets killed then (and the HW
    timer ticks for a few more seconds, until the system is rebooted).
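
To compare the options on a timeline, here is a rough sketch. All
numbers (60 s timeout, 10 s kick interval, 300 s shutdown) and the
function name are hypothetical; this models the reasoning above and
is not the real bmc-watchdog logic:

```python
def reset_time(timeout_s, kick_interval_s, daemon_killed_at_s,
               shutdown_done_at_s, stop_timer_on_kill=False):
    """Return when the watchdog resets the box during shutdown, or None.

    Times are seconds since shutdown began. The daemon is assumed to
    kick the timer at every multiple of kick_interval_s until killed.
    """
    if stop_timer_on_kill:
        # Idea 2: timer disarmed, so a hang during shutdown is never caught.
        return None
    last_kick = (daemon_killed_at_s // kick_interval_s) * kick_interval_s
    deadline = last_kick + timeout_s
    return deadline if deadline <= shutdown_done_at_s else None


# Hypothetical lengthy shutdown that needs 300 s to complete.
stock = reset_time(60, 10, daemon_killed_at_s=5, shutdown_done_at_s=300)
idea2 = reset_time(60, 10, 5, 300, stop_timer_on_kill=True)
idea3 = reset_time(60, 10, daemon_killed_at_s=295, shutdown_done_at_s=300)
```

Here `stock` (daemon killed almost immediately, timer left running)
gives a reset 60 s into the shutdown, while `idea3` (daemon killed
only near the end) gives no reset and still leaves the timer armed
in case the last steps hang.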

So far I like idea #3 (alone) best. Are there any reasons not to
do so, or to do something different? Ultimately, I hope, the best
method should end up in the illumos-gate ;)

Thanks,
//Jim Klimov


