[oi-dev] [developer] BMC driver on Illumos

Jim Klimov jimklimov at cos.ru
Thu Mar 28 20:18:30 UTC 2013


On 2013-03-28 20:12, Garrett D'Amore wrote:
>
> On Mar 28, 2013, at 12:05 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>
>> So, for the case of dedicated-hardware watchdogs, this is the part of
>> your post which I can't find as relevant: "The usual thing is to hook
>> this up to a system timer, which will catch hard hangs."
>
> What I mean is that what most systems do is not express an API out to userland, but just have something that runs out of the timer that tickles the hardware watchdog register.  This guards against the hard hang of the entire system/scheduler, but it does nothing to ensure that some upper layer services are still being handled.

I see... well, however such a daemon is implemented (which is relevant
for Linux watchdogs, and at least for legacy OpenSolaris and maybe the
illumos BMC port), it still relies on the timer interrupt firing and on
the software not having hung in order to tickle the hardware watchdog's
separate timer.

In Linux, I would guess, the API is exposed as a /dev/watchdog node,
into which you can echo a character. The daemon is usually trivial:
reference code is part of some README and is about a dozen lines long.
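
To make that concrete, here is a rough sketch of such a daemon loop in
Python (not the README's reference code, just an illustration). The
function name pet_watchdog, its parameters and the healthy() callback
are made up for this example; the only parts taken from the Linux
watchdog interface are that any write to the node restarts the
countdown, and that writing the character 'V' just before closing
requests a clean disarm on drivers that support "magic close".

```python
import os
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, rounds=None,
                 healthy=lambda: True):
    """Write to the watchdog node every `interval` seconds.

    `rounds` limits the number of writes (None means loop forever) and
    `healthy` is a hypothetical service check supplied by the caller.
    Returns the number of keepalive writes that were made.
    """
    fd = os.open(device, os.O_WRONLY)
    pets = 0
    try:
        while rounds is None or pets < rounds:
            if not healthy():
                # Stop tickling. Whether the hardware timer keeps running
                # after the close depends on the driver and its settings.
                return pets
            os.write(fd, b"\0")  # any byte restarts the countdown
            pets += 1
            time.sleep(interval)
        # Clean shutdown: 'V' asks the driver to disarm the timer on close.
        os.write(fd, b"V")
        return pets
    finally:
        os.close(fd)
```

Calling pet_watchdog() with the defaults would need root and a loaded
watchdog driver. Passing a real service check as healthy() is what turns
this from "the scheduler is alive" into "my service is alive".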

In OpenSolaris, I believe, there was a single (closed-source) driver
for all supported watchdog models and the bmc-watchdog program could
talk to it. In a way it was the user-accessible API to set and query
the watchdog. In case it helps, here are a few output examples:

# bmc-watchdog -g
Timer Use:                   SMS/OS
Timer:                       Running
Logging:                     Enabled
Timeout Action:              Hard Reset
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Clear
Timer Use BIOS POST Flag:    Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag:  Clear
Timer Use BIOS OEM Flag:     Clear
Initial Countdown:           900 seconds
Current Countdown:           842 seconds


# bmc-watchdog -h
Usage: bmc-watchdog <COMMAND> [OPTIONS]... [COMMAND_OPTIONS]...

COMMANDS:
   -s         --set                            Set BMC Watchdog Config.
   -g         --get                            Get BMC Watchdog Config.
   -r         --reset                          Reset BMC Watchdog Timer.
   -t         --start                          Start BMC Watchdog Timer.
   -y         --stop                           Stop BMC Watchdog Timer.
   -c         --clear                          Clear BMC Watchdog Config.
   -d         --daemon                         Run in Daemon Mode.

OPTIONS:
   -D STRING  --driver-type=IPMIDRIVER             Specify IPMI driver type.
              --disable-auto-probe                 Do not probe driver for default settings.
              --driver-address=DRIVER-ADDRESS      Specify driver address.
              --driver-device=DEVICE               Specify driver device path.
              --register-spacing=REGISTER-SPACING  Specify driver register spacing.
   -f STRING  --logfile=FILE                       Specify an alternate logfile
              --config-file=FILE                   Specify an alternate config file
   -n         --no-logging                         Turn off all logging
   -?         --help                               Output help menu.
   -V         --version                            Output version.
              --debug                              Turn on debugging.
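

Since the "-g" output above is plain "Name: value" text, a monitoring
script could read it with a few lines of Python. This is only a sketch:
the helper names parse_watchdog_status and countdown_seconds are
invented here, and it assumes the output keeps the layout shown in the
sample.

```python
def parse_watchdog_status(text):
    """Turn 'Name:   value' lines into a dict, skipping anything else."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            status[key.strip()] = value.strip()
    return status


def countdown_seconds(status, field="Current Countdown"):
    """Return the integer part of a value such as '842 seconds'."""
    return int(status[field].split()[0])
```

Fed the sample above, the "Timer" entry comes back as "Running" and the
current countdown as the integer 842.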



> Now I've not looked at Linux and how it uses watchdogs… but I've experience with a few different embedded systems, and the above handling is almost precisely what I've seen done.  NetBSD was nice because it instead offered a watchdog facility that extended into userland, allowing the service check to be done by a userland daemon, which is far more interesting than just that the clock interrupt handler is still working properly. :-)



