[oi-dev] [developer] BMC driver on Illumos
Garrett D'Amore
garrett.damore at dey-sys.com
Thu Mar 28 17:21:17 UTC 2013
On Mar 28, 2013, at 9:39 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> On 2013-03-28 16:18, Sašo Kiselkov wrote:
>> I'm building a system that's relying as much as possible on stock parts,
>> so custom kernel modules and hacking is something I'd like to avoid. I'm
>> not going to be around forever to keep the system going, or to
>> continually work on ways of deploying an old hack on a new install.
>
> I know *you* do have better contributions to make, but a watchdog driver
> is AFAIK about knowing what byte to write to what IO port to set, reset
> and query the timeout, and possibly configure what the watchdog does
> when the timer expires without updates. This info might be gleaned from
> Linux and BSD drivers for different watchdog chips.
>
> I think it might be a useful project for a student to make.
>
> Possibly too low-profile for a GSoC, but good to learn about driver
> development, porting code, etc. And quite useful for the community ;)
> As a result of such a project, we'd get one more kernel-hacker ;)
I've done such work for NetBSD systems. These things are usually pretty trivial from a hardware standpoint.
The harder thing is when these things are exposed as "registers" on an otherwise bog-standard part. In that case, you have to either modify an existing driver or come up with some trickier hack. (It's easier when this function is exposed as a separate PCI function or something like that. But that's very rarely the case with something like this. Usually they are part of the low-level system chipset; they kind of need to be in order to do something like generate an NMI or cause a power reset.)
Then the other side of the problem is determining how you are going to trigger this. The usual thing is to hook this up to a system timer, which will catch hard hangs. But many "apparent" hangs are really not hangs in this sense: there could be a high-priority process that is starving other processing, for example, or a deadlock in the filesystem. Those kinds of "hangs" won't be detected by such a deadman, because the timer keeps firing and keeps resetting the watchdog.
The ideal type of design would be a user-space-accessible deadman that lets user processes configure it and then tickle it to keep it alive. This would allow you to have a critical user-space process validate that *it* is still serving whatever it needs to. This kind of task requires a little design work, and probably should be hooked back into some common deadman framework. NetBSD has such a framework, if I recall correctly. This project would be in scope for a GSoC effort, because I can see a few other options, like using the system timer as a deadman (it's already there, btw!) if no other hardware watchdog is present. The framework should abstract all of those and present a single syscall or ioctl interface to manage it.
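As a rough illustration of that shape, here is a user-space Python simulation. It is not illumos or NetBSD code, and every name in it (Deadman, SoftwareTimerBackend, FakeClock, configure, tickle, expired) is hypothetical. A real framework would sit in the kernel and expose these operations through an ioctl or syscall; this sketch only shows the abstraction of one front end over a pluggable backend, with the software timer as the fallback when no hardware watchdog is present.

```python
import time
from typing import Callable, Optional


class SoftwareTimerBackend:
    """Hypothetical fallback backend: tracks the deadline in software.

    A hardware backend would expose the same three methods but program
    the watchdog chip instead of storing a timestamp.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._timeout_s = 0.0
        self._deadline: Optional[float] = None  # None means "not armed"

    def arm(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._deadline = self._clock() + timeout_s

    def tickle(self) -> None:
        if self._deadline is None:
            raise RuntimeError("deadman is not armed")
        # Push the deadline out by one full timeout from now.
        self._deadline = self._clock() + self._timeout_s

    def expired(self) -> bool:
        return self._deadline is not None and self._clock() >= self._deadline


class Deadman:
    """Single front end, standing in for the syscall/ioctl interface."""

    def __init__(self, backend: SoftwareTimerBackend) -> None:
        self._backend = backend

    def configure(self, timeout_s: float) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout must be positive")
        self._backend.arm(timeout_s)

    def tickle(self) -> None:
        self._backend.tickle()

    def expired(self) -> bool:
        return self._backend.expired()


class FakeClock:
    """Deterministic clock so the behaviour can be checked without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    deadman = Deadman(SoftwareTimerBackend(clock))
    deadman.configure(10.0)   # the critical process arms a 10 s deadman
    clock.advance(9.0)
    deadman.tickle()          # still healthy, so it tickles in time
    clock.advance(10.0)       # then it stops tickling (starved or deadlocked)
    print("expired:", deadman.expired())
```

The point of having the critical process call tickle() itself, rather than a kernel timer doing it, is that a starved or deadlocked process stops tickling even though the system timer is still running.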
- Garrett
>
> //Jim