[OpenIndiana-discuss] shell script mutex locking or process signaling

Fri May 31 11:41:44 UTC 2013

> From: Gary Mills [mailto:gary_mills at fastmail.fm]
> 
> SMF is actually well documented, but you do have to jump around from
> man page to man page.  Start with `man smf'.  There are also lots of
> examples to follow, both of manifests and methods.  They are all text
> files.

Ok, so here's a quasi-recent example of a difficulty I've encountered trying to use SMF.  I have a service that is configured as a single instance.  I wanted to break it into individual instances, svc:foo and svc:bar.  I looked at examples, did what I thought made sense, and tried to import it, and SMF puked on the XML, saying something generic like "invalid configuration."  If SMF is actually well documented and good to work with, I need something to (a) guide me in creating good XML, and (b) validate the XML, letting me know if something's wrong and how to fix it.

Last I knew, there are a bunch of good HTML editors out there.  You start typing something, and based on context, the tool knows what's valid in the spot where you are working, so it will suggest and autocomplete tags, and properties inside of tags.  If you start typing <li> when you're not inside <ul> or <ol>, it throws warning signs at you.  Last I knew, there isn't any such thing as a DTD-aware XML editor.  So when I sit down and start typing XML, I have no idea what tags belong in the place where I'm typing.

In my example above, it turned out I was putting the exec method before the dependency name, or something like that.  Element order matters when the DTD says it does, and I got it wrong just by trying to read and copy some example into my service manifest.  I forget the exact process I followed to debug it, but I recall it being painfully iterative and manual.

> I'd recommend using the facilities of SMF, rather than trying to do it
> all outside of SMF.  These facilities are extensive and complete. 

You say SMF has capabilities that make this all go away.  But I read "man smf" and I don't see it there.  I don't know what to look for, and I'm not going to read the DTD from top to bottom, hoping to find something that fits the bill.

> Have you considered the contract facility?  It's used internally by
> SMF, but you can use it elsewhere as well.  The shell commands are
> ctrun(1), ctstat(1), ctwatch(1), and pkill(1).

There may be a solution there, but I'm not very familiar with the Solaris contract subsystem - it looks like you define the behavior of one process, and you use another process to monitor it.  If this is correct, it would make for a very convoluted solution - An SMF service launches the "start" method, and while it's running, the same service launches the "stop" or "refresh" method ...  Rather than executing the method directly in each situation, the method would start a contract to monitor a sub-method that executes the "start," "stop," or "refresh."  And if the user (or system) repeats calls to start/stop/refresh, each of these instances needs to be made aware of the others, so that the later method calls signal the earlier ones to terminate their contracts ...   *blah*

In any event, for the problem at hand in this thread, I used the easy solution:

Script starts.  Script takes the lock with mkdir $LOCKDIR, which is /tmp/something
Script chugs along, and at select moments, checks for the existence of $BREAKLOCKDIR, which is /tmp/somethingelse
If a script starts and fails to get the lock on LOCKDIR, then that script creates BREAKLOCKDIR and starts polling for the non-existence of LOCKDIR.
LOCKDIR is a signal that a script is already running.
BREAKLOCKDIR is a signal that a later process wants to steal lock.

Any script that *has* the lock guarantees to release it in less than 60 seconds.  So if LOCKDIR becomes stale (for example, the system was power cycled while the lock existed) and the BREAKLOCK script sees that LOCKDIR has existed for more than 60 seconds, it assumes the lock is stale and steals it forcibly.
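
Roughly, in shell, the scheme above might look like the sketch below.  This is an illustration and not the actual script: the paths, the function names (acquire_lock, release_lock, break_requested, steal_lock) and the STALE_AFTER variable are all made up here, and only the 60-second figure comes from the description.  It also simplifies the stale check by counting how long the waiter itself has been polling, rather than looking at how old LOCKDIR is.

```shell
#!/bin/sh
# Hypothetical paths: the description only says /tmp/something and /tmp/somethingelse.
LOCKDIR="${LOCKDIR:-/tmp/myscript.lock}"
BREAKLOCKDIR="${BREAKLOCKDIR:-/tmp/myscript.breaklock}"
STALE_AFTER="${STALE_AFTER:-60}"   # seconds to wait before treating the lock as stale

# mkdir either creates the directory or fails, so only one caller can win.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# The lock holder calls this at its "select moments".
break_requested() {
    [ -d "$BREAKLOCKDIR" ]
}

# A later script calls this after acquire_lock has failed.
# Returns 0 holding LOCKDIR, non-zero if another script is already waiting
# or if someone else grabbed LOCKDIR first.
steal_lock() {
    mkdir "$BREAKLOCKDIR" 2>/dev/null || return 1
    waited=0
    # Poll until the holder notices BREAKLOCKDIR and releases LOCKDIR.
    while [ -d "$LOCKDIR" ] && [ "$waited" -lt "$STALE_AFTER" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    # Timed out: assume the lock is stale and remove it forcibly.
    if [ "$waited" -ge "$STALE_AFTER" ]; then
        rmdir "$LOCKDIR" 2>/dev/null
    fi
    mkdir "$LOCKDIR" 2>/dev/null
    status=$?
    rmdir "$BREAKLOCKDIR" 2>/dev/null
    return "$status"
}
```

A script would then start with something like `acquire_lock || steal_lock || exit 1`, set `trap release_lock EXIT`, and call `break_requested` between units of work, exiting early when it returns true.  The forced removal is only safe as long as every holder really does keep its promise to check in within the timeout.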