[oi-dev] illumos-gate, dmake and VM-farm question

Irek Szczesniak iszczesniak at gmail.com
Sun Nov 10 19:10:24 UTC 2013


On Fri, Nov 8, 2013 at 10:50 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> Hello all,
>
>   Now that I have a hammer and everything looks like a nail -
> in that I've built a new home-NAS with cache and log on SSDs
> (my first, heh) and over-NFS compilations should be faster
> than previously possible on HDD-based rigs with VirtualBox
> VMs and NFS remote compile nodes...
>
>   So, I am wondering whether (and how?) the dmake used in the
> illumos-gate compilation process can be used to distribute
> the compilation load over several compute nodes? Namely, the
> storage box is not a super-computer (an N54L) and much of its
> CPU is spent on processing the data pool (pretty quick though),
> but at home it is surrounded by a number of desktops which in
> theory could each run a VirtualBox with OI inside and provide
> their CPU time to take part in a distributed compilation of a
> large project such as the gate. Previously I'd say this would
> bottleneck on networked IO, now I hope this barrier is gone.
>
>   Has anyone done that recently? Is the "d" in "dmake" used by
> anyone in the community? Does the idea have its merits in i.e.
> reduced compilation time? Should all building environments be
> set up identically (arch, compiler, etc.) or what? How-to's? :)
> Also, if such setups are used, are they "rigid" in the set of
> predefined available compilation nodes which should all be up,
> or just a subset of whichever ones are available can be used
> dynamically (i.e. VMs are fired up on PCs with no immediate load
> from users, and turned off in case of heavy load like gaming)?

Yes, we do use dmake in distributed mode for bioinformatics queue
processing and for building our own software library.

However, there are some caveats:
1. dmake works well with rsh but not with ssh. The CPU time and
latency overhead of ssh is high enough to virtually cancel out the
advantage of distributing the build. rsh has no such problem on a
local network, and if you really need stronger authentication you can
use Kerberos 5 auth (which Solaris rsh supports, and which has none of
the disadvantages of ssh).
2. Make sure NTP is running and correctly synced on all nodes.
Otherwise you will run into NFS issues with timestamps.
3. The preferred NFS version is 4 (NFSv4). NFSv3 may run into issues
with timestamps and locking even if NTP is running.
However, NFSv4 is incompatible with building ON, because the NFSv4
support in illumos has bugs which cause the build to fail randomly on
an NFSv4 filesystem.
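
To see how much the remote shell costs per job on your own network, you
can time a no-op command through each transport. The following is only
a rough sketch: the function name launch_overhead and the host name
buildnode1 in the comment are made up for illustration, and it measures
wall-clock launch time only, not CPU time.

```python
import statistics
import subprocess
import sys
import time


def launch_overhead(command, runs=5):
    """Return the median wall-clock seconds needed to run `command` once."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical usage: python3 overhead.py rsh buildnode1 true
    # With no arguments it times a local no-op Python process instead.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"{' '.join(cmd)}: {launch_overhead(cmd):.3f} s per launch")
```

Running it once with "rsh buildnode1 true" and once with
"ssh buildnode1 true" gives a per-job figure to compare against the
typical compile time of a single source file.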

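A quick way to check the timestamp caveat on a given node is to create
a file on the shared filesystem and compare its modification time with
the local clock. This is a minimal sketch, assuming the NFS server
stamps newly created files with its own clock (the usual behaviour);
the function name clock_skew is invented for this example, and client
attribute caching may blur the result slightly.

```python
import os
import sys
import tempfile
import time


def clock_skew(shared_dir):
    """Return (file mtime - local clock) in seconds for a new file in shared_dir.

    A large absolute value suggests that this node and the file server
    disagree about the current time.
    """
    before = time.time()
    fd, path = tempfile.mkstemp(dir=shared_dir, prefix="skewcheck-")
    try:
        os.write(fd, b"x")
        os.fsync(fd)  # push the write to the server before reading mtime
        mtime = os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.unlink(path)
    after = time.time()
    # Compare against the midpoint of the local timestamps taken around the write.
    return mtime - (before + after) / 2.0


if __name__ == "__main__":
    # Pass the NFS-mounted build directory; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"skew against {target}: {clock_skew(target):+.3f} s")
```

If the printed skew is more than a fraction of a second on any build
node, fix NTP there before trusting the timestamps make relies on.
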
Irek



