[OpenIndiana-discuss] problems iSCSI booting from clone LUN /= 0

Udo Grabowski (IMK) udo.grabowski at kit.edu
Thu Dec 15 15:22:00 UTC 2016


Hi,

I'm desperately trying to iSCSI boot from clones. I've successfully
booted one machine from an iSCSI target backed by a ZFS volume block
device, and then made a ZFS clone of that volume. We reconfigured the
clone with our usual procedure for a different host (identical
hardware), but that host will not boot; it hangs with 'iscsi
connection unable to connect to target ...'

Background:
We start at PXE GRUB, picking up a DHCP IP (old Sun DHCP server)
with next-server and bootfile arguments that point to a server
delivering an undionly.ipxe (see ipxe.org) executable with a
compiled-in script. That script sets a static net0 with gateway and
netmask and has a 'sanboot' line pointing to the iSCSI target:
                              v LUN
sanboot iscsi:172.23.156.12:::0:iqn.2010-09.org.openindiana:02:dc58762d-b066-c109-ab0a-cdf3326c91c5 || shell
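
To make the fields of that root path explicit, here is a minimal Python
sketch that splits it the way the iPXE documentation describes the
iSCSI URI (iscsi:<server>:<protocol>:<port>:<LUN>:<targetname>). The
function name is hypothetical, and the default port 3260 applied to an
empty field is our assumption about iPXE's behaviour; the LUN is kept as
a raw string because iPXE's own LUN syntax is richer than a plain number.

```python
def parse_ipxe_iscsi_uri(uri: str) -> dict:
    """Split an iPXE iSCSI root path into its fields (hypothetical helper).

    Format assumed: iscsi:<server>:<protocol>:<port>:<LUN>:<targetname>
    """
    prefix = "iscsi:"
    if not uri.startswith(prefix):
        raise ValueError("not an iSCSI root path: " + uri)
    # The IQN itself contains colons, so only split off the first four fields.
    server, protocol, port, lun, target = uri[len(prefix):].split(":", 4)
    return {
        "server": server,
        "protocol": protocol,                  # empty means the default (TCP)
        "port": int(port) if port else 3260,   # assumed default iSCSI port
        "lun": lun or "0",                     # raw LUN string, empty means 0
        "target": target,
    }


for uri in (
    "iscsi:172.23.156.12:::0:iqn.2010-09.org.openindiana:02:dc58762d-b066-c109-ab0a-cdf3326c91c5",
    "iscsi:172.23.156.12:::1:iqn.2010-09.org.openindiana:02:3af2c0d1-51e8-e6fa-9cbc-be673fd56ce4",
):
    print(parse_ipxe_iscsi_uri(uri))
```

Run on the original and the clone path, this shows that the two lines
differ only in the LUN field and the target IQN.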

This target is the block device configured as described above, with
a GRUB installation on slice s0 of the disk. The BIOS gets populated
with the iBFT, GRUB fires up, loads stage1, stage2 and the boot
archive, and finally starts the installed OS. So far, all is OK.

Now to the clone: since the original and the cloned block device are
served by the same target server, each target also exposes the other
one's volume, so an initiator sees them as different LUNs (0 and 1 in
this case). We need to import rpool into a booted live CD environment
to reconfigure the clone, and you cannot import rpool if you see two
pools with the identical pool GUID (which is the case for a clone). So
we had to build target and host groups on the target server to
restrict access to the LUNs; then we can import rpool, run
'zpool reguid rpool' to change the GUID, and reconfigure it
appropriately. That also works.
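
The effect of the host groups can be modelled in a few lines of Python.
This is a toy model, not COMSTAR code: every name except the clone's
initiator IQN (taken from the snoop further down) is hypothetical, and
a "view" here simply means "LU X is shown to host group Y as LUN N".
Without masking, the clone's host sees both volumes and therefore two
pools with the same GUID; with masking it sees only its own volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    lu: str          # hypothetical logical-unit label
    host_group: str  # host group allowed to see this LU
    lun: int         # LUN number presented to that host group


# Hypothetical names; a clone inherits the pool GUID of its origin.
POOL_GUID = {"vol-orig": 0x1111, "vol-clone": 0x1111}
HOST_A = "iqn.2010-04.org.ipxe:hostA"      # hypothetical original host
HOST_B = "iqn.2010-04.org.ipxe:imksuns98"  # clone host (from the snoop)
HOST_GROUPS = {
    "all": {HOST_A, HOST_B},
    "hg-orig": {HOST_A},
    "hg-clone": {HOST_B},
}


def visible_lus(initiator, views):
    """Map LUN -> LU for every view whose host group contains the initiator."""
    return {v.lun: v.lu for v in views if initiator in HOST_GROUPS[v.host_group]}


def has_duplicate_pool(seen):
    """True if two visible LUs carry the same pool GUID (ambiguous import)."""
    guids = [POOL_GUID[lu] for lu in seen.values()]
    return len(guids) != len(set(guids))


unmasked = [View("vol-orig", "all", 0), View("vol-clone", "all", 1)]
masked = [View("vol-orig", "hg-orig", 0), View("vol-clone", "hg-clone", 1)]

for name, views in (("unmasked", unmasked), ("masked", masked)):
    seen = visible_lus(HOST_B, views)
    print(name, seen, "duplicate GUID" if has_duplicate_pool(seen) else "ok")
```

In this model the clone keeps LUN 1 even though it is the only LU its
host can see. Assuming COMSTAR lets each host group have its own LUN 0
(worth checking against the stmfadm add-view documentation), presenting
the clone as LUN 0 to its own host group would be one way to test
whether the nonzero LUN is the culprit.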

The LUN 0 original boots correctly (with one quick initial message
that it cannot find its target), and the iSCSI block device is visible
as a scsi_vhci-managed disk. The LUN 1 clone does not boot. Its
compiled iPXE script has LUN 1 in the line
                              v LUN
sanboot iscsi:172.23.156.12:::1:iqn.2010-09.org.openindiana:02:3af2c0d1-51e8-e6fa-9cbc-be673fd56ce4 || shell

It starts GRUB and the boot archive, prints that 'cannot connect'
message once, then apparently (judging from the snoop) reads a lot
from the target, but finally prints a second 'cannot connect' and
then drops its network (which means it loses the iSCSI connection).
After a long while it prints a third message and then hangs forever.
The last things I see in the snoop are kernel symbols and the kernel's
header line:

...
ddi_prop_op
usba_hubdi_close
usba_hubdi_ioctl
usba_hubdi_power
ddi_quiesce_not_needed
getminor
misc/usba
@(#)SunOS 5.11 oi_151a9 November 2013  < the installed kernel
.rela.eh_frame
.rela.text
.rodata
.rodata1
.rela.data
.bssf
.bss
.symtab
.strtab
.comment
.SUNW_ctf
.dynamic
.shstrtab
.SUNW_ctf
...
a bit more binary data, then
....
InitiatorName=iqn.2010-04.org.ipxe:imksuns98   <-v---- all correct !
TargetName=iqn.2010-09.org.openindiana:02:3af2c0d1-51e8-e6fa-9cbc-be673fd56ce4
SessionType=Normal
HeaderDigest=None
DataDigest=None
MaxRecvDataSegmentLength=8192
DefaultTime2Wait=2
DefaultTime2Retain=20
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
InitialR2T=Yes
ImmediateData=Yes
MaxBurstLength=262144
FirstBurstLength=65536
MaxOutstandingR2T=1
MaxConnections=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
4/E@
TargetAlias=iscsi-imksuns98
TargetPortalGroupTag=1
MaxRecvDataSegmentLength=32768
HeaderDigest=None
DataDigest=None
DefaultTime2Wait=2
DefaultTime2Retain=20
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
InitialR2T=Yes
ImmediateData=Yes
MaxBurstLength=262144
FirstBurstLength=65536
MaxOutstandingR2T=1
MaxConnections=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
.....
and then the network stops.
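
Comparing the two login key lists above by eye is error-prone, so here
is a small Python sketch that parses iSCSI login/text data segments
(key=value pairs, each terminated by a NUL byte) and prints only the
keys that differ. The function names are hypothetical and the sample
bytes are an abbreviated excerpt of the snoop. Applied to the full
dump, the only operational key that differs is MaxRecvDataSegmentLength
(8192 vs 32768); that key is declarative (each side announces what it
can receive), so the difference is expected and the login negotiation
itself looks normal.

```python
def parse_login_text(data: bytes) -> dict:
    """Parse an iSCSI login/text data segment: NUL-terminated key=value pairs."""
    pairs = {}
    for item in data.split(b"\x00"):
        if not item:  # trailing NUL or alignment padding
            continue
        key, _, value = item.decode("utf-8").partition("=")
        pairs[key] = value
    return pairs


def diff_keys(sent: dict, received: dict) -> dict:
    """Keys whose values differ between the two directions, or only one side sent."""
    return {
        k: (sent.get(k), received.get(k))
        for k in sorted(sent.keys() | received.keys())
        if sent.get(k) != received.get(k)
    }


# Abbreviated excerpt of the snoop above (initiator request, target response).
initiator = (b"InitiatorName=iqn.2010-04.org.ipxe:imksuns98\x00"
             b"MaxRecvDataSegmentLength=8192\x00MaxBurstLength=262144\x00InitialR2T=Yes\x00")
target = (b"TargetPortalGroupTag=1\x00MaxRecvDataSegmentLength=32768\x00"
          b"MaxBurstLength=262144\x00InitialR2T=Yes\x00\x00\x00")

for key, (mine, theirs) in diff_keys(parse_login_text(initiator), parse_login_text(target)).items():
    print(f"{key}: initiator={mine} target={theirs}")
```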

We checked the target names, the initiator names and the corresponding
host group member names; all of that seems to be correct. Somehow
either the network traffic goes elsewhere, or something is not working
quite right with LUN number 1. I've read somewhere that SCSI commands
do not work if LUN 0 is not visible, as LUN 0 serves as the proxy
through which the commands are tunneled. Is that true?
We also wonder how the network device gets configured at all, since
the recipe says 'no network setup, static service physical:default
enabled, nwam disabled, no /etc/hostname.<netdev> files, no
/etc/dhcp/dhcp.<netdev> files', and that miraculously gives a working
network device on the original, but seemingly not on the clone. We
edited /etc/path_to_inst, fiddled with the /dev/e1000g* links to get
an identical setup, checked the /etc/hosts entries, etc.

It's really hard to debug this: you cannot pass -s (the single-user
switch) to the kernel, since that stops the network (and therefore the
OS), so it's impossible to get a shell during the boot process. And
since the OS itself hardly starts at all, there is nothing to inspect
or change there.

Any ideas what could be wrong here? Where exactly is the LUN
configured in the early kernel files (like /etc/path_to_inst)?
How can we get a glimpse of the iBFT table entries while booting?
Is LUN 0 really necessary to get the iSCSI connection working?
Or is there a problem in the iSCSI boot code with LUNs other than 0
<http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/ibft.c>?
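
On the iBFT question: the boot LUN sits in the iBFT Target Structure as
an 8-byte SAM-format LUN. The Python sketch below pulls the port and
LUN out of a raw target structure; the offsets (8-byte header, 16-byte
IP address, 2-byte port at offset 24, LUN at offset 26) are assumed from
the iBFT layout as Linux's struct ibft_tgt describes it, the function
names are hypothetical, and we do not claim this is how illumos' ibft.c
reads the field. The demo encodes LUN 1 as bytes 00 01 00 00 00 00 00 00,
which is how we understand iPXE to store the "1" from the sanboot line.
If a real dump of the clone's table shows that pattern, the firmware
side has the LUN right and the problem would lie in how the kernel uses it.

```python
import struct

# Assumed iBFT Target Structure layout (little-endian, as in Linux's
# struct ibft_tgt): 8-byte header, 16-byte IP address, 2-byte TCP port,
# then the 8-byte boot LUN in SAM format.
TGT_PORT_OFFSET = 24
TGT_LUN_OFFSET = 26


def decode_first_level_lun(lun8: bytes) -> int:
    """Decode the first addressing level of an 8-byte SAM LUN."""
    method = lun8[0] >> 6
    if method == 0:  # peripheral device addressing (bus assumed 0): LUN in byte 1
        return lun8[1]
    if method == 1:  # flat space addressing: 14-bit LUN over bytes 0-1
        return ((lun8[0] & 0x3F) << 8) | lun8[1]
    raise ValueError(f"unsupported LUN addressing method {method}")


def target_port_and_lun(tgt: bytes):
    """Return (port, raw LUN as hex, decoded LUN) from a raw iBFT target structure."""
    (port,) = struct.unpack_from("<H", tgt, TGT_PORT_OFFSET)
    lun8 = bytes(tgt[TGT_LUN_OFFSET:TGT_LUN_OFFSET + 8])
    return port, lun8.hex(), decode_first_level_lun(lun8)


# Synthetic target structure: port 3260, LUN 1 (first 16-bit word big-endian).
demo = bytearray(56)
struct.pack_into("<H", demo, TGT_PORT_OFFSET, 3260)
demo[TGT_LUN_OFFSET:TGT_LUN_OFFSET + 8] = bytes([0, 1, 0, 0, 0, 0, 0, 0])
print(target_port_and_lun(demo))
```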

We will document precisely how to set up such a configuration once it
finally works, since it is a real pain in the ***, given that most
available documents just pretend how easy it is (it's NOT AT ALL). We
are just short of the finish line, but it seems to get harder and
harder the closer we get.

Thanks for any help !
-- 
Dr.Udo Grabowski   Inst.f.Meteorology & Climate Research IMK-ASF-SAT
http://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology           http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026


