[OpenIndiana-discuss] problems iSCSI booting from clone LUN /= 0

Udo Grabowski (IMK) udo.grabowski at kit.edu
Mon Dec 19 13:14:19 UTC 2016


On 18/12/2016 14:30, Till Wegmüller wrote:
> Hi Udo
>
>
> I don't really know what the concrete problem could be but I spotted some things
> in your mail where I can give some pointers that might help.
>
>  > It's really hard to somehow debug this, you cannot give -as single
>  > user switch to the kernel since that stops the network (and therefore
>  > the OS....), so it's impossible to get a shell in the boot process,
>  > and since the OS itself is hardly fired up at all, there's nothing
>  > to see or to change there since it does not get there.
>
> Odd. Illumos usually gives a lot of output when started in verbose mode (-v if I
> recall correctly). That output usually tells you setting up network and somesuch
> so it should give some information.
>
> Have you also tried to boot with -as and then continue the boot process by hand?
> It should be enough to start the iSCSI client via svcadm. The boot process does
> nothing more.
>
> What does the service log of the iscsi client service say when booted from the
> Original and what when it's booted from the Clone?
>

Hello Till,

I tried -as; it froze because the network stops. I have booted with -v all
the time, see these screenshots (ignore the visual sugar around them :-):

<https://postimg.org/image/6dei81bf5/>
<https://postimg.org/image/3mrml7nf1/>
<https://postimg.org/image/8nw9edj7l/>
<https://postimg.org/image/dspcc3nr3/>

It stops after the last screenshot, after the network has flapped
up twice and finally gone down. There seems to be no entry
for the e1000g device; it should be pci108e,4343 at 19, but there
is none, although it's configured in /etc/path_to_inst.

Mysteriously, it shows 2 paths to the iSCSI volume; I don't know
where the second path comes from, as there's no explicit iSCSI connection
configured. With the network, the connections go up and down, but at
least path 1 is up when the machine starts to hang. The target server
goes from 1 to 0 connections after the timeout, so the client did have
access to the volume.

The pings show that the network is up for a while, then is delayed
for reconfiguration, then is briefly up again, and then dies:

ro sunth2 ~ # ping -s imksuns98
PING imksuns98: 56 data bytes
64 bytes from imksuns98 (172.23.157.98): icmp_seq=7. time=0.306 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=8. time=0.147 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=9. time=0.158 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=10. time=0.158 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=11. time=0.171 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=12. time=0.208 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=13. time=0.173 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=14. time=0.225 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=15. time=0.246 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=16. time=0.177 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=17. time=0.169 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=18. time=0.193 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=19. time=0.248 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=20. time=0.160 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=21. time=25483.673 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=22. time=24483.764 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=23. time=23483.767 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=24. time=22483.796 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=25. time=21483.819 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=26. time=20483.864 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=27. time=19483.866 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=28. time=18483.905 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=29. time=17483.956 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=30. time=16483.946 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=31. time=15483.994 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=32. time=14483.989 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=33. time=13484.014 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=34. time=12484.065 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=35. time=11484.072 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=36. time=10484.122 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=37. time=9484.124 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=38. time=8484.150 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=39. time=7484.162 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=40. time=6484.189 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=41. time=5484.229 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=42. time=4484.238 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=43. time=3484.274 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=44. time=2484.304 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=45. time=1484.318 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=46. time=484.354 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=47. time=0.128 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=48. time=0.083 ms
<silent hereafter>
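
As a rough reading aid for the ping output above, here is a minimal
Python sketch. It is not part of the original test, and the names
parse_ping, find_stall and arrival_offsets are made up for illustration.
It assumes that ping -s sends one probe per second, so the sequence
number approximates the send time in seconds. Under that assumption,
send time plus round-trip time gives nearly the same arrival moment for
every delayed reply, which would suggest the replies were held back for
about 25 seconds and then released together, rather than the link being
slow. Where they were held cannot be told from this output alone.

```python
import re

# Matches lines like "... icmp_seq=21. time=25483.673 ms" from `ping -s` output.
PING_LINE = re.compile(r"icmp_seq=(\d+)\.\s+time=([\d.]+)\s*ms")


def parse_ping(text):
    """Return a list of (seq, rtt_ms) tuples found in ping -s output."""
    return [(int(m.group(1)), float(m.group(2))) for m in PING_LINE.finditer(text)]


def find_stall(samples, threshold_ms=100.0):
    """Return (first_seq, last_seq, longest_rtt_ms) over all replies whose
    RTT exceeds threshold_ms, or None if no reply is that slow."""
    delayed = [(seq, rtt) for seq, rtt in samples if rtt > threshold_ms]
    if not delayed:
        return None
    return delayed[0][0], delayed[-1][0], max(rtt for _, rtt in delayed)


def arrival_offsets(samples, interval_s=1.0):
    """Estimate when each reply arrived, in seconds after probe 0 was sent.
    ASSUMPTION: probes are sent every interval_s seconds (1 s here)."""
    return [(seq, seq * interval_s + rtt / 1000.0) for seq, rtt in samples]


# A few lines copied from the transcript above.
sample = """\
64 bytes from imksuns98 (172.23.157.98): icmp_seq=20. time=0.160 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=21. time=25483.673 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=22. time=24483.764 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=46. time=484.354 ms
64 bytes from imksuns98 (172.23.157.98): icmp_seq=47. time=0.128 ms
"""

samples = parse_ping(sample)
print(find_stall(samples))
for seq, arrived in arrival_offsets(samples):
    print(f"seq {seq}: reply arrived at about {arrived:.1f} s")
```

Feeding the full transcript instead of the short sample works the same
way; only the regular expression depends on the ping output format.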

Tried with a second clone on a different machine (again the same hardware),
only to see exactly the same problem.

I'm out of options now. Any ideas how to get that network running or,
at least, how to analyze this via the kernel debugger? I would really like
to get this configuration up and working.

Greetings, and thanks for any help !
-- 
Dr.Udo Grabowski   Inst.f.Meteorology & Climate Research IMK-ASF-SAT
http://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology           http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026
