[OpenIndiana-discuss] NFS hang during copy

Roy Sigurd Karlsbakk roy at karlsbakk.net
Sun Mar 20 14:49:00 UTC 2011


Hi all

I'm fighting a problem with an OpenIndiana 148 server and NFSv3 mounts from Linux clients. A simple cron job moves some data files from another server to the OI box. This runs well for a while, until at some point the client hangs and reports an NFS server connection failure. The call trace from Linux is:

[  484.712558] INFO: task mv:2353 blocked for more than 120 seconds.
[  484.712562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.712566] mv            D 0000000100000a8b     0  2353   2352 0x00000001
[  484.712573]  ffff880234b75ba8 0000000000000086 ffff880234b75b18 0000000000015980
[  484.712579]  ffff880234b75fd8 0000000000015980 ffff880234b75fd8 ffff8802349896e0
[  484.712584]  0000000000015980 0000000000015980 ffff880234b75fd8 0000000000015980
[  484.712589] Call Trace:
[  484.712599]  [<ffffffff81100d60>] ? sync_page+0x0/0x50
[  484.712606]  [<ffffffff8159e053>] io_schedule+0x73/0xc0
[  484.712610]  [<ffffffff81100d9d>] sync_page+0x3d/0x50
[  484.712614]  [<ffffffff8159e6cf>] __wait_on_bit+0x5f/0x90
[  484.712618]  [<ffffffff81100f53>] wait_on_page_bit+0x73/0x80
[  484.712623]  [<ffffffff8107f250>] ? wake_bit_function+0x0/0x40
[  484.712628]  [<ffffffff8110b975>] ? pagevec_lookup_tag+0x25/0x40
[  484.712632]  [<ffffffff8110141d>] filemap_fdatawait_range+0x10d/0x1a0
[  484.712637]  [<ffffffff811014db>] filemap_fdatawait+0x2b/0x30
[  484.712640]  [<ffffffff811017e4>] filemap_write_and_wait+0x44/0x50
[  484.712660]  [<ffffffffa038dfcc>] nfs_setattr+0x14c/0x160 [nfs]
[  484.712666]  [<ffffffff8116c55b>] notify_change+0x16b/0x310
[  484.712671]  [<ffffffff8117b15c>] utimes_common+0xdc/0x1b0
[  484.712675]  [<ffffffff8117b2d1>] do_utimes+0xa1/0xf0
[  484.712678]  [<ffffffff8117b3e3>] sys_utimensat+0x33/0x90
[  484.712684]  [<ffffffff8100a307>] tracesys+0xd9/0xde

When I strace the mv job on the client, it hangs in utimensat():

read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 1048576) = 848404
write(4, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 848404) = 848404
read(3, "", 1048576)                    = 0
utimensat(4, NULL, {{1300591624, 0}, {1300508167, 0}}, 0
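
To try this outside of mv, the sequence in the strace (write the data, then set the timestamps through the still-open descriptor) can be sketched roughly as below. This is only a minimal sketch under assumptions: the function name write_then_set_times and the path in the usage note are made up for illustration, and on Linux Python's os.utime() on a file descriptor is expected to end up in the same utimensat(fd, NULL, ...) call that mv makes. The call trace suggests the client flushes dirty pages to the server inside that call (filemap_write_and_wait under nfs_setattr), so timing it separately should show whether the stall is in the flush.

```python
import os
import time


def write_then_set_times(path, payload, atime, mtime):
    """Write payload to path, then set timestamps via the open descriptor.

    Returns the number of seconds spent inside the timestamp call.
    The name and signature are hypothetical, chosen for this sketch only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write() may write fewer bytes than requested, so loop.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]

        # Time only the timestamp update, where the traced mv is blocked.
        start = time.monotonic()
        os.utime(fd, (atime, mtime))
        return time.monotonic() - start
    finally:
        os.close(fd)
```

Calling it with a path on the NFS mount (for example a hypothetical /mnt/oi/testfile), a payload of a few megabytes and the two timestamps from the strace should block in the same place if the problem reproduces.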

This server has been working well for over a year, and it normally still does, but in this case we see repeated hangs. The clients experiencing this problem hang with 100% "wio" on one core, and the only temporary fix I've found is to reboot the client. I can't find anything in the server logs, but since the problem shows up on both an elderly Fedora box and an updated Ubuntu 10.04.2 machine, and since the setup has been working well for quite some time, I guess the upgrade to OI may be to blame.

Does anyone know how I can debug this further?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.


