[oi-dev] More on USB failure to automount

Gary Mills gary_mills at fastmail.fm
Fri Aug 27 15:55:51 UTC 2021


Joshua M. Clulow has identified the location of the error:

    In particular, on my system, I see three write(2) calls that look like this:
    
       EC_devfs ESC_devfs_devi_add /pci at 0,0/pci8086,2064 at 14/storage at 2
    
       EC_devfs ESC_devfs_devi_add /pci at 0,0/pci8086,2064 at 14/storage at 2/disk at 0,0
    
       EC_dev_add disk /pci at 0,0/pci8086,2064 at 14/storage at 2/disk at 0,0
    /dev/rdsk/c4t0d0 0
    
    This seems about right.  These writes are into a self-pipe (i.e., both
    ends of the pipe are held open within this single hald process) that
    is established in the sysevent_init() routine, and stored in the
    "sysevent_pipe_fds" global where I was able to confirm with pfiles
    that the pipe is still open.
    
    Where things appear to fall down is once we get into the glib area.
    ...
    Though we do enter sysevent_iochannel_data() on schedule for each
    sysevent, it seems like the call to g_io_channel_read_line() always
    returns a value of 3 on my system -- which seems like an EOF?  Because
    the value is not G_IO_STATUS_NORMAL, we don't even try to call
    sscanf() to parse the lines we wrote above.  This means we never call
    into sysevent_dev_add() and thus we never pass the hotplug event on to
    the rest of HAL.

I've managed to add a few details using truss.  Here's the command I
used to examine the top hald process:

    # truss -f -o /tmp/NNN.truss -t read -u \
        libglib-2.0::g_io_channel_read_line -p PID

On a system where USB automount still worked, with a BE dated
2020-11-27, I got the following output:

    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1:         read(12, " E C _ d e v f s   E S C".., 1024)    = 80
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 1
    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 1
    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1:         read(12, " E C _ d e v f s   E S C".., 1024)    = 89
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 1
    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 1
    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1:         read(12, 0x08341480, 1024)                      Err#11 EAGAIN
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 3
    1994/1 at 1:       -> libglib-2.0:g_io_channel_read_line(0x8279938, 0x8046200, 0x8046204, 0x0)
    1994/1:         read(12, " E C _ d e v _ a d d   d".., 1024)    = 98
    1994/1 at 1:       <- libglib-2.0:g_io_channel_read_line() = 1

It shows the entry and return of g_io_channel_read_line() as well as
the underlying read() system call that occurs between the two.  Note
that all three lines in the pipe were read back by hald.  In this
case, the return value from g_io_channel_read_line() is 1.  Note also
that one read system call raised the EAGAIN errno.  This means that
there is data in the pipe but that it is not yet ready to read.  The
code should try again later to read the data.  In this case, the
return value from g_io_channel_read_line() is 3.  The hald process
retries the read() and is successful.

On a system where USB automount failed, running a BE dated 2021-05-14,
I got this output from a similar truss command:

    318/1 at 1:        -> libglib-2.0:g_io_channel_read_line(0x8281920, 0x8046200, 0x8046204, 0x0)
    318/1:          read(12, " E C _ d e v f s   E S C".., 1024)    = 64
    318/1 at 1:        <- libglib-2.0:g_io_channel_read_line() = 1
    318/1 at 1:        -> libglib-2.0:g_io_channel_read_line(0x8281920, 0x8046200, 0x8046204, 0x0)
    318/1:          read(12, " E C _ d e v f s   E S C".., 1024)    = 73
    318/1:          read(12, 0x08292292, 1024)                      Err#11 EAGAIN
    318/1 at 1:        <- libglib-2.0:g_io_channel_read_line() = 3
    318/1 at 1:        -> libglib-2.0:g_io_channel_read_line(0x8281920, 0x8046200, 0x8046204, 0x0)
    318/1:          read(12, " E C _ d e v _ a d d   d".., 1024)    = 82
    318/1:          read(12, 0x082922E4, 1024)                      Err#11 EAGAIN
    318/1 at 1:        <- libglib-2.0:g_io_channel_read_line() = 3
    
The first call to g_io_channel_read_line() was successful.  Subsequent
calls to that glib function failed with return code 3.  Note that
there are now two read() system calls within the glib function, and
the the second one always gets EAGAIN.  That is the bug.  All three
lines were read, but the data are discarded because of the return code
from g_io_channel_read_line().

What we need is a robust handling of EAGAIN after read() in glib.  To
give the pipe time to run, a nanosleep() may be appropriate.


-- 
-Gary Mills-		-refurb-		-Winnipeg, Manitoba, Canada-



More information about the oi-dev mailing list