[OpenIndiana-discuss] Cannot open: Illegal byte sequence with a file containing a question mark

Flo florian at acw.at
Fri Apr 27 10:02:07 UTC 2012


Hello,

@all,
thank you very much for your help and the very useful explanation!

Greetings,
Florian


On 2012-04-26 17:31, James Carlson wrote:
> Flo wrote:
>>> If that shows that the property is set "on", then that's what's causing
>>> the failure.  Sadly, it's configurable only when creating a file system,
>>> so if you wanted to change it, you'd have to create a new file system
>>> and copy everything over.
>>
>> utf8only is on. I created a new folder with utf8only=off and this worked!
>>
>> Are there any disadvantages with utf8only disabled?
>> I use Napp-It and Napp-It enables it automatically
>
> You'd probably want to talk with the author of "Napp-It" to find out why
> he set that parameter.
>
> More generally speaking, there are a few file-system-level choices that
> you can make that determine how names are treated.  Allowing only UTF8
> is one of them.  Selecting case-insensitive matches is another.
>
> Which one you choose depends mostly on what you're doing with those
> files.  UTF8 has some great advantages -- it's an unambiguous encoding
> of Unicode characters, so it fixes the usual national language character
> set problems you have with something like ISO 8859.  And because ASCII
> characters are encoded as the same single-byte values, it mostly works
> without having to think too much about it.
>
> One of the downsides, as you've found, is that it's a somewhat
> restrictive format.  UNIX has traditionally allowed you to use any
> arbitrary byte value other than hex 00 (NUL) and 2F (/) in the name of a
> file (obviously, 2F is used for path separation), and in any sequence.
> Because UNIX allows "anything" here, two users with different LANG
> settings will see different characters when they look at the same files.
>
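
To illustrate the point about different LANG settings, here is a minimal
Python sketch. The file name "caf\xe9.txt" and the two character sets are
made-up examples, not taken from the original problem; the idea is only
that the stored bytes carry no encoding, so each user's locale decides
what is displayed.

```python
# The same file-name bytes, as stored on disk with no encoding attached.
# (Hypothetical name, chosen only for illustration.)
raw_name = b"caf\xe9.txt"


def as_seen_with(charset: str) -> str:
    """Decode the stored bytes the way a user with that charset would see them."""
    return raw_name.decode(charset)


western = as_seen_with("iso-8859-1")  # 0xE9 is e-acute: 'café.txt'
greek = as_seen_with("iso-8859-7")    # 0xE9 is Greek small iota: 'cafι.txt'
```

Both users are looking at exactly the same bytes; only the interpretation
differs.
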
> UTF8, though, has rules for how multibyte characters are formed, and
> those rules mean that some arbitrary sequences of bytes are not legal
> encodings.
>
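
A rough way to check whether a given name would pass such a rule is to try
decoding it as UTF-8. The sketch below uses Python's strict decoder as a
stand-in; the helper name is_valid_utf8 is made up for this example, and
ZFS's own check may differ in details.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string is a well-formed UTF-8 sequence."""
    try:
        name.decode("utf-8")  # strict decoding raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


# A few hypothetical names:
ok_ascii = is_valid_utf8(b"plain.txt")       # pure ASCII is always valid
ok_utf8 = is_valid_utf8(b"caf\xc3\xa9")      # e-acute as a proper two-byte sequence
bad_latin1 = is_valid_utf8(b"caf\xe9")       # lone 0xE9: lead byte with nothing after it
bad_stray = is_valid_utf8(b"\x80")           # continuation byte without a lead byte
bad_overlong = is_valid_utf8(b"\xc0\xaf")    # overlong encoding of '/'
```

The ISO 8859-1 style name fails because 0xE9 announces a three-byte
character that never arrives.
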
> That leads to an application compatibility problem.  If an application
> issues an open(2) (or creat(2)) system call with a file name that has a
> legal UNIX name but has an illegal UTF8 sequence, what do you do?
> Failing the system call means a break in compatibility.  Allowing the
> access means that the integrity of the file names is compromised.
> That's why there's an option, and why the normal ZFS default for the
> option is "off" -- to preserve compatibility.
>
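
From the application side, the failed call surfaces as an error the program
has to handle. The sketch below assumes the rejection is reported as EILSEQ,
which matches the "Illegal byte sequence" text in the subject line; the
function name try_create is made up for this example.

```python
import errno
import os


def try_create(path: bytes) -> bool:
    """Create an empty file at path.

    Returns False if the file system rejects the name as an illegal byte
    sequence (assumed here to be reported as EILSEQ); any other error is
    re-raised unchanged.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except OSError as exc:
        if exc.errno == errno.EILSEQ:
            return False
        raise
    os.close(fd)
    return True
```

A program extracting an archive could use a check like this to report or
rename the offending entries instead of aborting.
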
> There's probably a deeper issue here concerning what was going on with
> the 'tar' program you were running.  I had _thought_ that file names
> inside the tar format were encoded using UTF8, which would imply that
> the problem is that 'tar' erroneously translated that to a national
> language code point when trying to create the file.  If so, then that
> could just be a configuration problem on your part -- e.g., attempting
> to use a national language character set when the rest of your world is
> set up for UTF8.
>
> But maybe I'm wrong about that.  Someone who knows the internals of tar
> better should probably look at it.
>