[oi-dev] I created an OpenIndiana port of UXP/New Moon out of boredom.
Jean-Pierre André
jean-pierre.andre at wanadoo.fr
Mon Sep 9 08:24:49 UTC 2019
Congratulations. A useful alternative to Firefox as its resource
needs are getting out of control.
Jean-Pierre
Jeremy Andrews wrote:
> I wanted to post this story somewhere in case anyone finds it
> interesting, it's OpenIndiana-related, and I don't really have a blog.
> The TL;DR version is that I've got UXP/New Moon working on OI, with all
> my changes visible in a public GitHub repository. I haven't yet
> obtained permission to use the official branding (which would be Pale
> Moon branding in this case, and it might be difficult because they
> haven't shown interest in officially supporting Solaris and OI). I also
> find the requirements for packaging it for OI confusing: the
> documentation seems to expect me to do it a specific way that
> involves downloading the whole oi-userland tree and supplying patches
> and links to upstream code, and it's not really clear what I should do
> at this stage.
>
> So, about this time last month, I was looking for something to distract
> myself from a stressful situation in real life and keep my mind
> occupied. I was looking at the Pale Moon source code and noticed they'd
> just removed Solaris support. So I was thinking to myself, "How hard
> would it be to add it back in and then make the program actually compile
> and run?" So I simply installed OpenIndiana in a virtual machine and got
> to work despite having no real experience with Solaris, Firefox, or Pale
> Moon. The only thing I knew about Solaris going in is that it's the
> "other" Unix they offered on x86 systems at my college besides Linux so
> that they could teach about POSIX compliance, avoiding "Linuxisms," and
> say that they teach Unix and not just Linux. I wasn't able to stick with
> my degree because of Calculus, but I always wondered what working with
> it would have been like.
>
> There were five things I learned that were encouraging to me early on.
>
> 1. Oracle Solaris and the illumos distributions build Firefox with GCC
> now, and haven't used Sun Studio to do so in ages, so all the code that
> makes those assumptions is outdated. In fact, most of OpenIndiana is
> built with GCC 7 specifically. They do use their own linker, but I knew
> going in I wouldn't have to deal with any clang weirdness.
>
> 2. Most of the GNU toolchain is available, but you have to prefix
> commands with "g" to get the GNU version instead of the Solaris version.
>
> 3. Mozilla regards Solaris as a Tier 2 or 3 platform, and a ton of
> high-quality patches for it were created during or just after the
> Firefox 52ESR lifecycle by Mozilla at the request of an incredibly
> overworked Oracle employee trying to get the biggest Solaris issues
> fixed upstream.
>
> 4. All of the UXP project's major dependencies, like SQLite, NSS, NSPR,
> libevent, libffi, and other libraries are available and more or less
> up-to-date on Solaris and OI. NSS and NSPR have been on it since the
> beginning, with Netscape getting involved with Sun/Java offerings early
> on to power their server products back in the day.
>
> 5. Solaris and Linux are both based on System V in some form or other,
> unlike the BSDs. I've seen code in here with a 1989 AT&T copyright
> notice attached, because it is actually System V Unix code from Bell
> Labs. So there's a lot of overlap in the design, and a lot of POSIX
> functionality to fall back on where the differences lie.
>
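
As a rough illustration of point 2 in the list above, a build helper can prefer the "g"-prefixed GNU tool when it is installed and fall back to the plain name otherwise. This is only a sketch: the helper name `pick_gnu_tool` is hypothetical and is not part of the UXP or OpenIndiana build systems.

```python
import shutil


def pick_gnu_tool(name, which=shutil.which):
    """Return the g-prefixed tool name if it is found on PATH, else the plain name.

    `which` is injectable so the lookup can be exercised without touching
    the real PATH; by default it is shutil.which.
    """
    gnu_name = "g" + name
    return gnu_name if which(gnu_name) else name


if __name__ == "__main__":
    # On a system with GNU tools installed under a "g" prefix this would be
    # expected to pick names like "gmake" and "gsed"; elsewhere it falls
    # back to "make" and "sed".
    for tool in ("make", "sed", "tar"):
        print(tool, "->", pick_gnu_tool(tool))
```
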
> So after I got the system up and running, I tried to load a mozconfig
> file... and hit my first error before ever starting the build. Turns out
> that Solaris uses Ksh, and while Bash is available, it's hard to
> convince it to execute a script as a Bash script with all Bash features
> rather than a version limited to Ksh features. Anyway, it turned out
> Mozilla actually made a patch to remove the "Bash localism," and the
> mozconfig loader is now POSIX compliant (which it should have been in
> the first place). That was the first patch I applied.
>
> From there, it was mostly a matter of applying build system patches so
> the build system would recognize Solaris. 90% of the time, it would take
> the same code as Linux, and it was like FreeBSD the other 10% of the
> time, basically. One theme that kept coming up was that I had to replace
> several memory-related functions like memalign and madvise with
> posix_memalign and posix_madvise, because Solaris has versions of those
> functions that take different arguments like caddr_t. This had to be
> ifdefed only because apparently a few versions of Linux don't actually
> have posix_memalign and only have the regular version with the POSIX
> syntax. I would say that this was the most common unexpected compile
> error I kept getting caught by: some "memalign" or "madvise" function
> somewhere in the code I forgot to change.
>
> The build issue that consumed most of my time was figuring out why I was
> getting text relocations and .eh_frame issues in libxul.so. I learned
> everything I could about linkers and the ELF file format, and about
> libxul.so. Even to the point of reading Mike Hommey's blog and learning
> more about him, his interests, and the reasons behind his weird linker
> hacks and frustration with manual component registration than I really
> should have. I even found out that apparently on OI's official Firefox
> 52 build, the guy who got everything else working gave up and tried in
> desperation to build libxul.so with GNU LD and use the Sun linker for
> the rest of it, and they were lucky that it worked.
>
> However, it turned out that I had been trying to solve a problem I
> hadn't yet run into. My actual build issue was because of libffi, and it
> took me a while to figure out that it was relying on an external script
> to configure libffi that was making incorrect assumptions about several
> things. The first issue is that it assumed I wanted my .eh_frames to be
> read-only just because I'm on x86. Well, that's not a safe assumption on
> Solaris; you want writable .eh_frames. Then I saw tons of text relocations, so I
> started researching how to avoid text relocations in PIC code (which
> Solaris seems to require). Then I found out you actually can't avoid
> them completely, because assembler code needs to access the global
> offset table at some point, and usually needs a PC relative relocation
> at some point to do so. Then, I remembered a comment I saw in a libffi
> source code file. "Solaris uses datarel encoding for PIC on x86." So I
> figured out that I had to enable that hack by changing Mozilla's old
> libffi configuration not to use PC relative relocations on Solaris x86.
> So it does have a mechanism for allowing relative relocations of some
> kind, just not PC relative ones. That got rid of most of the text
> relocations, but I was still getting them in a file called win32.S,
> which was always included whether I wanted/needed it or not. I
> eventually looked at that code and found that the Solaris hack was not
> available there, and instead it hardcoded PC relative encoding. I was
> somehow able to look at that hack from sysv.S and copy it into win32.S,
> perform the same tests and make it apply datarel encoding where
> necessary (easier than it sounds if you see the file). After this, I'd
> already fixed an issue that made the libxul.so modules appear out of
> order on Solaris with a patch from Mozilla, so everything worked.
>
> After this, I was finally able to build the browser, but it crashed
> almost immediately with an assertion failure on NS_IsMainThread() in
> NSS that only one person had ever gotten before, and in their case it
> was an SSL policy issue. I found a way to avoid crashing right away by
> sheer accident. If I specified the word "file" on the command line, it
> took me to a very simple HTTP page called file.com, with nothing but a
> single image on it advertising some kind of file storage service or
> something. None of the stack traces really helped or made much sense; it
> appeared that the attempt to initialize NSS was itself the cause of the
> failure.
>
> I compiled a debug version, took a crash course in how to read
> stacktraces, and tried in desperation applying several patches I didn't
> think were necessary and didn't really even like. I found this set of
> patches from Mozilla upstream that stabilized the browser and stopped
> the assertion failure, but only got it to work offline. It was able to
> load up XUL plugins and offline saved web pages in this state, as well
> as show about:config and such. It generated error pages saying the PSM
> component appeared to be broken or disabled. I could see threads in gdb
> spinning up and then crashing immediately every time I'd try to go
> online. I thought that NSS was completely busted for some reason. I even
> tried running the NSS test suite, but it passed and nothing seemed to be
> wrong.
>
> I applied one patch that changed the way the browser looked,
> completely busted the interface, and kept it from saving any history,
> but only because I typed it in wrong. It went like this:
>
> palemoon.js:
>
>     pref("storage.nfs_filesystem, true);
>
> Yes, notice that the ending quote after filesystem is missing. For
> whatever reason, this made Pale Moon behave a lot like a really old
> version of Firefox used to act when it had a corrupted database. Same
> symptoms, history not being saved, navigation being busted except
> the URL bar, etc.
>
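
The missing-quote typo described above is the kind of mistake a very small lint pass can catch before the browser ever reads the file. The following is a minimal sketch, not how Pale Moon actually parses preference files: it only flags `pref(` lines containing an odd number of double quotes, and it ignores escaped quotes and multi-line entries. The function name `unbalanced_pref_lines` is made up for illustration.

```python
def unbalanced_pref_lines(text):
    """Return 1-based line numbers of pref(...) lines with an odd number of '"'.

    Simplification: escaped quotes and preferences split across several
    lines are not handled.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("pref(") and stripped.count('"') % 2 == 1:
            bad.append(number)
    return bad


# Line 1 reproduces the typo from the message; line 2 is the corrected form.
sample = (
    'pref("storage.nfs_filesystem, true);\n'
    'pref("storage.nfs_filesystem", true);\n'
)

if __name__ == "__main__":
    print(unbalanced_pref_lines(sample))  # prints [1]
```
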
> I had a weird feeling this might have changed or fixed something else,
> so I removed the temporary NSS patches and tried loading the browser
> again... and although the interface was still broken, I could now type
> in any URL I wanted, and nothing crashed. For some reason, even YouTube
> was working in this state. Though it took a full minute for a video to
> start playing, it was smooth once it started playing back. It's a feat I
> haven't been able to replicate since; the videos just refuse to play
> entirely due to a software raster feature failure or something. The only
> change I'd made recently that seemed like it could have fixed things was
> a change to compile NSS and NSPR with pthreads after seeing that the
> repositories for the official OS versions had added them in.
>
> Thinking that adding pthreads had solved the problem (a suggestion my
> mind was vulnerable to because I remembered inexplicable segfaults on
> Linux 20 years ago due to things being compiled without them by
> default), I fixed that typo... and the browser started crashing again.
>
> So I assumed that maybe something was wrong with SQLite, if busting the
> database access by accident had somehow made the browser work after
> resolving the NSS issue. I ended up making absolutely sure that SQLite
> built with -D_POSIX_PTHREAD_SEMANTICS and set it up to include a linker
> mapfile provided from the OI repositories to make absolutely sure it
> built correctly. And then everything started working again. I assumed
> I'd finally done it... but the next day, while trying to get YouTube
> to work again and making very small changes, I was getting the same
> problem again with every build of the browser, even with the exact same
> configuration that had worked before.
>
> When I figured out why, I felt like a huge idiot. You want to know
> what the difference was between the browser successfully running, and
> crashing this whole time, since getting it to build? It was /which
> terminal window I ran ./mach run from./ Why? Because I'd used one of
> those terminal windows to run the NSS test suite. Why would that make a
> difference? While I was running the test suite... I'd added the files in
> dist/bin in the object directory to LD_LIBRARY_PATH because it didn't
> know where to look for its own object files. So whenever I tried to run
> the browser from the terminal window where I'd added the NSS I'd just
> built to the LD_LIBRARY_PATH, everything worked fine, and when I ran it
> from the other one, it crashed. And so the last several patches I'd been
> applying and things I'd thought I'd been doing to fix or break the
> browser were actually completely irrelevant. I'd probably had it working
> since the first time I got it built and didn't realize it had no idea
> where to find its own libraries in the build directory.
>
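
The difference between the two terminal windows comes down to one environment variable, so it can be checked directly before launching the browser. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the object-directory path in the usage example is a placeholder, not the path used in the actual build.

```python
import os


def library_path_entries(environ=None):
    """Split LD_LIBRARY_PATH into its non-empty entries."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(":") if entry]


def has_dist_bin(dist_bin, environ=None):
    """Report whether dist_bin is one of the LD_LIBRARY_PATH entries.

    Paths are compared after normalization, so a trailing slash does not
    matter. Symlinks are not resolved.
    """
    target = os.path.normpath(dist_bin)
    return any(
        os.path.normpath(entry) == target
        for entry in library_path_entries(environ)
    )


if __name__ == "__main__":
    # Placeholder path: substitute the real object directory.
    dist_bin = "/path/to/objdir/dist/bin"
    if has_dist_bin(dist_bin):
        print("dist/bin is on LD_LIBRARY_PATH in this shell")
    else:
        print("dist/bin is NOT on LD_LIBRARY_PATH in this shell")
```

Running the same check in both terminals would be expected to show the mismatch described above right away.
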
> So yeah, apparently now it builds and runs on Solaris perfectly fine.
> Regular VP9 test videos work, YouTube videos try to work for a few
> frames and then stop, but I have a feeling it might work better on
> actual hardware rather than using a software renderer in a VM. I have to
> disable Libevent's use of Solaris event ports for some weird reason to
> stop websites from sending PHP files to me rather than trying to parse
> them on the server. But yeah, I somehow got this to work in just under a
> month, I think. It helped a lot that the browser hasn't had extensive
> changes to memory handling or assembler code, that there were a ton of
> existing patches to a code base very similar to the UXP one for Solaris
> support, and that most of the potential trouble points were in external
> libraries anyway.
>
>
> _______________________________________________
> oi-dev mailing list
> oi-dev at openindiana.org
> https://openindiana.org/mailman/listinfo/oi-dev
>