[oi-dev] I created an OpenIndiana port of UXP/New Moon out of boredom.

Jean-Pierre André jean-pierre.andre at wanadoo.fr
Mon Sep 9 08:24:49 UTC 2019


Congratulations. A useful alternative to Firefox as its resource
needs are getting out of control.

Jean-Pierre


Jeremy Andrews wrote:
> I wanted to post this story somewhere in case anyone finds it 
> interesting, it's OpenIndiana-related, and I don't really have a blog. 
> The TL;DR version is that I've got UXP/New Moon working on OI, with all 
> my changes visible in a public GitHub repository, but I haven't yet 
> obtained permission to use the official branding (which would be Pale 
> Moon branding in this case, and it might be difficult because they 
> haven't shown interest in officially supporting Solaris and OI), and 
> find the requirements involved in packaging it for OI confusing because 
> it seems like the documentation expects me to do it a specific way that 
> involves downloading the whole oi-userland tree and supplying patches 
> and links to upstream code, and it's not really clear what I should do 
> at this stage.
> 
> So, about this time last month, I was looking for something to distract 
> myself from a stressful situation in real life and keep my mind 
> occupied. I was looking at the Pale Moon source code and noticed they'd 
> just removed Solaris support. So I was thinking to myself, "How hard 
> would it be to add it back in and then make the program actually compile 
> and run?" So I simply installed OpenIndiana in a virtual machine and got 
> to work despite having no real experience with Solaris, Firefox, or Pale 
> Moon. The only thing I knew about Solaris going in is that it's the 
> "other" Unix they offered on x86 systems at my college besides Linux so 
> that they could teach about POSIX compliance, avoiding "Linuxisms," and 
> say that they teach Unix and not just Linux. I wasn't able to stick with 
> my degree because of Calculus, but I always wondered what working with 
> it would have been like.
> 
> There were five things I learned that were encouraging to me early on.
> 
> 1. Oracle Solaris and the illumos distributions build Firefox with GCC 
> now, and haven't used Sun Studio to do so in ages, so all the code that 
> makes those assumptions is outdated. In fact, most of OpenIndiana is 
> built with GCC 7 specifically. They do use their own linker, but I knew 
> going in I wouldn't have to deal with any clang weirdness.
> 
> 2. Most of the GNU toolchain is available, but you have to prefix 
> commands with "g" to get the GNU version instead of the Solaris version.
> 
> 3. Mozilla regards Solaris as a Tier 2 or 3 platform, and a ton of 
> high-quality patches for it were created during or just after the 
> Firefox 52ESR lifecycle by Mozilla at the request of an incredibly 
> overworked Oracle employee trying to get the biggest Solaris issues 
> fixed upstream.
> 
> 4. All of the UXP project's major dependencies, like SQLite, NSS, NSPR, 
> libevent, libffi, and other libraries are available and more or less 
> up-to-date on Solaris and OI. NSS and NSPR have been on it since the 
> beginning, with Netscape getting involved with Sun/Java offerings early 
> on to power their server products back in the day.
> 
> 5. Solaris is descended from System V, and Linux follows many System V 
> conventions,
> unlike the BSDs. I've seen code in here with a 1989 AT&T copyright 
> notice attached, because it is actually System V Unix code from Bell 
> Labs. So there's a lot of overlap in the design, and a lot of POSIX 
> functionality to fall back on where the differences lie.
> 
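A minimal sketch of point 2 above, assuming the g-prefix convention described there (for example gmake or gsed). The helper name resolve_gnu_tool and the injectable which parameter are illustrative and do not come from the UXP build system.

```python
import shutil


def resolve_gnu_tool(name, which=shutil.which):
    """Return a path for `name`, preferring the g-prefixed GNU command.

    Falls back to the unprefixed name, which is only the GNU tool on
    systems (such as most Linux distributions) where GNU is the default.
    Returns None when neither command is on PATH.
    """
    for candidate in ("g" + name, name):
        path = which(candidate)
        if path is not None:
            return path
    return None
```

A build wrapper could call resolve_gnu_tool("make") once and reuse the result, so the same script behaves sensibly on both OI and Linux.
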
> So after I got the system up and running, I tried to load a mozconfig 
> file... and hit my first error before ever starting the build. Turns out 
> that Solaris uses Ksh, and while Bash is available, it's hard to 
> convince it to execute a script as a Bash script with all Bash features 
> rather than a version limited to Ksh features. Anyway, it turned out 
> Mozilla actually made a patch to remove the "Bash localism," and the 
> mozconfig loader is now POSIX compliant (which it should have been in 
> the first place). That was the first patch I applied.
> 
> From there, it was mostly a matter of applying build system patches so 
> the build system would recognize Solaris. 90% of the time, it would take 
> the same code as Linux, and it was like FreeBSD the other 10% of the 
> time, basically. One theme that kept coming up was that I had to replace 
> several memory-related functions like memalign and madvise with 
> posix_memalign and posix_madvise, because Solaris has versions of those 
> functions that take different arguments like caddr_t. This had to be 
> ifdefed only because apparently a few versions of Linux don't actually 
> have posix_memalign and only have the regular version with the POSIX 
> syntax. I would say that this was the most common unexpected compile 
> error I kept getting caught by: some "memalign" or "madvise" call 
> somewhere in the code that I forgot to change.
> 
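One way to catch the forgotten calls before the compiler does is to scan the sources for bare memalign and madvise calls. This is a rough sketch, not part of the actual port: find_unported_calls is a hypothetical helper, and the regular expression is a plain text match that will also flag calls inside comments or strings.

```python
import re

# Bare calls only: the lookbehind rejects posix_memalign( and posix_madvise(,
# because the character before the function name is an underscore there.
_BARE_CALL = re.compile(r"(?<![A-Za-z0-9_])(memalign|madvise)\s*\(")


def find_unported_calls(source):
    """Return (line_number, function_name) for each bare call in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _BARE_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Running it over each C or C++ file in the tree gives a checklist of places that still need the ifdef treatment.
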
> The build issue that consumed most of my time was figuring out why I was 
> getting text relocations and .eh_frame issues in libxul.so. I learned 
> everything I could about linkers and the ELF file format, and about 
> libxul.so. Even to the point of reading Mike Hommey's blog and learning 
> more about him, his interests, and the reasons behind his weird linker 
> hacks and frustration with manual component registration than I really 
> should have. I even found out that apparently on OI's official Firefox 
> 52 build, the guy who got everything else working gave up and tried in 
> desperation to build libxul.so with GNU LD and use the Sun linker for 
> the rest of it, and they were lucky that it worked.
> 
> However, it turned out that I had been trying to solve a problem I 
> hadn't yet run into. My actual build issue was because of libffi, and it 
> took me a while to figure out that it was relying on an external script 
> to configure libffi that was making incorrect assumptions about several 
> things. First issue is it assumed I wanted my .eh_frames to be read only 
> just because I'm on x86. Well, that's not a safe assumption on Solaris, 
> you want writable .eh_frames. Then I saw tons of text relocations, so I 
> started researching how to avoid text relocations in PIC code (which 
> Solaris seems to require). Then I found out you actually can't avoid 
> them completely, because assembler code needs to access the global 
> offset table at some point, and usually needs a PC relative relocation 
> at some point to do so. Then, I remembered a comment I saw in a libffi 
> source code file. "Solaris uses datarel encoding for PIC on x86." So I 
> figured out that I had to enable that hack by changing Mozilla's old 
> libffi configuration not to use PC relative relocations on Solaris x86. 
> So it does have a mechanism for allowing relative relocations of some 
> kind, just not PC relative ones. That got rid of most of the text 
> relocations, but I was still getting them in a file called win32.S, 
> which was always included whether I wanted/needed it or not. I 
> eventually looked at that code and found that the Solaris hack was not 
> available there, and instead it hardcoded PC relative encoding. I was 
> somehow able to look at that hack from sysv.S and copy it into win32.S, 
> perform the same tests and make it apply datarel encoding where 
> necessary (easier than it sounds if you see the file). After this, I'd 
> already fixed an issue that made the libxul.so modules appear out of 
> order on Solaris with a patch from Mozilla, so everything worked.
> 
> After this, I was finally able to build the browser, but it crashed 
> almost immediately with an NS_IsMainThread() assertion failure in 
> NSS that only one person had ever gotten before, and in their case it 
> was an SSL policy issue. I found a way to avoid crashing right away by 
> sheer accident: if I specified the word "file" on the command line, it 
> took me to a very simple HTTP page called file.com, with nothing but a 
> single image on it advertising some kind of file storage service or 
> something. None of the stacktraces really helped or made much sense; it 
> appeared that the attempt to initialize NSS was itself the cause of the 
> failure.
> 
> I compiled a debug version, took a crash course in how to read 
> stacktraces, and tried in desperation applying several patches I didn't 
> think were necessary and didn't really even like. I found this set of 
> patches from Mozilla upstream that stabilized the browser and stopped 
> the assertion failure, but only got it to work offline. It was able to 
> load up XUL plugins and offline saved web pages in this state, as well 
> as show about:config and such. It generated error pages saying the PSM 
> component appeared to be broken or disabled. I could see threads in gdb 
> spinning up and then crashing immediately every time I'd try to go 
> online. I thought that NSS was completely busted for some reason. I even 
> tried running the NSS test suite, but it passed and nothing seemed to be 
> wrong.
> 
> I applied this one patch that changed the way the browser looked and 
> completely busted the interface, kept it from saving any history, but 
> only because I typed it in wrong. It went like this:
> 
> palemoon.js:
> 
> <https://forum.palemoon.org/viewtopic.php?f=65&t=22899#>
> 
>     pref("storage.nfs_filesystem, true);
> 
> Yes, notice that the ending quote after filesystem is missing. For 
> whatever reason, this made Pale Moon behave a lot like a really old 
> version of Firefox used to act when it had a corrupted database. Same 
> symptoms, history not being saved, navigation being busted except 
> the URL bar, etc.
> 
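A typo like the missing quote above is easy to catch mechanically. The following is a crude sketch, assuming one pref() call per line; lines_with_unbalanced_quotes is a hypothetical helper, not a real JavaScript parser, and it ignores comments and single-quoted strings.

```python
def lines_with_unbalanced_quotes(text):
    """Return 1-based numbers of lines with an odd count of double quotes."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        count = 0
        escaped = False
        for ch in line:
            if escaped:
                # The previous character was a backslash, so skip this one.
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                count += 1
        if count % 2:
            bad.append(lineno)
    return bad
```

Fed the contents of a prefs file, it reports the line carrying the broken pref and stays quiet about correctly quoted ones.
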
> I had a weird feeling this might have changed or fixed something else, 
> so I removed the temporary NSS patches and tried loading the browser 
> again... and although the interface was still broken, I could now type 
> in any URL I wanted, and nothing crashed. For some reason, even YouTube 
> was working in this state. Though it took a full minute for a video to 
> start playing, it was smooth once it started playing back. It's a feat I 
> haven't been able to replicate since, the videos just refuse to play 
> entirely due to a software raster feature failure or something. The only 
> change I'd made recently that seemed like it could have fixed things was 
> a change to compile NSS and NSPR with pthreads after seeing that the 
> repositories for the official OS versions had added them in.
> 
> Thinking that adding pthreads had solved the problem (a suggestion my 
> mind was vulnerable to because I remembered inexplicable segfaults on 
> Linux 20 years ago due to things being compiled without them by 
> default), I fixed that typo... and the browser started crashing again.
> 
> So I assumed that maybe something was wrong with SQLite, if busting the 
> database access by accident had somehow made the browser work after 
> resolving the NSS issue. I ended up making absolutely sure that SQLite 
> built with -D_POSIX_PTHREAD_SEMANTICS and set it up to include a linker 
> mapfile provided from the OI repositories to make absolutely sure it 
> built correctly. And then everything started working again. I assumed 
> I'd finally done it... but the next day, while trying to get YouTube 
> to work again and making very small changes, I was getting the same 
> problem again with every build of the browser, even with the exact same 
> configuration that had worked before.
> 
> When I figured out why, I felt like a huge idiot. You want to know 
> what the difference was between the browser successfully running, and 
> crashing this whole time, since getting it to build? It was *which 
> terminal window I ran ./mach run from.* Why? Because I'd used one of 
> those terminal windows to run the NSS test suite. Why would that make a 
> difference? While I was running the test suite... I'd added dist/bin 
> in the object directory to LD_LIBRARY_PATH because it didn't 
> know where to look for its own shared libraries. So whenever I tried to run 
> the browser from the terminal window where I'd added the NSS I'd just 
> built to the LD_LIBRARY_PATH, everything worked fine, and when I ran it 
> from the other one, it crashed. And so the last several patches I'd been 
> applying and things I'd thought I'd been doing to fix or break the 
> browser were actually completely irrelevant. I'd probably had it working 
> since the first time I got it built and didn't realize it had no idea 
> where to find its own libraries in the build directory.
> 
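The difference between the two terminal windows comes down to one environment variable, so it can be checked before launching anything. A minimal sketch, assuming a colon-separated LD_LIBRARY_PATH; library_path_contains is a hypothetical helper and the /build/obj path in the usage note is a made-up stand-in for the real object directory.

```python
import os


def library_path_contains(directory, env=None):
    """Return True if `directory` is listed in LD_LIBRARY_PATH.

    `env` defaults to the current process environment. Empty entries
    (which the runtime linker may treat as the current directory) are
    ignored here.
    """
    env = os.environ if env is None else env
    wanted = os.path.normpath(directory)
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return any(entry and os.path.normpath(entry) == wanted for entry in entries)
```

Calling library_path_contains("/build/obj/dist/bin") in each shell would have shown immediately which window could find the freshly built NSS and which could not.
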
> So yeah, apparently now it builds and runs on Solaris perfectly fine. 
> Regular VP9 test videos work, YouTube videos try to work for a few 
> frames and then stop, but I have a feeling it might work better on 
> actual hardware rather than using a software renderer in a VM. I have to 
> disable Libevent's use of Solaris event ports for some weird reason to 
> stop websites from sending PHP files to me rather than trying to parse 
> them on the server. But yeah, I somehow got this to work in just under a 
> month, I think. It helped a lot that the browser hasn't had extensive 
> changes to memory handling or assembler code, that there were a ton of 
> existing patches to a code base very similar to the UXP one for Solaris 
> support, and that most of the potential trouble points were in external 
> libraries anyway.
> 
> 
> _______________________________________________
> oi-dev mailing list
> oi-dev at openindiana.org
> https://openindiana.org/mailman/listinfo/oi-dev
> 





