[oi-dev] PR #184: Support for epub and pandoc version problems

benn benny.lyons at gmx.net
Mon Jun 7 17:19:35 UTC 2021


Hi,
Didn't get any time over the weekend until now.

I've just cleaned up my epub branch, but PR #184 was automatically closed, so I
generated a new PR #186, which is just PR #184 with my very old merge conflicts
hopefully correctly resolved.

The makepdf should select your newer pdf generation mechanism if a newer pandoc
version is installed, and still work for older pandoc versions. At some stage, we can
deprecate this older generation mechanism, i.e., once Debian stable ships a new enough pandoc.
There might be confusion in the future with various styles flying about the place.
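
A rough sketch of that selection logic, written in Python with the planned
rewrite in mind. The function names, the 'new'/'old' labels and the 2.11
threshold are assumptions for illustration only, not what makepdf does today;
the real cut-off would be whichever pandoc release first supports everything
the newer yaml and Lua filter need.

```python
import re
import subprocess

# Hypothetical cut-off: replace with the first pandoc release that
# supports everything the newer yaml and Lua filter rely on.
MIN_NEW_PANDOC = (2, 11)


def parse_pandoc_version(version_output):
    """Extract the version tuple from the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    if not lines:
        raise ValueError("empty output from pandoc --version")
    match = re.search(r"(\d+(?:\.\d+)+)", lines[0])
    if match is None:
        raise ValueError("could not find a version number in: " + lines[0])
    return tuple(int(part) for part in match.group(1).split("."))


def choose_mechanism(version, minimum=MIN_NEW_PANDOC):
    """Return 'new' when pandoc is recent enough, otherwise 'old'."""
    return "new" if version >= minimum else "old"


def installed_pandoc_version():
    """Ask the pandoc found on PATH for its version (pandoc must be installed)."""
    result = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_pandoc_version(result.stdout)
```

With this, choose_mechanism(installed_pandoc_version()) would pick the old
mechanism on a 2.2.1 install and the new one on anything at or above the
chosen cut-off.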

On Fri, 2021-06-04 at 21:21 +0100, James Madgwick wrote:
> On Fri, 04 Jun 2021 11:49:20 +0200
> benn <benny.lyons at gmx.net> wrote:
>
> > > Only after some struggling with errors in the Lua filter did I
> > > realize the features needed to make the PDF presentable required
> > > the latest version of pandoc, or at least a version newer than the
> > > one found in Debian stable. I installed the deb directly from the
> > > pandoc GitHub releases.  
> >
> > I think, I'm very much open to correction here, the settings in yaml
> > (or xml) configuration files, can also be passed as command line
> > options to the older pandoc version (i.e., 2.2.1).
>
> Indeed, the version in debian stable (which is 2.2.1) does take command
> line options but not for all the options in the yaml. The most
> important part of the yaml is the 'header-includes' field (which adds
> LaTeX commands for packages, fonts, etc) which I could not get to work
> correctly when passed as a command line option. The yaml could instead
> be included as a header in the markdown document itself, but this would
> require it to be duplicated in every file in the docs and may not be
> compatible with mkdocs.
>
> The Lua filters error problem I mentioned relates to the ':insert'
> function in the Pandoc Lua API, which is relatively new. This option
> is not present in version 2.2.1, and although I believe there's a
> workaround for this particular issue, there may be other features
> missing from the earlier API version.
>
> > > > The generation of the epub format of the manuals provided, in the
> > > > past, output that was sufficient for me to proof read. Alas with
> > > > the newer versions of pandoc I have not been able to review
> > > > this.  
> > >
> > > Using the latest version of pandoc, the epubs appeared to be ok,
> > > though I have no experience with the format.  
> >
> > Producing the appropriate format is important, difficult and time
> > consuming. The OpenSolaris project produced a very nice guide indeed called
> > 'Documentation Style Guide for OpenSolaris.' It is over 250 pages
> > long and contains a wealth of information. (It is generally available
> > somewhere, 
>
> It's linked from this page on the docs
> (https://docs.openindiana.org/contrib/style/).
>
> >
> > Now to emulate this style, if the more recent pandoc version is
> > easier for you, great. I'll get makepdf to work for you. Anyway, it
> > must be a goal with OI to support more recent versions of software
> > and you are, currently, the only one tweaking the pdf style.
> > However, my old pdf generation mechanism was broken and I cannot
> > easily update to a newer version yet. Agreed, the old mechanism will
> > have to die, but killing Debian (or any other platform plagued) users
> > ought to be avoided. Think of the unfortunate Gentoo Linux user
> > having to recompile pandoc just cause something they do not really
> > need breaks! So the old pdf generation mechanism will have to become
> > deprecated; but not quite yet.
>
> I'm not opposed to keeping support for older versions of pandoc if we
> think it's worthwhile. My main concern was the general quality of the
> output, it's unfortunate that changes to formatting will not be
> reflected in that case. The best solution to this would be, I think, to
> provide already compiled PDFs & epub as downloads from the docs
> website. This way only those who wished to make their own changes would
> need to run pandoc themselves - thus avoiding difficulties for a general
> user in getting hold of the correct versions of the programs.
Yes, a very good idea. Ultimately, users will not require the source, only 
a button for html, dvi, pdf, epub, ...  but we've a long way to go yet.
> > > > The quality of the pdf produced needs to be improved and the
> > > > quality of epub has still to be examined.  
> > >
> > > The PDF is quite ok with the latest pandoc I would say (example:
> > > https://github.com/OpenIndiana/oi-docs/files/6566753/getting-started.pdf).
> > > It could do with a few tweaks to the LaTeX, but nothing much is
> > > needed urgently. Unlike LaTeX, I don't know much about epub; it does seem
> > > like there are extra settings which can be tweaked in pandoc, but
> > > far fewer than for LaTeX/PDF.
> >
> > The epub was, like the pdf, there for me, my personal use without any
> > tweaking. It has proved to be exceptionally handy. This needs serious
> > tweaking to coincide with the OpenSolaris style-guide.
>
> Having looked at the style guide above, it does seem the current OI
> Docs are quite different. I see two main components here, the content
> itself: how it is written (language use/writing style) and how it's
> arranged (e.g. use of headings), and the formatting of the content (font
> style, size, margins). Only the latter can be changed using pandoc.
>
> I simply styled the pandoc PDF output to be similar to the docs website.
>
> Changing the pandoc PDF formatting to be closer to the OpenSolaris
> style-guide is probably possible. This could be done with separate yaml
> and lua pandoc config and potentially an option in the script to choose
> the PDF formatting style.
No problem, I can add an option.
Once I get some time, I'll replace the Bash script with a Python version and add a
few things in the process like logging, tests, ...
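
As a starting point for that Python version, here is a minimal sketch of the
command line handling with a style option and basic logging. The -s spelling
and the two style names follow the suggestion further down in this mail; the
long option names, the --verbose flag and the positional sources argument are
assumptions, and nothing here calls pandoc yet.

```python
import argparse
import logging

# Style names as floated in this thread; OpenIndiana is the proposed default.
STYLES = ("OpenIndiana", "OpenSolaris")


def build_parser():
    """Build the (hypothetical) command line interface for a Python makepdf."""
    parser = argparse.ArgumentParser(prog="makepdf")
    parser.add_argument(
        "-s", "--style", choices=STYLES, default="OpenIndiana",
        help="PDF formatting style to use",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="log debug output",
    )
    parser.add_argument("sources", nargs="+", help="markdown files to convert")
    return parser


def main(argv=None):
    """Parse arguments, set up logging and report what would be built."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    log = logging.getLogger("makepdf")
    log.info("style=%s, %d source file(s)", args.style, len(args.sources))
    return args
```

Keeping the parser in its own function makes it easy to test the option
handling without touching pandoc at all.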

> If this were done, I think the PDF conversion
> related files would be best moved into a separate folder in the repo,
> as it will start to get cluttered otherwise. To fit the OpenSolaris
> style-guide we'd need to include suitable images (ideally the same ones
> if they can be located) to match those used by OpenSolaris.
Yes, very good idea.
> The main problem I see here is the way the content is styled in the
> current OI docs. In particular I notice very frequent use of breakout
> 'Note' boxes and code boxes. The OpenSolaris style-guide specifically
> mentions: "The inclusion of many Notes might indicate an organizational
> problem in the text". The tone in some places is also more informal than
> that proposed by the style-guide. I think that, in general, most modern
> open-source documentation is not written as formally as the style-guide
> sets out. It could be said to be dated, although I don't find that to
> be a problem.
>
> While the formatting can be changed and customized without too much
> difficulty, the content cannot. In my opinion the Docs have many other
> problems which should be fixed first, perhaps the writing style could be
> addressed as improvements and changes are made.
>
> In general, I'm not sure about using Pandoc with LaTeX to try and create
> PDFs in the OpenSolaris style. It might make more sense to use the
> original OpenSolaris documentation workflow, as described here:
> https://web.archive.org/web/20081222132331/http://opensolaris.org:80/os/community/documentation/doc_collab/tools/
> Unfortunately I couldn't find a copy of the OpenSolaris docs source:
> http://web.archive.org/web/20100907075618/http://dlc.sun.com/osol/docs/downloads/current/
> which would be useful to see exactly how the XML was structured and to
> get copies of the icons used. This file seems to be lost to time.
>
> Pandoc has the capability to output in docbook format (but the input
> cannot contain _any_ HTML). This can then be processed as described in
> the example linked from the OpenSolaris page above. Here's a very quick
> attempt at doing this: http://madgwick.xyz/files/handbook-fo.pdf
Yes, this looks familiar and good. I think, we should stay 'as close 
as reasonably possible' to the ideas set out in that document; let's not
re-invent the wheel.
> There's a few problems with images, links and text wrapping - I expect
> these can be fixed somehow. To me, this looks closer to the OpenSolaris
> style, although the content still needs changing to fit the writing
> style and headings need work etc.
>
> The problem with using the method above, is it adds further complexity
> to the process for generating PDFs. I'm not sure that the formatting of
> the PDF is a priority at the moment.
It certainly is not, other things, such as content are more important.

> As it is now, the PDF output is
> presentable and looks similar to the web docs (when the latest pandoc is
> used) and without the latest pandoc it's at least readable. For the
> time being, it would be simple to write up some information on how to
> pipe the markdown through pandoc, xsltproc and fop to get the example
> PDF above which is closer to the OpenSolaris doc style - without
> actually including scripts to do this.
Maybe we could add the options required to generate the OpenSolaris style to
makepdf.sh, i.e. -s OpenIndiana and -s OpenSolaris, with OpenIndiana as the
default? I could do this, just send me the pandoc command line options and the
yaml configuration file.
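
To make the request concrete, this is roughly how I picture the pandoc,
xsltproc and fop chain for the OpenSolaris style. It is only a guess at the
shape of the commands, not what was actually run to produce the example PDF:
the function name, the build directory and the stylesheet path are
assumptions (the DocBook XSL location differs between systems), and the
markdown must not contain any HTML for the docbook step to work.

```python
import shlex
from pathlib import Path

# Assumption: location of the DocBook XSL "fo" stylesheet; adjust per system.
DOCBOOK_FO_XSL = "/usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl"


def opensolaris_pipeline(markdown_file, out_dir="build", stylesheet=DOCBOOK_FO_XSL):
    """Return the three commands for markdown -> DocBook -> XSL-FO -> PDF."""
    stem = Path(markdown_file).stem
    docbook = f"{out_dir}/{stem}.xml"
    fo = f"{out_dir}/{stem}.fo"
    pdf = f"{out_dir}/{stem}.pdf"
    return [
        # 1. markdown to a standalone DocBook document
        ["pandoc", "-f", "markdown", "-t", "docbook", "-s", markdown_file, "-o", docbook],
        # 2. DocBook to XSL-FO using the DocBook XSL stylesheets
        ["xsltproc", "-o", fo, stylesheet, docbook],
        # 3. XSL-FO to PDF with Apache FOP
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


def as_shell(commands):
    """Render the commands as shell lines, e.g. for documentation or a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing as_shell(opensolaris_pipeline("docs/handbook.md")) gives three lines
that could go straight into the write-up, and the same lists could later be
handed to subprocess.run if we decide to script it.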

> > The build fails because the lint syntax checker has some error. Run
> > the following: mdl -s markdownlint-rules.rb .
> > fix the errors, rerun, ... Once this runs without errors, you can
> > commit.
>
> The error is now fixed (to the extent that Travis passes) in master. But
> as your PR is based on a branch which is one commit behind, Travis is
> building from that instead and so the error still appears. This can be
> ignored.
>
> > Maybe we should dump this check into the newer python makedoc, or
> > whatever we call makepdf, as I tend to bypass the html stuff
> > while working. 
>
> I agree that the mdl check is not as useful as it might be. Especially
> considering that Travis is using an older version which doesn't find as
> many errors as the latest version. Running latest 'mdl' on the master
> branch incorrectly reports various errors. This is caused by the mix of
> markdown and HTML which mdl isn't designed for. There are a number of
> bug reports on the mdl pages relating to this, but the answer is mostly
> to not mix HTML and markdown.
>
> I've checked the PR #184 and it works fine for me and it's an
> improvement on what exists so far. You mentioned in the PR that it
> shouldn't be merged to master. What do you propose we do next?
I think PR #186 should now be merged.


> I'm wondering if anyone else has comments on the general OI docs -> PDF
> situation?
That would be nice; but looks like we are on our own, at least for now.

Cheers
Benny





