[oi-dev] PR #184: Support for epub and pandoc version problems

James Madgwick james at madgwick.xyz
Fri Jun 4 20:21:19 UTC 2021


On Fri, 04 Jun 2021 11:49:20 +0200
benn <benny.lyons at gmx.net> wrote:

> > Only after some struggling with errors in the Lua filter did I
> > realize the features needed to make the PDF presentable required
> > the latest version of pandoc, or at least a version newer than the
> > one found in Debian stable. I installed the deb directly from the
> > pandoc GitHub releases.  
> I think, I'm very much open to correction here, the settings in yaml
> (or xml) configuration files, can also be passed as command line
> options to the older pandoc version (i.e., 2.2.1).

Indeed, the version in Debian stable (which is 2.2.1) does take command
line options, but not for all the options in the yaml. The most
important part of the yaml is the 'header-includes' field (which adds
LaTeX commands for packages, fonts, etc.), which I could not get to work
correctly when passed as a command line option. The yaml could instead
be included as a header in the markdown document itself, but this would
require it to be duplicated in every file in the docs and may not be
compatible with mkdocs.
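
A possible alternative, not verified against pandoc 2.2.1, is to keep
the LaTeX preamble lines in their own file and hand that file to pandoc
with -H/--include-in-header rather than passing the 'header-includes'
text itself. The sketch below only builds the command line; the file
names are hypothetical and not taken from the repo.

```python
def pandoc_pdf_command(markdown_file, header_tex, output_pdf):
    """Build a pandoc command that reads LaTeX preamble lines from a file.

    --include-in-header inserts the file's contents into the LaTeX
    header, which is the job 'header-includes' does in the yaml.
    """
    return [
        "pandoc",
        markdown_file,
        "--include-in-header", header_tex,
        "-o", output_pdf,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = pandoc_pdf_command("handbook.md", "header.tex", "handbook.pdf")
    print(" ".join(command))
```

Whether this reproduces everything the yaml does on 2.2.1 would still
need testing.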

The Lua filters error problem I mentioned relates to the ':insert'
function in the Pandoc Lua API, which is relatively new. This function
is not present in version 2.2.1, and although I believe there's a
workaround for this particular issue, there may be other features
missing from the earlier API version.
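
To fail early instead of hitting errors inside the Lua filter, the
script could check the installed pandoc version first. This is a
minimal sketch that parses the first line of `pandoc --version`; the
minimum version used in the example is a placeholder, since the exact
release that added ':insert' is not stated here.

```python
import re
import subprocess


def parse_pandoc_version(version_output):
    """Return the version from `pandoc --version` as a 4-part tuple."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"\d+(?:\.\d+)*", first_line)
    if match is None:
        raise ValueError("no version number in: %r" % first_line)
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (4 - len(parts))  # pad so 2.2.1 compares as 2.2.1.0
    return tuple(parts[:4])


def is_new_enough(version_output, minimum):
    """True if the reported version is at least `minimum`, e.g. (2, 3)."""
    padded = tuple(minimum) + (0,) * (4 - len(minimum))
    return parse_pandoc_version(version_output) >= padded[:4]


if __name__ == "__main__":
    # Placeholder minimum, NOT the verified version that added ':insert'.
    placeholder_minimum = (2, 3)
    result = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=True
    )
    if not is_new_enough(result.stdout, placeholder_minimum):
        print("pandoc is older than the version the Lua filter assumes")
```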

> > > The generation of the epub format of the manuals provided, in the
> > > past, output that was sufficient for me to proof read. Alas with
> > > the newer versions of pandoc I have not been able to review
> > > this.  
> >
> > Using the latest version of pandoc, the epubs appeared to be ok,
> > though I have no experience with the format.  
> Producing the appropriate format is important, difficult and time
> consuming. The OpenSolaris produced a very nice guide indeed called
> 'Documentation Style Guide for OpenSolaris.' It is over 250 pages
> long and contains a wealth of information. (It is generally available
> somewhere, 

It's linked from this page on the docs
(https://docs.openindiana.org/contrib/style/).

> 
> Now to emulate this style, if the more recent pandoc version is
> easier for you, great. I'll get makepdf to work for you. Anyway, it
> must be a goal with OI to support more recent versions of software
> and you are, currently, the only one tweaking the pdf style
> However, my old pdf generation mechanism was broken and I cannot
> easily update to a newer version yet. Agreed, the old mechanism will
> have to die, but killing Debian (or any other platform plagued) users
> ought to be avoided. Think of the unfortunate Gentoo Linux user
> having to recompile pandoc just cause something they do not really
> need breaks! So the old pdf generation mechanism will have to become
> deprecated; but not quite yet.

I'm not opposed to keeping support for older versions of pandoc if we
think it's worthwhile. My main concern was the general quality of the
output; it's unfortunate that changes to formatting will not be
reflected in that case. The best solution to this would be, I think, to
provide already compiled PDFs & epubs as downloads from the docs
website. This way only those who wished to make their own changes would
need to run pandoc themselves, thus avoiding difficulties for a general
user in getting hold of the correct version of the program.

> > > The quality of the pdf produced needs to be improved and the
> > > quality of epub has still to be examined.  
> >
> > The PDF is quite ok with the latest pandoc I would say (example:
> > https://github.com/OpenIndiana/oi-docs/files/6566753/getting-started.pdf).
> > It could do with a few tweaks to the LaTeX, but nothing much is
> > needed urgently. Unlike LaTeX I don't know about epub, it does seem
> > like there are extra settings which can be tweaked in pandoc, but
> > far fewer than for LaTeX/PDF.  
> The epub was, like the pdf, there for me, my personal use without any
> tweaking. It has proved to be exceptionally handy. This needs serious
> tweaking to coincide with the OpenSolaris style-guide.

Having looked at the style guide above, it does seem the current OI
Docs are quite different. I see two main components here: the content
itself, meaning how it is written (language use/writing style) and how
it's arranged (e.g. use of headings), and the formatting of the content
(font style, size, margins). Only the latter can be changed using
pandoc.

I simply styled the pandoc PDF output to be similar to the docs website.

Changing the pandoc PDF formatting to be closer to the OpenSolaris
style-guide is probably possible. This could be done with separate yaml
and Lua pandoc config and potentially an option in the script to choose
the PDF formatting style. If this were done, I think the PDF conversion
related files would be best moved into a separate folder in the repo,
as it will start to get cluttered otherwise. To fit the OpenSolaris
style-guide we'd need to include suitable images (ideally the same ones
if they can be located) to match those used by OpenSolaris.
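
As a rough illustration of the style option, here is a minimal sketch
assuming one folder per style, each holding a yaml file and a Lua
filter. The folder layout, file names and the '--style' option are all
hypothetical. It also assumes the yaml is passed with
'--metadata-file', which as far as I know needs a pandoc newer than
2.2.1.

```python
import argparse
import os

# Hypothetical layout: one folder per style with its yaml and Lua filter.
STYLES = {
    "website": ("pdf/website/style.yaml", "pdf/website/filter.lua"),
    "opensolaris": ("pdf/opensolaris/style.yaml", "pdf/opensolaris/filter.lua"),
}


def build_command(argv):
    """Turn script arguments into a pandoc command for the chosen style."""
    parser = argparse.ArgumentParser(description="Convert a docs page to PDF")
    parser.add_argument("source", help="markdown file to convert")
    parser.add_argument("--style", choices=sorted(STYLES), default="website")
    args = parser.parse_args(argv)

    metadata, lua_filter = STYLES[args.style]
    output = os.path.splitext(args.source)[0] + ".pdf"
    return [
        "pandoc", args.source,
        "--metadata-file", metadata,
        "--lua-filter", lua_filter,
        "-o", output,
    ]


if __name__ == "__main__":
    print(" ".join(build_command(["getting-started.md", "--style", "opensolaris"])))
```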

The main problem I see here is the way the content is styled in the
current OI docs. In particular I notice very frequent use of breakout
'Note' boxes and code boxes. The OpenSolaris style-guide specifically
mentions: "The inclusion of many Notes might indicate an organizational
problem in the text". The tone in some places is also more informal than
that proposed by the style-guide. I think that, in general, most modern
open-source documentation is not written as formally as the style-guide
sets out. It could be said to be dated, although I don't find that to
be a problem.

While the formatting can be changed and customized without too much
difficulty, the content cannot. In my opinion the Docs have many other
problems which should be fixed first. Perhaps the writing style could
be addressed as improvements and changes are made.

In general, I'm not sure about using Pandoc with LaTeX to try and create
PDFs in the OpenSolaris style. It might make more sense to use the
original OpenSolaris documentation workflow, as described here:
https://web.archive.org/web/20081222132331/http://opensolaris.org:80/os/community/documentation/doc_collab/tools/
Unfortunately I couldn't find a copy of the OpenSolaris docs source:
http://web.archive.org/web/20100907075618/http://dlc.sun.com/osol/docs/downloads/current/
which would be useful to see exactly how the XML was structured and to
get copies of the icons used. This file seems to be lost to time.

Pandoc has the capability to output in DocBook format (but the input
cannot contain _any_ HTML). This can then be processed as described in
the example linked from the OpenSolaris page above. Here's a very quick
attempt at doing this: http://madgwick.xyz/files/handbook-fo.pdf
There are a few problems with images, links and text wrapping - I expect
these can be fixed somehow. To me, this looks closer to the OpenSolaris
style, although the content still needs changing to fit the writing
style and headings need work etc.

The problem with the method above is that it adds further complexity
to the process for generating PDFs. I'm not sure that the formatting of
the PDF is a priority at the moment. As it is now, the PDF output is
presentable and looks similar to the web docs (when the latest pandoc is
used) and without the latest pandoc it's at least readable. For the
time being, it would be simple to write up some information on how to
pipe the markdown through pandoc, xsltproc and fop to get the example
PDF above which is closer to the OpenSolaris doc style - without
actually including scripts to do this.
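
For reference, the three steps could look roughly like the sketch
below. It is not the exact set of commands used for the example PDF:
the stylesheet path is an assumption (the DocBook XSL 'fo' stylesheet
lives in different places on different systems) and the file names are
hypothetical.

```python
import subprocess


def docbook_pdf_commands(markdown_file, stylesheet, basename):
    """Return the pandoc, xsltproc and fop commands, in the order to run."""
    xml_file = basename + ".xml"
    fo_file = basename + ".fo"
    pdf_file = basename + ".pdf"
    return [
        # 1. markdown -> standalone DocBook XML
        ["pandoc", "-f", "markdown", "-t", "docbook", "-s",
         markdown_file, "-o", xml_file],
        # 2. DocBook XML -> XSL-FO, using a DocBook XSL stylesheet
        ["xsltproc", "-o", fo_file, stylesheet, xml_file],
        # 3. XSL-FO -> PDF
        ["fop", "-fo", fo_file, "-pdf", pdf_file],
    ]


def run_all(commands):
    """Run each step in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Assumed stylesheet location; adjust for the local DocBook XSL install.
    steps = docbook_pdf_commands(
        "handbook.md", "/usr/share/xml/docbook/stylesheet/fo/docbook.xsl", "handbook"
    )
    for step in steps:
        print(" ".join(step))
```

The intermediate .xml and .fo files are kept on disk rather than piped,
which makes it easier to see which step introduced a problem.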

> The build fails because the lint syntax checker has some error. Run
> the following: mdl -s markdownlint-rules.rb .
> fix the errors, rerun, ... Once this runs without errors, you can
> commit.

The error is now fixed (to the extent that Travis passes) in master. But
as your PR is based on a branch which is one commit behind, Travis is
building from that instead and so the error still appears. This can be
ignored.

> Maybe we should dump this check into the newer python makedoc, or
> whatever we can call makepdf, as I tend to by-pass the html stuff
> while working. 

I agree that the mdl check is not as useful as it might be, especially
considering that Travis is using an older version which doesn't find as
many errors as the latest version. Running the latest 'mdl' on the
master branch incorrectly reports various errors. This is caused by the
mix of markdown and HTML, which mdl isn't designed for. There are a
number of bug reports on the mdl pages relating to this, but the answer
is mostly to not mix HTML and markdown.

I've checked PR #184 and it works fine for me, and it's an
improvement on what exists so far. You mentioned in the PR that it
shouldn't be merged to master. What do you propose we do next?

I'm wondering if anyone else has comments on the general OI docs -> PDF
situation?


-- 
Regards,
James


