[personal profile] catsidhe
I have a project, which I've been working on, off and on, for years. The project is simple in concept: take the Middle Irish Annals from the CELT project, and turn them into nicely printed editions, for use as references in SCA Onomastic Heraldry (and if there are other uses for such editions, so much the better).

It turns out that this is not a simple project, and has become A Great Project.

To do this properly, I decided to typeset it in LaTeX. Ah, but turning the SGML into LaTeX is non-trivial, to the point where I have had to teach myself SGML and DTDs from first principles in order to comprehend the TEI flavour of SGML (which is, it turns out, possibly the single richest and most complicated implementation of SGML in the wild, ever). And because I could not get my head around the existing tools for SGML (nsgmls comes to mind), I have been building an SGML parser in Perl, also from first principles. I have a working implementation which can, after a recent surge in effort, cope with such SGML quirks as missing end tags and even missing closing angle-brackets on tags, so that <tag<othertag> parses out correctly. (Yes, these documents contain these legal but parser-breaking constructs.) It is not a validating parser. Technically.
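
To make the unclosed-tag case concrete, here is a minimal Python sketch. It is not the Perl parser described above, only an illustration of one way a tokenizer might treat a new '<' as implicitly closing the tag before it. The names `TOKEN` and `tokenize` are made up for this example, and it assumes attribute values never contain '<' or '>'.

```python
import re

# A tag is '<' or '</', a name, optional attribute text, and then either
# a closing '>' or nothing at all (when the next '<' arrives first).
# Anything else is character data; a lone '<' is kept as text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)(>?)|([^<]+)|(<)")

def tokenize(text):
    """Split text into (kind, name, attrs, explicitly_closed) and ('text', data)."""
    tokens = []
    for m in TOKEN.finditer(text):
        slash, name, attrs, close, chars, stray = m.groups()
        if name:
            kind = "end" if slash else "start"
            tokens.append((kind, name, attrs.strip(), close == ">"))
        else:
            tokens.append(("text", chars if chars is not None else stray))
    return tokens

tokens = tokenize("<tag<othertag>text</othertag>")
```

The final field of each tag tuple records whether the '>' was actually present, so a viewer could flag the implicitly closed tags while still parsing them.
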
And I had to build a tool so that I could visualise how the document was being parsed, so that I could find errors (in the document and parser both), so I have learned first PerlTk, and then Perl::GTK2, and I have a tool which reads, parses and displays a TEI document, and even very crudely attempts to render it.
And to print the Middle Irish properly, I needed a font with the requisite Insular Minuscule characters, so I took the nearest Metafont font (namely eiad) and modified it heavily. It now has a different shape, a different layout, and many new characters, including the rest of the Latin alphabet and special ligatures not used outside Irish, which required research into Middle Irish paleography and Irish typography. It also has LaTeX formats for a whole slew of variant shapes: bold, italic, even teletype. (It is entirely possible to generate a bold-extended italic small caps variable-width teletype shape of this thing, due in no small part to the foresight and skill of the guy who first wrote it.) Then I turned that into TrueType, so that the Perl::GTK2 viewer can display the Irish in an Irish font, special characters and all.

So, after all this, I am well progressed towards my goal, and have along the way learned much about Perl, PerlTk and Perl::GTK2, LaTeX, METAFONT, SGML, and Middle Irish and its orthography, paleography and typography.

Know any jobs going for that skillset?



Anyway, after futzing around with one aspect of my Grand Project, I have returned to another: indexing. One of the characters in the Annals is ‘ę’, the grapheme ‘e ogonek’. Except it isn't, really. E ogonek is just the character which looks closest to the insular character e-caudata, which is an ‘e’ with a tail pointing to the right, but which is actually a ligature of ‘ea’. I would like the index to sort ‘ę’ as if it were ‘ea’, just as German indexes sort ‘ü’ as if it were spelled ‘ue’.

Makeindex can't do that. Xindy can... if you can figure out how to tell it so. Which I haven't. Xindy was written to be universal and highly configurable for many disparate language sorting requirements. It is so configurable that the set of modules which tell it how to imitate Makeindex is long and convoluted, and there is no meaningful tracing. Which means I can't tell how, or where, it turns ‘\k{e}’ into ‘ę’ (which it does, because I see “Edan, Ęd, Edon, ...”, where “Ęd” should be in the “Ead...” region), nor how to force it to sort ‘ę’ as ‘ea’.

According to the docs, this is straightforward and easy. Which makes it infuriating when it doesn't work and I get no feedback as to why. The documentation is, as far as I can tell, out of date, and is either rarefied technical detail to the point of uselessness (a description of how a function works, but not a word of how it works with other functions, which is where the hard yakka comes in...), or else so user-friendly and basic as to be functionally useless. To give just one example: in an attempt to make a rule that “\emph{...}” is to be ignored in sorting, I was told that there was a bad regexp. Not what was bad about it, not a clue about how to fix it, just a stern warning that it was wrong. At least it told me where the bad regexp was, I suppose. It turned out that it wanted ‘{’ and ‘}’ to be escaped with a backslash, for some reason. It still doesn't work, though.
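
For what it's worth, the sorting behaviour being asked for is easy to state outside xindy. The following is a minimal Python sketch of the two rules (strip the \emph{...} wrapper, and sort e-caudata as ‘ea’). It says nothing about how xindy does or should do it; `EMPH` and `sort_key` are names made up for the illustration. One plausible reason for the brace complaint is that in many regular expression dialects a bare ‘{’ starts a repetition count, so literal braces need a backslash, as they have here.

```python
import re

# Matches \emph{...} with no nested braces; the braces are escaped because
# an unescaped '{' can be read as the start of a repetition count.
EMPH = re.compile(r"\\emph\{([^{}]*)\}")

def sort_key(entry):
    """Build the string an index entry should be sorted by."""
    # Rule 1: ignore the \emph{...} wrapper but keep what is inside it.
    text = EMPH.sub(r"\1", entry)
    # Rule 2: treat e-caudata, as a macro or as a character, as 'ea'.
    text = text.replace(r"\k{e}", "ea").replace("ę", "ea")
    text = text.replace(r"\k{E}", "Ea").replace("Ę", "Ea")
    return text.casefold()

names = ["Edon", r"\k{E}d", r"\emph{Edan}"]
ordered = sorted(names, key=sort_key)
```

With these rules ‘\k{E}d’ gets the key ‘ead’ and so lands ahead of ‘Edan’ and ‘Edon’, in the “Ead...” region, rather than between them.
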

So now it looks like I'll have to figure out xindy from first principles as well, to get this one step closer.


Ah well, it keeps me off the streets.
