So, for quite some time now, I've been working on a Great Project.
It started as something simple: I wanted to take the Irish Annals from the CELT project and make an index of the names therein, so that I'd have a handy reference at SCA Heraldry meetings, where we discuss the historicity of proposed persona names for registration.
- First problem: the CELT files are in SGML.
- More to the problem: the industry-standard tool for parsing SGML files, nsgmls, requires you to be an expert in SGML parsing before you can begin to make head or tail of the documentation.
- Solution: I have built an SGML parsing engine entirely in Perl. It copes with such interesting complications as
  `<table><tr><td>blah</table>`
  where the `</table>` tag also implicitly closes the `<tr>` and `<td>` elements, not to mention situations like
  `<p>foo<p>bar`
  where the second `<p>` closes the first.
  It also copes with an even more interesting part of SGML parsing: when you have a string like
  `<foo<bar>`
  it should be parsed in the following wise: a tag cannot contain the character ‘<’, so that example has an implicit closing bracket, and should be interpreted as a tag `<foo>`, containing (or followed by, depending on the DTD) the tag `<bar>`.
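To give a flavour of those rules, here is a minimal sketch of the idea. It is written in Python rather than Perl, it is not the code from my module, and it makes loud assumptions: the `IMPLICITLY_CLOSES` table is a hand-written, hypothetical stand-in for what a real DTD would say, attributes are skipped rather than parsed, and the names `tokenize` and `parse` are invented for the illustration.

```python
import re

# ASSUMPTION: a hand-written stand-in for DTD content models. For each
# element name, the set of currently open elements that a new start tag
# of that name implicitly closes.
IMPLICITLY_CLOSES = {
    "p": {"p"},
    "tr": {"tr", "td"},
    "td": {"td"},
}

# A tag runs from '<' up to '>' or, failing that, up to just before the
# next '<' (the implicit closing bracket). Attributes are skipped, and
# quoted attribute values containing '<' or '>' are not handled.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*>?")


def tokenize(text):
    """Yield ('start', name), ('end', name) or ('text', data) tokens."""
    pos = 0
    while pos < len(text):
        if text[pos] != "<":
            nxt = text.find("<", pos)
            if nxt == -1:
                nxt = len(text)
            yield ("text", text[pos:nxt])
            pos = nxt
            continue
        m = TAG.match(text, pos)
        if m is None:
            # A '<' that does not begin a tag is passed through as text.
            yield ("text", "<")
            pos += 1
            continue
        yield ("end" if m.group(1) else "start", m.group(2).lower())
        pos = m.end()


def parse(text):
    """Build a tree of {'name', 'children'} dicts; text is stored as str."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for kind, value in tokenize(text):
        if kind == "text":
            stack[-1]["children"].append(value)
        elif kind == "start":
            # Close any open elements that this start tag implicitly ends.
            closes = IMPLICITLY_CLOSES.get(value, set())
            while len(stack) > 1 and stack[-1]["name"] in closes:
                stack.pop()
            node = {"name": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # An end tag closes its element and everything still open
            # inside it; an end tag with no matching open element is ignored.
            if value in [n["name"] for n in stack[1:]]:
                while stack.pop()["name"] != value:
                    pass
    return root
```

With this sketch, `parse("<p>foo<p>bar")` gives two sibling `p` nodes, and `parse("<foo<bar>")` gives a `foo` node containing a `bar` node. A real parser would consult the DTD to decide between "containing" and "followed by".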
This parsing engine turns an SGML file into a tree data structure, which can be fairly trivially walked, such as by the two programs I wrote to use this module:
- A Perl::GTK2 program which takes that tree, and presents it to you graphically.
- A command line program which takes that tree, and then outputs something based on the contents of that tree, by interpreting another language which I invented for the purpose. So for that, I have written an interpreter in Perl. It's crude, and would not win any praise in a CS course, but then, I haven't done a CS course, and did it from first principles. And it works well enough. Well enough, certainly, that the combination of this interpreter, running code I wrote for the purpose, running over the output of the SGML parser I wrote, takes the CELT SGML file and generates a LaTeX document, which generates an index, and is not bad looking either, if I do say so myself.
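For anyone wondering what "walking the tree" amounts to, here is a rough sketch of the simplest possible version: collect every name and emit LaTeX `\index` entries. It is in Python for illustration only, and it does not use my bespoke language at all. The node shape (dicts with `name` and `children`), the element name `persname`, and the sample data are all assumptions invented for the example, not what the CELT files or my module actually use.

```python
import re


def walk(node):
    """Yield every element node in document order (depth-first, pre-order)."""
    yield node
    for child in node["children"]:
        if isinstance(child, dict):
            yield from walk(child)


def text_of(node):
    """Concatenate all the text found beneath a node."""
    return "".join(
        child if isinstance(child, str) else text_of(child)
        for child in node["children"]
    )


def escape_latex(s):
    """Backslash-escape a few characters LaTeX treats specially.

    This is deliberately incomplete: makeindex's own special
    characters (!, @, |, ") are not handled.
    """
    return re.sub(r"([&%$#_{}])", r"\\\1", s)


def index_entries(root, tag="persname"):
    """Return sorted, de-duplicated \\index{...} commands for `tag` elements."""
    names = {text_of(n).strip() for n in walk(root) if n["name"] == tag}
    return [r"\index{%s}" % escape_latex(name) for name in sorted(names)]


# Invented sample tree, standing in for the parser's output.
sample = {"name": "text", "children": [
    {"name": "p", "children": [
        "Death of ",
        {"name": "persname", "children": ["Brian Bóruma"]},
        " and of ",
        {"name": "persname", "children": ["Máel Mórda"]},
        ".",
    ]},
    {"name": "p", "children": [
        {"name": "persname", "children": ["Brian Bóruma"]},
        " is mentioned again.",
    ]},
]}

for line in index_entries(sample):
    print(line)
```

The point of having a separate rule language, rather than hard-coding a walk like this one, is that the same parser output can be turned into different documents without touching the Perl.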
But then, when you have a text with both Irish and Latin in it, marked up so that you can tell the difference, and given that Irish historically has its own typefaces, how can you easily show which text is Latin and which Irish? Well, you do what I did: take an existing METAFONT font for Gaelic called eiad, and tweak it until it suits your purposes. I added all sorts of missing characters to this font, filling out the missing j, k, q, v, w, x, y and z (upper and lower case), then adding other characters I'd need, especially e-caudata, which is represented in Roman fonts with the character ‘ę’. Then, for fun, I expanded the font to include letters from Old English like þ, ð and æ. (OE and Irish are both written in Insular Minuscule, and thus a font for one seems appropriate for both.) Then I tweaked the shapes of the glyphs, so that the font looks closer to a historical typeface which is common in early twentieth-century books in and about Gaelic, known in various forms as ‘Monotype Series 24’ and ‘Newman’.
So now I have a METAFONT font called Fear Nua (which is ‘New Man’ in Gaelic), with most of the characters I can think of (there's a ‘ui’ ligature which I should really work on at some point), plus the LaTeX files to make it an entire family: bold, italic (which is true italic, not just slanted), small caps, sans serif and teletype (fixed width and variable). That I got this far is largely because the guy who built the eiad font in the first place did so with great skill and foresight, and made it relatively easy for me to build on his work.
In summary: I have
1. One Perl module (actually a small set of interconnecting modules) which provides an SGML parser (plus a specific TEI parser and a specific CELT parser) in pure Perl.
2. A program which uses (1) to display an SGML file in a human-browseable way (although it cannot yet edit or meaningfully search that file).
3. A program which applies rules written in a bespoke language to perform simple transformations on an SGML file (parsed with (1)), sufficient to generate a compilable LaTeX file.
4. A METAFONT font for Gaelic, which includes the full Roman character set and several paleographic characters, in an entire family of shapes. A bit of work with FontForge can easily turn this into a PostScript or TrueType font (indeed, I have done this to an older version, and those bits which are displayed in this font in the output from (3) are also displayed in this font on screen when I view the file in (2)).
And I have a question for those of you in the programming community: now what do I do with them?
None of them are complete to my satisfaction, but then, it's entirely possible that they will never be, if I hide them from the world and continue to do all the work in my copious free time whenever I have a flash of insight or a moment to get into it.
I imagine that the SGML parser module could be submitted to CPAN, and the font to CTAN. But I've been putting it off and putting it off, partially because I don't know where to begin, partially because I'm just really nervous about putting my code out there for others to pick over.
As for the other stuff... SourceForge? For all that I've been supporting others with Subversion and the like, I've never used it in anger myself, and I find that worrying about this is yet another excuse to hold off doing anything.
Indeed, I've been meaning to post this cry for advice for weeks now, but kept ... putting it off.
So: what do I do with all this now?