= Things to do, and the people doing them =

<<TableOfContents()>>

== Determine wiki structure from export XML ==

'''Who is working on this:''' BradleyDean, PaulBoddie

Given the XML structure in the Confluence exports, extract the site structure (including pages, attachments and history).

I've written [[https://hg.boddie.org.uk/ConfluenceConverter|some experimental code]] to export page revisions and manifests from the XML dump (`convert.py`), along with a module (`parser.py`) that performs some simple parsing of page text given to it on standard input. The idea is to combine the manifests and give them to the [[HelpOnPackageInstaller|package installer]] in order to import the Wiki content into Moin, but only after the actual page revisions have been parsed and converted to Moin syntax. -- PaulBoddie <<DateTime(2012-04-02T00:45:46+0200)>>

  I forgot to include the `xmlread` module, but I'll upload that later today. -- PaulBoddie <<DateTime(2012-04-02T10:09:41+0200)>>

  The missing module is now available [[https://hg.boddie.org.uk/xmlread|here]]. You can just copy `xmlread.py` into the `ConfluenceConverter` distribution and it should work. -- PaulBoddie <<DateTime(2012-04-02T18:08:15+0200)>>
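As a rough illustration of the extraction task described above, the following sketch lists objects of a given class from an export document using only the standard library. It is not the approach taken by `convert.py`, which relies on the `xmlread` module. The sample XML, its element and attribute layout (`object`, `id`, `property`), the `fileName` property and the `list_objects` function are all assumptions made for illustration; a real Confluence export may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for an export file.
# The element and property names here are assumptions, not a documented schema.
SAMPLE = """<hibernate-generic>
  <object class="Page" package="com.atlassian.confluence.pages">
    <id name="id">101</id>
    <property name="title">Home</property>
    <property name="version">2</property>
  </object>
  <object class="Attachment" package="com.atlassian.confluence.pages">
    <id name="id">202</id>
    <property name="fileName">logo.png</property>
  </object>
</hibernate-generic>"""


def list_objects(xml_text, class_name):
    """Return one dict per <object> element whose class attribute matches."""
    root = ET.fromstring(xml_text)
    results = []
    for obj in root.iter("object"):
        if obj.get("class") != class_name:
            continue
        # Start with the identifier, then copy each simple property by name.
        record = {"id": obj.findtext("id")}
        for prop in obj.findall("property"):
            record[prop.get("name")] = (prop.text or "").strip()
        results.append(record)
    return results


pages = list_objects(SAMPLE, "Page")
attachments = list_objects(SAMPLE, "Attachment")
```

Grouping the resulting records by title and version would give the page history, and the attachment records could be linked to pages in the same way, assuming the export exposes those relationships as properties.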

== What Confluence markup is being used? ==

'''Who is working on this:''' PaulBoddie

To establish what work needs to be done, find out which subset of the Confluence markup is actually used in the Mailman wiki.

  The current strategy is to just target basic markup and to try and identify macros in use. With Confluence 3 markup, this involves searching for things resembling `{...}` - see the [[https://hg.boddie.org.uk/ConfluenceConverter/file/tip/tools/get_macros.py|get_macros.py]] file. With Confluence 4 XHTML markup, the exercise is simplified somewhat by looking for element usage. -- PaulBoddie <<DateTime(2012-12-21T00:47:54+0200)>>
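The search for things resembling `{...}` in Confluence 3 markup could be sketched as follows. This is not the code in `get_macros.py`; the pattern, the `count_macros` function and the sample text are assumptions made for illustration, and the pattern will also match braces that are not macros (escaped braces, for instance).

```python
import re
from collections import Counter

# Assumption: a macro looks like {name} or {name:parameters}, where the name
# starts with a letter. Escaped braces are not handled by this sketch.
MACRO_PATTERN = re.compile(r"\{([A-Za-z][\w-]*)(?::[^{}]*)?\}")


def count_macros(text):
    """Return a Counter mapping lower-cased macro names to occurrence counts."""
    return Counter(m.group(1).lower() for m in MACRO_PATTERN.finditer(text))


sample = (
    "{toc}\n"
    "{code:language=python}\n"
    "print('hi')\n"
    "{code}\n"
    "{note}Careful{note}\n"
)
counts = count_macros(sample)
```

Because opening and closing markers look the same in this markup, a macro with a body (such as `{code}` above) is counted twice, so the counts indicate which macros are in use rather than exactly how many times each is invoked.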

== Parse Confluence markup into DOM/AST structure ==

'''Who is working on this:''' PaulBoddie

'''NOTE: The DOM/AST structure will need to be agreed upon between this and the MoinMoin output step'''

Given raw Confluence markup (just the page content, extracted from the XML structure), parse the data and store it in some sort of DOM/AST-style form.

  Currently, this is just writing Moin markup out while traversing both Confluence 3 and 4 markup. There may well be opportunities to consolidate some of the output formatting, but a tree representation isn't yet in use. -- PaulBoddie <<DateTime(2012-12-21T00:47:54+0200)>>
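To illustrate what writing Moin markup directly during traversal, without an intermediate tree, can look like, here is a minimal sketch covering only Confluence 3 headings (`h1.` to `h6.`) and `*bold*` text. It is not taken from the converter; the function names and patterns are assumptions, and capping headings at five levels is an assumption about what the Moin target renders.

```python
import re

# Confluence 3 heading lines look like "h2. Title".
HEADING = re.compile(r"^h([1-6])\.\s+(.*)$")
# Bold text is wrapped in asterisks with no space just inside them,
# which keeps "* item" bullet markers from being treated as bold.
BOLD = re.compile(r"\*(\S(?:.*?\S)?)\*")


def convert_line(line):
    """Convert one line of Confluence 3 markup to Moin markup."""
    m = HEADING.match(line)
    if m:
        # Assumption: the Moin target renders at most five heading levels.
        level = min(int(m.group(1)), 5)
        marker = "=" * level
        return "%s %s %s" % (marker, m.group(2).strip(), marker)
    return BOLD.sub(r"'''\1'''", line)


def convert(text):
    """Convert a block of text line by line, emitting Moin markup directly."""
    return "\n".join(convert_line(line) for line in text.splitlines())


result = convert("h2. Setup\nThis is *important* text.\n* a list item")
```

Each construct is rewritten as soon as it is recognised, which is why no tree is needed; the trade-off is that nested or multi-line constructs (tables, macros with bodies) would need extra state that a tree representation would otherwise provide.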

== MoinMoin output from parsed data ==

'''Who is working on this:''' PaulBoddie

'''NOTE: The DOM/AST structure will need to be agreed upon between this and the parsing step'''

Given the parsed content, generate raw MoinMoin markup.

  See above. If we can get away with just invoking common formatting functions and not needing to generate a tree, we'll just stick with doing the former. -- PaulBoddie <<DateTime(2012-12-21T00:47:54+0200)>>

== Notes from MoinMoin devs ==
If you are going the DOM way (parsing stuff into a DOM tree, generating moin markup from that DOM tree), you should use the same DOM tree as moin2 does. There's a moinwiki_out converter already for that DOM tree, so you save half of the work.