A straight dump from Brad's email about this topic:

Depending on the complexity of the data we might want to consider some sort of intermediate structure, i.e.:

{{{#!sh
`Confluence DB/Export -> -> MoinMoin`
}}}

The advantage of such an approach is that it would be more easily testable and also easier to break up for concurrent development. The main disadvantage I can see with this idea is that it creates a layer of complexity that may not be worth the extra effort (not to mention the extra possibility of bugs).

== Confluence Dump XML ==

In a word: confusing. A good step towards understanding it is to look at the way internal ids and id-references are handled:

 * ids and id-references are '''not''' done using attributes, as might be expected
 * an element is assigned an id by giving it a child element like: {{{10}}}
 * a reference to an element with an id takes the form of an element which is a 'referencing-type'; for example, the following XML chunk is a reference: {{{3309603}}}
 * '''So:''' an id element that appears as a child is either an id assignment or an id-reference, depending on the context (i.e. the type of the parent element of the id element).

Looking at the elements in a Confluence export that are parents to an id element, the rule appears to be:

 * "property" and "element" elements are reference elements
 * "object" elements are the only ones with ids assigned

'''BUT''' they have another ''special'' quirk, which is that ids are not unique. They appear to be unique only within a class of object, so we have conflicts like:

{{{
104 ... 104 ...
}}}

The references to these pseudo-ids include enough information to derive the type:

{{{
104 104
}}}

In order to build an id-map, then:

 * Grab all "object" elements and find their "id" elements; map id values to object nodes based on the object class (i.e. the value of the class attribute on the object)
 * Grab all "property" and "element" elements with an "id" child element; these are the references, and the type of each reference is given by the class attribute of the parent element

== Content Migration Strategy ==

It might be useful to consider how the content should be migrated. For example:

 Should all the history be migrated or just a snapshot of the content?:: Doing the former ''might'' be possible with a page package, whereas the latter is definitely possible with a page package.

 The original request was for the migrated wiki to contain a full history (see the announcement links on ConfluenceConverter) -- BradleyDean

 . Right. It looks like it might be possible with page packages to write a [[HelpOnPackageInstaller|package installer]] script that replays at least the edits with their author details, though perhaps without the timestamp information. -- PaulBoddie
 . It should be possible to preserve the timestamp information given recent changes to Moin, but the package installer would need to be patched to take advantage of this. -- PaulBoddie

 Should the history preserve metadata such as the editor and the date and time?:: If so, the edit log would probably need to be manipulated to reflect the correct history.

 As above, ideally the metadata will be migrated where possible. -- BradleyDean

 . As noted above, if the package installer can do this without us having to manipulate the edit log directly, this might not be so difficult. If the timestamp is required, perhaps we could extend the installer to modify such details.
   -- PaulBoddie

 How should user profiles be migrated?:: Even if history metadata isn't important, it would probably be desirable to migrate profiles, even if it isn't possible to preserve things like passwords.

 How might comments and other non-page features be incorporated into the migrated wiki?:: Moin doesn't directly support various Confluence features, but things like comments could reside on subpages according to a convention.

 Should the content be audited and filtered during the process?:: If there are spam pages or spammer user profiles, we could filter them out, but this would probably occur between parsing the Confluence XML dump and importing into Moin.
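
Going back to the id-map procedure in the Confluence Dump XML section: a rough sketch of how it might look in Python, using the standard `xml.etree.ElementTree` module. The element names (`object`, `property`, `element`, `id`) and the `class` attribute are taken from the notes above. The `export` root element, the `name` attribute, the sample document and the function names are all made up for illustration, and a real export will carry far more structure than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. Only the object/property/id
# element names and the class attribute come from the notes above.
SAMPLE = """
<export>
  <object class="Page">
    <id>104</id>
    <property name="space" class="Space"><id>104</id></property>
  </object>
  <object class="Space">
    <id>104</id>
  </object>
</export>
"""


def build_id_map(root):
    """Map (class, id) pairs to the "object" elements that own them."""
    id_map = {}
    for obj in root.iter("object"):
        id_elem = obj.find("id")  # direct child only, not nested ids
        if id_elem is not None and id_elem.text:
            id_map[(obj.get("class"), id_elem.text.strip())] = obj
    return id_map


def find_references(root):
    """Collect (referencing element, (class, id)) for every reference."""
    refs = []
    for tag in ("property", "element"):
        for node in root.iter(tag):
            id_elem = node.find("id")
            if id_elem is not None and id_elem.text:
                refs.append((node, (node.get("class"), id_elem.text.strip())))
    return refs


def resolve(root):
    """Pair each reference with its target object (None if unresolved)."""
    id_map = build_id_map(root)
    return [(node, id_map.get(key)) for node, key in find_references(root)]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for node, target in resolve(root):
        target_class = target.get("class") if target is not None else None
        print(node.tag, node.get("class"), "->", target_class)
```

Keying the map on the (class, id) pair rather than on the id alone is what gets around the duplicate ids described above: both objects with id 104 can sit in the same map, and each reference picks the right one by the class attribute of its parent element.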