ZWiki to Confluence Wiki Converter Tool

This page describes how to use the ZWiki to Confluence Wiki Converter Tool built for Mifos. For the record, the initial notes were on Migrate Developer Wiki; this page is what they became.

MIFOSADMIN-54 issue & sub-tasks may be of interest.

Open / Next

  • How to handle Page ID vs. Page Name??
  • pagetree / pagetree:root=PageName|sort=natural|excerpt=true|reverse=false - why not working locally?!
  • Final
    • Actually export local Confluence and re-import to this real Confluence!
    • Which space to migrate into? Space KEY cannot be (easily) changed anymore... http://confluence.atlassian.com/display/DOC/Copy+Or+Rename+A+Space
    • Infra: Autom. HTTP redirect from old Wiki to new Wiki? Shutdown old Wiki?
    • Permissions on this Wiki.. for now anon read (e.g. of this page?!) is off, is this intentional?

Approach (Architecture)

The overall approach (not to say 'architecture') is to:

  1. get the old Wiki's content using a custom written Web crawler (spider), the ZWikiScraper
  2. turn the "restructured text"-like ZWiki markup into XML, using a custom written tool (ZWikiJRstConverter) based on JRst
  3. convert that XML into Confluence mark-up using a custom converter.zwiki-mifos.properties and custom HeaderParser subclasses in com.atlassian.uwc.converters.xml.jrst for the UWC XmlConverter
  4. upload those pages now in Confluence mark-up into a local Confluence Wiki (trial license), using Atlassian UWC
  5. import a dump of the local wiki into this Wiki (http://mifosforge.jira.com)

How To

Local Confluence

According to http://confluence.atlassian.com/display/JIRASTUDIO/2010/06/28/JIRA+Studio+2.2+Released this JIRA Studio has been upgraded and now runs Confluence 3.1.2. That is not the latest Confluence version, so get it from the archive at http://www.atlassian.com/software/confluence/ArchiveConfluenceDownloads.jspa: the "3.1.2 - Standalone for Production Usage (TAR.GZ Archive)".

http://confluence.atlassian.com/display/DOC/Installing+Confluence+Standalone+on+UNIX+or+Linux explains set-up. It's reasonably straightforward, mostly just edit the confluence/WEB-INF/classes/confluence-init.properties to point to a "data directory".
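As a minimal sketch of that edit: the "data directory" is configured through the confluence.home key in confluence-init.properties. The helper name set_confluence_home and the example paths are made up for illustration; adapt them to your install.

```shell
# Hypothetical helper: point confluence-init.properties at a data directory.
# $1 = path to confluence-init.properties, $2 = data directory to use.
set_confluence_home() {
  local props="$1" new_home="$2" tmp
  tmp="$(mktemp)"
  # Keep every line except an active (uncommented) confluence.home entry.
  grep -v '^[[:space:]]*confluence\.home[[:space:]]*=' "$props" > "$tmp" || true
  # Append the new setting.
  printf 'confluence.home=%s\n' "$new_home" >> "$tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$props"
  rm -f "$tmp"
}
```

Usage would be something like: set_confluence_home confluence/WEB-INF/classes/confluence-init.properties "$HOME/confluence-data"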

Using built-in hsqldb for now; update if any serious issues with that.

You'll then need a License Key... easy to get via my.Atlassian.com for 30 days (or maybe also via a support ticket for this hosted Studio).

As explained on https://studio.plugins.atlassian.com/wiki/display/UWC/UWC+Quick+Start, you must have "a Confluence user/login id and password", and "The Remote API must be turned on. See Administration -> General Configuration -> Remote API (XML-RPC & SOAP)", via http://localhost:8080/admin/editgeneralconfig.action#features.
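A quick way to check both prerequisites before starting UWC might be to call the XML-RPC login method by hand. This is only a sketch: the function names are made up, and it assumes the local Confluence exposes its XML-RPC endpoint under /rpc/xmlrpc with the confluence1.login method, and that curl is installed.

```shell
# Build the XML-RPC request body for confluence1.login.
# Note: user and password are inserted verbatim; values containing
# &, < or > would need XML escaping first.
xmlrpc_login_body() {
  printf '<?xml version="1.0"?><methodCall><methodName>confluence1.login</methodName><params><param><value><string>%s</string></value></param><param><value><string>%s</string></value></param></params></methodCall>' "$1" "$2"
}

# POST the login call; $1 = user, $2 = password, $3 = base URL (optional).
check_remote_api() {
  local user="$1" pass="$2" base="${3:-http://localhost:8080}"
  curl -sS -H 'Content-Type: text/xml' \
    --data "$(xmlrpc_login_body "$user" "$pass")" \
    "$base/rpc/xmlrpc"
}
```

A successful call should print an XML-RPC response containing a session token string; an HTTP error or an XML-RPC fault suggests that the Remote API is still off or the credentials are wrong.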

Developer Workspace

Here is how to get a developer workspace with all the sources:

~$ mkdir WikiStuff; cd WikiStuff
~/WikiStuff$ git clone git://github.com/vorburger/JRst.git
~/WikiStuff$ cd JRst
~/WikiStuff/JRst$ mvn install
~/WikiStuff/JRst$ cd ..
~/WikiStuff$ git clone git://github.com/vorburger/ZWiki-to-Confluence-Wiki-Converter-Tool.git
~/WikiStuff$ cd ZWiki-to-Confluence-Wiki-Converter-Tool/zwikiscraper
~/WikiStuff/ZWiki-to-Confluence-Wiki-Converter-Tool/zwikiscraper$ mvn package

We cannot use the binary distribution of Atlassian's UWC as some new classes had to be added to it. Only those new classes are in Git, in ZWiki-to-Confluence-Wiki-Converter-Tool/uwc (there are appropriate .gitignore entries for the rest). The idea is to take the UWC src from Atlassian's SVN (I used Revision: 39297) and 'overlay' it onto the few files that are in git, like so:

~/WikiStuff$ cd ZWiki-to-Confluence-Wiki-Converter-Tool/uwc
~/WikiStuff/ZWiki-to-Confluence-Wiki-Converter-Tool/uwc$ svn co https://svn.atlassian.com/svn/public/contrib/confluence/universal-wiki-converter/devel/

You can now open the three Eclipse projects, WikiStuff/ZWiki-to-Confluence-Wiki-Converter-Tool/zwikiscraper, WikiStuff/ZWiki-to-Confluence-Wiki-Converter-Tool/uwc and WikiStuff/jrst/jrst (two times jrst, yes!), in Eclipse, ideally with m2eclipse installed.

Step-by-Step instructions for one-time Wiki migration

  1. set-up a working (nothing in red) dev workspace as outlined above
  2. run (from within Eclipse) the main() in ch.vorburger.mifos.wiki.ZWikiScraper, passing a valid Mifos Plone ZWiki uid/pwd as parameters -- this will spider/download a Wiki "dump" into target/wikiContent
  3. run (from within Eclipse) the main() in ch.vorburger.mifos.wiki.ZWikiJRstConverter -- this will convert the local RST Wiki dump into XML using JRst (the *.xml files will be next to the original RST TXT format, under zwikiscraper/target/wikiContent)
  4. use the following UNIX CLI commands to get a copy of the dump containing only the XML, in target/wikiContent.xml/:
    1. cd zwikiscraper/target
    2. mkdir wikiContent.xml
    3. cp -R wikiContent/ wikiContent.xml/
    4. cd wikiContent.xml/
    5. find . -type f ! -name "*.xml" -exec rm '{}' \;
  5. set the correct path in ZWikiMifos.0003.filepath-hierarchy-ignorable-ancestors.property in converter.zwiki-mifos.properties in WikiStuff/ZWiki-to-Confluence-Wiki-Converter-Tool/uwc/conf (which you got from git, see above)
  6. start UWC from within Eclipse via the UWC.launcher included in git, point it to localhost, choose the 'zwiki-mifos' converter, and select the (entire) wikiContent.xml/ folder
  7. Clicking 'Convert' in the UWC UI will now appropriately convert all *.xml files to Confluence mark-up (they will be in uwc/output directory) and directly upload them into your (local) Confluence
  8. Export Space, via Upper Left Corner: Browse/Space Admin, Tab: Advanced, LHS: XML Export (Space-based Export only is probably better than the full system backup stuff)
  9. Ask Atlassian hosted Studio Support to import the exported ZIP into this http://mifosforge.jira.com !
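Step 4 above (copy everything, then delete what is not XML) can also be sketched as a single function that recreates the directory hierarchy and copies only the *.xml files. The function name copy_xml_only is made up here; the paths are the ones used in the steps above.

```shell
# Hypothetical helper: mirror the directory tree of $1 into $2,
# copying only the *.xml files.
copy_xml_only() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  # Recreate every directory first, since the hierarchy matters for UWC.
  (cd "$src" && find . -type d -print0) | (cd "$dst" && xargs -0 mkdir -p)
  # Then copy the XML files into the matching directories.
  (cd "$src" && find . -type f -name '*.xml' -print0) |
    while IFS= read -r -d '' f; do
      cp "$src/$f" "$dst/$f"
    done
}
```

Run from zwikiscraper/target, this would be: copy_xml_only wikiContent wikiContent.xml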

Note that there is a slight delay from when UWC reports "Uploading Pages to Confluence... Uploaded X out of Y pages." until you can actually see the page in the local Confluence.

Uploading Pages to Confluence...
2010-07-02 00:16:02,952 INFO  [MIFOSADMIN:Thread-4] - Uploaded 3 out of 3 pages.
...dev workspace... local confluence ... dump ... import...

All imported & migrated pages will be in one Confluence "space". It is very easy to move pages around the hierarchy within this space, or to move pages between spaces, see e.g. http://www.screencast.com/users/BrendanPatterson/folders/Jing/media/f6f47332-ee0b-4876-aded-a47433581896.

Manually adding a 'pagetree:sort=natural|excerpt=true|reverse=false|startDepth=10' Confluence macro on the home page is recommended to make it look like the old Wiki's index.

While we are on Confluence Spaces: From experience at my day job, I would recommend that we go easy on the number of different spaces... unless there are e.g. different permission schemes, there isn't really a good reason IMHO, and so I would advocate "one or a very few large spaces" (e.g. DEV-OSS vs. ENDUSER-DOC vs. GRAMEEN for Deployments etc.) over "lots of fine-grained little spaces" (e.g. DEV-THIS and DEV-THAT and Infra and Test and Deployment and what else). Just a thought.

Open Problems

Wiki Markup Syntax Conversion (Minor)

Now tracked via http://mifosforge.jira.com/browse/MIFOSADMIN-54...

  • Tables - not done but may not be that hard - how important?
  • Attachments are not migrated. It would be.. "imaginable" (with the UWC), but some work. Not critical (manual) ?
  • Blockquotes work.. for one paragraph, several paragraphs would need a better "flattening"
  • Relative links to Plone CMS pages outside the Wiki (obviously?) don't work (e.g. Read the `Developer Kick Start page <../>`_ for...)

Hanging (JRst)

One page (Admin Online Translation HOWTO) causes JRst to hang in an infinite loop. This happens for every page whose *.rst file does not end with 1-2 CRs (newlines). The simple work-around is to just append 2x CR to the file. This even works while the ZWikiJRstConverter is stuck (the endless loop keeps reading the file)! (Before this was clear, the work-around was to manually rename the respective $content file by prefixing it with SKIP--*, so that the ZWikiJRstConverter would skip it.)
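The newline work-around can be applied to the whole dump before running the converter. The following is a rough sketch only: the function name is made up, and it assumes the scraped page sources are the files named content$... as in the paths listed on this page.

```shell
# Hypothetical helper: append two newlines to every scraped content file
# under $1 that does not already end with a newline.
pad_trailing_newlines() {
  local dir="$1"
  find "$dir" -type f -name 'content$*' -print0 |
    while IFS= read -r -d '' f; do
      # tail -c1 yields the last byte; command substitution strips a
      # trailing newline, so a non-empty result means the file does not
      # end with one. Empty files are left untouched.
      if [ -n "$(tail -c1 "$f")" ]; then
        printf '\n\n' >> "$f"
      fi
    done
}
```

For example: pad_trailing_newlines zwikiscraper/target/wikiContent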

A few pages (about 12 out of 457) cause internal exceptions in JRst, and thus no XML is available for them to convert. Following http://mifosforge.jira.com/browse/MIFOSADMIN-70, those pages are imported as empty pages with the note "The Wiki conversion tool was unfortunately not able to migrate this page from the old Plone zWiki into Confluence. Please do it manually... sorry.", and tagged with 'tbd_mig_manual'.

Conversion failed for the following input pages (see above for detailed exceptions) :
target/wikiContent/KaysToUpdateFS/GLIMVersion2/content$GLIM - Version 2
target/wikiContent/KaysToUpdateFS/ReportsImprovements/ReportsFor11/content$Reports for 1.1
target/wikiContent/FrontPage/DeveloperDocumentation/DeveloperHowTos/HowToDevelopOnMifos/DevelopmentOverview/WritingAcceptanceTests/TestDataSet/content$Test Data Set
target/wikiContent/FrontPage/Archive/DecliningBalanceInterestCalculationWithEqualPrincipalInstallment/content$Declining Balance Interest Calculation with Equal Principal Installment
target/wikiContent/FrontPage/FutureReleases/ArchitecturePlans/SpringSpike/DeclarativeTransactionManagement/content$Declarative Transaction Management
target/wikiContent/MifosTroubleshootingFAQ/content$Mifos Troubleshooting & FAQ

Duplicate Pages

The http://www.mifos.org/developers/wiki/FrontPage/contents "index" page on the old Wiki, which is the entry point for the spider and subsequent migration, has a few strange duplicates. For example:

TBD how the migration handles this; it may 1. create both pages (duplicating the content) and require a manual clean-up afterwards, or 2. take only one of the pages, requiring a manual move (easy), or 3. migrate none of them.
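To get a list of candidates before migrating, one could look for page names that occur more than once in the scraped dump. This sketch assumes (it has not been verified against the scraper) that a duplicate shows up as the same page directory name under different parents, each holding a content$... file; the function name is made up.

```shell
# Hypothetical helper: print page directory names that appear more than
# once anywhere below $1.
list_duplicate_pages() {
  find "$1" -type f -name 'content$*' -print0 |
    while IFS= read -r -d '' f; do
      # The page name is taken from the directory holding the content file.
      basename "$(dirname "$f")"
    done |
    sort | uniq -d
}
```

For example: list_duplicate_pages zwikiscraper/target/wikiContent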

Stats

ZWikiScraper runs for about 20 minutes.

ZWikiJRstConverter says something like this at the end:

Conversion from reStructuredText to XML took 308s
Converted 445 sucessfully, but skipped 0 and failed 12 (total 457)

References