Data Migration Schema And Tool Choices

XML Schema Language and Tool Choices

There are three main choices for defining the structure of XML documents: DTDs, W3C XML Schema (WXS), and RELAX NG (RNG). WXS and RNG are both preferable to DTDs because of their greater expressive power. WXS has become the de facto standard for expressing XML schemas, even though RNG is an approved ISO standard that is arguably superior. RNG has greater expressive power than WXS, is simpler to learn and use, and is easier to read. Because WXS is far more widely used than RNG, many more tools support it.

Here are some factors to consider when comparing WXS and RNG for schemas:

  • ability to express data for data migration
  • ease of use/understanding for Mifos development/maintenance
  • ease of use/understanding for Mifos specialists doing data migration
  • tool support for validation and code generation

Pros and Cons

RELAX NG

Pros

  • able to express the data needed for data migration
  • easier to use/understand for Mifos development/maintenance than WXS
  • easier to use/understand for Mifos specialists who are not familiar with WXS

Cons

  • lack of good tool support
  • if Mifos specialists are familiar with XML they are more likely to be familiar with WXS

W3C XML Schema

Pros

  • able to express the data needed for data migration
  • widespread tool support and industry adoption
  • if Mifos specialists are familiar with XML they are more likely to be familiar with WXS

Cons

  • harder to use/understand for Mifos development/maintenance than RNG
  • harder to use/understand for Mifos specialists who are not familiar with WXS

Discussion

The effort required to implement data import/export for Mifos could be reduced if a schema-to-Java code generator eliminates the need to write and maintain Java code for reading and writing Mifos import/export XML files.

RELAX NG code generator candidates:

  • Relaxer (last release in 2003)
  • RelaxNGCC (last release in 2002)
  • JAXB (under current development, but its "experimental RELAX NG support" appears not to be working as of Feb. 2007)

W3C XML Schema code generator candidates:

  • JAXB (current development)
  • Castor (current development)
  • Zeus (last release 2003)
  • JBind (last release 2004)
  • Quick (last release 2001)

Depending on a code generator that is not being actively developed is risky: the tool will not track changes in Java releases, and bugs will not be fixed unless we fix them ourselves. This constraint narrows the candidates to JAXB and Castor. Since JAXB does not currently have working RNG support, code generation cannot work directly off an RNG schema.

Recommendations

RNG would be the preferable language for defining the Mifos import/export schema because of its ease of use. As noted, however, tool support is an issue, in particular the lack of an actively developed code generator. RNG schemas can nevertheless be converted to WXS schemas with an automated tool (Trang). This opens the possibility of maintaining the Mifos import/export schema in RNG and generating a WXS schema from it. We could then present both RNG and WXS schemas to Mifos specialists and use the WXS schema as the basis for code generation.
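One practical benefit of having a WXS schema available is that the standard Java validation API (javax.xml.validation) can check import files against it without extra libraries. The following is a minimal sketch under stated assumptions: the class name, the inline schema, and the client/name/joined elements are all hypothetical and stand in for whatever a real Mifos schema (hand-written or generated from RNG) would contain.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class WxsValidationSketch {

    // Hypothetical WXS schema; a real one would be loaded from a file,
    // for example one generated from the RNG source.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='client'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='joined' type='xs:date'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the XML text is well-formed and valid against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A document that matches the hypothetical schema.
        System.out.println(isValid(
            "<client><name>A</name><joined>2007-02-01</joined></client>"));
        // The same document with a value that is not an xs:date.
        System.out.println(isValid(
            "<client><name>A</name><joined>not-a-date</joined></client>"));
    }
}
```

Validating against RNG from Java would instead need a third-party validator, which is part of the tool-support gap discussed above.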

Sun's current reference implementation of JAXB appears to have a good feature set and a potential path to direct RNG support in the future, so it would be a reasonable choice over Castor. If full RNG support were added to JAXB, code generation could be done directly from the Mifos RNG schema (though the WXS schema could still be generated for use with other tools or for developers more familiar with WXS).

Plan B

Having laid out the primary recommendation, it is worth considering some possible fallbacks.

If it turns out that using RNG as the primary Mifos import/export schema language does not offer enough benefit over using WXS directly, then RNG can be abandoned and the schema maintained directly in WXS. This move would not affect the choice of JAXB for code generation, since JAXB would already be working from the WXS schema. There are also tools for generating RNG schemas from WXS schemas, so we could even switch to WXS as the primary schema and still offer a generated RNG schema if it had some utility.

Code generators can be a mixed blessing, so if JAXB turns out to be problematic for some reason, we could always fall back on tools like JDOM or dom4j. Although my experience has been with JDOM, dom4j might be the better choice for this project, since it supports XML Schema datatypes and works with the MSV (Multi-Schema Validator) library, which can validate against both RNG and WXS.
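To give a feel for what the hand-written fallback involves, here is a minimal sketch of reading an import file with a tree-based XML API. It deliberately uses the DOM parser bundled with the JDK as a stand-in, not JDOM or dom4j, so that it runs without extra libraries; code written against JDOM or dom4j would have the same overall shape with different API calls. The class name and the clients/client/name elements are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HandWrittenImportSketch {

    // Collects the text of the first <name> child of every <client> element.
    // Element names are illustrative only, not taken from a real Mifos schema.
    static List<String> readClientNames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList clients = doc.getElementsByTagName("client");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < clients.getLength(); i++) {
                Element client = (Element) clients.item(i);
                names.add(client.getElementsByTagName("name").item(0).getTextContent());
            }
            return names;
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors all end up here.
            throw new IllegalStateException("could not read import file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<clients>"
                   + "<client><name>Asha</name></client>"
                   + "<client><name>Ben</name></client>"
                   + "</clients>";
        System.out.println(readClientNames(xml));
    }
}
```

Every element the import format defines needs navigation code like this, written and maintained by hand, which is the cost a code generator is meant to remove.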