Data Migration Tips

The following consists of firsthand tips and experiences on the data migration process from our program manager to help evaluate if it's worth the effort.

You’ve evaluated Mifos, checked out all its functionality, like what you see and are now preparing to roll it out. Only one problem, how to get data from your old system into Mifos.  Seems simple enough, loans are loans, clients are clients, etc. Just move it over right?   

If only it were that simple.

Imagine if you were trying to take an engine out of a car and stick it in a pickup truck. Or even worse, imagine it was a lawnmower engine you were trying to move!  What may seem like a simple task, even with a simple source database, can quickly turn into quite an ordeal.  You may have to deal with things like:

  • Bad design, both functionaly and data modeling wise
  • Bugs in the original system leading to bad data
  • Missing data requirements
  • Bad or dirty data
  • Calculation differences
  • Differing degrees in accuracy
  • Differing states for loans, clients, transactions, etc

Essentially you are taking one design and trying to move it to another, and without some standard set (Which I don’t believe there is any at the moment), you are guaranteed to have issues. 

This is not meant to scare you off, but rather provide you with a reality check.  Take for instance a data migration we are currently working on.  We have three full-time people working on it, two developers and an Analyst.  In addition to that, there is also the client and I and we're each spending quite a few hours each week attempting to resolve the issues that come up.  Development time will be somewhere between two to two and a half months, but beforehand, both the development team and I spend quite a few weeks doing an initial analysis and the analysis is still going on.  The Analyst has been working full-time since researching every possible issue and figuring out the best solution, trying to stay ahead of the developers.  Added all together, there is probably somewhere between three to four months of work, spread across five people. 

And the source database was simple, out of some 60+ tables we’re only working with about 25 actual tables.  Out of those, only six really hold relevant data, most every other table is a reference table or holds only a few pieces of information.  What all are we migrating?  Just Clients, Groups, Loans, and Transactions.  Everything else is set up manually (Office Hierarchy, all users (400+), config, etc). 

It gets worse.  We’re also working with a moving target.  We have to constantly update the mapping files (between the source and Mifos system) for users and offices everytime we make changes.   We are creating a set of queries to run before doing migrations to try and suss out every possible type of dirty data that may have appeared between development and the migration.   It can't really helped, the MFI has to make forward progress even as we put these scripts together.

In just this last week, much like all other weeks, there were 17 issues we had to resolve.  When we’re lucky, it may take five or ten minutes to determine the right strategy, or maybe 20 minutes to resolve some bad data in the source database (and even then, if its a data issue, we often have to update our starting Mifos database and the source db, and send off updates to the developers!).  When we’re not, it can take a lot longer.  One issue this week required a 45 minute call, on top of quite a few emails exchanged and hours upon hours of research.

Here are some examples of some of the issues we come across:

  • The source database does not match transactions with installments in the payment schedule.  This means we’re having to develop a complex set of logical tests to determine which transactions go with what payments, especially since the dates aren’t reliable (in this instance, lots of clients make their payments before the due date).  We have to deal with partial payments, over payments, early repayments of loans, written off loans (calculating back what payments were written off), etc.
  • When repaying loans, Mifos and the source calculated it differently.  The source would charge the interest still left, Mifos wouldn’t (as the loan is paying off early).  So we had to figure out how to get around this in a way and still model it so they could pull meaningful reports off it
  • Missing client data in the source database:  things like spouse information, poverty status, and meeting schedule dates
  • In Mifos, a client is linked directly to a loan officer however, this is not so in the source.  In the source the client is linked to a loan and loans are linked to a loan officer, so we had to backtrack to figure this out.  Seems simple correct?  But what if the client does not have a loan?  Who do they get assigned to?  What if the client has loans under different loan officers?  How do we determine which loan to use for determining who is the appropriate loan officer?
  • Groups in the source database tend to be transitory, may be part of a group for one loan and not after.  In Mifos, this is not the case, most groups are set to be groups for awhile.  So we’ve had to determine the proper strategies for how we track customer movement between groups, opening and closing of groups, etc.  Additionally, groups are tracked through loans, so we had to go through them again.  Normally you would think it would make sense to move clients and groups together at the same time, not so in this case.  Made more sense to do clients first and then move groups with loans.
  • My favorite.  Source system has three different interests that can be added to a loan, while Mifos allows for one.  Through some testing and validation we determined that when all three of the interests were calculated the same way as Mifos (off a declining balance), we could easily add them together to determine the rate.  The problem is at one time one of the interests was calculated differently, meaning we couldn’t do that.  The solution?  Copy over the transactions, add up all the interest collected and determine the exact interest rate that way.

Those are just six examples, and remember, on an average week we had 17 issues to resolve!  And timing can be critical because sometimes the developers won’t find an issue until they run right into it and may be blocked until you can provide a solution. 

The thing with data migrations is you simply cannot figure everything out beforehand, or rather you’d probably spend twice as long as if you were to just do it.  It’s an iterative process, its extremely explorative and half the time you won’t find something until you start working with it directly.  That, unfortunately, makes it hard to plan for, but there is not much that can be done.  Fortunately, we’re collecting more and more information about moving data into Mifos, mapping what is required for creating clients, groups, loans, etc.  That should help reduce some of the time required.

In the end, data migrations are completely doable and there is nothing to be scared of, just make sure you plan very, very conservatively.

Ryan, Mifos Program Manager