Introduction and Scope

Data Warehouse 1.0 is Stage One

The soon-to-be-released Mifos Technology Plan will probably have as one of its main goals "Transform Mifos into a best-of-breed business intelligence system for microfinance".  Part of this vision involves the development of a data warehouse that contains MFI data from the Mifos application as well as other sources.  The scope of this 'stage one' data warehouse initiative (DW 1.0) is to provide the ability to tackle the traditionally 'gnarly' mifos reports.

Focus on High Cost Gnarly Reports

Gnarly reports aren't the  'show me a list of clients and their addresses' type of report.  They are the operational and finance reports that show a raft of metrics, transaction totals and arrears aging information at MFI/Branch/Loan Officer/Center levels on current and historical date ranges.

Prior Mifos releases shipped with just a few reports... the Branch Progress report being typical of the gnarly type.  As a result, a number of clients have developed their own gnarly reports. (Ryan Whitney, Sam Birney, Sungard are some of the people who have been at the coalface).

Requirements

The purpose of DW 1.0 is to support a number of Operations and Finance 'standard reports' - many very gnarly. Currently, there are 16 reports which are split as follows:

High Priority - 7
Medium Priority - 5
Low Priority -4

While a goal for DW 1.0 is to be able to provide a more generic reporting capability than these specific 16 reports (and allow whatever slicing and dicing can be imagined), the 16 contain typical mifos reporting patterns and concepts and are pretty much the 'business requirements' that have been used for this design and implementation.

The list and mock-ups of the 16 are in a spreadsheet.  Rather than link to this spreadsheet (it does change), here's a link to the Loan Officer Detailed Report - Content   (No. 1 priority).  This gives a good idea of the types of complex reports that mifos and its clients have attempted to build from the mifos production schema over the years:

Reformat

DW 1.0 converts the 200+ mifos database into about 20 tables designed for easy reporting .  Some of the reports mentioned above are not supported (partially or fully) by DW 1.0.  They are reports that source data external to mifos or the 'show me a list of clients and their addresses' type reports.  To get DW 1.0 to support each of these reports can be decided on a case-by-case basis at they are being developed.

Design Stuff

Design in a Nutshell

The main design is just in following the standard dimension modelling (star schema) approach.  There are lots of onlines sources about dimensional modelling and star schemas.  http://www.rkimball.com/ is a good one because Ralph Kimball is pretty much the industry guru in the area for many years.

Relatively Easy and Consistent Querying

It's relatively easy to write the query for 'What clients were active at a certain period for a branch' e.g. something like:

select count(distinct(c.customer_id)) as number_of_active_clients
from dim_office o
join dim_customer c on c.branch_key = o.office_key
where o.display_name = 'my branch'
and c.customer_level_id = 1
and c.customer_status = 'ClientStatus-Active'
and c.valid_from <= '2010-02-01'
and c.valid_to > '2010-02-01'

This would be quite difficult on the production mifos schema as you'd have to consider the change_log table for customer status changes and the customer_hierarchy table for hierarchy changes (and then you'd worry about clients not in a hierarchy). Also the DW queries tend to have this type of pattern so writing them should result in less mistakes (also can use metadata to simplify).

Data Warehouse Database Design (Link)

Identifying Changed Data in Mifos

Mifos Audit Usage

The key 'changes' that are important for Data Warehouse 1.0 are related to offices, personnel (loan officers), customers (centers, groups and clients) and accounts (loan and savings).  Specifically, it's important to pick up hierarchy and status changes.  Customer hierarchy changes are the most varied and complex.

A revision of how auditing is done in mifos is on the technology plan.  This would impact the data warehouse ETL but would hopefully be more accurate and easy to work with than the current implementation (some notes below).
The specific logic for identifying changes is in the individual ETL jobs in the business intelligence git repository (git://mifos.git.sourceforge.net / mifos/bi) but the key aspects are:-

There are a couple of small bugs associated with the mifos audit list under Issues Outstanding in Data Warehouse 1.0 Non-automated ETL Testing Summary but the main points are the next ones.

Mifos Audits - Possible Improvement Areas

Data Quality Related

Its possible that there are some data quality issues with MFI's that have been using mifos over a number of releases due to such things as bugs and migration decisions.  Because of this... it's probable that the first time the Data Warehouse ETL is run there will be some data qualities issues to deal with... so allow some time for this.

In one MFI test done, a small % of transactions (savings, repayments, disbursals) occured before the mifos client creation date.  Some of these may be valid (e.g. redo loan transactions).

ETL

Top Level Pentaho Data Integration Job to Create Initial Data Warehouse

staging

Data Warehouse 1.0 Non-automated ETL Testing Summary (Link)

Data Warehouse 1.0 - Common Query Examples From Standard Reports (Link)