The Mifos data warehouse is populated by an "ETL" (extract, transform, load) program. Although it can be run manually, it is usually scheduled to run each day just after midnight. This schedule isn't mandatory: you could skip weekends and holidays, or run only once a week, depending on how up to date you need the data warehouse to be.
On Linux, the ETL would be scheduled via cron; on Windows, via the Task Scheduler.
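As an illustration, a crontab entry along the following lines would run the incremental ETL at five past midnight every day. The install path, PDI location, and log file below are placeholders, not required values; substitute your own locations:

```
# Hypothetical crontab entry: run the incremental ETL daily at 00:05.
# /opt/mifos/bi and the log path are example locations only.
5 0 * * * bash /opt/mifos/bi/ETL/MifosDataWarehouseETL/incremental.sh /opt/pentaho/data-integration/ >> /var/log/mifos_etl.log 2>&1
```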
There are about twelve standard Mifos reports that rely on the Mifos data warehouse, e.g. MFI Progress, Loan Officer Detailed, Loan Officer Summary, and Due vs Collected. When entering parameters to run these reports you will see a note saying 'Data for this report last updated on <date>'.
There are two ETL options available: the Full ETL and the Incremental ETL.
The Full ETL rebuilds the Mifos data warehouse from scratch on each run. It is fine for test situations and for small MFIs. As a rule of thumb, if an MFI has about 10,000 clients and has been running on Mifos for two years, the Full ETL should take between two and three hours. The ETL should run on the same machine as the MySQL database; if it doesn't, the additional network traffic will slow down the process dramatically, by up to four times in the case of using Amazon RDS.
It moves all the data, including today's, into the data warehouse.
Run it on Linux using:
bash "full path down to"/ETL/MifosDataWarehouseETL/full_reload.sh DW_DB_Name PDI_HOME MYSQL_USER_DETAILS
Where
DW_DB_Name is the name of your Mifos data warehouse database
PDI_HOME is the path of your PDI installation, e.g. /opt/pentaho/data-integration/
MYSQL_USER_DETAILS is the MySQL username and (if required) password, e.g. -uroot -prootpwd OR -uroot
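Putting the parameters above together, a complete invocation might look like the following. The repository path, database name, and credentials are hypothetical examples; substitute your own:

```
bash /home/mifos/bi/ETL/MifosDataWarehouseETL/full_reload.sh mifos_dw /opt/pentaho/data-integration/ -uroot -prootpwd
```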
Run it on Windows using:
full_load.bat DW_DB_NAME
Where
DW_DB_Name is the name of your Mifos data warehouse database
You will need to create the .bat file yourself. Here is an example of its contents:
SET PDIPATH=C:\Users\John\Desktop\PentahoPlatform\pdi-ce-4.0.0-stable\data-integration\
SET DWPATH=C:\dev\businessIntelligenceRepo\bi\ETL\MifosDataWarehouseETL\
echo DROP DATABASE IF EXISTS %1; > dropdb.sql
echo CREATE DATABASE %1; > createdb.sql
mysql -uroot -pmysql < dropdb.sql
mysql -uroot -pmysql < createdb.sql
mysql -uroot -pmysql %1 < %DWPATH%load_mifos_datawarehouse.sql
mysql -uroot -pmysql %1 < %DWPATH%load_mifos_datawarehouse_stored_procedures.sql
mysql -uroot -pmysql %1 < %DWPATH%load_ppi_poverty_lines.sql
mysql -uroot -pmysql %1 --default_character_set utf8 < %DWPATH%load_dw_ppi_survey.sql
%PDIPATH%kitchen.bat /file:%DWPATH%DataWarehouseInitialLoad.kjb /level:Basic > transformationLog_%1.log
EXIT
The Incremental ETL populates the data warehouse with all data up to the end of yesterday. On each run it only adds data that has been changed or added since the last run.
The Incremental ETL is limited to data 'up to the end of yesterday' because Mifos auditing is generally at the day level. However, the reports based on the data warehouse shouldn't require 'right now' data. (Note: the default date for date parameters may be later than the "last updated on" date shown on the parameter page; in this case, pick dates no later than the "last updated on" date to ensure correct loan and savings balance data.)
Run it on Linux using:
bash "full path down to"/ETL/MifosDataWarehouseETL/incremental.sh PDI_HOME
where PDI_HOME is the path of your PDI installation, e.g. /opt/pentaho/data-integration/.
Note that no database name or user details are required; these are picked up from the definition of DestinationDB in data-integration/simple-jndi/jdbc.properties.
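For reference, a DestinationDB definition in data-integration/simple-jndi/jdbc.properties typically looks something like the sketch below. The database name, host, and credentials shown are examples only, not required values:

```
DestinationDB/type=javax.sql.DataSource
DestinationDB/driver=com.mysql.jdbc.Driver
DestinationDB/url=jdbc:mysql://localhost:3306/mifos_dw
DestinationDB/user=root
DestinationDB/password=rootpwd
```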
Run it on Windows using:
incremental_load.bat
You will need to create the .bat file yourself. Here is an example of its contents:
SET PDIPATH=C:\Users\John\Desktop\PentahoPlatform\pdi-ce-4.0.0-stable\data-integration\
SET DWPATH=C:\dev\businessIntelligenceRepo\bi\ETL\MifosDataWarehouseETL\
%PDIPATH%kitchen.bat /file:%DWPATH%DataWarehouseIncrementalLoad.kjb %1 /level:Basic > transformationLog_incremental.log
EXIT
Yes, it is possible to pass a date parameter (yyyy-mm-dd) to the incremental ETL. This is an "up to" date. However, this shouldn't be done in normal usage; the parameter is intended for testing and, potentially, for building up the data warehouse for a large MFI (i.e. when tweaking and hand-holding the ETL).
The Incremental ETL doesn't deliver its full benefit until the database changes slated for Mifos 2.2 are implemented. However, these changes can be applied on top of Mifos 2.1; see Mifos 2.2 Potentially Large DB Changes and PreUpgrade Option.
If the incremental ETL fails, you need to get back to a clean starting point before running it again. Currently, there is no out-of-the-box way to do this. However, there are a number of ways to approach it depending on the size of your data. One approach is to delete the rows added after a known-good date, for example:
/* note that as_of_date is in yyyy-mm-dd format whereas 'key' dates are in yyyymmdd format */
delete from hist_savings_balances where as_of_date > '2011-08-01';
delete from fact_loan_disbursals where created_date_key > '20110801';
delete from fact_client_attendances where meeting_date_key > '20110801';
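As the comment in the SQL above notes, the two date representations differ only in punctuation: stripping the hyphens from an as_of_date gives the corresponding key value. A quick shell sketch of the conversion (the variable names are illustrative only):

```shell
# Convert an as_of_date (yyyy-mm-dd) into the yyyymmdd form
# used by the *_date_key columns.
as_of_date='2011-08-01'
date_key=$(printf '%s' "$as_of_date" | tr -d '-')   # drop the hyphens
echo "$date_key"   # 20110801
```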
The "Full ETL" option always wipes out current information, so nothing needs to be done to switch from Incremental to Full.
To switch from Full to Incremental, first drop the Mifos data warehouse database (usually called mifos_dw):
drop database mifos_dw;
The first run of the Incremental ETL will then build all the data up to the end of yesterday.
There are tweaks and trade-offs that can be made when running the ETL that may be relevant for larger MFIs (50K+ clients). These are best discussed on a case-by-case basis via the mifos-users list.