Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

Table of Contents

Introduction

...

Mifos

...

Data

...

Warehouse

...

1.0

...

is

...

a

...

star

...

schema

...

data

...

warehouse.

...

 Its purpose is to make reporting on data as easy and as available as possible.  This can mean a number of things like:

  • maximise the ease of writing SQL queries or using query products on top of the data structure
  • an increased confidence in data quality resulting from the data analysis that's part of moving from the large mifos data structure to the small star schema data structure
  • reports running faster 

The idea is to increase the access and availabity of data over time.  One part of this is to continue to simplify the data structure over time by  doing extra work upfront during data loading and by the use of meta data to hide difficulties.
This document isn't a tutorial on how to write sql or exactly what's in the data warehouse but it should give a flavour for writing queries against the data warehouse and the best place to get examples (the standard reports shipped with mifos).  Contains:

  1. Examples of vanilla but important reporting uses
  2. The way to get at loan arrears information quickly and easily
  3. Examples of customer and account based reporting that is not so vanilla

Vanilla But Important Reporting on the Star Schema

The following query shows how to get a breakdown of repayment transactions for a branch for a quarter.  This is a typical star schema type query.  The 'fact' table is joined to the 'dimension' tables that are used for filtering and grouping.
Repayment transactions cover normal repayments, adjustments, reversals, reschedules and write-offs.

Panel

select aa.account_action_name,

sum(r.principal_amount

+

r.interest_amount)

as

amount

from

fact_loan_repayments

r


join

dim_account_action

aa

on

aa.account_action_key

=

r.account_action_key


join

dim_date

d

on

d.date_key

=

r.action_date_key


join

dim_office

o

on

o.office_key

=

r.branch_key

where

d.year_quarter

=

'2010-Q1'


and

o.display_name

=

'Branch

No.

54'

group

by

aa.account_action_name


order

by

amount

desc

This

...

simple

...

query

...

does

...

quite

...

a

...

lot.

...

 Its a trivial matter to change 'branch'

...

for

...

'loan

...

officer',

...

'center',

...

'group'

...

and

...

other tables that are used for filtering and grouping data.  There are other 'fact' tables in the data warehouse that cover such things as disbursals and savings.  It's the same approach to querying for all of them (consistency of query pattern is a strength of this type of data warehouse).
Finally, here's a query to get a breakdown of the number and amount of disbursals per loan_officer by branch by month for 2009 (loan officers can move branch).

Panel

select p.display_name as loan_officer, o.display_name as branch, d.month_name,

sum(disbursal_count)

as

disbursals,

sum(disbursal_amount)

as

disbursal_amount

from

fact_loan_disbursals

disb


join

dim_date

d

on

d.date_key

=

disb.disbursal_date_key


join

dim_office

o

on

o.office_key

=

disb.branch_key


join

dim_personnel

p

on

p.personnel_key

=

disb.loan_officer_key

where

d.year4

=

2009


and

p.is_loan_officer

is

true

group

by

p.display_name,

o.display_name,

d.month_number


order

by

p.display_name,

o.display_name,

d.month_number

Loan

...

Arrears

...

Information

...

Because

...

it's

...

quite

...

complicated,

...

 the data warehouse pre-calculates

...

information

...

about

...

each

...

loan

...

that

...

is

...

in

...

arrears.  Data is snapshot by day.  The query below gets arrears information that is past a mifos configured 'lateness' number of days as of end 2010-03-31.

...

 It does it for a loan officer and breaks the results down into pre-determined weekly bands.  Replacing 'weekly' by 'monthly' give the same data in monthly bands.

Panel

select wb.past_due_description,

 

 
count(distinct(hla.customer_key))

as

clients,

count(distinct(hla.loan_account_key))

as

loans,


sum(total_in_arrears)

as

total_in_arrears,sum(total_outstanding)

as

total_outstanding

from

hist_loan_arrears

hla


join

dim_date

endperiod

on

endperiod.date_key

=

hla.as_of_date_key


join

dim_arrears_band_weekly

wb

on

wb.arrears_band_weekly_key

=

hla.arrears_band_weekly_key


join

dim_personnel

lo

on

lo.personnel_key

=

hla.loan_officer_key


join

dw_mfi_configuration

config

on

config.mfi_configuration_key

=

1

where

endperiod.date_value

=

date('2010-03-31')


and

lo.display_name

 

 =

'

M P Smith' and

XXX'
and hla.days_in_arrears

>

config.loan_lateness_days

group

by

wb.past_due_description


order

by

wb.arrears_band_weekly_key

 

Customer

...

and

...

Account Based Reporting

In the mifos standard reports... there's

...

a

...

little

...

of

...

the

...

vanilla

...

reporting

...

above,

...

quite

...

a

...

bit

...

of

...

the

...

loan

...

arrears

...

reporting

...

but

...

mostly it's customer and account based.  This type of reporting against the data warehouse is a little more involved than the vanilla but there are 2 main positives.

  1. it's still better than querying the mifos application database 
  2. most importantly, it caters for customer hierarchy changes (loan officer movements, group and client transfers etc) as well as customer and account status changes.  Basically,historical reporting like comparing one time period to another is made possible.

The main tables (dim_customer, dim_loan and dim_savings) each link directly to a full mifos hierarchy... branch, loan officer, center, group etc... but the thing which is different from normal mifos tables is that one customer, for example a client, will have a number of entries in dim_customer.  Each time the hierarchy changes (e.g. client or group transfer, loan officer assignment) or there is a status change... a new entry is added.  The entry is valid for a period of time which is denoted by a pair of date fields called valid_from and valid_to.  If you want to find the unique version of the client that is relevent on 2010-01-01 you add the following to a query...

Panel

and valid_from <= date('2010-01-01')


and

valid_to

>

date('2010-01-01')

It

...

took

...

me

...

a

...

little

...

while

...

to

...

get

...

used

...

to

...

this.

...

 Also some of the queries felt a little complicated but still better than working with the mifos application database structure.  Below are a couple of examples from the standard mifos reports.  If you have the Pentaho Report Designer you can get the .prpt files from git://mifos.git.sourceforge.net

...

/

...

mifos/bi

...

and

...

look

...

at

...

the

...

queries.

...

a)

...

as

...

of

...

2010-01-01,

...

how

...

many

...

centers

...

did

...

loan

...

officer

...

XXX

...

manager

...

Panel

select

count(distinct(customer_id))

as

centers_managed

from

dim_personnel

p


join

dim_customer

c

on

c.loan_officer_key

=

p.personnel_key

where

p.display_name

=

'XXX'

&nbsp; and

 /* get loan office entry that is valid for date*/
and p.valid_from

<=

date(${client_summary_end_date}) and

date('2010-01-01')
and p.valid_to

> date(${client_summary_end_date}) and

> date('2010-01-01')

and c.customer_status

=

'CenterStatus-Active'


and

c.valid_from

<=

date(${client_summary_end_date}) and

 date('2010-01-01') /* get center entry that is valid for date*/
and c.valid_to

> date(${client_summary_end_date}) b) as of

> date('2010-01-01')

b) as of 2010-01-01,

...

how

...

many

...

clients with

...

active

...

loans

...

did

...

loan

...

officer

...

XXX

...

manage

...

Panel

select

count(distinct(c.customer_id))

as

clients_with_loans

from

dim_personnel

p


join

dim_customer

c

on

c.loan_officer_key

=

p.personnel_key


join

dim_loan

l

on

l.customer_key

=

c.customer_key

where

p.

personnel

display_

id = ${client_summary_personnel_id}&nbsp; and

name = 'XXX' 
and p.valid_from

<=

date(${client_summary_end_date}) and

 date('2010-01-01')
and p.valid_to

> date(${client_summary_end_date}) and

> date('2010-01-01')

and c.customer_level_id

=

1 and

1 /* client... without this groups would get picked up as well*/

and l.loan_status

in

('AccountState-ActiveInGoodStanding',

'AccountState-ActiveInBadStanding')


and

l.valid_from

<=

date(${client_summary_end_date}) and

 date('2010-01-01')
and l.valid_to

> date(${client_summary_end_date})) d, c) as of

> date('2010-01-01')

c) as of 2010-01-01,

...

what

...

were

...

the

...

repayments

...

on

...

active

...

loans

...

that

...

loan

...

officer

...

XXX

...

dealt

...

with.

...

 Here's

...

where

...

I

...

struggle

...

a

...

bit.

...

 It's

...

a

...

little

...

struggle

...

technically

...

but

...

really

...

its

...

a

...

bigger

...

struggle

...

to

...

make

...

sure

...

you

...

are

...

asking

...

the

...

right

...

question.

...

 The query below only looks at non-adjusted repayments for active loans managed by a loan office at a certain point in time... which is okay if that's what the question is.

Panel

select ifnull(sum(principal_amount),0.0)

as

principal_paid,

ifnull(sum(interest_amount),0.0)

as

interest_paid

from

fact_loan_repayments

paid


join

dim_loan

l

on

l.loan_account_key

=

paid.loan_account_key


join

dim_personnel

p

on

p.personnel_key

=

paid.loan_officer_key


join

dim_

date d on d.date_key = paid.action_date_key where

account_action aa on aa.account_action_key = paid.account_action_key

where p.personnel_id

= ${accounts_summary_personnel_id}&nbsp; and

= 'XXX'
and p.valid_from

<=

date(${accounts_summary_end_date}) and

 date('2010-01-01')
and p.valid_to

> date(${accounts_summary_end_date}) and d.date_value &nbsp;<= date(${accounts_summary_end_date}) and

> date('2010-01-01')

and paid.action_date  <= date('2010-01-01')
and aa.account_action_id = 1
and paid.adjusted is false

and l.loan_status

in

('AccountState-ActiveInGoodStanding',

'AccountState-ActiveInBadStanding')


and

l.valid_from

<=

date(${accounts_summary_end_date}) and

 date('2010-01-01')
and l.valid_to

> date(${accounts_summary_end_date})) b,

> date('2010-01-01')