Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 4.0

...

The list and mock-ups of the 16 are in a spreadsheet.  Rather than link to this spreadsheet (it does change), here's a link to the Loan Officer Detailed Report - Content   (No. 1 priority).  This gives a good idea of the types of complex reports that mifos and its clients have attempted to build from the mifos production schema over the years:

...

The main design is just in following the standard dimension modelling (star schema) approach.  There are lots of onlines sources about dimensional modelling and star schemas.  http://www.rkimball.com/ is a good one because Ralph Kimball is pretty much the industry guru in the area for many years.

  • The only data changes that matter to the gnarly reports are hierarchy and status changes.  Removing the complexity of identifying hierarchy and status changes for customer and accounts occurs during the ETL process and is reflected as a 'stack' of dim_personnel, dim_customer, dim_loan and dim_savings entries each having a 'valid_from' and a 'valid_to' date pair.
  • I've considered branch, loan officer, center, group, and client to be one big hierarchy and collapsed it during the ETL onto all the relevant dim and fact (and hist) tables. So, metrics involving loans and loan officers don't require 'joins' up the hierarchy to get to loan officer, its pretty much already available.
  • All the tables (except the generic dim_date one) currently have very few data 'columns'.  They are mostly 'keys' and standard audit columns e.g. A central table like dim_customer only has customer_id, display_name, customer_level_id, customer_status and gender.   The others have less.  This is because... the gnarly reports don't need the data.  This approach is not necessarily typical of dimension tables though.  Of course, additional columns can be added but I took the approach that if its not needed don't add it yet to keep the database structure very small and easy to report from (not for performance reasons).

Relatively Easy and Consistent Querying

It's relatively easy to write the query for 'What clients were active at a certain period for a branch' e.g. something like:

select count(distinct(c.customer_id)) as number_of_active_clients
from dim_office o
join dim_customer c on c.branch_key = o.office_key
where o.display_name = 'my branch'
and c.customer_level_id = 1
and c.customer_status = 'ClientStatus-Active'
and c.valid_from <= '2010-02-01'
and c.valid_to > '2010-02-01'

...