We want to be sure that the description of the sample datamart is as complete as possible.
In v1.1, the Mifos application will support reports that are
This approach has several drawbacks:
After v1.1, our goal is to provide a datamart that will make building new reports easier.
To that end, the version 1.1 release comes with a small sample datamart and a collection of batch jobs that populate it. The jobs were designed with the open-source tool Talend Open Studio. The tool generates Java source that can be executed within an application server like Tomcat or run directly from the command line. You can find online documentation on Talend at http://www.talend.com/resources/documentation.php and a helpful tutorial at http://www.talendforge.org/tutorials/menu.php.
This should give developers an introduction to what we plan to build going forward. Consider it a work in progress, suitable for evaluation but not yet production-ready. Your feedback is appreciated; please post comments on the Mifos forums.
The Mifos team thanks Ravi Kutaphale, IBM-Dublin, for this important contribution to the Mifos project.
Mifos release 1.1 includes this work in progress:
We have used dimensional modeling, a technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access. In this model, a datamart consists of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table. This characteristic star-like structure is often called a star join. (R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The Data Warehouse Lifecycle Toolkit. Wiley, 1998.)
Dimension tables contain information about business entities that do not change over time, or at worst change slowly. The function of the attributes in a dimension table is to filter, group, and label the facts being reported on.
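The star-join idea can be sketched in plain Java. This is an illustrative sketch only; the table and column names below (a branch-office dimension, a loan fact with an amount measure) are hypothetical and are not taken from the actual datamart schema. It shows how dimension attributes group and label facts so that an additive measure can be aggregated.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StarJoinSketch {
    // Hypothetical dimension table: surrogate key -> a labeling attribute.
    static final Map<Integer, String> BRANCH_OFFICE_DIM =
            Map.of(1, "North Branch", 2, "South Branch");

    // Hypothetical fact row: a foreign key into the dimension plus an
    // additive measure.
    static final class LoanFact {
        final int branchOfficeKey;
        final int loanAmount;
        LoanFact(int branchOfficeKey, int loanAmount) {
            this.branchOfficeKey = branchOfficeKey;
            this.loanAmount = loanAmount;
        }
    }

    static final List<LoanFact> CLIENT_LOAN_FACT = List.of(
            new LoanFact(1, 500), new LoanFact(1, 300), new LoanFact(2, 700));

    // The essence of a star join: label each fact with a dimension
    // attribute, group by it, and aggregate the measure.
    static Map<String, Integer> totalByBranch() {
        return CLIENT_LOAN_FACT.stream().collect(Collectors.groupingBy(
                f -> BRANCH_OFFICE_DIM.get(f.branchOfficeKey),
                Collectors.summingInt(f -> f.loanAmount)));
    }

    public static void main(String[] args) {
        // North Branch totals 800, South Branch totals 700.
        System.out.println(totalByBranch());
    }
}
```

In SQL this would be a join from the fact table to the dimension table with a GROUP BY on the dimension attribute; the in-memory version above only illustrates the shape of the computation.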
.. list-table:: Dimension Tables
   :widths: 20 80
   :header-rows: 1

   * - Name
     - Description
   * - head_office_dim
     - This and the following tables reflect the standard office hierarchy.
   * - regional_office_dim
     -
   * - subregional_office_dim
     -
   * - area_office_dim
     -
   * - branch_office_dim
     -
   * - center_dim
     - This and the following tables reflect the standard client hierarchy.
   * - group_dim
     -
   * - client_dim
     -
   * - client_loan_account_dim
     -
   * - loan_officer_dim
     -
   * - client_status_dim
     - Has pre-populated values for the different client statuses.
   * - time_dim
     - Designed to hold maximum information about a particular day.
.. list-table:: Fact Tables
   :widths: 20 80
   :header-rows: 1

   * - Name
     - Description
   * - client_loan_fact
     - Daily snapshot data for the client loans.
   * - client_status_fact
     - Periodic snapshot of the status of all clients in the system.
   * - group_status_fact
     - Periodic snapshot of the status of all groups in the system.
The sample Talend design jobs can be found in Subversion at trunk/mifos/TalendETL (https://mifos.dev.java.net/source/browse/mifos/trunk/mifos/TalendETL/).
The ETL jobs read data from the Mifos transactional schema, apply transformations, and persist the results in the new data warehouse. Talend provides a graphical user interface for designing the ETLs. The output of a Talend design is plain Java code using JDBC; it can be executed as a standalone client, and thus can be run via cron or a similar scheduler. To edit an ETL, import the design files that Talend generates back into Talend and modify them there.
The ETLs are in the SVN repository under trunk/mifos/TalendETL. The package is self-contained: after you check out the ETLs, you can build them using the build file in TalendETL.
.. list-table:: ETL Jobs
   :widths: 10 10 10 70
   :header-rows: 1

   * - Name
     - Source Tables
     - Destination Tables
     - Comment
   * - MasterETLLoader
     -
     - All
     - Executes all the ETL jobs in the proper sequence. When running the ETLs for the first time, the master ETL is the best option: it first runs the ETLs that populate the parent tables, then those for the child tables.
   * - RegionalOfficeLoader
     - office
     - head_office_dim, regional_office_dim
     - The regional office is linked to the head office by a foreign key, so head_office_dim must be populated before regional_office_dim. The correlation between the office table (transactional schema) and the various office dimensions is done via "global_office_num".
   * - SubregionalOfficeLoader
     - office
     - subregional_office_dim
     -
   * - AreaOfficeLoader
     - office
     - area_office_dim
     -
   * - BranchOfficeLoader
     - office
     - branch_office_dim
     -
   * - ClientLoanOfficeLoader
     - personnel, customer
     - loan_officer_dim
     -
   * - CenterDimensionLoader
     - customer
     - center_dim
     -
   * - GroupDimensionLoader
     - customer
     - group_dim
     -
   * - ClientDimensionLoader
     - customer
     - client_dim
     -
   * - ClientLoanAccountLoader
     - customer, account, loan_account
     - client_loan_account_dim
     -
   * - ClientStateFactLoader
     - customer, customer_state, customer_state_flag, customer_flag_detail
     - client_state_fact
     -
   * - ClientLoanFactLoader
     - account, loan_account, loan_arrears_aging
     - client_loan_fact
     -
   * - GroupStateFactLoader
     - customer, customer_state, customer_state_flag, customer_flag_detail
     - group_state_fact
     -
   * - TimeDimension
     -
     - time_dim
     -
   * - DailyTimeDimensionLoader
     -
     - time_dim
     - Populates the time_dim dimension with the current date. Not all attributes in the time dimension are populated by this ETL.
   * - populate_client_status.sql
     -
     - client_status_dim
     -
   * - populate_group_status.sql
     -
     - group_status_dim
     -
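As an illustration of what a daily time-dimension loader such as DailyTimeDimensionLoader derives, a single calendar date expands into many descriptive attributes. This is a sketch only: the column names below are hypothetical, and the real time_dim schema is defined by the Talend jobs, not reproduced here.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TimeDimSketch {
    // Derive a row of descriptive attributes from one calendar date.
    // The column names are hypothetical, for illustration only.
    static Map<String, Object> timeDimRow(LocalDate date) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("calendar_date", date.toString());
        row.put("day_of_week", date.getDayOfWeek()
                .getDisplayName(TextStyle.FULL, Locale.ENGLISH));
        row.put("day_of_month", date.getDayOfMonth());
        row.put("month_name", date.getMonth()
                .getDisplayName(TextStyle.FULL, Locale.ENGLISH));
        row.put("quarter", (date.getMonthValue() - 1) / 3 + 1);
        row.put("year", date.getYear());
        return row;
    }

    public static void main(String[] args) {
        // For 2008-04-09: day_of_week is Wednesday, quarter is 2.
        System.out.println(timeDimRow(LocalDate.of(2008, 4, 9)));
    }
}
```

Pre-populating such attributes once per day is what lets reports filter and group by things like quarter or day of week without recomputing them in every query.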