Example Data Mart
This page is under construction. We want the description of the sample datamart to be as complete as possible.
In v1.1, the Mifos application will support reports that are
- built directly from the Mifos database repository using SQL, and
- built by accessing the Mifos business layer using BIRT scripted data sources.
This approach has several drawbacks:
- Report designs are tightly coupled to the operational database schema. When that schema changes for operational reasons, existing reports must be rebuilt.
- Using scripted data sources insulates the reports from changes to the database. However, generating reports may degrade the overall performance of Mifos, since the reporting process contends with Mifos for resources.
After v1.1, our goal is to provide a datamart that makes building new reports:
- easier and less error-prone – the datamart's star schemas are optimized for reporting and tend to reflect the business more intuitively, which makes SQL queries far simpler than those against the fully normalized schema optimized for daily operations (see the example query after this list)
- possible without the Java skills and deep knowledge of Mifos's object model needed to design scripted data sources
- less subject to underlying schema changes made for operational efficiency
- free of contention for Mifos's operational resources – the jobs that populate the datamart and build reports from it can run in a separate environment
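For example, with star schemas in place, a typical report query reduces to a short join of one fact table with a few dimension tables. Below is a minimal sketch of such a query; the column names (branch_name, client_count, calendar_date, status_name, and the *_key columns) are assumptions for illustration, not the actual datamart columns::

    -- Active clients per branch on a given snapshot date (hypothetical columns).
    SELECT b.branch_name,
           SUM(f.client_count) AS active_clients
    FROM   client_status_fact f
    JOIN   branch_office_dim  b ON b.branch_office_key = f.branch_office_key
    JOIN   time_dim           t ON t.time_key          = f.time_key
    JOIN   client_status_dim  s ON s.client_status_key = f.client_status_key
    WHERE  t.calendar_date = '2008-03-31'
      AND  s.status_name   = 'Active'
    GROUP  BY b.branch_name;

An equivalent query against the operational schema would typically have to walk several normalized customer and account tables and reproduce the status logic by hand.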
To that end, the version 1.1 release comes with a small sample datamart and a collection of batch jobs that populate it. The jobs were designed with the open-source tool Talend Open Studio. The tool generates Java source that can be executed within an application server like Tomcat or run directly from the command line. You can find online documentation on Talend at http://www.talend.com/resources/documentation.php and a helpful tutorial at http://www.talendforge.org/tutorials/menu.php .
This should give developers an introduction to what we plan to build going forward. Consider this a work in progress for evaluation, not yet production-ready. Your feedback is appreciated. Please post comments on the Mifos forums.
The Mifos team thanks Ravi Kutaphale, IBM-Dublin, for this important contribution to the Mifos project.
Version 1.1 contents
Mifos release 1.1 includes this work in progress:
- a database schema for the sample datamart, which currently contains client dimensions, office dimensions, a time dimension, and fact tables for client status and client loan accounts.
- Talend-generated source code for batch jobs that populate the datamart's dimensional and fact tables.
Datamart design
We have used dimensional modeling, a technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access. In this model, a datamart consists of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic star-like structure is often called a star join. (R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The Data Warehouse Lifecycle Toolkit. Wiley, 1998.)
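A minimal sketch of a star join in SQL may make this concrete. The table and column names here are assumptions for illustration, not the datamart's actual DDL; the point is that the fact table's primary key is composed entirely of foreign keys into the dimension tables::

    -- Hypothetical star join: the fact table's key is built from dimension keys.
    CREATE TABLE time_dim (
        time_key      INT PRIMARY KEY,
        calendar_date DATE
    );

    CREATE TABLE branch_office_dim (
        branch_office_key INT PRIMARY KEY,
        branch_name       VARCHAR(100)
    );

    CREATE TABLE client_loan_fact (
        time_key              INT NOT NULL REFERENCES time_dim (time_key),
        branch_office_key     INT NOT NULL REFERENCES branch_office_dim (branch_office_key),
        principal_outstanding DECIMAL(15,2),
        PRIMARY KEY (time_key, branch_office_key)
    );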
Datamart tables
Dimension tables
Dimension tables contain information about business entities that do not change over time, or at worst change slowly. The function of the attributes in a dimension table is to filter, group, and label the facts being reported on.
.. list-table:: Dimension Tables
   :widths: 20 80
   :header-rows: 1

   * - Name
     - Description
   * - head_office_dim
     - This and the following tables reflect the standard office hierarchy.
   * - regional_office_dim
     -
   * - subregional_office_dim
     -
   * - area_office_dim
     -
   * - branch_office_dim
     -
   * - center_dim
     - This and the following tables reflect the standard client hierarchy.
   * - group_dim
     -
   * - client_dim
     -
   * - client_loan_account_dim
     -
   * - loan_officer_dim
     -
   * - client_status_dim
     - Pre-populated with values for the different client statuses.
   * - time_dim
     - Designed to hold maximum information about a particular day.
Fact tables
.. list-table:: Fact Tables
   :widths: 20 80
   :header-rows: 1

   * - Name
     - Description
   * - client_loan_fact
     - Daily snapshot data for client loans.
   * - client_status_fact
     - Periodic snapshot of the status of all clients in the system.
   * - group_status_fact
     - Periodic snapshot of the status of all groups in the system.
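Because the fact tables are periodic snapshots, trend reports fall out of a simple GROUP BY on time dimension attributes. As before, this is only a sketch with assumed column names (calendar_date, branch_name, principal_outstanding, and the *_key columns)::

    -- Daily portfolio outstanding per branch over a quarter (hypothetical columns).
    SELECT t.calendar_date,
           b.branch_name,
           SUM(f.principal_outstanding) AS portfolio
    FROM   client_loan_fact  f
    JOIN   time_dim          t ON t.time_key          = f.time_key
    JOIN   branch_office_dim b ON b.branch_office_key = f.branch_office_key
    WHERE  t.calendar_date BETWEEN '2008-01-01' AND '2008-03-31'
    GROUP  BY t.calendar_date, b.branch_name
    ORDER  BY t.calendar_date, b.branch_name;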
Overview of the jobs
The sample Talend job designs can be found in Subversion at trunk/mifos/TalendETL ( https://mifos.dev.java.net/source/browse/mifos/trunk/mifos/TalendETL/ ).
The ETL jobs read data from the Mifos transactional schema, apply transformations, and persist the results in the new data warehouse. Talend provides a graphical user interface for designing the ETLs; its output is a plain JDBC client. This Java code can be executed as a standalone client, and thus can be run via cron or a similar scheduler. To edit the ETLs, import the Talend-generated design files back into Talend and change them there.
Building the jobs
The ETLs are in the SVN repository under trunk/mifos/TalendETL. This package is self-contained: after you check out the ETLs, you can build them using the build file in TalendETL.
Simple steps to follow
- Checkout trunk/mifos/TalendETL.
- Build the jobs by running ant on the build file under TalendETL.
- Create the datamart using the SQL scripts in TalendETL/sql.
- Execute the jobs: the build compiles all the ETLs and copies all the relevant files into the target folder. Edit hibernate.properties under target/classes/conf to point to the correct transactional and datamart schemas, then run execute_master_ETL.bat, which executes all the ETLs in the required sequence.
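After the master ETL run completes, a quick sanity check is to confirm that rows actually landed in the datamart, for example (using tables from the schema above)::

    -- Rough sanity check after the first master ETL run.
    SELECT COUNT(*) AS time_rows   FROM time_dim;
    SELECT COUNT(*) AS client_rows FROM client_dim;
    SELECT COUNT(*) AS loan_facts  FROM client_loan_fact;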
ETL job details
.. list-table:: ETL Jobs
   :widths: 10 10 10 70
   :header-rows: 1

   * - Name
     - Source Tables
     - Destination Tables
     - Comment
   * - MasterETLLoader
     -
     - All
     - Executes all the ETL jobs in the proper sequence. When running the ETLs for the first time, the master ETL is the best option: it runs the ETLs that populate the parent tables first, then those for the child tables.
   * - RegionalOfficeLoader
     - office
     - head_office_dim, regional_office_dim
     - A regional office is linked to the head office by a foreign key, so head_office_dim must be populated before regional_office_dim. The office table (transactional schema) is correlated with the various office dimensions via "global_office_num".
   * - SubregionalOfficeLoader
     - office
     - subregional_office_dim
     -
   * - AreaOfficeLoader
     - office
     - area_office_dim
     -
   * - BranchOfficeLoader
     - office
     - branch_office_dim
     -
   * - ClientLoanOfficeLoader
     - personnel, customer
     - loan_officer_dim
     -
   * - CenterDimensionLoader
     - customer
     - center_dim
     -
   * - GroupDimensionLoader
     - customer
     - group_dim
     -
   * - ClientDimensionLoader
     - customer
     - client_dim
     -
   * - ClientLoanAccountLoader
     - customer, account, loan_account
     - client_loan_account_dim
     -
   * - ClientStateFactLoader
     - customer, customer_state, customer_state_flag, customer_flag_detail
     - client_state_fact
     -
   * - ClientLoanFactLoader
     - account, loan_account, loan_arrears_aging
     - client_loan_fact
     -
   * - GroupStateFactLoader
     - customer, customer_state, customer_state_flag, customer_flag_detail
     - group_state_fact
     -
   * - TimeDimension
     -
     - time_dim
     -
   * - DailyTimeDimensionLoader
     -
     - time_dim
     - Populates time_dim with the current date. Not all attributes in the time dimension are populated by this ETL.
   * - populate_client_status.sql
     -
     - client_status_dim
     -
   * - populate_group_status.sql
     -
     - group_status_dim
     -
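Conceptually, each fact loader performs an insert-select from the transactional tables into the datamart, resolving dimension keys along the way. The real jobs are the generated Java code, not SQL, but a heavily simplified, hypothetical sketch of what a daily run of ClientLoanFactLoader amounts to might look like this (all column names are assumptions)::

    -- Simplified picture of a daily fact load (hypothetical columns).
    INSERT INTO client_loan_fact (time_key, client_loan_account_key, principal_outstanding)
    SELECT t.time_key,
           d.client_loan_account_key,
           la.principal_outstanding
    FROM   loan_account la
    JOIN   account a                 ON a.account_id = la.account_id
    JOIN   client_loan_account_dim d ON d.account_id = a.account_id
    JOIN   time_dim t                ON t.calendar_date = CURRENT_DATE;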