Decouple General Ledger (GL) from Mifos
Introduction
As of the end of 2009, GL financial transaction data represented more than 50% of the GK Mifos database, and generating that data took an estimated 25-33% (load testing not yet done) of the time needed to save collection sheets.
There are questions over how much of the Mifos GL data is used by GK and other customers. Some customers will have GL packages into which they want Mifos data interfaced, and some might rely solely on Mifos GL data.
Whatever the customer need, if GL data and processing can be 'mostly' decoupled/taken out of Mifos and put in a separate 'mifos GL' area/system, then there should be a good improvement in transaction throughput for collection sheets (and for any transaction that needs to generate GL data). This would also pave the way for effective horizontal scaling (i.e. the removed process could run on a different box and database from Mifos).
To gain the full benefits of decoupling, it would be necessary to understand who uses GL data (only a select few accounting people, or every loan officer?), how up to date it needs to be (does a delay of a few minutes, a day, or a week matter?), and how well the current Mifos GL data meets current needs.
Impact
For a full decoupling, all the GL-related areas need to be isolated and taken out of Mifos, e.g.
- anywhere financial data is being generated
- setting up and use of Chart of Accounts details
- attaching GL codes to product offerings
The general architecture of a decoupled solution would be:
- Mifos sends GL-related messages onto a queue (probably JMS based)
- A separate asynchronous process picks up the GL messages and processes them.
- This separate process either interfaces with a GL package or grows its own GL system.
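As a rough illustration of this shape, here is a minimal sketch of the two halves, assuming a JMS provider and a plain queue; the class, method and payload names are hypothetical, not existing Mifos code:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class GlQueueSketch {

    // Mifos side: instead of writing financial_trxn rows inline, publish a GL event.
    public static void publishGlEvent(ConnectionFactory factory, Queue glQueue, String glEvent)
            throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(glQueue);
            TextMessage message = session.createTextMessage(glEvent);
            producer.send(message);
        } finally {
            connection.close();
        }
    }

    // Separate GL process: consume events asynchronously and post them to a GL package
    // or to a dedicated 'mifos GL' store.
    public static void startGlConsumer(ConnectionFactory factory, Queue glQueue) throws Exception {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(glQueue);
        consumer.setMessageListener(message -> {
            try {
                String glEvent = ((TextMessage) message).getText();
                // post to the GL system here (not shown)
                System.out.println("Processing GL event: " + glEvent);
            } catch (Exception e) {
                // a real consumer would retry or move the message to a dead-letter queue
                e.printStackTrace();
            }
        });
        connection.start();
    }
}
```

The collection sheet transaction would then only pay the cost of the send; all GL generation work happens in the separate consumer process, which could live on a different box and database.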
Incremental Approach Focused on Scalability
Because of the current focus on scalability, the following incremental approach should deliver specific scalability benefits more quickly and with less risk. Note: probable benefits would be verified in the performance lab first.
Note: with extensive involvement of domain/requirements experts from GK and the Mifos team, the number of 'steps' would probably be reduced.
- Get a reduction in Mifos processing time
- Instead of writing all the 'trxn' tables during a relevant financial transaction, write something far simpler that contains enough information to generate the same data later (see the first sketch after this list).
- The later generation would be done by a batch job which would tick off each item as it is processed.
- Move the 'trxn' tables into a separate database
- This would involve moving to something like a two-phase commit in the relevant transactions (see the second sketch after this list).
- It might be a separate phase because the upgrade would take substantial time, and it would be best not to have too much else depending on it.
- Move the remainder of GL functionality out of Mifos
- This would include Chart of Accounts (COA) information and its links to product offerings.
- Move to full Message Queue based Approach
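To make the first step concrete, below is a minimal sketch of writing a lightweight 'pending' record during the transaction and expanding it into full GL data later in a batch job that ticks off each processed item. The pending_gl_item table, its columns and the helper method are hypothetical, invented here for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class PendingGlSketch {

    // During collection sheet saving: record just enough to regenerate the GL detail later.
    public static void recordPendingItem(Connection txConnection, long paymentId) throws Exception {
        try (PreparedStatement ps = txConnection.prepareStatement(
                "INSERT INTO pending_gl_item (account_payment_id, processed) VALUES (?, 0)")) {
            ps.setLong(1, paymentId);
            ps.executeUpdate();
        }
    }

    // Batch job: expand pending items into the full GL rows and tick each one off.
    public static void processPendingItems(DataSource ds) throws Exception {
        try (Connection con = ds.getConnection();
             PreparedStatement select = con.prepareStatement(
                     "SELECT id, account_payment_id FROM pending_gl_item WHERE processed = 0");
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                long itemId = rs.getLong("id");
                long paymentId = rs.getLong("account_payment_id");
                generateFinancialTrxns(con, paymentId); // derive the same rows Mifos writes today
                try (PreparedStatement tick = con.prepareStatement(
                        "UPDATE pending_gl_item SET processed = 1 WHERE id = ?")) {
                    tick.setLong(1, itemId);
                    tick.executeUpdate();
                }
            }
        }
    }

    private static void generateFinancialTrxns(Connection con, long paymentId) {
        // placeholder: would reproduce the financial_trxn generation currently done inline
    }
}
```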
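For the second step, moving the 'trxn' tables into a separate database means each payment would span two databases. A minimal sketch of what 'something like a two-phase commit' could look like, assuming XA-capable data sources, a JTA transaction manager (e.g. Atomikos or Bitronix) and hypothetical column names:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

public class TwoDatabaseWriteSketch {

    private final DataSource mifosDb;  // core Mifos schema
    private final DataSource trxnDb;   // separate database holding the 'trxn' tables
    private final UserTransaction utx; // supplied by the JTA transaction manager

    public TwoDatabaseWriteSketch(DataSource mifosDb, DataSource trxnDb, UserTransaction utx) {
        this.mifosDb = mifosDb;
        this.trxnDb = trxnDb;
        this.utx = utx;
    }

    // Writes the payment to the core database and its breakdown to the trxn database
    // inside one distributed (two-phase commit) transaction.
    public void savePayment(long accountId, BigDecimal amount) throws Exception {
        utx.begin();
        try (Connection mifosCon = mifosDb.getConnection();
             Connection trxnCon = trxnDb.getConnection()) {
            try (PreparedStatement ps = mifosCon.prepareStatement(
                    "INSERT INTO account_payment (account_id, amount) VALUES (?, ?)")) {
                ps.setLong(1, accountId);
                ps.setBigDecimal(2, amount);
                ps.executeUpdate();
            }
            try (PreparedStatement ps = trxnCon.prepareStatement(
                    "INSERT INTO account_trxn (account_id, amount) VALUES (?, ?)")) {
                ps.setLong(1, accountId);
                ps.setBigDecimal(2, amount);
                ps.executeUpdate();
            }
            utx.commit(); // both databases commit together, or neither does
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }
}
```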
Outcome of Initial Spike - Mingle Card 2519
http://mingle.mifos.org/projects/mifos/cards/2519
Found out that what I had been describing as GL transaction data also included payment/payment breakdown data.
Table financial_trxn is the sole GL transaction table. However, at 115 million+ rows, it held about a quarter of all Mifos rows at GK as of Dec 2009 (though not necessarily a quarter of the data volume). I'd estimate that the originally hoped-for 25-33% improvement in collection sheet processing would come down to around 10% (simulation pending).
The other 'trxn' tables I'd included in GL actually record a breakdown of account_payment rows. I analysed the use and structure of these tables during the spike.
Specific Recommendation 1 - Decouple financial_trxn (GL generation)
Decouple GL transaction generation from Mifos and the Mifos database. The separated GL generation process could be run in a batch window or at any time, preferably off-peak.
Prereqs:
- Verify that it's acceptable for GL generation not to be real-time. http://mingle.mifos.org/projects/mifos/cards/2565
- Simulate running collection sheet processing under load without GL generation. http://mingle.mifos.org/projects/mifos/cards/2566
Specific Recommendation 2 - Implement a Payments 'Module'
Currently, when a payment is made, the code which updates schedule, activity and performance data is interspersed with payment processing. This makes the process harder to understand and hinders attempts to improve scalability. Clearly isolating payment-specific processing (and possibly schedule, activity and performance processing) would make changes to this area easier, e.g. once payment processing is hidden behind an interface, the implementation can be refactored with lower risk (see the interface sketch after the prereq below).
Prereq:
Spike the isolation of payments related processing. http://mingle.mifos.org/projects/mifos/cards/2520
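As an illustration of the intended seam, here is a hypothetical interface (the name and methods are placeholders, not existing Mifos classes) behind which payment-specific processing could be hidden:

```java
import java.math.BigDecimal;
import java.util.Date;

// Hypothetical facade for payment-specific processing. Callers (e.g. collection sheet
// saving) would talk only to this interface; schedule, activity and performance updates
// could then react to the result rather than being interspersed with the payment code.
public interface PaymentService {

    // Applies a repayment to an account and returns the id of the stored account_payment row.
    long applyPayment(long accountId, BigDecimal amount, Date receiptDate);

    // Reverses (adjusts) a previously applied payment.
    void reversePayment(long paymentId, String note);
}
```

With such an interface in place, the implementation behind it could be restructured, or made asynchronous, with lower risk to its callers.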
Specific Recommendation 3 - Restructure account_payment, 'trxn' tables
In my opinion, the 'trxn' tables contain aspects of both over-normalisation and over-denormalisation.
Over-normalisation: account_trxn has 3 subtypes (customer_trxn_detail, loan_trxn_detail & savings_trxn_detail). This means that 2 rows are created instead of 1 (as there would be with no subtypes).
Over-denormalisation: there are a number of fields/relationships on account_trxn that should be on account_payment or are copied from account_payment. This bulks up account_trxn, which matters because so many rows (52 million+ at GK, Dec 2009) are written to it.
Also, there is a one-to-many relationship from account_trxn to fee_trxn_detail which is clunky and makes it difficult to identify which financial_trxn rows were derived from which fees (if more than one fee was written).
I think one table (let's call it account_payment_breakdown) could replace the 'trxn' tables; a rough sketch of such a table follows the prereq below. Any redesign would need to ensure upgrade compatibility, as a lot of client-specific implementation work is done in this area.
Prereq:
Spike potential improvement by creating a script to convert data from current structure to redesigned structure. http://mingle.mifos.org/projects/mifos/cards/2567
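For illustration only, here is a rough sketch of what a single replacement table might carry, expressed as a hypothetical Java value object; the field and enum names are invented rather than taken from the Mifos schema:

```java
import java.math.BigDecimal;

// Hypothetical row of an account_payment_breakdown table that would replace account_trxn,
// its loan/savings/customer subtype tables and fee_trxn_detail.
public class AccountPaymentBreakdown {

    // What this slice of the payment was for; replaces the subtype split and fee_trxn_detail.
    public enum Component { PRINCIPAL, INTEREST, FEE, PENALTY, DEPOSIT, WITHDRAWAL }

    private long id;
    private long accountPaymentId; // parent account_payment carries date, receipt, customer etc. once
    private Component component;
    private Long feeId;            // set only for FEE rows, so each fee maps to its own breakdown row
    private BigDecimal amount;

    // getters/setters omitted in this sketch
}
```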