HandlingOneMillion

Overview - Put Focus on Peak Time Collection Sheet Throughput
In peak hours, on the busiest days, GK processes about 950 collection sheets per hour (figure derived from the transaction monitor logs). If we can simulate GK processing 3000 collection sheets in an hour (using the full collection sheet workflow), that would be strong evidence that GK and Mifos can handle 1 million clients.
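As a rough check on where the 3000 target comes from, here is a minimal sketch that scales the observed peak rate linearly with client count. The linear-scaling assumption and the function name `scaled_peak_rate` are ours, for illustration only. The inputs (about 950 sheets per peak hour, ~350k clients) are the figures on this page.

```python
def scaled_peak_rate(observed_per_hour: float, current_clients: int,
                     target_clients: int) -> float:
    """Scale an observed peak-hour rate linearly with client count.

    Assumes (hypothetically) that collection sheet volume grows in
    proportion to the number of clients.
    """
    return observed_per_hour * target_clients / current_clients


needed = scaled_peak_rate(950, 350_000, 1_000_000)
headroom = 3000 / needed - 1  # fraction by which 3000 exceeds the scaled need
print(f"needed ~{needed:.0f} sheets/hour, 3000 gives {headroom:.0%} headroom")
```

Under that assumption about 2714 sheets per hour would be needed, so the 3000 target leaves roughly 10% headroom.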

GK Situation

Mifos V1.3

Broadband in nearly all branches

Number of clients (mid-March 2010): ~350k (possibly more)

Goal

Show that GK can handle a million clients, and identify what needs to happen to get there.

Main Acceptance Measurement

3000 collection sheets can be processed in an hour at GK.

How we might go about this in the Performance Lab

Using V1.3, one collection sheet workflow takes about 9 seconds (server time) at GK, with an average of about 2.33 concurrent users.

Using V1.3 and the 2.33 figure, establish how long a collection sheet workflow takes in the performance lab, so that lab results can be adjusted to GK conditions.
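One simple way to do that adjustment is a ratio: compare the V1.3 lab workflow time with the ~9 seconds seen at GK and scale later lab measurements by the same factor. This is a minimal sketch assuming the lab-to-GK ratio stays constant across versions; the lab timings below are hypothetical placeholders, not measurements.

```python
def calibration_factor(gk_seconds: float, lab_seconds: float) -> float:
    """Ratio of GK server time to lab server time for the same V1.3 workflow."""
    return gk_seconds / lab_seconds


def estimate_gk_seconds(lab_seconds: float, factor: float) -> float:
    """Translate a lab measurement into an estimated GK server time.

    Assumes the lab/GK ratio measured on V1.3 also holds for later versions.
    """
    return lab_seconds * factor


# Hypothetical lab timings, for illustration only.
factor = calibration_factor(gk_seconds=9.0, lab_seconds=6.0)  # V1.3 at 2.33 users
for version, lab_s in [("1.4", 5.0), ("Gazelle C", 4.0)]:
    print(version, round(estimate_gk_seconds(lab_s, factor), 2))
```

A single ratio is the crudest possible model; if the lab and GK hardware differ mainly in database speed, the factor may not carry over cleanly to higher concurrency.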

With a database of a million clients and 7 concurrent users, how much collection sheet throughput can we get in a GK hour with each of our versions (1.3, 1.4, Gazelle C, Shamim D)?

Then review what else (if anything) would need to be done to reach 3000, with whatever margin of comfort we feel we can get.

Note: 7 ≈ 3 x 2.33, approximating 3 x 350k clients (about 1 million).
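Little's law (concurrency = throughput x time in system) ties these figures together. Applying it here is our addition, not something stated above, and it treats the 2.33 figure as the average number of collection sheet workflows in progress on the server and the 9 seconds as server time per workflow. A minimal sketch:

```python
SECONDS_PER_HOUR = 3600


def sheets_per_hour(concurrent_users: float, seconds_per_workflow: float) -> float:
    """Little's law: throughput = concurrency / time per workflow."""
    return concurrent_users / seconds_per_workflow * SECONDS_PER_HOUR


def max_workflow_seconds(concurrent_users: float, target_per_hour: float) -> float:
    """Longest workflow time that still meets the target at this concurrency."""
    return concurrent_users * SECONDS_PER_HOUR / target_per_hour


print(round(sheets_per_hour(2.33, 9)))   # current GK peak
print(round(sheets_per_hour(7, 9)))      # 7 users, unchanged workflow time
print(max_workflow_seconds(7, 3000))     # workflow time needed for 3000/hour
```

The first figure (about 932) is close to the ~950 sheets per hour seen in the transaction logs, which suggests the 9 second and 2.33 figures are consistent with each other. With 7 users and no change in workflow time the estimate is about 2800 per hour, so reaching 3000 would need each workflow to take no more than about 8.4 seconds of server time at that concurrency. The sketch assumes the server does not slow down under the extra load, which is exactly what the lab runs have to establish.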

The End

That's the basic idea. The remainder is just notes and answers to some of my own questions.

Server Response at Peak

In peak hours, on the busiest days, when GK processes about 950 collection sheets:

Average server response is 0.6 seconds.

Average Collection Sheet get (continue) is: 3.5 seconds

Average Collection Sheet create (submit) is: 3.7 seconds
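Comparing these two timings with the ~9 seconds of server time per V1.3 workflow shows where the time goes. A minimal sketch, assuming all three figures are averages over the same peak period; the split is our arithmetic, not a separate measurement.

```python
# Peak-hour averages from this page (seconds of server time).
workflow_total = 9.0  # whole V1.3 collection sheet workflow
steps = {"get (continue)": 3.5, "create (submit)": 3.7}

measured = sum(steps.values())
share = measured / workflow_total
print(f"get + create = {measured:.1f} s ({share:.0%} of the workflow)")
print(f"other requests = {workflow_total - measured:.1f} s")
```

Roughly 80% of the workflow's server time falls in these two steps, which makes them the obvious tuning targets.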

GK Operations - V1.3 Workflow

Kay referenced http://confluence.mifos.org/pages/viewpageattachments.action?pageId=31326482

Among other things, this document mentions that GK uses a reporting database (it doesn't appear to be populated through replication). If so, I imagine that the reporting database is on the same machine/server as the Mifos database (or possibly just accessed through the same Tomcat).

Also, it seems that some of the current workflow won't be necessary after upgrading to V1.4, which fixes the collection sheet bug that causes partials/duplicates.

Maybe then GK won't have to cram as many collection sheets into a few hours (which would automatically give them scope to handle more clients).

Suggestion added 15th April 2010: Trickle processing SMS For Collection Sheets

What about the other processes?

At peak hours on the busiest days, collection sheet processing accounts for about 45% of server processing. So collection sheet throughput is a good enough indicator, but not the whole story.

At other days and times, much less or no collection sheet processing occurs, but that's okay.

Moving reporting to a replicated machine should improve throughput by up to 33%. So if the lab gave a figure of 3500 for the Shamim D release, that would still be a bit tight, and we would feel far more comfortable if reporting were moved.
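One reading of "a bit tight" can be sketched as follows. Interpreting "up to 33%" as reporting using about a quarter of server capacity is our inference (freeing a share s of capacity raises throughput by a factor of 1/(1 - s)), and 3500 is the hypothetical lab figure from the paragraph above.

```python
TARGET = 3000  # sheets per hour (main acceptance measurement)


def gain_from_freeing(share: float) -> float:
    """Fractional throughput gain if a share of server capacity is freed."""
    return 1 / (1 - share) - 1


def margin(sheets_per_hour: float, target: float = TARGET) -> float:
    """Headroom over the target as a fraction (0.1 means 10% above target)."""
    return sheets_per_hour / target - 1


lab_figure = 3500  # hypothetical Shamim D result
with_reports_moved = lab_figure * (1 + gain_from_freeing(0.25))
print(f"{margin(lab_figure):.0%} headroom as-is")
print(f"{margin(with_reports_moved):.0%} headroom with reporting moved")
```

A 3500 result is only about 17% above target, which leaves little room for growth or bad days; freeing a quarter of the server would lift the same workload well clear of it.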

And there's nothing stopping us from helping GK improve report speed, even on a replicated machine.

Collection sheet throughput seems to be a good indicator, but that doesn't mean other pages won't be slow. However, these should be localized problems that can be 'fixed' individually rather than a 'systemic' scalability issue. Of course, such fixing and tuning will likely help both overall and collection sheet throughput. It's also possible that at certain times other processes, e.g. client creation, will be more critical than collection sheets.

What about non-mifos changes?

Hardware, Tomcat, MySQL and other changes will also have an impact, for example upgrading to MySQL 5.1, gzip compression, improved browser caching and moving reports to a replicated machine. Their effect on server throughput will show up in the transaction logs.

What About Batch?

At the moment the batch tasks ordinarily run until 4:30 or 5am (but can run longer). With 1 million clients they will probably run too long on V1.3 and 1.4. However, Gazelle C (v1.5.x) should stabilize batch run times until we can remove some of the tasks.

It is important to confirm in the performance lab that Gazelle C can run the batch tasks for 1 million clients within a batch window of about 6 hours.
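A rough way to see why the batch window is a concern is to scale today's batch duration linearly with client count. This is a minimal sketch: the 5-hour current duration assumes a hypothetical midnight start (this page only gives the finish time), and real batch jobs may scale better or worse than linearly.

```python
from datetime import timedelta


def projected_batch(current: timedelta, current_clients: int,
                    target_clients: int) -> timedelta:
    """Linearly scale an observed batch duration to a larger client base."""
    return current * (target_clients / current_clients)


current = timedelta(hours=5)  # hypothetical: midnight start, ~5am finish
window = timedelta(hours=6)
projected = projected_batch(current, 350_000, 1_000_000)
print(projected, "fits within" if projected <= window else "exceeds", window)
```

Under these assumptions the V1.3/1.4 batch would need about 14 hours, far beyond a 6-hour window, which is why the Gazelle C batch run at 1 million clients needs confirming in the lab.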

Note: adding holidays will probably cause batch problems until the Shamim D release.

From keith, Wed 31 Mar 2010: What about pre-caching collection sheets?

Can collection sheet reports be generated ahead of the time a printed report is needed, and saved until printed? Then, when a user requests the report, retrieving the sheet data over broadband would be nearly instantaneous. This would only work if the data for the sheet has been entered or generated well before it is printed, and I don't know enough about GK's workflow to say whether that is the case.
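Keith's idea amounts to a precomputed cache keyed by whatever identifies a printed sheet. A minimal sketch, assuming a sheet is identified by center and meeting date and must be regenerated whenever its underlying data changes; `SheetCache`, `build_sheet` and the key shape are hypothetical, not Mifos APIs.

```python
from datetime import date
from typing import Callable, Dict, Tuple

SheetKey = Tuple[str, date]  # hypothetical key: (center id, meeting date)


class SheetCache:
    """Holds collection sheets generated ahead of time, until printed."""

    def __init__(self, build_sheet: Callable[[str, date], str]) -> None:
        self._build = build_sheet
        self._sheets: Dict[SheetKey, str] = {}

    def pregenerate(self, center: str, meeting: date) -> None:
        """Run the expensive generation step ahead of time (e.g. off-peak)."""
        self._sheets[(center, meeting)] = self._build(center, meeting)

    def invalidate(self, center: str, meeting: date) -> None:
        """Drop a sheet whose underlying data changed after generation."""
        self._sheets.pop((center, meeting), None)

    def get(self, center: str, meeting: date) -> str:
        """Serve the cached sheet, generating it on demand if missing."""
        key = (center, meeting)
        if key not in self._sheets:
            self._sheets[key] = self._build(center, meeting)
        return self._sheets[key]
```

The `invalidate` step is where Keith's caveat bites: if sheet data keeps changing until just before printing, most requests fall through to on-demand generation and the cache buys little.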

KeithW Comments

I would point out that these observations are based on 'actual GK data' from 'request/transaction logging on the GK server'. We know the core areas that take up the largest share of performance:

  • Collection Sheet
  • Reporting

So I agree with prioritising getting the performance lab to produce figures for what each version (1.3+) is capable of for collection sheets, and 'marrying' these with the 'actual' figures we are seeing on GK's server.

  • It should be stressed that the data these tests run against is very important. It should mimic GK's usage of Mifos, i.e. no holidays, lots of account payments and GL transactions (basically all the joined tables), as well as the number of accounts/customers per collection sheet or center, etc.
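A lab data set could be described by a few shape parameters and checked against the 1 million client target before loading. The class, its fields and the example values below are hypothetical placeholders, to be replaced with figures taken from GK's own database.

```python
from dataclasses import dataclass


@dataclass
class LabDataProfile:
    """Hypothetical shape parameters for a GK-like performance-lab database."""
    centers: int
    clients_per_center: int
    accounts_per_client: float
    holidays: int = 0  # GK-like data: no holidays

    @property
    def clients(self) -> int:
        return self.centers * self.clients_per_center

    @property
    def accounts(self) -> int:
        return round(self.clients * self.accounts_per_client)


# Placeholder values, not GK figures.
profile = LabDataProfile(centers=25_000, clients_per_center=40,
                         accounts_per_client=1.5)
assert profile.clients >= 1_000_000, "data set is smaller than the target"
print(profile.clients, profile.accounts)
```

Keeping the shape explicit makes it easy to compare the lab database against GK's real distribution of clients and accounts per center.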

Reporting, as you say, can be partly addressed by moving to a replicated database on another server, or by getting hold of GK's reports and seeing whether they can be improved.