...
What are the criteria for determining when to upgrade hardware, etc.? What is the process for upgrading server hardware and software?
Disaster recovery and contingency planning and testing.
Problems occur in production, and you must be prepared for them. Review this checklist to make sure you have considered, DOCUMENTED, and TESTED your response to the problems that can occur.
Testing the Process
Testing should include:
- Letting the database backup scripts run, then performing a full restore from the result. Note the amount of time the process takes and which steps could be improved.
- Verifying that the O/S backup can be restored.
- Verifying all phone numbers on contact lists
- Noting the amount of time taken from outage start to full recovery.
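The timing notes called for in the list above can be captured automatically during a drill. The following is a minimal sketch, not a prescribed tool: the function names `run_timed_drill` and `format_report` and the placeholder steps are assumptions made for illustration, and each placeholder would be replaced by a call to your own restore script.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_timed_drill(steps: List[Step]) -> List[Tuple[str, float]]:
    """Run each recovery step in order and record its duration in seconds."""
    timings = []
    for name, action in steps:
        started = time.monotonic()
        action()  # an exception here stops the drill; that failure is worth documenting too
        timings.append((name, time.monotonic() - started))
    return timings


def format_report(timings: List[Tuple[str, float]]) -> str:
    """List steps slowest first, followed by the total elapsed time."""
    ordered = sorted(timings, key=lambda item: item[1], reverse=True)
    lines = [f"{name}: {seconds:.1f}s" for name, seconds in ordered]
    lines.append(f"TOTAL: {sum(seconds for _, seconds in timings):.1f}s")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical placeholder steps; swap the sleeps for your real restore procedures.
    drill = [
        ("restore O/S backup", lambda: time.sleep(0.02)),
        ("restore database backup", lambda: time.sleep(0.03)),
        ("run verification checks", lambda: time.sleep(0.01)),
    ]
    print(format_report(run_timed_drill(drill)))
```

Listing the slowest steps first points directly at the parts of the process most worth improving before the next test.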
Documenting the Process
- Compile a list of emergency contact information, including vendors and contacts at the MFI. (Who should be called in the event of an outage?)
- Instructions for restoring a system should be detailed and clear. Try to write them for a new user.
- Make sure a procedure exists for updating existing disaster recovery documentation as new scripts are added/removed. (Setting up a monthly email reminder to make these updates is often a good idea.)
- Good versus bad examples <we will document some examples of good versus bad documentation here in the future>
Problems and Responses
Types of problems that can occur, and how to prepare for them:
- Natural disasters (e.g. flooding of main office or branch offices), fires, and political emergencies
- OFF SITE DATA STORAGE: make sure backups of BOTH the database and the OS are stored at a location other than the head office. <link>
- Well-documented procedures for recovery steps <link>
- Contact list <link to template>
- Security breach of the Mifos server or act of sabotage by staff
- What are the processes for immediately changing passwords? Are they documented?
- What needs to be evaluated for your organization (check accounts, database evaluation)?
- Failure or loss of Mifos server, database and/or server disk storage
- Make certain scripts are running for database backups.
- Make certain scripts are running for OS backups
- Make certain backup and recovery procedures are DOCUMENTED IN DETAIL
- Restoring the OS
- Restoring the database
- Mifos configuration settings
- Any custom scripts that may be running
- Verification test plan (10-12 trials to run to ensure the system is functioning properly and is stable)
- Make certain all scripts are stored in a documented location and include instructions for recreating the production setup
- Loss of Internet access/power at main office or branch office
- What procedures need to be in place for working around the problem at a branch? Document and test.
- What procedures need to be in place at the Head Office in the event of extended power outage?
- Loss of key staff members
- Staff turnover is a fact of running an organization. Make sure all procedures, instructions, and important details are documented and available to newer staff members who may be forced to troubleshoot or carry out a recovery.
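One way to act on the "make certain scripts are running" items above is to check regularly that the newest backup file is recent. The sketch below assumes backups are written as files into a single directory. The directory `/var/backups/mifos`, the `*.sql.gz` pattern, and the 24-hour threshold are hypothetical values chosen for illustration, not Mifos defaults.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: Path, pattern: str = "*") -> Optional[float]:
    """Age in hours of the most recently modified matching file, or None if none exist."""
    if not backup_dir.is_dir():
        return None
    files = [path for path in backup_dir.glob(pattern) if path.is_file()]
    if not files:
        return None
    newest_mtime = max(path.stat().st_mtime for path in files)
    return (time.time() - newest_mtime) / 3600.0


def backup_is_fresh(backup_dir: Path, max_age_hours: float, pattern: str = "*") -> bool:
    """True only if at least one matching backup exists and is recent enough."""
    age = newest_backup_age_hours(backup_dir, pattern)
    return age is not None and age <= max_age_hours


if __name__ == "__main__":
    # Hypothetical location and naming scheme; adjust to your own backup scripts.
    target = Path("/var/backups/mifos")
    if backup_is_fresh(target, max_age_hours=24, pattern="*.sql.gz"):
        print("OK: a database backup was written in the last 24 hours")
    else:
        print("WARNING: no recent database backup found in", target)
```

A fresh file only shows that the backup script ran. It does not show that the backup can be restored, which is why the full restore test remains necessary.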
Documentation
All system administration and maintenance processes should be documented.
Hiring/staffing to support this function
Everyone should understand who is responsible for each function in this area. Ensure that each area is covered by the existing staff or new staff is hired/outsourced.
Troubleshooting and end-user support
- Processes for end user support and issue resolution
- Hiring/staffing to support this function
...
- Process for determining requirements for new reports <TO UPDATE>
- Process for having new reports added into Mifos