Cloud maintenance
Set up local machine
One-time setup to start working with the cloud. Two major tools need to be set up: the Amazon EC2/RDS command line tools and the Chef command line client. This guide assumes Ubuntu Desktop, Lucid release (10.04); later versions of Ubuntu might work as well.
EC2/RDS command line tools
- http://aws.amazon.com/developertools/2928?_encoding=UTF8&jiveRedirect=1
- http://aws.amazon.com/developertools/351
- http://docs.amazonwebservices.com/AmazonRDS/latest/CommandLineReference/
- shell environment variables: place these in your .bashrc so you don't have to set them up repeatedly (keys and credentials are in the vault):
export EC2_HOME=$HOME/ec2
export EC2_PRIVATE_KEY=`ls $EC2_HOME/pk-*.pem`
export EC2_CERT=`ls $EC2_HOME/cert-*.pem`
export EC2_REGION=us-east-1
export AMAZON_ACCESS_KEY_ID=<redacted>
export AMAZON_SECRET_ACCESS_KEY=<redacted>
- unzip the tools and add their bin directory to your PATH
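Before using the tools, it can help to confirm that the variables from the .bashrc snippet are actually set. The sketch below is a hypothetical helper (check_ec2_env is not part of the EC2 tools); it assumes bash and only checks that each variable is non-empty and that the key and certificate variables name existing files.

```shell
# Hypothetical helper (not part of the EC2 tools); requires bash for ${!var}.
# Prints one line per problem and returns non-zero if anything is wrong.
check_ec2_env() {
  local var rc=0
  # Every variable from the .bashrc snippet must be set and non-empty.
  for var in EC2_HOME EC2_PRIVATE_KEY EC2_CERT EC2_REGION \
             AMAZON_ACCESS_KEY_ID AMAZON_SECRET_ACCESS_KEY; do
    if [ -z "${!var:-}" ]; then
      echo "missing: $var"
      rc=1
    fi
  done
  # The key and certificate variables must name existing files.
  for var in EC2_PRIVATE_KEY EC2_CERT; do
    if [ -n "${!var:-}" ] && [ ! -f "${!var}" ]; then
      echo "not a file: $var=${!var}"
      rc=1
    fi
  done
  return "$rc"
}
```

Run check_ec2_env after sourcing ~/.bashrc. It also flags the case where the ls pattern matched more than one pk-*.pem or cert-*.pem file, since the variable then holds several names instead of one path.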
Chef
Chef provides configuration management of machines, starts/stops services when configuration changes, etc.
Set up Chef locally, create an account, and connect to your Chef server. We've been using the "Opscode console" (Opscode hosts our Chef server).
- http://wiki.opscode.com/display/chef/Package+Installation+on+Debian+and+Ubuntu: just install the "chef" package; chef-server is not needed since we use Opscode's hosted server
- http://help.opscode.com/kb/start/2-setting-up-your-user-environment (scroll down to section Create your Chef repository)
- the organization key is in the vault directory as mifos-validator.pem.cpt (same password as the password vault)
- when setting up your Chef environment, if you lost your client key or want to generate a new one, go to http://community.opscode.com/users/YOUR_USERNAME/ (replace YOUR_USERNAME with your username; log in again if you don't see the "get private key" link)
- create ~/.chef/knife.rb. Here's a template:
# Replace USERNAME, ORGANIZATION with yours
current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name "USERNAME"
client_key "#{current_dir}/USERNAME.pem"
validation_client_name "ORGANIZATION-validator"
validation_key "#{current_dir}/ORGANIZATION-validator.pem"
chef_server_url "https://api.opscode.com/organizations/ORGANIZATION"
cache_type 'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
# Customize as necessary. Mifos cookbooks are in the cloud git
# repository, under chef/cookbooks. Multiple paths are allowed.
cookbook_path ["#{ENV['HOME']}/git/mifos-cloud/chef/cookbooks"]
- Copy the keys and knife configuration you downloaded earlier into ~/.chef:
$ mkdir -p ~/.chef
$ cp USERNAME.pem ~/.chef
$ cp ORGANIZATION-validator.pem ~/.chef
$ cp knife.rb ~/.chef
- verify your connectivity:
knife node list
You should see a list of the nodes currently managed by Chef.
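If knife node list fails, a common cause is a missing file or an unreplaced placeholder in ~/.chef. The hypothetical helper below (check_chef_dir is not a knife command) is a sketch that checks for knife.rb and the two keys named in the template, and flags USERNAME/ORGANIZATION left on non-comment lines of knife.rb.

```shell
# Hypothetical helper: check a knife configuration directory.
# Usage: check_chef_dir DIR USERNAME ORGANIZATION
check_chef_dir() {
  local dir=$1 user=$2 org=$3 f rc=0
  # knife.rb plus the user key and the organization validator key.
  for f in knife.rb "$user.pem" "$org-validator.pem"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      rc=1
    fi
  done
  # Ignore comment lines (the template's first comment mentions both placeholders).
  if [ -f "$dir/knife.rb" ] &&
     grep -v '^[[:space:]]*#' "$dir/knife.rb" | grep -qE 'USERNAME|ORGANIZATION'; then
    echo "placeholder left in: $dir/knife.rb"
    rc=1
  fi
  return "$rc"
}
```

For example: check_chef_dir ~/.chef USERNAME ORGANIZATION && knife node list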
Knife hints
Cookbooks
List cookbooks that chef server knows about
knife cookbook list
Updating a cookbook
Cookbooks are stored in git, in the chef directory of the cloud repository. If you want to update a cookbook, BUMP THE VERSION NUMBER AND UPDATE/COMMIT/PUSH IN GIT FIRST, before sending it to the Chef server. Here are step-by-step instructions:
$ mkdir -p ~/git/mifos-cloud
$ git clone git://mifos.git.sourceforge.net/gitroot/mifos/cloud ~/git/mifos-cloud
$ cd ~/git/mifos-cloud
# update the version before doing anything else:
$ vi chef/cookbooks/<cookbook>/metadata.rb
# make your changes, then commit and push:
$ git add chef/cookbooks/<cookbook>
$ git commit
$ git push
# only then send it to the Chef server:
$ knife cookbook upload <cookbook you changed>
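Bumping the version is the step that is easiest to forget. As a sketch, assuming GNU sed (as on Ubuntu) and a metadata.rb with a single version "X.Y.Z" line, a hypothetical helper bump_cookbook_version could increment the last component:

```shell
# Hypothetical helper: bump the patch component of the version line in a
# cookbook's metadata.rb, e.g. version "0.3.9" -> version "0.3.10".
# Assumes one line of the form: version "X.Y.Z" (single or double quotes).
bump_cookbook_version() {
  local metadata=$1 old new
  old=$(sed -nE "s/^version[[:space:]]+[\"']([0-9]+\.[0-9]+\.[0-9]+)[\"'].*/\1/p" "$metadata")
  if [ -z "$old" ]; then
    echo "no version line found in $metadata" >&2
    return 1
  fi
  # Keep X.Y and increment Z (10# avoids octal parsing of e.g. "08").
  new="${old%.*}.$(( 10#${old##*.} + 1 ))"
  sed -i -E "s/^(version[[:space:]]+[\"'])$old([\"'])/\1$new\2/" "$metadata"
  echo "$old -> $new"
}
```

Run it from ~/git/mifos-cloud before making changes, e.g. bump_cookbook_version chef/cookbooks/<cookbook>/metadata.rb, and review the result with git diff.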
How to get the AMI of every node
knife search node "ec2:[* TO *]" -a ec2.ami_id
How to get both AMI and hostname of every node
knife exec -E 'nodes.all {|n| if n.name =~ /^i-/ then printf("%-45s\t%s\n" % [n.override.tomcat.vhost,n.ec2.ami_id]) end }'|sort
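When planning image upgrades, it helps to see how many nodes still run each AMI. The hypothetical helper below is a sketch that summarizes the tab-separated hostname/AMI lines printed by the knife exec one-liner above.

```shell
# Hypothetical helper: count nodes per AMI from "hostname<TAB>ami-id" lines on stdin.
nodes_per_ami() {
  awk -F'\t' 'NF >= 2 { count[$2]++ } END { for (ami in count) print ami, count[ami] }' | sort
}
```

Pipe the output of the knife exec command above into nodes_per_ami; each output line is an AMI id followed by its node count.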
How to get what's installed for each MFI
You will need the amazon-ec2 gem (e.g. sudo gem install amazon-ec2) and a working knife setup. Invoke maint/state.rb like so:
knife exec state.rb
How to change Pentaho to run reports and ETL against an RDS replica
If you wish to use the RDS instance for Mifos and Pentaho, ignore this section.
1. Set up an RDS replica.
2. Edit the MFI's role:
knife role edit mifos_MFI
3. Edit override_attributes.pentaho.mifos_database_replica_host (adding this setting if it is not already there). "null" means fall back to override_attributes.mifos.database_host, and is the same as omitting override_attributes.pentaho.mifos_database_replica_host (see cookbooks/pentaho/recipes/default.rb in the cloud repo for details).
NOTE: nothing maintains Pentaho's database (e.g. "MFISHORTNAME_prod_hib"), so its "SourceDB" data source must be changed manually (see "Changing database hosts" under MySQL/RDS maintenance).
NOTE: data sources in BIRT reports must be maintained manually, separately.
NOTE: data sources in Jasper Servers must also be maintained manually, separately.
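The fallback rule in step 3 can be illustrated with a small shell sketch. It assumes the two attribute values have already been read out of the role (for example from knife role show output); the authoritative logic is in cookbooks/pentaho/recipes/default.rb, and pentaho_db_host is a hypothetical name.

```shell
# Sketch of the replica fallback rule: use the replica host unless it is
# empty/omitted or "null", otherwise use the Mifos database host.
# Usage: pentaho_db_host REPLICA_HOST MIFOS_DATABASE_HOST
pentaho_db_host() {
  local replica=$1 mifos_host=$2
  if [ -z "$replica" ] || [ "$replica" = "null" ]; then
    echo "$mifos_host"
  else
    echo "$replica"
  fi
}
```

In other words, Pentaho only reads from a separate host when the replica attribute holds a real hostname.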
Starting a new mifos/pentaho instance
- if converting an MFI from the old infrastructure:
- Stop the Mifos instance
- Dump the database
- Copy the uploads and config from MIFOS_CONF
- Create security group in EC2 (AWS UI)
- SSH "gateways" setup/info
- allows us to limit points of entry for our hosted machines
- note the hosts in the ec2-authorize commands below; the gateways are currently birch.mifos.org (actually the whole Seattle GTC) and cloudboss.mifos.org.
- add to your .ssh/config (substituting something meaningful for MFINAME):
Host *MFINAME.mifos.org
  ProxyCommand ssh birch.mifos.org exec /bin/nc %h %p
- one-time setup for the EC2 firewall (security groups)
- manually replace EC2_ACCOUNT_NUMBER with the 12-digit account number shown in the AWS console
- SSH is allowed via the gateways only. Also, images will change constantly, so you WILL often see "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED". If you want, you can verify the host key with ec2-get-console-output or via the EC2 web UI.
- ports 18980-18981 are for monitoring JMX over RMI via OpenNMS
- create the security group first via the web UI or CLI
#!/bin/bash
# login through web console and create new security group, e.g. green-grameen
set -ex
SEC_GROUPS="digamber light-microfinance rise keef"
EC2_ACCOUNT_NUMBER=000000000000
for SEC_GROUP in ${SEC_GROUPS}
do
  # SSH (22) from the gateway addresses only
  ec2-authorize -P tcp -p 22-22 -s 75.149.167.24/32 ${SEC_GROUP}
  ec2-authorize -P tcp -p 22-22 -s 10.252.50.116/32 ${SEC_GROUP}
  ec2-authorize -P tcp -p 22-22 -s 184.72.240.48/32 ${SEC_GROUP}
  # HTTPS and HTTP from anywhere
  ec2-authorize -P tcp -p 443-443 -s 0.0.0.0/0 ${SEC_GROUP}
  ec2-authorize -P tcp -p 80-80 -s 0.0.0.0/0 ${SEC_GROUP}
  # JMX over RMI for OpenNMS
  ec2-authorize -P tcp -p 18980-18981 -s 10.252.50.116/32 ${SEC_GROUP}
  ec2-authorize -P tcp -p 18980-18981 -s 184.72.240.48/32 ${SEC_GROUP}
  # SNMP (161)
  ec2-authorize -P udp -p 161-161 -s 10.252.50.116/32 ${SEC_GROUP}
  ec2-authorize -P udp -p 161-161 -s 184.72.240.48/32 ${SEC_GROUP}
  ec2-authorize -P tcp -p 161-161 -s 10.252.50.116/32 ${SEC_GROUP}
  ec2-authorize -P tcp -p 161-161 -s 184.72.240.48/32 ${SEC_GROUP}
  # ICMP (all types)
  ec2-authorize -P icmp -t -1:-1 -s 10.252.50.116/32 ${SEC_GROUP}
  ec2-authorize -P icmp -t -1:-1 -s 184.72.240.48/32 ${SEC_GROUP}
  # allow instances in this group to reach the "ldap" security group
  ec2-authorize -o ${SEC_GROUP} -u ${EC2_ACCOUNT_NUMBER} ldap
done
- Create RDS security group
- Authorize the MFI's EC2 security group
- Authorize the default EC2 security group temporarily to make importing the existing database more straightforward; remove it after importing
- Create RDS instances as db.m1.small initially
- MySQL v5.1.50
- enable auto minor version upgrade
- allocate 10GB
- use the MFI long name for the MySQL instance (e.g. "rise", "secdep")
- the initial user/pass can be anything simple; it will be changed later
- leave Database Name blank
- DB Parameter Group: "mifoscloud"
- backup retention period: 8 days (best for PITR/binlogs)
- backup window: 1600-1700 UTC (good for India/Philippines/Africa)
- maintenance window: Saturday 1700-1800 UTC
- example:
Engine: mysql
Engine Version: 5.1.50
Auto Minor Ver. Upgrade: Yes
DB Instance Class: db.m1.small
Multi-AZ Deployment: Yes
Allocated Storage: 10
DB Instance Identifier: rise
Master User Name: mifos
Master User Password: mifos
Database Name:
Database Port: 3306
Availability Zone: (Using a Multi-AZ Deployment disables this preference.)
DB Parameter Group: mifoscloud
DB Security Group(s): rise
Backup Retention Period: 8
Backup Window: 16:00-17:00
Maintenance Window: Saturday 17:00-Saturday 18:00
- Create Chef roles: base + test + prod + optional MFI-specific recipe
- look at an existing role:
knife role show mifos_digamber
- create a new role:
knife role create mifos_rise
knife role create mifos_rise_test
(look at mifos_digamber_prod, mifos_digamber_test for examples)
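- Create 2 EBS volumes (for storing uploads), one for test and one for prod. Create each volume in the same availability zone as the instance it will be attached to, since an EBS volume can only be attached to an instance in its own AZ.
for testing:
ec2-create-volume --snapshot snap-5abd2f36 -s 1 -z us-east-1d
ec2-create-tags -t Name=testing-digamber <vol-id>
for prod:
ec2-create-volume --snapshot snap-5abd2f36 -s 1 -z us-east-1a
ec2-create-tags -t Name=digamber <vol-id>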
- Update DNS if required; this is typically required for a new MFI.
- allocate an elastic IP
- run nslookup on the IP to get its hostname
- create a CNAME in the Network Solutions management console (use a CNAME instead of an A record so that inter/intra-AZ data transfer is charged at the lower rate)
- Create EC2 instances
- get the ami-id from the Hudson job: https://ci.mifos.org/hudson/view/cloud/job/cloud-mifos-image
- latest ami-id for 2.0.2: ami-8a8d7fe3 (see the end of https://ci.mifos.org/hudson/view/cloud/job/cloud-mifos-image/44/console)
ec2-run-instances ami-8a8d7fe3 --instance-type m1.small -z us-east-1d -d '{ "run_list": ["role[ldapclient]", "role[base]" ] }' --disable-api-termination -g rise
ec2-create-tags -t Name=testing-rise.mifos.org -t Service=Mifos INSTANCE_ID
- on boot, the node will add itself to the Chef server (see rc.local and imaging/create_image.py in the cloud source code)
- make sure you can log in via SSH. If you can't, fetch the console output (you may have to do this from time to time):
ec2-get-console-output INSTANCE_ID
- attach the EBS volume:
ec2-attach-volume -i INSTANCE_ID -d /dev/sdc1 vol-ee7e2386
- add "role[mifos_rise_test]" to the node's run list, either with
knife node edit INSTANCE_ID.mifos.org
(edit the run_list section) or through http://manage.opscode.com
- log into the box and run
sudo chef-client
to see the change immediately, or wait 30 minutes or so
- set up the Mifos and Pentaho databases:
sudo /etc/pentaho/system/mifos_pentaho_init.sql -u mifos -pmifos
- change the mifos password via the AWS web UI (modify the RDS instance and put a password generated with, for example, apg in the "Master User Password" field)
- add backup jobs to BackupPC
- when adding a new backup host, use the NEWHOST=COPYHOST syntax mentioned on the "edit hosts" page
- add monitoring of the box to OpenNMS
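Several steps above derive names from the MFI long name. The hypothetical helper below collects the conventions visible in the examples (rise, digamber) in one place; it is a sketch, and the prod hostname <mfi>.mifos.org is taken from the LDAP section, so check the output against an existing MFI before relying on it.

```shell
# Hypothetical helper: print the names used by the steps above for one MFI.
# Usage: mfi_names LONG_NAME
mfi_names() {
  local mfi=$1
  cat <<EOF
rds_instance=$mfi
security_group=$mfi
chef_role=mifos_$mfi
chef_role_test=mifos_${mfi}_test
volume_tag_prod=$mfi
volume_tag_test=testing-$mfi
host_prod=$mfi.mifos.org
host_test=testing-$mfi.mifos.org
EOF
}
```

Because the output is key=value lines, eval "$(mfi_names rise)" sets shell variables such as $host_test that you can paste into the ec2-create-tags and knife commands.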
Monitoring systems
- Monitoring server: https://cloudboss.mifos.org/opennms/ (also see Monitoring Servers With OpenNMS)
- overview of what's running where and current status; also see "knife status --run-list"
- outage notifications are sent to the mifos-adm Google group
Disaster recovery
Database
Most persistent data is stored in RDS, so it is highly available: it is replicated synchronously in two availability zones. However, it is certainly possible to lose an entire region, e.g. due to a natural disaster. In addition to relying on Multi-AZ functionality, we also save and encrypt a daily full mysqldump to cloudboss (in the us-east-1b AZ) here: https://cloudboss.mifos.org/cloud
In the event of a disaster, you would need to download and decrypt the snapshot, create a new RDS instance, and follow the instructions that apply when migrating an MFI from the old infrastructure.
Point in time recovery
RDS supports point in time recovery (PITR). We configure each RDS instance to keep up to 8 days of PITR logs. However, in our trial runs this feature was very slow (a restore took 9 hours or so). As an alternative, you can use the mysqldump snapshots described above.
Front-end
The application server (Tomcat/Jetty), i.e. what clients hit.
These steps apply when an AZ becomes unavailable or hardware fails.
- Identify which situation you are in by checking whether other nodes in the same AZ are available.
- If it is a hardware failure, simply launch a new instance with the appropriate AMI, add it to the Chef config, remap the elastic IP, mount the volumes, etc.
- If an entire AZ is down:
- create new volumes in an alternate AZ and retrieve the uploads, custom reports, etc. from BackupPC
- only use the "tar download" restore method, and only for the /etc/mifos/uploads directory: download the tar to your local machine, then copy it to the remote host and untar it as user "tomcat6"
- relaunch each front end into an alternate AZ, add it to the Chef config, remap the elastic IPs, mount the new volumes, etc.
If you manually stop Mifos (for example, during a restore of /etc/mifos/uploads), Chef will automatically restart it. To temporarily disable this behavior, run
sudo service chef-client stop
and then run
sudo service chef-client start
when you're finished.
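To avoid forgetting the final start, the stop/start pair can be wrapped around whatever restore command you run. The function below is a hypothetical sketch; CHEF_SERVICE_CMD is an assumed override hook (defaulting to "sudo service") that lets you dry-run it with echo.

```shell
# Hypothetical helper: run a command with chef-client stopped, and start
# chef-client again afterwards even if the command fails.
# Set CHEF_SERVICE_CMD=echo for a dry run.
with_chef_client_stopped() {
  local service_cmd="${CHEF_SERVICE_CMD:-sudo service}"
  # Unquoted on purpose: "sudo service" splits into two words.
  $service_cmd chef-client stop || return 1
  local status=0
  "$@" || status=$?
  $service_cmd chef-client start
  return "$status"
}
```

For example, with_chef_client_stopped ./restore-uploads.sh, where restore-uploads.sh stands for your own restore steps; the function returns that command's exit status.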
Statefiles
Statefiles are lists of specific versions of packages to be included in images. They are kept in the statefiles/ dir in the "cloud" git repo.
- updated from ci periodically (right now * */3 * * *)
- committed/tagged/pushed to the "cloud" git repo at sf.net if there is a change (monitor the commit logs to be notified of changes)
- the tag contains the build number and job name
Image maintenance
When upgrading machines, be sure to schedule outages.
Upgrades for new Mifos versions
If a new point release of Mifos comes out (e.g. 2.1.9):
- modify the AMI generation script for that Mifos release to use the new point release version. (We would modify imaging/mifos_2_1_bi_1_2.sh to get the 2.1.x Mifos version along with BI 1.2).
- update the "mifosversion" variable in the script to be 2.1.9 (commit and push)
- re-run the Hudson job "cloud-mifos_2_1-bi_1_2-image" to create a new AMI with the updated Mifos war (the name of the new AMI will be in the console log output of the Hudson job).
- follow the groovy script usage below using the new AMI generated in the previous step.
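The "mifosversion" edit can be scripted. This is a sketch that assumes the imaging script assigns the variable at the start of a line (mifosversion=2.1.8 or mifosversion="2.1.8") and that GNU sed is available; set_mifos_version is a hypothetical helper, so check the real script's format first.

```shell
# Hypothetical helper: set mifosversion in an imaging script, keeping any quotes.
# Usage: set_mifos_version imaging/mifos_2_1_bi_1_2.sh 2.1.9
set_mifos_version() {
  local script=$1 version=$2
  if ! grep -qE '^mifosversion=' "$script"; then
    echo "no mifosversion= line in $script" >&2
    return 1
  fi
  # Replace the numeric version that follows "mifosversion=" (and an optional quote).
  sed -i -E "s/^(mifosversion=[\"']?)[0-9][0-9.]*/\1$version/" "$script"
  # Show the resulting line for review.
  grep -E '^mifosversion=' "$script"
}
```

After running it, commit and push the script as in the steps above before re-running the Hudson job.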
Upgrades for security/features
- Statefiles with lists of latest packages are created periodically (see above).
- An administrator must keep track of security releases in upstream Ubuntu packages.
- A groovy script is available to move a customer from one image to the next (for an upgrade or just a security update). Only use this script if the MFI deployment for the environment already exists (volumes created, elastic IPs associated, etc.). Invoke it like so:
groovy upgrade.groovy <mfi e.g. rise> <environment testing|prod> <ami id>
It will then prompt you to confirm before continuing. You should check the AMI path, calculated hostnames, etc. There will be roughly 5 minutes of downtime to change the image (as long as there is no major Mifos upgrade requiring database migrations).
This script requires Groovy 1.7 or later. It also needs the following shell variables to be defined:
export AMAZON_ACCESS_KEY_ID=xyz
export AMAZON_SECRET_ACCESS_KEY=xyz
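Since upgrade.groovy needs Groovy 1.7 or later, checking the installed version first can save a confusing failure. The function below is a hypothetical sketch that compares dotted numeric versions component by component; non-numeric suffixes such as -beta are not handled.

```shell
# Hypothetical helper (bash): succeed if dotted version $1 >= $2.
# Missing components count as 0, so 1.7 == 1.7.0.
version_at_least() {
  local IFS=.
  local -a have=($1) want=($2)
  local i
  for ((i = 0; i < ${#want[@]}; i++)); do
    local h=${have[i]:-0} w=${want[i]}
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}
```

Pass it the version number reported by groovy -v, e.g. version_at_least 1.7.10 1.7 && echo "Groovy is new enough".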
- Manual testing to do after changing the image
- Enter a collection sheet balance and ensure you can go all the way to the preview state
- Search for '%' and make sure you see everything you saw before upgrading
- Try rendering a few BIRT reports
- Try rendering a few Pentaho reports
- Notify the customer if upgrading the Mifos version; note these links
TODO: how to move a customer to a newer image, what ad hoc/manual tests to perform after bouncing their servers, how/when to notify customers of the change(s)
When using the upgrade script, use the "Long Name" from below for the MFI argument:
Long Name | Short Name
---|---
secdep | sec
rise | ris
light-microfinance | lmf
digamber | dig
keef | kee
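The table above and the Pentaho database naming used under "Changing database hosts" (e.g. sec_prod_hib) can be combined in a lookup. The functions below are a hypothetical sketch; only the five MFIs listed in the table are known to them.

```shell
# Hypothetical helper: map an MFI long name to its short name (table above).
mfi_short_name() {
  case "$1" in
    secdep) echo sec ;;
    rise) echo ris ;;
    light-microfinance) echo lmf ;;
    digamber) echo dig ;;
    keef) echo kee ;;
    *) echo "unknown MFI: $1" >&2; return 1 ;;
  esac
}

# Pentaho database name, e.g. pentaho_db_name secdep prod -> sec_prod_hib
pentaho_db_name() {
  local short
  short=$(mfi_short_name "$1") || return 1
  echo "${short}_${2}_hib"
}
```

When adding a new MFI, add its row to both the table and the case statement.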
Image production
Image production ci jobs are manually kicked off since there is a cost associated with storing Amazon EC2 images. Fire off the cloud-mifos-image job on the ci server to create a new image.
LDAP
See also: /wiki/spaces/MIFOSADMIN/pages/8552788 (login to mifosforge.jira.com required).
Pre-requisites
- Log into a machine already in ldap or ldap.mifos.org
- Create ~/.ldaprc and copy the following lines into it if they are not already there:
TLS_CERT /etc/ssl/certs/ldap.crt
TLS_KEY /etc/ssl/private/ldap.key
- The passphrase required when invoking the commands below can be found in vault.
- Admin accounts are added to group ID 10000 and get sudo on all LDAP clients; automation accounts (e.g. BackupPC accounts) are added to group ID 11000; "regular"/non-admin users are added to 12000.
Adding a new user
You can use the following script to help generate the LDIF-formatted input (it sets gidNumber to 10000, i.e. an admin account; change it for other account types):
#!/bin/sh
# ./addnewuser.sh johndoe John Doe <UID> johndoe@grameenfoundation.org <secret>
cat << EOF
dn: uid=$1,ou=people,dc=mifos,dc=org
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
uid: $1
sn: $3
givenName: $2
cn: $2 $3
displayName: $2 $3
uidNumber: $4
gidNumber: 10000
gecos: $2 $3
loginShell: /bin/bash
homeDirectory: /home/$1
userPassword: $6
shadowExpire: -1
shadowFlag: 0
shadowWarning: 7
shadowMin: 8
shadowMax: 999999
shadowLastChange: 10877
mail: $5
EOF
then invoke it like:
./addnewuser.sh johndoe John Doe <UID> johndoe@grameenfoundation.org <secret> | ldapadd -x -W -D cn=admin,dc=mifos,dc=org -h ldap.mifos.org -ZZ
If the user "johndoe" already exists, ldapadd will fail with an error; however, a duplicate UID number will NOT generate an error, so make sure the UID is not already in use. If the command completes successfully, you can log into an LDAP client (e.g. <mfi>.mifos.org), run getent passwd, and see the newly added entry.
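Because ldapadd does not reject a duplicate UID number, it is worth checking before choosing one. The two hypothetical helpers below are a sketch that reads passwd-format lines (name:passwd:uid:gid:gecos:home:shell), as printed by getent passwd on an LDAP client, on stdin.

```shell
# Hypothetical helper: succeed if the given UID number is already assigned.
uid_taken() {
  awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }'
}

# Hypothetical helper: print one more than the highest UID at or above MIN,
# or MIN itself if no UID in that range is assigned.
next_free_uid() {
  awk -F: -v min="$1" '$3 >= min && $3 > max { max = $3 } END { print (max ? max + 1 : min) }'
}
```

For example: getent passwd | uid_taken <UID> && echo "UID already in use", or getent passwd | next_free_uid <lowest UID you use> to pick a fresh one.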
You will need to add an authorized_key for the user in the chef config:
knife data bag create authorized_keys <user_id>
with the item contents:
{ "id": "<user id>", "key": "ssh-rsa <key content>" }
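To avoid typos in the data bag JSON, the item can be generated from the user's public key file. This is a hypothetical helper (authorized_key_json); as a sketch, it takes the first line of the key file and refuses keys containing characters that would need JSON escaping.

```shell
# Hypothetical helper: print an authorized_keys data bag item.
# Usage: authorized_key_json USER_ID PUBLIC_KEY_FILE
authorized_key_json() {
  local user=$1 keyfile=$2 key
  if [ ! -f "$keyfile" ]; then
    echo "no such key file: $keyfile" >&2
    return 1
  fi
  # First line only, without any trailing CR/LF.
  key=$(head -n 1 "$keyfile" | tr -d '\r\n')
  case "$key" in
    *'"'*|*'\'*) echo "key contains characters that need JSON escaping" >&2; return 1 ;;
  esac
  printf '{ "id": "%s", "key": "%s" }\n' "$user" "$key"
}
```

For example, authorized_key_json johndoe johndoe.pub prints the JSON to use as the item contents.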
Deleting a user
If you make a mistake you can delete the entry with the following:
ldapdelete -x -W -D cn=admin,dc=mifos,dc=org -h ldap.mifos.org -ZZ 'uid=johndoe,ou=people,dc=mifos,dc=org'
Also, you should replace the key in their data bag item with an invalid one; for example, a disabled user's item looks like this:
knife data bag show authorized_keys jbrewster
{ "id": "jbrewster", "key": "ssh-rsa DISABLED" }
Searching
You can also search the ldap db with the following
ldapsearch -LLL -x -W -D cn=admin,dc=mifos,dc=org -h ldap.mifos.org -ZZ 'cn=*Jeff*'
The last argument, 'cn=*Jeff*', can be adjusted based on which field you want to search on.
Resetting a password
The following script will generate the ldif formatted data to feed into ldapmodify
#!/bin/sh
# ./reset.sh johndoe <secret>
cat << EOF
dn: uid=$1,ou=people,dc=mifos,dc=org
changetype: modify
replace: userPassword
userPassword: $2
EOF
then you can invoke it like so:
./reset.sh johndoe <THE NEW PASSWORD> | sudo ldapmodify -x -W -D cn=admin,dc=mifos,dc=org -h ldap.mifos.org -ZZ
At this point you'll need to enter the LDAP master password (from the vault).
SECDEP Specific Configuration
The Secdep MFI has one additional detail that is not managed via Chef: it has a Jasper reporting server (reports.mifos.org) that references the Mifos data sources. When changing database hosts, etc., you MUST verify that the Jasper reports continue to work. To update a data source:
1. login to http://reports.mifos.org and click on
2. click on the second drop down in the "Refine" section and select "Data Sources"
3. Right click on the "Secdep Prod Mifos DB" datasource and click "Edit"
4. Update the credentials to the new host.
5. If the database host is remote, it must use SSL to connect; this can be enforced with the following URL params: verifyServerCertificate=true&useSSL=true&requireSSL=true
6. If the connection fails verification after you update the credentials and click "Test Connection", check with the mysql CLI that you can connect, e.g. mysql -h <host> -u <user> -p --ssl --ssl-ca=<path to ca cert pem> --ssl-verify-server-cert. If that succeeds, most likely the Java truststore has not been updated with the CA cert. You can trust it with this command:
/home/mifosadmin/jasper_oss/jasperserver-ce-3.7.0/java/bin/keytool -import -file ca-cert.pem -alias ca-cert -keystore mysql-ssl-ca-cert.ts -storepass <store pass>
7. Restart jasper if the trust store has been updated.
sudo service secdepjasper restart
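When editing the data source URL in step 5, the SSL parameters have to be appended to the JDBC URL. As a sketch, a hypothetical helper that assembles such a URL (port 3306 is assumed as the default, as in the Pentaho URLs elsewhere on this page):

```shell
# Hypothetical helper: MySQL JDBC URL with the SSL parameters from step 5.
# Usage: mysql_ssl_jdbc_url HOST DATABASE [PORT]
mysql_ssl_jdbc_url() {
  local host=$1 db=$2 port=${3:-3306}
  echo "jdbc:mysql://${host}:${port}/${db}?verifyServerCertificate=true&useSSL=true&requireSSL=true"
}
```

Paste the output into the data source's URL field, then use "Test Connection".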
Updating Jasper reports
1. Log into jasper
2. Right click on report, click "Edit"
3. Click next to update JRXML files
4. Browse for reports here: http://mifos.git.sourceforge.net/git/gitweb.cgi?p=mifos/documents;a=tree;f=deployment/SECDEP/Jasper+Reports;h=f80156caba0e3d03900bda404662e99b96711bfd;hb=HEAD
Note: for the PCFC report, you need to select each JRXML file.
MySQL/RDS maintenance
Changing database hosts
If for some reason the database coordinates need to be changed, then the following steps need to be taken:
- Update the role (usually mifos_<MFI>)
- run chef-client on each host to see the changes immediately
- update uploaded reports in /etc/mifos/uploads/reports with something like: sudo find /etc/mifos/uploads/reports -type f -exec sed -i -e 's/secdep-db.mifos.org/secdep.cz2a1vveusgo.us-east-1.rds.amazonaws.com/g' {} \;
- update the Pentaho data sources in <mfi_shortname>_<environment>_hib (e.g. sec_prod_hib) with queries similar to:
for DestinationDB:
update DATASOURCE set URL = 'jdbc:mysql://<HOST>:3306/sec_prod_mifos_dw?useUnicode=true&characterEncoding=UTF-8' where NAME='DestinationDB';
and for SourceDB:
update DATASOURCE set URL = 'jdbc:mysql://<HOST>:3306/sec_prod_mifos?useUnicode=true&characterEncoding=UTF-8' where NAME='SourceDB';
Remember to update both the prod and test environments!
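The two UPDATE statements differ only in database name and data source NAME, so they can be generated per environment. This sketch assumes database names follow the <short>_<env>_mifos and <short>_<env>_mifos_dw pattern of the sec_prod example; check the test environment's actual names before running the output. pentaho_datasource_sql is a hypothetical helper.

```shell
# Hypothetical helper: print the DestinationDB and SourceDB updates.
# Usage: pentaho_datasource_sql SHORT_NAME ENVIRONMENT DB_HOST
pentaho_datasource_sql() {
  local short=$1 env=$2 host=$3
  local base="${short}_${env}_mifos"
  local params="useUnicode=true&characterEncoding=UTF-8"
  cat <<EOF
update DATASOURCE set URL = 'jdbc:mysql://${host}:3306/${base}_dw?${params}' where NAME='DestinationDB';
update DATASOURCE set URL = 'jdbc:mysql://${host}:3306/${base}?${params}' where NAME='SourceDB';
EOF
}
```

Review the printed statements, then run them with the mysql client against the matching _hib database (e.g. sec_prod_hib), once per environment.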
Growing a database
Log into the RDS console at https://console.aws.amazon.com/rds/, right-click on the instance, click "Modify", and adjust the allocated storage appropriately. The database will be down while it resizes, so plan accordingly when you decide to increase the size.
Changing master user password
Do this from the AWS console. Make sure you check the box next to "Apply Immediately", or you may have to wait some amount of time (maybe a few minutes) before your changes are applied.
ETL configuration
An ETL job copies data from the Mifos OLTP database to the data warehouse. See these files in the mifos-cloud repository:
chef/cookbooks/pentaho/recipes/default.rb
chef/cookbooks/pentaho/templates/default/data_integration.sh.erb
Adding a new package to be part of image
Update create_mifos_state.sh: add the package to the PACKAGES variable.
TODO
- deploy/migration procedure (from old EC2 setup to new cloud images)
- document
- practice
- new disaster recovery procedure
- document
- practice
- document new system
- architecture
- persistent data
- configuration
- separate setup/initial instructions from specific maintenance procedures
- don't need step: "clone chef repo"
- commit cookbooks into the cloud repo
- DONE 2011-03-14 by Adam
- script to wrap ec2-run-instances
- keep trying to attach volume until succeeds immediately after running "ec2-run-instances"
- nevermind... changed procedure so mifos_X role isn't included
- be able to generate both 2.0 and 2.1 images etc.
- aka branches, parallel development so release maintenance and new development can occur (currently we just have one Mifos 2.0 image, cloud/master can't handle branched development. Statefile probably needs to be maintained outside of the "cloud" repo)
- groovy script to reimage a box (for instance, for a security upgrade)
- do a couple of test restores
- we can't currently pin recipe/role versions in a role run_list
- chef client crashes!
Trash
- Clone chef repo
$ cd ~
$ git clone http://github.com/opscode/chef-repo.git
$ mkdir -p ~/chef-repo/.chef