===== Fixing various Assistment-related problems. =====

**Note well, as10 and as11 are no longer used for the assistment project's production.** Assistments production uses the machine as12, primarily, and as2 as a small NFS server of image files. We are attempting to divest of as2 due to its age. --mvoorhis 25 Jan 2014.

==== Mongrels -- restart technique ====

  Message-Id: <20090904181823.44508AF1DE@eressea>
  Subject: procedure for restarting MONGREL on as10/as11
  Date: Fri, 4 Sep 2009 14:18:23 -0400 (EDT)

Assistment not working? Check the MONGREL processes on as10/as11.

Check individual machines by going to:

  * http://as10.cs.wpi.edu:3000/
  * http://as11.cs.wpi.edu:3000/

**NOTE WELL: as of early October 2011, this checking technique appears to fail all the time. I have asked the assistment folks what the technique is, to check for positive assistment-mongrel function. At this point, if http://www.assistments.org/ works, I guess all is well?? --mvoorhis 8 October 2011**

If the machine does not respond here, then its mongrel is dead and needs to be restarted.

  * login to as10/11
  * become MONGREL user (command: sudo su - mongrel)
  * cd /var/www/assistment/current
  * ./stop.sh
  * check: are the mongrel ruby jobs still there? If they are still there, kill -9 the mongrel-ruby jobs
  * cd ../shared/pids
  * rm * [i.e., nuke all pid files]
  * cd ../../current
  * ./start.sh

Mongrel should be running again; verify by connecting to the machine's port 3000 with a web browser.

BOTH as10 and as11 need to be good for a connection to http://assistments.org/ to function correctly (???).

A script to restart mongrel shouldn't be concerned with the reason for the restart, or with being clean or polite to the currently running mongrel processes. Kill them, nuke the PID files and restart. (Ideally this script would also check WHY the server died?)
  cd /var/www/assistment/current
  # kill every mongrel_rails process owned by the mongrel user
  kill -9 `ps auxw | grep '^mongrel.*mongrel.rails.start' | grep -v grep | awk '{print $2}'|xargs`
  # nuke the stale pid files, then start fresh
  rm -f /var/www/assistment/shared/pids/*
  sh ./start.sh

After this, perhaps a check to see if mongrel started:

  #!/bin/sh
  # check to see if we've got some mongrels running.
  # if there are more than zero we are happy;
  # if there are zero we are not happy.
  # (grep -v grep keeps the grep process itself out of the count.)
  num=`ps auxw | grep '^mongrel.*mongrel.rails.start' | grep -v grep | wc -l`
  if [ $num -gt 0 ]
  then
    echo "$num mongrels are running; good."
  else
    echo "zero mongrels are running; bad."
  fi

==== as7-11 lockup ====

==== disks filling ====

/var on as9 fills up pretty regularly, but apparently this doesn't matter?

/tmp on as11 appears to be filling up more commonly now (October 2011). I sent an email to the assistments people asking what the fix for this would be, since killing off mongrels and memcache and restarting them did NOT appear to cause http://as11.cs.wpi.edu:3000/ to become available again.
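
A quick way to see how full /var and /tmp are, in the same spirit as the mongrel check above. This is only a sketch: the 90% threshold, the function names usage_pct and check_mount, and the list of mount points are assumptions, not anything the assistments people have confirmed. It only reports; it does not clean anything up.

```shell
#!/bin/sh
# Sketch: warn when a filesystem is nearly full.
# THRESHOLD (percent) is an assumed value; adjust to taste.
THRESHOLD=${THRESHOLD:-90}

# Print the capacity figure (digits only) for the filesystem holding $1.
# df -P gives the portable one-line-per-filesystem format; column 5 is "Capacity".
usage_pct() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report on one mount point; return 1 if it is at or above THRESHOLD
# or if its usage could not be read.
check_mount() {
    pct=`usage_pct "$1"`
    case "$pct" in
        ''|*[!0-9]*)
            echo "$1: could not read usage; check by hand."
            return 1
            ;;
    esac
    if [ "$pct" -ge "$THRESHOLD" ]
    then
        echo "$1 is ${pct}% full; bad."
        return 1
    fi
    echo "$1 is ${pct}% full; good."
    return 0
}

status=0
# /var and /tmp are the two filesystems mentioned in these notes.
for m in /var /tmp
do
    check_mount "$m" || status=1
done
[ "$status" -eq 0 ] || echo "at least one filesystem needs attention."
```

Run it by hand on the affected machine, or from cron if the filling becomes regular. A different limit can be passed in the environment, e.g. THRESHOLD=80 before the script name.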