assistments_troubleshooter

Note well, as10 and as11 are no longer used for the assistment project's production.

Assistments production runs primarily on the machine as12, with as2 serving as a small NFS server of image files. We are trying to retire as2 because of its age.

–mvoorhis 25 Jan 2014.

Mongrels -- restart technique

Message-Id: <20090904181823.44508AF1DE@eressea>
Subject: procedure for restarting MONGREL on as10/as11
Date: Fri,  4 Sep 2009 14:18:23 -0400 (EDT)

Assistment not working? Check MONGREL processes on as10/as11.

Check individual machines by going to:

NOTE WELL: as of early October 2011, this checking technique appears to fail every time. I have asked the assistment folks what the right technique is for confirming that the assistment mongrels are working. At this point, if http://www.assistments.org/ works, I guess all is well?? –mvoorhis 8 October 2011

If the machine does not respond here, then its mongrel is dead and needs to be restarted.

  • log in to as10/as11
  • become the MONGREL user (command: sudo su - mongrel)
  • cd /var/www/assistment/current
  • ./stop.sh

[check: are the mongrel ruby jobs still there?] If they are still there, kill -9 the mongrel-ruby jobs. Then nuke all pid files and start again:

cd ../shared/pids
rm *
cd ../../current
./start.sh

Mongrel should be running again; verify by connecting to the machine's port 3000 with a web browser.
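
The browser check can also be done from a shell. The following is only a sketch, assuming curl is installed; check_mongrel_port is a hypothetical helper name, and treating any HTTP error status as "did not answer" (curl's -f flag) is a choice made here, not something the original procedure specifies.

```shell
#!/bin/sh
# Sketch: report whether a host answers HTTP on the mongrel port.
# check_mongrel_port is a hypothetical helper; the port defaults to 3000.
check_mongrel_port() {
    host="$1"
    port="${2:-3000}"
    # -s: silent, -f: treat HTTP errors (status 400 and up) as failure,
    # -o /dev/null: discard the page body, --max-time: give up after 10 seconds.
    if curl -s -f -o /dev/null --max-time 10 "http://$host:$port/"; then
        echo "$host:$port answered"
        return 0
    else
        echo "$host:$port did not answer"
        return 1
    fi
}
```

For example, check_mongrel_port as11.cs.wpi.edu would try port 3000 on as11 and print one line with the result.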

BOTH as10 and as11 need to be good for a connection to http://assistments.org/ to function correctly (???).

The script to restart mongrel shouldn't care why the restart is needed, and shouldn't try to be clean or polite with the currently running mongrel processes. Kill them, nuke the PIDs and restart.

(Ideally, this script would also check WHY the server died?)

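cd /var/www/assistment/current
# Find every mongrel_rails process owned by the mongrel user (grep -v grep drops the grep itself).
pids=`ps auxw | grep '^mongrel.*mongrel.rails.start' | grep -v grep | awk '{print $2}'`
# Only call kill when something was found; kill with no arguments is an error.
[ -n "$pids" ] && kill -9 $pids
# Remove stale pid files before starting again.
rm -f /var/www/assistment/shared/pids/*
sh ./start.sh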

After this, perhaps a check to see if mongrel started:

#!/bin/sh
# check to see if we've got some mongrels running.
# if there are more than zero we are happy;
# if there are zero we are not happy.
# grep -v grep keeps the grep process itself out of the count.
num=`ps auxw | grep '^mongrel.*mongrel.rails.start' | grep -v grep | wc -l`
if [ $num -gt 0 ]
then
 echo "$num mongrels are running; good."
else
 echo "zero mongrels are running; bad."
fi
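
On the question of WHY the server died: one possible approach, sketched here under assumptions, is to save the tail of a log file before the kill step so it can be read afterwards. save_log_tail is a hypothetical helper, and the log path has to be supplied by whoever runs it, since these notes do not say where the mongrels log.

```shell
#!/bin/sh
# Sketch: keep the last lines of a log file so a crash can be examined later.
# save_log_tail LOG DEST_DIR [LINES] is a hypothetical helper; LINES defaults to 200.
save_log_tail() {
    log="$1"
    dest_dir="$2"
    lines="${3:-200}"
    # Refuse to continue if the log cannot be read.
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # Name the copy after the log file plus a timestamp.
    out="$dest_dir/$(basename "$log").$(date +%Y%m%d-%H%M%S)"
    # Write the tail and print the name of the saved copy.
    tail -n "$lines" "$log" > "$out" && echo "$out"
}
```

Calling it just before the kill -9 in the restart script would leave a timestamped copy of the log tail in the chosen directory.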

as7-11 lockup

disks filling

/var on as9 fills up pretty regularly, but apparently this doesn't matter?

/tmp on as11 appears to be filling up more often now (October 2011). I sent an email to the assistments people asking what the fix for this would be, since killing off the mongrels and memcache and restarting them did NOT make http://as11.cs.wpi.edu:3000/ available again.
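
A filling disk can be spotted before it causes trouble. The following is a minimal sketch that reads df output; usage_pct and warn_if_full are hypothetical helper names, and the threshold is whatever the caller passes in, not a value taken from these notes.

```shell
#!/bin/sh
# Sketch: print the percentage used on the filesystem that holds a directory.
# df -P gives one line per filesystem; field 5 is the capacity, e.g. "42%".
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full DIR LIMIT: return 1 when DIR's filesystem is at or above LIMIT percent.
warn_if_full() {
    pct=$(usage_pct "$1")
    if [ "$pct" -ge "$2" ]; then
        echo "$1 is ${pct}% full; at or above ${2}%"
        return 1
    fi
    echo "$1 is ${pct}% full; ok"
}
```

For example, warn_if_full /tmp 90 on as11 would complain once /tmp reaches 90 percent.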

assistments_troubleshooter.txt · Last modified: 2014/01/25 14:01 by mvoorhis