Feel free to add questions to this FAQ (and answers if you know them).
When I run "qstat", I only see my own jobs, and they aren't running yet. Why not?
By default, "qstat" shows only the jobs of the user who runs it. This can make the cluster look empty when it is actually busy with other users' jobs. To see every user's jobs, run "qstat -u '*'".
I have jobs on a particular node that should have finished a while ago. I tried to delete them with "qdel", but they're still listed, now in the "dr" state. Why?
The most likely explanation is that the node running the jobs has crashed. Because SGE can't contact the node, it can't confirm that the jobs were deleted, so they stay marked for deletion ("d") while still counted as running ("r"). To force SGE to remove them even though the node is down, run "qdel -f $JOBID".
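If several jobs are stuck in "dr", you can pick them out of the qstat listing instead of copying IDs by hand. The sketch below is a minimal, hypothetical helper (the name dr_jobs is made up for this FAQ); it assumes the default qstat output layout, with two header lines and the job state in the fifth column.

```shell
#!/bin/sh
# dr_jobs (hypothetical helper): read default-format qstat output on stdin
# and print the ID of every job whose state column (field 5) is exactly "dr".
dr_jobs() {
    # NR > 2 skips the two header lines (column titles and the dashed rule).
    awk 'NR > 2 && $5 == "dr" { print $1 }'
}
```

Usage, once the function is defined in your shell: for id in $(qstat | dr_jobs); do qdel -f "$id"; done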
My jobs fail, and the error logs complain about "missing" files that aren't actually missing. Sometimes the logs also contain stray ^M characters. What's going on?
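Odds are that you created your job scripts on a Windows machine. Windows ends each line with a carriage return followed by a line feed, while Unix uses only a line feed, so the shell treats the leftover carriage return (displayed as ^M) as part of file names and commands. To fix this, run "dos2unix $YOURFILE" on chef and submit the job script again.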
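If you want to check a script before submitting it, or dos2unix isn't available on the machine you're using, the sketch below shows one way to do both with standard tools. The function names has_crlf and strip_crlf are hypothetical; strip_crlf deletes every carriage return in the file, which is the same result dos2unix gives for an ordinary Windows-edited script.

```shell
#!/bin/sh
# has_crlf FILE (hypothetical helper): succeed if FILE contains a carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# strip_crlf FILE (hypothetical helper): delete every carriage return in FILE.
# Writing back through "cat >" keeps the original file's permissions,
# so an executable job script stays executable.
strip_crlf() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Usage: has_crlf myjob.sh && strip_crlf myjob.sh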