Mission Statement Generator

Quite a few years ago, the Dilbert website had a brilliant mission statement generator, which lived at dilbert.com/comics/dilbert/games/career/bin/ms.cgi.

Unfortunately, for reasons that were never made clear, it is no longer on the Dilbert website. So I decided to write my own mission statement generator as a fun side project. The result can be found here: cmorse.org/missiongen

Please leave a comment if you have any problems or find a bug with it.


Installing PDSH on RedHat

Download the PDSH source RPM from http://sourceforge.net/projects/pdsh/.

Install the dependencies:

yum install elfutils libtermcap-devel ncurses-devel pam-devel readline-devel rpm-build

Install the source RPM:

rpm -ivh pdsh-RELEASE.src.rpm

Modify /usr/src/redhat/SPECS/pdsh.spec to make SSH the default remote command method by removing “readline” from the defaults line:

%define _defaults ssh exec pam
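The same edit can be scripted instead of done by hand. A minimal sketch, assuming the spec lives at /usr/src/redhat/SPECS/pdsh.spec; the stand-in line is only there so the sketch runs end to end outside a real build tree:

```shell
#!/bin/sh
# Non-interactive version of the spec edit above.
SPEC=/usr/src/redhat/SPECS/pdsh.spec
if [ ! -f "$SPEC" ]; then
    # Stand-in spec line so the sketch is runnable anywhere.
    SPEC=./pdsh.spec
    printf '%%define _defaults ssh readline exec pam\n' > "$SPEC"
fi
cp "$SPEC" "$SPEC.bak"   # keep a backup of the original spec
sed -i 's/^%define _defaults .*/%define _defaults ssh exec pam/' "$SPEC"
grep '^%define _defaults' "$SPEC"
```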

Compile the RPM binaries from the spec:

rpmbuild -v -bb /usr/src/redhat/SPECS/pdsh.spec

Install the RPM binaries:

rpm -iv /usr/src/redhat/RPMS/x86_64/pdsh-rcmd-exec-RELEASE-1.x86_64.rpm
rpm -iv /usr/src/redhat/RPMS/x86_64/pdsh-rcmd-ssh-RELEASE-1.x86_64.rpm
rpm -iv /usr/src/redhat/RPMS/x86_64/pdsh-RELEASE-1.x86_64.rpm

Server 2008 Hangs at “Applying Computer Settings”

Update: Microsoft has released a hotfix for this issue, see: http://support.microsoft.com/kb/2379016

Last week I had a Windows Server 2008 machine that was taking a very long time to get past the “Applying Computer Settings” screen. Once it finally did get past that screen, logging in would also take a very long time, things in general were very slow, and many applications failed to launch altogether.

In the Application log, there were three Warning messages from Winlogon that would show up with every boot:

Event ID 6005: The winlogon notification subscriber <GPClient> is taking long time to handle the notification event (Logon).
Event ID 6006: The winlogon notification subscriber <GPClient> took 3599 second(s) to handle the notification event (CreateSession).
Event ID 6005: The winlogon notification subscriber <GPClient> is taking long time to handle the notification event (CreateSession).

Since these warnings pointed to a potential issue with Group Policy, I enabled Group Policy logging. There were tons of error messages, but none of them proved particularly useful. These messages were repeated throughout the log file:

GPSVC(6d0.704) 10:54:06:941 Client_InitialRegisterForNotification: User = machine, changenumber = 0
GPSVC(6d0.704) 10:54:06:941 Client_RegisterForNotification: CheckRegisterForNotification returned error 0x6d9
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterForNotification: Service not RUNNING. waiting
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterForNotification: Trying to recover from error 1753
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterNotificationAsynchronously: Starting async registration
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterNotificationAsynchronously: Created thread 1804
GPSVC(6d0.70c) 10:54:06:941 CGPNotify::RegisterNotificationAsynchronously: Waiting for service to start
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterNotificationAsynchronously: Exiting with status = 0
GPSVC(6d0.704) 10:54:06:941 CGPNotify::RegisterForNotification: Exiting with status = 0
GPSVC(264.2f8) 10:57:09:087 CGPNotify::WaitForServiceChangeAndRegister: Failed to wait for svc change with 258
GPSVC(30c.320) 10:57:09:352 CGPNotify::WaitForServiceChangeAndRegister: Failed to wait for svc change with 258
GPSVC(6d0.70c) 11:04:06:948 CGPNotify::WaitForServiceChangeAndRegister: Failed to wait for svc change with 258

After troubleshooting for quite some time, I found that the issue had nothing to do with Group Policy. The problem turned out to be caused by a lock on the Service Control Manager (SCM) database. To validate that this was the issue, I ran “sc querylock”, which gave this output:

C:\windows\system32>sc querylock
[SC] QueryServiceLockstatus - Success
        IsLocked : True
        LockOwner : .\NT Service Control Manager
        LockDuration : 3751 (seconds since acquired)

This output indicates that there is a deadlock between the Service Control Manager and http.sys. To resolve the issue, add the following registry value and reboot the system.

Under the HKLM\System\CurrentControlSet\Services\HTTP key, create a REG_MULTI_SZ value named DependOnService and set it to CRYPTSVC.
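From an elevated command prompt, the same change can be made without opening regedit. This is a sketch of the equivalent reg.exe command, not the exact fix I applied:

```bat
REM Add the CRYPTSVC dependency to the HTTP service, then reboot.
reg add HKLM\SYSTEM\CurrentControlSet\Services\HTTP /v DependOnService /t REG_MULTI_SZ /d CRYPTSVC /f
```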

Related Microsoft KB article: http://support.microsoft.com/kb/2004121

Normal output from “sc querylock”

C:\windows\system32>sc querylock
[SC] QueryServiceLockstatus - Success
        IsLocked : False
        LockOwner :
        LockDuration : 0 (seconds since acquired)

Nodes Not Setting DNS When Using /etc/hosts in Perceus

While moving a cluster from assigning node IPs via DHCP to static assignment, I had problems with the nodes not properly setting the DNS server in /etc/resolv.conf when using /etc/hosts in Perceus to assign IP addresses. The nodes were defaulting to the values already in the vnfs capsule instead of updating them. This caused very slow SSH login times, because the DNS server listed in the capsule’s /etc/resolv.conf did not exist.

I wasn’t able to figure out how to fix this issue, so for now I just updated the resolv.conf file in the vnfs capsule directly.
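The file itself is just a normal resolv.conf; a sketch with hypothetical placeholder values:

```
# /etc/resolv.conf inside the vnfs capsule -- search domain and
# nameserver below are hypothetical; the nameserver must actually be
# reachable from the nodes, or ssh logins will stall on DNS lookups.
search cluster.example.com
nameserver 10.0.0.1
```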

For some background information on what led to this problem, please see this old post.

Nodes not Setting Hostname When Using /etc/hosts for Static IP’s in Perceus

Recently I’ve been working on moving a cluster from assigning node IPs with DHCP to statically defined IPs, in order to work around Torque/Moab not starting when it is unable to resolve the name of every node.

To do this, I entered all of the relevant information into the /etc/hosts file. But after doing this and rebooting the nodes, they no longer automatically set their hostname, which had previously been retrieved from the DHCP server. Instead, the prompt would look like the following after logging in:

[root@localhost ~]#
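For reference, the /etc/hosts entries involved follow the usual format; a sketch with hypothetical addresses and node names:

```
# /etc/hosts -- hypothetical cluster addresses
127.0.0.1   localhost
10.0.0.1    master
10.0.1.0    n0000
10.0.1.1    n0001
```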

This can be solved by enabling the Perceus hostname module.

perceus module activate hostname

After enabling this, logging into the nodes should look like the following:

[root@n0000 ~]#

Problem with Perceus dhcpd import script

After a couple of days of banging my head against the wall trying to figure out why the import script kept giving this cryptic error, we finally submitted a question to the Perceus mailing list.

Undefined subroutine &main::add_node called at ./import-isc-dhcpd.pl line 46, <> line 8.

I’m sure if I knew Perl that error wouldn’t have been so confusing.

Here’s a diff for anyone who’s interested.

43c43
<    if ( $_ =~ /^\s*host\s+([^\s]+)\s*{\s*$/ ) {
---
>    if ( $_ =~ /^\s*host\s+([^\s]+)\s*{?\s*$/ ) {
46c46,47
<       &add_node($1, $hostname);
---
> print "Adding: $1, $hostname\n";
>       &node_add($1, $hostname);
51a53
>

New Nodes failing to get DHCP IP after booting in Perceus

As of Perceus 1.4, nodes will no longer automatically get an IP via DHCP. You must first enable and configure the ipaddr module in Perceus.

perceus module activate ipaddr

Then edit the ipaddr config file, /etc/perceus/modules/ipaddr. Uncommenting the last line seems to be sufficient for most configurations. If your machines do not have their second Ethernet card plugged in, it is worth removing the eth1 portion, as this will significantly reduce boot times.

* eth0:[default]/[default] eth1:[default]/[default]/[default]
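With the eth1 portion removed, the line would look roughly like this (a sketch; check the comments in /etc/perceus/modules/ipaddr for the exact field syntax):

```
* eth0:[default]/[default]
```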

Reload the Perceus service, then restart the nodes and they should automatically get a new IP address.

Perceus “ERROR No such host: /bin/sh”

I’ve been working a lot with Perceus at work, and I figured I would put up some posts about problems I have encountered and their possible solutions.

Today I was attempting to boot a node with Perceus 1.3.8 installed. The node would download and run the first kernel, but when it attempted to begin provisioning with provisiond it would exit with this error:

ERROR No such host: /bin/sh

The node would then loop forever through the following while loop, printing the error every second right after running provisiond:

# Excerpt from: https://perceus.org/svn/perceus/1.3/scripts/initramfs/init

while [ ! -f "/next" ]; do
	# If this works we wont even get a chance to say goodbye!
	# If it errors out, we need to touch /next to
	# iterate to next count and/or interface.
	if [ $INIT_DEBUG -eq 0 ]; then
	   provisiond -s /bin/sh $MASTERIP init || touch /next
	elif [ $INIT_DEBUG -eq 1 ]; then
	   provisiond -v -s /bin/sh $MASTERIP init || touch /next
	else
	   provisiond -d -s /bin/sh $MASTERIP init || touch /next
	fi
	sleep 1
done

I was able to find a reference to the error message in the source code for provisiond. Initially I thought that the node was passing “/bin/sh” to the server instead of the master’s IP address, but after trying various things with the command-line parameters I decided to look elsewhere.

Eventually I noticed that provisiond was running as a service on the head node, even though provisiond should only run on provisioned nodes. Uninstalling provisiond from the head node seemed to fix the problem. Unfortunately, I tried a couple of other ideas at the same time, so I cannot be absolutely sure that provisiond was causing the problem.
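If you hit the same error, a quick sanity check before uninstalling anything is whether provisiond is running on the head node at all; a minimal sketch:

```shell
#!/bin/sh
# Check for a running provisiond on this host; it should only be
# running on the provisioned nodes themselves, not the head node.
if pgrep -x provisiond > /dev/null; then
    echo "provisiond is running on this host"
else
    echo "provisiond is not running"
fi
```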

If I get a chance I will do a more thorough test to make sure that I am correct.

Edit: I never got a chance to test whether this worked correctly. If anyone was able to test this situation, I would be interested in hearing about it.