So this morning, when I got into work, I saw an unexpected set of symptoms from my partially built IBM Business Monitor environment.
First of all, I saw this when I attempted to list my WAS profiles: -
$ /opt/IBM/WebSphere/AppServer/profiles/BAMDMProfile/bin/manageprofiles.sh -list
JVMSHRC226E Error opening shared class cache file
JVMSHRC336E Port layer error code = -300
JVMSHRC337E Platform error message: Read-only file system
JVMJ9VM015W Initialization error for library j9shr26(11): JVMJ9VM009E J9VMDllMain failed
Could not create the Java virtual machine.
Then I saw these errors: -
<snip>
[3/1/13 15:30:05:202 GMT] 0000001c MMRoutingConf E com.ibm.wbimonitor.lifecycle.routing.MMRoutingConfigFlowDaemon checkForStateChanges CWMLC0274E: An error occurred while trying to determine the state of an MM. This will be retried shortly. The condition was caused by: com.ibm.wbimonitor.persistence.metamodel.spi.MetaModelPersistenceException: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=MONITOR.META_MODEL_UNVERSIONED_T, DRIVER=4.11.69.
[3/1/13 15:30:05:740 GMT] 0000001b LifecycleStop E com.ibm.wbimonitor.lifecycle.LifecycleStopRequestScanTask run() CWMLC0012E: Unexpected exception [com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=MONITOR.META_MODEL_T, DRIVER=4.11.69].
</snip>
in the SystemOut.log file for the Deployment Manager, even though I knew that (a) it was fine on Friday and (b) DB2 was up and running ( validated via the command db2 connect to MONITOR ).
When I went to check the file systems on the box: -
$ mount
/dev/mapper/vglinux-rootlv on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/mapper/vglinux-tmplv on /tmp type ext3 (rw)
/dev/mapper/vglinux-varlv on /var type ext3 (rw)
/dev/dasda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/mapper/vgapp-optlv on /opt type ext3 (rw)
/dev/mapper/vgapp-homelv on /home type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
LBPM002L:/store/ on /store type nfs (rw,addr=10.222.36.21)
mount: warning: /etc/mtab is not writable (e.g. read-only filesystem).
It's possible that information reported by mount(8) is not
up to date. For actual information about system mount points
check the /proc/mounts file.
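The warning above points at /proc/mounts as the authoritative source, since /etc/mtab cannot be updated once its file system has gone read-only. A minimal sketch of that check follows, assuming a Linux box where /proc/mounts is readable. The function name list_ro_mounts is hypothetical, and the optional file argument exists only so the parsing can be tried against a saved copy.

```shell
# List mount points whose mount options include "ro" (read-only).
# Reads /proc/mounts by default; pass another file in the same format to test.
# list_ro_mounts is a hypothetical helper name, not a standard command.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields per line: device, mount point, fs type, options, dump, pass.
    # Options are comma-separated, so split them and compare each one exactly;
    # this avoids false matches on options such as errors=remount-ro.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2 " (" $1 ", " $3 ")"; break }
    }' "$mounts_file"
}

# Usage: list_ro_mounts
```

Any mount point printed here is one the kernel currently treats as read-only, regardless of what mount reports from a stale /etc/mtab.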
When I attempted to change the RWX permissions for /tmp: -
$ chmod 777 /tmp
chmod: changing permissions of `/tmp': Read-only file system
When I attempted to update the locate database ( as root ): -
$ updatedb
updatedb: can not open a temporary file for `/var/lib/mlocate/mlocate.db'
At this point, I figured that something serious had occurred; this was validated by messages such as: -
<snip>
EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229390 in dir #229378
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229384 in dir #229378
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229390 in dir #229378
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229384 in dir #229378
attempt to access beyond end of device
</snip>
in the kernel ring buffer ( via the dmesg command ).
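For anyone wanting to pull just these lines out of a noisy kernel log, something like the following may help. It is only a sketch: the function name fs_errors is hypothetical, and the patterns are taken from the messages quoted above, so other file system drivers may word their errors differently.

```shell
# Filter kernel log text on stdin down to file system trouble.
# fs_errors is a hypothetical helper name; the patterns come from the
# messages seen in this post and are not an exhaustive list.
fs_errors() {
    grep -E 'EXT[234]-fs error|Remounting filesystem read-only|attempt to access beyond end of device'
}

# Usage: dmesg | fs_errors
```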
Then I called the Unix sysadmin, who realised that there was a much bigger problem with the disks, leading to a rebuild ( of / only, as /opt and /home appeared to be OK ).
Ah well, one lives and one learns ….