String of nulls in /var/adm/messages [Solaris]


From: Mr. Nice Guy on 4 Jan 2010 19:01

In debugging a Solaris 10 SPARC problem, I looked through the
/var/adm/messages file and found a long string of nulls before the usual
system startup line:

^@^@^@^@^@^@Sep 28 14:11:42 c2539lui genunix: [ID 540533 kern.notice]
^MSunOS Release 5.10 Version Generic_120011-14 64-bit

This line roughly corresponds with a log entry in our own software
that indicates a problem so I have to believe it's significant. In a
messages file that spans a dozen reboots, is there any particular
reason why this reboot _only_ would have these spurious characters in
its announcement?

Thanks.

From: Casper H.S. Dik on 5 Jan 2010 07:18

"Mr. Nice Guy" <aaron(a)mcs-partners.com> writes:

>In debugging a Solaris 10 SPARC problem, I looked through the
>/var/adm/messages file and found a long string of nulls before the usual
>system startup line:

>^@^@^@^@^@^@Sep 28 14:11:42 c2539lui genunix: [ID 540533 kern.notice]
>^MSunOS Release 5.10 Version Generic_120011-14 64-bit

>This line roughly corresponds with a log entry in our own software
>that indicates a problem so I have to believe it's significant. In a
>messages file that spans a dozen reboots, is there any particular
>reason why this reboot _only_ would have these spurious characters in
>its announcement?

It's because the filesystem has updated the seek pointer (the inode) but
it hasn't written the data yet. This should only happen on UFS
filesystems. The update to the inode is written to the log or to the
inode; the data is not written to the disk and was clearly never
flushed.

Was the system power-cycled? Normally, on a reboot the data is written.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
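
The effect described above can be imitated on any filesystem. The following is a minimal sketch, not UFS internals: it assumes that extending a file's length with `os.ftruncate` is a fair stand-in for "the inode was updated but the data was never flushed". The function name `simulate_size_before_data` is hypothetical.

```python
import os
import tempfile


def simulate_size_before_data(payload: bytes) -> bytes:
    """Grow a file by len(payload) without writing payload, then read that region.

    This only imitates the symptom (size recorded, data missing); it does not
    reproduce how UFS orders its writes internally.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"last message before the hang\n")
        old_size = os.fstat(fd).st_size
        # Metadata-only step: the recorded length grows, no data is written.
        os.ftruncate(fd, old_size + len(payload))
        # A reader now sees NUL bytes where the payload should have been.
        os.lseek(fd, old_size, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(simulate_size_before_data(b"boot message\n"))
```

The region that was "allocated" but never written reads back as zero bytes, which an editor such as vi displays as ^@.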

From: Mr. Nice Guy on 5 Jan 2010 10:41

On Jan 5, 7:18 am, Casper H.S. Dik <Casper....(a)Sun.COM> wrote:
> It's because the filesystem has updated the seek pointer (the inode) but
> it hasn't written the data yet. This should only happen on UFS
> filesystems. The update to the inode is written to the log or to the
> inode; the data is not written to the disk and was clearly never
> flushed.
>
> Was the system power-cycled? Normally, on a reboot the data is written.
>
> Casper
> --

This is a UFS file system.

I can see from the messages file that this system was not brought down
cleanly before this restart. It was operational and then the nulls
followed by the "SunOS Release" message. There's a period of about
four hours between the last message and the nulls, so it might have
hung before the user abruptly powered off.

Could these nulls indicate a bigger problem with Solaris, itself? I
ask because the log file of another script we run shortly after boot
also has a string of nulls before writing some data, and these nulls
correlate. So I'm wondering if, on this one particular reboot,
Solaris wasn't quite itself and might have behaved improperly, at
least with respect to the file system.
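
To check whether the nulls in two log files really line up, it may help to list each run of NUL bytes together with the text that follows it. This is a rough sketch; the helper names `find_nul_runs`, `text_after_run` and `scan_file` are made up for illustration, and it reads the whole file into memory, which assumes the logs are of modest size.

```python
import re
import sys
from typing import List, Tuple

NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (offset, length) for every run of NUL bytes at least min_len long."""
    runs = []
    for match in NUL_RUN.finditer(data):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start(), length))
    return runs


def text_after_run(data: bytes, offset: int, length: int, width: int = 60) -> str:
    """Return the first line of text following a NUL run, e.g. its timestamp."""
    tail = data[offset + length : offset + length + width]
    first_line = tail.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


def scan_file(path: str, min_len: int = 1) -> List[Tuple[int, int, str]]:
    """Scan one file and return (offset, length, following text) per NUL run."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [(off, n, text_after_run(data, off, n)) for off, n in find_nul_runs(data, min_len)]


if __name__ == "__main__":
    # Pass the log files to compare as command-line arguments.
    for log_path in sys.argv[1:]:
        for off, n, following in scan_file(log_path):
            print(f"{log_path}: {n} NUL bytes at offset {off}, followed by: {following}")
```

Running it against /var/adm/messages and the script's own log and comparing the timestamps that follow each run would show whether both files were hit by the same unclean shutdown.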

From: Richard B. Gilbert on 5 Jan 2010 11:03

Mr. Nice Guy wrote:
> This is a UFS file system.
>
> I can see from the messages file that this system was not brought down
> cleanly before this restart. It was operational and then the nulls
> followed by the "SunOS Release" message. There's a period of about
> four hours between the last message and the nulls, so it might have
> hung before the user abruptly powered off.
>
> Could these nulls indicate a bigger problem with Solaris, itself? I
> ask because the log file of another script we run shortly after boot
> also has a string of nulls before writing some data, and these nulls
> correlate. So I'm wondering if, on this one particular reboot,
> Solaris wasn't quite itself and might have behaved improperly, at
> least with respect to the file system.

If you can identify the user, you might try asking him what happened.
If the system hung and no one with the root password was available,
power cycling the machine might have been the only option to try to
straighten things out! It's sometimes called "the big red switch reset"
after the power switch on an early model of the IBM PC. This sort of
thing SHOULD not happen, but it sometimes does. The downside is
that you almost always need to fsck a file system or two.

From: Casper H.S. Dik on 5 Jan 2010 12:27

"Mr. Nice Guy" <aaron(a)mcs-partners.com> writes:

>Could these nulls indicate a bigger problem with Solaris, itself? I
>ask because the log file of another script we run shortly after boot
>also has a string of nulls before writing some data, and these nulls
>correlate. So I'm wondering if, on this one particular reboot,
>Solaris wasn't quite itself and might have behaved improperly, at
>least with respect to the file system.

The UFS filesystem writes "meta data" before "file data" and so
it has behaved as it was designed.

However, such behaviour is not correct and it is one of the reasons
why ZFS works differently. (Both the change to the file pointer
and the data are in the same transaction group.)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
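
The difference between the two orderings can be shown with a toy model. This is only an illustration of the idea in the post above, not real UFS or ZFS code: `ToyDisk`, `append_metadata_first` and `append_transactional` are hypothetical names, and the "crash" is just an early return.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """A single file: a recorded size plus whatever data actually reached disk."""
    size: int = 0
    blocks: bytearray = field(default_factory=bytearray)

    def read(self) -> bytes:
        # Bytes inside the recorded size that were never written read as NULs.
        return bytes(self.blocks[: self.size]).ljust(self.size, b"\x00")


def append_metadata_first(disk: ToyDisk, data: bytes, crash_after_metadata: bool = False) -> None:
    """Metadata-before-data ordering, as described for UFS in this thread."""
    disk.size += len(data)  # step 1: the size update reaches the disk
    if crash_after_metadata:
        return  # power lost before the data was flushed
    disk.blocks += data  # step 2: the data reaches the disk


def append_transactional(disk: ToyDisk, data: bytes, crash_before_commit: bool = False) -> None:
    """Size and data committed together, in the spirit of a transaction group."""
    if crash_before_commit:
        return  # nothing from this append becomes visible
    disk.size += len(data)
    disk.blocks += data


if __name__ == "__main__":
    ufs_like = ToyDisk(size=4, blocks=bytearray(b"old\n"))
    append_metadata_first(ufs_like, b"new\n", crash_after_metadata=True)
    print(ufs_like.read())

    txg_like = ToyDisk(size=4, blocks=bytearray(b"old\n"))
    append_transactional(txg_like, b"new\n", crash_before_commit=True)
    print(txg_like.read())
```

In the first case the file ends in four NUL bytes; in the second the file simply ends at the last data that was committed.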
