From: Udo Grabowski
Hello,

we get 'Device busy (error 16)' errors when hammering a powerful server
(X4500) with about 40 clients reading the same (unchanging) file at the
same time, at a frequency of about 5 failures per 1 million reads (this
is too high for our applications). In our case this happens with a
Fortran inquire on file existence. The server and the clients run
Solaris SXDE 1/08 (B79b), the filesystem is ZFS, and the clients mount
with

vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,mirrormount,
rsize=1048576,wsize=1048576,retrans=5,timeo=3000,
acregmin=1,acregmax=1,acdirmin=1,acdirmax=1

We have not seen these problems with NFSv3. Both the server and the
clients have a /etc/default/nfs file with these entries changed:

# Maximum number of concurrent NFS requests.
# Equivalent to last numeric argument on nfsd command line.
NFSD_SERVERS=256

# Set connection queue length for lockd over a connection-oriented
# transport.
# Default and minimum value is 32.
LOCKD_LISTEN_BACKLOG=256

# Maximum number of concurrent lockd requests.
# Default is 20.
LOCKD_SERVERS=256

# Determines if the NFS version 4 delegation feature will be enabled
# for the server. If it is enabled, the server will attempt to
# provide delegations to the NFS version 4 client. The default is on.
NFS_SERVER_DELEGATION=off
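
(For reference: after editing /etc/default/nfs, the NFS services have
to be restarted for the new values to take effect, e.g.

# svcadm restart svc:/network/nfs/server
# svcadm restart svc:/network/nfs/nlockmgr

on both the server and the clients.)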

We need v4; otherwise we would have a hard time constructing large
hierarchical autofs maps (without exceeding the 4096 character limit!)
due to the nested ZFS filesystems.
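
(To illustrate that problem: without the v4 mirror mounts, every nested
ZFS filesystem needs its own offset in one hierarchical map entry. All
names below are made up; such an indirect map entry looks roughly like

data \
    /       bigserver:/export/data \
    /sub1   bigserver:/export/data/sub1 \
    /sub2   bigserver:/export/data/sub2 \
    ...

and since the whole entry counts as a single line, a few hundred nested
filesystems blow past the 4096 character line limit.)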

nfsstat -s shows nothing unusual on the server:
Server rpc:
Connection oriented:
calls       badcalls  nullrecv  badlen  xdrcall  dupchecks  dupreqs
101095267   0         0         0       0        2514000    1

but the clients show a hell of a lot of badcalls, badxids, timeouts,
interrupts and cltoomany:
nfsstat -c
Client rpc:
calls      badcalls  badxids  timeouts  newcreds  badverfs  timers
10619334   10009     982      9492      0         0         0
cantconn   nomem     interrupts
0          0         79

Client nfs:
calls      badcalls  clgets     cltoomany
10609528   530       10609732   49

Any ideas what goes wrong or what should be tuned elsewhere?
From: edcrosbys
> we get 'Device busy (error 16)' errors when hammering a powerful server
> (X4500) with
> about 40 clients reading the same file (not changing) at the same time,


Are you getting these errors on the client or server?
Do you have the NFS fs mounted, or are you letting automount take care
of it?
Approx. how big is the file?
From: Udo Grabowski
edcrosbys wrote:
>> we get 'Device busy (error 16)' errors when hammering a powerful server
>> (X4500) with
>> about 40 clients reading the same file (not changing) at the same time,
>>
>
>
> Are you getting these errors on the client or server?
> Do you have the NFS fs mounted, or are you letting automount take care
> of it?
> Approx. how big is the file?
>
Errors are on the client, it's automounted, and it's a first-level ZFS
sub-filesystem of a master ZFS filesystem (NFSv4 in this build traverses
and mounts these automatically, no need for extra autofs entries). It
probably never gets unmounted, since it's used all the time during the
tests. Nothing in /var/adm/messages, neither on the server nor on the
client. The file is about 250 kB, but the error (when it fails) occurs
not on the read, but always on the existence inquiry (which is probably
an fstat, I don't know how Sun Studio exactly implements an
inquire(file,exist=...)), and certainly while the other clients are
reading the same file.
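
(For the record, a stripped-down reproducer looks roughly like the
following; the path is made up here, and several instances are started
in parallel, e.g. with 'for i in 1 2 3 4 5; do ./nfstest & done':

program nfstest
  implicit none
  logical :: ex
  integer :: i, ios
  ! hammer the server with existence inquiries on one NFS-mounted file
  do i = 1, 1000000
     inquire (file='/net/server/data/testfile', exist=ex, iostat=ios)
     if (ios /= 0) then
        print *, 'inquire failed, iostat =', ios, ' at iteration', i
        stop
     end if
  end do
end program nfstest

Running one instance under truss should also show which syscall the
Sun Studio runtime actually issues for the inquire.)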

From: Udo Grabowski
Udo Grabowski wrote:
> Hello,
>
> we get 'Device busy (error 16)' errors when hammering a powerful
> server (X4500) with
> about 40 clients reading the same file (not changing) at the same
> time, with a frequency of
> about 5 in 1 million reads (this is too large for our applications).
> This happens in our case
> with a Fortran inquire on file existence. The server and the clients
> run Solaris SXDE 1/08 (B79b),
> the filesystem is ZFS, the clients mount with
>
> vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,mirrormount,
> rsize=1048576,wsize=1048576,retrans=5,timeo=3000,
> acregmin=1,acregmax=1,acdirmin=1,acdirmax=1
>
> ...
> Any ideas what goes wrong or what should be tuned elsewhere ?
I dug a bit deeper: we can produce this error with one client (U20 M2
dual Opteron) alone, by starting a few (5 to 10) instances of the
program. ALL instances but one fail after a short time with 'device
busy', even much faster than when calling from different clients.
Setting back to NFS v3 makes everything work again.
Since v3 uses only 32k read/write windows, I checked NFS v4 again with
rsize and wsize set to 32768, and the problem went away (did not check
the many-client scenario yet). Setting them to 131072 also gives quick
failures. So something still depends on the old 32k Solaris NFS limit.
We would like to use the larger windows, since they don't stress the
server and network (1 Gb/s switched fiber) side that much, but where to
tune? We already set ncsize=1048576 for better DNLC cache hits (now at
94%), but there seems to be a harder limit somewhere.
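
(For completeness: the DNLC size is a kernel tunable, set in /etc/system
and effective after a reboot:

set ncsize=1048576

and the window sizes can be forced for a test with an explicit mount,
server and path being placeholders here:

# mount -F nfs -o vers=4,proto=tcp,rsize=32768,wsize=32768 \
    server:/export/data /mnt
)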

So far all I can recommend is not to use NFSv4 with the default
parameters in real-world production.
From: Udo Grabowski
Udo Grabowski wrote:
> Hello,
>
> we get 'Device busy (error 16)' errors when hammering a powerful
> server (X4500) with
> about 40 clients reading the same file (not changing) at the same
> time, with a frequency of
> about 5 in 1 million reads (this is too large for our applications).
> This happens in our case
> with a Fortran inquire on file existence. The server and the clients
> run Solaris SXDE 1/08 (B79b),
> the filesystem is ZFS, the clients mount with
>
> vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,mirrormount,
> rsize=1048576,wsize=1048576,retrans=5,timeo=3000,
> acregmin=1,acregmax=1,acdirmin=1,acdirmax=1
> ...
> Any ideas what goes wrong or what should be tuned elsewhere?

(I hate always answering myself.....)

Looks like we hit a bug here. I mixed up test cases somehow, so I
didn't notice that there was a second difference in my last reply: the
v4 test with the 32k windows took place on a plain filesystem, not on
an automatic submount.

Here's the clue:
Regardless of the window sizes, the failure occurs always and only when
an NFS v4 AUTOMATIC submount is loaded on the same client by more than
one accessing program. It does not happen when explicitly mounting the
submount via a hierarchical autofs table. Although nfsstat -m shows it
is mounted with the same parameters, it looks as if something is
different internally. Maybe an issue with the callback daemon.
So we are back at the problem of how to construct a large hierarchical
autofs table without hitting the 4096 character table-size limit.....
Solaris is something for the devotees.