Wednesday, 4 September 2013

Troubleshooting NFS

Common NFS Errors

  • "No such host" - Name of the server is specified incorrectly.
  • "No such file or directory" - Either the local or remote file system is specified incorrectly.
  • "No such device" - NFS is not configured into the client's kernel.
  • "NFS server is not responding" message followed by "NFS server xxxx OK" - Server is heavily loaded causing RPC timeouts, or server has crashed.
  • "Stale file handle" - The file is no longer available.
  • "MOUNT_PROG not registered" - rpc.mountd daemon never started up and registered.
  • "Too many levels of remote in path" - Attempting to mount a file system which is already an NFS mounted file system.
  • "Permission denied" - Either the access is made as root on the client and root is mapped to nobody on the server, or the user on the client does not have a corresponding UID on the server.
  • "No space" - The server is out of space on the file system. 

Troubleshooting

The showmount command may be used to display server-side mount information. The option -a displays all remote mounts showing the name of the client and the directory, separated by a colon. The -d option displays only the names of the directories mounted by the clients. And the -e option displays the list of file systems exported by the server.
     # showmount -a
     All mount points on local host:
     edcert20.ucs.indiana.edu:/home
     edcert21.ucs.indiana.edu:/usr/local

     # showmount -d
     Directories on local host:
     /home
     /usr/local

     # showmount -e
     Export list on local host
     /home           edcert21.ucs.indiana.edu edcert20.ucs.indiana.edu     
     /usr/local      edcert21.ucs.indiana.edu
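
The export list above is easy to turn into a lookup table when several servers have to be compared. The following is a minimal sketch, assuming the output has the shape shown above (a header line, then one line per exported path followed by its clients). The function name parse_exports is hypothetical, and the comma handling is an assumption, since some showmount implementations separate clients with commas.

```python
def parse_exports(text):
    """Map each exported directory to the list of clients allowed to mount it."""
    exports = {}
    for line in text.splitlines():
        # Treat commas like spaces so both client-list styles are accepted.
        fields = line.replace(",", " ").split()
        # Skip blank lines and the "Export list ..." header; export paths start with "/".
        if not fields or not fields[0].startswith("/"):
            continue
        exports[fields[0]] = fields[1:]
    return exports


# Sample text copied from the showmount -e output above.
sample = """\
Export list on local host
/home           edcert21.ucs.indiana.edu edcert20.ucs.indiana.edu
/usr/local      edcert21.ucs.indiana.edu
"""

exports = parse_exports(sample)
for path, clients in exports.items():
    print(path, "->", ", ".join(clients))
```

In practice the text would come from running showmount -e against the server and capturing its standard output.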

The df command may be used to display information on the file systems mounted remotely, the mount point and the amount of available space. The -F option may be specified to list only a specified file system type.
     # df -F nfs
     Filesystem                      Type  blocks     use   avail %use  Mounted on
     edcert21.ucs.indiana.edu:/home  nfs    68510   55804   12706  81%  /usr/share/help
BSD systems use the -t option to specify the file system type. The output of the df command also varies among operating systems. df also resolves symbolic links and reports the file system mounted at the link's target. For example:
     # ls -l /usr/local/man
     lrwxr-xr-x    1 root     sys            6 Mar 17  1995 /usr/local/man -> catman/
     # df /usr/local/man
     Filesystem              Type  blocks     use   avail %use  Mounted on
     orange:/usr/local        nfs   68510   55804   12706  81%  /usr/local
Use the command nfsstat -s to display NFS activity on the server side. For example:
     # nfsstat -s

     Server RPC:
     calls      badcalls   nullrecv   badlen     xdrcall    duphits    dupage
     50852      0          0          0          0          0          0.00    

     Server NFS:
     calls        badcalls     
     50852        0            
     null         getattr      setattr      root         lookup       readlink  
     1  0%        233  0%      0  0%        0  0%        1041  2%     0  0%  
     read         wrcache      write        create       remove       rename    
     49498 97%    0  0%        0  0%        0  0%        0  0%        0  0% 
     link         symlink      mkdir        rmdir        readdir      fsstat 
     0  0%        0  0%        0  0%        0  0%        75  0%       4  0%     
The output may be interpreted using the following guidelines.
  • badcalls > 0 - RPC requests are being rejected by the server. This could indicate authentication problems caused by having a user in too many groups, attempts to access exported file systems as root, or an improper Secure RPC configuration.
  • nullrecv > 0 - NFS requests are not arriving fast enough to keep all of the nfsd daemons busy. Reduce the number of NFS server daemons until nullrecv is not incremented.
  • symlink > 10% - Clients are making excessive use of symbolic links that are on file systems exported by the server. Replace the symbolic link with a directory, and mount both the underlying file system and the link's target on the client.
  • getattr > 60% - Check for non-default attribute caching (noac mount option) on NFS clients.
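
The four guidelines above can be applied mechanically once the counters have been copied out of the nfsstat -s output. The following is a minimal sketch, not a parser: the function name check_server_stats and the dictionary layout are assumptions made for illustration, and the percentages are computed against the sum of the per-operation counts.

```python
def check_server_stats(rpc, ops):
    """Apply the server-side guidelines to counters copied from nfsstat -s.

    rpc: Server RPC counters, e.g. {"badcalls": 0, "nullrecv": 0}
    ops: per-operation call counts from the Server NFS section
    """
    findings = []
    if rpc.get("badcalls", 0) > 0:
        findings.append("badcalls > 0: the server is rejecting RPC requests")
    if rpc.get("nullrecv", 0) > 0:
        findings.append("nullrecv > 0: more nfsd daemons than the load needs")
    total = sum(ops.values())
    if total:
        if ops.get("symlink", 0) / total > 0.10:
            findings.append("symlink > 10%: heavy use of symbolic links on exports")
        if ops.get("getattr", 0) / total > 0.60:
            findings.append("getattr > 60%: check clients for the noac mount option")
    return findings


# Non-zero counters taken from the sample output above.
rpc = {"calls": 50852, "badcalls": 0, "nullrecv": 0}
ops = {"null": 1, "getattr": 233, "lookup": 1041,
       "read": 49498, "readdir": 75, "fsstat": 4}

print(check_server_stats(rpc, ops) or "no guideline triggered")
```

For the sample server none of the thresholds is crossed, since almost all of the calls are reads.
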
On the client side use the command nfsstat -c to display the client statistics. For example:
     # nfsstat -c

     Client RPC:
     calls      badcalls   retrans    badxid     timeout    wait       newcred
     369003     62         1998       43         2053       0          0 

     Client NFS:
     calls        badcalls     nclget       nclsleep     
     368948       0            368948       0            
     null         getattr      setattr      root         lookup       readlink  
     0  0%        51732 14%    680  0%      0  0%        95069 25%    542  0% 
     read         wrcache      write        create       remove       rename 
     210187 56%   0  0%        2259  0%     1117  0%     805  0%      337  0%   
     link         symlink      mkdir        rmdir        readdir      fsstat    
     120  0%      0  0%        7  0%        0  0%        5510  1%     583  0% 
This output may be interpreted using the guidelines given below.
  • timeout > 5% - The client's RPC requests are timing out before the server can answer them, or the requests are not reaching the server. Check badxid to determine the problem.
  • badxid ~ timeout - RPC requests are being handled by the server, but too slowly. Increase timeo parameter value for this mount, or tune the server to reduce the average request service time.
  • badxid ~ 0 - With timeouts greater than 3%, this indicates that packets to and from the server are getting lost on the network. Reduce the read and write block sizes (mount parameters rsize and wsize) for this mount.
  • badcalls > 0 - RPC calls on soft-mounted file systems are timing out. If the server is running and badcalls is growing, then soft-mounted file systems should use a larger timeo or retrans value.
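
The timeout-related guidelines above compare timeout with calls and badxid with timeout. The following is a minimal sketch of that decision, under stated assumptions: the function name diagnose_client is hypothetical, and the cut-offs for "badxid ~ 0" (at most 10% of timeout) and "badxid ~ timeout" (at least 50% of timeout) are arbitrary choices, because the guidelines do not define them.

```python
def diagnose_client(calls, timeout, badxid, near_zero=0.10, near_equal=0.50):
    """Classify client RPC counters copied from nfsstat -c.

    near_zero and near_equal are assumed cut-offs for "badxid ~ 0" and
    "badxid ~ timeout"; they are not taken from the guidelines.
    """
    if calls <= 0:
        return "no data"
    rate = timeout / calls
    if rate <= 0.03:
        return "ok"
    # timeout is necessarily positive here because rate > 0.03.
    share = badxid / timeout
    if share <= near_zero:
        # Few duplicate replies: requests or replies are lost on the network.
        return "network loss"
    if rate > 0.05 and share >= near_equal:
        # Many duplicate replies: the server answers, but too slowly.
        return "slow server"
    return "inconclusive"


# Counters from the sample output above: calls, timeout, badxid.
print(diagnose_client(369003, 2053, 43))
```

For the sample client the timeout rate is about 0.6% of all calls, so neither threshold is reached.
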


Read More on http://www.cs.bgu.ac.il/~arik/usail/network/nfs/tips.html

The major NFS daemons are:


nfsd
nfsd handles client requests from remote systems. Multiple copies of this daemon are usually run so that several requests can be handled simultaneously. However, too many copies of nfsd can increase the demand for CPU time to the point where a drop in performance results. For the best performance the number of copies of nfsd should be set to four.
biod
biod handles block I/O requests for NFS client processes. As with nfsd, several copies are usually run and the number of copies should be set to four.
rpc.mountd
rpc.mountd handles mount requests from remote systems.
rpc.lockd
rpc.lockd manages file locking on NFS client and server machines.
rpc.statd
rpc.statd manages lock crash and recovery services for both client and server systems.
portmap
portmap is not strictly an NFS daemon, although it is required for NFS to function properly. It facilitates the initial connection between local and remote servers. Under Solaris the rpcbind daemon performs the same function.
Installing NFS is fairly simple. Once the software has been installed and NFS capabilities are enabled in the kernel, the daemons need to be started. This can be done with a script. Some flavors of Unix start these daemons when NFS is installed or upon a reboot following installation of the NFS software. The daemons can also be started from the command line:
# /usr/sbin/biod 4
# /usr/sbin/nfsd 4
# /usr/sbin/rpc.mountd
# /usr/sbin/rpc.statd
# /usr/sbin/rpc.lockd


Sunday, 2 December 2012

Error: There are currently no logon servers available to service the logon request

A CIFS share is not accessible on a Windows server.

"Error: There are currently no logon servers available to service the logon request"

Cause:
A problem with Kerberos ticketing is breaking the secure channel trust between the storage system and the domain.

Solution:
Perform the following step to resolve the issue. The argument to cifs resetdc is the domain name.

Note: This step is disruptive and will terminate all CIFS sessions.

    [root@myadminhost ~]# rsh myfiler cifs resetdc ad1.mywindomain.com

Then check access to the share:

[root@myadminshost1 ~]# rsh myfiler cifs shares

Name         Mount Point                       Description
----         -----------                       -----------
ETC$         /etc                              Remote Administration
                        BUILTIN\Administrators / Full Control
HOME         /vol/vol0/home                    Default Share
                        everyone / Full Control
C$           /                                 Remote Administration
                        BUILTIN\Administrators / Full Control

myshare     /vol/my_vol/my     crmod_sldc
                        AD\Domain Users / Full Control

Note:
If this does not help, run cifs terminate, then run cifs setup and add the domain controller again.






Thursday, 5 July 2012

NFS Server Not Responding

Jun 16 22:14:16 mytesthost kernel: nfs: server mytestfiler-nas OK
Jun 16 22:15:12 mytesthost kernel: nfs: server mytestfiler-nas not responding, still trying
Jun 16 22:15:30 mytesthost last message repeated 4 times
Jun 16 22:15:35 mytesthost kernel: nfs: server mytestfiler-nas OK

If the above error appears, check the health of the host at that time: load, iostat output, free, used, and swap memory, and the number of processes. Also check whether network saturation or packet loss occurred when the message was logged.
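
To know which time window to inspect, it helps to pair each "not responding" message with the "OK" that follows it. The following is a minimal sketch, assuming syslog lines in the format shown above. The function name outages is hypothetical, and because syslog timestamps carry no year, the year is passed in as an assumption.

```python
import re
from datetime import datetime

# Matches the kernel NFS client messages shown above.
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: nfs: server "
    r"(?P<server>\S+) (?P<state>not responding|OK)"
)


def outages(lines, year):
    """Return (server, start, seconds) for each not-responding/OK pair."""
    down_since = {}
    result = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # e.g. "last message repeated 4 times"
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        server = m["server"]
        if m["state"] == "not responding":
            # Keep the first report; repeats do not restart the window.
            down_since.setdefault(server, when)
        elif server in down_since:
            start = down_since.pop(server)
            result.append((server, start, (when - start).total_seconds()))
    return result


log = [
    "Jun 16 22:14:16 mytesthost kernel: nfs: server mytestfiler-nas OK",
    "Jun 16 22:15:12 mytesthost kernel: nfs: server mytestfiler-nas not responding, still trying",
    "Jun 16 22:15:30 mytesthost last message repeated 4 times",
    "Jun 16 22:15:35 mytesthost kernel: nfs: server mytestfiler-nas OK",
]

found = outages(log, year=2012)  # the year is assumed
for server, start, seconds in found:
    print(server, start.time(), seconds)
```

The resulting window (here 22:15:12 to 22:15:35) is the period for which host load and network counters should be checked.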

Sunday, 17 June 2012

SnapMirror and SnapVault: PUSH or PULL?

SnapMirror and SnapVault

SnapMirror and SnapVault use TCP port 10566 for data transfer. Network connections are always initiated by the destination system; that is, SnapMirror and SnapVault pull data rather than push data.
Authentication is minimal with both SnapMirror and SnapVault. To restrict inbound TCP connections on port 10566 to a list of authorized hosts or IP addresses, you should configure the snapmirror.access or snapvault.access option. When a connection is established, the destination storage system communicates its host name to the source storage system, which then uses this host name to determine if a transfer is allowed. You should confirm a match between the host name and its IP address. To confirm that the host name and the IP address match, you should set the snapmirror.checkip.enable option to on.
To disable SnapMirror, you should set the snapmirror.enable option to off. To disable SnapVault, you should set the snapvault.enable option to off.
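
Because the destination opens the connection, a firewall check only needs to confirm that TCP port 10566 on the source is reachable from the destination side. The following is a minimal sketch, assuming it is run on an admin host in the same network as the destination; the function name port_reachable and the host name in the usage note are hypothetical. A successful connection only shows that the port is open, not that snapmirror.access or snapvault.access permits the transfer.

```python
import socket

SNAPMIRROR_PORT = 10566  # data transfer port named in the text above


def port_reachable(host, port=SNAPMIRROR_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as port_reachable("source-filer.example.com") returns False when a firewall drops the connection or when the source system is not listening.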

Thursday, 14 June 2012

Disk Statistics

# priv set "advanced;disk shm_stats"

Warning: These advanced commands are potentially dangerous; use
         them only when directed to do so by NetApp
         personnel.
                        Disk   Average    Max    Retry  Timeout  Sense Data
Disk                    State    I/O      I/O    count  count    1      2      3     4     5     9     B     Other
------------------------------------------------------------------------------------------------------------------
1a.10.0                   007      000    0839    000    000    0000    000    000    00    00    00    00    000
1a.10.1                   007      000    0854    000    000    0000    000    000    00    00    00    00    000
1a.10.2                   007      000    1262    000    000    0000    000    000    00    00    00    00    000
1a.10.3                   007      000    0855    000    000    0000    000    000    00    00    00    00    000
1a.10.4                   007      000    0846    000    000    0000    000    000    00    00    00    00    000
1a.10.5                   007      000    0874    000    000    0000    000    000    00    00    00    00    000
1a.10.6                   007      000    0856    000    000    0000    000    000    00    00    00    00    000
1a.10.7                   007      000    0838    000    000    0000    000    000    00    00    00    00    000
1a.10.8                   007      000    0824    000    000    0000    000    000    00    00 


# priv set "advanced;disk_stat 1b.00.18"

Warning: These advanced commands are potentially dangerous; use
         them only when directed to do so by NetApp
         personnel.
1b.00.18: ??? HPrio=0 out=0 queued=0 (14346/0/0)
      max_Q=1 max_b=1 recovered error=0 medium error=0
      not ready count=0 command timeout count=0
  disk queue histogram:
   4790      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  Message history:
    DISK_READ:                   12
    DISK_READ_CHAIN:             4778
    DISK_WRITE:                  0
    DISK_WRITE_CHAIN:            0
    DISK_WRITE_CHAIN_VERIFY:     0
    DISK_READ_WITH_CKSUM:        0
    DISK_READ_CHAIN_WITH_CKSUM:  0
