Sunday, 27 May 2012

Upgrading NetApp Filer


The steps for upgrading Data ONTAP vary depending on the model of NetApp controller and on the versions of ONTAP and of System, Disk, Shelf and RLM firmware currently running on the system. This procedure attempts to address all supported models and configurations. In summary, everything falls into one of the following:
                - Models: FAS9x0, FAS3050, FAS3070, FAS3170, FAS60x0
                - Data ONTAP: some variant of a 7.2x or 7.3x release
The goal of this procedure is to provide steps for upgrading to:
                - Data ONTAP:  v7.3.6P2
                - System Firmware: FAS9x0=v4.3.1, FAS3050=v3.1, FAS3070=v2.5, FAS3170=v4.4, FAS32x0=v5.1.1, FAS60x0=v1.9
                - Shelf Modules: ESH2=v20, ESH4=v14, AT-FCX=v38, IOM3=v131
                - RLM: v4.0 (requires upgrade to v3.1 or v3.1p1 first before upgrading to v4.0)
                - Disks: (various; it is best to install the current all.zip to get the latest disk firmware)

It has been confirmed that the Data ONTAP 7.3.6P2 installation package files contain all the latest System Firmware and Shelf Module versions listed above. This reduces the number of files that must be downloaded to three at most (ONTAP, Disk and RLM).  NOTE: The one exception is FAS32x0 systems, whose latest System Firmware is 5.1.1 while the 7.3.6P2 kit includes only v5.1. For FAS32x0 systems, amend this procedure to download 5.1.1 and update the flash card with it before the reboot.
________________________________________
BEFORE THE OUTAGE WINDOW:

Note: confirm aggregates have 3-5% free:
                    On each head: > df -Ah
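
The free-space check can be scripted against a saved copy of the df -Ah output. The following is a minimal sketch, not a NetApp tool: it assumes the output has one line per aggregate whose last column is the used capacity as a percentage (for example "98%") and that snapshot reserve lines contain ".snapshot" in the name. The function name and the default threshold are illustrative.

```python
def low_space_aggregates(df_output, min_free_pct=3):
    """Return (aggregate, free_pct) pairs with less than min_free_pct free.

    Assumes the last whitespace-separated field of each aggregate line
    is the used capacity, e.g. '98%'. Header and snapshot lines are skipped.
    """
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[-1].endswith("%"):
            continue  # header, blank, or unrecognised line
        name = fields[0]
        if ".snapshot" in name:
            continue  # snapshot reserve is not the aggregate itself
        try:
            used_pct = int(fields[-1].rstrip("%"))
        except ValueError:
            continue
        free_pct = 100 - used_pct
        if free_pct < min_free_pct:
            flagged.append((name, free_pct))
    return flagged
```

An empty result means every aggregate met the threshold; anything returned should be looked at before the outage window.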
Downloading software from the NetApp NOW site to /etc/software:
On each head: pre-load the following files into vol0:/etc/software

•             Data ONTAP (one of):
736P2_setup_e.exe    - pc_elf - use for FAS3020 or FAS3050 only
736P2_setup_i.exe     - intel - use for FAS960 or FAS980 only
736P2_setup_q.exe    - x86-64 - use for FAS3070, FAS3140, FAS3170,  FAS6070, FAS6080 only

•             all.zip  ... contains all disk firmware - not needed if the disks were recently upgraded and nothing newer has been released since, but installing the latest all.zip does no harm

•             RLM_FW.zip - 3.1.0P1 – not needed if FAS980 or if already running RLM 4.0 - download and name file: RLM_FW_3.1.0P1.zip (1)

•             RLM_FW.zip - 4.0.0 – not needed if  FAS980 or if already running RLM 4.0 - download and name file: RLM_FW_4.0.0.zip (1)
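
The model-to-package choice in the list above can be written down as a lookup so that the wrong installer is not picked by accident. This is only a sketch: the table is transcribed from the list above, and the function name is made up for illustration. Models that the list does not mention are deliberately left out rather than guessed.

```python
# Transcribed from the Data ONTAP package list above; nothing else is assumed.
PACKAGE_BY_MODEL = {
    "FAS3020": "736P2_setup_e.exe",  # pc_elf
    "FAS3050": "736P2_setup_e.exe",  # pc_elf
    "FAS960": "736P2_setup_i.exe",   # intel
    "FAS980": "736P2_setup_i.exe",   # intel
    "FAS3070": "736P2_setup_q.exe",  # x86-64
    "FAS3140": "736P2_setup_q.exe",  # x86-64
    "FAS3170": "736P2_setup_q.exe",  # x86-64
    "FAS6070": "736P2_setup_q.exe",  # x86-64
    "FAS6080": "736P2_setup_q.exe",  # x86-64
}


def package_for(model):
    """Return the installer file name for a controller model.

    Raises ValueError for models the list above does not cover.
    """
    key = model.strip().upper()
    if key not in PACKAGE_BY_MODEL:
        raise ValueError("no package listed for model %r" % model)
    return PACKAGE_BY_MODEL[key]
```

Raising an error for an unlisted model is intentional: it is safer to stop and check the NOW site than to fall back to a default package.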

NOW site URL links for above files:
•             Data ONTAP (depending on model per above): NOTE: you may have to cut/paste these links into your browser
http://now.netapp.com/NOW/download/software/ontap/7.3.6P2/pc_elf/736P2_setup_e.exe
http://now.netapp.com/NOW/download/software/ontap/7.3.6P2/intel/736P2_setup_i.exe
http://now.netapp.com/NOW/download/software/ontap/7.3.6P2/x86-64/736P2_setup_q.exe

•             Checksums for Data ONTAP files (verify these after the file is in vol0/etc/software):
736P2_setup_e.exe - 61d719469ede61a9c7ac6acbeff4fced
736P2_setup_i.exe - bf7346995b77dea54075fce47b002437
736P2_setup_q.exe - 3992396d9c60468137ebafbc078c7371

•             Disk and RLM Firmware: NOTE: you may have to cut/paste these links into your browser
http://now.netapp.com/NOW/cgi-bin/diskfwmustread.cgi/download/tools/diskfw/bin/all … scroll to bottom [Download .zip]
http://now.netapp.com/NOW/cgi-bin/rlmblic.cgi/download/tools/rlm_fw/4.0.0/ontap_cli.shtml
http://now.netapp.com/NOW/cgi-bin/rlmblic.cgi/download/tools/rlm_fw/3.1.0P1/ontap_cli.shtml  … see NOTE(1) above
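
The checksums listed above are 32 hexadecimal digits, which is consistent with MD5. The following sketch assumes that they are MD5 digests and that the downloaded file is readable from an admin host (for example, through a mount of the filer's /etc directory); both are assumptions, and the function names are illustrative.

```python
import hashlib

# Values copied from the checksum list above (assumed to be MD5).
EXPECTED_MD5 = {
    "736P2_setup_e.exe": "61d719469ede61a9c7ac6acbeff4fced",
    "736P2_setup_i.exe": "bf7346995b77dea54075fce47b002437",
    "736P2_setup_q.exe": "3992396d9c60468137ebafbc078c7371",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, filename):
    """Compare the file at path with the published checksum for filename."""
    return md5_of(path) == EXPECTED_MD5[filename]
```

A mismatch usually means an incomplete or corrupted download, so the file should be fetched again before running software update.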


________________________________________

NOTE(1): RLM 4.0 is the goal and requires upgrading to RLM 3.1.0P1 first, so both 3.1.0p1 and 4.0 versions may be needed
________________________________________
BEFORE THE OUTAGE WINDOW:

NOTE: These steps can be performed while connected to the controller via ssh/rsh/telnet. Use a terminal tool (e.g., PuTTY) that records the session to a log file in case NetApp support assistance is required.

On each head: > options autosupport.doit before-upgrade

On each head: > software update 736P2_setup_x.exe -r     ...  (where x is e, i or q per above)   ( TIME: 10 to 30 minutes )
                The system prints a series of dots "........" while writing the kernel and diagnostics to the flash card and disks
                Use the version -b command to confirm that the flash card holds 7.3.6P2 as the primary kernel

hostname> version -b
1:/x86_elf/kernel/primary.krn: OS 7.3.6P2  <-- correct
...
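
The version -b check can also be done on a captured session log. This is a rough sketch that assumes the primary kernel line looks like the sample above (a slot number, a path ending in kernel/primary.krn, then "OS" and the version); the function name is hypothetical.

```python
import re

# Matches e.g. "1:/x86_elf/kernel/primary.krn: OS 7.3.6P2".
# Lines under /backup/ are excluded by the negative lookahead.
PRIMARY_KERNEL = re.compile(
    r"^\d+:/(?!backup/)\S*kernel/primary\.krn:\s+OS\s+(\S+)"
)


def primary_kernel_version(version_b_output):
    """Return the primary kernel version string, or None if not found."""
    for line in version_b_output.splitlines():
        match = PRIMARY_KERNEL.match(line.strip())
        if match:
            return match.group(1)
    return None
```

Comparing the returned value with "7.3.6P2" gives a simple pass/fail before moving on to the outage window.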
OPTIONAL: IF any aggregates on the system are RAID4 THEN do the following:

On each head: > software install all.zip    (During boot the system upgrades the disks in any RAID4 aggregates, so installing all.zip before the halt ensures the latest disk firmware is already unpacked at boot time)
      
OPTIONAL: IF RLM is not at v3.1 or v3.1P1 (see 'rlm status') THEN do the following:

                        On each head: > software install RLM_FW_3.1.0P1.zip
                        On each head: > rlm update    ... ( TIME: 20 to 30 minutes )
                        On each head:  Do you want to reboot the RLM now? (y/n) :? y
                        On each head: > rlm status    ... check/repeat that it has rebooted to Online and is running v3.1p1

OPTIONAL: IF RLM is not at v4.0 (see 'rlm status') THEN do the following:

                        On each head: > software install RLM_FW_4.0.0.zip
                        On each head: > rlm update    ... ( TIME: 20 to 30 minutes )
                        On each head:  Do you want to reboot the RLM now? (y/n) :? y
                        On each head: > rlm status    ... check/repeat that it has rebooted to Online and is running v4.0
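
The two RLM steps above amount to a small decision: RLM 4.0 is the goal, and anything older than 3.1 must pass through 3.1.0P1 first. The sketch below encodes that rule. The version spellings it accepts are assumptions, since the exact string printed by rlm status is not shown here, and the function name is illustrative.

```python
def rlm_upgrade_steps(current_version):
    """Return the RLM firmware files to install, in order.

    current_version is the version read from 'rlm status' (spelling assumed,
    e.g. '3.0', '3.1P1', 'v4.0').
    """
    version = current_version.strip().lower().lstrip("v")
    if version in ("4.0", "4.0.0"):
        return []  # already at the target
    steps = []
    if version not in ("3.1", "3.1.0", "3.1p1", "3.1.0p1"):
        steps.append("RLM_FW_3.1.0P1.zip")  # required intermediate step
    steps.append("RLM_FW_4.0.0.zip")
    return steps
```

Each returned file corresponds to one software install followed by rlm update and an rlm status check, as in the steps above.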
________________________________________
DURING OUTAGE WINDOW:

NOTE: These steps should only be performed while connected to the serial console (directly or via a terminal server) or via an RLM ssh terminal connection. Use a terminal tool (e.g., PuTTY) that records the session to a log file in case NetApp support assistance is required.
      
OPTIONAL: IF AT-FCX Shelf Module firmware needs to be upgraded THEN do that now (see the 'storage download shelf' steps later in this section) ... ( TIME: 10 to 20 minutes )

          On one head: > cf disable
       On each head: > halt -f

       OPTIONAL: IF not a FAS9x0 THEN this step is likely needed (all FAS9x0 are already up-to-date)
                        On each head: at the Ok>, CFE> or LOADER> prompt > update_flash

                The system prints a series of =+=+=+=+=+=+=+=+=+=+=+=+=+=+ while updating system firmware

       On each head: at Ok/CFE/LOADER prompt > bye

       On each head: watch the boot messages and look for errors

       OPTIONAL: IF prompts about "low NVRAM battery" appear THEN override by pressing Ctrl-C and answering y:

WARNING:  The battery voltage is too low to hold data ...etc...
To override this delay, press CTRL-C.   <control-C>
CAUTION: Using this appliance without NVRAM ...etc...
Are you sure you want to continue (y or n)? y

       OPTIONAL: IF shelf modules are still down-rev THEN one head may pause during boot while it updates the shelves:
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf1
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf2
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf3
...

       On each head: <login with admin user/password>
       On each head: > version    ... (to confirm version of Data ONTAP is 7.3.6P2)
          On one head: > cf enable    ... (to re-enable cluster failover)
       On each head: > aggr status ;  vol status    ... (to confirm everything is online)

       OPTIONAL: IF shelf firmware is still not up-to-date for some reason (check with sysconfig -a) THEN perform:

    On one head: > priv set advanced
   On one head: *> storage download shelf    ... ( TIME: 10 to 20 minutes )
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf1
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf2
<d>[<h>: sfu.controllerElementsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf3
...
NOTE: The dual modules in each shelf are downloaded in two waves, each taking about 5 to 10 minutes
Ends with the message:  <date> [<hostname>: sfu.downloadSummary:info]: Shelf firmware updated on ## shelves.

    On one head: *> sysconfig -a    ... (look for all Modules to be current ESH2=v20, ESH4=v14, AT-FCX=v38)
For example: ... Look at the end of the list of disks on each loop at the Shelf 1,2,3,4,etc lines for proper version
    ... NETAPP   X275_S15K4146F15 NA01 136.0GB 520B/sect (3KN0AC6J000075505WA8)
93  : NETAPP   X275_S15K4146F15 NA01 136.0GB 520B/sect (3KN0CPBB00007601CF6L)
Shelf 1: ESH2  Firmware rev. ESH A: 20  ESH B: 20
Shelf 2: ESH2  Firmware rev. ESH A: 20  ESH B: 20  <-- both modules running v20
Shelf 3: ESH2  Firmware rev. ...
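
Scanning a long sysconfig -a listing for down-rev shelf modules by eye is error-prone, so a captured log can be checked with a short script. This sketch only understands lines in the ESH format shown in the example above; the line format for AT-FCX and IOM3 modules is not shown here, so those are not handled. The function name is hypothetical.

```python
import re

# Target versions taken from the goals stated at the top of this procedure.
TARGET_REV = {"ESH2": 20, "ESH4": 14}

# Matches e.g. "Shelf 2: ESH2  Firmware rev. ESH A: 20  ESH B: 20".
SHELF_LINE = re.compile(
    r"Shelf\s+(\d+):\s+(ESH\d)\s+Firmware rev\.\s+ESH A:\s*(\d+)\s+ESH B:\s*(\d+)"
)


def down_rev_shelves(sysconfig_output):
    """Return (shelf, module, rev_a, rev_b) for shelves below target."""
    found = []
    for match in SHELF_LINE.finditer(sysconfig_output):
        shelf = int(match.group(1))
        module = match.group(2)
        rev_a, rev_b = int(match.group(3)), int(match.group(4))
        target = TARGET_REV.get(module)
        if target is not None and (rev_a < target or rev_b < target):
            found.append((shelf, module, rev_a, rev_b))
    return found
```

Shelf numbers repeat on each loop, so the result is best read alongside the original listing to see which loop a flagged shelf belongs to.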

   OPTIONAL: IF not all modules were updated THEN do the following on the other head:

   On the other head:  > priv set advanced
   On the other head:  *> storage download shelf    ... ( TIME: 10 to 20 minutes )
NOTE: The dual modules in each shelf are downloaded in two waves, each taking about 5 to 10 minutes
Ends with the message:  <date> [<hostname>: sfu.downloadSummary:info]: Shelf firmware updated on ## shelves.

   On one head: > cf status    ... (to confirm cluster failover status is enabled, sometimes it takes a few minutes to fully enable)

       At this point the disruptive part of the upgrade is over, inform users

               
________________________________________
AFTER THE OUTAGE WINDOW:
NOTE: These steps can be performed while connected to the controller via ssh/rsh/telnet. Use a tool which records the session to a log file in the event NetApp support assistance is required.

       OPTIONAL: IF the system is licensed for CIFS (check via license command) then do the following per CSB-1201-02
                      On each head: > options cifs.rpcfd.timeout -1   

        OPTIONAL: IF not done before the upgrade THEN do the following now:
                      On each head: > software install all.zip    (after this, any down-rev disks may begin being upgraded in background)

        OPTIONAL: IF RLM is not at v3.1 or v3.1P1 (see 'rlm status') THEN do the following:
                        On each head: > software install RLM_FW_3.1.0P1.zip
                        On each head: > rlm update    ... ( TIME: 20 to 30 minutes )
                        On each head:  Do you want to reboot the RLM now? (y/n) :? y
                        On each head: > rlm status    ... check/repeat that it has rebooted to Online and is running v3.1p1

       OPTIONAL: IF RLM is not at v4.0 (see 'rlm status') THEN do the following:
                        On each head: > software install RLM_FW_4.0.0.zip
                        On each head: > rlm update    ... ( TIME: 20 to 30 minutes )
                        On each head:  Do you want to reboot the RLM now? (y/n) :? y
                        On each head: > rlm status    ... check/repeat that it has rebooted to Online and is running v4.0

        OPTIONAL: IF there is no backup kernel on flash (check with version -b as shown below; per TSB-1011-01) THEN re-run the software update command that follows the sample output:

hostname> version -b
1:/x86_elf/kernel/primary.krn: OS 7.3.4P4
1:/backup/x86_elf/kernel/primary.krn: OS 7.3.3P5 <-- backup kernel exists so re-doing 'sw update' cmd is NOT required
1:/x86_elf/diag/diag.krn:  5.4.6 ...
                      On each head: > software update 736P2_setup_x.exe -r       ... (where x is e, i or q per above)   ( TIME: 10 to 30 minutes )
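
Whether a backup kernel is present can be read from a captured version -b output. The following is a small sketch that assumes the backup kernel appears as a line whose path starts with /backup/ and ends in kernel/primary.krn, as in the sample output above; the function name is made up for illustration.

```python
import re

# Matches e.g. "1:/backup/x86_elf/kernel/primary.krn: OS 7.3.3P5".
BACKUP_KERNEL = re.compile(
    r"^\d+:/backup/\S*kernel/primary\.krn:\s+OS\s+\S+"
)


def has_backup_kernel(version_b_output):
    """Return True if the output lists a backup kernel on the flash card."""
    return any(
        BACKUP_KERNEL.match(line.strip())
        for line in version_b_output.splitlines()
    )
```

If this returns False for a head, that head is a candidate for re-running the software update step above.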

       On each head: > options autosupport.doit after-upgrade
