andrew.leecw wrote:
Hi Mamma,
After I connected directly to the 3PAR as 3paradm and ran the command, the reply was as follows:
cli% servicemag resume 0 5
Are you sure you want to run servicemag?
select q=quit y=yes n=no: y
servicemag resume 0 5
... onlooping mag 0 5
... firmware is current on pd WWN [5000C5005F7884F8]
... firmware is current on pd WWN [5000C5007EF4FB04] Id [ 5]
... checking for valid disks...
... checking for valid disks...
... disks not normal yet..trying admit/onloop again
... onlooping mag 0 5
... checking for valid disks...
... checking for valid disks...
... disks not normal yet..trying admit/onloop again
... onlooping mag 0 5
... checking for valid disks...
... checking for valid disks...
... disks not normal yet..trying admit/onloop again
... onlooping mag 0 5
... checking for valid disks...
Failed --
disk WWN [5000C5005F7884F8] not admitted
Failed --
disk WWN [5000C5005F7884F8] is not normal. Please use showpd -s to see details of disk state
servicemag resume 0 5 -- Failed
Command failed
Thanks,
Andrew
Now the permissions issue is gone.
Now on to your next issue.... The new drive will never go "normal" with your failed node, because the system can never onloop the missing A-port.
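As a sanity check (assuming the standard InForm CLI here; I'm going from memory, so verify against your release), you can confirm the node and path state yourself:

cli% shownode
cli% showpd -s 5

With one node down, showpd -s 5 should report only a single usable port path for the new drive, which matches the onloop failure above.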
Reviewing "help servicemag", I don't see an option to force the admission of a drive when a controller node has failed.
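If the task is still registered, servicemag can at least report what it thinks is happening (the status subcommand is in the standard CLI; output varies by release):

cli% servicemag status 0 5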
I'm just throwing it out there.....
You have 1 of 2 nodes failed.
You have 3 failed drives (5, 7, 31), and replacement has been attempted on 2 of those (5 and 31).
You have 2 drives (0, 6) throwing "over_temperature_alert" with "disabled_A_port" and "disabled_B_port", which should mean they are out of the system.
You have 3 drives (16, 22, 28) throwing pre-failure warnings that have not been acted on due to the number of failed drives in the system; the commands below are a quick way to verify all of this.
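Assuming the usual showpd/showcage options on your 3PAR OS release (adjust if yours differs):

cli% showpd -failed -degraded
cli% showcage -d

showcage -d includes the cage temperature readings, which are worth a look given the over_temperature_alert on drives 0 and 6.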
So what you do have is 1 of 2 nodes online and 40 of 48 drives "Okay-ish". I think this system will probably cost more time and money to get back to a healthy state than it would to replace it.