HPE Storage Users Group

A Storage Administrator Community




 Post subject: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Fri Mar 27, 2015 12:19 pm

Joined: Fri Dec 12, 2014 11:54 am
Posts: 20
Location: Stamford, CT
I already have tickets in with HP for these issues, but figured I'd reach out to this community just in case anyone has some insight.

Our V400 was upgraded to 3.1.3 MU2 on Wednesday evening. Later that night, our monitoring software alerted that the RCIP ports were no longer available. I started up a CLI session and determined that the ports were up and Remote Copy was transferring properly. I could not, however, ping the Remote Copy ports. Now, I don't know if I was ever able to ping them, but it seemed strange to me.

Logging into the IMC, I went to Systems --> Ports --> Remote Copy. Again, everything looked fine in terms of IPs, Gateways, etc. I then used the ping function from node 0's RCIP port to ping the gateway: success. Its replication partner's node 0 RCIP port: success. Then I tried pinging my computer's IP. This is where it gets fun. Immediately, my IMC froze up, and my CLI sessions shut down. I kill the IMC and try to log in, but the 3PAR array isn't reachable. Ping the management port... nothing. I ask a coworker, and he's able to get to the array fine. My PC, however, was completely locked out from communicating with the array. Thinking maybe something got wonky with my network adapter, I reset it, flush my arp cache, and try again... no dice.

I log onto a server where I keep a bunch of management tools, and I'm able to access the array. Now everything looks good, but Remote Copy link 0 (node 0 here to node 0 remote) is down. Now's a good time to mention I'm running a constant ping to the management IP from my desktop with 100% packet loss. I disable RCIP port 0 on the 3PAR, thinking I'll reset it. Instantly, my computer starts being able to ping the management port (which is on a different VLAN entirely from RCIP). Sure enough, I'm able to log into the IMC and CLI. I reenable port 0, Remote Copy indicates the link is up, data begins transferring, and I'm still able to reach the 3PAR from my PC.

Figuring it may be a fluke or some routing issue, I try the same thing, but this time pinging my management server (which is on the same VLAN as the 3PAR management port) from the Remote Copy port. Same thing! A constant ping to the 3PAR immediately dropped. Resetting the port restored its access.
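
For anyone who wants to repeat the constant-ping test with timestamps, here is a minimal Python sketch. It is not anything from HP. It assumes a Linux-style `ping` that accepts `-c` and `-W`, and the function names (`ping_once`, `transitions`, `watch`) are hypothetical helpers made up for illustration. It prints a line only when reachability flips, so the drop can be lined up against the moment the RCIP ping is issued or the port is reset.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo is answered (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transitions(samples):
    """Given (timestamp, reachable) pairs, return only the points where state flips."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if previous is not None and reachable != previous:
            changes.append((stamp, "UP" if reachable else "DOWN"))
        previous = reachable
    return changes


def watch(host: str, interval_s: float = 1.0) -> None:
    """Ping until interrupted, printing a timestamped line on each state change."""
    previous = None
    while True:
        reachable = ping_once(host)
        if previous is not None and reachable != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} is now {'UP' if reachable else 'DOWN'}")
        previous = reachable
        time.sleep(interval_s)
```

To use it, call `watch()` with the array's management IP (for example `watch("192.0.2.10")`, which is a placeholder from the documentation address range) and stop it with Ctrl+C.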

Okay, that's issue number 1. Issue number 2 is even worse, because at least Remote Copy is working, despite the ports not responding to pings (except if performed from the core). Despite a CPG having plenty of room (>25TB), a volume was unable to allocate space.

Event id: 7252821 Node 1 Cust Alert - Yes, Svc Alert - Yes
Severity: Critical
Event type: TP VV allocation failure
Alert ID: 554
Msg ID: 270007
Component: Virtual Volume 31556 3PAR-Volume-Name CPG 11 CPG_Name
Short Dsc: TP VV 3PAR-Volume-Name allocation failure
Event String: Thin provisioned VV 3PAR-Volume-Name unable to allocate SD space from CPG CPG_Name
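
If you want to catch this alert in your own monitoring, a rough Python sketch for picking the fields out of the text above could look like this. It assumes the alert arrives as plain "Key: value" lines exactly as pasted here. How you collect that text from the array is not covered, and `parse_alert` and `is_tpvv_allocation_failure` are hypothetical names.

```python
def parse_alert(text: str) -> dict:
    """Split 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_tpvv_allocation_failure(fields: dict) -> bool:
    """True when the parsed alert is the thin-provisioned VV allocation failure."""
    return fields.get("Event type") == "TP VV allocation failure"


# A few lines copied from the alert in this post.
SAMPLE = """\
Severity: Critical
Event type: TP VV allocation failure
Alert ID: 554
Msg ID: 270007
Short Dsc: TP VV 3PAR-Volume-Name allocation failure
"""
```

Calling `parse_alert(SAMPLE)` gives a dict keyed by the left-hand labels, which can then be passed to `is_tpvv_allocation_failure`.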


This directly impacted underlying hosts. Luckily, our VMware admin was just doing a Storage vMotion, so no data was lost, but it could have been. I used Dynamic Optimization (DO) on the affected volume to tune it back to the same CPG. This caused the free space on that CPG to grow, essentially avoiding the automatic SD space allocation process. The VV was then able to grow as required... for a while. When that free space was consumed, the error reappeared, and I had to DO the volume again to create wiggle room.

So yeah, fun times. Anyone have any ideas or have an array on 3.1.3 MU2 that they can check for the same RCIP port behavior?

Thanks,
Adam

::edit::

Got an update from HP on the CPG issue.

L2 investigation yields that we currently have a SW glitch that is preventing thin-provisioned volumes from grabbing more space from the CPG occasionally, even though there is free space available. The issue is under investigation.

Options at this time are:
1) convert your volumes to fully provisioned, which might not be feasible for you
2) upgrade your current OS 3.1.3MU2 to 3.2.1MU2, as we have not seen the issue in the 3.2.1 code base


There's no way we could accommodate fully provisioning our VVs and we, as an organization, try to steer clear of the bleeding edge, so neither option is really suitable for us. Thanks HP!


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Fri Mar 27, 2015 2:22 pm

Joined: Mon Feb 03, 2014 9:40 am
Posts: 116
Do you have RCIP and controller's management on the same subnet/vlan by any chance? :)


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Fri Mar 27, 2015 2:25 pm

Joined: Fri Dec 12, 2014 11:54 am
Posts: 20
Location: Stamford, CT
Nope, different VLANs.


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Mon Mar 30, 2015 8:13 pm

Joined: Mon Feb 03, 2014 9:40 am
Posts: 116
We had a similar issue after upgrading to 3.1.3 MU2, but it was related to an improper configuration (both management and RCIP were on the same network). We couldn't access the array from networks reachable via the gateway, only from the LAN.

I suggest opening a ticket with HP and having them investigate. They can get into the underlying Linux and check the ARP table, etc.
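
A quick way to rule out the same-network misconfiguration is to compare the two interface addresses with their masks. Here is a small Python sketch using the standard `ipaddress` module. The addresses in the usage note are placeholders from the documentation ranges, not real array IPs, and `same_network` is a hypothetical helper. Keep in mind that a separate VLAN does not guarantee a separate IP subnet, so it is worth checking the addressing itself.

```python
import ipaddress


def same_network(iface_a: str, iface_b: str) -> bool:
    """True if two interface addresses in CIDR form sit in overlapping subnets."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    return net_a.overlaps(net_b)
```

For example, `same_network("192.0.2.10/24", "192.0.2.20/24")` is true, while `same_network("192.0.2.10/24", "198.51.100.10/24")` is false. Substitute the management and RCIP addresses and masks shown by the array.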


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Tue Mar 31, 2015 8:39 am

Joined: Fri Dec 12, 2014 11:54 am
Posts: 20
Location: Stamford, CT
I already have a ticket open. Next step is to put a server in the same subnet as the RC ports and see if I can duplicate the behavior. We're able to ping the RC ports from the core router.

It's unfortunate that HP won't let us access a pure Linux session, as that would really allow us to troubleshoot the issue.

With the CPG growth issue, we might not be on 3.1.3 MU2 for long. Our DR array is scheduled to be upgraded to 3.2.1 MU2 on Thursday morning, prod likely to follow about a week later.


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Tue Mar 31, 2015 9:58 am

Joined: Sun Jul 29, 2012 9:30 am
Posts: 576
3PAR has had notorious issues with NIC driver changes that affect how the NICs link to the upstream switch. 3.1.3 was very particular about wanting ports set to AUTO/AUTO. Look at your network switch ports and see how the NICs are connecting; they may also be flapping. 3.2.1 MU2 breaks them yet again. It seems like every OS upgrade has us re-configuring our NICs and switch ports. We are back to setting both to 1Gb / FULL for RCIP, and they are solid.


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Tue Mar 31, 2015 12:20 pm

Joined: Fri Dec 12, 2014 11:54 am
Posts: 20
Location: Stamford, CT
Thanks for the information. I'll take a look at the switch settings, but I'm pretty sure they're set to auto (I also set the RCIP ports to auto as per the upgrade guide).

Remote Copy itself is working fine. In fact, earlier today I was looking at the Total Data Throughput in the IMC and the ports were actually showing as >110% utilized.

I finished my testing from the same VLAN as the RCIP ports (on the same FEX, too). From there, I was able to ping the Remote Copy ports, but I couldn't ping the 3PAR management port. I could ping things on the same VLAN as the management port, but not the management IP itself.

Pinging from the RCIP port to my computer was also successful when it was on the same VLAN, and when I connected my computer back to its normal VLAN, it was able to access the management IP again (but not RCIP).

For such a (for the most part) well thought out and brilliantly engineered piece of hardware, it's amazing how the little things like this seem to plague every software release.


 Post subject: Re: 3.1.3 MU2 Weirdness - RCIP Ports & CPG Allocation
Posted: Wed Apr 01, 2015 6:20 am

Joined: Sun Jul 29, 2012 9:30 am
Posts: 576
Yes, I have been burned by the 3.1.3 and 3.2.1 NIC issues, which took our RC down for 15 hours on 3.1.3 and 5+ hours on 3.2.1. I am growing very disappointed with 3PAR QA, especially since I specifically asked them to verify that 3.2.1 was not impacted by more NIC issues and was assured only the management ports had known issues. That, and RC in general, has been a huge disappointment and the Achilles heel of this product, such that we are considering other options. While the array is an excellent performer with a great architecture, RC is in the Stone Age compared to the competition. What good is a great performing array if I can't DR the damn thing!

