Storage Admins: What do you monitor to ensure a healthy env?

walter_white · **Joined:** Wed Nov 08, 2017 8:57 am **Posts:** 42

Howdy.. Was hoping I could get some advice on what everyone monitors, how often they monitor it, what alerts they have set up with what thresholds, etc..

Basically, what do you do as a storage admin to ensure a healthy environment? I am very green to 3PAR, so I was hoping to get some general "must haves"..

E.g.: Set up threshold alerts - this threshold with these metrics

Set up reporting - these reports help with X and I run them X..

Anything else that would be beneficial!

Thanks for your time!!

ailean · **Joined:** Wed Nov 09, 2011 12:01 pm **Posts:** 392

Space is probably the first thing to watch, especially if you're not used to thin-provisioned arrays (you will typically have allocated more space than exists in the array, so you need to track actual usage and be aware of potential spikes).

Raw space in each tier can be alerted on from the array system properties. Depending on your scale, growth, and the time it takes to turn around additional disk purchases, I'd set this around 75-85%.
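As a rough illustration of the space check described above, here is a minimal Python sketch. It assumes you already have the raw used and total capacity for a tier from somewhere (SSMC, CLI output, a spreadsheet); the function names and the example numbers are hypothetical and not part of any 3PAR API. It computes the percentage used and estimates how many days remain before a chosen threshold is hit, assuming simple linear growth.

```python
from typing import Optional


def used_percent(used_tib: float, total_tib: float) -> float:
    """Return raw capacity used as a percentage of the tier's total."""
    if total_tib <= 0:
        raise ValueError("total_tib must be positive")
    return 100.0 * used_tib / total_tib


def days_until_threshold(
    used_tib: float,
    total_tib: float,
    growth_tib_per_day: float,
    threshold_pct: float = 80.0,  # assumption: midpoint of the 75-85% range
) -> Optional[float]:
    """Estimate days until raw usage reaches threshold_pct (linear growth).

    Returns 0.0 if the threshold is already reached, or None if usage
    is flat or shrinking and the threshold would never be reached.
    """
    limit_tib = total_tib * threshold_pct / 100.0
    if used_tib >= limit_tib:
        return 0.0
    if growth_tib_per_day <= 0:
        return None
    return (limit_tib - used_tib) / growth_tib_per_day


# Hypothetical tier: 100 TiB raw, 70 TiB used, growing 0.5 TiB/day.
print(used_percent(70, 100))               # percentage used
print(days_until_threshold(70, 100, 0.5))  # days of runway before 80%
```

Comparing the runway figure with your usual lead time for buying and installing disks is one way to pick where in the 75-85% range to put the alert.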

SSMC has some built-in dashboards for showing growth rate graphs, worth keeping an eye on.

Performance is probably the next thing. I tend to do daily reports on busy systems for 'Exported Volumes Compare Hosts' so I can keep an eye on who is pushing the most IO to/from the arrays (I have these set to email so I can dig back easily; they take a while to generate, so looking at daily PDFs is normally quicker than running a new report).

I also have alerts based on port bandwidth (around 85-90%), physical disk IOPS (around the stress point of the type of disk, e.g. 15k FC/SAS ~220 IOPS) and disk port service time (for spotting potential hotspots and silent faults, around 35ms).
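To make those thresholds concrete, here is a small Python sketch of the same checks applied to a single sample of metrics. This is only an illustration: the metric names, the `Threshold` class and the `breached` function are made up for the example, the sample values are invented, and the real alerts would be configured in SSMC rather than in a script.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, not an SSMC identifier
    limit: float  # alert when the sampled value is at or above this
    unit: str


# Limits taken from the figures mentioned above (lower end of each range).
THRESHOLDS = [
    Threshold("port_bandwidth_pct", 85.0, "%"),
    Threshold("pd_iops", 220.0, "IOPS"),  # 15k FC/SAS example
    Threshold("disk_port_service_time_ms", 35.0, "ms"),
]


def breached(sample: Dict[str, float]) -> List[str]:
    """Return one message per threshold that the sample meets or exceeds."""
    alerts = []
    for t in THRESHOLDS:
        value = sample.get(t.metric)
        if value is not None and value >= t.limit:
            alerts.append(
                f"{t.metric}: {value:g} {t.unit} >= {t.limit:g} {t.unit}"
            )
    return alerts


# Invented sample: busy port, quiet disks, slow disk port.
sample = {
    "port_bandwidth_pct": 90,
    "pd_iops": 150,
    "disk_port_service_time_ms": 40,
}
for message in breached(sample):
    print(message)
```

The per-disk IOPS limit depends on the drive type, so in practice you would keep a separate limit per drive class instead of the single 220 figure used here.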

Most of the fault monitoring is automatic via the SP/SSMC, with some alerts on patches etc. thrown in. It's also worth setting up the InfoSight link to get more general info like balanced host ports, system load, recommended patches, trending, etc.

You can also run the built-in health check commands (e.g. checkhealth on the CLI) before doing anything to the system (HPE support will typically do this before and after they do things).

BryanW · **Posted:** Mon Jul 16, 2018 11:49 pm

VLUN and PD thresholds for latency in SystemReporter/SSMC are a good place to start, as well as some sort of host/guest-based latency reporting (Windows PerfMon, ESXi vRO etc). Latency is typically the main thing we watch out for other than capacity.

The 3PAR reporting tends to come in on the low side of the actual latency we see at hosts, so we use a combination of it and host or DB reporting to verify all is well.

If you run FC, implement something like Brocade MAPS.

walter_white · **Joined:** Wed Nov 08, 2017 8:57 am **Posts:** 42

Thanks a lot for the reply!! As far as setting up the threshold alerts via SSMC.. If NinjaSTARS says around 3,000 IOPS is the total I can expect, should I set up the alert to be "Physical Drives > Total IOps > 3,000"?

If the sampling is set to Hi-Res, which is every 5 minutes, does that have a performance impact on the SAN itself?

Also, I do not see a report called 'Exported Volumes Compare Hosts'.. Do you just schedule your reports to run nightly at midnight or something?

Thanks again!

walter_white · **Joined:** Wed Nov 08, 2017 8:57 am **Posts:** 42

Ailean, I don't see where I can schedule the system reports.. Are you able to assist? I would like the Exported Volumes Compare by Performance emailed to me daily, if possible..

Thanks!
