Hi,
We have just implemented a T800 with 48 disks, with vSphere ESXi 4.1 U1 running on HP BL490c G7 blades with FlexFabric converged adapters. The systems are in the test phase, with almost no load on the array.
We have a few test VM guests installed with Windows 2008 R2. When running IOmeter on the guests, we are seeing about 600 IOPS (4k, 100% read).
When running the same IOmeter test on a LUN presented to a Windows 2008 R2 server installed directly on a blade (no VMware), we are seeing over 5,000 IOPS. That level of performance is about what we expect for the number of spindles configured.
There is only one CPG configured, and the volumes have the same configuration. The 3PAR VAAI plugin is installed on the hosts, and the vSphere hosts are using round-robin multipathing. We have tried changing the multipathing policy, which makes no difference.
Any idea why there is so much difference in IOPS?
Thanks
Nik
Poor IOPs in vSphere Guest
- Richard Siemers
- Site Admin
- Posts: 1333
- Joined: Tue Aug 18, 2009 10:35 pm
- Location: Dallas, Texas
Re: Poor IOPs in vSphere Guest
I need to run those same tests in my environment; I am curious about block sizes and how they translate up the storage chain. A 4k I/O issued by IOmeter is likely getting converted to a 64k I/O by the VMware storage driver, and perhaps converted again by the VMFS layer over the LUN. Are you benchmarking random 4k or sequential? Based on the low result I assume random? How much better/worse does it get when you switch to 8k?
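One way to see what I/O sizes actually arrive at the vSCSI layer is vscsiStats on the ESX host. A minimal sketch, assuming service console / Tech Support Mode access and a placeholder world group ID of 12345:

# List VMs and their world group IDs
vscsiStats -l
# Start collecting for the test VM (12345 is a placeholder ID from the list above)
vscsiStats -s -w 12345
# After the IOmeter run, dump the I/O length histogram
vscsiStats -p ioLength -w 12345
# Stop collection when done
vscsiStats -x -w 12345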
Did you create your VMFS partition during the ESX install or with the GUI-based client afterward? Partitions created by the installer are not "aligned" properly, causing blocks to overlap track boundaries. Also, VMware recommends that the NTFS partition be formatted with a 32k allocation unit size. http://communities.vmware.com/docs/DOC-11458
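To check and fix the guest side of that, a rough sketch, run inside the Windows 2008 R2 guest (the F: drive letter is a placeholder for the IOmeter test volume):

# Show partition starting offsets - an offset evenly divisible by 64 KB (or 1 MB) indicates an aligned guest partition
wmic partition get Name,StartingOffset,Index
# Reformat the test volume with a 32 KB allocation unit (this destroys any data on F:)
format F: /FS:NTFS /A:32K /Q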
Also, ESX 4 has proven a tricky beast when it comes to setting the LUNs up for Round Robin. After painstakingly using the GUI to make all 10 LUNs on 5 ESX servers Round Robin... (50 changes total) we applied some ESX patches and it reverted everything back to Fixed path. The 3PAR implementation guide for ESX 4 has the commands to change the ESX defaults to RR.
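Roughly, the CLI version of that looks like the lines below. This is a sketch from memory of the ESX/ESXi 4.x syntax, so verify the exact SATP name and command form against the implementation guide for your build (the naa ID is a placeholder):

# Make Round Robin the default path selection policy for LUNs claimed by the default active/active SATP
esxcli nmp satp setdefaultpsp --satp VMW_SATP_DEFAULT_AA --psp VMW_PSP_RR
# Or change an individual device
esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --psp VMW_PSP_RR
# Confirm what each device ended up with
esxcli nmp device list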
Verify that your ESX hosts are set up on the 3PAR side as persona 6, and that QFullSampleSize=32 and QFullThreshold=4 are set on the ESX hosts (per the 3PAR implementation guide for ESX 4).
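For reference, a sketch of how I'd check and set those, assuming the InForm CLI on the array and esxhost01 as a placeholder hostname:

# 3PAR InForm CLI - set and verify the host persona
sethost -persona 6 esxhost01
showhost -persona
# On each ESX 4.x host - queue-full settings from the implementation guide
esxcfg-advcfg -s 32 /Disk/QFullSampleSize
esxcfg-advcfg -s 4 /Disk/QFullThreshold
# Read them back to confirm
esxcfg-advcfg -g /Disk/QFullSampleSize
esxcfg-advcfg -g /Disk/QFullThreshold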
Will have more to say once I get my own tests in order.
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.
Re: Poor IOPs in vSphere Guest
After doing some playing in IOmeter I have sorted the issue. IOmeter has a setting called "number of outstanding I/Os". To conclude: I wasn't pushing the tool hard enough to stress the SAN. I have managed to go from 600 IOPS to 19,000 IOPS, which is very good.
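For anyone else who hits this, the relationship is basically Little's Law: achievable IOPS ≈ outstanding I/Os ÷ per-I/O latency. As a rough illustration (the latency figure is assumed, not measured): with 1 outstanding I/O and about 1.6 ms per 4k read you can never see more than 1 / 0.0016 s ≈ 625 IOPS, no matter how many spindles sit behind the LUN, while 32 outstanding I/Os at the same latency gives 32 / 0.0016 s ≈ 20,000 IOPS, which lines up with the jump from 600 to 19,000.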
Thanks
- Richard Siemers
- Site Admin
- Posts: 1333
- Joined: Tue Aug 18, 2009 10:35 pm
- Location: Dallas, Texas
Re: Poor IOPs in vSphere Guest
Good to hear. I've always thought of that setting as similar to the "queue depth" setting on the HBA.
Our T800's performance scaled upward from 1 to approximately 22 queued I/Os; from 22 to 64 it seemed to hit diminishing returns. I considered that validation of the Emulex default queue depth of 32. Our plan had been to use queue depth (setting it to 8, for example) as a throttle to prevent non-production boxes from over-utilizing the storage and competing with production hosts... but we have yet to see a need.
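If we ever do throttle that way, my rough notes say it would look something like this on an ESX 4.x host with the Emulex lpfc820 driver; the module and parameter names vary by driver/ESX version, so treat this as an assumption to verify against your own build (a host reboot is needed for the change to take effect):

# Show the options currently set on the Emulex module
esxcfg-module -g lpfc820
# Set a per-LUN queue depth of 8 for the first Emulex HBA instance (parameter name varies by driver version)
esxcfg-module -s lpfc0_lun_queue_depth=8 lpfc820
# After the reboot, check the device queue depth (DQLEN) in esxtop's disk device view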
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.