HPE Storage Users Group

A Storage Administrator Community




 Post subject: 3par Block size
PostPosted: Tue Dec 19, 2017 8:49 am 

Joined: Tue Dec 19, 2017 8:36 am
Posts: 11
The TDVV best-practices guidance for 3PAR storage says:

The granularity of deduplication is 16 KiB and therefore the efficiency is greatest when the I/Os are aligned to this granularity. For hosts that use file systems with tunable allocation units consider setting the allocation unit to a multiple of 16 KiB.

I have a cluster of ESXi 5.5/6.0 hosts (6.0 vCenter), but I don't understand whether the 16 KiB (or multiple) block size must be set on the VMFS datastore, in every VM, or both.
The block size in my datastores is 1 MB, so I think it's OK because it's a multiple of 16 KiB.


 Post subject: Re: 3par Block size
PostPosted: Tue Dec 19, 2017 12:43 pm 

Joined: Wed Nov 19, 2014 5:14 am
Posts: 505
ESX should already be fine; depending on the datastore version, the block size will be measured in MB: https://kb.vmware.com/s/article/1003565.

It's the guest's file system that you want to align if you need to get the most out of dedupe between VMs.
See page 5 and appendix B of this document.
https://h20195.www2.hpe.com/V2/Getdocument.aspx?docname=a00006358enw

Otherwise you get the I/O blender effect, in which multiple file systems write randomly within the same blocks. The overlapping random writes mean fewer matches between blocks, and therefore lower dedupe ratios.

If you can't do this for existing VMs then consider modifying the templates used for any new VMs.
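A quick way to see where a Windows guest stands today (a hedged example; the drive letter is just a placeholder) is to read the current NTFS allocation unit from an elevated prompt:

    fsutil fsinfo ntfsinfo C:

The "Bytes Per Cluster" line is the allocation unit; 4096 is the NTFS default, and anything that isn't a multiple of 16384 won't line up with the 3PAR 16 KiB dedupe granularity.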


 Post subject: Re: 3par Block size
PostPosted: Tue Dec 19, 2017 12:54 pm 

Joined: Tue Dec 19, 2017 8:36 am
Posts: 11
JohnMH wrote:
ESX should already be fine; depending on the datastore version the block size will be measured in MB: https://kb.vmware.com/s/article/1003565.

It's the guest's file system that you will need to align if you want to get the most out of dedupe between VMs. Otherwise you will have multiple file systems writing randomly within blocks, so you will see fewer matches between blocks and lower dedupe ratios.

See page 5 and appendix B of this document.
https://h20195.www2.hpe.com/V2/Getdocument.aspx?docname=a00006358enw

If you can't do this for existing VM's then consider modifying the templates used for any new VM's.



I read the document you indicated and I'd like some more information, because my dedupe ratio is not above 1.2 for any virtual volume, even though all VMs with the same OS are in the same VV.
Where can I look for the problem? Any suggestions? I was thinking of the block size.


Last edited by zio_mangrovia on Tue Dec 19, 2017 2:17 pm, edited 1 time in total.

 Post subject: Re: 3par Block size
PostPosted: Tue Dec 19, 2017 1:07 pm 

Joined: Wed Nov 19, 2014 5:14 am
Posts: 505
Assuming NTFS, you will have a 4 KB block size (allocation unit) per file system at the guest.

If you are writing into a 16 KB block on the back end then, since this is a shared datastore, the worst case is that multiple VMs write into the same page at, say, 4 KB each, e.g. 4 VMs x 4 KB = 16 KB.

In order to dedupe the data against other blocks you have to have an exact match of that collection of 4 KB blocks in another 16 KB block. However, since it's a shared datastore, you're highly likely to have overlapping writes from different guest file systems going to the same block.

Since you can have multiple VMs writing into the same block, your probability of a match is greatly reduced. By using a 16 KB allocation unit (or a multiple of it) in the guest you prevent this and greatly increase your chances of a match.

There's more to it than that, but the block size increase will maximize the chances of a match. You should also consider running the dedupe estimator before converting to TDVVs. Some data types simply won't dedupe, and others may be better separated if you want to achieve a high ratio, e.g. OS and data volumes. You should also make sure you are on a current firmware release.

One other thing: looking at the dedupe ratio at the volume level isn't truly accurate, as it's a load-factored value. The true ratio is measured at the CPG, i.e. across volumes.

The most important thing, though, is to look at the holistic savings rather than just the dedupe ratio (it's just a ratio), as this number can be dragged down by the non-duplicate data, especially if that makes up a large proportion of the data in the VMs.
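For that holistic view, a sketch of the system-level check (assuming a reasonably recent InForm OS release; the exact output columns vary by version):

    showsys -space

This reports overall capacity and efficiency for the array rather than a per-volume estimate, which is a better indicator of what dedupe is actually saving you.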


 Post subject: Re: 3par Block size
PostPosted: Wed Dec 20, 2017 1:47 am 

Joined: Tue Dec 19, 2017 8:36 am
Posts: 11
JohnMH wrote:
Assuming NTFS, you will have a 4 KB block size (allocation unit) per file system at the guest.

Right!

Quote:
In order to dedupe the data against other blocks you have to have an exact match of that collection of 4 KB blocks in another 16 KB block. However, since it's a shared datastore, you're highly likely to have overlapping writes from different guest file systems going to the same block.

So, if I understand correctly, the datastore block size is OK since it's always a multiple of 16 KB (1 MB, 2 MB, ...), while to increase the probability of a match it's necessary to change the block size inside the VMs.
I would have to reinstall the OS and reformat the file system (Windows/Linux); do you know whether these OSes permit setting a 16 KB block size?

Quote:
Also you should consider running the dedupe estimator before converting to TDVVs

How can I estimate it? Where can I find this task?

Quote:
Some data types simply won't dedupe, and others may be better separated if you want to achieve a high ratio, e.g. OS and data volumes

I read that; in fact I put identical operating system versions into the same virtual volume (e.g. Windows 2008 R2 in TDVV1 and Windows 2003 in TDVV2).
My VVs are all created as TDVVs.
I created two CPGs: one for applications (where I thought dedupe would be more efficient) and the other for databases (where I know dedupe is very low).
But after your words, maybe I should convert the TDVVs in the database CPG to TPVVs so as not to fruitlessly increase the 3PAR load.

Quote:
You should also make sure you are on a current firmware release

I'm now on 3.2.2 MU3 but I'm thinking of updating to the latest version.

Quote:
One other thing: looking at the dedupe ratio at the volume level isn't truly accurate, as it's a load-factored value. The true ratio is measured at the CPG, i.e. across volumes.

I'm using 'showvv -space' to check the dedupe ratio; should I use 'showcpg -s' instead?
What does 'load factored value' mean?


 Post subject: Re: 3par Block size
PostPosted: Wed Dec 20, 2017 6:12 am 

Joined: Wed Nov 19, 2014 5:14 am
Posts: 505
Quote:
I would have to reinstall the OS and reformat the file system (Windows/Linux); do you know whether these OSes permit setting a 16 KB block size?

Yes, see Appendix B of the white paper I originally linked; in fact, if you want a better understanding, it's worth reading the whole paper.
https://h20195.www2.hpe.com/V2/Getdocument.aspx?docname=a00006358enw
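For example, on Windows a 16 KB allocation unit can be set when formatting a data volume (a sketch only; the drive letter is a placeholder, and you should check Appendix B for the recommended settings):

    format D: /FS:NTFS /A:16K /Q

On Linux it depends on the file system; ext4, for instance, is limited to a block size no larger than the page size (4 KB on x86), so check the white paper for what's supported for your guest OS.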

Quote:
How can I estimate it? Where can I find this task?

See page 15 of this white paper; you can also use the checkvv command with the -dedup_dryrun switch: https://h20195.www2.hpe.com/v2/GetPDF.aspx/4AA5-5731ENW.pdf
The recommendation is to start with TPVVs and then convert to TDVVs if the results show a benefit; unfortunately this tends to happen in reverse in many cases.
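As a rough sketch of the workflow (the volume names and task ID are placeholders, and output details vary by release), the dry run is kicked off against existing volumes and runs as a background task whose results you read back once it finishes:

    checkvv -dedup_dryrun vm_vol.0 vm_vol.1
    showtask
    showtask -d <task_id>

It can take a while on large volumes, since it has to scan the data.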

Quote:
I'm now on 3.2.2 MU3 but I'm thinking of updating to the latest version.

Incremental improvements for TDVV2 were made throughout the 3.2.2 release, with EMU4 and MU6 being the latest updates. 3.3.1 is the latest release and adds a new TDVV3 metadata layout that has improvements for dedupe, and adds compression if you have the hardware. However, it means creating a new CPG (which will get the new TDVV3 format) and then tuning the volumes to the new CPG, for which you need plenty of free space.

Quote:
I'm using 'showvv -space' to check the dedupe ratio; should I use 'showcpg -s' instead?

Use showcpg -space, as this will show you the correct ratio across all volumes in the CPG, which is the deduplication domain.
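For example (the CPG name is just a placeholder):

    showcpg -space CPG_APP

The ratio reported here covers every volume in that CPG, rather than the load-factored per-volume estimate that showvv gives you.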

Quote:
What does 'load factored value' mean?

It means it's an estimate: since many blocks are shared between volumes within the CPG and the data is constantly changing, it's impossible to provide a 100% accurate per-volume consumption figure for the dedupe store at any one time. Hence you should look at the CPG, as that takes the savings across all volumes into account. The showvv dedupe ratio has been removed in 3.3.1; the load factoring was an attempt to provide more information but ultimately led to confusion.

