While learning about ZFS dedup and compression I came across the command zdb -DD, which reports the dedup and compression ratios for a pool. I didn't really understand the output, though, and couldn't find much on the net explaining it. So here's my attempt.
Most of this is a wild ass guess.
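Allocated is the space actually allocated in the pool. Each unique block is counted once, no matter how many times it's referenced.
Referenced counts each block once for every reference to it, so it's how much space the data would take without dedup. The referenced LSIZE is roughly how large things would be if all the data were copied to a non-ZFS filesystem.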
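LSIZE is the logical size: how big the data is before compression.
PSIZE is the physical size: the size of one copy of the data after compression.
DSIZE is the space actually allocated on disk: roughly PSIZE times the copies setting (it can come out a bit larger because allocations are rounded up to the pool's sector size).
None of the three columns is specific to dedup. Dedup shows up as the difference between the allocated and referenced halves of the table.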
DSIZE is the final answer to “How big is it?” for both allocated and referenced: allocated DSIZE is the space the pool really uses, and referenced DSIZE is what it would use without dedup.
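To make the allocated/referenced split concrete, here's a toy model of the dedup table in Python. It's a simplification of my own, not ZFS code: DdtEntry and histogram_totals are hypothetical names, and DSIZE is modelled as PSIZE times the copies setting, ignoring allocation overhead such as sector rounding.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DdtEntry:
    """One unique block in a toy model of the dedup table (hypothetical)."""
    refcnt: int  # how many times the block is referenced
    lsize: int   # logical size in bytes, before compression
    psize: int   # physical size in bytes, after compression
    copies: int  # the copies property in effect when the block was written

    @property
    def dsize(self) -> int:
        # Simplification: ignores allocation overhead such as sector rounding.
        return self.psize * self.copies


def histogram_totals(entries: Iterable[DdtEntry]) -> tuple:
    """Return (allocated, referenced) totals of blocks/LSIZE/PSIZE/DSIZE."""
    keys = ("blocks", "lsize", "psize", "dsize")
    allocated = dict.fromkeys(keys, 0)
    referenced = dict.fromkeys(keys, 0)
    for e in entries:
        sizes = {"blocks": 1, "lsize": e.lsize, "psize": e.psize, "dsize": e.dsize}
        for key, value in sizes.items():
            allocated[key] += value              # each unique block stored once...
            referenced[key] += value * e.refcnt  # ...but counted per reference
    return allocated, referenced


# Third example below: a 6.5K block compressed to 2.5K, copies=3, referenced 9 times.
alloc, ref = histogram_totals([DdtEntry(refcnt=9, lsize=6656, psize=2560, copies=3)])
print(alloc)  # {'blocks': 1, 'lsize': 6656, 'psize': 2560, 'dsize': 7680}
print(ref)    # {'blocks': 9, 'lsize': 59904, 'psize': 23040, 'dsize': 69120}
```

7680 bytes is 7.50K and 69120 bytes is 67.5K, which matches the allocated and referenced DSIZE that zdb prints for the third example.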
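You can use the three values to break down how much dedup, compression, and copies each contribute to the total size. The summary line under each histogram comes from the Total row: dedup = referenced DSIZE / allocated DSIZE, compress = referenced LSIZE / referenced PSIZE, and copies = referenced DSIZE / referenced PSIZE. The last number, dedup * compress / copies, works out to referenced LSIZE / allocated DSIZE, i.e. how many bytes of logical data you get per byte actually allocated.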
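Here's that arithmetic as a small Python function. It's a sketch rather than anything taken from zdb itself (dedup_ratios is a name I made up), but fed the Total-row numbers from the examples below it reproduces their summary lines exactly.

```python
def dedup_ratios(ref_lsize: float, ref_psize: float,
                 ref_dsize: float, alloc_dsize: float) -> str:
    """Rebuild the zdb -DD summary line from the Total row's sizes.

    Any unit works (bytes, K, ...) as long as all four values use the same one.
    """
    dedup = ref_dsize / alloc_dsize    # space saved by storing each block once
    compress = ref_lsize / ref_psize   # logical size vs. compressed size
    copies = ref_dsize / ref_psize     # extra space spent on the copies property
    total = dedup * compress / copies  # equals ref_lsize / alloc_dsize
    return (f"dedup = {dedup:.2f}, compress = {compress:.2f}, "
            f"copies = {copies:.2f}, dedup * compress / copies = {total:.2f}")


# Third example below, using the K values straight from its Total row.
print(dedup_ratios(58.5, 22.5, 67.5, 7.5))
# dedup = 9.00, compress = 2.60, copies = 3.00, dedup * compress / copies = 7.80
```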
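All examples are for a single pool holding the same file written 9 times. The refcnt column is a power-of-two bucket rather than an exact count, which is why the block referenced 9 times shows up in the 8 row (that row covers 8 to 15 references).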
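The refcnt row a block is listed under can be sketched as the largest power of two that doesn't exceed its reference count. refcnt_bucket is a hypothetical helper of mine, written to match what the examples below show rather than copied from zdb.

```python
def refcnt_bucket(refcnt: int) -> int:
    """Return the refcnt row a block with this many references is listed under.

    Hypothetical helper: the largest power of two that is <= refcnt.
    """
    if refcnt < 1:
        raise ValueError("refcnt must be at least 1")
    return 1 << (refcnt.bit_length() - 1)


for n in (1, 2, 3, 8, 9, 15, 16):
    print(n, "->", refcnt_bucket(n))  # 9 references land in the 8 row
```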

Example 1: no compression (PSIZE is the same as LSIZE), copies=1

    DDT-sha256-zap-duplicate: 1 entries, size 4608 on disk, 8192 in core

    DDT histogram (aggregated over all DDTs):

    bucket              allocated                       referenced
    ______   ______________________________   ______________________________
    refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
    ------   ------   -----   -----   -----   ------   -----   -----   -----
         8        1   6.50K   6.50K   6.50K        9   58.5K   58.5K   58.5K
     Total        1   6.50K   6.50K   6.50K        9   58.5K   58.5K   58.5K

    dedup = 9.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 9.00

Example 2: compression on (PSIZE 2.50K vs LSIZE 6.50K), copies=1

    DDT-sha256-zap-duplicate: 1 entries, size 4608 on disk, 8192 in core

    DDT histogram (aggregated over all DDTs):

    bucket              allocated                       referenced
    ______   ______________________________   ______________________________
    refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
    ------   ------   -----   -----   -----   ------   -----   -----   -----
         8        1   6.50K   2.50K   2.50K        9   58.5K   22.5K   22.5K
     Total        1   6.50K   2.50K   2.50K        9   58.5K   22.5K   22.5K

    dedup = 9.00, compress = 2.60, copies = 1.00, dedup * compress / copies = 23.40

Example 3: same compression, copies=3 (DSIZE is three times PSIZE)

    DDT-sha256-zap-duplicate: 1 entries, size 4608 on disk, 8192 in core

    DDT histogram (aggregated over all DDTs):

    bucket              allocated                       referenced
    ______   ______________________________   ______________________________
    refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
    ------   ------   -----   -----   -----   ------   -----   -----   -----
         8        1   6.50K   2.50K   7.50K        9   58.5K   22.5K   67.5K
     Total        1   6.50K   2.50K   7.50K        9   58.5K   22.5K   67.5K

    dedup = 9.00, compress = 2.60, copies = 3.00, dedup * compress / copies = 7.80

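For a bigger pool it's handy to pull the Total row out of the zdb -DD text and do the division in a script. This is a rough sketch under my own assumptions: the Total row has nine whitespace-separated fields as in the examples above, and sizes carry an optional K/M/G/T/P/E suffix meaning powers of 1024. Because zdb prints rounded sizes, the result may differ slightly from zdb's own summary line on a real pool. parse_size and summarize_total are hypothetical names.

```python
import re

# Assumption: zdb's human-readable sizes use powers of 1024 with these suffixes.
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3,
               "T": 1024 ** 4, "P": 1024 ** 5, "E": 1024 ** 6}


def parse_size(text: str) -> float:
    """Turn a size such as '58.5K' or '512' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGTPE]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return float(number) * _MULTIPLIER[suffix]


def summarize_total(output: str) -> str:
    """Find the Total row in zdb -DD output and rebuild the summary line."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[0] == "Total":
            # Fields: Total, allocated blocks/LSIZE/PSIZE/DSIZE,
            # then referenced blocks/LSIZE/PSIZE/DSIZE.
            alloc_dsize = parse_size(fields[4])
            ref_lsize, ref_psize, ref_dsize = map(parse_size, fields[6:9])
            dedup = ref_dsize / alloc_dsize
            compress = ref_lsize / ref_psize
            copies = ref_dsize / ref_psize
            return (f"dedup = {dedup:.2f}, compress = {compress:.2f}, "
                    f"copies = {copies:.2f}, "
                    f"dedup * compress / copies = {dedup * compress / copies:.2f}")
    raise ValueError("no Total row found")


sample = """
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
     8        1   6.50K   2.50K   7.50K        9   58.5K   22.5K   67.5K
 Total        1   6.50K   2.50K   7.50K        9   58.5K   22.5K   67.5K
"""
print(summarize_total(sample))
```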
This output shows an example of some real-world (for me) data: nightly backups of our users' folder server. In total that system stores about 900GB of individual user data, ranging from Outlook archive files to music, photos, office documents, and other random things. The backup runs nightly, and there are about 10 nights' worth of data here. Each night it runs with --link-dest against the previous night's data, so files that haven't changed are stored as hard links to the previous night's copy. That large number of hard-linked files already reduces the total size substantially.
… to be added