
Data Deduplication Demo

Data Deduplication can save valuable amounts of hard drive space. In today’s cost-conscious environments, saving hard drive space can free up budget that can be used elsewhere. The question that often pops up is “How much space will data dedup save me?” Unfortunately, there is no way to make an accurate prediction. Data dedup works best with static data, because there is little reason to dedup data that changes often.
The PowerShell code below will generate 10,000 text files that share a lot of common bit patterns. This will help to demonstrate some space savings with dedup.
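$String = ""
$NewLineIndex = 0

# Each pass appends one random four-character word to $String
# and writes the growing string to a new file.
For ($X = 0; $X -lt 10000; $X++)
{
    $C1 = [Char]((Get-Random -Maximum 35) + 65)
    $C2 = [Char]((Get-Random -Maximum 35) + 65)
    $C3 = [Char]((Get-Random -Maximum 35) + 65)
    $C4 = [Char]((Get-Random -Maximum 35) + 65)

    $String += "$($C1)$($C2)$($C3)$($C4) "
    $NewLineIndex++

    # Start a new line after every 16 words.
    If ($NewLineIndex -gt 15)
    {
        $String += "`n"
        $NewLineIndex = 0
    }

    # The E:\PS folder must already exist.
    $Name = "E:\PS\Files$($X).txt"
    $String | Out-File -LiteralPath $Name
}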

Executing this code will generate a lot of files, which helps this demonstration make more of a visual impact. Once this code has executed, we need to install the Data Deduplication feature:
Install-WindowsFeature FS-Data-Deduplication
Once this feature is installed, you need to enable this feature on the volume that is holding your data.  In this case, it is the E: drive.
Enable-DedupVolume -Volume E:
Data deduplication only runs when scheduled and, by default, only processes files that are at least 5 days old. To make this demo work, we need to set the age requirement to 0 days.
Set-DedupVolume -Volume E: -MinimumFileAgeDays 0
Before we start a deduplication, let's look at our current space savings so that we have a baseline:
Get-DedupStatus -Volume E:
Now we can start the deduplication.
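A minimal sketch of that step, assuming the Start-DedupJob cmdlet from the Deduplication module and the same E: volume used above. The choice of an Optimization job is an assumption, not something stated in this post.

```powershell
# Assumption: an Optimization job is the job type wanted for this demo.
# Queues a deduplication job for the E: volume and returns a job object.
Start-DedupJob -Volume "E:" -Type Optimization
```

The job runs in the background, so the command returns before the volume has been fully processed.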
You can check the status of the deduplication job:
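One way to do this, assuming the Get-DedupJob cmdlet from the Deduplication module and the same E: volume:

```powershell
# Lists deduplication jobs that are queued or running on the E: volume.
# An empty result usually means no job is currently queued or running.
Get-DedupJob -Volume "E:"
```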

And now to see what we got back:
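A sketch of how the results might be checked, assuming the same E: volume. Get-DedupStatus is the cmdlet used earlier for the baseline; Get-DedupVolume is added here as an assumption about what is useful to look at, since it also reports the savings for the volume.

```powershell
# Show every property of the deduplication status for the volume,
# including the amount of space saved.
Get-DedupStatus -Volume "E:" | Format-List

# Show the volume-level deduplication settings and savings.
Get-DedupVolume -Volume "E:" | Format-List
```

Comparing this output with the baseline taken before the job ran shows how much space the optimization reclaimed.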
This may not seem like a lot of savings for 10,000 files, but then again these were small files for the most part. Your results will vary. In the end, this could be used as a tactic to free up space on hard drives that are critically short on space. Also take a look at the File Server Resource Manager for more tools to help identify data that could be moved to offline storage.

