Duplicate Files on Mac: How to Find and Remove Them Safely
How duplicate files accumulate, why name-based detection fails, and how progressive SHA-256 hashing finds true duplicates without false positives.
Duplicate files are one of the sneakiest ways your Mac loses storage space. They accumulate slowly and invisibly — a file downloaded twice, a photo copied to multiple folders, a document saved under different names.
How duplicates accumulate
The most common sources of duplicate files:
- Multiple downloads — Downloading the same file twice creates "file.zip" and "file (1).zip"
- Manual backups — Copying important folders to "backup" directories
- Photo organization — Moving photos between folders while keeping copies "just in case"
- File sharing — Receiving the same file through different channels (email, AirDrop, Slack)
- Project copies — Duplicating project folders for different versions
- Cloud sync conflicts — iCloud, Dropbox, or Google Drive creating conflict copies
Over months and years, these duplicates can consume gigabytes of space without you noticing.
Why name-based detection is unreliable
Some tools try to find duplicates by comparing file names. This approach has two major problems:
False positives: Two files with the same name might have completely different content. A "report.pdf" in your Documents folder and a "report.pdf" in Downloads could be entirely different documents.
False negatives: The same file can have different names. "IMG_1234.jpg" and "vacation-photo.jpg" could be the exact same photo, renamed when shared or moved.
The only reliable way to detect duplicates is by comparing the actual file content.
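In code, content comparison is straightforward: hash the bytes of each file and compare digests, ignoring names entirely. This is a minimal sketch using Python's standard `hashlib`; the function names are illustrative, not from any particular tool.

```python
import hashlib

def content_digest(path: str) -> str:
    """SHA-256 of a file's bytes, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    return h.hexdigest()

def same_content(path_a: str, path_b: str) -> bool:
    # Names play no role: "IMG_1234.jpg" and "vacation-photo.jpg"
    # compare equal if (and only if) their bytes match.
    return content_digest(path_a) == content_digest(path_b)
```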
Progressive hashing: the smart approach
Computing a full hash (like SHA-256) of every file on your drive would be accurate but extremely slow. A progressive approach is much faster:
Step 1: Size grouping
Files with unique sizes can't possibly be duplicates. If only one file on your drive is exactly 4,847,231 bytes, it has no duplicates. This step eliminates the vast majority of files immediately.
Step 2: Quick hash
For files that share the same size, compute a quick hash using the first 4 KB and last 4 KB of the file. This catches most non-duplicates at minimal I/O cost. Two files of the same size but with different headers or endings are eliminated.
Step 3: Full SHA-256
Only for files that pass both previous checks, compute the full SHA-256 hash. At this point, you're comparing a much smaller set of files, so the full hash is practical.
This progressive approach can scan hundreds of thousands of files in minutes rather than hours.
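The three steps above can be sketched in a few dozen lines of Python. This is a simplified illustration, not any product's implementation; the function names (`find_duplicates`, `quick_hash`) and the 4 KB head/tail sizes mirror the description above.

```python
import hashlib
import os
from collections import defaultdict

def quick_hash(path: str, chunk: int = 4096) -> str:
    """Cheap filter: hash only the first and last 4 KB of the file."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(chunk))           # first 4 KB
        if size > chunk:
            f.seek(size - chunk)
            h.update(f.read(chunk))       # last 4 KB
    return h.hexdigest()

def full_hash(path: str, chunk: int = 1 << 20) -> str:
    """Full SHA-256 over the entire file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(paths: list[str]) -> list[list[str]]:
    # Step 1: group by size; a unique size means no possible duplicate.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    # Step 2: within same-size groups, group by quick hash.
    by_quick = defaultdict(list)
    for group in by_size.values():
        if len(group) < 2:
            continue
        for p in group:
            by_quick[(os.path.getsize(p), quick_hash(p))].append(p)

    # Step 3: full SHA-256 only for the survivors of both filters.
    by_full = defaultdict(list)
    for group in by_quick.values():
        if len(group) < 2:
            continue
        for p in group:
            by_full[full_hash(p)].append(p)

    return [g for g in by_full.values() if len(g) > 1]
```

Note how each step only does expensive work on files that survived the previous filter: most files never get read at all, and only a small remainder is hashed in full.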
The deletion dilemma
Finding duplicates is one thing. Deciding which copy to keep is another. You need at least one copy, and ideally you keep the "best" one:
- Keep the newest? It might have the most recent metadata or be in the most organized location.
- Keep the oldest? It's the original, and the location might be where you expect to find it.
- Keep the shortest path? It's in the most accessible location.
Whatever you choose, the critical rule is: never auto-delete all copies. At least one must remain.
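The three keep strategies above, plus the "never delete everything" rule, can be expressed as a small helper. This is a hypothetical sketch (`pick_keeper` and `deletable` are illustrative names): it always selects exactly one keeper per group, so the list of deletable copies can never include every file.

```python
import os

def pick_keeper(paths: list[str], strategy: str = "newest") -> str:
    """Choose the one copy to keep from a group of duplicates."""
    if strategy == "newest":
        return max(paths, key=os.path.getmtime)
    if strategy == "oldest":
        return min(paths, key=os.path.getmtime)
    if strategy == "shortest_path":
        return min(paths, key=len)
    raise ValueError(f"unknown strategy: {strategy}")

def deletable(paths: list[str], strategy: str = "newest") -> list[str]:
    # The keeper is excluded, so at least one copy always remains.
    keeper = pick_keeper(paths, strategy)
    return [p for p in paths if p != keeper]
```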
Safe duplicate cleanup
CleanMyMacOS uses the progressive hashing approach described above — size grouping, quick hash (first 4 KB + last 4 KB), then full SHA-256 — to find true duplicates accurately and efficiently. It scans up to 200K files and skips tiny files (under 4 KB), symlinks, and .app bundles.
For each group of duplicates, it offers smart suggestions: keep the newest, oldest, or shortest path. You can also manually select which copies to keep. The app never auto-selects all copies for deletion — at least one is always preserved.
CleanMyMacOS can help with this — download it free from the Mac App Store.