Tired of duplicate contacts clogging up your CRM? You're not alone. If you’re here, you probably want to clean things up without shelling out for yet another SaaS tool or spending your weekend hand-merging records. This is for folks who want a clear, hands-on guide to deduplicating CRM records using n8n, the self-hostable, source-available automation tool that doesn’t lock you in.
Below, you’ll find a no-nonsense, step-by-step process for building a deduplication workflow in n8n. I’ll point out what’s worth doing, what you can skip, and some honest pitfalls to watch out for. Let’s get your CRM tidy.
Step 1: Understand What You’re Dealing With
Before you even open n8n, figure out what “duplicate” means for your CRM. Not all duplicates are obvious, and most CRMs have their own quirks:
- What fields do you want to match on? Email is usually best, but sometimes names, phone numbers, or company fields are handy if emails are missing or inconsistent.
- How do you want to handle partial matches? “Jon Smith” vs. “John Smith” can be tricky. Decide if you want to go strict (exact matches only) or fuzzy (allow typos/similarities).
- What do you do with duplicates? Merge them, tag them, or just flag for review? Know your end goal.
Pro tip: Start simple. Begin with exact email matches. You can get fancier later if you need.
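To make “exact email match” concrete, it helps to decide up front how an email gets turned into a comparison key. Here’s a minimal sketch, assuming that surrounding whitespace and letter case don’t matter in your data. The helper name `emailKey` is hypothetical, not something n8n or your CRM provides:

```javascript
// Hypothetical helper: build the comparison key for exact email matching.
// Assumption: whitespace and letter case are not meaningful in your emails.
function emailKey(email) {
  if (typeof email !== 'string') return null; // missing or non-text value
  const key = email.trim().toLowerCase();
  return key === '' ? null : key; // treat blank emails as "no key"
}

console.log(emailKey('  Jon.Smith@Example.com '));
console.log(emailKey(''));
```

Records whose key comes back as `null` simply can’t be matched on email, so they’re candidates for the name or phone fallback you may add later.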
Step 2: Prep Your CRM and n8n
You’ll need:
- Access to your CRM’s API (or at least the ability to export/import CSVs)
- An n8n instance you can run workflows on (self-hosted or cloud; either works)
- API credentials or an export of your CRM data
Don’t overthink the stack. If your CRM isn’t supported out of the box, the HTTP Request node and some basic JSON wrangling will do the job.
Step 3: Pull CRM Data Into n8n
- Add a Trigger: Start your n8n workflow with a trigger node. For testing, use the “Manual” trigger so you can run things on demand.
- Pull the Records:
  - If your CRM has a native integration (like HubSpot, Salesforce, etc.), use the appropriate node to fetch contacts/leads.
  - If not, use the HTTP Request node with your CRM’s API to pull contacts.
  - If all else fails, import a CSV export: read the file with “Read Binary File” and parse it with n8n’s CSV-capable file node (the exact node names vary between n8n versions).
Fields to pull: At minimum, grab unique IDs, emails, and names. Anything extra (phone, company, etc.) can help later if you want to merge data.
Heads up: Some CRMs limit how many records you can pull per request (pagination). Test with a small batch before scaling up.
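If you end up handling pagination yourself, the loop usually looks something like the sketch below. It assumes cursor-based paging, and `fetchPage` is a hypothetical function you’d write around your CRM’s API. The `records` and `nextCursor` field names are made up for illustration, so check your API docs for the real ones:

```javascript
// Sketch of cursor-based pagination. `fetchPage` is a hypothetical function
// you supply; it must resolve to { records: [...], nextCursor: string | null }.
// Real CRMs name these fields differently.
async function fetchAllContacts(fetchPage, maxPages = 50) {
  const all = [];
  let cursor = null; // null means "start from the first page"
  for (let page = 0; page < maxPages; page++) {
    const { records, nextCursor } = await fetchPage(cursor);
    all.push(...records);
    if (!nextCursor) break; // no more pages
    cursor = nextCursor;
  }
  return all;
}
```

The `maxPages` cap is there on purpose: set it to 1 or 2 while testing so you only pull a small batch.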
Step 4: Find the Duplicates
Here’s where most guides gloss over the details. You’ve got your list of records—now what?
4.1: Simple Deduplication (Exact Match)
- Add an “Item Lists” node:
  - Use the “Aggregate” operation.
  - Group by the field you care about (e.g., “email”).
  - This will give you a list of all records that share the same email.
  - In recent n8n versions the Item Lists operations live in separate nodes, so look for the grouping operation under its own name.
- Filter for Duplicates:
  - Use a “Filter” node to only keep groups with more than one record (i.e., true duplicates).
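If the built-in nodes don’t group things the way you want, the same group-and-filter logic fits in a few lines of JavaScript. This is a sketch, not the only way to do it. It assumes each contact is a plain object with an `email` field, and the output shape (one object per group with a `records` array) is my choice for illustration:

```javascript
// Group plain contact objects by a normalized email and keep only groups
// that contain more than one record. The `email` field name is an assumption.
function findDuplicateGroups(contacts) {
  const groups = new Map();
  for (const contact of contacts) {
    const key = String(contact.email || '').trim().toLowerCase();
    if (!key) continue; // records without an email cannot be matched this way
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(contact);
  }
  return [...groups.entries()]
    .filter(([, records]) => records.length > 1) // true duplicates only
    .map(([email, records]) => ({ email, records }));
}

// In an n8n Code node ("Run Once for All Items") this could be wired up as:
// return findDuplicateGroups($input.all().map(i => i.json)).map(json => ({ json }));
```

Using a `Map` keeps groups in the order their first record was seen, which makes test runs easier to eyeball.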
4.2: Fuzzy Matching (Optional, Tricky)
If you want to get clever (e.g., catch “bob.smith@gmail.com” vs. “bobsmith@gmail.com”), you can use a fuzzy-matching npm package in the “Code” node (this needs a self-hosted instance that allows external modules), or write similar string comparison logic yourself.
But honestly: Fuzzy matching gets messy, fast. You risk merging people who aren’t actually the same. Stick with exact matches unless you’re ready to manually review results.
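If you do want to experiment, you don’t need an external package to start. The sketch below uses plain Levenshtein (edit) distance, which is one common string-similarity measure, not necessarily what any particular npm package does. The `isPossibleDuplicate` name and the threshold of 1 are assumptions to tune:

```javascript
// Classic dynamic-programming Levenshtein (edit) distance between two strings.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical rule: two different emails are "possible duplicates" when they
// are within `maxDistance` edits of each other. Flag these; don't auto-merge.
function isPossibleDuplicate(emailA, emailB, maxDistance = 1) {
  const a = emailA.trim().toLowerCase();
  const b = emailB.trim().toLowerCase();
  return a !== b && editDistance(a, b) <= maxDistance;
}
```

Keep in mind that a distance of 1 also matches “ann@example.com” and “anne@example.com”, who may well be different people. That’s exactly why these should go to manual review.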
Step 5: Decide Which Record to Keep
For each group of duplicates, you need to pick a “primary” record. Some common strategies:
- Most recently updated
- Has the most complete data
- Arbitrary (e.g., the first in the list)
How to do it in n8n:
- Add a “Code” node after your Filter node.
- Write a small bit of JavaScript to pick your winner per group, and flag the others as duplicates.
```js
// Example: keep the record with the most non-empty fields.
// Expects each incoming item to hold one duplicate group in `json.records`.
return items.map(group => {
  // Count filled fields. Note that Boolean treats 0 and false as empty too.
  const countFilled = record => Object.values(record).filter(Boolean).length;
  // Copy before sorting so the input array is not mutated.
  const sorted = [...group.json.records].sort(
    (a, b) => countFilled(b) - countFilled(a)
  );
  return {
    json: {
      keep: sorted[0],
      duplicates: sorted.slice(1),
    }
  };
});
```
Don’t worry about perfection. The key is consistency—make your rule, stick to it, and move on.
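If you’d rather go with the “most recently updated” strategy, the rule is just as short. This sketch assumes each record carries an ISO 8601 timestamp in a field called `updatedAt`. That field name is hypothetical, so swap in whatever your CRM actually returns:

```javascript
// Convert a record's `updatedAt` (assumed ISO 8601 string) to a number.
function timestamp(record) {
  const t = Date.parse(record.updatedAt);
  return Number.isNaN(t) ? 0 : t; // missing or unparseable dates sort last
}

// Pick the most recently updated record in a group as the one to keep.
function pickMostRecent(records) {
  const sorted = [...records].sort((a, b) => timestamp(b) - timestamp(a));
  return { keep: sorted[0], duplicates: sorted.slice(1) };
}
```

Records with no usable date fall to the bottom, so they only win when nothing else in the group has a timestamp.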
Step 6: Merge or Remove Duplicates in Your CRM
Now, you’ve identified which records to keep and which are duplicates. Time to clean up.
Option 1: Automatic Merging (If Your CRM Supports It)
- Use the CRM’s API (via HTTP Request or native node) to update the “primary” record with any missing data from duplicates.
- Delete or tag the duplicates.
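Before sending anything to the CRM, it’s worth working out exactly which fields you’d copy over. Here’s a rough sketch that builds an update “patch” for the primary record from data only the duplicates have. The function name, the `id` field, and the definition of “empty” are all assumptions for illustration:

```javascript
// Build a patch for the primary record from fields it lacks but a duplicate has.
// "Empty" here means undefined, null, or ''. Adjust this to your data.
function buildMergePatch(primary, duplicates) {
  const isEmpty = (v) => v === undefined || v === null || v === '';
  const patch = {};
  for (const dup of duplicates) {
    for (const [field, value] of Object.entries(dup)) {
      if (field === 'id') continue; // never overwrite the primary's ID
      // First non-empty value wins; existing primary data is never replaced.
      if (isEmpty(primary[field]) && !(field in patch) && !isEmpty(value)) {
        patch[field] = value;
      }
    }
  }
  return patch;
}
```

Because it only fills gaps and never overwrites, the worst case is a missed field rather than lost data. You’d then send the patch to your CRM’s update endpoint with the HTTP Request node or the native one.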
Option 2: Mark for Manual Review
- If you’re nervous about losing data, add a “Tag” or custom field (e.g., “Potential Duplicate”) to the duplicates so you can review in the CRM before deleting.
Option 3: CSV Export
- If your CRM is stubborn, export the list of duplicates for manual cleanup. Sometimes, that’s the fastest way.
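If you go the export route and want to build the CSV yourself in a Code node, the main thing to get right is quoting. A small sketch follows, using the usual CSV convention of wrapping a value in double quotes when it contains a comma, quote, or line break. The helper names and the column list are hypothetical:

```javascript
// Quote a single value for CSV: wrap in quotes and double any inner quotes
// when the value contains a comma, a quote, or a line break.
function csvCell(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of plain objects into CSV text with the given column order.
function toCsv(rows, columns) {
  const lines = [columns.map(csvCell).join(',')];
  for (const row of rows) {
    lines.push(columns.map((c) => csvCell(row[c])).join(','));
  }
  return lines.join('\r\n');
}

console.log(toCsv([{ id: 7, name: 'Smith, Jon' }], ['id', 'name']));
```

Passing the columns explicitly keeps the export stable even when some records are missing fields.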
Warning: Back up your CRM before mass deletions. APIs are not always forgiving—one bad node and you’ve zapped real customers. Test with a small sample first.
Step 7: Automate (But Start Slow)
Once you’ve verified your workflow works (and you haven’t accidentally deleted your boss’s contacts), schedule it to run regularly. Most CRMs collect duplicates over time, so a weekly or monthly dedupe job is smart.
- Swap your “Manual” trigger for a “Cron” node, called “Schedule Trigger” in newer n8n versions (e.g., every Sunday night).
- Send yourself a summary email or Slack message with what was merged or flagged.
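The summary itself can be a one-liner built from whatever your “pick a winner” step produced. This sketch assumes each result has the `keep` and `duplicates` shape from Step 5. The wording and the `buildSummary` name are mine:

```javascript
// Build a short, human-readable summary from the per-group results.
// Assumes each result looks like { keep: {...}, duplicates: [...] }.
function buildSummary(results) {
  const groups = results.length;
  const flagged = results.reduce((n, r) => n + r.duplicates.length, 0);
  return `Dedupe run: ${groups} duplicate group(s), ${flagged} record(s) flagged for review.`;
}
```

Feed the returned string into the email or Slack node’s message field.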
Iterate: Don’t try to handle every edge case from day one. Stick to exact matches and add complexity as your confidence (and patience) grows.
What Works (and What Doesn’t)
What works:
- n8n is flexible. You’re not beholden to expensive, half-baked deduplication features in your CRM.
- Exact-match deduplication (on emails) is fast, safe, and usually covers 80% of the problem.
- Tagging duplicates for later review keeps you in control.
What to ignore:
- Overly complex fuzzy matching — unless you have time to QA every result, it just isn’t worth the risk.
- Trying to build the “perfect” deduplication solution. Good enough is good enough.
What to watch out for:
- API rate limits. If your CRM throttles requests, batch your updates.
- Data loss. Always back up before deleting or merging.
- Unexpected data formats (weird CSVs, nested JSON, etc.).
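On the rate-limit point: n8n has a built-in batching node (named “Split In Batches” or “Loop Over Items” depending on version), but if you’re already in a Code node, batching by hand is simple. A sketch follows, where the batch size, the delay, and the `handleBatch` callback are all placeholders for whatever your CRM’s limits call for:

```javascript
// Split a list into consecutive batches of at most `size` elements.
function chunk(list, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < list.length; i += size) {
    batches.push(list.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process batches one at a time, pausing between them to stay under a limit.
// `handleBatch` is a hypothetical async function that sends one batch.
async function runInBatches(list, size, delayMs, handleBatch) {
  const batches = chunk(list, size);
  for (let i = 0; i < batches.length; i++) {
    await handleBatch(batches[i]);
    if (i < batches.length - 1) await sleep(delayMs); // no wait after the last
  }
}
```

Running batches sequentially is slower than firing everything at once, but it’s the version least likely to get you throttled.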
Final Thoughts
Deduplicating CRM records in n8n isn’t rocket science, but it does take a little planning and a few test runs. Keep it simple at first: focus on exact matches, automate carefully, and always keep a backup handy. You can get fancier later, but most teams just need a regular dose of “less clutter, more sanity.”