✓ Verified 💻 Development ✓ Enhanced Data

Crm Data Cleaner

Deduplicate, normalize, and enrich CRM contacts and companies.

Rating
4 (428 reviews)
Downloads
698 downloads
Version
1.0.0

Overview

Deduplicate, normalize, and enrich CRM contacts and companies.

Complete Documentation

View Source →

CRM Data Cleaner — Dedup, Normalize & Enrich Contacts

Overview

Clean, accurate CRM data is the foundation of effective sales and marketing operations. Poor data quality costs businesses an average of $3.1 million annually through wasted time, missed opportunities, and ineffective campaigns. This skill provides comprehensive frameworks, tools, and automation strategies to maintain pristine contact and company data across all major CRM platforms.

This guide covers the three pillars of CRM data hygiene: Deduplication (removing duplicate records), Normalization (standardizing data formats), and Enrichment (filling missing information with reliable external sources).

Table of Contents

Understanding Data Quality Issues

Common CRM Data Problems

Duplicate Records (30-40% of databases)

  • Multiple entries for same person/company
  • Slight variations in name spelling
  • Different email addresses for same contact
  • Incomplete vs. complete records
Inconsistent Formatting
  • Phone numbers: (555) 123-4567 vs 555-123-4567 vs +1.555.123.4567
  • Company names: "IBM Corp" vs "International Business Machines Corporation"
  • Addresses: "St" vs "Street", "CA" vs "California"
  • Job titles: "VP Marketing" vs "Vice President of Marketing"
Missing Information
  • 60% of B2B contacts missing phone numbers
  • 40% missing company information
  • 35% missing job titles
  • 25% missing complete addresses
Outdated Information
  • Contact changes jobs (20% annually)
  • Email addresses become invalid (25% every 2 years)
  • Phone numbers change
  • Companies reorganize, merge, or rebrand

Impact on Business Operations

Sales Productivity Loss

  • 27% of sales time spent on data entry and management
  • Missed follow-ups due to duplicate records
  • Confusion over primary contact information
  • Difficulty identifying decision makers
Marketing Campaign Inefficiency
  • Increased email bounce rates
  • Multiple messages to same recipient
  • Poor segmentation due to incomplete data
  • Inaccurate reporting and attribution
Customer Experience Issues
  • Multiple sales reps contacting same prospect
  • Conflicting information across touchpoints
  • Poor personalization due to incomplete profiles
  • Frustration from repeated information requests

Data Quality Assessment Framework

Completeness Score

  • Required fields populated: Target 95%
  • Optional fields populated: Target 70%
  • Critical fields (email, company): Target 98%
Accuracy Score
  • Valid email format: Target 99%
  • Valid phone format: Target 95%
  • Verified addresses: Target 90%
Consistency Score
  • Standardized formatting: Target 95%
  • Consistent naming conventions: Target 90%
  • Aligned data across systems: Target 95%
Uniqueness Score
  • Duplicate contact rate: Target <2%
  • Duplicate company rate: Target <1%
  • Clean merge history: Target 100%

Deduplication Strategy

Types of Duplicates

Exact Duplicates

  • Identical records with same key fields
  • Usually caused by import errors
  • Easy to identify and merge automatically
Near Duplicates
  • Similar but not identical records
  • Name variations: "Bob Smith" vs "Robert Smith"
  • Email variations: personal vs business emails
  • Require fuzzy matching algorithms
Company Duplicates
  • Same company with different names
  • "Apple Inc" vs "Apple Computer" vs "Apple"
  • Subsidiary vs parent company confusion
  • Domain-based matching challenges
Household/Account Duplicates
  • Multiple contacts at same company
  • Family members at same address
  • Different roles but same organization

Duplicate Detection Methods

#### Primary Key Matching Email-Based Matching (Most Reliable)

text
Match Criteria:
- Exact email match = 100% duplicate probability
- Domain + similar names = 85% probability
- Multiple emails for same person = merge candidates

Phone-Based Matching (Secondary)

text
Match Criteria:
- Exact phone match + similar name = 90% probability
- Same phone, different names = investigate
- Multiple formats of same number = normalize first

Name + Company Matching (Fuzzy)

text
Match Criteria:
- Exact name + exact company = 95% probability
- Similar name + exact company = 80% probability
- Exact name + similar company = 70% probability

#### Advanced Matching Algorithms

Levenshtein Distance

  • Measures character differences between strings
  • Useful for typos and variations
  • Example: "Smith" vs "Smyth" = distance of 1
Soundex Matching
  • Phonetic matching algorithm
  • Groups similar-sounding names
  • Example: "Smith", "Smyth", "Smithe" = same soundex
Token Matching
  • Breaks names into components
  • Matches individual parts
  • Example: "John Michael Smith" matches "J.M. Smith"

Deduplication Workflow

#### Phase 1: Automated Detection

High-Confidence Matches (90%+ probability)

  • Exact email matches
  • Identical phone + similar names
  • Same LinkedIn profile URLs
  • Automatic flagging for review
Medium-Confidence Matches (60-89% probability)
  • Similar names + same company
  • Name variations + same domain
  • Fuzzy phone number matches
  • Queue for manual review
Low-Confidence Matches (40-59% probability)
  • Loose name similarities
  • Possible company matches
  • Require detailed investigation
#### Phase 2: Manual Review Process

Review Queue Prioritization

  • High-value accounts (enterprise clients)
  • Active opportunities
  • Recent activity (last 30 days)
  • Marketing qualified leads
  • Bulk import suspects
Review Criteria Checklist
  • [ ] Same person confirmation
  • [ ] Most complete record identification
  • [ ] Activity history preservation
  • [ ] Integration considerations
  • [ ] Sales team notifications needed
#### Phase 3: Merge Execution

Pre-Merge Validation

  • Backup critical data
  • Identify master record
  • Map fields to preserve
  • Note dependencies (campaigns, workflows)
Field Merge Rules
  • Primary email: Business > Personal > Most recent
  • Phone: Mobile > Direct line > Main number
  • Address: Most complete > Most recent
  • Job title: Most senior > Most recent
  • Company: Most complete > Most recent
Post-Merge Cleanup
  • Update related records
  • Refresh reports and lists
  • Notify affected team members
  • Document merge decisions

Platform-Specific Deduplication

#### HubSpot Deduplication

Native Duplicate Management

  • Automatic duplicate detection
  • Merge suggestions in contacts view
  • Bulk merge capabilities
  • Activity history preservation
Custom Duplicate Rules
text
Email + Company Domain matching
Name similarity + Phone matching
LinkedIn URL exact matching
Custom property combinations

API-Based Deduplication

python
# Example HubSpot duplicate detection
import requests

def find_hubspot_duplicates(api_key, batch_size=100):
    url = f"https://api.hubapi.com/contacts/v1/lists/all/contacts/all"
    params = {
        'hapikey': api_key,
        'count': batch_size,
        'property': ['email', 'firstname', 'lastname', 'company']
    }
    # Implementation details in scripts/

#### Salesforce Deduplication

Duplicate Rules Setup

  • Standard duplicate rules (Lead/Contact)
  • Custom matching rules
  • Automatic alerts vs blocking
  • Duplicate job monitoring
Third-Party Tools
  • Duplicate Check by CRM Science
  • Cloudingo duplicate management
  • DemandTools by Validity
  • RingLead data management
#### Pipedrive Deduplication

Manual Duplicate Detection

  • Smart Contact Data feature
  • Bulk operations for merging
  • Organization-level deduplication
  • Custom field mapping

Data Normalization

Phone Number Standardization

#### Global Phone Format Standards

North American Numbers

text
Input Variations:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- +1 555 123 4567
- 5551234567

Standardized Output:
- Display: +1 (555) 123-4567
- Storage: +15551234567
- Search: 15551234567

International Numbers

text
Input Variations:
- +44 20 7946 0958 (UK)
- 020 7946 0958 (UK local)
- +49 30 12345678 (Germany)
- 030-12345678 (Germany local)

Standardized Output:
- Display: +44 20 7946 0958
- Storage: +442079460958

#### Phone Validation Rules

Format Validation

  • Length checks by country
  • Area code validation
  • Mobile vs landline identification
  • Do Not Call registry checking
Quality Indicators
  • Valid: Properly formatted, verified number
  • Invalid: Wrong format, disconnected
  • Mobile: Cell phone identified
  • International: Non-domestic number
  • Suspicious: Pattern matching fake numbers

Email Address Normalization

#### Email Format Standardization

Case Normalization

Domain Standardization

text
Common Variations:
- gmail.com vs googlemail.com → gmail.com
- hotmail.com vs live.com vs outlook.com → outlook.com
- yahoo.com vs ymail.com → yahoo.com

Plus Addressing Removal

Dot Normalization (Gmail)

#### Email Validation Levels

Syntax Validation (Level 1)

  • RFC 5322 compliance
  • Valid character checking
  • Proper format structure
Domain Validation (Level 2)
  • MX record verification
  • Domain existence checking
  • Subdomain validation
Mailbox Validation (Level 3)
  • SMTP connection testing
  • Mailbox existence verification
  • Deliverability scoring

Name Standardization

#### Personal Name Formatting

Name Case Normalization

text
Input Variations:
- JOHN SMITH
- john smith
- John SMITH
- jOHN sMITH

Standardized Output:
- John Smith

Name Component Parsing

text
Input: "Dr. John Michael Smith Jr."
Parsed Components:
- Title: Dr.
- First Name: John
- Middle Name: Michael
- Last Name: Smith
- Suffix: Jr.

Cultural Name Considerations

  • Eastern vs Western name orders
  • Hyphenated names handling
  • Multiple surname traditions
  • Title and honorific preservation
#### Company Name Standardization

Legal Entity Normalization

text
Input Variations:
- Apple Inc.
- Apple Incorporated
- Apple, Inc
- Apple Computer Inc.

Standardized Output:
- Apple Inc.

Common Abbreviations

text
Standard Mappings:
- Corp → Corporation
- Co → Company
- Ltd → Limited
- LLC → Limited Liability Company
- LP → Limited Partnership

DBA (Doing Business As) Handling

text
Primary: Microsoft Corporation
DBA: Microsoft, MSFT
Subsidiaries: GitHub, LinkedIn

Address Normalization

#### Address Component Standardization

Street Address Formatting

text
Input Variations:
- 123 Main St.
- 123 Main Street
- 123 MAIN ST
- 123 main st

Standardized Output:
- 123 Main Street

State/Province Normalization

text
US States:
- California → CA
- New York → NY
- Texas → TX

Canadian Provinces:
- Ontario → ON
- British Columbia → BC
- Quebec → QC

Postal Code Formatting

text
US ZIP Codes:
- 12345 → 12345
- 12345-6789 → 12345-6789
- 123456789 → 12345-6789

Canadian Postal Codes:
- k1a0a6 → K1A 0A6
- K1A0A6 → K1A 0A6

#### International Address Standards

United Kingdom Addresses

text
Standard Format:
[Building Number] [Street Name]
[District/Area]
[Town/City]
[County] [Postcode]
[Country]

European Address Formats

  • German addresses: Street first, house number after
  • French addresses: Special character handling
  • Nordic countries: Unique postal systems

Job Title Normalization

#### Title Standardization Rules

Seniority Level Mapping

text
C-Level Titles:
- CEO, Chief Executive Officer
- CTO, Chief Technology Officer
- CMO, Chief Marketing Officer
- CFO, Chief Financial Officer

VP Level Titles:
- VP, Vice President
- SVP, Senior Vice President
- EVP, Executive Vice President

Director Level Titles:
- Director, Dir
- Senior Director, Sr. Director
- Executive Director, Exec Director

Functional Area Mapping

text
Marketing Titles:
- Marketing Manager → Marketing
- Brand Manager → Marketing
- Content Manager → Marketing
- Digital Marketing Specialist → Marketing

Sales Titles:
- Sales Representative → Sales
- Account Manager → Sales
- Business Development → Sales
- Sales Engineer → Sales

Industry-Specific Normalization

  • Healthcare: MD, RN, PharmD standardization
  • Legal: JD, Esq., Partner titles
  • Academia: PhD, Professor, Dean titles
  • Government: GS levels, military ranks

Data Enrichment

Enrichment Data Sources

#### Free Data Sources

Social Media Platforms

  • LinkedIn: Job titles, company info, connections
  • Twitter: Engagement data, interests
  • Facebook: Personal interests (B2C)
  • GitHub: Developer profiles, technologies
Public Databases
  • Government business registrations
  • SEC filings for public companies
  • Patent databases
  • Professional licensing boards
Web Scraping Sources
  • Company websites: Team pages, about sections
  • Industry directories
  • Conference speaker lists
  • Press release databases
#### Paid Enrichment Services

Comprehensive B2B Platforms

ZoomInfo (Premium)

  • Contact: $14,995/year for 10,000 credits
  • Coverage: 100M+ contacts, 14M+ companies
  • Data Types: Direct phone, email, technographics
  • Accuracy: 95% for contact info
  • API: RESTful with real-time lookups
Apollo (Mid-Range)
  • Contact: $49-149/month per user
  • Coverage: 275M+ contacts, 73M+ companies
  • Data Types: Email, phone, intent signals
  • Accuracy: 85-90% email accuracy
  • API: Generous rate limits, bulk operations
Clearbit (Developer-Focused)
  • Contact: $99-999/month based on volume
  • Coverage: 85M+ contacts, 12M+ companies
  • Data Types: Firmographics, technographics
  • Accuracy: 85% contact accuracy
  • API: Real-time enrichment, webhooks
Hunter (Email-Focused)
  • Contact: $49-399/month
  • Coverage: Email finder and verification
  • Data Types: Email addresses, domain search
  • Accuracy: 95% email verification
  • API: Bulk processing, domain search
#### Specialized Data Providers

Technographic Data

  • BuiltWith: Website technology stacks
  • Datanyze: Technology adoption data
  • 6sense: Intent and technology data
  • Bombora: Intent signal data
Financial Data
  • Dun & Bradstreet: Credit and financial data
  • Crunchbase: Funding and investor data
  • PitchBook: Private market data
  • FactSet: Public company financials
Industry-Specific Data
  • Healthcare: NPI database, medical licenses
  • Legal: Bar association directories
  • Real Estate: MLS data, property records
  • Education: Institution directories

Enrichment Workflow

#### Data Assessment Phase

Missing Data Analysis

sql
-- Example missing data analysis
SELECT 
    COUNT(*) as total_contacts,
    COUNT(phone) as has_phone,
    COUNT(company) as has_company,
    COUNT(job_title) as has_title,
    (COUNT(*) - COUNT(phone)) as missing_phone,
    (COUNT(*) - COUNT(company)) as missing_company
FROM contacts;

Enrichment Priority Matrix

  • High Value + High Confidence = Immediate enrichment
  • High Value + Low Confidence = Manual review
  • Low Value + High Confidence = Batch processing
  • Low Value + Low Confidence = Skip
#### Batch Enrichment Process

Data Preparation

  • Export contact list with unique identifiers
  • Identify enrichment keys (email, domain, name+company)
  • Remove duplicates to avoid duplicate charges
  • Validate existing data quality
Enrichment Execution
  • Email-based enrichment (highest accuracy)
  • Domain-based company enrichment
  • Name + Company fuzzy matching
  • Social profile matching
  • Phone number verification
Data Validation
  • Cross-reference multiple sources
  • Confidence scoring per data point
  • Flag conflicting information
  • Preserve data provenance
Integration Back to CRM
  • Map enriched fields to CRM properties
  • Update existing records without overwriting good data
  • Track enrichment timestamps
  • Log enrichment sources

Real-Time Enrichment

#### Form Submission Enrichment

javascript
// Example real-time enrichment on form submit
document.getElementById('leadForm').addEventListener('submit', async function(e) {
    const email = document.getElementById('email').value;
    const company = document.getElementById('company').value;
    
    // Enrich contact data
    const enrichedData = await enrichContact(email, company);
    
    // Update hidden form fields
    updateFormFields(enrichedData);
});

#### CRM Integration Triggers

  • New contact creation
  • Email address updates
  • Company field changes
  • Lead scoring threshold crossing
#### Progressive Profiling
  • Gradual data collection over multiple interactions
  • Smart form field suggestions
  • Prefilling forms with known data
  • A/B testing optimal field combinations

Data Quality Monitoring

#### Enrichment Accuracy Tracking

Verification Metrics

  • Email deliverability rates
  • Phone connection success rates
  • LinkedIn profile match accuracy
  • Company information consistency
Data Decay Monitoring
  • Email bounce rates over time
  • Phone number disconnect rates
  • Job title change frequency
  • Company merger/acquisition impact
Source Performance Comparison
  • Accuracy by data provider
  • Cost per successful enrichment
  • Update frequency
  • Coverage by industry/region

Platform-Specific Implementation

HubSpot Data Cleaning

#### Native HubSpot Tools

Data Quality Command Center

  • Duplicate detection and management
  • Property formatting rules
  • Workflow-based data validation
  • Automated data hygiene tasks
Property Settings for Data Quality
  • Field validation rules
  • Required field enforcement
  • Format standardization
  • Default value management
Workflow Automation
text
Trigger: Contact is created or updated
Condition: Email domain contains common typos
Action: Flag for manual review + normalize email

#### HubSpot Integrations

Third-Party Apps

  • Insycle: Advanced deduplication and data management
  • PieSync: Data synchronization across platforms
  • Zapier: Custom data cleaning automations
Custom Development
javascript
// HubSpot API example for bulk data cleaning
const hubspot = require('@hubspot/api-client');

async function cleanContactData(contacts) {
    const hubspotClient = new hubspot.Client({ apiKey: API_KEY });
    
    const cleanedContacts = contacts.map(contact => ({
        id: contact.id,
        properties: {
            phone: normalizePhone(contact.properties.phone),
            email: normalizeEmail(contact.properties.email),
            company: normalizeCompanyName(contact.properties.company)
        }
    }));
    
    return await hubspotClient.crm.contacts.batchApi.update({
        inputs: cleanedContacts
    });
}

Salesforce Data Cleaning

#### Native Salesforce Features

Duplicate Management

  • Standard duplicate rules
  • Custom matching rules
  • Duplicate alerts and blocking
  • Merge wizard functionality
Data Validation Rules
apex
// Example validation rule for phone format
REGEX(Phone, "^\\+?1?[2-9]\\d{2}[2-9]\\d{2}\\d{4}$")

Flow-Based Automation

  • Screen flows for data entry validation
  • Record-triggered flows for cleaning
  • Scheduled flows for batch processing
#### Salesforce Apps and Tools

Paid Solutions

  • Cloudingo: Comprehensive data management
  • DemandTools: Advanced deduplication
  • RingLead: Data cleaning and enrichment
Custom Apex Solutions
apex
// Custom Apex for email normalization
public class EmailNormalizer {
    public static String normalizeEmail(String email) {
        if (String.isBlank(email)) return email;
        return email.toLowerCase().trim();
    }
}

Pipedrive Data Cleaning

#### Native Pipedrive Features

Smart Contact Data

  • Automatic duplicate detection
  • Merge suggestions
  • Data enrichment from public sources
Custom Fields and Validation
  • Required field settings
  • Field type restrictions
  • Custom property management
Automation Features
  • Workflow automation for data tasks
  • Email sync and normalization
  • Activity-based data updates

Automation and Monitoring

Automated Data Quality Workflows

#### Continuous Data Validation

Real-Time Validation

  • Form submission validation
  • Email syntax checking
  • Phone format verification
  • Required field enforcement
Scheduled Batch Processing
  • Daily duplicate detection runs
  • Weekly enrichment batches
  • Monthly data quality reports
  • Quarterly complete audits
Event-Triggered Cleaning
  • New record creation
  • Data import completion
  • Email bounce notifications
  • Contact inactivity alerts
#### Quality Score Automation

Contact Quality Scoring

python
def calculate_contact_quality_score(contact):
    score = 0
    
    # Completeness (40 points)
    if contact.email: score += 15
    if contact.phone: score += 10
    if contact.company: score += 10
    if contact.job_title: score += 5
    
    # Accuracy (40 points)
    if is_valid_email(contact.email): score += 20
    if is_valid_phone(contact.phone): score += 20
    
    # Freshness (20 points)
    days_since_update = (datetime.now() - contact.last_modified).days
    if days_since_update < 30: score += 20
    elif days_since_update < 90: score += 10
    
    return min(score, 100)

Company Quality Scoring

  • Industry classification accuracy
  • Company size verification
  • Website and domain validation
  • Social media presence verification

Monitoring and Alerting

#### Key Performance Indicators

Data Quality Metrics

  • Overall completeness percentage
  • Duplicate contact percentage
  • Email deliverability rate
  • Phone number accuracy rate
Trend Analysis
  • Data quality improvement over time
  • Source performance comparison
  • Seasonal data decay patterns
  • Team adoption metrics
Alert Thresholds
  • Duplicate detection: >5% increase
  • Email bounces: >10% for campaign
  • Missing data: >20% for key fields
  • Enrichment failures: >30% error rate
#### Reporting Dashboard

Executive Summary Dashboard

  • Total records and quality score
  • Data completeness by key fields
  • Enrichment ROI analysis
  • Team productivity impact
Operational Dashboard
  • Daily processing statistics
  • Error logs and resolution status
  • Data source performance metrics
  • Automation workflow status
Detailed Analysis Reports
  • Field-by-field completion rates
  • Source-by-source quality analysis
  • Historical trend analysis
  • Predictive quality forecasting

Integration Architecture

#### API-Based Data Flows

Inbound Data Processing

text
External Source → Validation → Normalization → Deduplication → Enrichment → CRM

Outbound Data Synchronization

text
CRM → Clean Data → External Systems (Email, Analytics, etc.)

Real-Time vs Batch Processing

  • Real-time: Form submissions, high-value contacts
  • Batch: Bulk imports, scheduled maintenance
  • Hybrid: Priority-based processing queues

Maintenance and Governance

Data Governance Framework

#### Data Stewardship Roles

Data Owner (Executive Level)

  • Define data quality standards
  • Approve data policies
  • Budget allocation for tools
  • Strategic oversight
Data Steward (Operational Level)
  • Daily quality monitoring
  • Process execution
  • Issue escalation
  • Training coordination
Data Users (Sales/Marketing Teams)
  • Data entry compliance
  • Quality feedback
  • Process adherence
  • Issue reporting
#### Data Quality Policies

Data Entry Standards

text
Contact Creation Requirements:
- Email address (validated)
- Company name (standardized)
- Job title (normalized)
- Phone number (formatted)
- Source attribution

Update Procedures

  • Regular data refresh cycles
  • Change approval workflows
  • Bulk update protocols
  • Emergency correction procedures
Retention and Archival
  • Active record criteria
  • Archival trigger conditions
  • Data deletion policies
  • Compliance requirements

Change Management

#### Team Training Programs

Basic Data Hygiene Training

  • Importance of data quality
  • Common data entry mistakes
  • Platform-specific best practices
  • Quality monitoring tools
Advanced Training Topics
  • Duplicate detection techniques
  • Enrichment strategy optimization
  • Automation workflow design
  • Reporting and analysis
Ongoing Education
  • Monthly quality scorecards
  • Best practice sharing sessions
  • Platform update training
  • Industry trend analysis
#### Process Documentation

Standard Operating Procedures

  • Daily maintenance tasks
  • Weekly quality reviews
  • Monthly deep cleaning
  • Quarterly audits
Troubleshooting Guides
  • Common error resolution
  • Escalation procedures
  • Recovery protocols
  • Emergency contacts

Compliance and Security

#### Data Privacy Compliance

GDPR Considerations

  • Consent management
  • Data processing justification
  • Right to be forgotten
  • Data portability requirements
CCPA Requirements
  • Consumer rights notifications
  • Opt-out mechanisms
  • Data sale disclosures
  • Processing transparency
#### Data Security

Access Controls

  • Role-based permissions
  • Audit logging
  • Change tracking
  • Approval workflows
Data Protection
  • Encryption standards
  • Backup procedures
  • Recovery protocols
  • Breach notification

Advanced Techniques

Machine Learning Applications

#### Predictive Data Quality

Quality Score Prediction

  • Predict record quality degradation
  • Identify enrichment opportunities
  • Forecast maintenance needs
  • Optimize cleaning schedules
Duplicate Detection ML
  • Neural network matching
  • Similarity scoring algorithms
  • Clustering for bulk identification
  • Continuous learning from feedback
#### Natural Language Processing

Company Name Matching

  • Fuzzy string matching
  • Alias recognition
  • Subsidiary relationship mapping
  • M&A event detection
Job Title Standardization
  • Role classification
  • Seniority level prediction
  • Function area mapping
  • Industry-specific normalization

Advanced Automation

#### Intelligent Data Routing

Smart Assignment Rules

python
def assign_data_cleaning_task(record, quality_issues):
    if record.value_tier == 'enterprise':
        return 'manual_review_queue'
    elif len(quality_issues) > 3:
        return 'bulk_processing_queue'
    elif 'duplicate' in quality_issues:
        return 'dedup_automation_queue'
    else:
        return 'standard_cleaning_queue'

Priority-Based Processing

  • Value-based prioritization
  • Urgency classification
  • Resource allocation optimization
  • SLA management
#### Custom Data Pipelines

Real-Time Processing

  • Stream processing for immediate validation
  • Event-driven cleaning triggers
  • Microservice architecture
  • API-first design
Batch Processing
  • Distributed processing systems
  • Scheduled job management
  • Error handling and retry logic
  • Progress monitoring

Integration Ecosystem

#### Multi-Platform Synchronization

Bidirectional Sync

  • CRM ↔ Marketing Automation
  • CRM ↔ Sales Engagement
  • CRM ↔ Customer Support
  • CRM ↔ Analytics Platforms
Conflict Resolution
  • Master data management
  • Field-level precedence rules
  • Timestamp-based updates
  • Manual override capabilities
#### API-First Architecture

RESTful API Design

python
# Example API endpoint for data cleaning
@app.route('/api/v1/contacts/clean', methods=['POST'])
def clean_contact_data():
    data = request.get_json()
    
    # Validate input
    if not validate_input(data):
        return {'error': 'Invalid input'}, 400
    
    # Process cleaning
    cleaned_data = {
        'email': normalize_email(data.get('email')),
        'phone': normalize_phone(data.get('phone')),
        'company': normalize_company(data.get('company'))
    }
    
    return {'cleaned_data': cleaned_data}, 200

This comprehensive CRM data cleaning skill provides the foundation for maintaining high-quality customer and prospect data across all major platforms. Implementation of these strategies will dramatically improve sales productivity, marketing effectiveness, and overall customer experience while reducing operational overhead and compliance risk.

Installation

Terminal bash

openclaw install crm-data-cleaner
    
Copied!

💻Code Examples

**Email-Based Matching** (Most Reliable)

email-based-matching-most-reliable.txt
Match Criteria:
- Exact email match = 100% duplicate probability
- Domain + similar names = 85% probability
- Multiple emails for same person = merge candidates

**Phone-Based Matching** (Secondary)

phone-based-matching-secondary.txt
Match Criteria:
- Exact phone match + similar name = 90% probability
- Same phone, different names = investigate
- Multiple formats of same number = normalize first

**Name + Company Matching** (Fuzzy)

name--company-matching-fuzzy.txt
Match Criteria:
- Exact name + exact company = 95% probability
- Similar name + exact company = 80% probability
- Exact name + similar company = 70% probability

**Custom Duplicate Rules**

custom-duplicate-rules.txt
Email + Company Domain matching
Name similarity + Phone matching
LinkedIn URL exact matching
Custom property combinations

**API-Based Deduplication**

api-based-deduplication.py
# Example HubSpot duplicate detection
import requests

def find_hubspot_duplicates(api_key, batch_size=100):
    url = f"https://api.hubapi.com/contacts/v1/lists/all/contacts/all"
    params = {
        'hapikey': api_key,
        'count': batch_size,
        'property': ['email', 'firstname', 'lastname', 'company']
    }
    # Implementation details in scripts/

**North American Numbers**

north-american-numbers.txt
Input Variations:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- +1 555 123 4567
- 5551234567

Standardized Output:
- Display: +1 (555) 123-4567
- Storage: +15551234567
- Search: 15551234567

**International Numbers**

international-numbers.txt
Input Variations:
- +44 20 7946 0958 (UK)
- 020 7946 0958 (UK local)
- +49 30 12345678 (Germany)
- 030-12345678 (Germany local)

Standardized Output:
- Display: +44 20 7946 0958
- Storage: +442079460958

**Case Normalization**

case-normalization.txt
Input: [email protected]
Output: [email protected]

**Domain Standardization**

domain-standardization.txt
Common Variations:
- gmail.com vs googlemail.com → gmail.com
- hotmail.com vs live.com vs outlook.com → outlook.com
- yahoo.com vs ymail.com → yahoo.com

**Plus Addressing Removal**

plus-addressing-removal.txt
Input: [email protected]
Output: [email protected]

Tags

#git_and-github #data

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author luigi08001
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install crm-data-cleaner