Unique is an open-source platform for customer identity resolution and data integration. The platform helps organizations create a unified view of their customers by identifying and matching customer records across different databases and tables using advanced fuzzy matching techniques.
Organizations often store customer data in different systems and formats:
- CRM systems
- Sales databases
- Support tickets
- Marketing platforms
- Legacy systems
This leads to:
- Duplicate customer records
- Inconsistent customer information
- Difficulty in creating a unified customer view
- Challenges in data analysis and customer insights
Unique provides a powerful yet easy-to-use platform that:
- Connects to different data sources (starting with Amazon Redshift)
- Identifies likely matching records using weighted fuzzy matching
- Provides confidence scores for matches
- Lets you verify matches and export the results
Key features:
- Smart Column Detection:
  - Automatically categorizes columns (identity, contact, name)
  - Assigns weights based on the reliability of each identifier
  - Supports common patterns in customer data
- Fuzzy Matching Engine:
  - Uses a weighted scoring system
  - Handles variations in:
    - Names (order, case, formatting)
    - Emails (domains, formats)
    - Phone numbers (formats, country codes)
  - Configurable similarity thresholds
- Scoring System:
  - Identity fields (SSN, tax ID): 1.0 weight
  - Contact information (email, phone): 0.8 weight
  - Names: 0.6 weight
  - Customizable weights and thresholds
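A minimal sketch of how column detection and weighted scoring could fit together, using only the Python standard library. This is not Unique's actual implementation: the column-name regexes, the normalization rules (token-sorted names, digits-only phone numbers compared on their last 10 digits to ignore a country code), `difflib.SequenceMatcher` as the similarity measure, the weighted average (sum of weight times similarity, divided by the sum of weights, over columns filled in both records) and the 0.85 threshold are all assumptions. Only the category weights 1.0, 0.8 and 0.6 come from the list above.

```python
import re
from difflib import SequenceMatcher

# Category weights as documented; everything else below is an illustrative assumption.
WEIGHTS = {"identity": 1.0, "contact": 0.8, "name": 0.6}
THRESHOLD = 0.85  # assumed similarity threshold

# Assumed column-name patterns; checked in this order (dicts keep insertion order).
PATTERNS = {
    "identity": re.compile(r"ssn|tax_?id|national_?id", re.I),
    "contact": re.compile(r"e-?mail|phone|mobile", re.I),
    "name": re.compile(r"name", re.I),
}


def categorize(column):
    """Return the first category whose pattern matches the column name, else None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(column):
            return category
    return None


def normalize(column, value):
    """Reduce formatting differences before comparing values of one column."""
    text = "" if value is None else str(value).strip().lower()
    if re.search(r"phone|mobile", column, re.I):
        # Keep digits only; compare the last 10 so a leading country code is ignored
        return re.sub(r"\D", "", text)[-10:]
    category = categorize(column)
    if category == "name":
        # Sort word tokens so "Smith, John" and "john smith" normalize identically
        return " ".join(sorted(re.findall(r"\w+", text)))
    if category == "identity":
        # Drop separators so "123-45-6789" and "123 45 6789" compare equal
        return re.sub(r"[\s-]", "", text)
    return text  # emails and anything else: trimmed and lower-cased only


def match_score(rec_a, rec_b):
    """Weighted average of per-column similarities, in [0, 1].

    Only recognized columns that are non-empty in both records contribute.
    """
    weighted_sum = weight_total = 0.0
    for column in rec_a.keys() & rec_b.keys():
        category = categorize(column)
        if category is None:
            continue  # unrecognized columns are ignored
        a, b = normalize(column, rec_a[column]), normalize(column, rec_b[column])
        if not a or not b:
            continue  # a missing value is not evidence either way
        weight = WEIGHTS[category]
        weighted_sum += weight * SequenceMatcher(None, a, b).ratio()
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def is_match(rec_a, rec_b):
    """Treat two records as the same customer when the score reaches the threshold."""
    return match_score(rec_a, rec_b) >= THRESHOLD


if __name__ == "__main__":
    a = {"full_name": "Smith, John", "email": "John.Smith@Example.com", "phone": "+1 (555) 123-4567"}
    b = {"full_name": "john smith", "email": "john.smith@example.com", "phone": "555.123.4567"}
    print(match_score(a, b), is_match(a, b))
```

Skipping columns that are empty in either record keeps a missing phone number from dragging the score down. The last-10-digits rule only suits North American numbers; a real implementation would more likely use a dedicated phone-number parser.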
Requirements:
- Python 3.8+
- Access to a Redshift database
- Required Python packages (see requirements.txt)
Installation:
- Clone the repository:
git clone https://github.com/andrelsouza/unique.git
cd unique
- Install dependencies:
pip install -r requirements.txt
- Optionally, set environment variables for your Redshift credentials (only needed if you don't want to enter them in the front end):
REDSHIFT_HOST=your-host
REDSHIFT_DATABASE=your-database
REDSHIFT_USER=your-username
REDSHIFT_PASSWORD=your-password
REDSHIFT_PORT=5439
- Run the application:
streamlit run app.py

Contributing:
We welcome contributions! Here are some areas where you can help:
- Add support for new databases:
  - PostgreSQL
  - MySQL
  - MongoDB
  - Snowflake
  - BigQuery
- Implement new matching techniques:
  - Machine learning-based matching
  - Phonetic matching algorithms
  - Address standardization
  - Name parsing and normalization
  - Cultural name variations
- Batch processing for large datasets
- Parallel processing
- Indexing strategies
- Memory optimization
- Caching mechanisms
- Matching rule configuration UI
- Match review and validation interface
- Batch processing interface
- API endpoints
- Custom scoring rules
- Match visualization tools
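Phonetic matching and indexing strategies are listed above as open areas, so the following is only an illustration of what such a contribution could look like, not existing Unique code. It implements the standard American Soundex code and uses it for blocking: records are grouped by the Soundex code of a name column, and only pairs within a group are compared. The function names and the `last_name` field are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# American Soundex digit for each consonant; vowels, y, h and w have no digit.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(name):
    """Return the 4-character American Soundex code of `name` ("" if it has no ASCII letters)."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # h and w do not separate equal digits; vowels (no digit) do, so they reset prev
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]  # pad with zeros, then truncate to 4 characters


def block_by_soundex(records, field):
    """Group records by the Soundex code of `field`; records without a usable value are skipped."""
    blocks = defaultdict(list)
    for record in records:
        key = soundex(str(record.get(field) or ""))
        if key:
            blocks[key].append(record)
    return blocks


def candidate_pairs(records, field):
    """Yield only record pairs that share a block, instead of all n*(n-1)/2 pairs."""
    for block in block_by_soundex(records, field).values():
        yield from combinations(block, 2)
```

Blocking trades recall for speed: names whose first letters differ, such as "Catherine" (C365) and "Katherine" (K365), land in different blocks and are never compared, so in practice one might combine several blocking keys.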
To contribute:
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests for new functionality
- Submit a pull request
Development guidelines:
- Follow the PEP 8 style guide
- Add docstrings and comments
- Include unit tests
- Update documentation
- Keep commits atomic and descriptive
Run the test suite:
python -m tests.test_matching

This project is licensed under the MIT License; see the LICENSE file for details.
Planned improvements:
- Additional database connectors
- Better UI/UX
- API endpoints
- Match review interface
- Batch processing
- Performance optimizations
- Visualization tools
- Custom rules engine