Unique - Customer Identity Resolution

Overview

Unique is an open-source platform for customer identity resolution and data integration. The platform helps organizations create a unified view of their customers by identifying and matching customer records across different databases and tables using advanced fuzzy matching techniques.

The Problem

Organizations often store customer data in different systems and formats:

CRM systems
Sales databases
Support tickets
Marketing platforms
Legacy systems

This leads to:

Duplicate customer records
Inconsistent customer information
Difficulty in creating a unified customer view
Challenges in data analysis and customer insights

Our Solution

Unique provides a powerful yet easy-to-use platform that:

Connects to different data sources (starting with Amazon Redshift)
Identifies potentially matching records using weighted fuzzy matching
Provides confidence scores for matches
Allows for easy verification and export of results

How It Works

Smart Column Detection:
- Automatically categorizes columns (identity, contact, name)
- Assigns weights based on reliability of identifiers
- Supports common patterns in customer data
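
The column-detection step above can be pictured roughly as follows. This is a minimal sketch, assuming detection is driven by keyword patterns in column names. The pattern lists and the names `CATEGORY_PATTERNS`, `CATEGORY_WEIGHTS`, `categorize_column`, and `detect_columns` are hypothetical and do not come from Unique's code. The weights mirror the defaults listed under Scoring System.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword patterns per category; Unique's real rules may differ.
# Categories are checked in insertion order, so identity wins over contact/name.
CATEGORY_PATTERNS: Dict[str, List[str]] = {
    "identity": [r"ssn", r"social_?security", r"tax_?id"],
    "contact": [r"e_?mail", r"phone", r"mobile"],
    "name": [r"first_?name", r"last_?name", r"full_?name", r"surname", r"^name$"],
}

# Default weights as listed under "Scoring System".
CATEGORY_WEIGHTS: Dict[str, float] = {"identity": 1.0, "contact": 0.8, "name": 0.6}


def categorize_column(column_name: str) -> Optional[str]:
    """Return the first category whose pattern matches the column name, else None."""
    normalized = column_name.strip().lower()
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(pattern, normalized) for pattern in patterns):
            return category
    return None


def detect_columns(columns: List[str]) -> Dict[str, Tuple[str, float]]:
    """Map each recognized column to (category, weight); unrecognized ones are skipped."""
    detected: Dict[str, Tuple[str, float]] = {}
    for column in columns:
        category = categorize_column(column)
        if category is not None:
            detected[column] = (category, CATEGORY_WEIGHTS[category])
    return detected
```

For example, `detect_columns(["tax_id", "phone", "username"])` keeps `tax_id` as an identity column and `phone` as a contact column, and ignores `username` because none of the sketch's patterns match it.
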
Fuzzy Matching Engine:
- Uses weighted scoring system
- Handles variations in:
  - Names (order, case, formatting)
  - Emails (domains, formats)
  - Phone numbers (formats, country codes)
- Configurable similarity thresholds
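
The variations listed above are usually absorbed by normalizing values before comparing them. This is a minimal sketch under stated assumptions. The README does not say which similarity measure Unique uses, so this swaps in Python's standard-library `difflib.SequenceMatcher`. The `EMAIL_DOMAIN_ALIASES` table, the default country code `"1"`, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table: treat equivalent email domains as the same domain.
EMAIL_DOMAIN_ALIASES = {"googlemail.com": "gmail.com"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort tokens (handles order/case)."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted(tokens))


def normalize_email(email: str) -> str:
    """Lowercase, trim, and map aliased domains to a canonical domain."""
    local, _, domain = email.strip().lower().partition("@")
    domain = EMAIL_DOMAIN_ALIASES.get(domain, domain)
    return f"{local}@{domain}" if domain else local


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Keep digits only; prepend an assumed country code to 10-digit numbers."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib; 1.0 means identical strings."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()
```

With this sketch, `normalize_name("Smith, John")` and `normalize_name("JOHN SMITH")` both become `"john smith"`, and `"+1 (555) 123-4567"` and `"555.123.4567"` both normalize to `"15551234567"`. A configurable threshold would then be applied to the `similarity` values.
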
Scoring System:
- Identity fields (SSN, tax ID): 1.0 weight
- Contact information (email, phone): 0.8 weight
- Names: 0.6 weight
- Customizable weights and thresholds
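
One plausible way to combine these weights is a weighted average of per-category similarities, compared against a threshold. The README lists the weights but not the combining formula, so treat this as a sketch under that assumption. The 0.85 default threshold and the names `match_score` and `is_match` are hypothetical.

```python
from typing import Dict, Optional

# Default weights as listed above; pass a different dict to customize them.
DEFAULT_WEIGHTS: Dict[str, float] = {"identity": 1.0, "contact": 0.8, "name": 0.6}


def match_score(field_similarities: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-category similarities (each in [0, 1]).

    Only categories present in both records (i.e. in field_similarities) count,
    so a record without an SSN is not penalized for the missing field.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(weights[c] for c in field_similarities if c in weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[c] * s for c, s in field_similarities.items() if c in weights)
    return weighted_sum / total_weight


def is_match(field_similarities: Dict[str, float], threshold: float = 0.85,
             weights: Optional[Dict[str, float]] = None) -> bool:
    """Flag a candidate pair as a match when its weighted score reaches the threshold."""
    return match_score(field_similarities, weights) >= threshold
```

For instance, an exact email match with a 0.5 name similarity scores (0.8 × 1.0 + 0.6 × 0.5) / 1.4 ≈ 0.79 and falls below the 0.85 threshold. An exact identity match with a 0.7 name similarity scores (1.0 + 0.42) / 1.6 ≈ 0.89 and passes. Normalizing by the weights actually present keeps scores comparable between records that share different subsets of fields.
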

Getting Started

Prerequisites

Python 3.8+
Access to a Redshift database
Required Python packages (see requirements.txt)

Installation

Clone the repository:

git clone https://github.com/andrelsouza/unique.git
cd unique

Install dependencies:

pip install -r requirements.txt

Set up your environment variables (optional; only needed if you don't want to pass the credentials through the front end):

REDSHIFT_HOST=your-host
REDSHIFT_DATABASE=your-database
REDSHIFT_USER=your-username
REDSHIFT_PASSWORD=your-password
REDSHIFT_PORT=5439
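
The variables above could be read into a connection config as sketched below. The variable names come from this README. The function `load_redshift_config` and its error handling are hypothetical, and the README does not say how `app.py` actually reads them. The returned keys (`host`, `database`, `user`, `password`, `port`) match keyword arguments accepted by `redshift_connector.connect`, but whether Unique uses that driver is not stated here.

```python
import os
from typing import Dict, Mapping, Optional

# Variables that must be set; REDSHIFT_PORT is optional and defaults to 5439.
REQUIRED_VARS = ("REDSHIFT_HOST", "REDSHIFT_DATABASE", "REDSHIFT_USER", "REDSHIFT_PASSWORD")


def load_redshift_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Build a Redshift connection config from environment variables.

    Raises KeyError naming every missing or empty required variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env["REDSHIFT_HOST"],
        "database": env["REDSHIFT_DATABASE"],
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
        "port": int(env.get("REDSHIFT_PORT", "5439")),
    }
```

Accepting an optional `env` mapping keeps the function testable without touching the real environment. Calling it with no arguments reads `os.environ`.
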

Run the application:

streamlit run app.py

Contributing

We welcome contributions! Here are some areas where you can help:

1. New Data Source Connectors

Add support for new databases:
- PostgreSQL
- MySQL
- MongoDB
- Snowflake
- BigQuery

2. Matching Algorithm Improvements

Implement new matching techniques:
- Machine learning-based matching
- Phonetic matching algorithms
- Address standardization
- Name parsing and normalization
- Cultural name variations

3. Performance Optimizations

Batch processing for large datasets
Parallel processing
Indexing strategies
Memory optimization
Caching mechanisms

4. Feature Additions

Matching rule configuration UI
Match review and validation interface
Batch processing interface
API endpoints
Custom scoring rules
Match visualization tools

How to Contribute

Fork the repository
Create a feature branch
Implement your changes
Add tests for new functionality
Submit a pull request

Development Guidelines

Follow the PEP 8 style guide
Add docstrings and comments
Include unit tests
Update documentation
Keep commits atomic and descriptive

Testing

Run the test suite:

python -m tests.test_matching

License

This project is licensed under the MIT License - see the LICENSE file for details.
