global: MySQL 5.6 by default has collation "utf8_general_ci" (case-insensitive) #114

@slint

The default collation for the utf8 character set in MySQL 5.6 (and probably in newer versions as well) is utf8_general_ci, which is case-insensitive. This means that the following can happen:

class Identifier(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(255), unique=True)
...

id1 = Identifier(value='ABC')
db.session.add(id1)
db.session.commit()
assert id1.id == 1

id2 = Identifier(value='abc')
db.session.add(id2)
db.session.commit()
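# ...DB error for unique constraint violation ('abc' == 'ABC' under utf8_general_ci)...
db.session.rollback()  # roll back the failed transaction before querying again

# The lookup is case-insensitive too: 'aBc' matches the row stored as 'ABC'.
fetched_id = Identifier.query.filter_by(value='aBc').one()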
assert fetched_id == id1
assert fetched_id.id == 1

This probably breaks many assumptions that we as developers make throughout the Invenio codebase, especially for tables such as those in invenio-pidstore, where stored values are expected to compare case-sensitively.
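The effect can be reproduced without a MySQL server. The sketch below is only an illustration: it uses SQLite's built-in NOCASE and BINARY collations as stand-ins for utf8_general_ci and utf8_bin. That substitution is an assumption made for the demo (NOCASE only folds ASCII letters, so the two are not equivalent), and the `demo` helper and table layout are hypothetical names, not part of Invenio.

```python
import sqlite3


def demo(collation):
    """Insert 'ABC' then 'abc' into a unique column declared with `collation`.

    Returns a tuple (second_insert_ok, ids_matching_aBc).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # `collation` is one of SQLite's built-in names (NOCASE, BINARY);
        # it is interpolated here only because this is a fixed-input demo.
        conn.execute(
            "CREATE TABLE identifier ("
            "id INTEGER PRIMARY KEY, "
            f"value VARCHAR(255) COLLATE {collation} UNIQUE)"
        )
        conn.execute("INSERT INTO identifier (value) VALUES ('ABC')")
        try:
            conn.execute("INSERT INTO identifier (value) VALUES ('abc')")
            second_insert_ok = True
        except sqlite3.IntegrityError:
            # Unique constraint violation: 'abc' collides with 'ABC'.
            second_insert_ok = False
        rows = conn.execute(
            "SELECT id FROM identifier WHERE value = ?", ("aBc",)
        ).fetchall()
        return second_insert_ok, [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print("case-insensitive:", demo("NOCASE"))
    print("binary:", demo("BINARY"))
```

With the case-insensitive collation, the second insert is rejected and the lookup for 'aBc' returns the row stored as 'ABC'. With the binary collation, both inserts succeed and the lookup returns nothing.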

A more appropriate character set and collation can be set by creating the database with:

CREATE DATABASE invenio CHARACTER SET utf8 COLLATE utf8_bin;
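For databases that already exist, it may help to check which columns still use a case-insensitive collation. The following is a minimal sketch, assuming MySQL's collation naming convention (`_ci` for case-insensitive, `_cs` for case-sensitive, `_bin` for binary). `AUDIT_SQL`, `is_case_sensitive` and `case_insensitive_columns` are hypothetical helper names. The query is not executed here, and its `%s` placeholder assumes a driver that uses the "format" parameter style.

```python
# Lists every column that has a collation in a given schema.
# Not executed in this sketch; pass the schema name as the single parameter.
AUDIT_SQL = (
    "SELECT table_name, column_name, collation_name "
    "FROM information_schema.columns "
    "WHERE table_schema = %s AND collation_name IS NOT NULL"
)


def is_case_sensitive(collation):
    """Classify a MySQL collation name by its suffix."""
    name = collation.lower()
    if name == "binary" or name.endswith(("_bin", "_cs")):
        return True
    if name.endswith("_ci"):
        return False
    raise ValueError(f"cannot classify collation: {collation!r}")


def case_insensitive_columns(rows):
    """Keep the (table, column, collation) rows whose collation ignores case."""
    return [row for row in rows if not is_case_sensitive(row[2])]


if __name__ == "__main__":
    # Made-up rows in the shape AUDIT_SQL would return.
    sample = [
        ("example_table_a", "value", "utf8_general_ci"),
        ("example_table_b", "value", "utf8_bin"),
    ]
    print(case_insensitive_columns(sample))
```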
