
Not all matches returned using regex_left_join #81

Open
@aminards


I have two data frames that I need to merge based on a partial string match.
Data frame A has a Gene.Name column containing EHBP1.
Data frame B has a Gene.Symbols column containing the following values:

CLEC7A,EHBP1
CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1,HTR2A
EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
TBX15,MBL2,SNORD54,CLEC7A,RREB1,MRPL51,GGTLC2,MIR30A,SETMAR,GFOD1,STK33,KHDRBS2,EHBP1,RCL1,HTR2A

When I run the following command:
mydata <- regex_left_join(A, B, by = c(Gene.Name = "Gene.Symbols"))

Only some of the matches are returned. These are the only rows I get back:

EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1

Why am I not getting these remaining matches?

MBL2,CLEC7A,EHBP1,HTR2A
CLEC7A,EHBP1
CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1,HTR2A
EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
TBX15,MBL2,SNORD54,CLEC7A,RREB1,MRPL51,GGTLC2,MIR30A,SETMAR,GFOD1,STK33,KHDRBS2,EHBP1,RCL1,HTR2A
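
One possible explanation, offered as a sketch rather than a confirmed diagnosis: fuzzyjoin's regex joins appear to treat the column from the second table as the regular expression and the column from the first table as the text being searched. With A first and B second, each Gene.Symbols value becomes a pattern that must be found inside the text "EHBP1", so only the rows whose value is exactly EHBP1 can match. The snippet below reproduces that direction with base R's grepl, then shows the swapped direction. The small A and B data frames are stand-ins built from the values in this issue, and the fuzzyjoin call is guarded because it assumes the package is installed.

```r
# Illustrative stand-ins for the data frames described in the issue.
A <- data.frame(Gene.Name = "EHBP1", stringsAsFactors = FALSE)
B <- data.frame(
  Gene.Symbols = c("CLEC7A,EHBP1", "MBL2,CLEC7A,EHBP1", "EHBP1,HTR2A", "EHBP1"),
  stringsAsFactors = FALSE
)

# Direction of the original call: each Gene.Symbols value is the pattern,
# searched for inside the text "EHBP1". Only the exact value can match.
as_written <- vapply(
  B$Gene.Symbols,
  function(pattern) grepl(pattern, A$Gene.Name),
  logical(1),
  USE.NAMES = FALSE
)
print(as_written)

# Swapped direction: "EHBP1" is the pattern, searched for inside each list.
swapped <- grepl(A$Gene.Name, B$Gene.Symbols)
print(swapped)

# Assumption: fuzzyjoin is installed. Putting B first makes Gene.Name the
# regex; a right join keeps every row of A, like the original left join did.
if (requireNamespace("fuzzyjoin", quietly = TRUE)) {
  joined <- fuzzyjoin::regex_right_join(B, A, by = c(Gene.Symbols = "Gene.Name"))
  print(joined)
}
```

Note that a bare pattern such as EHBP1 would also match inside a longer symbol such as EHBP1L1. If that matters for the data, wrapping the gene names in word boundaries (for example `paste0("\\b", Gene.Name, "\\b")`) before joining is one way to restrict matches to whole symbols.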
