Title | Venue | Year | Paper | Slide | Video | Github |
---|---|---|---|---|---|---|
Cross-Inlining Binary Function Similarity Detection | ICSE | 2024 | link | link | ||
Improving ML-based Binary Function Similarity Detection by Assessing and Deprioritizing Control Flow Graph Features | USENIX | 2024 | link | |||
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | ICSE | 2024 | link | |||
Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection | USENIX | 2024 | link | link | ||
CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision | ISSTA | 2024 | link | link | ||
CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | ISSTA | 2024 | link | link | ||
FASER: Binary Code Similarity Search through the use of Intermediate Representations | CAMLIS | 2023 | link | link | link | |
VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | | 2023 | link | |||
kTrans: Knowledge-Aware Transformer for Binary Code Embedding | | 2023 | link | link | ||
Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis | ISSTA | 2023 | link | link | ||
Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge | TOSEM | 2023 | link | link | ||
sem2vec: Semantics-aware Assembly Tracelet Embedding | TOSEM | 2023 | link | link | ||
1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis | TOSEM | 2023 | link | |||
Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures | AsiaCCS | 2023 | link | |||
VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search | NDSS | 2023 | link | link | ||
A Game-Based Framework to Compare Program Classifiers and Evaders | CGO | 2023 | link | link | link | link |
BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features | MDPI | 2023 | link | |||
A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features | CSUR | 2022 | link | |||
Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | link | link | link | |
Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | link | link | link | |
jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | link | link | link | |
COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | link | |||
A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | link | link | link | |
How Machine Learning Is Solving the Binary Function Similarity Problem | USENIX | 2022 | link | link | link | |
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | link | link | ||
Program Representations for Predictive Compilation: State of Affairs in the Early 20's | COLA | 2022 | link | link | link | |
Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study | JCVHT | 2022 | link | |||
PalmTree: Learning an Assembly Language Model for Instruction Embedding | CCS | 2021 | link | link | link | |
Binary code similarity detection | ASE | 2021 | link | |||
Binary diffing as a network alignment problem via belief propagation | ASE | 2021 | link | |||
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection | IEEE DSN | 2021 | link | link | ||
BinDeep: A deep learning approach to binary code similarity detection | ESWA | 2021 | link | |||
EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | link | |||
BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | link | link | ||
Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 | link | link | ||
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE (revision) | 2021 | link | link | ||
How could Neural Networks understand Programs? | ICML | 2021 | link | link | ||
Multi-threshold token-based code clone detection | SANER | 2021 | link | |||
FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE EuroS&P | 2021 | link | link | link | |
TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | link | link | ||
Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS | 2020 | link | link | ||
Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | link | |||
Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | ||||
LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | link | |||
Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | link | |||
What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | link | |||
Clone Detection on Large Scala Codebases | | 2020 | link | |||
CloneCompass: Visualizations for Code Clone Analysis | | 2020 | link | |||
DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | link | link | link | |
VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | link | |||
Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | link | |||
Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | link | link | ||
Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | link | link | ||
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | link | link | link | |
Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | link | |||
A Survey of Binary Code Similarity | CSUR | 2019 | link | |||
Research Progress on Code Clone Detection (代码克隆检测研究进展) | Journal of Software (软件学报) | 2019 | link | |||
A Systematic Review on Code Clone Detection | | 2019 | link | |||
A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | link | link | ||
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | link | link | link | model |
SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | link | link | link | |
Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | link | |||
Deep Learning-Based Cross-Platform Binary Code Association Analysis (基于深度学习的跨平台二进制代码关联分析) | | 2019 | link | |||
CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | link | |||
Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | link | |||
BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | link | |||
αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | link | dataset | ||
Binary Similarity Detection Using Machine Learning | PLDI | 2018 | link | |||
CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | link | |||
Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | link | |||
VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | link | link | ||
VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | link | |||
FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | link | |||
BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | link | |||
A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | link | |||
Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | link | link | ||
BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | link | |||
A Deep Learning Approach to Program Similarity | MASES | 2018 | link | |||
Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | link | |||
The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | link | link | ||
Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | link | |||
Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | link | |||
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | link | link | ||
BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | link | |||
BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | link | |||
Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | link | link | ||
BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | link | |||
Similarity of binaries through re-optimization | PLDI | 2017 | link | link | ||
Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | link | |||
Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | link | |||
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | link | |||
Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | link | |||
SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | link | |||
CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | link | link | ||
BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | link | link | link | |
In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | link | |||
DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | link | |||
Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | link | |||
CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | link | |||
Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | link | link | ||
Scalable graph-based bug search for firmware images (Genius) | ACM CCS | 2016 | link | link | link | |
Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | link | |||
discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | link | |||
BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | link | |||
Kam1n0: MapReduce-based assembly clone search for reverse engineering | KDD | 2016 | link | link | ||
Statistical similarity of binaries | PLDI | 2016 | link | link | link | |
Deep learning code fragments for code clone detection | ASE | 2016 | link | |||
A Survey of Software Clone Detection Techniques | | 2016 | link | |||
SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | link | |||
Binary executable file similarity calculation using function matching | | 2016 | link | |||
Matching Similar Functions in Different Versions of a Malware | | 2016 | link | |||
BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | link | |||
VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | link | link | ||
BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | link | link | ||
Cross-architecture bug search in binary executables | IEEE S&P | 2015 | link | |||
Library functions identification in binary code by using graph isomorphism testings | | 2015 | link | |||
Evaluating clone detection tools with BigCloneBench | | 2015 | link | link | ||
Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | link | |||
Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | link | link | ||
BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | link | link | link | |
Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | link | |||
BinClone: Detecting code clones in malware | SERE | 2014 | link | link | ||
Detecting fine-grained similarity in binaries | | 2014 | link | |||
Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | link | |||
How Accurate Is Coarse-grained Clone Detection?: Comparison with Fine-grained Detectors | | 2014 | link | |||
Tracelet-based code search in executables | PLDI | 2014 | link | |||
Control Flow-Based Malware Variant Detection | | 2014 | link | |||
Hashing for Similarity Search: A Survey | | 2014 | link | |||
Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | link | |||
Identifying Shared Software Components to Support Malware Forensics | | 2014 | link | |||
Evaluating Modern Clone Detection Tools | | 2014 | link | |||
Rendezvous: a search engine for binary code | MSR | 2013 | link | |||
BinSlayer: accurate comparison of binary executables | PPREW | 2013 | link | link | ||
Software clone detection: A systematic review | | 2013 | link | |||
How to extract differences from similar programs? A cohesion metric approach | | 2013 | link | |||
Software clone detection and refactoring | | 2013 | link | |||
An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | link | |||
A hybrid-token and textual based approach to find similar code segments | | 2013 | link | |||
Gapped code clone detection with lightweight source code analysis | | 2013 | link | |||
MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | link | link | ||
BinJuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | link | |||
Towards Automatic Software Lineage Inference | USENIX | 2013 | link | link | ||
AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | link | |||
Expose: Discovering potential binary code re-use | | 2013 | link | |||
Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | link | |||
FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | link | |||
A study of repetitiveness of code changes in software evolution | ASE | 2013 | link | |||
iBinHunt: Binary hunting with interprocedural control flow | | 2012 | link | link | ||
ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | USENIX | 2012 | link | |||
Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | link | |||
Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | link | |||
A metrics-based data mining approach for software clone detection | | 2012 | link | |||
Comparison of Clone Detection Techniques | | 2012 | ||||
Malware Classification Method via Binary Content Comparison | RACS | 2012 | link | |||
Binary function clustering using semantic hashes | ICMLA | 2012 | link | |||
Value-based program characterization and its application to software plagiarism detection | | 2011 | link | |||
CMCD: Count Matrix Based Code Clone Detection | | 2011 | link | |||
Incremental code clone detection: A PDG-based approach | | 2011 | link | |||
Anywhere, Any-Time Binary Instrumentation | | 2011 | link | |||
Code reuse in open source software development: Quantitative evidence, drivers, and impediments | | 2010 | ||||
Index-based code clone detection: incremental, distributed, scalable | | 2010 | ||||
Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | ||||
A hybrid approach (syntactic and textual) to clone detection | | 2010 | ||||
Evaluating code clone genealogies at release level: An empirical study | | 2010 | ||||
A survey of binary similarity and distance measures | | 2010 | ||||
Idea: Opcode-Sequence-Based Malware Detection | | 2010 | ||||
Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | ||||
Data fingerprinting with similarity digests | | 2010 | ||||
Automatic mining of functionally equivalent code fragments via random testing | | 2009 | ||||
A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | ||||
Problematic code clones identification using multiple detection results | | 2009 | ||||
Incremental clone detection | | 2009 | ||||
Scalable and incremental clone detection for evolving software | | 2009 | ||||
Large-scale Malware Indexing Using Function-call Graphs | | 2009 | ||||
Scalable, Behavior-Based Malware Clustering | | 2009 | ||||
peHash: A Novel Approach to Fast Malware Clustering | USENIX | 2009 | ||||
Detecting Code Clones in Binary Executables | | 2009 | ||||
BinHunt: Automatically finding semantic differences in binary programs | | 2008 | ||||
Scalable detection of semantic clones | | 2008 | ||||
DECKARD: Scalable and accurate tree-based detection of code clones | | 2007 | ||||
Large-scale code reuse in open source software | | 2007 | ||||
A survey on software clone detection research | | 2007 | link | |||
A study of consistent and inconsistent changes to code clones | | 2007 | ||||
Comparison and evaluation of clone detection tools | | 2007 | ||||
Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | ||||
A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | ||||
CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | ||||
Survey of research on software clones | | 2006 | link | |||
"Cloning considered harmful" considered harmful: patterns of cloning in software | | 2006 | link | |||
GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | ||||
Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | ||||
Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | ||||
Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | ||||
Graph-based comparison of executable objects | | 2005 | ||||
SDD: high performance code clone detection system for large scale source code | | 2005 | link | |||
Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | ||||
K-gram Based Software Birthmarks | | 2005 | ||||
Insights into System-Wide Code Duplication | IEEE | 2004 | link | |||
Clone detection in source code by frequent itemset techniques | | 2004 | ||||
Evaluating clone detection techniques from a refactoring perspective | | 2004 | ||||
Structural comparison of executable objects | | 2004 | ||||
Code compaction of matching single-entry multiple-exit regions | | 2003 | link | |||
CloSpan: Mining Closed Sequential Patterns in Large Datasets | | 2003 | ||||
CCFinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | ||||
Identifying similar code with program dependence graphs | | 2001 | ||||
Using slicing to identify duplication in source code | | 2001 | ||||
BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | ||||
A language independent approach for detecting duplicated code | | 1999 | ||||
Compressing Differences of Executable Code | | 1999 | ||||
Similarity search in high dimensions via hashing | | 1999 | ||||
Clone detection using abstract syntax trees | | 1998 | ||||
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | ||||
Pattern matching for clone and concept detection | | 1996 | ||||
On finding duplication and near-duplication in large software systems | | 1995 | link | |||
Detecting code similarity using patterns | | 1995 | ||||
A Cross-platform Binary Diff | | 1995 | ||||

SystemSecurityStorm/Awesome-Binary-Similarity: An awesome & curated list of binary code similarity papers.