Commit 3484985

Create README.md
1 parent 7194028 commit 3484985

1 file changed

+35
-0
lines changed

malconv/README.md

+35
@@ -0,0 +1,35 @@
1+
This directory is provided as a courtesy. It includes the MalConv model that we compared against in https://arxiv.org/abs/1804.04637.
2+
3+
For more details about MalConv, please see (and cite) the [original paper](https://arxiv.org/abs/1710.09435).
4+
5+
```
6+
Raff, Edward, et al. "Malware detection by eating a whole exe." arXiv preprint arXiv:1710.09435 (2017).
7+
```
8+
9+
If you use the pre-trained weights or code in your work, we also ask that you please cite [our paper](https://arxiv.org/pdf/1804.04637.pdf) for the implementation of MalConv, as it differs in a few subtle ways from the original.
10+
11+
```
12+
H. Anderson and P. Roth, "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models", in ArXiv e-prints. Apr. 2018.
13+
14+
@ARTICLE{2018arXiv180404637A,
15+
author = {{Anderson}, H.~S. and {Roth}, P.},
16+
title = "{EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models}",
17+
journal = {ArXiv e-prints},
18+
archivePrefix = "arXiv",
19+
eprint = {1804.04637},
20+
primaryClass = "cs.CR",
21+
keywords = {Computer Science - Cryptography and Security},
22+
year = 2018,
23+
month = apr,
24+
adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180404637A},
25+
}
26+
```
27+
28+
## Can I use this code to train MalConv on my own dataset?
29+
The code provided is instructional and does not run as-is, but it can be made functional with a few minor changes. In particular, you must provide a URL from which file contents can be fetched by sha256 hash.
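
As a rough illustration of the missing piece, the sketch below fetches a file by its sha256 hash and checks the digest of what was downloaded. The endpoint `https://samples.example.com/{sha256}`, the function name `fetch_by_sha256`, and the `opener` parameter are all hypothetical and are not part of the code in this directory; adapt them to wherever your samples are stored.

```python
import hashlib
from urllib.request import urlopen

# Hypothetical endpoint: replace with the location of your own sample store.
URL_TEMPLATE = "https://samples.example.com/{sha256}"


def fetch_by_sha256(sha256, url_template=URL_TEMPLATE, opener=urlopen):
    """Download a file by its sha256 hash and verify the digest."""
    sha256 = sha256.lower()
    with opener(url_template.format(sha256=sha256)) as response:
        data = response.read()
    # Guard against truncated or mismatched downloads.
    if hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError("downloaded content does not match " + sha256)
    return data
```

The `opener` argument exists only so that the download step can be swapped out, for example to read from local disk or to test without network access.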
30+
31+
## How does this MalConv model differ from that of Raff et al.?
32+
* The original paper used `batch_size=256` and `SGD(lr=0.01, momentum=0.9, decay=UNDISCLOSED, nesterov=True)`. We used
33+
`decay=1e-3` and `batch_size=100`.
34+
* It is unknown whether the original paper used a special symbol for padding.
35+
* The original paper allowed files of up to 2MB; we use 1MB because of memory limits on a commonly used Titan X.
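
To make the differences above concrete, here is a minimal sketch of the 1MB input limit and the learning-rate decay. It rests on two assumptions that this README does not confirm: that the padding symbol is the value 256 (one past the byte range), and that `decay` follows the legacy Keras SGD convention of time-based decay, `lr / (1 + decay * iteration)`. The helper names are hypothetical.

```python
MAXLEN = 2 ** 20      # 1MB input limit, as described in the list above
PADDING_SYMBOL = 256  # assumption: a value outside the byte range 0..255


def to_model_input(data, maxlen=MAXLEN, pad=PADDING_SYMBOL):
    """Truncate raw bytes to maxlen and right-pad with the padding symbol."""
    values = list(data[:maxlen])
    return values + [pad] * (maxlen - len(values))


def decayed_lr(iteration, lr=0.01, decay=1e-3):
    """Time-based decay, assuming the legacy Keras SGD convention."""
    return lr / (1.0 + decay * iteration)
```

Under these assumptions, `decay=1e-3` halves the learning rate after 1000 updates, which gives a feel for how much the undisclosed decay in the original paper could matter.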
