|
1 | 1 | # 🚚 MOVED 🚚
|
2 | 2 |
|
3 | 3 | ### __Future development of Snappy without JNI has moved to [aircompressor](https://github.com/airlift/aircompressor)__
|
4 |
| - |
5 |
| -</br> |
6 |
| - |
7 |
| -# Snappy in Java |
8 |
| - |
9 |
| -This is a rewrite (port) of [Snappy](http://code.google.com/p/snappy/) written in |
10 |
| -pure Java. This compression code produces a byte-for-byte exact copy of the output |
11 |
| -created by the original C++ code, and is extremely fast. |
12 |
| - |
13 |
| -# Performance |
14 |
| - |
15 |
| -The Snappy micro-benchmark has been ported, and can be used to measure |
16 |
| -the performance of this code against the excellent Snappy JNI wrapper from |
17 |
| -[xerial](http://code.google.com/p/snappy-java/). As you can see in the results |
18 |
| -below, the pure Java port is 20-30% faster for block compress, 0-10% slower |
19 |
| -for block uncompress, and 0-5% slower for round-trip block compression. These |
20 |
| -results were run with Java 7 on a Core i7, 64-bit Mac. |
21 |
| - |
22 |
| -As a second, more independent test, the performance has been measured using the |
23 |
| -Ning JVM compression benchmark against Snappy JNI, and the pure Java |
24 |
| -[Ning LZF](https://github.com/ning/compress) codec. The |
25 |
| -[results](http://dain.github.com/snappy/) show that the pure Java Snappy is |
26 |
| -20-30% faster than JNI Snappy for compression, and is typically 10-20% slower |
27 |
| -for decompression. Both the pure Java Snappy and JNI Snappy implementations |
28 |
| -are faster than the Ning LZF codec. These results were run with Java 6 on a |
29 |
| -Core i7, 64-bit Mac. |
30 |
| - |
31 |
| -The difference in performance between these two tests is due to the difference |
32 |
| -in JVM version; Java 7 is consistently 5-10% faster than Java 6 in the |
33 |
| -compression code. As with all benchmarks, your mileage will vary, so test with |
34 |
| -your actual use case. |
35 |
| - |
36 |
| - |
37 |
| - |
38 |
| -### Block Compress |
39 |
| -<pre><code> |
40 |
| - JNI Java JNI Java |
41 |
| -Input Size Compress Compress Throughput Throughput Change |
42 |
| ---------------------------------------------------------------------- |
43 |
| -html 102400 76.4% 76.4% 294.9MB/s 384.8MB/s +30.5% html |
44 |
| -urls 702087 49.1% 49.1% 178.7MB/s 226.5MB/s +26.8% urls |
45 |
| -jpg 126958 0.1% 0.1% 2.7GB/s 3.2GB/s +17.4% jpg (not compressible) |
46 |
| -pdf 94330 17.9% 17.9% 642.4MB/s 910.3MB/s +41.7% pdf |
47 |
| -html4 409600 76.4% 76.4% 289.2MB/s 377.3MB/s +30.5% html4 |
48 |
| -cp 24603 51.9% 51.9% 166.4MB/s 233.7MB/s +40.5% cp |
49 |
| -c 11150 57.6% 57.6% 177.1MB/s 295.4MB/s +66.8% c |
50 |
| -lsp 3721 51.6% 51.6% 245.5MB/s 278.0MB/s +13.2% lsp |
51 |
| -xls 1029744 58.7% 58.7% 263.0MB/s 292.5MB/s +11.2% xls |
52 |
| -txt1 152089 40.2% 40.2% 116.8MB/s 163.1MB/s +39.7% txt1 |
53 |
| -txt2 125179 35.9% 35.9% 112.5MB/s 153.4MB/s +36.3% txt2 |
54 |
| -txt3 426754 42.9% 42.9% 123.3MB/s 169.8MB/s +37.6% txt3 |
55 |
| -txt4 481861 31.7% 31.7% 107.8MB/s 146.2MB/s +35.6% txt4 |
56 |
| -bin 513216 81.8% 81.8% 413.1MB/s 497.8MB/s +20.5% bin |
57 |
| -sum 38240 48.1% 48.1% 162.4MB/s 213.9MB/s +31.7% sum |
58 |
| -man 4227 40.6% 40.6% 194.6MB/s 241.7MB/s +24.2% man |
59 |
| -pb 118588 76.8% 76.8% 363.7MB/s 450.3MB/s +23.8% pb |
60 |
| -gaviota 184320 61.7% 61.7% 166.7MB/s 253.7MB/s +52.2% gaviota |
61 |
| -</code></pre> |
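The Change column is not defined in the text. A minimal sketch, assuming it is the relative difference between the Java and JNI throughput, reproduces the html row above (294.9MB/s versus 384.8MB/s). The class and method names are hypothetical and are not part of the library.

```java
import java.util.Locale;

final class ThroughputChange {
    // Assumed formula: relative change of the Java throughput
    // against the JNI baseline, expressed in percent.
    static double percentChange(double jniThroughput, double javaThroughput) {
        return (javaThroughput - jniThroughput) / jniThroughput * 100.0;
    }

    // Formats with an explicit sign and one decimal, like the tables do.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%+.1f%%", percent);
    }

    public static void main(String[] args) {
        // html row of the Block Compress table, both values in MB/s
        System.out.println(format(percentChange(294.9, 384.8))); // prints "+30.5%"
    }
}
```

Other rows can differ from this calculation by 0.1%, presumably because the published throughput figures are themselves rounded.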
62 |
| - |
63 |
| - |
64 |
| -### Block Uncompress |
65 |
| -<pre><code> |
66 |
| - JNI Java JNI Java |
67 |
| -Input Size Compress Compress Throughput Throughput Change |
68 |
| ---------------------------------------------------------------------- |
69 |
| -html 102400 76.4% 76.4% 1.5GB/s 1.3GB/s -12.2% html |
70 |
| -urls 702087 49.1% 49.1% 969.2MB/s 827.5MB/s -14.6% urls |
71 |
| -jpg 126958 0.1% 0.1% 18.6GB/s 19.4GB/s +4.2% jpg (not compressible) |
72 |
| -pdf 94330 17.9% 17.9% 4.1GB/s 3.7GB/s -8.8% pdf |
73 |
| -html4 409600 76.4% 76.4% 1.5GB/s 1.2GB/s -16.8% html4 |
74 |
| -cp 24603 51.9% 51.9% 965.2MB/s 956.0MB/s -1.0% cp |
75 |
| -c 11150 57.6% 57.6% 989.1MB/s 924.9MB/s -6.5% c |
76 |
| -lsp 3721 51.6% 51.6% 991.6MB/s 964.8MB/s -2.7% lsp |
77 |
| -xls 1029744 58.7% 58.7% 798.4MB/s 747.3MB/s -6.4% xls |
78 |
| -txt1 152089 40.2% 40.2% 643.8MB/s 580.8MB/s -9.8% txt1 |
79 |
| -txt2 125179 35.9% 35.9% 610.0MB/s 549.6MB/s -9.9% txt2 |
80 |
| -txt3 426754 42.9% 42.9% 683.8MB/s 614.4MB/s -10.2% txt3 |
81 |
| -txt4 481861 31.7% 31.7% 565.4MB/s 505.5MB/s -10.6% txt4 |
82 |
| -bin 513216 81.8% 81.8% 1.5GB/s 1.2GB/s -20.4% bin |
83 |
| -sum 38240 48.1% 48.1% 838.1MB/s 771.6MB/s -7.9% sum |
84 |
| -man 4227 40.6% 40.6% 856.9MB/s 847.2MB/s -1.1% man |
85 |
| -pb 118588 76.8% 76.8% 1.7GB/s 1.5GB/s -12.9% pb |
86 |
| -gaviota 184320 61.7% 61.7% 769.1MB/s 693.4MB/s -9.9% gaviota |
87 |
| -</code></pre> |
88 |
| - |
89 |
| - |
90 |
| -### Block Round Trip |
91 |
| -<pre><code> |
92 |
| - JNI Java JNI Java |
93 |
| -Input Size Compress Compress Throughput Throughput Change |
94 |
| ---------------------------------------------------------------------- |
95 |
| -html 102400 76.4% 76.4% 300.3MB/s 287.1MB/s -4.4% html |
96 |
| -urls 702087 49.1% 49.1% 182.7MB/s 177.0MB/s -3.2% urls |
97 |
| -jpg 126958 0.1% 0.1% 2.6GB/s 2.6GB/s +1.1% jpg (not compressible) |
98 |
| -pdf 94330 17.9% 17.9% 695.3MB/s 680.0MB/s -2.2% pdf |
99 |
| -html4 409600 76.4% 76.4% 296.4MB/s 282.1MB/s -4.8% html4 |
100 |
| -cp 24603 51.9% 51.9% 177.0MB/s 172.5MB/s -2.5% cp |
101 |
| -c 11150 57.6% 57.6% 221.7MB/s 218.3MB/s -1.5% c |
102 |
| -lsp 3721 51.6% 51.6% 217.3MB/s 216.3MB/s -0.5% lsp |
103 |
| -xls 1029744 58.7% 58.7% 213.3MB/s 209.9MB/s -1.6% xls |
104 |
| -txt1 152089 40.2% 40.2% 129.4MB/s 126.3MB/s -2.4% txt1 |
105 |
| -txt2 125179 35.9% 35.9% 121.7MB/s 118.8MB/s -2.4% txt2 |
106 |
| -txt3 426754 42.9% 42.9% 135.2MB/s 132.8MB/s -1.8% txt3 |
107 |
| -txt4 481861 31.7% 31.7% 115.2MB/s 113.0MB/s -1.9% txt4 |
108 |
| -bin 513216 81.8% 81.8% 371.2MB/s 350.7MB/s -5.5% bin |
109 |
| -sum 38240 48.1% 48.1% 164.2MB/s 160.0MB/s -2.6% sum |
110 |
| -man 4227 40.6% 40.6% 184.8MB/s 185.3MB/s +0.3% man |
111 |
| -pb 118588 76.8% 76.8% 344.1MB/s 326.3MB/s -5.2% pb |
112 |
| -gaviota 184320 61.7% 61.7% 188.0MB/s 185.2MB/s -1.5% gaviota |
113 |
| -</code></pre> |
114 |
| - |
115 |
| -# Stream Format |
116 |
| - |
117 |
| -There is no defined stream format for Snappy, but there is an effort to create |
118 |
| -a common format with the Google Snappy project. |
119 |
| - |
120 |
| -The stream format used in this library has a couple of unique features not |
121 |
| -found in the other Snappy stream formats. Like the other formats, the user |
122 |
| -input is broken into blocks and each block is compressed. If the compressed |
123 |
| -block is smaller than the user input, the compressed block is written, |
123 |
| -otherwise the uncompressed original is written. This dramatically improves the |
124 |
| -speed for incompressible input such as JPG images. Additionally, a checksum of |
125 |
| -the user input data for each block is written to the stream. This safety check |
126 |
| -ensures that the stream has not been corrupted in transit or by a bad Snappy |
128 |
| -implementation. Finally, like gzip, compressed Snappy files can be |
129 |
| -concatenated together without issue, since the input stream will ignore a |
130 |
| -Snappy stream header in the middle of a stream. This makes combining files in |
131 |
| -Hadoop and S3 trivial. |
132 |
| - |
133 |
| -The SnappyOutputStream javadocs contain a formal definition of the stream |
134 |
| -format. |
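The per-block decision described above (write the compressed block only when it is smaller than the input, and always record a checksum of the user input) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `java.util.zip.Deflater` stands in for the Snappy compressor, `CRC32` stands in for whatever checksum the format actually uses, and the 9-byte header layout, flag values, and all names are hypothetical. The real layout is the one defined in the SnappyOutputStream javadocs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

final class BlockFramingSketch {
    // Hypothetical header: 1 flag byte, 4 length bytes, 4 checksum bytes.
    static final byte FLAG_UNCOMPRESSED = 0;
    static final byte FLAG_COMPRESSED = 1;
    static final int HEADER_SIZE = 9;

    // Stand-in compressor (assumption): Deflater replaces Snappy here
    // so that the sketch only needs the JDK.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Frames one block: keeps the compressed form only if it is smaller,
    // and stores a checksum of the original (uncompressed) input.
    static byte[] writeBlock(byte[] input) {
        byte[] compressed = compress(input);
        boolean useCompressed = compressed.length < input.length;
        byte[] payload = useCompressed ? compressed : input;

        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);

        ByteBuffer frame = ByteBuffer.allocate(HEADER_SIZE + payload.length);
        frame.put(useCompressed ? FLAG_COMPRESSED : FLAG_UNCOMPRESSED);
        frame.putInt(payload.length);
        frame.putInt((int) crc.getValue());
        frame.put(payload);
        return frame.array();
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[4096];   // all zeros, highly compressible
        byte[] random = new byte[1024];       // random bytes, not compressible
        new Random(42).nextBytes(random);
        System.out.println("repetitive flag: " + writeBlock(repetitive)[0]);
        System.out.println("random flag: " + writeBlock(random)[0]);
    }
}
```

A reader of such a frame can copy an uncompressed payload straight through without running the decompressor, which is where the speedup on JPG-like input comes from. It can also recompute the checksum over the recovered bytes to detect corruption.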
135 |
| - |
136 |
| -## Stream Performance |
137 |
| - |
138 |
| -The streaming mode performance cannot be directly compared to other |
138 |
| -compression algorithms since most formats do not contain a checksum. The basic |
139 |
| -streaming code is significantly faster than the Snappy JNI library due to |
140 |
| -the completely unoptimized stream implementation in Snappy JNI, but once the |
141 |
| -checksum is enabled the performance drops off by about 20%. |
143 |
| - |
144 |
| -### Stream Compress (no checksums) |
145 |
| -<pre><code> |
146 |
| - JNI Java JNI Java |
147 |
| -Input Size Compress Compress Throughput Throughput Change |
148 |
| ---------------------------------------------------------------------- |
149 |
| -html 102400 76.4% 76.4% 275.8MB/s 373.5MB/s +35.4% html |
150 |
| -urls 702087 49.1% 49.1% 176.5MB/s 225.2MB/s +27.6% urls |
151 |
| -jpg 126958 0.1% -0.0% 1.7GB/s 2.0GB/s +15.8% jpg (not compressible) |
152 |
| -pdf 94330 17.8% 16.0% 557.2MB/s 793.2MB/s +42.4% pdf |
153 |
| -html4 409600 76.4% 76.4% 281.0MB/s 369.9MB/s +31.7% html4 |
154 |
| -cp 24603 51.8% 51.8% 151.7MB/s 214.3MB/s +41.3% cp |
155 |
| -c 11150 57.4% 57.5% 149.1MB/s 243.3MB/s +63.1% c |
156 |
| -lsp 3721 51.1% 51.2% 141.3MB/s 181.1MB/s +28.2% lsp |
157 |
| -xls 1029744 58.6% 58.6% 253.9MB/s 290.5MB/s +14.4% xls |
158 |
| -txt1 152089 40.2% 40.2% 114.8MB/s 159.4MB/s +38.8% txt1 |
159 |
| -txt2 125179 35.9% 35.9% 110.0MB/s 150.4MB/s +36.7% txt2 |
160 |
| -txt3 426754 42.9% 42.9% 121.0MB/s 167.9MB/s +38.8% txt3 |
161 |
| -txt4 481861 31.6% 31.6% 105.1MB/s 143.2MB/s +36.2% txt4 |
162 |
| -bin 513216 81.8% 81.8% 387.7MB/s 484.5MB/s +25.0% bin |
163 |
| -sum 38240 48.1% 48.1% 153.0MB/s 203.1MB/s +32.8% sum |
164 |
| -man 4227 40.2% 40.3% 125.9MB/s 171.9MB/s +36.5% man |
165 |
| -pb 118588 76.8% 76.8% 342.2MB/s 431.4MB/s +26.1% pb |
166 |
| -gaviota 184320 61.7% 61.7% 161.1MB/s 246.1MB/s +52.7% gaviota |
167 |
| -</code></pre> |
168 |
| - |
169 |
| - |
170 |
| -### Stream Uncompress (no checksums) |
171 |
| -<pre><code> |
172 |
| - JNI Java JNI Java |
173 |
| -Input Size Compress Compress Throughput Throughput Change |
174 |
| ---------------------------------------------------------------------- |
175 |
| -html 102400 76.4% 76.4% 1.2GB/s 1.2GB/s +0.4% html |
176 |
| -urls 702087 49.1% 49.1% 853.9MB/s 786.6MB/s -7.9% urls |
177 |
| -jpg 126958 0.1% -0.0% 3.0GB/s 10.3GB/s +239.0% jpg (not compressible) |
178 |
| -pdf 94330 17.8% 16.0% 2.0GB/s 3.4GB/s +71.5% pdf |
179 |
| -html4 409600 76.4% 76.4% 1.2GB/s 1.1GB/s -8.4% html4 |
180 |
| -cp 24603 51.8% 51.8% 785.2MB/s 905.6MB/s +15.3% cp |
181 |
| -c 11150 57.4% 57.5% 778.9MB/s 889.7MB/s +14.2% c |
182 |
| -lsp 3721 51.1% 51.2% 739.0MB/s 905.5MB/s +22.5% lsp |
183 |
| -xls 1029744 58.6% 58.6% 730.3MB/s 718.8MB/s -1.6% xls |
184 |
| -txt1 152089 40.2% 40.2% 582.4MB/s 559.0MB/s -4.0% txt1 |
185 |
| -txt2 125179 35.9% 35.9% 540.7MB/s 526.4MB/s -2.6% txt2 |
186 |
| -txt3 426754 42.9% 42.9% 620.5MB/s 583.9MB/s -5.9% txt3 |
187 |
| -txt4 481861 31.6% 31.6% 519.4MB/s 487.0MB/s -6.2% txt4 |
188 |
| -bin 513216 81.8% 81.8% 1.2GB/s 1.1GB/s -11.6% bin |
189 |
| -sum 38240 48.1% 48.1% 693.4MB/s 742.4MB/s +7.1% sum |
190 |
| -man 4227 40.2% 40.3% 637.3MB/s 784.3MB/s +23.1% man |
191 |
| -pb 118588 76.8% 76.8% 1.4GB/s 1.4GB/s +0.4% pb |
192 |
| -gaviota 184320 61.7% 61.7% 688.5MB/s 668.2MB/s -3.0% gaviota |
193 |
| -</code></pre> |
194 |
| - |
195 |
| - |
196 |
| -### Stream Round Trip (no checksums) |
197 |
| -<pre><code> |
198 |
| - JNI Java JNI Java |
199 |
| -Input Size Compress Compress Throughput Throughput Change |
200 |
| ---------------------------------------------------------------------- |
201 |
| -html 102400 76.4% 76.4% 223.8MB/s 272.5MB/s +21.8% html |
202 |
| -urls 702087 49.1% 49.1% 142.8MB/s 174.1MB/s +22.0% urls |
203 |
| -jpg 126958 0.1% -0.0% 1.1GB/s 1.6GB/s +52.1% jpg (not compressible) |
204 |
| -pdf 94330 17.8% 16.0% 421.9MB/s 610.1MB/s +44.6% pdf |
205 |
| -html4 409600 76.4% 76.4% 226.2MB/s 275.5MB/s +21.8% html4 |
206 |
| -cp 24603 51.8% 51.8% 125.3MB/s 160.3MB/s +27.9% cp |
207 |
| -c 11150 57.4% 57.5% 125.1MB/s 183.2MB/s +46.5% c |
208 |
| -lsp 3721 51.1% 51.2% 130.6MB/s 149.5MB/s +14.5% lsp |
209 |
| -xls 1029744 58.6% 58.6% 188.2MB/s 206.1MB/s +9.5% xls |
210 |
| -txt1 152089 40.2% 40.2% 95.3MB/s 123.3MB/s +29.4% txt1 |
211 |
| -txt2 125179 35.9% 35.9% 91.4MB/s 116.8MB/s +27.9% txt2 |
212 |
| -txt3 426754 42.9% 42.9% 101.3MB/s 130.3MB/s +28.6% txt3 |
213 |
| -txt4 481861 31.6% 31.6% 87.9MB/s 111.1MB/s +26.3% txt4 |
214 |
| -bin 513216 81.8% 81.8% 294.7MB/s 337.9MB/s +14.7% bin |
215 |
| -sum 38240 48.1% 48.1% 122.9MB/s 152.9MB/s +24.3% sum |
216 |
| -man 4227 40.2% 40.3% 113.0MB/s 139.1MB/s +23.1% man |
217 |
| -pb 118588 76.8% 76.8% 269.5MB/s 313.8MB/s +16.4% pb |
218 |
| -gaviota 184320 61.7% 61.7% 131.1MB/s 180.3MB/s +37.6% gaviota |
219 |
| -</code></pre> |