1 | 1 | # 🚚 MOVED 🚚 |
2 | 2 |
3 | 3 | ### __Future development of Snappy without JNI has moved to [aircompressor](https://github.com/airlift/aircompressor)__ |
4 | | - |
5 | | -</br> |
6 | | - |
7 | | -# Snappy in Java |
8 | | - |
9 | | -This is a rewrite (port) of [Snappy](http://code.google.com/p/snappy/) written in |
10 | | -pure Java. This compression code produces a byte-for-byte exact copy of the output |
11 | | -created by the original C++ code, and is extremely fast. |
12 | | - |
13 | | -# Performance |
14 | | - |
15 | | -The Snappy micro-benchmark has been ported, and can be used to measure |
16 | | -the performance of this code against the excellent Snappy JNI wrapper from |
17 | | -[xerial](http://code.google.com/p/snappy-java/). As you can see in the results |
18 | | -below, the pure Java port is 20-30% faster for block compress, 0-10% slower |
19 | | -for block uncompress, and 0-5% slower for round-trip block compression. These |
20 | | -results were run with Java 7 on a Core i7, 64-bit Mac. |
21 | | - |
22 | | -As a second, more independent test, the performance has been measured using the |
23 | | -Ning JVM compression benchmark against Snappy JNI, and the pure Java |
24 | | -[Ning LZF](https://github.com/ning/compress) codec. The |
25 | | -[results](http://dain.github.com/snappy/) show that the pure Java Snappy is |
26 | | -20-30% faster than JNI Snappy for compression, and is typically 10-20% slower |
27 | | -for decompression. Both the pure Java Snappy and JNI Snappy implementations |
28 | | -are faster than the Ning LZF codec. These results were run with Java 6 on a |
29 | | -Core i7, 64-bit Mac. |
30 | | - |
31 | | -The difference in performance between these two tests is due to the difference |
32 | | -in JVM version; Java 7 is consistently 5-10% faster than Java 6 in the |
33 | | -compression code. As with all benchmarks, your mileage will vary, so test with |
34 | | -your actual use case. |
35 | | - |
36 | | - |
37 | | - |
38 | | -### Block Compress |
39 | | -<pre><code> |
40 | | - JNI Java JNI Java |
41 | | -Input Size Compress Compress Throughput Throughput Change |
42 | | ---------------------------------------------------------------------- |
43 | | -html 102400 76.4% 76.4% 294.9MB/s 384.8MB/s +30.5% html |
44 | | -urls 702087 49.1% 49.1% 178.7MB/s 226.5MB/s +26.8% urls |
45 | | -jpg 126958 0.1% 0.1% 2.7GB/s 3.2GB/s +17.4% jpg (not compressible) |
46 | | -pdf 94330 17.9% 17.9% 642.4MB/s 910.3MB/s +41.7% pdf |
47 | | -html4 409600 76.4% 76.4% 289.2MB/s 377.3MB/s +30.5% html4 |
48 | | -cp 24603 51.9% 51.9% 166.4MB/s 233.7MB/s +40.5% cp |
49 | | -c 11150 57.6% 57.6% 177.1MB/s 295.4MB/s +66.8% c |
50 | | -lsp 3721 51.6% 51.6% 245.5MB/s 278.0MB/s +13.2% lsp |
51 | | -xls 1029744 58.7% 58.7% 263.0MB/s 292.5MB/s +11.2% xls |
52 | | -txt1 152089 40.2% 40.2% 116.8MB/s 163.1MB/s +39.7% txt1 |
53 | | -txt2 125179 35.9% 35.9% 112.5MB/s 153.4MB/s +36.3% txt2 |
54 | | -txt3 426754 42.9% 42.9% 123.3MB/s 169.8MB/s +37.6% txt3 |
55 | | -txt4 481861 31.7% 31.7% 107.8MB/s 146.2MB/s +35.6% txt4 |
56 | | -bin 513216 81.8% 81.8% 413.1MB/s 497.8MB/s +20.5% bin |
57 | | -sum 38240 48.1% 48.1% 162.4MB/s 213.9MB/s +31.7% sum |
58 | | -man 4227 40.6% 40.6% 194.6MB/s 241.7MB/s +24.2% man |
59 | | -pb 118588 76.8% 76.8% 363.7MB/s 450.3MB/s +23.8% pb |
60 | | -gaviota 184320 61.7% 61.7% 166.7MB/s 253.7MB/s +52.2% gaviota |
61 | | -</code></pre> |
62 | | - |
63 | | - |
64 | | -### Block Uncompress |
65 | | -<pre><code> |
66 | | - JNI Java JNI Java |
67 | | -Input Size Compress Compress Throughput Throughput Change |
68 | | ---------------------------------------------------------------------- |
69 | | -html 102400 76.4% 76.4% 1.5GB/s 1.3GB/s -12.2% html |
70 | | -urls 702087 49.1% 49.1% 969.2MB/s 827.5MB/s -14.6% urls |
71 | | -jpg 126958 0.1% 0.1% 18.6GB/s 19.4GB/s +4.2% jpg (not compressible) |
72 | | -pdf 94330 17.9% 17.9% 4.1GB/s 3.7GB/s -8.8% pdf |
73 | | -html4 409600 76.4% 76.4% 1.5GB/s 1.2GB/s -16.8% html4 |
74 | | -cp 24603 51.9% 51.9% 965.2MB/s 956.0MB/s -1.0% cp |
75 | | -c 11150 57.6% 57.6% 989.1MB/s 924.9MB/s -6.5% c |
76 | | -lsp 3721 51.6% 51.6% 991.6MB/s 964.8MB/s -2.7% lsp |
77 | | -xls 1029744 58.7% 58.7% 798.4MB/s 747.3MB/s -6.4% xls |
78 | | -txt1 152089 40.2% 40.2% 643.8MB/s 580.8MB/s -9.8% txt1 |
79 | | -txt2 125179 35.9% 35.9% 610.0MB/s 549.6MB/s -9.9% txt2 |
80 | | -txt3 426754 42.9% 42.9% 683.8MB/s 614.4MB/s -10.2% txt3 |
81 | | -txt4 481861 31.7% 31.7% 565.4MB/s 505.5MB/s -10.6% txt4 |
82 | | -bin 513216 81.8% 81.8% 1.5GB/s 1.2GB/s -20.4% bin |
83 | | -sum 38240 48.1% 48.1% 838.1MB/s 771.6MB/s -7.9% sum |
84 | | -man 4227 40.6% 40.6% 856.9MB/s 847.2MB/s -1.1% man |
85 | | -pb 118588 76.8% 76.8% 1.7GB/s 1.5GB/s -12.9% pb |
86 | | -gaviota 184320 61.7% 61.7% 769.1MB/s 693.4MB/s -9.9% gaviota |
87 | | -</code></pre> |
88 | | - |
89 | | - |
90 | | -### Block Round Trip |
91 | | -<pre><code> |
92 | | - JNI Java JNI Java |
93 | | -Input Size Compress Compress Throughput Throughput Change |
94 | | ---------------------------------------------------------------------- |
95 | | -html 102400 76.4% 76.4% 300.3MB/s 287.1MB/s -4.4% html |
96 | | -urls 702087 49.1% 49.1% 182.7MB/s 177.0MB/s -3.2% urls |
97 | | -jpg 126958 0.1% 0.1% 2.6GB/s 2.6GB/s +1.1% jpg (not compressible) |
98 | | -pdf 94330 17.9% 17.9% 695.3MB/s 680.0MB/s -2.2% pdf |
99 | | -html4 409600 76.4% 76.4% 296.4MB/s 282.1MB/s -4.8% html4 |
100 | | -cp 24603 51.9% 51.9% 177.0MB/s 172.5MB/s -2.5% cp |
101 | | -c 11150 57.6% 57.6% 221.7MB/s 218.3MB/s -1.5% c |
102 | | -lsp 3721 51.6% 51.6% 217.3MB/s 216.3MB/s -0.5% lsp |
103 | | -xls 1029744 58.7% 58.7% 213.3MB/s 209.9MB/s -1.6% xls |
104 | | -txt1 152089 40.2% 40.2% 129.4MB/s 126.3MB/s -2.4% txt1 |
105 | | -txt2 125179 35.9% 35.9% 121.7MB/s 118.8MB/s -2.4% txt2 |
106 | | -txt3 426754 42.9% 42.9% 135.2MB/s 132.8MB/s -1.8% txt3 |
107 | | -txt4 481861 31.7% 31.7% 115.2MB/s 113.0MB/s -1.9% txt4 |
108 | | -bin 513216 81.8% 81.8% 371.2MB/s 350.7MB/s -5.5% bin |
109 | | -sum 38240 48.1% 48.1% 164.2MB/s 160.0MB/s -2.6% sum |
110 | | -man 4227 40.6% 40.6% 184.8MB/s 185.3MB/s +0.3% man |
111 | | -pb 118588 76.8% 76.8% 344.1MB/s 326.3MB/s -5.2% pb |
112 | | -gaviota 184320 61.7% 61.7% 188.0MB/s 185.2MB/s -1.5% gaviota |
113 | | -</code></pre> |
114 | | - |
115 | | -# Stream Format |
116 | | - |
117 | | -There is no defined stream format for Snappy, but there is an effort to create |
118 | | -a common format with the Google Snappy project. |
119 | | - |
120 | | -The stream format used in this library has a couple of unique features not |
121 | | -found in the other Snappy stream formats. Like the other formats, the user |
122 | | -input is broken into blocks and each block is compressed. If the compressed |
123 | | -block is smaller than the user input, the compressed block is written, |
124 | | -otherwise the uncompressed original is written. This dramatically improves the |
125 | | -speed of uncompressible input such as JPG images. Additionally, a checksum of |
126 | | -the user input data for each block is written to the stream. This safety check |
127 | | -ensures that the stream has not been corrupted in transit or by a bad Snappy |
128 | | -implementation. Finally, like gzip, compressed Snappy files can be |
129 | | -concatenated together without issue, since the input stream will ignore a |
130 | | -Snappy stream header in the middle of a stream. This makes combining files in |
131 | | -Hadoop and S3 trivial. |
132 | | - |
133 | | -The SnappyOutputStream javadocs contain a formal definition of the stream |
134 | | -format. |
135 | | - |
136 | | -## Stream Performance |
137 | | - |
138 | | -The streaming mode performance cannot be directly compared to other |
139 | | -compression algorithms since most formats do not contain a checksum. The basic |
140 | | -streaming code is significantly faster than the Snappy JNI library due to |
141 | | -the completely unoptimized stream implementation in Snappy JNI, but once the |
142 | | -checksum is enabled the performance drops off by about 20%. |
143 | | - |
144 | | -### Stream Compress (no checksums) |
145 | | -<pre><code> |
146 | | - JNI Java JNI Java |
147 | | -Input Size Compress Compress Throughput Throughput Change |
148 | | ---------------------------------------------------------------------- |
149 | | -html 102400 76.4% 76.4% 275.8MB/s 373.5MB/s +35.4% html |
150 | | -urls 702087 49.1% 49.1% 176.5MB/s 225.2MB/s +27.6% urls |
151 | | -jpg 126958 0.1% -0.0% 1.7GB/s 2.0GB/s +15.8% jpg (not compressible) |
152 | | -pdf 94330 17.8% 16.0% 557.2MB/s 793.2MB/s +42.4% pdf |
153 | | -html4 409600 76.4% 76.4% 281.0MB/s 369.9MB/s +31.7% html4 |
154 | | -cp 24603 51.8% 51.8% 151.7MB/s 214.3MB/s +41.3% cp |
155 | | -c 11150 57.4% 57.5% 149.1MB/s 243.3MB/s +63.1% c |
156 | | -lsp 3721 51.1% 51.2% 141.3MB/s 181.1MB/s +28.2% lsp |
157 | | -xls 1029744 58.6% 58.6% 253.9MB/s 290.5MB/s +14.4% xls |
158 | | -txt1 152089 40.2% 40.2% 114.8MB/s 159.4MB/s +38.8% txt1 |
159 | | -txt2 125179 35.9% 35.9% 110.0MB/s 150.4MB/s +36.7% txt2 |
160 | | -txt3 426754 42.9% 42.9% 121.0MB/s 167.9MB/s +38.8% txt3 |
161 | | -txt4 481861 31.6% 31.6% 105.1MB/s 143.2MB/s +36.2% txt4 |
162 | | -bin 513216 81.8% 81.8% 387.7MB/s 484.5MB/s +25.0% bin |
163 | | -sum 38240 48.1% 48.1% 153.0MB/s 203.1MB/s +32.8% sum |
164 | | -man 4227 40.2% 40.3% 125.9MB/s 171.9MB/s +36.5% man |
165 | | -pb 118588 76.8% 76.8% 342.2MB/s 431.4MB/s +26.1% pb |
166 | | -gaviota 184320 61.7% 61.7% 161.1MB/s 246.1MB/s +52.7% gaviota |
167 | | -</code></pre> |
168 | | - |
169 | | - |
170 | | -### Stream Uncompress (no checksums) |
171 | | -<pre><code> |
172 | | - JNI Java JNI Java |
173 | | -Input Size Compress Compress Throughput Throughput Change |
174 | | ---------------------------------------------------------------------- |
175 | | -html 102400 76.4% 76.4% 1.2GB/s 1.2GB/s +0.4% html |
176 | | -urls 702087 49.1% 49.1% 853.9MB/s 786.6MB/s -7.9% urls |
177 | | -jpg 126958 0.1% -0.0% 3.0GB/s 10.3GB/s +239.0% jpg (not compressible) |
178 | | -pdf 94330 17.8% 16.0% 2.0GB/s 3.4GB/s +71.5% pdf |
179 | | -html4 409600 76.4% 76.4% 1.2GB/s 1.1GB/s -8.4% html4 |
180 | | -cp 24603 51.8% 51.8% 785.2MB/s 905.6MB/s +15.3% cp |
181 | | -c 11150 57.4% 57.5% 778.9MB/s 889.7MB/s +14.2% c |
182 | | -lsp 3721 51.1% 51.2% 739.0MB/s 905.5MB/s +22.5% lsp |
183 | | -xls 1029744 58.6% 58.6% 730.3MB/s 718.8MB/s -1.6% xls |
184 | | -txt1 152089 40.2% 40.2% 582.4MB/s 559.0MB/s -4.0% txt1 |
185 | | -txt2 125179 35.9% 35.9% 540.7MB/s 526.4MB/s -2.6% txt2 |
186 | | -txt3 426754 42.9% 42.9% 620.5MB/s 583.9MB/s -5.9% txt3 |
187 | | -txt4 481861 31.6% 31.6% 519.4MB/s 487.0MB/s -6.2% txt4 |
188 | | -bin 513216 81.8% 81.8% 1.2GB/s 1.1GB/s -11.6% bin |
189 | | -sum 38240 48.1% 48.1% 693.4MB/s 742.4MB/s +7.1% sum |
190 | | -man 4227 40.2% 40.3% 637.3MB/s 784.3MB/s +23.1% man |
191 | | -pb 118588 76.8% 76.8% 1.4GB/s 1.4GB/s +0.4% pb |
192 | | -gaviota 184320 61.7% 61.7% 688.5MB/s 668.2MB/s -3.0% gaviota |
193 | | -</code></pre> |
194 | | - |
195 | | - |
196 | | -### Stream Round Trip (no checksums) |
197 | | -<pre><code> |
198 | | - JNI Java JNI Java |
199 | | -Input Size Compress Compress Throughput Throughput Change |
200 | | ---------------------------------------------------------------------- |
201 | | -html 102400 76.4% 76.4% 223.8MB/s 272.5MB/s +21.8% html |
202 | | -urls 702087 49.1% 49.1% 142.8MB/s 174.1MB/s +22.0% urls |
203 | | -jpg 126958 0.1% -0.0% 1.1GB/s 1.6GB/s +52.1% jpg (not compressible) |
204 | | -pdf 94330 17.8% 16.0% 421.9MB/s 610.1MB/s +44.6% pdf |
205 | | -html4 409600 76.4% 76.4% 226.2MB/s 275.5MB/s +21.8% html4 |
206 | | -cp 24603 51.8% 51.8% 125.3MB/s 160.3MB/s +27.9% cp |
207 | | -c 11150 57.4% 57.5% 125.1MB/s 183.2MB/s +46.5% c |
208 | | -lsp 3721 51.1% 51.2% 130.6MB/s 149.5MB/s +14.5% lsp |
209 | | -xls 1029744 58.6% 58.6% 188.2MB/s 206.1MB/s +9.5% xls |
210 | | -txt1 152089 40.2% 40.2% 95.3MB/s 123.3MB/s +29.4% txt1 |
211 | | -txt2 125179 35.9% 35.9% 91.4MB/s 116.8MB/s +27.9% txt2 |
212 | | -txt3 426754 42.9% 42.9% 101.3MB/s 130.3MB/s +28.6% txt3 |
213 | | -txt4 481861 31.6% 31.6% 87.9MB/s 111.1MB/s +26.3% txt4 |
214 | | -bin 513216 81.8% 81.8% 294.7MB/s 337.9MB/s +14.7% bin |
215 | | -sum 38240 48.1% 48.1% 122.9MB/s 152.9MB/s +24.3% sum |
216 | | -man 4227 40.2% 40.3% 113.0MB/s 139.1MB/s +23.1% man |
217 | | -pb 118588 76.8% 76.8% 269.5MB/s 313.8MB/s +16.4% pb |
218 | | -gaviota 184320 61.7% 61.7% 131.1MB/s 180.3MB/s +37.6% gaviota |
219 | | -</code></pre> |
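
The block-level behavior described in the removed Stream Format section (compress each block, fall back to the raw bytes when compression does not help, and record a checksum of the original input) can be sketched roughly as follows. This is a minimal illustration, not the library's actual format: the class `BlockFramingSketch`, the flag values, and the header layout are hypothetical. `java.util.zip.Deflater` stands in for the Snappy compressor so the sketch runs with only the JDK, and `CRC32` stands in for whatever checksum the real format specifies. The SnappyOutputStream javadocs remain the authoritative definition.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

class BlockFramingSketch {
    // Hypothetical flag values, chosen for this sketch only.
    static final byte STORED = 0;
    static final byte COMPRESSED = 1;

    // Stand-in compressor: Deflater is NOT Snappy; it is used here only
    // so the example has no dependency outside the JDK.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Frames one block as: flag (1 byte), payload length (4 bytes),
    // checksum of the ORIGINAL input (4 bytes), then the payload.
    // This layout is an assumption made for illustration.
    static byte[] frameBlock(byte[] block) {
        byte[] compressed = compress(block);

        // Keep the compressed form only when it is actually smaller;
        // otherwise write the uncompressed original.
        boolean useCompressed = compressed.length < block.length;
        byte[] payload = useCompressed ? compressed : block;

        // The checksum covers the user input, not the stored payload,
        // so a reader can verify the data after decompression.
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);

        return ByteBuffer.allocate(1 + 4 + 4 + payload.length)
                .put(useCompressed ? COMPRESSED : STORED)
                .putInt(payload.length)
                .putInt((int) crc.getValue())
                .put(payload)
                .array();
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[1000]; // all zeros, highly compressible
        byte[] tiny = {1, 2, 3};            // too small to benefit from compression
        System.out.println("repetitive block frame: " + frameBlock(repetitive).length + " bytes");
        System.out.println("tiny block frame: " + frameBlock(tiny).length + " bytes");
    }
}
```

Because the fallback path stores the input verbatim, a reader of such a frame only needs to copy the payload for incompressible blocks, which is consistent with the large uncompress gains reported for the `jpg` input in the stream tables.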