Commit b55ef88

Replace implementation with Aircompressor code
The core compression and decompression code has been replaced with the latest code from Aircompressor. The existing interfaces have been retained where possible. The notable exception is that the deprecated SnappyInputStream and SnappyOutputStream stream implementations have been removed. Any existing data in these formats should be converted to the specification-defined framed formats using an older version of this library.
1 parent 0e375d5 commit b55ef88

32 files changed, +1862 -2955 lines changed

.github/workflows/main.yml

+19
@@ -0,0 +1,19 @@
+name: ci
+
+on:
+  - push
+  - pull_request
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v1
+      - uses: actions/setup-java@v3
+        with:
+          distribution: 'temurin'
+          java-version: 8
+      - name: Maven Install
+        run: mvn install -B -V -DskipTests -Dair.check.skip-all
+      - name: Maven Tests
+        run: mvn install -B -P ci

README.md

-216
@@ -1,219 +1,3 @@
 # 🚚 MOVED 🚚
 
 ### __Future development of Snappy without JNI has moved to [aircompressor](https://github.com/airlift/aircompressor)__
-
-</br>
-
-# Snappy in Java
-
-This is a rewrite (port) of [Snappy](http://code.google.com/p/snappy/) written in
-pure Java. This compression code produces a byte-for-byte exact copy of the output
-created by the original C++ code, and extremely fast.
-
-# Performance
-
-The Snappy micro-benchmark has been ported, and can be used to measure
-the performance of this code against the excellent Snappy JNI wrapper from
-[xerial](http://code.google.com/p/snappy-java/). As you can see in the results
-below, the pure Java port is 20-30% faster for block compress, 0-10% slower
-for block uncompress, and 0-5% slower for round-trip block compression. These
-results were run with Java 7 on a Core i7, 64-bit Mac.
-
-As a second more independent test, the performance has been measured using the
-Ning JVM compression benchmark against Snappy JNI, and the pure Java
-[Ning LZF](https://github.com/ning/compress) codec. The
-[results](http://dain.github.com/snappy/) show that the pure Java Snappy is
-20-30% faster than JNI Snappy for compression, and is typically 10-20% slower
-for decompression. Both, the pure Java Snappy and JNI Snappy implementations
-are faster that the Ning LZF codec. These results were run with Java 6 on a
-Core i7, 64-bit Mac.
-
-The difference in performance between these two tests is due to the difference
-in JVM version; Java 7 is consistently 5-10% faster than Java 6 in the
-compression code. As with all benchmarks your mileage will vary, so test with
-your actual use case.
-
-
-
-### Block Compress
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 294.9MB/s 384.8MB/s +30.5% html
-urls 702087 49.1% 49.1% 178.7MB/s 226.5MB/s +26.8% urls
-jpg 126958 0.1% 0.1% 2.7GB/s 3.2GB/s +17.4% jpg (not compressible)
-pdf 94330 17.9% 17.9% 642.4MB/s 910.3MB/s +41.7% pdf
-html4 409600 76.4% 76.4% 289.2MB/s 377.3MB/s +30.5% html4
-cp 24603 51.9% 51.9% 166.4MB/s 233.7MB/s +40.5% cp
-c 11150 57.6% 57.6% 177.1MB/s 295.4MB/s +66.8% c
-lsp 3721 51.6% 51.6% 245.5MB/s 278.0MB/s +13.2% lsp
-xls 1029744 58.7% 58.7% 263.0MB/s 292.5MB/s +11.2% xls
-txt1 152089 40.2% 40.2% 116.8MB/s 163.1MB/s +39.7% txt1
-txt2 125179 35.9% 35.9% 112.5MB/s 153.4MB/s +36.3% txt2
-txt3 426754 42.9% 42.9% 123.3MB/s 169.8MB/s +37.6% txt3
-txt4 481861 31.7% 31.7% 107.8MB/s 146.2MB/s +35.6% txt4
-bin 513216 81.8% 81.8% 413.1MB/s 497.8MB/s +20.5% bin
-sum 38240 48.1% 48.1% 162.4MB/s 213.9MB/s +31.7% sum
-man 4227 40.6% 40.6% 194.6MB/s 241.7MB/s +24.2% man
-pb 118588 76.8% 76.8% 363.7MB/s 450.3MB/s +23.8% pb
-gaviota 184320 61.7% 61.7% 166.7MB/s 253.7MB/s +52.2% gaviota
-</code></pre>
-
-
-### Block Uncompress
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 1.5GB/s 1.3GB/s -12.2% html
-urls 702087 49.1% 49.1% 969.2MB/s 827.5MB/s -14.6% urls
-jpg 126958 0.1% 0.1% 18.6GB/s 19.4GB/s +4.2% jpg (not compressible)
-pdf 94330 17.9% 17.9% 4.1GB/s 3.7GB/s -8.8% pdf
-html4 409600 76.4% 76.4% 1.5GB/s 1.2GB/s -16.8% html4
-cp 24603 51.9% 51.9% 965.2MB/s 956.0MB/s -1.0% cp
-c 11150 57.6% 57.6% 989.1MB/s 924.9MB/s -6.5% c
-lsp 3721 51.6% 51.6% 991.6MB/s 964.8MB/s -2.7% lsp
-xls 1029744 58.7% 58.7% 798.4MB/s 747.3MB/s -6.4% xls
-txt1 152089 40.2% 40.2% 643.8MB/s 580.8MB/s -9.8% txt1
-txt2 125179 35.9% 35.9% 610.0MB/s 549.6MB/s -9.9% txt2
-txt3 426754 42.9% 42.9% 683.8MB/s 614.4MB/s -10.2% txt3
-txt4 481861 31.7% 31.7% 565.4MB/s 505.5MB/s -10.6% txt4
-bin 513216 81.8% 81.8% 1.5GB/s 1.2GB/s -20.4% bin
-sum 38240 48.1% 48.1% 838.1MB/s 771.6MB/s -7.9% sum
-man 4227 40.6% 40.6% 856.9MB/s 847.2MB/s -1.1% man
-pb 118588 76.8% 76.8% 1.7GB/s 1.5GB/s -12.9% pb
-gaviota 184320 61.7% 61.7% 769.1MB/s 693.4MB/s -9.9% gaviota
-</code></pre>
-
-
-### Block Round Trip
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 300.3MB/s 287.1MB/s -4.4% html
-urls 702087 49.1% 49.1% 182.7MB/s 177.0MB/s -3.2% urls
-jpg 126958 0.1% 0.1% 2.6GB/s 2.6GB/s +1.1% jpg (not compressible)
-pdf 94330 17.9% 17.9% 695.3MB/s 680.0MB/s -2.2% pdf
-html4 409600 76.4% 76.4% 296.4MB/s 282.1MB/s -4.8% html4
-cp 24603 51.9% 51.9% 177.0MB/s 172.5MB/s -2.5% cp
-c 11150 57.6% 57.6% 221.7MB/s 218.3MB/s -1.5% c
-lsp 3721 51.6% 51.6% 217.3MB/s 216.3MB/s -0.5% lsp
-xls 1029744 58.7% 58.7% 213.3MB/s 209.9MB/s -1.6% xls
-txt1 152089 40.2% 40.2% 129.4MB/s 126.3MB/s -2.4% txt1
-txt2 125179 35.9% 35.9% 121.7MB/s 118.8MB/s -2.4% txt2
-txt3 426754 42.9% 42.9% 135.2MB/s 132.8MB/s -1.8% txt3
-txt4 481861 31.7% 31.7% 115.2MB/s 113.0MB/s -1.9% txt4
-bin 513216 81.8% 81.8% 371.2MB/s 350.7MB/s -5.5% bin
-sum 38240 48.1% 48.1% 164.2MB/s 160.0MB/s -2.6% sum
-man 4227 40.6% 40.6% 184.8MB/s 185.3MB/s +0.3% man
-pb 118588 76.8% 76.8% 344.1MB/s 326.3MB/s -5.2% pb
-gaviota 184320 61.7% 61.7% 188.0MB/s 185.2MB/s -1.5% gaviota
-</code></pre>
-
-# Stream Format
-
-There is no defined stream format for Snappy, but there is an effort to create
-a common format with the Google Snappy project.
-
-The stream format used in this library has a couple of unique features not
-found in the other Snappy stream formats. Like the other formats, the user
-input is broken into blocks and each block is compressed. If the compressed
-block is smaller that the user input, the compressed block is written,
-otherwise the uncompressed original is written. This dramatically improves the
-speed of uncompressible input such as JPG images. Additionally, a checksum of
-the user input data for each block is written to the stream. This safety check
-assures that the stream has not been corrupted in transit or by a bad Snappy
-implementation. Finally, like gzip, compressed Snappy files can be
-concatenated together without issue, since the input stream will ignore a
-Snappy stream header in the middle of a stream. This makes combining files in
-Hadoop and S3 trivial.
-
-The the SnappyOutputStream javadocs contain formal definition of the stream
-format.
-
-## Stream Performance
-
-The streaming mode performance can not be directly compared to other
-compression algorithms since most formats do not contain a checksum. The basic
-streaming code is significantly faster that the Snappy JNI library due to
-the completely unoptimized stream implementation in Snappy JNI, but once the
-check sum is enabled the performance drops off by about 20%.
-
-### Stream Compress (no checksums)
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 275.8MB/s 373.5MB/s +35.4% html
-urls 702087 49.1% 49.1% 176.5MB/s 225.2MB/s +27.6% urls
-jpg 126958 0.1% -0.0% 1.7GB/s 2.0GB/s +15.8% jpg (not compressible)
-pdf 94330 17.8% 16.0% 557.2MB/s 793.2MB/s +42.4% pdf
-html4 409600 76.4% 76.4% 281.0MB/s 369.9MB/s +31.7% html4
-cp 24603 51.8% 51.8% 151.7MB/s 214.3MB/s +41.3% cp
-c 11150 57.4% 57.5% 149.1MB/s 243.3MB/s +63.1% c
-lsp 3721 51.1% 51.2% 141.3MB/s 181.1MB/s +28.2% lsp
-xls 1029744 58.6% 58.6% 253.9MB/s 290.5MB/s +14.4% xls
-txt1 152089 40.2% 40.2% 114.8MB/s 159.4MB/s +38.8% txt1
-txt2 125179 35.9% 35.9% 110.0MB/s 150.4MB/s +36.7% txt2
-txt3 426754 42.9% 42.9% 121.0MB/s 167.9MB/s +38.8% txt3
-txt4 481861 31.6% 31.6% 105.1MB/s 143.2MB/s +36.2% txt4
-bin 513216 81.8% 81.8% 387.7MB/s 484.5MB/s +25.0% bin
-sum 38240 48.1% 48.1% 153.0MB/s 203.1MB/s +32.8% sum
-man 4227 40.2% 40.3% 125.9MB/s 171.9MB/s +36.5% man
-pb 118588 76.8% 76.8% 342.2MB/s 431.4MB/s +26.1% pb
-gaviota 184320 61.7% 61.7% 161.1MB/s 246.1MB/s +52.7% gaviota
-</code></pre>
-
-
-### Stream Uncompress (no checksums)
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 1.2GB/s 1.2GB/s +0.4% html
-urls 702087 49.1% 49.1% 853.9MB/s 786.6MB/s -7.9% urls
-jpg 126958 0.1% -0.0% 3.0GB/s 10.3GB/s +239.0% jpg (not compressible)
-pdf 94330 17.8% 16.0% 2.0GB/s 3.4GB/s +71.5% pdf
-html4 409600 76.4% 76.4% 1.2GB/s 1.1GB/s -8.4% html4
-cp 24603 51.8% 51.8% 785.2MB/s 905.6MB/s +15.3% cp
-c 11150 57.4% 57.5% 778.9MB/s 889.7MB/s +14.2% c
-lsp 3721 51.1% 51.2% 739.0MB/s 905.5MB/s +22.5% lsp
-xls 1029744 58.6% 58.6% 730.3MB/s 718.8MB/s -1.6% xls
-txt1 152089 40.2% 40.2% 582.4MB/s 559.0MB/s -4.0% txt1
-txt2 125179 35.9% 35.9% 540.7MB/s 526.4MB/s -2.6% txt2
-txt3 426754 42.9% 42.9% 620.5MB/s 583.9MB/s -5.9% txt3
-txt4 481861 31.6% 31.6% 519.4MB/s 487.0MB/s -6.2% txt4
-bin 513216 81.8% 81.8% 1.2GB/s 1.1GB/s -11.6% bin
-sum 38240 48.1% 48.1% 693.4MB/s 742.4MB/s +7.1% sum
-man 4227 40.2% 40.3% 637.3MB/s 784.3MB/s +23.1% man
-pb 118588 76.8% 76.8% 1.4GB/s 1.4GB/s +0.4% pb
-gaviota 184320 61.7% 61.7% 688.5MB/s 668.2MB/s -3.0% gaviota
-</code></pre>
-
-
-### Stream RoundTrip (no checksums)
-<pre><code>
-JNI Java JNI Java
-Input Size Compress Compress Throughput Throughput Change
----------------------------------------------------------------------
-html 102400 76.4% 76.4% 223.8MB/s 272.5MB/s +21.8% html
-urls 702087 49.1% 49.1% 142.8MB/s 174.1MB/s +22.0% urls
-jpg 126958 0.1% -0.0% 1.1GB/s 1.6GB/s +52.1% jpg (not compressible)
-pdf 94330 17.8% 16.0% 421.9MB/s 610.1MB/s +44.6% pdf
-html4 409600 76.4% 76.4% 226.2MB/s 275.5MB/s +21.8% html4
-cp 24603 51.8% 51.8% 125.3MB/s 160.3MB/s +27.9% cp
-c 11150 57.4% 57.5% 125.1MB/s 183.2MB/s +46.5% c
-lsp 3721 51.1% 51.2% 130.6MB/s 149.5MB/s +14.5% lsp
-xls 1029744 58.6% 58.6% 188.2MB/s 206.1MB/s +9.5% xls
-txt1 152089 40.2% 40.2% 95.3MB/s 123.3MB/s +29.4% txt1
-txt2 125179 35.9% 35.9% 91.4MB/s 116.8MB/s +27.9% txt2
-txt3 426754 42.9% 42.9% 101.3MB/s 130.3MB/s +28.6% txt3
-txt4 481861 31.6% 31.6% 87.9MB/s 111.1MB/s +26.3% txt4
-bin 513216 81.8% 81.8% 294.7MB/s 337.9MB/s +14.7% bin
-sum 38240 48.1% 48.1% 122.9MB/s 152.9MB/s +24.3% sum
-man 4227 40.2% 40.3% 113.0MB/s 139.1MB/s +23.1% man
-pb 118588 76.8% 76.8% 269.5MB/s 313.8MB/s +16.4% pb
-gaviota 184320 61.7% 61.7% 131.1MB/s 180.3MB/s +37.6% gaviota
-</code></pre>
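
Note on the removed "Stream Format" text: the per-block rule it describes (write the compressed block only when it is smaller than the input, otherwise write the original bytes, and always record a checksum of the original input) can be illustrated with a minimal sketch. This is not the removed SnappyOutputStream wire format. As loudly labeled assumptions, `java.util.zip.Deflater` stands in for the Snappy compressor, `CRC32` stands in for the stream checksum, and the `BlockFramingSketch`, `Block`, `encode`, and `decode` names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockFramingSketch {
    /** One framed block: a flag, the original length, a checksum of the ORIGINAL bytes, and the payload. */
    public static final class Block {
        public final boolean compressed;
        public final int originalLength;
        public final long checksum;
        public final byte[] payload;

        Block(boolean compressed, int originalLength, long checksum, byte[] payload) {
            this.compressed = compressed;
            this.originalLength = originalLength;
            this.checksum = checksum;
            this.payload = payload;
        }
    }

    /** Stand-in checksum (assumption: CRC32, chosen only because it is in the JDK). */
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Stand-in compressor (assumption: Deflater instead of Snappy). */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        byte[] out = new byte[originalLength];
        int offset = 0;
        try {
            while (offset < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, offset, out.length - offset);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unusable input; the checksum check rejects it later
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block: " + e.getMessage(), e);
        } finally {
            inflater.end();
        }
        return out;
    }

    /** Keep the compressed form only if it is strictly smaller than the input. */
    public static Block encode(byte[] input) {
        byte[] candidate = deflate(input);
        boolean useCompressed = candidate.length < input.length;
        byte[] payload = useCompressed ? candidate : input.clone();
        return new Block(useCompressed, input.length, checksum(input), payload);
    }

    /** Recover the original bytes and verify them against the stored checksum. */
    public static byte[] decode(Block block) {
        byte[] data = block.compressed
                ? inflate(block.payload, block.originalLength)
                : block.payload.clone();
        if (checksum(data) != block.checksum) {
            throw new IllegalStateException("corrupt block: checksum mismatch");
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[4096];      // highly compressible input
        byte[] noise = new byte[4096];
        new Random(42).nextBytes(noise);    // random bytes, effectively incompressible
        for (byte[] input : new byte[][] {zeros, noise}) {
            Block block = encode(input);
            System.out.println("compressed=" + block.compressed
                    + " payloadBytes=" + block.payload.length
                    + " roundTrip=" + Arrays.equals(input, decode(block)));
        }
    }
}
```

Storing incompressible blocks verbatim means the reader only copies bytes for inputs such as JPG images, which is the speed-up the removed README attributes to this rule.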

pom.xml

+21-50
@@ -62,6 +62,11 @@
                 <enabled>true</enabled>
             </snapshots>
         </repository>
+        <repository>
+            <id>maven_central</id>
+            <name>Maven Central</name>
+            <url>https://repo.maven.apache.org/maven2/</url>
+        </repository>
     </repositories>
 
     <distributionManagement>
@@ -78,29 +83,36 @@
     </distributionManagement>
 
     <dependencies>
+        <!--
+        Hadoop is optional for the CompressionCodec. Only constants and
+        interfaces are used from the Hadoop jar, so there this code
+        does not have vulnerabilities from Hadoop.
+        This specific version was chosen because, it has the least
+        number of known vulnerabilities at this time.
+        -->
         <dependency>
             <groupId>org.apache.hadoop</groupId>
-            <artifactId>hadoop-core</artifactId>
-            <version>0.20.2</version>
+            <artifactId>hadoop-common</artifactId>
+            <version>3.4.0</version>
             <optional>true</optional>
             <scope>provided</scope>
         </dependency>
         <dependency>
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
-            <version>13.0.1</version>
+            <version>33.2.0-jre</version>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>org.xerial.snappy</groupId>
             <artifactId>snappy-java</artifactId>
-            <version>1.0.4.1</version>
+            <version>1.1.10.4</version>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>org.testng</groupId>
             <artifactId>testng</artifactId>
-            <version>6.0.1</version>
+            <version>7.5.1</version>
             <scope>test</scope>
         </dependency>
     </dependencies>
@@ -110,7 +122,7 @@
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-enforcer-plugin</artifactId>
-                <version>1.0</version>
+                <version>3.4.1</version>
                 <executions>
                     <execution>
                         <id>enforce-versions</id>
@@ -123,7 +135,7 @@
                                     <version>3.0.0</version>
                                 </requireMavenVersion>
                                 <requireJavaVersion>
-                                    <version>1.6</version>
+                                    <version>1.8</version>
                                 </requireJavaVersion>
                             </rules>
                         </configuration>
@@ -134,47 +146,6 @@
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-source-plugin</artifactId>
             </plugin>
-
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-jar-plugin</artifactId>
-                <version>2.3.2</version>
-                <executions>
-                    <execution>
-                        <id>binary</id>
-                        <phase>package</phase>
-                        <goals>
-                            <goal>jar</goal>
-                        </goals>
-                        <configuration>
-                            <classifier>bin</classifier>
-                            <archive>
-                                <manifest>
-                                    <mainClass>org.iq80.snappy.Main</mainClass>
-                                </manifest>
-                            </archive>
-                        </configuration>
-                    </execution>
-                </executions>
-            </plugin>
-
-            <plugin>
-                <groupId>org.skife.maven</groupId>
-                <artifactId>really-executable-jar-maven-plugin</artifactId>
-                <version>1.0.3</version>
-                <executions>
-                    <execution>
-                        <phase>package</phase>
-                        <goals>
-                            <goal>really-executable-jar</goal>
-                        </goals>
-                        <configuration>
-                            <classifier>bin</classifier>
-                        </configuration>
-                    </execution>
-                </executions>
-            </plugin>
-
         </plugins>
 
         <pluginManagement>
@@ -212,8 +183,8 @@
                     <artifactId>maven-compiler-plugin</artifactId>
                     <version>2.3.2</version>
                     <configuration>
-                        <source>1.6</source>
-                        <target>1.6</target>
+                        <source>1.8</source>
+                        <target>1.8</target>
                     </configuration>
                 </plugin>
