
Commit 4601229

update docs; bump to 0.2.0

1 parent 909f2ba commit 4601229

7 files changed: +240 -103 lines changed


INSTALLING.md (+29 -12)

```diff
@@ -1,34 +1,51 @@
-# Prereqs
+# Pre-requisites
 
 **install-as** https://github.com/mreiferson/go-install-as
 
-$ git clone https://github.com/bitly/go-install-as.git
-$ cd go-install-as
+$ git clone git://github.com/bitly/go-install-as.git
+$ cd <repo_root>
 $ make
 
 **simplejson** https://github.com/bitly/go-simplejson
 
 # installed under a custom import path so you can control versioning
-$ git clone https://github.com/bitly/go-simplejson.git
-$ cd go-simplejson
+$ git clone git://github.com/bitly/go-simplejson.git
+$ cd <repo_root>
 $ go tool install_as --import-as=bitly/simplejson
 
 **notify** https://github.com/bitly/go-notify
 
 # installed under a custom import path so you can control versioning
-$ git clone https://github.com/bitly/go-notify.git
-$ cd go-notify
+$ git clone git://github.com/bitly/go-notify.git
+$ cd <repo_root>
 $ go tool install_as --import-as=bitly/notify
 
-# installing nsq
+**assert** https://github.com/bmizerany/assert
 
-$ cd nsqd
+$ go get github.com/bmizerany/assert
+
+# Installing
+
+$ git clone git://github.com/bitly/nsq.git
+
+# nsq Go package (for building Go readers)
+$ cd <repo_root>/nsq
+$ go tool install_as --import-as=bitly/nsq
+
+# nsqd binary
+$ cd <repo_root>/nsqd
 $ go build
+$ cp nsqd /usr/local/bin/
 
-$ cd ../nsqlookupd
+# nsqlookupd binary
+$ cd <repo_root>/nsqlookupd
 $ go build
+$ cp nsqlookupd /usr/local/bin/
 
-## Dependencies for tests
+# pynsq Python module (for building Python readers)
+$ cd <repo_root>/pynsq
+$ python setup.py install
+
+# Testing
 
-$ go get github.com/bmizerany/assert
 $ ./test.sh
```
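Once these packages are installed under their custom import paths, Go code refers to them by those paths. As a quick sanity check, a minimal program (assuming the `bitly/nsq` import path created above) can print the package version this commit bumps to 0.2.0:

```go
package main

import (
	"fmt"

	"bitly/nsq" // import path created by: go tool install_as --import-as=bitly/nsq
)

func main() {
	// VERSION is declared in nsq/version.go ("0.2.0" as of this commit).
	fmt.Println("nsq package version:", nsq.VERSION)
}
```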

LICENSE (+17)

```diff
@@ -0,0 +1,17 @@
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
```

README.md (+191 -86)
```diff
@@ -1,103 +1,208 @@
-### NSQ
+# Not Simple Queue
 
-An infrastructure component designed to support highly available, distributed, fault tolerant, "guaranteed" message
-delivery.
+An infrastructure component designed to support highly available, distributed, fault tolerant,
+loosely guaranteed message processing.
 
-#### Background
+## Background
 
-`simplequeue` was developed as a dead-simple in-memory message queue. It spoke HTTP and had no knowledge (or care) for
-the data you put in or took out. Life was good.
+[simplequeue][1] was developed, you guessed it, as a *simple* in-memory message queue with an HTTP
+interface. It is agnostic to the type and format of the data you put in and take out.
 
-We used `simplequeue` as the foundation for a distributed message queue. In production, we silo'd a `simplequeue` right
-where messages were produced (ie. frontends) and effectively reduced the potential for data loss in a system which did
-not persist messages (by guaranteeing that the loss of any single `simplequeue` would not prevent the rest of the
-message producers or workers, to function).
+We use `simplequeue` as the foundation for a distributed message queue by siloing an instance on
+each host that produces messages. This effectively reduces the potential for data loss in a system
+which otherwise does not persist messages (by guaranteeing that the loss of any single host would
+not prevent the rest of the message producers or consumers from functioning).
 
-We added `pubsub`, an HTTP server to aggregate streams and provide a long-lived `/sub` endpoint. We leveraged `pubsub`
-to transmit streams across data-centers in order to daisy chain a given feed to various downstream services. A nice
-property of this setup is that producers are de-coupled from downstream consumers (a downstream consumer only needs to
-know of the `pubsub` to receive data).
+We also use [pubsub][1], an HTTP server to aggregate streams and provide an endpoint for N number of
+clients to subscribe. We use it to transmit streams across hosts (or datacenters) to be queued
+again for writing to various downstream services.
 
-There are a few issues with this combination of tools...
+We use this foundation to process 100s of millions of messages a day. It is the core upon which
+*everything* is built.
 
-One is simply the operational complexity of having to setup the data pipe to begin with. Often this involves services
-setup as follows:
+This setup has several nice properties:
 
-`api` > `simplequeue` > `queuereader` > `pubsub` > `ps_to_http` > `simplequeue` > `queuereader`
+* producers are de-coupled from downstream consumers
+* no producer-side single points of failure
+* easy to interact with (all HTTP)
 
-Of particular note are the `pubsub` > `ps_to_http` links. We repeatedly encounter the problem of consuming a single
-stream with the desire to avoid a SPOF. You have 2 options, none ideal. Often we just put the `ps_to_http` process on a
-single box and pray. Alternatively we've chosen to consume the full stream multiple times but only process a % of the
-stream on a given host (essentially sharding). To make things even more complicated we need to repeat this chain for
-each stream of data we're interested in.
+But, it also has its issues...
 
-Messages traveling through the system have no guarantee that they will be delivered to a client and the responsibility
-of requeueing is placed on the client. This churn of messages being passed back and forth increases the potential for
-errors resulting in message loss.
+One is simply the operational overhead/complexity of having to set up and configure the various
+tools in the chain. This often looks like:
 
-#### Enter NSQ
+    api > simplequeue > queuereader > pubsub > ps_to_http > simplequeue > queuereader
 
-`NSQ` is designed to address the fragile nature of the combination of components listed above as well as provide
-high-availability as a byproduct of a messaging pattern that includes no SPOF. It also addresses the need for stronger
-guarantees around the delivery of a message.
+Of particular note are the `pubsub > ps_to_http` links. Given this setup, consuming a stream in a
+way that avoids SPOFs is a challenge. There are two options, neither of which is ideal:
 
-A single `nsqd` process handles multiple "topics" (by convention, this would previously have been referred to as a
-"stream"). Second, a topic can have multiple "channels". In practice, a channel maps to a downstream service. Each
-channel receives all the messages from a topic. The channels buffer data independently of each other, preventing a slow
-consumer from causing a backlog for other channels. A channel can have multiple clients, a message (assuming successful
-delivery) will only be delivered to one of the connected clients, at random.
+1. just put the `ps_to_http` process on a single box and pray
+2. shard by *consuming* the full stream but *processing* only a percentage of it on each host
+   (though this does not resolve the issue of seamless failover)
 
-For example, the "decodes" topic could have a channel for "clickatron", "spam", and "fishnet", etc. The benefit should
-be easy to see, there are no additional services needed to be setup for new queues or to daisy chain a new downstream
-service.
+To make things even more complicated, we need to repeat this for *each* stream of data we're
+interested in.
 
-`NSQ` is fully distributed with no single broker or point of failure. `nsqd` clients (aka "queuereaders") are connected
-over TCP sockets to **all** `nsqd` instances providing the specified topic. There are no middle-men, no brokers, and no
-SPOF. The *topology* solves the problems described above:
+Also, messages traveling through the system have no delivery guarantee and the responsibility of
+requeueing is placed on the client (for instance, if processing fails). This churn increases the
+potential for situations that result in message loss.
+
+## Enter NSQ
+
+`NSQ` is designed to:
+
+1. greatly simplify configuration complexity
+2. provide a straightforward upgrade path
+3. provide easy topology solutions that enable high-availability and eliminate SPOFs
+4. address the need for stronger message delivery guarantees
+5. bound the memory footprint of a single process (by persisting some messages to disk)
+6. improve efficiency
+
+### Simplifying Configuration Complexity
+
+A single `nsqd` instance is designed to handle multiple streams of data at once. Streams are called
+"topics" and a topic has 1 or more "channels". Each channel receives a *copy* of all the
+messages for a topic. In practice, a channel maps to a downstream service consuming a topic.
+
+Topics and channels all buffer data independently of each other, preventing a slow consumer from
+causing a backlog for other channels (the same applies at the topic level).
+
+A channel can, and generally does, have multiple clients connected. Assuming all connected clients
+are in a state where they are ready to receive messages, a message will be delivered to a random
+client.
+
+For example:
+
+    "clicks" (topic)
+       |                                    |- client
+       |>---- "metrics" (channel) --------<|- ...
+       |                                    |- client
+       |
+       |                                    |- client
+       |>---- "spam_analysis" (channel) --<|- ...
+       |                                    |- client
+       |
+       |                                    |- client
+       |>---- "archive" (channel) --------<|- ...
+                                            |- client
+
```
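To make the topic/channel semantics concrete, the following is a small illustrative model in Go: every channel receives a copy of each message, and within a channel one ready client is picked at random. This is a behavioral sketch only (all type names are hypothetical), not `nsqd`'s implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Illustrative model only: each channel receives a copy of every message
// published to its topic; within a channel, one connected client is chosen
// at random to receive a given message.
type Client struct{ name string }

type Channel struct {
	name    string
	clients []*Client
}

type Topic struct {
	name     string
	channels []*Channel
}

func (t *Topic) Publish(msg string) {
	for _, ch := range t.channels { // every channel gets a *copy*
		if len(ch.clients) == 0 {
			continue // in practice the message would be buffered, not dropped
		}
		c := ch.clients[rand.Intn(len(ch.clients))] // one random ready client
		fmt.Printf("topic=%s channel=%s -> %s: %s\n", t.name, ch.name, c.name, msg)
	}
}

func main() {
	clicks := &Topic{name: "clicks", channels: []*Channel{
		{name: "metrics", clients: []*Client{{"c1"}, {"c2"}}},
		{name: "spam_analysis", clients: []*Client{{"c3"}}},
		{name: "archive", clients: []*Client{{"c4"}, {"c5"}}},
	}}
	clicks.Publish("user 1 clicked")
}
```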
```diff
+Configuration is greatly simplified because there is no additional setup required to introduce a new
+distinct consumer for a given topic nor is there any need to set up new services to introduce a new
+topic.
+
+`NSQ` also includes a helper application, `nsqlookupd`, which provides a directory service where
+consumers can look up the addresses of `nsqd` instances that provide the topics they are interested
+in subscribing to. In terms of configuration, this decouples the consumers from the producers (they
+both individually only need to know where to contact common instances of `nsqlookupd`, never each
+other) reducing complexity and maintenance.
+
+At a lower level each `nsqd` has a long-lived TCP connection to `nsqlookupd` over which it
+periodically pushes its state. This data is used to inform which `nsqd` addresses `nsqlookupd` will
+give to consumers. For consumers, an HTTP `/lookup` endpoint is exposed for polling.
+
+NOTE: in future versions, the heuristic `nsqlookupd` uses to return addresses could be based on
+depth, number of connected clients, or other "intelligent" strategies. The current implementation is
+simply *all*. Ultimately, the goal is to ensure that all producers are being read from such that
+depth stays near zero.
+
```
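A consumer's poll might look like the sketch below. The `/lookup` endpoint and `topic` parameter come from the description above; the address `127.0.0.1:4161` and the treatment of the response as an opaque JSON body are assumptions for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// lookup polls nsqlookupd's HTTP /lookup endpoint for producers of a topic.
// Assumed for illustration: nsqlookupd listens on 127.0.0.1:4161 and returns
// a JSON document describing producer addresses.
func lookup(topic string) (string, error) {
	resp, err := http.Get("http://127.0.0.1:4161/lookup?topic=" + topic)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := lookup("clicks")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(body)
}
```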
```diff
+### Straightforward Upgrade Path
+
+This was one of our **highest** priorities. Our production systems handle a large volume of traffic,
+all built upon our existing messaging tools, so we needed a way to slowly and methodically upgrade
+specific parts of our infrastructure with little to no impact.
+
+First, on the message *producer* side we built `nsqd` to match `simplequeue`, i.e. an HTTP `/put`
+endpoint to POST binary data (with the one caveat that the endpoint takes an additional query
+parameter specifying the "topic"). Services that wanted to start writing to `nsqd` now just had to
+point to `nsqd`.
+
```
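In practice that producer-side write is a single HTTP POST. A minimal sketch (the `/put` endpoint and `topic` query parameter are described above; the address `127.0.0.1:4151` is an assumption for illustration):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// publish sends one message to nsqd over its simplequeue-compatible HTTP
// interface. The /put endpoint and topic parameter are described above; the
// address is assumed for illustration.
func publish(topic string, body []byte) error {
	resp, err := http.Post(
		"http://127.0.0.1:4151/put?topic="+topic,
		"application/octet-stream",
		bytes.NewReader(body),
	)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := publish("clicks", []byte(`{"user_id": 1}`)); err != nil {
		fmt.Println("publish failed:", err)
	}
}
```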
```diff
+Second, we built libraries in both Python and Go that matched the functionality and idioms we had
+been accustomed to in our existing libraries. This eased the transition on the message *consumer*
+side by limiting the code changes to bootstrapping. All business logic remained the same.
+
+Finally, we built utilities to glue old and new components together. These are all available in the
+`examples` directory in the repository:
+
+* `nsq_pubsub` - expose a `pubsub` like HTTP interface to topics in an `NSQ` cluster
+* `nsq_to_file` - durably write all messages for a given topic to a file
+* `nsq_to_http` - perform HTTP requests for all messages in a topic to (multiple) endpoints
+
+### Eliminating SPOFs
+
+`NSQ` is designed to be used in a distributed fashion. `nsqd` clients are connected (over TCP) to
+**all** instances providing the specified topic. There are no middle-men, no brokers, and no SPOFs:
 
     NSQ     NSQ    NSQ
-      \     /\     /
-       \   /  \   /
-        \ /    \ /
-         X      X
-        / \    / \
-       /   \  /   \
-    ... consumers ...
-
-You don't have to deal with figuring out how to robustly distribute an aggregated feed, instead you consume directly
-from *all* producers. It also doesn't *technically* matter which client connects to which `NSQ`, as long as there are
-enough clients connected to all producers to satisfy the volume of messages, you're guaranteed that all will be
-delivered downstream.
-
-It's also worth pointing out the bandwidth efficiency when you have >1 consumers of a topic. You're no longer daisy
-chaining multiple copies of the stream over the network, it's all happening right at the source. Moving away from HTTP
-as the only available transport also significantly reduces the per-message overhead.
-
-#### Message Delivery Guarantees
-
-`NSQ` guarantees that a message will be delivered **at least once**. Duplicate messages are possible and downstream
-systems should be designed to perform idempotent operations.
-
-It accomplishes this by performing a handshake with the client, as follows:
-
-1. client GET message
-2. `NSQ` sends message and stores in temporary internal location
-3. client replies SUCCESS or FAIL
-   * if client does not reply `NSQ` will automatically timeout and requeue the message
-4. `NSQ` requeues on FAIL and purges on SUCCESS
-
-#### Lookup Service (nsqlookupd)
-
-`NSQ` includes a helper application, `nsqlookupd`, which provides a directory service where queuereaders can lookup the
-addresses of `NSQ` instances that contain the topics they are interested in subscribing to. This decouples the consumers
-from the producers (they both individually only need to have intimate knowledge of `nsqlookupd`, never each other).
-
-At a lower level each `nsqd` has a long-lived connection to `nsqlookupd` over which it periodically pushes it's state.
-This data is used to inform which addresses `nsqlookupd` will give to queuereaders. The heuristic could be based on
-depth, number of connected queuereaders or naive strategies like round-robin, etc. The goal is to ensure that all
-producers are being read from. On the client side an HTTP interface is exposed for queuereaders to poll.
-
-High availability of `nsqlookupd` is achieved by running multiple instances. They don't communicate directly to each
-other and don't require strong data consistency between themselves. The data is considered *eventually* consistent, the
-queuereaders randomly choose a `nsqlookupd` to poll. Stale (or otherwise inaccessible) nodes don't grind the system to a
-halt.
+     |\     /\     /|
+     | \   /  \   / |
+     |  \ /    \ /  |
+     |   X      X   |
+     |  / \    / \  |
+     | /   \  /   \ |
+     C      C      C  (consumers)
+
+This topology eliminates the need to chain single, aggregated, feeds. Instead you consume directly
+from **all** producers. *Technically*, it doesn't matter which client connects to which `NSQ`, as
+long as there are enough clients connected to all producers to satisfy the volume of messages,
+you're guaranteed that all will be eventually processed.
+
+For `nsqlookupd`, high availability is achieved by running multiple instances. They don't
+communicate directly with each other and data is considered to be eventually consistent. Consumers
+poll *all* of the `nsqlookupd` instances they are configured with, resulting in a union of all the
+responses. Stale, inaccessible, or otherwise faulty nodes don't grind the system to a halt.
+
+### Message Delivery Guarantees
+
+`NSQ` guarantees that a message will be delivered **at least once**. Duplicate messages are a
+possibility and downstream systems should be designed to perform idempotent operations.
+
+This is accomplished by performing a handshake with the client, as follows (assume the client has
+successfully connected and subscribed to a topic):
+
+1. client sends RDY command indicating the # of messages they are willing to accept
+2. `NSQ` sends message and stores in temporary internal location (decrementing RDY count for that
+   client)
+3. client replies FIN or REQ indicating success or failure (requeue) respectively (if the client
+   does not reply, `NSQ` will time out after a configurable amount of time and automatically REQ
+   the message)
+
+This ensures that the only edge case that would result in message loss is an unclean shutdown of a
+`nsqd` process (and only for the messages that were in memory).
+
```
### Bounded Memory Footprint
173+
174+
`nsqd` provides a configuration option (`--mem-queue-size`) that will determine the number of
175+
messages that are kept in memory for a given queue (for both topics *and* channels). If the depth of
176+
a queue exceeds this threshold messages will be written to disk. This bounds the footprint of a
177+
given `nsqd` process to:
178+
179+
mem-queue-size * #_of_channels_and_topics
180+
181+
Also, an astute observer might have identified that this is a convenient way to gain an even higher
182+
guarantee of delivery by setting this value to something low (like 1 or even 0). The disk-backed
183+
queue is designed to survive unclean restarts (although messages might be delivered twice).
184+
185+
Also, related to message delivery guarantees, *clean* shutdowns (by sending a `nsqd` process the
186+
TERM signal) safely persist messages in memory, in-flight, deferred, and in various internal
187+
buffers.
188+
189+
### Efficiency
190+
191+
`NSQ` was designed to communicate over a `memcached` like command protocol with simple size-prefixed
192+
responses. Compared to the previous toolchain using HTTP, this is significantly more efficient
193+
per-message.
194+
195+
Also, by eliminating the daisy chained stream copies, configuration, setup, and development time is
196+
greatly reduced (especially in cases where there are >1 consumers of a topic). All the duplication
197+
happens at the source.
198+
199+
Finally, because `simplequeue` has no high-level functionality built in, the client is responsible
200+
for maintaining message state (it does so by embedding this information and sending the message back
201+
and forth for retries). In `NSQ` all of this information is kept in the core.
202+
203+
## EOL
204+
205+
We've been using `NSQ` in production for several months. It's already made an improvement in terms
206+
of robustness, development time, and simplicity to systems that have moved over to it.
207+
208+
[1]: https://github.com/bitly/simplehttp

TODO.md (-2)

```diff
@@ -1,8 +1,6 @@
 
 ### Roadmap
 
-* v0.1 working nsqd/go-nsqreader (useable drop in replacement for simplequeue/go-queuereader)
-* v0.2 working python client (usable drop in replacement for simplequeue/BaseReader)
 * v0.3 nsqadmin - more complete administrative commands, UI for topology/stats
 * upon topic creation, lookup channels against lookupd
 * cleanup (expire) topics/channels
```

nsq/version.go (+1 -1)

```diff
@@ -1,3 +1,3 @@
 package nsq
 
-const VERSION = "0.1.29"
+const VERSION = "0.2.0"
```

nsqd/version.go (+1 -1)

```diff
@@ -1,3 +1,3 @@
 package main
 
-const VERSION = "0.1.29"
+const VERSION = "0.2.0"
```
