# Not Simple Queue

An infrastructure component designed to support highly available, distributed, fault tolerant,
loosely guaranteed message processing.

## Background

[simplequeue][1] was developed, you guessed it, as a *simple* in-memory message queue with an HTTP
interface. It is agnostic to the type and format of the data you put in and take out.

We use `simplequeue` as the foundation for a distributed message queue by siloing an instance on
each host that produces messages. This effectively reduces the potential for data loss in a system
which otherwise does not persist messages (by guaranteeing that the loss of any single host would
not prevent the rest of the message producers or consumers from functioning).

We also use [pubsub][1], an HTTP server that aggregates streams and provides an endpoint for any
number of clients to subscribe to. We use it to transmit streams across hosts (or datacenters) so
that they can be queued again for writing to various downstream services.

We use this foundation to process 100s of millions of messages a day. It is the core upon which
*everything* is built.

This setup has several nice properties:

 * producers are de-coupled from downstream consumers
 * no producer-side single points of failure
 * easy to interact with (all HTTP)

But, it also has its issues...

One is simply the operational overhead/complexity of having to set up and configure the various
tools in the chain. This often looks like:

    api > simplequeue > queuereader > pubsub > ps_to_http > simplequeue > queuereader

Of particular note are the `pubsub > ps_to_http` links. Given this setup, consuming a stream in a
way that avoids SPOFs is a challenge. There are two options, neither of which is ideal:

 1. just put the `ps_to_http` process on a single box and pray
 2. shard by *consuming* the full stream but *processing* only a percentage of it on each host
    (though this does not resolve the issue of seamless failover)

To make things even more complicated, we need to repeat this for *each* stream of data we're
interested in.

Also, messages traveling through the system have no delivery guarantee and the responsibility of
requeueing is placed on the client (for instance, if processing fails). This churn increases the
potential for situations that result in message loss.

## Enter NSQ

`NSQ` is designed to:

 1. greatly simplify configuration complexity
 2. provide a straightforward upgrade path
 3. provide easy topology solutions that enable high availability and eliminate SPOFs
 4. address the need for stronger message delivery guarantees
 5. bound the memory footprint of a single process (by persisting some messages to disk)
 6. improve efficiency

### Simplifying Configuration Complexity

A single `nsqd` instance is designed to handle multiple streams of data at once. Streams are called
"topics" and a topic has 1 or more "channels". Each channel receives a *copy* of all the messages
for a topic. In practice, a channel maps to a downstream service consuming a topic.

Topics and channels all buffer data independently of each other, preventing a slow consumer from
causing a backlog for other channels (the same applies at the topic level).

A channel can, and generally does, have multiple clients connected. Assuming all connected clients
are in a state where they are ready to receive messages, a message will be delivered to a random
client.

For example:

    "clicks" (topic)
       |                             |- client
       |>---- "metrics" (channel) --<|- ...
       |                             |- client
       |
       |                                   |- client
       |>---- "spam_analysis" (channel) --<|- ...
       |                                   |- client
       |
       |                             |- client
       |>---- "archive" (channel) --<|- ...
                                     |- client
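
To make the copy and delivery semantics above concrete, below is a minimal, self-contained Go
sketch (an illustration only, not `nsqd`'s actual implementation): the topic copies each message
into every channel, and each channel hands a given message to exactly one of its connected clients
(round-robin here for simplicity, whereas `nsqd` delivers to a random ready client):

    package main

    import "fmt"

    // Channel buffers its own copy of every message and delivers each one
    // to exactly one of its subscribed clients.
    type Channel struct {
        msgs    chan string
        clients []chan string
    }

    // Topic copies every published message into each of its channels.
    type Topic struct {
        channels []*Channel
    }

    func (t *Topic) Publish(msg string) {
        for _, ch := range t.channels {
            ch.msgs <- msg // each channel gets its own copy
        }
    }

    // deliver hands each buffered message to a single client (round-robin).
    func (c *Channel) deliver() {
        i := 0
        for msg := range c.msgs {
            c.clients[i%len(c.clients)] <- msg
            i++
        }
    }

    func main() {
        newChannel := func(nClients int) *Channel {
            c := &Channel{msgs: make(chan string, 16)}
            for i := 0; i < nClients; i++ {
                c.clients = append(c.clients, make(chan string, 16))
            }
            go c.deliver()
            return c
        }

        metrics := newChannel(2)
        archive := newChannel(1)
        clicks := &Topic{channels: []*Channel{metrics, archive}}

        clicks.Publish("click-event-1")
        fmt.Println("metrics got:", <-metrics.clients[0])
        fmt.Println("archive got:", <-archive.clients[0])
    }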

Configuration is greatly simplified because no additional setup is required to introduce a new
distinct consumer for a given topic, nor is there any need to set up new services to introduce a
new topic.

`NSQ` also includes a helper application, `nsqlookupd`, which provides a directory service where
consumers can look up the addresses of `nsqd` instances that provide the topics they are interested
in subscribing to. In terms of configuration, this decouples the consumers from the producers (they
both individually only need to know where to contact common instances of `nsqlookupd`, never each
other), reducing complexity and maintenance.

At a lower level, each `nsqd` has a long-lived TCP connection to `nsqlookupd` over which it
periodically pushes its state. This data informs which `nsqd` addresses `nsqlookupd` will give to
consumers. For consumers, an HTTP `/lookup` endpoint is exposed for polling.
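
As an illustrative sketch of polling from the consumer side (the address, port, and query parameter
below are assumptions for the example, and the real response schema should be taken from the
`nsqlookupd` documentation, so the JSON is decoded generically here):

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    func main() {
        // Hypothetical nsqlookupd address and query parameter, for illustration only.
        resp, err := http.Get("http://127.0.0.1:4161/lookup?topic=clicks")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // Decode generically since the exact response schema is not specified here.
        var result map[string]interface{}
        if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
            panic(err)
        }
        fmt.Printf("lookup response: %v\n", result)
    }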

NOTE: in future versions, the heuristic `nsqlookupd` uses to return addresses could be based on
depth, number of connected clients, or other "intelligent" strategies. The current implementation
is simply *all*. Ultimately, the goal is to ensure that all producers are being read from such that
depth stays near zero.

### Straightforward Upgrade Path

This was one of our **highest** priorities. Our production systems handle a large volume of
traffic, all built upon our existing messaging tools, so we needed a way to slowly and methodically
upgrade specific parts of our infrastructure with little to no impact.

First, on the message *producer* side we built `nsqd` to match `simplequeue`, i.e. an HTTP `/put`
endpoint to POST binary data (with the one caveat that the endpoint takes an additional query
parameter specifying the "topic"). Services that wanted to switch to `nsqd` only had to start
writing to its address.
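
As an illustration, publishing through that interface could look like the following (the host and
port are assumptions for the example; the `/put` endpoint and `topic` query parameter are as
described above):

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
    )

    func main() {
        // POST the raw message body to nsqd's HTTP interface, naming the topic
        // via a query parameter. The address here is illustrative.
        msg := []byte(`{"action": "click", "user_id": 1234}`)
        resp, err := http.Post("http://127.0.0.1:4151/put?topic=clicks",
            "application/octet-stream", bytes.NewReader(msg))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }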

Second, we built libraries in both Python and Go that matched the functionality and idioms we had
been accustomed to in our existing libraries. This eased the transition on the message *consumer*
side by limiting the code changes to bootstrapping. All business logic remained the same.

Finally, we built utilities to glue old and new components together. These are all available in the
`examples` directory in the repository:

 * `nsq_pubsub` - expose a `pubsub`-like HTTP interface to topics in an `NSQ` cluster
 * `nsq_to_file` - durably write all messages for a given topic to a file
 * `nsq_to_http` - perform HTTP requests for all messages in a topic to (multiple) endpoints

### Eliminating SPOFs

`NSQ` is designed to be used in a distributed fashion. `nsqd` clients are connected (over TCP) to
**all** instances providing the specified topic. There are no middle-men, no brokers, and no SPOFs:

    NSQ     NSQ     NSQ
     |\     /\     /|
     | \   /  \   / |
     |  \ /    \ /  |
     |   X      X   |
     |  / \    / \  |
     | /   \  /   \ |
     C      C      C    (consumers)

This topology eliminates the need to chain single, aggregated feeds. Instead you consume directly
from **all** producers. *Technically*, it doesn't matter which client connects to which `NSQ`; as
long as there are enough clients connected to all producers to satisfy the volume of messages,
you're guaranteed that all will eventually be processed.

For `nsqlookupd`, high availability is achieved by running multiple instances. They don't
communicate directly with each other and data is considered to be eventually consistent. Consumers
poll *all* of the `nsqlookupd` instances they are configured with, resulting in a union of all the
responses. Stale, inaccessible, or otherwise faulty nodes don't grind the system to a halt.

### Message Delivery Guarantees

`NSQ` guarantees that a message will be delivered **at least once**. Duplicate messages are a
possibility and downstream systems should be designed to perform idempotent operations.
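
For example, a consumer might guard its side effects by tracking message IDs it has already
handled; a minimal sketch of that idea (the message ID used here is an assumption for
illustration):

    package main

    import "fmt"

    // seen records message IDs that have already been processed. In a real
    // consumer this would live in shared, expiring storage rather than a map.
    var seen = make(map[string]bool)

    // handle performs the side effect at most once per message ID, so a
    // redelivered duplicate is harmless.
    func handle(id, body string) {
        if seen[id] {
            fmt.Println("duplicate, skipping:", id)
            return
        }
        seen[id] = true
        fmt.Println("processing:", body)
    }

    func main() {
        handle("msg-1", "charge customer 42")
        handle("msg-1", "charge customer 42") // at-least-once redelivery
    }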

The at-least-once guarantee is accomplished by performing a handshake with the client, as follows
(assume the client has successfully connected and subscribed to a topic):

 1. the client sends a RDY command indicating the number of messages it is willing to accept
 2. `NSQ` sends a message and stores it in a temporary internal location (decrementing the RDY
    count for that client)
 3. the client replies FIN or REQ, indicating success or failure (requeue) respectively; if the
    client does not reply, `NSQ` will time out after a configurable duration and automatically REQ
    the message

This ensures that the only edge case that would result in message loss is an unclean shutdown of a
`nsqd` process (and only for the messages that were in memory).

### Bounded Memory Footprint

`nsqd` provides a configuration option (`--mem-queue-size`) that determines the number of messages
kept in memory for a given queue (for both topics *and* channels). If the depth of a queue exceeds
this threshold, messages will be written to disk. This bounds the memory footprint of a given
`nsqd` process to:

    mem-queue-size * #_of_channels_and_topics
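
As a purely hypothetical worked example (these numbers are not from this document): with
`--mem-queue-size=10000`, 2 topics, and 3 channels per topic, there are 2 + 6 = 8 queues, so at
most

    10000 * 8 = 80,000 messages held in memory

before the overflow is written to disk.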

Also, an astute observer might have identified that this is a convenient way to gain an even higher
guarantee of delivery by setting this value to something low (like 1 or even 0). The disk-backed
queue is designed to survive unclean restarts (although messages might be delivered twice).

Also, related to message delivery guarantees, *clean* shutdowns (by sending a `nsqd` process the
TERM signal) safely persist the messages currently in memory, in-flight, deferred, and in various
internal buffers.

### Efficiency

`NSQ` was designed to communicate over a `memcached`-like command protocol with simple
size-prefixed responses. Compared to the previous toolchain using HTTP, this is significantly more
efficient per-message.
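
As a sketch of the general size-prefixed pattern (illustrative framing only, not necessarily the
exact `NSQ` wire format): the reader pulls a fixed-width length and then exactly that many bytes of
payload, so no per-message parsing of delimiters or headers is needed:

    package main

    import (
        "bufio"
        "bytes"
        "encoding/binary"
        "fmt"
        "io"
    )

    // readFrame reads one size-prefixed response: a 4-byte big-endian length
    // followed by that many bytes of data.
    func readFrame(r *bufio.Reader) ([]byte, error) {
        var size int32
        if err := binary.Read(r, binary.BigEndian, &size); err != nil {
            return nil, err
        }
        data := make([]byte, size)
        if _, err := io.ReadFull(r, data); err != nil {
            return nil, err
        }
        return data, nil
    }

    func main() {
        // Simulate a peer that wrote a single frame: length prefix + payload.
        var buf bytes.Buffer
        payload := []byte("OK")
        binary.Write(&buf, binary.BigEndian, int32(len(payload)))
        buf.Write(payload)

        frame, err := readFrame(bufio.NewReader(&buf))
        if err != nil {
            panic(err)
        }
        fmt.Printf("frame: %q\n", frame)
    }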

Also, by eliminating the daisy-chained stream copies, configuration, setup, and development time
are greatly reduced (especially in cases where there are >1 consumers of a topic). All the
duplication happens at the source.

Finally, because `simplequeue` has no high-level functionality built in, the client is responsible
for maintaining message state (it does so by embedding this information and sending the message
back and forth for retries). In `NSQ`, all of this information is kept in the core.

## EOL

We've been using `NSQ` in production for several months. It has already improved the robustness,
development time, and simplicity of the systems that have moved over to it.

[1]: https://github.com/bitly/simplehttp