Commit 0ca9916

committed
proposal: secondary indexes
1 parent dc0384d commit 0ca9916

2 files changed: +111 -0 lines changed


website/docs/spec/index.md

Lines changed: 97 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -60,9 +60,11 @@ The following records are allowed to appear in the data section:
6060
- [Schema](#schema-op0x03)
6161
- [Channel](#channel-op0x04)
6262
- [Message](#message-op0x05)
63+
- [Secondary Index Key](#secondary-index-key-op0x10)
6364
- [Attachment](#attachment-op0x09)
6465
- [Chunk](#chunk-op0x06)
6566
- [Message Index](#message-index-op0x07)
67+
- [Secondary Message Index](#secondary-message-index-op0x11)
6668
- [Metadata](#metadata-op0x0C)
6769
- [Data End](#data-end-op0x0F)
6870

@@ -82,7 +84,9 @@ The following records are allowed to appear in the summary section:
8284

8385
- [Schema](#schema-op0x03)
8486
- [Channel](#channel-op0x04)
87+
- [Secondary Index Key](#secondary-index-key-op0x10)
8588
- [Chunk Index](#chunk-index-op0x08)
89+
- [Secondary Chunk Index](#secondary-chunk-index-op0x12)
8690
- [Attachment Index](#attachment-index-op0x0A)
8791
- [Metadata Index](#metadata-index-op0x0D)
8892
- [Statistics](#statistics-op0x0B)
@@ -179,6 +183,30 @@ The message encoding and schema must match that of the Channel record correspond
179183
| 8 | publish_time | Timestamp | Time at which the message was published. If not available, must be set to the log time. |
180184
| N | data | Bytes | Message data, to be decoded according to the schema of the channel. |
181185

186+
### Secondary Index Key (op=0x10)
187+
188+
A Secondary Index Key record defines a secondary timestamp index that will be used in this file.
189+
Secondary indexes let readers quickly look up messages by timestamps other than `log_time`.
190+
The `name` field identifies the timestamp key that messages will be indexed by. The [registry](./registry.md#secondary-index-keys) lists well-known secondary index key names.
191+
192+
A Secondary Index Key record must appear before any [Secondary Message Index](#secondary-message-index-op0x11) records
193+
in the data section with this `secondary_index_id`.
194+
195+
Secondary Index Key records in the Data section must also appear in the Summary section, before
196+
any [Secondary Chunk Index](#secondary-chunk-index-op0x12) records with this `secondary_index_id`.
197+
198+
| Bytes | Name | Type | Description |
199+
| ----- | ------------------ | ------ | ----------------------------------------------------------------- |
200+
| 2 | secondary_index_id | uint16 | A unique identifier for this secondary index within the file. |
201+
| 4 + N | name | string | A name that describes the key, e.g. `publish_time`, `header.stamp`. |
202+
203+
> Why do Secondary Index Key records appear in the Data section?
204+
> When reading using an index, the Secondary Index Key would be read out of the Summary section
205+
> before reading into the Data section. This means that the Secondary Index Key in the Data section
206+
> is not normally used. However, if an MCAP is truncated and the summary section is lost, having the
207+
> Secondary Index Key appear before any Secondary Message Index records allows the MCAP to be fully
208+
> recovered.
209+
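The field table above can be turned into bytes roughly as follows. This is a minimal sketch, not a reference implementation: it assumes the serialization conventions used by the rest of the MCAP spec (little-endian integers, a `uint32` byte-length prefix on strings, and a one-byte opcode followed by a `uint64` content length in front of every record). The function names are hypothetical.

```python
import struct
from typing import Tuple

OP_SECONDARY_INDEX_KEY = 0x10  # opcode proposed in this commit


def encode_secondary_index_key(secondary_index_id: int, name: str) -> bytes:
    """Serialize a whole record: opcode, uint64 content length, content."""
    name_bytes = name.encode("utf-8")
    # Content: uint16 id, then a uint32 length-prefixed UTF-8 string.
    content = struct.pack("<HI", secondary_index_id, len(name_bytes)) + name_bytes
    return struct.pack("<BQ", OP_SECONDARY_INDEX_KEY, len(content)) + content


def decode_secondary_index_key(record: bytes) -> Tuple[int, str]:
    """Parse a record produced by encode_secondary_index_key."""
    opcode, content_len = struct.unpack_from("<BQ", record, 0)
    if opcode != OP_SECONDARY_INDEX_KEY:
        raise ValueError(f"unexpected opcode 0x{opcode:02x}")
    content = record[9 : 9 + content_len]
    if len(content) != content_len:
        raise ValueError("record is truncated")
    secondary_index_id, name_len = struct.unpack_from("<HI", content, 0)
    name = content[6 : 6 + name_len].decode("utf-8")
    return secondary_index_id, name
```

A writer would emit the same content bytes twice, once in the data section and once in the summary section, so that both copies decode to the same `secondary_index_id` and `name`.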
182210
### Chunk (op=0x06)
183211

184212
A Chunk contains a batch of Schema, Channel, and Message records. The batch of records contained in a chunk may be compressed or uncompressed.
@@ -207,6 +235,17 @@ A sequence of Message Index records occurs immediately after each chunk. Exactly
207235

208236
Messages outside of chunks cannot be indexed.
209237

238+
### Secondary Message Index (op=0x11)
239+
240+
A Secondary Message Index record allows readers to locate individual message records within a chunk using a
241+
key defined in a [Secondary Index Key record](#secondary-index-key-op0x10).
242+
243+
| Bytes | Name | Type | Description |
244+
| ----- | ------------------ | --------------------------------- | -------------------------------------------------------------------------------------------------------------- |
245+
| 2 | channel_id | uint16 | Channel ID. |
246+
| 2 | secondary_index_id | uint16 | Secondary Index ID. |
247+
| 4 + N | records | `Array<Tuple<Timestamp, uint64>>` | Array of timestamp and offset for each record. Offset is relative to the start of the uncompressed chunk data. |
248+
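A reader might use this record as sketched below. The sketch assumes the MCAP spec's usual serialization (little-endian integers, arrays prefixed by their length in bytes as a `uint32`, and `Timestamp` stored as a `uint64`). The proposal does not say whether `records` is sorted by key, so the lookup sorts its hits itself. Both function names are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_secondary_message_index(
    content: bytes,
) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Parse record content into (channel_id, secondary_index_id, records)."""
    channel_id, secondary_index_id, array_len = struct.unpack_from("<HHI", content, 0)
    # array_len counts bytes; each entry is a (uint64 key, uint64 offset) pair.
    if array_len % 16 != 0 or 8 + array_len > len(content):
        raise ValueError("malformed records array")
    records = [
        struct.unpack_from("<QQ", content, 8 + i) for i in range(0, array_len, 16)
    ]
    return channel_id, secondary_index_id, records


def offsets_in_range(
    records: List[Tuple[int, int]], start: int, end: int
) -> List[int]:
    """Offsets of entries with start <= key <= end, ordered by key."""
    hits = sorted((key, offset) for key, offset in records if start <= key <= end)
    return [offset for _, offset in hits]
```

Each returned offset is relative to the start of the uncompressed chunk data, so the chunk has to be decompressed before the offsets can be used to seek to message records.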
210249
### Chunk Index (op=0x08)
211250

212251
A Chunk Index record contains the location of a Chunk record and its associated Message Index records.
@@ -229,6 +268,18 @@ A Schema and Channel record MUST exist in the summary section for all channels r
229268

230269
> Why? The typical use case for file readers using an index is fast random access to a specific message timestamp. Channel is a prerequisite for decoding Message record data. Without an easy-to-access copy of the Channel records, readers would need to search for Channel records from the start of the file, degrading random access read performance.
231270
271+
### Secondary Chunk Index (op=0x12)
272+
273+
A Secondary Chunk Index record contains secondary index information that supplements the corresponding Chunk Index record.
274+
275+
| Bytes | Name | Type | Description |
276+
| ----- | --------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
277+
| 2 | secondary_index_id | uint16 | Secondary Index ID. |
278+
| 8 | chunk_start_offset | uint64 | Offset to the chunk record from the start of the file. |
279+
| 8 | earliest_key | Timestamp | Earliest key in the chunk. Zero if the chunk contains no messages with this key. |
280+
| 8 | latest_key | Timestamp | Latest key in the chunk. Zero if the chunk contains no messages with this key. |
281+
| 4 + N | message_index_offsets | `Map<uint16, uint64>` | Mapping from channel ID to the offset of the message index record for that channel after the chunk, from the start of the file. An empty map indicates no message indexing is available. |
282+
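One way a reader could use `earliest_key` and `latest_key` is to skip chunks whose key range cannot overlap a query. The sketch below assumes the records have already been parsed into plain objects; the `SecondaryChunkIndex` dataclass and `chunks_for_key_range` are hypothetical names, not part of any MCAP library. It does not special-case the zero values: a chunk reported as `(0, 0)` is only selected when the query range includes key 0, which at worst costs one unnecessary chunk read.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class SecondaryChunkIndex:
    secondary_index_id: int
    chunk_start_offset: int
    earliest_key: int
    latest_key: int
    message_index_offsets: Dict[int, int]


def chunks_for_key_range(
    indexes: Iterable[SecondaryChunkIndex],
    secondary_index_id: int,
    start: int,
    end: int,
) -> List[int]:
    """File offsets of chunks whose key range overlaps [start, end]."""
    selected = []
    for index in indexes:
        if index.secondary_index_id != secondary_index_id:
            continue
        # Two closed intervals overlap when each starts before the other ends.
        if index.earliest_key <= end and index.latest_key >= start:
            selected.append(index.chunk_start_offset)
    return sorted(selected)
```

The returned offsets point at Chunk records, and `message_index_offsets` for the same entry then locates the per-channel index records that follow each selected chunk.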
232283
### Attachment (op=0x09)
233284

234285
Attachment records contain auxiliary artifacts such as text, core dumps, calibration data, or other arbitrary data.
@@ -522,6 +573,52 @@ A writer may choose to put messages in Chunks to compress record data. This MCAP
522573
[Footer]
523574
```
524575

576+
### Multiple Messages with a Secondary Index
577+
578+
```
579+
[Header]
580+
[Secondary Index Key 1]
581+
[Chunk A]
582+
[Schema A]
583+
[Channel 1 (A)]
584+
[Channel 2 (B)]
585+
[Message on 1]
586+
[Message on 1]
587+
[Message on 2]
588+
[Message Index 1]
589+
[Message Index 2]
590+
[Secondary Message Index 1 (Channel 1)]
591+
[Secondary Message Index 1 (Channel 2)]
592+
[Attachment 1]
593+
[Chunk B]
594+
[Schema B]
595+
[Channel 3 (B)]
596+
[Message on 3]
597+
[Message on 1]
598+
[Message Index 3]
599+
[Message Index 1]
600+
[Secondary Message Index 1 (Channel 3)]
601+
[Secondary Message Index 1 (Channel 1)]
602+
[Data End]
603+
[Schema A]
604+
[Schema B]
605+
[Channel 1]
606+
[Channel 2]
607+
[Channel 3]
608+
[Secondary Index Key 1]
609+
[Chunk Index A]
610+
[Chunk Index B]
611+
[Secondary Chunk Index 1 (Chunk A)]
612+
[Secondary Chunk Index 1 (Chunk B)]
613+
[Attachment Index 1]
614+
[Statistics]
615+
[Summary Offset 0x01]
616+
[Summary Offset 0x05]
617+
[Summary Offset 0x07]
618+
[Summary Offset 0x08]
619+
[Footer]
620+
```
621+
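The ordering rule illustrated by this layout (a Secondary Index Key precedes every Secondary Message Index that refers to it) can be checked with a single pass over the data section. This is a sketch only: it assumes the records have already been reduced to `(opcode, secondary_index_id)` pairs, with `None` for records that carry no such id, and `first_undeclared_index` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Set, Tuple

OP_SECONDARY_INDEX_KEY = 0x10
OP_SECONDARY_MESSAGE_INDEX = 0x11


def first_undeclared_index(
    records: Iterable[Tuple[int, Optional[int]]],
) -> Optional[int]:
    """Position of the first Secondary Message Index whose key was not yet seen.

    Returns None when every Secondary Message Index follows its key record.
    """
    declared: Set[Optional[int]] = set()
    for position, (opcode, secondary_index_id) in enumerate(records):
        if opcode == OP_SECONDARY_INDEX_KEY:
            declared.add(secondary_index_id)
        elif opcode == OP_SECONDARY_MESSAGE_INDEX:
            if secondary_index_id not in declared:
                return position
    return None
```

A recovery tool scanning a truncated file could run the same check incrementally and stop trusting secondary index data at the first position it reports.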
525622
## Further Reading
526623

527624
- [Feature explanations][feature_explanations]: includes usage details that may be useful to implementers of readers or writers.

website/docs/spec/registry.md

Lines changed: 14 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -152,3 +152,17 @@ The `ros2` profile describes how to create MCAP files for [ROS 2](https://docs.r
152152
#### Schema
153153

154154
- `encoding`: MUST be either `ros2msg` or `ros2idl`
155+
156+
## Secondary index keys
157+
158+
The `name` field of a Secondary Index Key record may take one of the following well-known values:
159+
160+
### `header.stamp`
161+
162+
Indexes the `stamp` value of the `std_msgs/msg/Header`-valued `header` field of the deserialized message data.
163+
164+
- `profile`: MUST be either `ros1` or `ros2`
165+
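The proposal does not spell out how a `stamp` becomes an index key. A plausible reading, assumed here, is that it is converted to nanoseconds since the epoch like other MCAP `Timestamp` values. ROS 2 stamps carry `sec` and `nanosec` fields and ROS 1 stamps carry `secs` and `nsecs`; the helper name below is hypothetical.

```python
NANOS_PER_SECOND = 1_000_000_000


def stamp_to_nanoseconds(sec: int, nanosec: int) -> int:
    """Combine a ROS header stamp into one uint64-style nanosecond key.

    Assumption: negative seconds are rejected because the key is unsigned.
    """
    if sec < 0:
        raise ValueError("stamp seconds must be non-negative")
    if not 0 <= nanosec < NANOS_PER_SECOND:
        raise ValueError("stamp nanoseconds out of range")
    return sec * NANOS_PER_SECOND + nanosec
```

A writer would call this once per message after deserializing its `header` field, then store the result as the key in the Secondary Message Index entry for that message.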
166+
### `publish_time`
167+
168+
Indexes the `publish_time` value of Message records.
