Commit 6d905c7

proposal: secondary indexes
1 parent dc0384d commit 6d905c7

2 files changed, +111 -0 lines changed

website/docs/spec/index.md

Lines changed: 97 additions & 0 deletions
@@ -60,9 +60,11 @@ The following records are allowed to appear in the data section:
 - [Schema](#schema-op0x03)
 - [Channel](#channel-op0x04)
 - [Message](#message-op0x05)
+- [Secondary Index Key](#secondary-index-key-op0x10)
 - [Attachment](#attachment-op0x09)
 - [Chunk](#chunk-op0x06)
 - [Message Index](#message-index-op0x07)
+- [Secondary Message Index](#secondary-message-index-op0x11)
 - [Metadata](#metadata-op0x0C)
 - [Data End](#data-end-op0x0F)

@@ -82,7 +84,9 @@ The following records are allowed to appear in the summary section:

 - [Schema](#schema-op0x03)
 - [Channel](#channel-op0x04)
+- [Secondary Index Key](#secondary-index-key-op0x10)
 - [Chunk Index](#chunk-index-op0x08)
+- [Secondary Chunk Index](#secondary-chunk-index-op0x12)
 - [Attachment Index](#attachment-index-op0x0A)
 - [Metadata Index](#metadata-index-op0x0D)
 - [Statistics](#statistics-op0x0B)
@@ -179,6 +183,30 @@ The message encoding and schema must match that of the Channel record correspond
 | 8 | publish_time | Timestamp | Time at which the message was published. If not available, must be set to the log time. |
 | N | data | Bytes | Message data, to be decoded according to the schema of the channel. |

+### Secondary Index Key (op=0x10)
+
+A Secondary Index Key record defines a secondary timestamp index that will be used in this file.
+Secondary indexes can be used to quickly look up messages by timestamps other than `log_time`.
+The `name` field identifies the timestamp key that messages will be indexed by. The [registry](./registry.md#secondary-index-keys) lists well-known secondary index key names.
+
+A Secondary Index Key record must appear before any [Secondary Message Index](#secondary-message-index-op0x11) records
+in the data section with this `secondary_index_id`.
+
+Secondary Index Key records in the Data section must also appear in the Summary section, before
+any [Secondary Chunk Index](#secondary-chunk-index-op0x12) records with this `secondary_index_id`.
+
+| Bytes | Name | Type | Description |
+| ----- | ------------------ | ------ | ----------------------------------------------------------------- |
+| 2 | secondary_index_id | uint16 | A unique identifier for this secondary index within the file. |
+| 4 + N | name | string | A name that describes the key, e.g. `publish_time` or `header.stamp`. |
+
+> Why do Secondary Index Key records appear in the Data section?
+> When reading using an index, the Secondary Index Key is read out of the Summary section
+> before reading into the Data section. This means that the Secondary Index Key in the Data section
+> is not normally used. However, if an MCAP file is truncated and the Summary section is lost, having the
+> Secondary Index Key appear before any Secondary Message Index records allows the MCAP to be fully
+> recovered.
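
As an illustration of the Secondary Index Key layout, here is a minimal Python sketch that serializes and parses one such record. It assumes the serialization conventions used elsewhere in the MCAP format (little-endian integers, strings prefixed with a uint32 byte length, and records framed by a 1-byte opcode plus a uint64 content length). The function names are hypothetical and not part of any MCAP library.

```python
import struct

OP_SECONDARY_INDEX_KEY = 0x10  # opcode proposed in this commit


def encode_secondary_index_key(secondary_index_id, name):
    """Serialize one Secondary Index Key record, including the record framing."""
    name_bytes = name.encode("utf-8")
    # Content: uint16 id, then a uint32 length-prefixed UTF-8 string.
    content = struct.pack("<HI", secondary_index_id, len(name_bytes)) + name_bytes
    # Framing (assumed): 1-byte opcode followed by a uint64 content length.
    return struct.pack("<BQ", OP_SECONDARY_INDEX_KEY, len(content)) + content


def decode_secondary_index_key(record):
    """Parse a framed Secondary Index Key record into (secondary_index_id, name)."""
    opcode, length = struct.unpack_from("<BQ", record, 0)
    if opcode != OP_SECONDARY_INDEX_KEY:
        raise ValueError(f"unexpected opcode 0x{opcode:02x}")
    content = record[9:9 + length]
    if len(content) != length:
        raise ValueError("truncated record")
    secondary_index_id, name_len = struct.unpack_from("<HI", content, 0)
    name = content[6:6 + name_len].decode("utf-8")
    return secondary_index_id, name
```

A recovery tool scanning a truncated file could, under these assumptions, call `decode_secondary_index_key` on each 0x10 record it meets in the Data section to rebuild the mapping from `secondary_index_id` to key name.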
+
 ### Chunk (op=0x06)

 A Chunk contains a batch of Schema, Channel, and Message records. The batch of records contained in a chunk may be compressed or uncompressed.
@@ -207,6 +235,17 @@ A sequence of Message Index records occurs immediately after each chunk. Exactly

 Messages outside of chunks cannot be indexed.

+### Secondary Message Index (op=0x11)
+
+A Secondary Message Index record allows readers to locate individual message records within a chunk using a
+key defined in a [Secondary Index Key record](#secondary-index-key-op0x10).
+
+| Bytes | Name | Type | Description |
+| ----- | ------------------ | --------------------------------- | -------------------------------------------------------------------------------------------------------------- |
+| 2 | channel_id | uint16 | Channel ID. |
+| 2 | secondary_index_id | uint16 | Secondary Index ID. |
+| 4 + N | records | `Array<Tuple<Timestamp, uint64>>` | Array of timestamp and offset for each record. Offset is relative to the start of the uncompressed chunk data. |
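
To make the `records` field concrete, the following Python sketch decodes the content of a Secondary Message Index record and finds the message offsets whose key falls within a time range. It assumes little-endian encoding, a uint32 byte-length prefix on the array, and 8-byte `Timestamp` values, as used elsewhere in the MCAP format. The proposal does not say whether entries are sorted by key, so the sketch sorts them before searching. All function names are hypothetical.

```python
import bisect
import struct


def decode_secondary_message_index(content):
    """Parse record content into (channel_id, secondary_index_id, records)."""
    channel_id, secondary_index_id, records_len = struct.unpack_from("<HHI", content, 0)
    if records_len % 16 != 0:
        raise ValueError("records byte length is not a multiple of 16")
    # Each entry is a (timestamp, offset) pair of two uint64 values.
    records = [
        struct.unpack_from("<QQ", content, 8 + pos)
        for pos in range(0, records_len, 16)
    ]
    return channel_id, secondary_index_id, records


def offsets_in_range(records, start, end):
    """Return chunk-relative offsets of messages whose key lies in [start, end]."""
    ordered = sorted(records)  # assumption: entries may arrive unsorted
    keys = [key for key, _ in ordered]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return [offset for _, offset in ordered[lo:hi]]
```

The returned offsets are relative to the start of the uncompressed chunk data, so a reader would decompress the chunk first and then seek to each offset.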
+
 ### Chunk Index (op=0x08)

 A Chunk Index record contains the location of a Chunk record and its associated Message Index records.
@@ -229,6 +268,18 @@ A Schema and Channel record MUST exist in the summary section for all channels r

 > Why? The typical use case for file readers using an index is fast random access to a specific message timestamp. Channel is a prerequisite for decoding Message record data. Without an easy-to-access copy of the Channel records, readers would need to search for Channel records from the start of the file, degrading random access read performance.

+### Secondary Chunk Index (op=0x12)
+
+A Secondary Chunk Index record contains additional secondary index information on top of the corresponding Chunk Index record.
+
+| Bytes | Name | Type | Description |
+| ----- | --------------------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 2 | secondary_index_id | uint16 | Secondary Index ID. |
+| 8 | chunk_start_offset | uint64 | Offset to the chunk record from the start of the file. |
+| 8 | earliest_key | Timestamp | Earliest key in the chunk. Zero if the chunk contains no messages with this key. |
+| 8 | latest_key | Timestamp | Latest key in the chunk. Zero if the chunk contains no messages with this key. |
+| 4 + N | message_index_offsets | `Map<uint16, uint64>` | Mapping from channel ID to the offset of the Secondary Message Index record with this `secondary_index_id` for that channel after the chunk, from the start of the file. An empty map indicates no message indexing is available. |
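
The sketch below shows one way a reader might use Secondary Chunk Index records to pick the chunks worth opening for a key range. It is written in Python with a hypothetical `SecondaryChunkIndex` dataclass that mirrors the table fields. It assumes that `earliest_key` and `latest_key` both being zero means the chunk holds no messages with this key, which cannot be distinguished from a chunk whose only key is exactly zero.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SecondaryChunkIndex:
    """In-memory mirror of the Secondary Chunk Index fields (hypothetical type)."""
    secondary_index_id: int
    chunk_start_offset: int
    earliest_key: int
    latest_key: int
    message_index_offsets: Dict[int, int]


def chunks_overlapping(indexes: List[SecondaryChunkIndex],
                       secondary_index_id: int,
                       start: int,
                       end: int) -> List[int]:
    """Return file offsets of chunks whose key range intersects [start, end]."""
    offsets = []
    for idx in indexes:
        if idx.secondary_index_id != secondary_index_id:
            continue
        if idx.earliest_key == 0 and idx.latest_key == 0:
            # Assumed to mean "no messages with this key" in the chunk.
            continue
        if idx.earliest_key <= end and idx.latest_key >= start:
            offsets.append(idx.chunk_start_offset)
    return sorted(offsets)
```

For each selected chunk, `message_index_offsets` then points at the per-channel Secondary Message Index records that narrow the search down to individual messages.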
+
 ### Attachment (op=0x09)

 Attachment records contain auxiliary artifacts such as text, core dumps, calibration data, or other arbitrary data.
@@ -522,6 +573,52 @@ A writer may choose to put messages in Chunks to compress record data. This MCAP
 [Footer]
 ```

+### Multiple Messages with a Secondary Index
+
+```
+[Header]
+[Secondary Index Key 1]
+[Chunk A]
+  [Schema A]
+  [Channel 1 (A)]
+  [Channel 2 (B)]
+  [Message on 1]
+  [Message on 1]
+  [Message on 2]
+[Message Index 1]
+[Message Index 2]
+[Secondary Message Index 1 (Channel 1)]
+[Secondary Message Index 1 (Channel 2)]
+[Attachment 1]
+[Chunk B]
+  [Schema B]
+  [Channel 3 (B)]
+  [Message on 3]
+  [Message on 1]
+[Message Index 3]
+[Message Index 1]
+[Secondary Message Index 1 (Channel 3)]
+[Secondary Message Index 1 (Channel 1)]
+[Data End]
+[Schema A]
+[Schema B]
+[Channel 1]
+[Channel 2]
+[Channel 3]
+[Secondary Index Key 1]
+[Chunk Index A]
+[Chunk Index B]
+[Secondary Chunk Index 1 (Chunk A)]
+[Secondary Chunk Index 1 (Chunk B)]
+[Attachment Index 1]
+[Statistics]
+[Summary Offset 0x01]
+[Summary Offset 0x05]
+[Summary Offset 0x07]
+[Summary Offset 0x08]
+[Footer]
+```
+
 ## Further Reading

 - [Feature explanations][feature_explanations]: includes usage details that may be useful to implementers of readers or writers.

website/docs/spec/registry.md

Lines changed: 14 additions & 0 deletions
@@ -152,3 +152,17 @@ The `ros2` profile describes how to create MCAP files for [ROS 2](https://docs.r
 #### Schema

 - `encoding`: MUST be either `ros2msg` or `ros2idl`
+
+## Secondary index keys
+
+The following well-known values are defined for the Secondary Index Key `name` field:
+
+### `header.stamp`
+
+Indexes the `stamp` value of the `std_msgs/msg/Header`-valued `header` field of the deserialized message data.
+
+- `profile`: MUST be `ros1` or `ros2`
+
+### `publish_time`
+
+Indexes the `publish_time` value of Message records.
