forked from apache/mesos
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
6540 lines (5959 loc) · 405 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Release Notes - Mesos - Version 1.6.0
-------------------------------------------
This release contains the following new features:
* [MESOS-4965] - **Experimental** Persistent volumes can be resized
through new offer operations and V1 operator API now.
* [MESOS-6575] - Added a new `--xfs_kill_containers` flag to the
Mesos agent. This causes the `disk/xfs` isolator to terminate
containers that exceed their disk quota.
* [MESOS-7944] - **Experimental** Added a new `MemoryProfiler` class to
libprocess to aid in debugging memory issues.
* [MESOS-8054] - Schedulers can now receive feedback about offer operations
which operate on resources managed by resource providers. In the future,
this feature will be extended to operations on agent default resources.
* [MESOS-8534] - **Experimental** A nested container is now allowed
to join a separate CNI network than its parent container.
* [MESOS-8572] - Improvements to the Docker containerizer and executor
to more gracefully handle situations in which the Docker CLI is
unresponsive.
* [MESOS-8607] - The `mesos-execute` tool has been ported to Windows.
* [MESOS-8649] - **Experimental** Support for Container Storage Interface
(CSI) version 0.2 in Mesos.
* [MESOS-8659] - The Windows build now links the C runtime library
dynamically instead of statically. This requires the Visual Studio
redistributable to be available at runtime.
* [MESOS-8682] - The use of the C runtime library's POSIX wrappers on
Windows has been deprecated in favor of the native Windows APIs.
* [MESOS-8725] - Added a new `max_completion_time` field to `TaskInfo`.
Tasks which do not complete at the end of the specified duration will
fail with a new reason `REASON_MAX_COMPLETION_TIME_REACHED`.
* [MESOS-8801] - **Experimental** On Linux, Mesos can now be
configured to use the jemalloc allocator by default via the
`--enable-jemalloc-allocator` configuration option.
* Agents now support the `--fetcher_stall_timeout` flag which allows container
image and artifact fetchers to abort after the timeout when downloads stall.
Deprecations/Removals:
* Support for CSI v0.1 is deprecated in favor of CSI v0.2.
Additional API Changes:
* [MESOS-8306] - Authorization of resource reservation has been updated
to allow the restriction of which agents can statically reserve
resources for which roles.
* [MESOS-8332] - Container sandbox permissions have been changed from
0755 to 0750.
* [MESOS-8388] - Local resource provider resources are now included in
the responses to the GET_AGENTS and GET_RESOURCE_PROVIDER calls.
* [MESOS-8534] - Nested containers within a task group can now specify
separate network namespaces.
Changes to Dependencies:
* Upgraded minimum required gRPC library to version 1.10+ for gRPC-enabled builds.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode()
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration
* [MESOS-3533] - Unable to find and run URIs files
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formated as strings
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7966] - check for maintenance on agent causes fatal error
* [MESOS-7991] - fatal, check failed !framework->recovered()
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8522] - `prepareMounts` in Mesos containerizer is flaky.
* [MESOS-8623] - Crashed framework brings down the whole Mesos cluster
* [MESOS-8679] - If the first KILL stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8703] - Mesos master can`t reconnect to zookeeper
* [MESOS-8731] - mesos master APIs become latent
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
Feature Graduations:
* [MESOS-4828] - XFS disk quota isolator.
* [MESOS-6906] - Introduce a general non-interpreting task check.
All Experimental Features:
* [MESOS-3094] - Mesos on Windows.
* [MESOS-3421] - Support sharing of resources across task instances.
* [MESOS-4312] - Porting Mesos on Power (ppc64le).
* [MESOS-4355] - Implement isolator for Docker volume.
* [MESOS-4965] - Persistent volume resizing.
* [MESOS-5344] - Partition-aware Mesos frameworks.
* [MESOS-5788] - Added JAVA API adapter for seamless transition to new scheduler API.
* [MESOS-5931] - Support auto backend in Mesos Containerizer.
* [MESOS-6014] - Added port mapping CNI plugin.
* [MESOS-7944] - Libprocess `MemoryProfiler`.
* [MESOS-8534] - Separate CNI networks for nested containers.
* [MESOS-8649] - Support for Container Storage Interface version 0.2.
* [MESOS-8801] - Linux support for jemalloc.
All Resolved Issues:
** Bug
* [MESOS-1720] - Slave should send exited executor message when the executor is never launched.
* [MESOS-3915] - Upgrade vendored Boost
* [MESOS-4420] - Support read host physical link speed from virtio driver
* [MESOS-5333] - GET /master/maintenance/schedule/ produces 404.
* [MESOS-5820] - Port master to Windows
* [MESOS-5882] - `os::cloexec` does not exist on Windows
* [MESOS-5940] - `setPaths` doesn’t work on Windows
* [MESOS-6555] - Namespace 'mnt' is not supported
* [MESOS-6713] - Port `slave_recovery_tests.cpp`
* [MESOS-6715] - Port `uri_fetcher_tests.cpp`
* [MESOS-6822] - CNI reports confusing error message for failed interface setup.
* [MESOS-6973] - Fix BOOST random generator initialization on Windows
* [MESOS-7028] - NetSocketTest.EOFBeforeRecv is flaky.
* [MESOS-7342] - Port Docker tests
* [MESOS-7604] - SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows
* [MESOS-7699] - "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)
* [MESOS-7742] - Race conditions in IOSwitchboard: listening on unix socket and premature closing of the connection.
* [MESOS-7803] - fs::list drops path components on Windows
* [MESOS-7944] - Implement jemalloc memory profiling support for Mesos
* [MESOS-7979] - reviewboard's GUESS_FIELDS setting leads to redundant information in commit messages
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused
* [MESOS-8140] - Executors should clear their auth tokens
* [MESOS-8232] - SlaveTest.RegisteredAgentReregisterAfterFailover is flaky.
* [MESOS-8258] - Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.
* [MESOS-8305] - DefaultExecutorTest.ROOT_MultiTaskgroupSharePidNamespace is flaky.
* [MESOS-8308] - CommandExecutorCheckTest.CommandCheckTimeout is flaky on Windows
* [MESOS-8334] - PartitionedSlaveReregistrationMasterFailover is flaky.
* [MESOS-8336] - MasterTest.RegistryUpdateAfterReconfiguration is flaky
* [MESOS-8348] - Enable function sections in the build.
* [MESOS-8350] - Resource provider-capable agents not correctly synchronizing checkpointed agent resources on reregistration
* [MESOS-8404] - Improve image puller error messages.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8413] - Zookeeper configuration passwords are shown in clear text
* [MESOS-8416] - CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
* [MESOS-8440] - `network/ports` isolator kills legitimate tasks on recovery.
* [MESOS-8444] - GC failure causes agent miss to detach virtual paths for the executor's sandbox
* [MESOS-8446] - Agent miss to detach `virtualLatestPath` for the executor's sandbox during recovery
* [MESOS-8447] - Incomplete output of apply-reviews.py --dry-run
* [MESOS-8453] - ExecutorAuthorizationTest.RunTaskGroup segfaults.
* [MESOS-8463] - Test MasterAllocatorTest/1.SingleFramework is flaky
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
* [MESOS-8474] - Test StorageLocalResourceProviderTest.ROOT_ConvertPreExistingVolume is flaky
* [MESOS-8477] - Make clean fails without Python artifacts.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8482] - Signed/Unsigned comparisons in tests
* [MESOS-8483] - ExampleTests PythonFramework fails with sigabort.
* [MESOS-8484] - stout test NumifyTest.HexNumberTest fails.
* [MESOS-8485] - MasterTest.RegistryGcByCount is flaky
* [MESOS-8489] - LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags is flaky
* [MESOS-8490] - UpdateSlaveMessageWithPendingOffers is flaky.
* [MESOS-8497] - Docker parameter `name` does not work with Docker Containerizer.
* [MESOS-8508] - Missing map header when compiling against unbundled protobuf
* [MESOS-8510] - URI disk profile adaptor does not consider plugin type for a profile.
* [MESOS-8512] - Fetcher doesn't log it's stdout/stderr properly to the log file
* [MESOS-8513] - Noisy "transport endpoint is not connected" logs on closing sockets.
* [MESOS-8519] - Fix recovery of job object isolated tasks
* [MESOS-8530] - Default executor tasks can get stuck in KILLING state
* [MESOS-8536] - Pending offer operations on resource provider resources not properly accounted for in allocator
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8546] - PythonFramework test fails with cache write failure.
* [MESOS-8548] - Test StorageLocalResourceProviderTest.ROOT_Metrics is flaky
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail
* [MESOS-8563] - Windows executors cannot re-register
* [MESOS-8565] - Persistent volumes are not visible in Mesos UI when launching a pod using default executor.
* [MESOS-8577] - Destroy nested container if `LAUNCH_NESTED_CONTAINER_SESSION` fails
* [MESOS-8578] - UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.
* [MESOS-8585] - Agent crashes when starting a task with an unknown user.
* [MESOS-8586] - apply-reviews.py silently does nothing when a review was submitted already.
* [MESOS-8594] - Mesos master stack overflow in libprocess socket send loop.
* [MESOS-8598] - Allow empty resource provider selector in `UriDiskProfileAdaptor`.
* [MESOS-8601] - Master crashes during slave reregistration after failover.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung
* [MESOS-8610] - NsTest.SupportedNamespaces fails on CentOS7
* [MESOS-8611] - SlaveTest.RemoveExecutorUponFailedLaunch is flaky.
* [MESOS-8617] - Tests using default executor occasionally fail.
* [MESOS-8618] - ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.
* [MESOS-8619] - Docker on Windows uses USERPROFILE instead of HOME for credentials
* [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
* [MESOS-8624] - Valid tasks may be explicitly dropped by agent due to race conditions
* [MESOS-8631] - Agent should be able to start a task with every CPU on a Windows machine
* [MESOS-8641] - Event stream could send heartbeat before subscribed
* [MESOS-8642] - ballon-executor is hard to run as unprivileged user
* [MESOS-8643] - `os::system` and `os::spawn` returns -1 on valid windows commands
* [MESOS-8644] - W* macros wrong on Windows.
* [MESOS-8646] - Agent should be able to resolve file names on open files.
* [MESOS-8647] - Enable resource provider agent capability by default
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator
* [MESOS-8654] - The `/proc/sys` mount point in Mesos containers should also include `nosuid,noexec,nodev` mount options.
* [MESOS-8659] - Fix warning `cl : Command line warning D9025 : overriding '/MTd' with '/MDd'`
* [MESOS-8664] - Perf sampler doesn't handle extra fields and nameless counters
* [MESOS-8691] - Forward CXX_FLAGS to C++ projects and C_FLAGS to C projects in CMake
* [MESOS-8711] - SlaveTest.ChangeDomain is disabled.
* [MESOS-8719] - Mesos configured with `--enable-grpc` doesn't compile on non-Linux builds
* [MESOS-8724] - G++ Warning about libc system macros `major` and `minor` prevents Mesos build
* [MESOS-8733] - OversubscriptionTest.ForwardUpdateSlaveMessage is flaky
* [MESOS-8741] - `Add` to sequence will not run if it races with sequence destruction
* [MESOS-8742] - Agent resource provider config API calls should be idempotent.
* [MESOS-8749] - CSI proto is always included in the build when using CMake
* [MESOS-8761] - Default linker fails to link tests on FreeBSD
* [MESOS-8781] - Mesos master shouldn't silently drop operations
* [MESOS-8784] - OPERATION_DROPPED operation status updates should include the operation/framework IDs
* [MESOS-8787] - RP-related API should be experimental.
* [MESOS-8804] - Fix Ninja Release builds on Windows
* [MESOS-8818] - VolumeSandboxPathIsolatorTest.SharedParentTypeVolume fails on macOS
* [MESOS-8834] - Indirect recursion between `send` and `_send` in libprocess may cause stack overflow.
* [MESOS-8865] - Suspicious enum value comparisons in scheduler Java bindings
* [MESOS-8866] - CMake builds are missing byproduct declaration for jemalloc.
* [MESOS-8868] - Some 'FsTest' test cases fail on macOS
* [MESOS-8870] - Master does not correctly reconcile dropped operations after agent failover
* [MESOS-8874] - ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider is flaky.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
** Improvement
* [MESOS-2922] - Add move constructors / assignment to Future.
* [MESOS-3022] - export additional metrics from scheduler driver
* [MESOS-4965] - Support resizing of an existing persistent volume
* [MESOS-5362] - Add authentication to example frameworks
* [MESOS-6128] - Make "re-register" vs. "reregister" consistent in the master
* [MESOS-7016] - Make default AWAIT_* duration configurable
* [MESOS-7643] - The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
* [MESOS-7656] - Update the JSON <=> protobuf message conversion for map support
* [MESOS-7881] - Building gRPC with CMake
* [MESOS-7990] - Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
* [MESOS-8033] - Use more idiomatic CMake for compiler features
* [MESOS-8240] - Add an option to build the new CLI and run unit tests.
* [MESOS-8306] - Restrict which agents can statically reserve resources for which roles
* [MESOS-8332] - Narrow the container sandbox permissions.
* [MESOS-8357] - Example frameworks have an inconsistent UX.
* [MESOS-8361] - Example frameworks to support launching mesos-local.
* [MESOS-8389] - Notion of "removable" task in master code is inaccurate.
* [MESOS-8390] - Notion of "transitioning" agents in the master is now inaccurate.
* [MESOS-8402] - Resource provider manager should persist resource provider information
* [MESOS-8426] - Speed up SLRP tests
* [MESOS-8427] - Clean up residual CSI endpoints for SLRP tests.
* [MESOS-8434] - Cleanup Authorization logic in master and agent
* [MESOS-8454] - Add a download link for master and agent logs in WebUI
* [MESOS-8471] - Allow revocable_resources capability for mesos-execute
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8506] - Add test coverage for `Resources::find` on revocable resources
* [MESOS-8556] - Boost emits warning repeatedly
* [MESOS-8573] - Container stuck in PULLING when Docker daemon hangs
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'
* [MESOS-8591] - Add infra to test a hung Docker daemon
* [MESOS-8599] - Build with Ninja on Windows
* [MESOS-8607] - Port mesos-execute to Windows
* [MESOS-8609] - Create a metric to indicate how long agent takes to recover executors
* [MESOS-8640] - Validate `DockerInfo` exists when container's type is `DOCKER`
* [MESOS-8656] - Improve stout JSON -> protobuf message conversion to handle more valid JSONs
* [MESOS-8658] - CMake build should use same compiler warnings as Autotools
* [MESOS-8702] - Replace the manual parsing in Mesos code with the native protobuf map support
* [MESOS-8725] - Support max_duration for tasks
* [MESOS-8728] - Don't print full usage for invocation errors
* [MESOS-8772] - Add slave recovery test for default executor.
* [MESOS-8793] - Add more logging to agent recovery path.
* [MESOS-8801] - Add jemalloc as optional third-party memory allocator
* [MESOS-8851] - Introduce a push-based gauge.
** Task
* [MESOS-3441] - Port os_tests to Windows
* [MESOS-3445] - Port signals_tests to Windows
* [MESOS-3644] - Implement stout/os/windows/signals.hpp
* [MESOS-4176] - Support CMake build on FreeBSD
* [MESOS-5726] - Benchmark the v1 Operator API
* [MESOS-5850] - Add a test that runs the 'mesos-local' binary
* [MESOS-6575] - Change `disk/xfs` isolator to terminate executor when it exceeds quota
* [MESOS-7558] - Add resource provider validation
* [MESOS-8184] - Implement master's AcknowledgeOfferOperationMessage handler.
* [MESOS-8189] - Master’s OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.
* [MESOS-8190] - Update the master to accept OfferOperationIDs from frameworks.
* [MESOS-8191] - Implement ReconcileOfferOperations handler in the master
* [MESOS-8192] - Update the scheduler library to support request/response API calls.
* [MESOS-8275] - Remove use of ::_stat on Windows
* [MESOS-8284] - Add a ns::supported convenience API.
* [MESOS-8362] - Verify end-to-end operation status update retry after RP failover
* [MESOS-8363] - Verify that the master acknowledges operation status updates correctly
* [MESOS-8373] - Test reconciliation after operation is dropped en route to agent
* [MESOS-8382] - Master should bookkeep local resource providers.
* [MESOS-8388] - Show LRP resources in master and agent endpoints.
* [MESOS-8407] - Add SLRP unit tests for profile updates and corner cases.
* [MESOS-8408] - Add an SLRP test for CSI plugin restart.
* [MESOS-8409] - Add an SLRP test for agent registered with a new ID.
* [MESOS-8415] - Add an SLRP test for agent reboot.
* [MESOS-8420] - Test that operation status updates are retried after being dropped en-route to the master.
* [MESOS-8424] - Test that operations are correctly reported following a master failover
* [MESOS-8442] - Source tree contains generated endpoint documentation
* [MESOS-8445] - Test that `UPDATE_STATE` of a resource provider doesn't have unwanted side-effects in master or agent
* [MESOS-8462] - Unit test for `Slave::detachFile` on removed frameworks.
* [MESOS-8492] - Checkpoint profiles in storage local resource provider.
* [MESOS-8527] - Add metrics about number of subscribed LRPs on the agent.
* [MESOS-8534] - Allow nested containers in TaskGroups to have separate network namespaces
* [MESOS-8539] - Add metrics about CSI plugin terminations.
* [MESOS-8551] - Port libprocess HTTPTest.QueryEncodeDecode
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8650] - Bump CSI bundle to v0.2.
* [MESOS-8653] - Make the CSI client to support CSI v0.2.
* [MESOS-8657] - Build CSI proto in CMake.
* [MESOS-8673] - Fix os::open to use HANDLEs
* [MESOS-8675] - Remove FD_CRT from WindowsFD
* [MESOS-8676] - Fix os::read and os::write to use HANDLES
* [MESOS-8678] - Bump gRPC bundle to 1.10.0.
* [MESOS-8683] - Remove _close from Windows close.hpp
* [MESOS-8684] - Replace _dup with DuplicateHandle on Windows
* [MESOS-8685] - Replace _lseek with SetFilePointer
* [MESOS-8692] - Replace _chsize_s with SetEndOfFile on Windows
* [MESOS-8697] - Make gRPC-related tests cross-platform.
* [MESOS-8698] - Enable storage local resource provider in CMake.
* [MESOS-8706] - Unify return type of `wait` and `destroy` containerizer methods
* [MESOS-8710] - Update tests after changing return type of `wait` method
* [MESOS-8717] - Support CSI v0.2 in SLRP.
* [MESOS-8735] - Implement recovery for resource provider manager registrar
* [MESOS-8747] - Support resizing persistent volume through operator API
* [MESOS-8748] - Create ACL for grow and shrink volume
* [MESOS-8750] - Check failed: !slaves.registered.contains(task->slave_id)
* [MESOS-8777] - Support `STAGE_UNSTAGE_VOLUME` CSI capability in SLRP
* [MESOS-8819] - mesos.pom file hardcodes developers
* [MESOS-8833] - Port libprocess subprocess_tests.cpp
** Documentation
* [MESOS-8291] - Add documentation about fault domains
Release Notes - Mesos - Version 1.5.1 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-1720] - Slave should send exited executor message when the executor is never launched.
* [MESOS-7742] - Race conditions in IOSwitchboard: listening on unix socket and premature closing of the connection.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8416] - CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8510] - URI disk profile adaptor does not consider plugin type for a profile.
* [MESOS-8536] - Pending offer operations on resource provider resources not properly accounted for in allocator.
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail.
* [MESOS-8565] - Persistent volumes are not visible in Mesos UI when launching a pod using default executor.
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs.
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
* [MESOS-8577] - Destroy nested container if `LAUNCH_NESTED_CONTAINER_SESSION` fails.
* [MESOS-8594] - Mesos master stack overflow in libprocess socket send loop.
* [MESOS-8598] - Allow empty resource provider selector in `UriDiskProfileAdaptor`.
* [MESOS-8601] - Master crashes during slave reregistration after failover.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung.
* [MESOS-8619] - Docker on Windows uses `USERPROFILE` instead of `HOME` for credentials.
* [MESOS-8631] - Agent should be able to start a task with every CPU on a Windows machine.
* [MESOS-8641] - Event stream could send heartbeat before subscribed.
* [MESOS-8646] - Agent should be able to resolve file names on open files.
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator.
* [MESOS-8742] - Agent resource provider config API calls should be idempotent.
* [MESOS-8787] - RP-related API should be experimental.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
Release Notes - Mesos - Version 1.5.0
-------------------------------------------
This release contains the following new features:
* [MESOS-1739] - **Experimental** Agents now support the
`--reconfiguration_policy` flag which allows them to recover
the agent ID and running tasks after configuration changes.
See docs/agent-recovery.md for more details.
* [MESOS-4945] - **Experimental** Agents now can automatically
garbage collect unused Docker image layers used by Mesos
Containerizer.
* [MESOS-7289, MESOS-7235] - **Experimental** Support for the
Container Storage Interface (CSI) to simplify storage management
in Mesos, and allow 3rdparty vendors to plugin into Mesos very
easily.
* [MESOS-7302] - Support launching standalone containers on the
agent using MesosContainerizer without a master or framework
running.
* [MESOS-7749] - **Experimental** Support for gRPC client in Mesos.
The gRPC is bundled in Mesos and a gRPC client API is built is
built into libprocess.
* [MESOS-7973] - **Experimental** Non-leading replica is now allowed
to catch-up missing log positions in the replicated log. This opens
the door for implementing hot standby (by offloading some reading
from a leader to standbys) and fast failover time (by keeping
in-memory storage represented by the log “hot”).
* Several improvements and fixes to the enforcement of quota
guarantees have been made:
* [MESOS-4527]: Previously a role could "game" the quota system
by amassing reservations that it leaves unused. This is now
prevented by accounting for reservations when allocating
resources.
* [MESOS-7099]: Resources are now allocated in a fine-grained
manner to prevent roles from exceeding their quota.
* [MESOS-8293]: There was a bug where a role may not receive its
reservation when it does not have quota, this has been fixed.
* [MESOS-8339]: When a role has more reservations than quota,
there was a bug previously where an insufficient amount of
quota headroom was held. This has been fixed.
* [MESOS-8352]: When allocating to a role with quota, we
previously included all other resources on the agent that the
role does not have quota for. This made it possible to violate
the quota guarantees of a different role. This has been fixed
by taking into account the headroom that is needed when
allocating the resources.
Deprecations/Removals:
* [MESOS-7305] - Some nested container agent APIs `****_NESTED_CONTAINER`
are deprecated in favor of the new generally named agent APIs
`****_CONTAINER`.
* Agent flag `--executor_secret_key` has been deprecated. Operators
should use `--jwt_secret_key` instead.
Additional API Changes:
* [MESOS-6406, MESOS-7215, MESOS-8337] Now when an agent is partitioned,
the master tracks all noncompleted tasks regardless of partition-awareness
so when the agent reregisters it can recover all of them and send their
latest statuses to the scheduler. NOTE: The master now sends updates for
tasks recovered from partitioned agents upon reregistration so the scheduler
can get them before reconciliation. We also fixed the buggy semantics that
exposes terminal unacknowledged tasks when partitioned as "completed" in the
HTTP endpoints and the operator API, now they are shown as "unreachable". We
plan to further improve the API on this in MESOS-8405.
* [MESOS-7550] The fields `Resource.disk.source.path.root` and
`Resource.disk.source.mount.root` can now be set to relative paths
to an agent's work directory.
* [MESOS-7660] `Filter::refuse_seconds` is now capped to 31536000
seconds (365 days).
* [MESOS-7941] Built-in executors will now send a TASK_STARTING
status update when a task is starting.
* [MESOS-7973] A new `catchup` method has been added to the
`Log.Reader` interface (including Java binding).
* [MESOS-8040] Return nested/standalone containers in `GET_CONTAINERS`
API call.
* [MESOS-8165] Master will now send TASK_GONE status for unknown
tasks of PARTITION_AWARE frameworks belonging to registered agents
during explicit reconciliation.
Changes to Dependencies:
* Upgraded minimum required Protobuf library to version 3+.
Feature Graduations:
* [MESOS-4791] - v1 Operator API is now considered stable. The performance has
been improved so that when using protobuf it is faster than v0, and when
using JSON it is slightly slower than v0.
* [MESOS-5116] - Add support for accounting only mode in XFS isolator.
* [MESOS-5275, MESOS-7476, MESOS-7477, MESOS-7671] - Add file-based and
protobuf-based capabilities support for mesos containerizer. This
includes the support for effective and bounding capabilities.
* [MESOS-6077] - Added a default (task group) executor.
* [MESOS-6402] - rlimit support for Mesos containerizer.
* [MESOS-6460] - Container Attach/Exec.
* [MESOS-6758] - Support docker registry that requires basic auth.
* [MESOS-7088] - Support private registry credential per container.
* [MESOS-7418] - Add support for file-based secrets.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode().
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration.
* [MESOS-3533] - Unable to find and run URIs files.
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string.
* [MESOS-4996] - 'containerizer->update' will always fail after killing a docker container.
* [MESOS-5352] - Docker volume isolator cleanup can be blocked by first cleanup failure.
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formated as strings.
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-6804] - Running 'tty' inside a debug container that has a tty reports "Not a tty".
* [MESOS-6986] - abort in DRFSorter::add.
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove.
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7966] - check for maintenance on agent causes fatal error.
* [MESOS-7991] - fatal, check failed !framework->recovered().
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
All Resolved Issues:
** Bug
* [MESOS-1216] - Attributes comparator operator should allow multiple attributes of same name and type.
* [MESOS-3576] - Audit CMake linking flags.
* [MESOS-5455] - Transition away from temporary build variables.
* [MESOS-5462] - Re-organize isolator hierarchy.
* [MESOS-5656] - Incomplete modelling of 3rdparty dependencies in cmake build.
* [MESOS-5881] - Semantics of `os::symlink` differ across POSIX and Windows.
* [MESOS-5905] - Zookeeper tests do not work on CMake builds as directory structure changed.
* [MESOS-6086] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove is flaky.
* [MESOS-6187] - "double free or corruption" with Java 8.
* [MESOS-6345] - ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04.
* [MESOS-6406] - Send latest status for partition-aware tasks when agent reregisters.
* [MESOS-6428] - Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe.
* [MESOS-6616] - Error: dereferencing type-punned pointer will break strict-aliasing rules.
* [MESOS-6671] - External 3rdparty deps are not built with the configured compiler in cmake build.
* [MESOS-6690] - Wire up resource control API to Windows Job objects API.
* [MESOS-6697] - Port `authentication_tests.cpp`.
* [MESOS-6703] - Port `credentials_tests.cpp`.
* [MESOS-6705] - Port `fetcher_tests.cpp`.
* [MESOS-6708] - Port `group_tests.cpp`.
* [MESOS-6735] - `os::realpath` semantics differ between Windows and POSIX.
* [MESOS-6784] - IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky.
* [MESOS-6790] - Wrong task started time in webui.
* [MESOS-6794] - Properly model header dependencies of cmake build components.
* [MESOS-6816] - Allows frameworks to overwrite system environment variables.
* [MESOS-6942] - CMake build with `-DENABLE_LIBEVENT=ON` requires system-installed `openssl`.
* [MESOS-6949] - SchedulerTest.MasterFailover is flaky.
* [MESOS-7007] - filesystem/shared and --default_container_info broken since 1.1.
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
* [MESOS-7130] - port_mapping isolator: executor hangs when running on EC2.
* [MESOS-7160] - Parsing of perf version segfaults.
* [MESOS-7215] - Race condition on re-registration of non-partition-aware frameworks.
* [MESOS-7223] - Linux filesystem isolator cannot mount host volume /dev/log.
* [MESOS-7296] - CMake 2.8.10 does not support TIMESTAMP.
* [MESOS-7312] - Update Resource proto for storage resource providers.
* [MESOS-7425] - ImageAlpine/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/3 is flaky in some OS.
* [MESOS-7440] - Various DefaultExecutorCheckTest* tests flaky on ASF CI.
* [MESOS-7500] - Command checks via agent lead to flaky tests.
* [MESOS-7504] - Parent's mount namespace cannot be determined when launching a nested container.
* [MESOS-7509] - CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.
* [MESOS-7511] - CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
* [MESOS-7519] - OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky.
* [MESOS-7541] - Cannot compile without pre-compiled headers on Windows.
* [MESOS-7586] - Make use of cout/cerr and glog consistent.
* [MESOS-7589] - CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled is flaky.
* [MESOS-7660] - HierarchicalAllocator uses the default filter instead of a very long one.
* [MESOS-7661] - Libprocess timers with long durations trigger immediately.
* [MESOS-7704] - Remove use of #pragma comment (lib, "IPHLPAPI.lib").
* [MESOS-7726] - MasterTest.IgnoreOldAgentReregistration test is flaky.
* [MESOS-7729] - ExamplesTest.DynamicReservationFramework is flaky.
* [MESOS-7741] - SlaveRecoveryTest/0.MultipleSlaves has double free corruption.
* [MESOS-7781] - Windows API GetVersionExW was declared deprecated.
* [MESOS-7784] - MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1 is flaky.
* [MESOS-7791] - subprocess' childMain using ABORT when encountering user errors.
* [MESOS-7811] - libprocess-tests depend on gtest but it's not setup.
* [MESOS-7828] - Current approach to parse protobuf enum from JSON does not support upgrades.
* [MESOS-7835] - CMake build does not support Marathon.
* [MESOS-7851] - Master stores old resource format in the registry.
* [MESOS-7867] - Master doesn't handle scheduler driver downgrade from HTTP based to PID based.
* [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos `state` endpoint.
* [MESOS-7877] - Audit test code for undefined behavior in accessing container elements.
* [MESOS-7917] - Docker statistics not reported on Windows.
* [MESOS-7921] - ProcessManager::resume sometimes crashes accessing EventQueue.
* [MESOS-7923] - Make args optional in mesos port mapper plugin.
* [MESOS-7927] - The composing containerizer leaks memory in some scenarios.
* [MESOS-7929] - `Metrics()` hangs on second call on Windows.
* [MESOS-7945] - MasterAPITest.EventAuthorizationFiltering is flaky.
* [MESOS-7963] - Task groups can lose the container limitation status.
* [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
* [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
* [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
* [MESOS-7972] - SlaveTest.HTTPSchedulerSlaveRestart test is flaky.
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-7978] - Lint javascript files to enable linting.
* [MESOS-7980] - Stout fails to compile with libc >= 2.26.
* [MESOS-7988] - Mesos attempts to open handle for the system idle process.
* [MESOS-7993] - Fix Windows header orderings.
* [MESOS-7996] - ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
* [MESOS-7997] - ContentType/MasterAPITest.CreateAndDestroyVolumes is flaky.
* [MESOS-7998] - PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.
* [MESOS-8000] - DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
* [MESOS-8001] - PersistentVolumeEndpointsTest.NoAuthentication is flaky.
* [MESOS-8003] - PersistentVolumeEndpointsTest.SlavesEndpointFullResources is flaky.
* [MESOS-8010] - AfterTest.Loop is flaky.
* [MESOS-8027] - os::open doesn't always atomically apply O_CLOEXEC.
* [MESOS-8035] - Correct mesos-tests CMake build dependencies.
* [MESOS-8039] - A broken connection during LaunchNestedContainer call might result in the nested container not being cleaned up.
* [MESOS-8046] - MasterTestPrePostReservationRefinement.ReserveAndUnreserveResourcesV1 is flaky.
* [MESOS-8048] - ReservationEndpointsTest.GoodReserveAndUnreserveACL is flaky.
* [MESOS-8051] - Killing TASK_GROUP fail to kill some tasks.
* [MESOS-8052] - "protoc" not found when running "make -j4 check" directly in stout.
* [MESOS-8057] - Apply security patches to AngularJS and JQuery in the Mesos UI.
* [MESOS-8058] - Agent and master can race when updating agent state.
* [MESOS-8066] - Pylint report errors in apply-reviews.py on Ubuntu 14.04.
* [MESOS-8070] - Bundled GRPC build does not build on Debian 8.
* [MESOS-8076] - PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.
* [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
* [MESOS-8082] - updateAvailable races with a periodic allocation and leads to flaky tests.
* [MESOS-8084] - Double free corruption in tests due to parallel manipulation of signal and control handlers.
* [MESOS-8085] - No point in deallocate() for a framework for maintenance if it is deactivated.
* [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription.
* [MESOS-8093] - Some tests miss subscribed event because expectation is set after event fires.
* [MESOS-8095] - ResourceProviderRegistrarTest.AgentRegistrar is flaky.
* [MESOS-8116] - Fix off by-one error in Windows long path support.
* [MESOS-8119] - ROOT_DOCKER_DockerHealthyTask segfaults in debian 8.
* [MESOS-8121] - Unified Containerizer Auto backend should check xfs ftype for overlayfs backend.
* [MESOS-8123] - GPU tests are failing due to TASK_STARTING.
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
* [MESOS-8136] - Update XFS isolator tests to handle TASK_STARTING.
* [MESOS-8157] - Review #62775 broke the build.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
* [MESOS-8165] - TASK_UNKNOWN status is ambiguous.
* [MESOS-8169] - Incorrect master validation forces executor IDs to be globally unique.
* [MESOS-8171] - Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop.
* [MESOS-8173] - Improve fetcher exit status message.
* [MESOS-8178] - UnreachableAgentReregisterAfterFailover is flaky.
* [MESOS-8179] - Scheduler library has incorrect assumptions about connections.
* [MESOS-8180] - Port mesos-fetcher to Windows.
* [MESOS-8200] - Suppressed roles are not honoured for v1 scheduler subscribe requests.
* [MESOS-8217] - Don't run linters on every commit.
* [MESOS-8220] - Can't build with Visual Studio 15.5.
* [MESOS-8223] - Master crashes when suppressed on subscribe is enabled.
* [MESOS-8225] - Port os::which to Windows.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
* [MESOS-8245] - SlaveRecoveryTest/0.ReconnectExecutor is flaky.
* [MESOS-8249] - Support image prune in mesos containerizer and provisioner.
* [MESOS-8263] - ResourceProviderManagerHttpApiTest.ConvertResources is flaky.
* [MESOS-8267] - NestedMesosContainerizerTest.ROOT_CGROUPS_RecoverLauncherOrphans is flaky.
* [MESOS-8272] - Fall back to bind mounting container devices.
* [MESOS-8279] - Persistent volumes are not visible in Mesos UI using default executor on Linux.
* [MESOS-8280] - Mesos Containerizer GC should set 'layers' after checkpointing layer ids in provisioner.
* [MESOS-8282] - Take pending offer operations into account when calculating framework allocated resources.
* [MESOS-8288] - SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
* [MESOS-8289] - ReservationTest.MasterFailover is flaky when run with `RESOURCE_PROVIDER` capability.
* [MESOS-8293] - Reservation may not be allocated when the role has no quota.
* [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
* [MESOS-8312] - Pass resource provider information to master as part of UpdateSlaveMessage.
* [MESOS-8315] - ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider is flaky.
* [MESOS-8316] - Tests that fetch docker images might be flaky due to insufficient wait timeout.
* [MESOS-8318] - OfferOperationStatusUpdateManagerTest tests fail on Windows.
* [MESOS-8320] - Expose information about local resource providers in master.
* [MESOS-8325] - Mesos containerizer does not properly handle old running containers.
* [MESOS-8337] - Invalid state transition attempted when agent is lost.
* [MESOS-8339] - Quota headroom may be insufficiently held when role has more reservation than quota.
* [MESOS-8341] - Agent can become stuck in (re-)registering state during upgrades.
* [MESOS-8344] - Improve JSON v1 operator API performance.
* [MESOS-8346] - Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed.
* [MESOS-8349] - When a resource provider driver is disconnected, it fails to reconnect.
* [MESOS-8350] - Resource provider-capable agents not correctly synchronizing checkpointed agent resources on reregistration.
* [MESOS-8352] - Resources may get over allocated to some roles while fail to meet the quota of other roles.
* [MESOS-8356] - Persistent volume ownership is set to root despite of sandbox owner (frameworkInfo.user) when docker executor is used.
* [MESOS-8369] - CI build failure compiling volume_profile.proto.
* [MESOS-8376] - Bundled GRPC does not build on Debian 9.
* [MESOS-8377] - RecoverTest.CatchupTruncated is flaky.
* [MESOS-8391] - Mesos agent doesn't notice that a pod task exits or crashes after the agent restart.
* [MESOS-8393] - SLRP NewVolumeRecovery and LaunchTaskRecovery tests CHECK failures.
* [MESOS-8410] - Reconfiguration policy fails to handle mount disk resources.
* [MESOS-8417] - Mesos can get "stuck" when a Process throws an exception.
* [MESOS-8419] - RP manager incorrectly setting framework ID leads to CHECK failure.
* [MESOS-8422] - Master's UpdateSlave handler not correctly updating terminated operations.
* [MESOS-8443] - Fix Docker Containerizer PATH on Windows so Docker is usable.
* [MESOS-8444] - GC failure causes agent miss to detach virtual paths for the executor's sandbox.
* [MESOS-8446] - Agent miss to detach `virtualLatestPath` for the executor's sandbox during recovery.
* [MESOS-8460] - `Slave::detachFile` can segfault because it could use invalid Framework*.
* [MESOS-8461] - SLRP should no assume a CSI plugin always has GetNodeID implemented.
* [MESOS-8469] - Mesos master might drop some events in the operator API stream.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8481] - Agent reboot during checkpointing may result in empty checkpoints.
* [MESOS-8514] - SLRP failed to connect to CSI endpoint.
** Documentation
* [MESOS-5078] - Document TaskStatus reasons.
* [MESOS-7663] - Update the documentation to reflect the addition of reservation refinement.
* [MESOS-8007] - Add documentation for MARK_AGENT_GONE call.
* [MESOS-8303] - Add user doc for agent reconfiguration.
* [MESOS-8304] - Update CHANGELOG to call out agent reconfiguration feature.
* [MESOS-8310] - Document container image garbage collection.
** Epic
* [MESOS-1739] - Allow slave reconfiguration on restart.
* [MESOS-4945] - Garbage collect unused docker layers in the store.
* [MESOS-7235] - Improve Storage Support using Resource Provider and CSI.
* [MESOS-7289] - Support Container Storage Interface (CSI).
* [MESOS-7302] - Support launching standalone containers.
* [MESOS-7749] - Support gRPC client.
** Improvement
* [MESOS-564] - Update Contribution Documentation.
* [MESOS-5675] - Add support for master capabilities.
* [MESOS-5771] - Add benchmark test for shared resources.
* [MESOS-5902] - CMake should generate protobuf definitions for Java.
* [MESOS-6350] - Raise minimum required cmake version.
* [MESOS-6390] - Ensure Python support scripts are linted.
* [MESOS-6971] - Use arena allocation to improve protobuf message passing performance.
* [MESOS-7306] - Support mount propagation for host volumes.
* [MESOS-7330] - Add resource provider to offer.
* [MESOS-7361] - Command checks via agent pollute agent logs.
* [MESOS-7370] - Fix create symlink code to use flag which enables non-admins to make symlinks.
* [MESOS-7497] - Remove CMake anti-pattern of `set(x "${x} ..")`.
* [MESOS-7616] - Consider supporting changes to agent's domain without full drain.
* [MESOS-7675] - Isolate network ports.
* [MESOS-7695] - Add heartbeats to master stream API.
* [MESOS-7737] - Harden Mesos when building with cmake.
* [MESOS-7785] - Pass Operator API subscription events through authorizer.
* [MESOS-7795] - Remove "latest" symlink after agent reboot.
* [MESOS-7798] - Improve libprocess message passing performance.
* [MESOS-7837] - Propagate resource updates from local resource providers to master.
* [MESOS-7840] - Add Mesos CLI command to list active tasks.
* [MESOS-7842] - Basic sandbox GC metrics.
* [MESOS-7861] - Include check output in the DefaultExecutor log.
* [MESOS-7880] - Add an option to skip the Mesos style check when applying a review chain.
* [MESOS-7889] - Avoid Multiple PROTOC invocations when generating Protobuf & GRPC code in libprocess.
* [MESOS-7895] - ZK session timeout is unconfigurable in agent and scheduler drivers.
* [MESOS-7916] - Improve the test coverage of the DefaultExecutor.
* [MESOS-7924] - Add a javascript linter to the webui.
* [MESOS-7941] - Send TASK_STARTING status from built-in executors.
* [MESOS-7951] - Design Doc for Extended KillPolicy.
* [MESOS-7961] - Display task health in the webui.
* [MESOS-7962] - Display task state counters in the framework page of the webui.
* [MESOS-7973] - Non-leading VOTING replica catch-up.
* [MESOS-7987] - Initialize Google Mock rather than Google Test.
* [MESOS-8012] - Support Znode paths for masters in the new CLI.
* [MESOS-8015] - Design a scheduler (V1) HTTP API authenticatee mechanism.
* [MESOS-8016] - Introduce modularized HTTP authenticatee.
* [MESOS-8017] - Introduce a basic HTTP authenticatee.
* [MESOS-8021] - Update HTTP scheduler library to allow for modularized authenticatee.
* [MESOS-8034] - Remove LIBNAME_VERSION from EXTERNAL.
* [MESOS-8040] - Return nested/standalone containers in `GET_CONTAINERS` API call.
* [MESOS-8072] - Change Mesos common events verbose logs to use VLOG(2) instead of 1.
* [MESOS-8074] - Change Libprocess actor state transitions verbose logs to use VLOG(3) instead of 2.
* [MESOS-8078] - Some fields went missing with no replacement in api/v1.
* [MESOS-8115] - Add a master flag to disallow agents that are not configured with fault domain.
* [MESOS-8117] - Update Getting Started documentation.
* [MESOS-8221] - Use protobuf reflection to simplify downgrading of resources.
* [MESOS-8286] - Making bind mounts readonly fails with user namespaces.
* [MESOS-8294] - Support container image basic auto gc.
* [MESOS-8295] - Add excluded image parameter to containerizer::pruneImages() interface.
* [MESOS-8301] - Support moving into defer/dispatch/install handlers.
* [MESOS-8302] - Improve master failover performance.
* [MESOS-8328] - Improve logs displayed after a slave failed recovery.
* [MESOS-8358] - Create agent endpoints for pruning images.
* [MESOS-8365] - Create AuthN support for prune images API.
* [MESOS-8421] - Duration operators drop precision, even when used with integers.
* [MESOS-8455] - Avoid unnecessary copying of protobuf in the v1 API.
** Task
* [MESOS-3107] - Define CMake style guide.
* [MESOS-3110] - Harden the CMake system-dependency-locating routines.
* [MESOS-3384] - Include libsasl in Windows CMake build.
* [MESOS-3437] - Port flags_tests.
* [MESOS-4527] - Roles can exceed limit allocation via reservations.
* [MESOS-6193] - Make the docker/volume isolator nesting aware.
* [MESOS-6709] - Enable HTTP and TCP health checks on Windows.
* [MESOS-6714] - Port `slave_tests.cpp`.
* [MESOS-6733] - Windows: Enable authentication to the master.
* [MESOS-6894] - Checkpoint 'ContainerConfig' in Mesos Containerizer.
* [MESOS-7284] - Allow Mesos CLI to take masters IP.
* [MESOS-7285] - Implement a plugin to list container's on a given agent.
* [MESOS-7303] - Support Isolator capabilities.
* [MESOS-7305] - Adjust the recover logic of MesosContainerizer to allow standalone containers.
* [MESOS-7328] - Validate offer operations for converting disk resources.
* [MESOS-7388] - Update allocator interfaces to support resource providers.
* [MESOS-7443] - Add the MARK_AGENT_GONE call to the Operator v1 API protos.
* [MESOS-7444] - Add support for storing gone agents to the master registry.
* [MESOS-7445] - Implement the API handler on the master for marking agents as gone.
* [MESOS-7446] - Add authorization for the MARK_AGENT_GONE call.
* [MESOS-7448] - Add support for pruning the list of gone agents in the registry.
* [MESOS-7469] - Add resource provider driver.
* [MESOS-7491] - Build a CSI client to talk to a CSI plugin.
* [MESOS-7533] - Add a function stub for resource provider re-registration.
* [MESOS-7534] - Notify resource providers if they've been reregistered.
* [MESOS-7535] - Distinguish between active and inactive resource providers in RP Manager.
* [MESOS-7550] - Publish Local Resource Provider resources in the agent before container launch or update.
* [MESOS-7555] - Add resource provider IDs to the registry.
* [MESOS-7557] - Test that resource providers can reregister after agent fails over.
* [MESOS-7561] - Add storage resource provider specific information in ResourceProviderInfo.
* [MESOS-7578] - Write a proposal to make the I/O Switchboards optional.
* [MESOS-7594] - Implement 'apply' for resource provider related operations.
* [MESOS-7757] - Update master to handle updates to agent total resources.
* [MESOS-7790] - Design hierarchical quota allocation.
* [MESOS-7807] - Docker executor needs to return multiple IP addresses for the container.
* [MESOS-7892] - Filter results of `/state` on agent by role.
* [MESOS-7899] - Expose sandboxes using virtual paths and hide the agent work directory.
* [MESOS-7936] - Move sandbox path volume logic to 'volume/sandbox_path' isolator.
* [MESOS-7982] - Create Centos 6/7 RPM package.
* [MESOS-7985] - Use ASF CI for automating RPM packaging and upload to bintray.
* [MESOS-7992] - Enable OpenSSL build on Windows.
* [MESOS-8013] - Add test for blkio statistics.
* [MESOS-8032] - Launch CSI plugins in storage local resource provider.
* [MESOS-8050] - Mesos HTTP/HTTPS health checks for IPv6 docker containers.
* [MESOS-8060] - Introduce first class 'profile' for disk resources.
* [MESOS-8071] - Add agent capability for resource provider.
* [MESOS-8075] - Add ReadWriteLock to libprocess.
* [MESOS-8079] - Checkpoint and recover layers used to provision rootfs in provisioner.
* [MESOS-8086] - Update ACCEPT call handler in master for new operations.
* [MESOS-8087] - Add operation status update handler in Master.
* [MESOS-8088] - Introduce Lamport timestamp for offer operations.
* [MESOS-8089] - Add messages to publish resources on a resource provider.
* [MESOS-8097] - Add filesystem layout for local resource providers.
* [MESOS-8098] - Benchmark Master failover performance.
* [MESOS-8099] - Add protobuf for checkpointing resource provider states.
* [MESOS-8100] - Authorize standalone container calls from local resource providers.
* [MESOS-8101] - Import resources from CSI plugins in storage local resource provider.
* [MESOS-8102] - Add a test CSI plugin for storage local resource provider.
* [MESOS-8107] - Add a call to update total resources in the resource provider API.
* [MESOS-8108] - Process offer operations in storage local resource provider.
* [MESOS-8130] - Add placeholder handlers for offer operation feedback.
* [MESOS-8131] - Add new protobuf messages for offer operation feedback.
* [MESOS-8132] - Design a library to send offer operation status updates.
* [MESOS-8139] - Upgrade protobuf to 3.5.x.
* [MESOS-8141] - Add filesystem layout for storage resource providers.
* [MESOS-8143] - Publish and unpublish storage local resources through CSI plugins.
* [MESOS-8181] - Add tests that a failed offer operation on resource provider resources leads to a clock update.
* [MESOS-8183] - Add a container daemon to monitor a long-running standalone container.
* [MESOS-8186] - Implement the agent's AcknowledgeOfferOperationMessage handler.
* [MESOS-8187] - Enable LRP to send operation status updates, checkpoint, and retry using the SUM.
* [MESOS-8193] - Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.
* [MESOS-8195] - Implement explicit offer operation reconciliation between the master, agent and RPs.
* [MESOS-8196] - Propagate failures from applying offer operations from resource providers.
* [MESOS-8197] - Implement a library to send offer operation status updates.
* [MESOS-8198] - Update the ReconcileOfferOperations protos.
* [MESOS-8199] - Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
* [MESOS-8207] - Reconcile offer operations between resource providers, agents, and master.
* [MESOS-8211] - Handle agent local resources in offer operation handler.
* [MESOS-8218] - Support `RESERVE`/`CREATE` operations with resource providers.
* [MESOS-8222] - Add resource versions to RunTaskMessage.
* [MESOS-8244] - Add operator API to reload local resource providers.
* [MESOS-8251] - Introduce a way to resolve the "profile" for disk resources.
* [MESOS-8265] - Add state recovery for storage local resource provider.
* [MESOS-8269] - Support resource provider re-subscription in the resource provider manager.
* [MESOS-8270] - Add an agent endpoint to list all active resource providers.
* [MESOS-8309] - Introduce a UUID message type.
* [MESOS-8375] - Use protobuf reflection to simplify upgrading of resources.
* [MESOS-8394] - Bump CSI to 0.1.0.
Release Notes - Mesos - Version 1.4.2 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-4527] - Roles can exceed limit allocation via reservations.
* [MESOS-6616] - Error: dereferencing type-punned pointer will break strict-aliasing rules.
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
* [MESOS-7504] - Parent's mount namespace cannot be determined when launching a nested container.
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
* [MESOS-8171] - Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
* [MESOS-8293] - Reservation may not be allocated when the role has no quota.
* [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
* [MESOS-8339] - Quota headroom may be insufficiently held when role has more reservation than quota.
* [MESOS-8352] - Resources may get over allocated to some roles while fail to meet the quota of other roles.
* [MESOS-8356] - Persistent volume ownership is set to root despite of sandbox owner (frameworkInfo.user) when docker executor is used.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail.
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs.
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung.
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
Release Notes - Mesos - Version 1.4.1
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos `state` endpoint.
* [MESOS-7921] - ProcessManager::resume sometimes crashes accessing EventQueue.
* [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
* [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
* [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
* [MESOS-7980] - Stout fails to compile with libc >= 2.26.
* [MESOS-8051] - Killing TASK_GROUP fail to kill some tasks.
* [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
* [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
* [MESOS-8169] - Incorrect master validation forces executor IDs to be globally unique.
Release Notes - Mesos - Version 1.4.0
-------------------------------------------
This release contains the following new features:
* [MESOS-5116] - The `disk/xfs` isolator now supports the
`--enforce_container_disk_quota` flag to efficiently measure disk
usage without enforcing usage constraints.
* [MESOS-6223] - Agents are now allowed to recover the agent ID
after a host reboot. See docs/upgrades.md for details.
* [MESOS-6375] - **Experimental** Support for hierarchical resource
allocation roles. Hierarchical roles allows delegation of resource
allocation policies (i.e. fair sharing and quota) further down the
hierarchy. For example, the "engineering" organization gets a 75%
share of the resources, but it's up to the operators within the
"engineering" organization to figure out how to fairly share between
the "engineering/backend" team and the "engineering/frontend" team.
The same delegation applies for quota. NOTE: There are known issues
related to hierarchical roles (e.g. hierarchical quota allocation
is not implemented and quota will be over-allocated if used with
hierarchical roles, see: MESOS-7402) and thus it is not recommended
for production usage at this time.
* [MESOS-7418, MESOS-7088] - File-based secrets are now supported for Mesos
and Universal containerizer. Image-pull secrets are supported for Docker
registry credentials.
* [MESOS-7477] - Linux ambient capabilites are now supported, so
frameworks can run tasks that use ambient capabilites to grant
limited additional privileged to tasks.
* [MESOS-7476, MESOS-7671] - Support for frameworks and operators
specifying Linux bounding capabilities in order to limit the
maximum privileges that a task may acquire.
Deprecations/Removals:
* [MESOS-7671] - LinuxInfo.capabilities is deprecated in favor
of LinuxInfo.effective_capabilities.
* [MESOS-7477] - The agent `--allowed_capabilities` flag is
deprecated in favor of `--effective_capabilities`
Unresolved Critical Issues:
* [MESOS-7643] - The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
* [MESOS-7402] - Quota is over-allocated when used with hierarchical roles.
Additional API Changes:
* [MESOS-7755] The interpretation of the optional resource argument
passed in `Allocator::updateSlave` was changed from the total
amount of oversubscribed resources on the agent to the new total
resources (both revocable and non-revocable) on the agent. Custom
allocator implementation should be changed to interpretation of the
passed value as a total before updating.
Feature Graduations:
* [MESOS-2533] - Support HTTP checks in Mesos.
* [MESOS-3567] - Support TCP checks in Mesos.
All Resolved Issues:
** Bug
* [MESOS-1987] - Add support for SemVer build and prerelease labels to stout.
* [MESOS-4210] - Investigate increasing protobuf protocol message size limit.
* [MESOS-4331] - git commit-msg hook completely breaks fixup commits.
* [MESOS-4467] - Implement `sleep` in Windows
* [MESOS-4983] - Segfault in ProcessTest.Spawn with GCC 6
* [MESOS-4992] - sandbox uri does not work outisde mesos http server
* [MESOS-5187] - The filesystem/linux isolator does not set the permissions of the host_path.
* [MESOS-5903] - `GTEST_IS_THREADSAFE` guards prevent many tests from being run on Windows.
* [MESOS-5937] - `flags::parse` assumes the filesystem is rooted at '/'
* [MESOS-5938] - `net::links` is not implemented on Windows.
* [MESOS-6115] - Source tree contains compiled protobuf source
* [MESOS-6539] - Compile warning in GMock: "binding dereferenced null pointer to reference"
* [MESOS-6743] - Docker executor hangs forever if `docker stop` fails.
* [MESOS-6814] - Make sure compilation configuration is propagated correctly to third party dependencies
* [MESOS-6817] - Audit the use of UNICODE-related code paths
* [MESOS-6916] - Improve health checks validation.