Add draft for VM time synchronisation decisions #577

Open
wants to merge 7 commits into
base: main

Conversation

kgube
Contributor

@kgube kgube commented Apr 24, 2024

@kgube kgube marked this pull request as draft April 24, 2024 11:11
kgube added 5 commits June 14, 2024 16:42
Signed-off-by: Konrad Gube <konrad.gube@cloudandheat.com>
Signed-off-by: Konrad Gube <konrad.gube@cloudandheat.com>
Signed-off-by: Konrad Gube <konrad.gube@cloudandheat.com>
Signed-off-by: Konrad Gube <konrad.gube@cloudandheat.com>
@kgube kgube changed the title Add draft for VM clock synchronisation recommendations Add draft for VM time synchronisation decisions Aug 21, 2024
@kgube kgube marked this pull request as ready for review August 21, 2024 08:18
@scoopex
Contributor

scoopex commented Aug 21, 2024

Had a discussion with @kgube about the motivation, goals, and contents of this DR.

Input/framework conditions that may be useful:
(needs to be evaluated)

  • It's good to add a reference to software systems that customers might run and that rely on shared quorum algorithms, and to the relevance of accurate system time for them
    (ZooKeeper, RabbitMQ, etcd, Consul, Hazelcast, Ceph)
  • It's good to add a note that using public (internet) NTP servers from behind the same S-NAT IP might lead to rate limiting if dozens of systems in a project use the same NTP servers, because those servers see dozens of NTP sessions coming from a single IP
  • SCS environments themselves should be operated with at least 3 central, CSP-local NTP sources (for Ceph, RabbitMQ, ...)
  • Whether overcommitment, or a VM not being “scheduled”, affects the quality of time synchronization under the virtualization in use must not matter to the user
  • The CSP offers at least three local, non-rate-limited NTP servers that have at least 5 statically defined upstream stratum servers or high-quality local time sources
  • We can define a minimum quality that is based on the requirements of common systems and leaves some reserve to keep popular systems running without problems (offset, jitter, frequency drift, ...)
  • The CSP ensures that a time with a minimum quality can be maintained in VMs with a reference setup
    • a defined chrony setup/configuration that uses the min. 3 CSP NTP servers
    • this should be possible with all flavors (in some virtualization technologies the size of the virtual machine has an impact on its scheduling and, as a result, on its time synchronization)
    • the health check service starts several VMs (e.g. 3) of a single defined flavor, distributed across the CSP landscape, that run permanently, and checks their time quality to evaluate compliance
  • Subordinate, but exciting, would be an idea for how to provide the flavor images with a standardized setup by default that can be used independently of the CSP (e.g. by using a standardized setup mechanism, or standardized references to the servers)
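
The health check idea in the list above could be sketched roughly as follows. This is only an illustration, not something proposed in the DR: the threshold values, the `VmSamples` type, and both function names are hypothetical, and jitter is assumed here to mean the population standard deviation of the measured offsets.

```python
from dataclasses import dataclass
from statistics import pstdev

# Hypothetical limits; the discussion does not define actual values.
MAX_ABS_OFFSET_S = 0.010  # assumed: 10 ms maximum absolute clock offset
MAX_JITTER_S = 0.005      # assumed: 5 ms maximum jitter
MIN_REFERENCE_VMS = 3     # the "e.g. 3" reference VMs from the list above


@dataclass
class VmSamples:
    """Clock offsets (in seconds) collected from one reference VM."""
    name: str
    offsets_s: list[float]


def vm_is_compliant(vm: VmSamples) -> bool:
    """Check one VM against the assumed offset and jitter limits."""
    if len(vm.offsets_s) < 2:
        return False  # too few samples to say anything about jitter
    worst_offset = max(abs(o) for o in vm.offsets_s)
    jitter = pstdev(vm.offsets_s)  # assumption: jitter = std. deviation
    return worst_offset <= MAX_ABS_OFFSET_S and jitter <= MAX_JITTER_S


def landscape_is_compliant(vms: list[VmSamples]) -> bool:
    """Require enough reference VMs, all of them within the limits."""
    return len(vms) >= MIN_REFERENCE_VMS and all(vm_is_compliant(v) for v in vms)
```

How the offsets are collected (for example from chrony inside each reference VM) is left open here on purpose; the sketch only shows how a minimum quality could be turned into a pass/fail result.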

Signed-off-by: Konrad Gube <konrad.gube@cloudandheat.com>
@kgube
Contributor Author

kgube commented Oct 23, 2024

I discussed the potential upstream topic with the Neutron team and created an RFE issue for it.

The topic will also be discussed during the PTG; it is currently scheduled for the 2024-10-24 15:00 - 16:00 UTC timeslot.

@kgube
Contributor Author

kgube commented Feb 19, 2025

Unfortunately I could not attend the PTG, but the topic was discussed, and some questions about the scope of the feature were forwarded to the RFE ticket, which I have answered.
In particular, both OVN and dnsmasq allow global DHCP options, so provided that the link-local NTP server address is the same in all subnets (which would be a design goal), we can configure it as a global option, and no dynamic, port-specific DHCP configuration is necessary.

If we want to proceed with pursuing this feature, it would probably be best to track it in a separate issue.
The next step would be to take the RFE ticket to a Neutron Drivers meeting, get the team to confirm the scope of the feature, and ask for guidance on how to proceed with the implementation.
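
For readers unfamiliar with how an NTP server is advertised over DHCP, the following minimal sketch shows what such a global option carries on the wire. It assumes the standard DHCPv4 NTP servers option (code 42, RFC 2132); the address `169.254.169.123` and the function names are hypothetical examples, not values taken from the RFE.

```python
import ipaddress

DHCP_OPTION_NTP_SERVERS = 42  # DHCPv4 "NTP servers" option code (RFC 2132)


def encode_ntp_option(servers: list[str]) -> bytes:
    """Encode IPv4 NTP server addresses as a DHCP option (code, length, data)."""
    addrs = [ipaddress.IPv4Address(s) for s in servers]
    if not addrs:
        raise ValueError("at least one NTP server address is required")
    payload = b"".join(a.packed for a in addrs)  # 4 bytes per address
    if len(payload) > 255:
        raise ValueError("too many addresses for a single DHCP option")
    return bytes([DHCP_OPTION_NTP_SERVERS, len(payload)]) + payload


def is_link_local(server: str) -> bool:
    """True if the address lies in the IPv4 link-local range 169.254.0.0/16."""
    return ipaddress.IPv4Address(server).is_link_local


# Hypothetical link-local address; the real one would be a design decision.
example = "169.254.169.123"
option_bytes = encode_ntp_option([example])
```

Because the encoded option does not depend on the subnet or port, the same bytes could be handed out everywhere, which is why a global option would be sufficient as long as the address is identical in all subnets.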

Contributor

@artificial-intelligence artificial-intelligence left a comment

Not sure if jitter times and such should be mandated, but I currently don't have much time to review and research this topic in depth, so I don't want to hold this up.

3 participants