Add rd.kiwi.install.retain_last deployment option #2895
Conversation
@davidcassany This is a feature I wanted to implement for a long time. There are several customers who came to me with the question if they can build a disk image and install it on a disk while retaining the data of an existing data partition.

I'm particularly interested in your feedback regarding the feature itself and of course the code, as it's all done in the kiwi dump dracut code. Thanks much in advance
Integration test builds here: Retain profile
Procedure for testing:

```bash
# Deploy to an empty disk
qemu-img create mydisk 10g
qemu-kvm -cdrom ...iso -hda mydisk -boot d

# log in to the system and create some data on the data partition
mkfs.ext4 ...on_the_last_part
# mount it and add some data

# reboot and install the same image again
qemu-kvm -cdrom ...iso -hda mydisk -boot d

# the data on the data partition must still be there

# rebuild the image and change e.g. the size of the root partition,
# then try to install again to the same disk as used before:
# the installation must end with an error that the address for
# retaining the last partition does not match; the only way to
# install the new image is by dropping rd.kiwi.install.retain_last
```

I did more tests, feel free to apply testing as you see fit
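For the retain case itself the assumption is that `rd.kiwi.install.retain_last` is passed on the kernel command line of the install ISO; for a manual test, editing the boot entry would do (illustrative, the exact menu layout depends on the image):

```bash
# at the install ISO boot menu, edit the kernel command line
# (e.g. press 'e' in grub) and append the option:
#   rd.kiwi.install.retain_last
```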
@schaefi interesting feature. A disclaimer from my side: I am exposing my comments around the ideas of how I would have implemented it without being familiar enough with the details of the current KIWI dracut code; it could well be that my first implementation idea turns out to be very invasive, hence not that desirable. It could also be that I missed some relevant detail, without actually getting into code it is hard to foresee implementation details that could ruin my idea.

From a usability perspective I have some comments. I understand you aim to make partition matching as strict as possible, and it definitely has to be this way; I guess we all agree that manipulating a partition table of a disk including data we want to keep is a sensitive process. However, I am not sure I see the benefits of ensuring the start sector of the last partition of the target disk matches the start sector of the last partition of the image to deploy. IMHO this has several implications on the usability of this feature and, at the same time, I am not seeing that this makes the implementation much safer.

This forces making images ad-hoc for a specific deployment infrastructure; the build description requires changes, defining partitions and parameters to perfectly match the target system. For instance, I could imagine a case where someone needs to deploy into many boxes which are all equivalent from an OS perspective but not from a hardware perspective, aka the disk is bigger or smaller in some of them (e.g. because they got replaced and the new one is just bigger). In that case a new image needs to be crafted, which could be a PITA for large deployments if whoever crafted the image is not fully aware of these details. Moreover, if I understand it correctly, if the disk is big and the root partition of the deployed system is also big, it means there will be a deployment copying tons of zeros, isn't it? Not a big concern, but still not very good performance.

Why am I not sure adding this sector match makes the implementation much safer? Because at the end of the day what you rely on is the offset to dd and then recreate the image partition. Wouldn't it be equivalent to just ensure the built image fits entirely into the area before the last partition to prevent data corruption, and then recreate the last one? This way we relax the nuances of this feature at build time which, IMHO, are a bit counter-intuitive (define the spare partition last plus a specific system size).

Here is how I would have approached it. First, implement a new feature I have missed a few times: I'd like to make it possible to set the system size at run time as a kernel command line parameter. Why is that relevant? This brings an interesting feature for golden OEM images: imagine I have a huge disk and I want to deploy an image I have not created and have it self-expand, but only to a certain size, not to the full disk, which is way too big. I consider this a deployment choice, not a build time choice. Then after the deployment I could create an additional partition in the free area, manually or automated with the tool of my choice (ansible, cloud-init and friends). The important bit here is that as a user I am deploying a golden image or an image I have not created, so I do not consider building a new one (there might be support constraints across teams or organizations). Imagine something like the sketch below.

Second, implement the retain_last feature as an extension of the feature just suggested above.
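For the first point, imagine something like this (the parameter name is invented for illustration, no such option exists today):

```bash
# hypothetical kernel command line parameter: deploy the golden image
# but only expand the system to 64G instead of to the full disk
rd.kiwi.oem.system_size=64G
```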
With this approach any expandable image could be used to deploy retaining the last partition without having to define a static and fixed layout at build time. Steps 1, 2 and 3 are analysis, checks and setups that could all be done before running the current dump code (which is step 4). Finally, step 5 is an additional step we could just add after the dump. What do you think?
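A minimal sketch of the relaxed check suggested above (the image only has to fit before the last partition); the sfdisk/jq usage and variable names are assumptions, not the actual kiwi dracut code:

```bash
image=/path/to/image.raw
disk=/dev/sda

# size of the built image in 512-byte sectors, rounded up
image_sectors=$(( ($(stat -c %s "${image}") + 511) / 512 ))

# start sector of the last (data) partition on the target disk
data_start=$(sfdisk --json "${disk}" | jq '.partitiontable.partitions[-1].start')

# the image must fit entirely into the area before the data partition
if [ "${image_sectors}" -gt "${data_start}" ]; then
    echo "image would overlap the data partition, refusing to deploy" >&2
    exit 1
fi
```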
I believe there is a misunderstanding. I guess it's easier to explain what I have done by looking at the partition table. The integration test image I built has the partition table sketched below. As you can see, the last partition is just some space that doesn't belong to the OS. For the implementation here it doesn't matter what the last partition is, but for the feature to be useful it should be something not belonging to the OS itself. We are still in an imaging world; the OS should be a clean entity. This means that partitions 1-4 belong to the OS. This also means the address space from 2048 - 6338559 is reserved and any new deployment of an OS image must not write beyond sector 6338559. That's all my code is checking when you are in the rd.kiwi.install.retain_last use case, and I don't see how you can do this any differently. In fact, only the address space from 2048 - 6338559 gets dumped in the retain case.
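Roughly like this (only the sector boundaries 2048 and 6338559 come from the text above; device names, sizes and partition roles are assumed for illustration):

```
Device     Start      End   Role (assumed)
/dev/sda1   2048      ...   OS
/dev/sda2    ...      ...   OS
/dev/sda3    ...      ...   OS
/dev/sda4    ...  6338559   OS
/dev/sda5  6338560    ...   data, retained across deployments
```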
I agree and that is a use case I wanted to cover. The point is that the partition layout of what defines the main OS must not change, or if you change it, it must strictly meet the address space requirement. If the above example got deployed on a target system, you can change the image as you want as long as that one restriction is not violated.
No, why? I can't follow you. The image that makes up the OS is not affected by any of this. If we go with the above example and deploy the image to a large disk, let's say 5TB, the only partition that can grow is the last one; that is a restriction you cannot change. Let's say I have rebuilt the image because of some package updates. The new image must meet the requirement that it does not write beyond the reserved address space (sector 6338559 in the example above).
You can do this today. The kiwi resize code supports that.
Yes, exactly that, and customers are doing it today. So David, all of that already exists, and what you suggest is moving parts of the partition setup to be a runtime action when the image boots. That's all fine but has no relation to this feature here. Actually, this exact idea was implemented by GE Healthcare. They only build the golden image with kiwi and wrote their own "installer" code which applies extra partitioning and stuff during boot on the target.
Again, all that exists. But I think we are talking about two different use cases. My main goal is to provide an option to the kiwi dump code that allows retaining a data partition not belonging to the OS; whether kiwi can do this is something I was asked pretty often. Did I misunderstand something?
Agree, we are on the same page here. My concern is what happens if the system disk is big. Imagine we want a deployment with a system partition of 64 GB and the data partition is the rest of the disk (no matter how big, we agree on that too).
Probably this is the root cause of our misalignment. Let's get into the code; probably I am not understanding it correctly.
Ok, yes, I did assume a lot in that statement. I imagined layout deployment requirements set around the data partition, not the system itself. So something like the data partition defined to a fixed size and the rest used by the system; then in bare-metal deployments over disks with different sizes this means a different start sector, which in turn means a new image build. I assume we do not control the data partition because it is already there, and that probably in a large deployment over a bunch of boxes the last partition is not necessarily starting at the same sector on all of them.
I could be wrong, but this is how I understand it. I am imagining a deployment where the last partition is something like 4TB on a disk of 5TB (leaving the first 1TB for the OS). Then we should define an image with around 1TB for the OS. An image that is not expanding, hence dd'ing this full TB despite it being empty, isn't it?
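To put a number on that concern, a back-of-the-envelope calculation assuming 512-byte sectors:

```bash
# a fixed 1TB OS area means every dump writes the full OS address space
last_os_sector=2147483648   # 1 TiB worth of 512-byte sectors
echo "$(( last_os_sector * 512 / 1024**3 )) GiB written per deployment"
```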
Yes, you are of course correct with this. I might question why it is a good idea to set a fixed size of 1TB for only the OS, but you are fully correct: if someone does that, the dump of the OS will span 1TB in any deployment. The initial one, as well as any subsequent one, independent of whether we retain the last partition (the data partition of 4TB) or not. As such I wonder why you bring this up as an issue connected to retaining the last partition? If a customer is forced to specify a size for the OS partition because after the OS there is another one (e.g. a data partition), and that customer decided that this OS partition must be 1TB in size, then the customer is producing a heavy deployment task by this image design for the simple blob-dump based "installation" procedure.

If you are creating an image without specifying a data partition, such that the OS partition is last, and you use a dynamic resize value,
then you have a relatively complicated parttable fixup task to perform. You actually need the parttable prior to deployment; ok, that's easy. Next you need to inspect where the rootfs partition ended and correct that in the table. Next you need to add the missing parttable entries (actually add the ones not existing before). This is all doable, but real Frankenstein surgery, and you cannot boot into the system until this fixup was performed. As such, no mistakes allowed :) Except for the mentioned fixup code, you can produce such an image, but I really would not recommend it.

For a nice data retainment I had the following layout in mind (sketched below):
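A layout along these lines; the sizes are illustrative:

```
part 1:  boot           (fixed size)
part 2:  root / OS      (fixed size, e.g. 64G)
part 3:  data (spare)   (rest of the disk, retained across re-deployments)
```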
That's it, and you can update the OS image at any time and none of your data segments will ever be touched by it. I would also say it's relatively safe. The only restriction that comes with this design is that the rootfs is a fixed value.
I can update the integration test build to implement that design, if it helps to showcase what I mean?
I changed the integration test to add a spare_part data partition. As stated by David, the size of the OS is a fixed value. That's the restriction that comes with the feature.

I still believe this is a good feature. @davidcassany feedback very much welcome
Instructs an OEM installation to retain the contents of the last partition on the target disk. This setting is only useful if the last partition does not belong to the main OS, e.g. an extra data partition added via the spare_part attribute in the type setup of the image description. The implementation also checks if the start address of the last partition on the target disk matches the start address of the last partition of the image to be deployed. Only if they match can the data on the last partition be retained.
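A minimal sketch of that start-address check; the sfdisk/jq usage and variable names are assumptions, not the actual dracut code:

```bash
image=/path/to/image.raw
disk=/dev/sda

# compare the start sector of the last partition on image and target disk
source_start=$(sfdisk --json "${image}" | jq '.partitiontable.partitions[-1].start')
target_start=$(sfdisk --json "${disk}"  | jq '.partitiontable.partitions[-1].start')

if [ "${source_start}" != "${target_start}" ]; then
    echo "Cannot retain partition, start address mismatch" >&2
    exit 1
fi
```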
The current kiwi-resize code was restricted to the root partition for historical reasons. Since in a partition table only the last partition can be resized, this should be the only limitation for the resize code to perform its job. In connection with the rd.kiwi.install.retain_last feature it is also very likely that the last partition is not the root partition, but it should still be properly restored by the resize code after deployment.
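Conceptually, a last-partition resize looks something like this (a sketch with assumed tooling, not the kiwi-resize code itself):

```bash
disk=/dev/sda

# number of the last partition in the table, whatever partition it is
last=$(sfdisk --json "${disk}" | jq '.partitiontable.partitions | length')

# grow the last partition to the end of the disk
parted -s "${disk}" resizepart "${last}" 100%

# grow the filesystem on it (ext4 assumed here)
resize2fs "${disk}${last}"
```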
As stated by David, the size of the OS is a fixed value. That's the restriction that comes with the feature.
I still believe this is a good feature
Yes, you are right, this is a way to keep logic constrained and simple, which means safer and less error-prone code in very sensitive areas. Agree.
I think now I better understand the use case you had in mind: there is a KIWI based deployment and at a later point in time the same deployment is renewed from an image rebuild for some reason, however the last spare partition of data is kept (of course the rebuild needs to match the same layout). This is a nice feature and the implementation makes sense, I buy it. Especially if the last partition can be expanded too.
The use case I had in mind, which now I presume you are not considering supporting, is: there is a server somewhere including data in a partition we need to keep, but at the same time we want to deploy a KIWI based image there, keeping the already existing partition data. In that context, having to match the exact sector makes the image build convoluted and forces the OS to be as big as the portion of the disk existing before the data partition. In that context the layout is a given and it could be based on any random criteria which is not really meaningful from the OS PoV. My idea was in the spirit of: what if we assume there could be empty space between the OS and the data partition? So instead of aiming for a perfect match of the last sector of the OS vs the first sector of the spare data, we just assume the last OS sector is lower than the first data partition sector. This idea is probably a bit tricky to implement with the current last partition resize logic.
I am fine with the feature and the implementation, I think it is a good improvement. I am curious about your thoughts on my idea of not requiring a perfect sector match while keeping the OS size fixed and defined as part of the image. I am suggesting it because I have the feeling the implementation would be almost the same as it is now and it could also cover the use case I imagined from the very beginning. That said, I am not strongly suggesting to go for it now; it might also be a wise choice to wait for the demand and need before implementing it. In any case, considering whether this would require relevant changes or not is an interesting exercise to do now, as it could help to improve the current implementation.
@davidcassany Thanks much for the great review, much appreciated
An interesting thought as well. So let's assume we add something like the run-time system size parameter in a subsequent PR. With this setup in place the current address match

```bash
if [ ! "${source_start}" = "${target_start}" ];then
    report_and_quit "Cannot retain partition, start address mismatch"
fi
```

would then change to

```bash
if [ "${source_start}" -gt "${target_start}" ];then
    report_and_quit "Cannot retain system data, OS would overlap"
fi
```

The challenges I see are:
So I believe the biggest challenges are: where is the data, and does all data follow the OS? A situation on a server in which partitions are mixed and spread across the disk, or even across multiple disks, would make a deployment of an OS image to such a disk pretty complicated. So I think the image-to-some-server deployment also needs some constraints set that a user has to know before making use of the feature.

I'm happy to take the rest of the conversation offline; it would for sure be a very powerful feature. Thanks
No, not really. In my head I was also assuming ONLY the last partition can be retained, and I'd compute the
Yes, that was my initial idea to somehow solve the above challenge: just assume the OS size will be defined by the first sector of the last partition. So it internally defines the equivalent of
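Deriving that size at deploy time could look something like this (a sketch; the sfdisk/jq usage and variable names are assumptions):

```bash
disk=/dev/sda

# first sector of the existing (to be retained) last partition
data_start=$(sfdisk --json "${disk}" | jq '.partitiontable.partitions[-1].start')

# everything below that sector is what the OS may occupy
os_size_bytes=$(( data_start * 512 ))
echo "OS may occupy up to ${os_size_bytes} bytes"
```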
ok
Yes, the resize must be based on the address situation of a pre-dump stored partition table. It shouldn't be much of a problem: you take the address from that table instead of the table that you see after the OS image got dumped. Exactly this can then create the gap I talked about if you do not apply a strict address match. But that's ok, that gap can also be closed afterwards. The real thing to absolutely make sure of is that the OS does not overlap with the data partition.
Yep, I got that and it makes sense. It's also the reason why we have to store the real partition table on the target before the dump. Because if the parttable of the image is not complete (with regards to the target disk) we have to store it beforehand, such that we can sneak in the missing bits later.
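A simplified sketch of that store-and-merge idea; the real code would need to merge the tables far more carefully:

```bash
disk=/dev/sda
image=/path/to/image.raw

# 1) save the target partition table before anything is written
sfdisk --dump "${disk}" > /tmp/table.before

# 2) dump the OS image (this overwrites the partition table on disk)
dd if="${image}" of="${disk}" bs=1M conv=fsync

# 3) sneak the retained last partition back into the new table:
#    append the saved last entry to the table the image brought along
sfdisk --dump "${disk}" > /tmp/table.after
tail -n 1 /tmp/table.before >> /tmp/table.after
sfdisk --no-reread "${disk}" < /tmp/table.after
```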
Instructs an OEM installation to retain the contents of the last partition on the target disk. This setting is imho only useful if the last partition does not belong to the main OS, e.g. an extra data partition added via the spare_part attribute in the type setup of the image description. The implementation also checks if the start address of the last partition on the target disk matches the start address of the last partition of the image to be deployed. Only if they match can the data on the last partition be retained.