Description
About a week ago @ThomasWaldmann and I were both dumbfounded by issue #9245 and wasted several hours on it. Since then I have thought long and hard about how the handling of symlinks could be improved significantly, and here is what I have come up with.
In the following, "mount" or "mount command" refers to borg mount of an archive (or set of archives), not to the repository structure itself.
The Problem
Suppose you are backing up your home folder on a Linux system.
What causes the problem
Said home folder contains the path /home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time along with a host of other files, so you create a symlink /home/martin/annoyingpath that points to this path. Often enough (especially when the symlink is created automatically by some program) the symlink stores the absolute path, and borg backs it up exactly like that.
What does it do
Well, after you have created your borg repo and backed up all your data for months, maybe one day a file in this very annoying path gets corrupted, overwritten, whatever, and you decide to restore it. So you mount the borg repository, perhaps at /mnt/borg/repo, go to yesterday's archive and look at /mnt/borg/repo/yesterday/home/martin/annoyingpath.
And what do you see?
The actual live files in your home folder, not the files in the backup archive.
But why?
Well, because borg has backed up the absolute-path symlink exactly, so in the mount it still points to the absolute path on the live system. In the CLI or a file explorer that may not be immediately clear, and the symlink may not always be easy to find: in #9245, for example, it was an absolute symlink hidden in a path more than 10 folders deep.
Additionally, because accessing this folder may have become muscle memory, one might not immediately think of this as an issue. That is what happened to me, anyway.
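The effect can be reproduced without borg at all. The following is a minimal sketch, assuming a Linux system where unprivileged symlinks are allowed; the directory names `live` and `mount` and the function name are hypothetical stand-ins for the real home folder and the mounted archive.

```python
import os
import tempfile


def demo_absolute_symlink_escape():
    """Return the content read through an absolute symlink inside a fake 'mount'."""
    base = os.path.realpath(tempfile.mkdtemp())
    live = os.path.join(base, "live")    # stands in for the live home folder
    mount = os.path.join(base, "mount")  # stands in for the mounted archive

    # Same folder layout in both trees, but with different file content.
    for tree, content in ((live, "current"), (mount, "backup")):
        os.makedirs(os.path.join(tree, "long", "path"))
        with open(os.path.join(tree, "long", "path", "file.txt"), "w") as f:
            f.write(content)

    # The symlink target is an absolute path into the live tree; the copy in
    # the fake mount stores exactly the same target, as a faithful backup would.
    target = os.path.join(live, "long", "path")
    os.symlink(target, os.path.join(live, "shortcut"))
    os.symlink(target, os.path.join(mount, "shortcut"))

    # Following the symlink inside the mount lands in the live tree.
    with open(os.path.join(mount, "shortcut", "file.txt")) as f:
        return f.read()
```

Reading `shortcut/file.txt` through the fake mount yields the live file's content, not the backed-up one, which is the confusion described above.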
How to fix it?
Well, there are some easy and some harder ways to fix this.
The easiest, least invasive and almost trivial option
When borg mount is called, borg could and should warn the user that symlinks can and will point outside of the mounted archive. A warning could look like this:
$ borg mount <repo> <mountpoint>
WARNING: this repository may contain symlinks. Symlinks can and will point outside of the backed-up folder structure. When following symlinks you may see files which are not contained in the repository, or which are different versions of the files in the repository.
Or perhaps alternatively:
$ borg mount <repo> <mountpoint>
WARNING: the mounted archive may contain symlinks that point outside the archive tree.
When following symlinks you may see files that are not part of the archive, or different
versions of files than those stored in the archive.
This or a similar warning does not prevent the other options from being used or implemented.
This alone would have prevented the confusion in #9245, costs almost nothing to implement, and does not alter existing semantics in any way, shape or form.
A much more expansive, possibly controversial and more versatile option
This option is nontrivial to implement. It would present a strong warning to the user when a symlink points outside of the backup it is in, and would fail hard rather than let the user look into a mounted archive that appears correct but presents the wrong files under the right (or wrong) circumstances.
It would give options to circumvent the hard failure, either by acknowledging the possibility of a symlink-caused problem, or by altering the symlinks presented in the mount, with ample and strong warnings as well as user confirmation, to enable an exploratory view of the archive/repository.
The default mount behavior remains a faithful representation of the archive; any mode that alters or disables symlinks is explicitly intended for inspection and analysis only, not for restore. Additionally, any such mode will usually only be activated after a hard failure, when the user is cautious anyway.
When backing up
When backing up a symlink, borg could save both the absolute path of the symlink and the absolute path of the file or folder it points to, or (not an exclusive or) if necessary just the relevant relative paths, assuming the backup is not of a root-level folder like /home. It could also save, as metadata in the archive, whether the symlink points outside of the folder structure borg backs up.
What does this allow?
For one, borg would know whether a symlink points outside of the backed-up folder structure.
Secondly, it allows computing a relative path to the file or folder a symlink points to.
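The "points outside" check could be sketched as follows. This is an illustration under stated assumptions, not borg code: the function names are hypothetical, all paths are assumed to be absolute POSIX paths, and the check is purely lexical (it does not follow intermediate symlinks, so a ".." that crosses a symlinked directory would be misjudged).

```python
import os.path


def resolve_link_target(link_path, target):
    """Return the normalized absolute path a symlink refers to (lexically)."""
    if not os.path.isabs(target):
        # Relative targets are interpreted relative to the symlink's folder.
        target = os.path.join(os.path.dirname(link_path), target)
    return os.path.normpath(target)


def points_outside(backup_root, link_path, target):
    """Return True if the symlink's target lies outside backup_root."""
    root = os.path.normpath(backup_root)
    resolved = resolve_link_target(link_path, target)
    # The target is inside the backup exactly when the root is a path prefix.
    return os.path.commonpath([root, resolved]) != root
```

A flag computed like this at backup time is the kind of metadata the proposal asks for; storing it would let a later mount decide without rescanning targets.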
What could be done with this
For one, borg would know whether an archive or a repository contains a symlink that points outside of the archive folder structure. That would allow displaying a warning to the user, as mentioned above, as well as requiring confirmation to mount the repository with these problematic symlinks present.
Additionally, this would enable handling symlinks in different ways when mounted, depending on the options given.
For example:
- One could introduce an option to disable symlinks pointing outside of the backed-up folder structure, enabled by default and warned about when mounting. Disabled symlinks are shown as text files containing information about the symlink, such as the path it points to (in both relative and absolute form), as if it had been inspected with ls -l, with an additional line above explaining what the file is and one below giving the relative or absolute path, depending on what the symlink was (if it was absolute, show the absolute path, for example).
- One could introduce a second option controlling how symlinks left unaltered by the previous option are displayed. This option would specify 4 behaviours: untouched, absolute, relative, disable. Untouched is self-explanatory. Absolute would convert all absolute symlinks to ones pointing within the archive they are in (/home would become, for example, /mnt/<mountpoint>/<archive>/home) and disable outward-facing symlinks. Relative would convert all symlinks to relative paths inside the archive and disable outward-facing symlinks, similar to absolute, except that the symlinks would show as pointing to relative paths. Disable would obviously disable all symlinks, as described in the "disable symlinks" option above.
- A third, optional option could be introduced, "--print-offending-symlinks" or something similar, which does not alter mount behaviour but prints a list of the offending symlinks as they are in the repo/archive being mounted, displayed as ls -l would, one line per symlink.
as well as some override options (maybe something like this):
--dont-ask-symlink-ignore-confirmation
--dont-ask-symlink-alteration-confirmation
Obviously, care needs to be taken to ensure these options are understood before they are used, as a backup containing symlinks should not be restored from a mount when they are in use. Both options should thus default to "do not disable" and "untouched", and a thorough warning should be printed and user confirmation requested when mounting with one or both of these options in use. Alternatively, the mount command could fail hard when an offending symlink is present, unless both options are set explicitly. Additionally, with an offending symlink present, a strong warning and a request for user confirmation should be presented when mounting, especially when mounting with options that alter symlink behaviour in the mountpoint.
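To make the four display behaviours more concrete, here is a rough sketch of how a stored symlink target might be translated for presentation in the mount. Nothing here is an existing borg API: `present_symlink`, its parameters and the mode strings are hypothetical, paths are assumed to be absolute POSIX paths, and "absolute" is read literally as rewriting only symlinks whose stored target is absolute.

```python
import os.path


def present_symlink(mode, mountpoint, archive, link_path, target, outside):
    """Return the target to show in the mount, or None if the link is disabled.

    link_path: absolute path of the symlink as backed up
    target:    link target exactly as stored in the archive
    outside:   True if the target lies outside the backed-up folder structure
    """
    if mode == "untouched":
        return target
    if mode not in ("absolute", "relative", "disable"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "disable" or outside:
        return None  # outward-facing links are disabled in every altering mode

    if os.path.isabs(target):
        original = os.path.normpath(target)
    else:
        if mode == "absolute":
            return target  # relative links already stay inside the mount
        original = os.path.normpath(
            os.path.join(os.path.dirname(link_path), target))

    if mode == "absolute":
        # /home/... becomes <mountpoint>/<archive>/home/...
        return os.path.join(mountpoint, archive, original.lstrip("/"))
    # mode == "relative": express the target relative to the link's folder
    return os.path.relpath(original, start=os.path.dirname(link_path))
```

A disabled link (return value `None`) would then be rendered as the informational text file described in the first bullet; that rendering is left out of this sketch.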
The following pseudocode shows how the mount command could be handled:
if (no offending symlinks present)
    mount normally
else
    if (one or both options are not set)
        mount command fails
        print list of offending symlinks if option is given
        print strong warning/error about offending symlinks being present and why the mount command has failed
        print strong suggestion to use borg extract for extracting from repo/archive
        print information about the two/three options and why they must be used in this case, as well as the manpage part about them
        print "resistance is futile" joke or "you must comply" joke
    else if (both options are set to "do not disable" and "untouched")
        print list of offending symlinks if option is given
        print strong warning about symlinks pointing outside of the repo/archive folder structure
        print strong warning about being careful not to restore files that aren't actually part of the backup
        print strong suggestion to use borg extract for extracting from repo/archive
        request user confirmation by typing something like "my-symlinks-may-point-outside-of-my-mountpoint" (overridden by --dont-ask-symlink-ignore-confirmation)
        mount normally
    else if (both options are set, but not to "do not disable" and "untouched")
        print list of offending symlinks if option is given
        print strong warning about symlinks having been altered for displaying in mountpoint
        print strong warning about being careful not to restore symlinks from this mount
        print strong suggestion to use borg extract for extracting from repo/archive
        request user confirmation by typing something like "my-symlinks-are-altered-and-i-definitely-wont-restore-any-symlinks-from-this-mount" (overridden by --dont-ask-symlink-alteration-confirmation)
        mount with symlinks altered according to options
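The branching in this pseudocode can be condensed into a small, testable decision function. This is only a sketch of the control flow: the `Action` enum, `decide_mount` and its parameters are hypothetical names, `None` stands for "option not given on the command line", and the printing, confirmation prompts and the actual mounting are left out.

```python
from enum import Enum


class Action(Enum):
    MOUNT_NORMALLY = "mount normally"
    FAIL = "fail with explanation"
    CONFIRM_THEN_MOUNT_UNALTERED = "confirm, then mount normally"
    CONFIRM_THEN_MOUNT_ALTERED = "confirm, then mount with altered symlinks"


def decide_mount(offending_present, disable_outside=None, display_mode=None):
    """Decide what the mount command should do.

    disable_outside: True/False, or None if the option was not given
    display_mode:    "untouched", "absolute", "relative" or "disable",
                     or None if the option was not given
    """
    if not offending_present:
        return Action.MOUNT_NORMALLY
    if disable_outside is None or display_mode is None:
        # One or both options missing: fail hard and explain why.
        return Action.FAIL
    if disable_outside is False and display_mode == "untouched":
        # User explicitly asked for the faithful view despite offending links.
        return Action.CONFIRM_THEN_MOUNT_UNALTERED
    # Any other explicit combination alters what the mount shows.
    return Action.CONFIRM_THEN_MOUNT_ALTERED
```

Keeping the decision separate from the side effects would make the four branches easy to cover with unit tests.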
Algorithm part
This part is basically redundant and can be ignored for now; I am leaving it in case I need to add more algorithms later.
A folder in a path is what is contained between two "/" characters, plus the following "/".
Algorithm for calculating the relative path of a symlink from 2 absolute paths (or 2 relative ones, assuming they originate from the same place)
This seems to be basically what os.path.relpath() does, so the algorithm is probably redundant, but I will leave it here just in case.
This algorithm assumes both paths are within the backup.
source is path of symlink
target is where symlink points
remove a leading ".", "./" or "/" from both paths
while the first (topmost) folders of source and target match
    remove that folder from both paths
replace every folder still preceding the name of the symlink in source with ../
replace the name of the symlink in source with what is left of target
return source
Here are the target and source variables after each step of the algorithm (target on the first line of each pair, source on the second):
initial values:
/home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
/home/martin/folder/annoyingpath
after removing the leading "/":
home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
home/martin/folder/annoyingpath
after removing the matching folder "home/":
martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
martin/folder/annoyingpath
after removing the matching folder "martin/":
annoyingly/long/path/which/you/dont/wanna/type/out/every/time
folder/annoyingpath
after replacing the remaining folders in source with ../:
annoyingly/long/path/which/you/dont/wanna/type/out/every/time
../annoyingpath
after replacing the name of the symlink in source with what is left of target:
annoyingly/long/path/which/you/dont/wanna/type/out/every/time
../annoyingly/long/path/which/you/dont/wanna/type/out/every/time
The return value is "../annoyingly/long/path/which/you/dont/wanna/type/out/every/time".
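For comparison, the same result can be obtained with os.path.relpath() from the Python standard library. A minimal sketch, assuming absolute POSIX paths; the helper name `relative_link_target` is hypothetical:

```python
import os.path


def relative_link_target(link_path, target):
    """Express target relative to the folder that contains the symlink."""
    return os.path.relpath(target, start=os.path.dirname(link_path))


source = "/home/martin/folder/annoyingpath"
target = "/home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time"
print(relative_link_target(source, target))
# ../annoyingly/long/path/which/you/dont/wanna/type/out/every/time
```

The start directory is the folder containing the symlink, not the symlink itself, which corresponds to the step that replaces the folders preceding the symlink's name with ../.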