[Usability Improvement] Improved handling of symlinks #9254

@MartinKurtz

About a week ago, @ThomasWaldmann and I were both dumbfounded and lost several hours on issue #9245. Since then I have thought long and hard about how the handling of symlinks could be improved significantly, and here is what I have come up with.

In the following, "mount" or "mount command" refers to borg mount of an archive (or set of archives), not to the repository structure itself.

The Problem

Suppose you are backing up your home folder on a Linux system.

What causes the problem

Said home folder contains the path /home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time along with a host of other files, so you create a symlink /home/martin/annoyingpath that points to this path. Often enough (especially when the symlink is created automatically by some program), this symlink will point to the absolute path, and borg will back it up exactly like that.

What does it do

Well, after you have created your borg repo and backed up all your data for months, one day a file in this very annoying path gets corrupted, overwritten, whatever, and you decide to restore it. So you mount the borg repository, perhaps at /mnt/borg/repo, go to yesterday's archive and look at /mnt/borg/repo/yesterday/home/martin/annoyingpath.

And what do you see?

The actual, live files in your home folder, not the files in the backup archive.

But why?

Well, because borg backed up the absolute symlink exactly as it was, and of course it still points to the absolute path. In the CLI or in a file manager this may not be immediately obvious, and the symlink is not always easy to find: in https://github.com//issues/9245, for example, it was an absolute symlink hidden more than 10 folders deep in a path.

Additionally, because accessing this folder may have become muscle memory, one might not immediately suspect it as the cause. That is what happened to me, anyway.

How to fix it?

Well, there are some easy and some harder ways to fix this.

The easiest, least invasive and almost trivial option

When borg mount is called, borg could and should warn the user that symlinks can and will point outside of the mounted archive. A warning could look like this:

$ borg mount <repo> <mountpoint>

WARNING: this repository may contain symlinks. Symlinks can and will point outside of the backed-up folder structure. When following symlinks you may see files which are not contained in the repository, or different versions of the files in the repository.

Or, alternatively:

$ borg mount <repo> <mountpoint>

WARNING: the mounted archive may contain symlinks that point outside the archive tree.
When following symlinks you may see files that are not part of the archive, or different
versions of files than those stored in the archive.

This or a similar warning does not preclude implementing any of the other options.

This alone would have prevented the confusion in #9245, costs almost nothing to implement, and does not alter existing semantics in any way, shape or form.
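Emitting such a warning really does take very little code. Below is a minimal Python sketch. It assumes the mount code can iterate over hypothetical (path, st_mode) pairs for the archive's items. The function name and this item representation are illustrative, not borg's actual internals:

```python
import stat
import sys
from typing import Iterable, TextIO, Tuple

SYMLINK_WARNING = (
    "WARNING: the mounted archive may contain symlinks that point outside the archive tree.\n"
    "When following symlinks you may see files that are not part of the archive, or different\n"
    "versions of files than those stored in the archive."
)


def warn_if_archive_has_symlinks(
    items: Iterable[Tuple[str, int]], out: TextIO = sys.stderr
) -> bool:
    """Print the warning once if any item is a symlink; return True if printed.

    `items` is a hypothetical stand-in for the archive's item metadata:
    (path, st_mode) pairs.
    """
    for _path, mode in items:
        if stat.S_ISLNK(mode):  # symlink file type bits
            print(SYMLINK_WARNING, file=out)
            return True  # one warning is enough, stop scanning
    return False
```

Stopping at the first symlink keeps the scan cheap. Printing the warning unconditionally, as the "may contain" wording allows, would avoid the scan altogether.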

A much more extensive, possibly controversial, but more versatile option

This option is nontrivial to implement. It would present a strong warning to the user if a symlink points outside of the backup it is in. It would also fail hard rather than let the user browse a mounted archive that appears correct but presents the wrong files under the right (or wrong) circumstances.

It would offer options to get past the hard failure in one of two ways. The user could acknowledge the possibility of a symlink-caused problem. Alternatively, the symlinks presented in the mount could be altered (with ample, strong warnings and user confirmation) to enable an exploratory view of the archive/repository.

The default mount behavior remains a faithful representation of the archive. Any mode that alters or disables symlinks is explicitly intended for inspection and analysis only, not for restore. Additionally, any such mode will usually only be activated after a hard failure, when the user is cautious anyway.

When backing up

When backing up a symlink, borg could store the following as metadata in the archive:

  • the absolute path of the symlink and the absolute path of the file or folder it points to, and/or, if necessary, just the relevant relative paths (assuming the backup is not of a root-level folder like /home)
  • whether the symlink points outside of the folder structure borg backs up
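The outside flag could be computed at backup time from three inputs: the symlink's own absolute path, its stored target, and the backed-up roots. Below is a minimal Python sketch with hypothetical function names. It works purely lexically: posixpath.normpath collapses ".." without consulting the filesystem, so targets that pass through other symlinks are not followed.

```python
import posixpath
from typing import Iterable


def resolve_link_target(link_path: str, target: str) -> str:
    """Absolute, normalized path that a symlink's target text refers to.

    `link_path` must be absolute. A relative target is interpreted relative
    to the folder containing the symlink. Nothing on disk is read.
    """
    if not posixpath.isabs(target):
        target = posixpath.join(posixpath.dirname(link_path), target)
    return posixpath.normpath(target)


def points_outside(link_path: str, target: str, backup_roots: Iterable[str]) -> bool:
    """True if the symlink's target lies outside every backed-up root."""
    resolved = resolve_link_target(link_path, target)
    for root in backup_roots:
        root = posixpath.normpath(root)
        # commonpath compares whole path components, so /home/martinx
        # is not mistaken for a path inside /home/martin
        if posixpath.commonpath([root, resolved]) == root:
            return False
    return True
```

Consider an absolute symlink like /home/martin/annoyingpath when /home/martin is backed up. This sketch counts it as inside, yet shown unchanged in a mount it still leads to the live filesystem. That is the case the absolute and relative display modes address.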

What does this allow?

First, borg would know whether a symlink points outside of the backed-up folder structure.
Second, it allows computing a relative path to the file or folder a symlink points to.

What could be done with this?

For one, borg would know whether an archive (or a repository) contains a symlink that points outside of the archive folder structure. This would allow displaying a warning to the user, as mentioned above. It would also allow requiring confirmation to mount the repository while these problematic symlinks are present.

Additionally, this would enable handling symlinks in the mount in different ways, depending on the options given.

For example:

  • One could introduce an option to disable symlinks pointing outside of the backed-up folder structure, with a warning shown when mounting. These symlinks would be shown as plain text files describing the symlink instead. Such a file would contain a first line explaining what the file is, then the symlink as ls -l would display it. Below that comes the target path, both as stored (absolute or relative) and converted to the other form.
  • One could introduce a second option controlling how symlinks not already altered by the previous option are displayed. This option would specify 4 behaviours:
    • Untouched is self-explanatory.
    • Absolute would convert all absolute symlinks to ones pointing within the archive they are in (for example, /home would become /mnt/<mountpoint>/<archive>/home) and disable outward-facing symlinks.
    • Relative would convert all symlinks to relative paths inside the archive and, like absolute, disable outward-facing symlinks.
    • Disable would disable all symlinks, as described in the first option above.
  • A third, optional option could be introduced, "--print-offending-symlinks" or similar. It would not alter mount behaviour, but would print a list of the offending symlinks in the repo/archive being mounted, displayed as ls -l would, one line per symlink.

There could also be some override options (maybe something like this):
--dont-ask-symlink-ignore-confirmation
--dont-ask-symlink-alteration-confirmation

Obviously, care needs to be taken that these options are understood before they are used. A backup containing symlinks should not be restored from a mount in which they are in use. Both display options should thus default to "do not disable" and "untouched". A thorough warning should be printed, and user confirmation requested, when mounting with one or both of them changed.

Alternatively, the mount command could fail hard when an offending symlink is present, unless both options are set explicitly. Additionally, when an offending symlink is present, a strong warning and a confirmation prompt should be shown when mounting. This is especially important when mounting with options that alter symlink behaviour in the mountpoint.
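The two display options could come down to one decision per symlink. Below is a minimal Python sketch. It assumes the symlink's absolute path as backed up, its stored target, and the outside flag from backup time are available. The mode names come from the proposal. The present_symlink function, the mount_root parameter (standing for <mountpoint>/<archive>) and the layout of the replacement text file are hypothetical. In absolute mode the sketch rewrites every inside-pointing symlink, relative ones included, which is one possible reading of the proposal:

```python
import posixpath
import stat
from typing import Tuple

MODES = ("untouched", "absolute", "relative", "disable")


def _resolve(link_path: str, target: str) -> str:
    # absolute, normalized target path (lexical only, nothing on disk is read)
    if not posixpath.isabs(target):
        target = posixpath.join(posixpath.dirname(link_path), target)
    return posixpath.normpath(target)


def _disabled_text(link_path: str, target: str) -> str:
    # one possible layout for the text file that replaces a disabled symlink
    resolved = _resolve(link_path, target)
    relative = posixpath.relpath(resolved, posixpath.dirname(link_path))
    return (
        "This file stands in for a symlink that is disabled in this mount.\n"
        f"{stat.filemode(stat.S_IFLNK | 0o777)} {link_path} -> {target}\n"
        f"absolute target: {resolved}\n"
        f"relative target: {relative}\n"
    )


def present_symlink(link_path: str, target: str, outside: bool, mode: str,
                    mount_root: str, disable_outside: bool = False) -> Tuple[str, str]:
    """Return ("symlink", target_to_show) or ("file", text_to_show)."""
    if mode not in MODES:
        raise ValueError(f"unknown symlink display mode: {mode!r}")
    # "disable" hides every symlink; outward-facing ones are hidden by the
    # first option or by the absolute/relative modes
    if mode == "disable" or (outside and (disable_outside or mode != "untouched")):
        return ("file", _disabled_text(link_path, target))
    if mode == "untouched":
        return ("symlink", target)
    resolved = _resolve(link_path, target)
    if mode == "absolute":
        # re-root the target below <mountpoint>/<archive>
        return ("symlink", posixpath.join(mount_root, resolved.lstrip("/")))
    # relative: path from the symlink's folder to its target, valid inside the mount
    return ("symlink", posixpath.relpath(resolved, posixpath.dirname(link_path)))
```

For instance, present_symlink("/home/martin/folder/annoyingpath", "/home/martin/annoyingly/long/path", False, "relative", "/mnt/borg/repo/yesterday") returns ("symlink", "../annoyingly/long/path").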

The mount command could be handled as in the following pseudocode:

if (no offending symlinks present)
    mount normally
else  (offending symlinks present)
    if (one or both options are not set)
        print list of offending symlinks if --print-offending-symlinks is given
        print strong warning/error about offending symlinks being present and why the mount command has failed
        print strong suggestion to use borg extract for extracting from the repo/archive
        print information about the two/three options and why they must be used in this case, as well as the man page section about them
        print "resistance is futile" or "you must comply" joke
        fail the mount command

    else if (both options are set to "do not disable" and "untouched")
        print list of offending symlinks if --print-offending-symlinks is given
        print strong warning about symlinks pointing outside of the repo/archive folder structure
        print strong warning to be careful not to restore files that aren't actually part of the backup
        print strong suggestion to use borg extract for extracting from the repo/archive
        request user confirmation by typing something like "my-symlinks-may-point-outside-of-my-mountpoint" (overridden by --dont-ask-symlink-ignore-confirmation)
        mount normally

    else  (both options are set, but not to "do not disable" and "untouched")
        print list of offending symlinks if --print-offending-symlinks is given
        print strong warning that symlinks have been altered for display in the mountpoint
        print strong warning to be careful not to restore symlinks from this mount
        print strong suggestion to use borg extract for extracting from the repo/archive
        request user confirmation by typing something like "my-symlinks-are-altered-and-i-definitely-wont-restore-any-symlinks-from-this-mount" (overridden by --dont-ask-symlink-alteration-confirmation)
        mount with symlinks altered according to the options
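The pseudocode above translates fairly directly into Python. This is a hedged sketch, not borg code. The function and parameter names are hypothetical, and the option values reuse the proposal's wording. "Offending" is assumed to mean a symlink that would lead outside the mountpoint if shown unchanged: either an absolute target, or one flagged as outside at backup time. The sketch returns a decision instead of actually mounting. input() stands in for whatever prompt mechanism borg would use. A wrong confirmation phrase is treated as a failure, and the man page text and joke are left out:

```python
import posixpath
import stat
import sys
from typing import Callable, List, Optional, Tuple

IGNORE_PHRASE = "my-symlinks-may-point-outside-of-my-mountpoint"
ALTER_PHRASE = "my-symlinks-are-altered-and-i-definitely-wont-restore-any-symlinks-from-this-mount"

# (path, stored target, outside flag recorded at backup time)
Symlink = Tuple[str, str, bool]


def is_offending(link: Symlink) -> bool:
    _path, target, outside = link
    return outside or posixpath.isabs(target)


def decide_mount(symlinks: List[Symlink],
                 disable_outside: Optional[bool] = None,  # None: option not given
                 display_mode: Optional[str] = None,      # None: option not given
                 print_offending: bool = False,
                 skip_ignore_confirmation: bool = False,     # --dont-ask-symlink-ignore-confirmation
                 skip_alteration_confirmation: bool = False,  # --dont-ask-symlink-alteration-confirmation
                 ask: Callable[[str], str] = input) -> str:
    """Return "mount", "mount-altered" or "fail"."""
    offending = [s for s in symlinks if is_offending(s)]
    if not offending:
        return "mount"
    if print_offending:
        for path, target, _ in offending:  # one ls -l style line per symlink
            print(f"{stat.filemode(stat.S_IFLNK | 0o777)} {path} -> {target}", file=sys.stderr)
    if disable_outside is None or display_mode is None:
        print("ERROR: archive contains symlinks pointing outside of the mountpoint; "
              "set both symlink options explicitly, or use borg extract.", file=sys.stderr)
        return "fail"
    if not disable_outside and display_mode == "untouched":
        print("WARNING: symlinks may point outside of the archive; do not restore "
              "files reached through them. Prefer borg extract.", file=sys.stderr)
        if not skip_ignore_confirmation and ask(f"Type {IGNORE_PHRASE} to continue: ") != IGNORE_PHRASE:
            return "fail"
        return "mount"
    print("WARNING: symlinks are altered in this mount; do not restore symlinks "
          "from it. Prefer borg extract.", file=sys.stderr)
    if not skip_alteration_confirmation and ask(f"Type {ALTER_PHRASE} to continue: ") != ALTER_PHRASE:
        return "fail"
    return "mount-altered"
```

Passing the prompt function as a parameter keeps the decision logic testable without a terminal.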

Algorithm part

This part is basically redundant and can be ignored for now. I am leaving it here in case I need to add more algorithms later.

A folder in a path is whatever lies between two "/" characters, together with the following "/".

Algorithm for calculating a symlink's relative path from 2 absolute paths (or 2 relative ones, assuming they originate from the same place):

This seems to be basically what os.path.relpath() does when called with the folder containing the symlink as the start. The algorithm is therefore probably redundant, but I will leave it here just in case.

This algorithm assumes both paths are within the backup.
source is the path of the symlink
target is where the symlink points

remove a leading ".", "./" or "/" from both paths

while the first (top-level) folders of source and target match
    remove that first folder from both paths

replace each folder still preceding the name of the symlink in source with ../
replace the name of the symlink in source with what is left of target

return source
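The steps above fit in a few lines of Python. This is a minimal sketch with made-up helper names. It assumes both paths are already normalized (no ".." components) and use "/" as the separator. It compares the target against the folders containing the symlink rather than the symlink's full path. This only differs from the steps above in unusual cases, such as a target nested below the symlink's own name. For absolute paths it gives the same result as posixpath.relpath(target, posixpath.dirname(source)):

```python
import posixpath
from typing import List


def _folders(path: str) -> List[str]:
    # splitting on "/" and dropping empty and "." parts removes a leading "/", "./" or "."
    return [part for part in path.split("/") if part not in ("", ".")]


def symlink_relpath(source: str, target: str) -> str:
    """Relative symlink text that makes `source` point at `target`."""
    source_dirs = _folders(source)[:-1]  # folders preceding the symlink's name
    target_parts = _folders(target)
    # skip the leading folders that source and target have in common
    common = 0
    while (common < len(source_dirs) and common < len(target_parts)
           and source_dirs[common] == target_parts[common]):
        common += 1
    # each remaining folder of source becomes "..", then append what is left of target
    parts = [".."] * (len(source_dirs) - common) + target_parts[common:]
    return "/".join(parts) or "."


source = "/home/martin/folder/annoyingpath"
target = "/home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time"
print(symlink_relpath(source, target))
# ../annoyingly/long/path/which/you/dont/wanna/type/out/every/time
print(posixpath.relpath(target, posixpath.dirname(source)))  # same result
```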

Here are the target and source variables after each step of the algorithm:

input:
    target: /home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: /home/martin/folder/annoyingpath
after removing the leading "/":
    target: home/martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: home/martin/folder/annoyingpath
after removing the matching "home/":
    target: martin/annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: martin/folder/annoyingpath
after removing the matching "martin/":
    target: annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: folder/annoyingpath
after replacing "folder/" with "../":
    target: annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: ../annoyingpath
after replacing "annoyingpath" with what is left of target:
    target: annoyingly/long/path/which/you/dont/wanna/type/out/every/time
    source: ../annoyingly/long/path/which/you/dont/wanna/type/out/every/time
return value is "../annoyingly/long/path/which/you/dont/wanna/type/out/every/time"
