Massive data backup question: What Linux software do you folks recommend for helping sort out and organize terabytes of files and remove duplicates?

over_clox@lemmy.world · 2 months ago


Alas Poor Erinaceus@lemmy.ml · 2 months ago

For duplicates: Czkawka. Also, you get a gold ⭐ if you can figure out how to pronounce it 😉

Dessalines@lemmy.ml · 2 months ago

Nightly rsync job in crontab works well enough, if it’s an external hard drive.

If you’re going over a network, syncthing.

Nine@lemmy.world · 2 months ago

I’ve abused syncthing in so many ways migrating servers and giant data sets. It’s freaking amazing. Though it’s been a few years since I’ve used it, so I can only guess how much better it’s gotten.

over_clox@lemmy.world · 2 months ago

‘An’ drive? I mean like 10+ drives, looking to do a master backup.

Dessalines@lemmy.ml · 2 months ago

Rsync then.

over_clox@lemmy.world · 2 months ago

Please do explain then.

I have multiple drives with various differing directory trees.

Dessalines@lemmy.ml · 2 months ago

I have no idea what your setup is so you’ll need to do your own research on rsync.

over_clox@lemmy.world · 2 months ago

That’s just it, there is no setup, except Linux Mint as the main system. It’s literally a physical bucket of discs and drives in all sorts of various formats…

MrPoopbutt@lemmy.world · 2 months ago

Isn’t syncthing no longer supported?

Does that even matter if it isn’t?

Dessalines@lemmy.ml · 2 months ago

Syncthing is very much alive.

🧟‍♂️ Cadaver@lemmy.world · 2 months ago

Syncthing has been discontinued on Android (but a fork exists)

lordnikon@lemmy.world · 2 months ago

I have had good luck with dupeGuru

JTskulk@lemmy.world · 2 months ago

fdupes to find duplicate files, freefilesync to back it up.
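A minimal Python sketch of the size-then-hash idea that duplicate finders like fdupes roughly follow: files are grouped by size first (a file with a unique size can't have a duplicate), and only same-size files get hashed with SHA-256. It assumes the data has already been copied somewhere readable, only reports groups of identical files, and deletes nothing. The function names are illustrative, not part of any tool, and unlike fdupes it trusts a hash match instead of doing a final byte-by-byte comparison.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_sha256(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks so huge files never sit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(*roots):
    """Return sorted lists of paths with identical contents, across all roots."""
    # Pass 1: group regular files (not symlinks) by size.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_sha256(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    # Report only: print each group of duplicates, separated by a blank line.
    for group in find_duplicates(*sys.argv[1:]):
        print("\n".join(group), end="\n\n")
```

Usage would be something like `python3 find_dupes.py /backup/drive1 /backup/drive2` (hypothetical script name and paths); passing several roots at once is what lets it spot a file that exists on two different drives.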

MonkderVierte@lemmy.ml · 2 months ago

That can also be handled at the filesystem level: Btrfs and, I think, ZFS have deduplication built in.

Btrfs gave me 150 GB on my 2 TB gaming disk that way.

solrize@lemmy.world · 2 months ago

I’m using Borg and it’s fine at that scale. I don’t know if it would still be viable with 100TB or whatever. The initial backup will be kind of slow but it encrypts everything, and deduplicates it too if I’m not mistaken. In any case, it deduplicates the common situation where you back up another snapshot later. Only the differences get written in the second backup. So you can save new snapshots fairly quickly and without much additional space.
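To illustrate why a second snapshot costs so little, here is a toy content-addressed chunk store in Python: data is split into chunks, each chunk is keyed by its SHA-256, and a chunk already in the store is never written again. This is a simplified sketch, not Borg's actual code; Borg uses content-defined chunking (so inserting a few bytes early in a file doesn't shift every later chunk boundary), plus compression and optional encryption, whereas this uses fixed-size chunks and keeps everything in memory. The `ChunkStore` class is hypothetical.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each distinct chunk is kept exactly once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def add_snapshot(self, data):
        """Split data into fixed-size chunks; return (recipe, new_bytes_written)."""
        recipe, written = [], 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:  # only chunks never seen before cost space
                self.chunks[key] = chunk
                written += len(chunk)
            recipe.append(key)  # the snapshot is just an ordered list of chunk IDs
        return recipe, written

    def restore(self, recipe):
        """Rebuild a snapshot by concatenating its chunks in order."""
        return b"".join(self.chunks[key] for key in recipe)


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    first, w1 = store.add_snapshot(b"aaaabbbbcccc")
    second, w2 = store.add_snapshot(b"aaaabbbbdddd")  # only the last chunk changed
    print(w1, w2)  # 12 bytes written for the first snapshot, 4 for the second
```

The second snapshot only writes the one chunk that changed, while both snapshots can still be restored in full from their recipes.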

over_clox@lemmy.world · 2 months ago

I don’t even want this data encrypted. Quite the opposite actually.

This is mostly the category of files getting deleted from the Internet Archive every day. I want to preserve what I got before it gets erased…

solrize@lemmy.world · 2 months ago

You can turn off Borg encryption but maybe what you really want is an object store (S3 style). Those exist too.

billwashere@lemmy.world · 2 months ago

Honestly, I maintain a list of file types I care about and copy those off. It’s mostly things I’ve created or specifically accumulated: mp3, mkv, gcode, stl, jpeg, doc, txt, etc. Find all of those and copy them off. I also find any files over a certain size and copy them off, unless they’re things like library files, DLLs, that sort of thing. Am I possibly going to miss something? Yeah. But I’ll get most of the things I care about.
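A minimal Python sketch of that "keep these file types, plus anything big that isn't a library" filter. It recreates each file's relative path under the destination, so copied files stay inside the folders they came from. The extension sets and the 100 MiB threshold are hypothetical placeholders (the poster didn't give a size), and it defaults to a dry run that only lists what it would copy.

```python
import os
import shutil

# Hypothetical settings: extensions from the list above, library types to skip,
# and an arbitrary size threshold for "big files worth keeping".
KEEP_EXTENSIONS = {".mp3", ".mkv", ".gcode", ".stl", ".jpeg", ".doc", ".txt"}
SKIP_EXTENSIONS = {".dll", ".so", ".lib"}
SIZE_THRESHOLD = 100 * 1024 * 1024  # 100 MiB


def wanted(path):
    """Keep listed types, plus any large file that isn't a library."""
    ext = os.path.splitext(path)[1].lower()
    if ext in KEEP_EXTENSIONS:
        return True
    return ext not in SKIP_EXTENSIONS and os.path.getsize(path) >= SIZE_THRESHOLD


def copy_selected(src_root, dst_root, dry_run=True):
    """Copy wanted files, recreating their folder structure under dst_root.

    Returns the sorted relative paths that were (or, in a dry run, would be) copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not wanted(src):
                continue
            rel = os.path.relpath(src, src_root)
            if not dry_run:
                dst = os.path.join(dst_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves modification times
            copied.append(rel)
    return sorted(copied)
```

Running it once with the default `dry_run=True` and reviewing the returned list before passing `dry_run=False` is a cheap way to check the filter isn't missing anything obvious.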

over_clox@lemmy.world · 2 months ago

Not everything is an individual file though, a lot of the stuff needs to be stored and maintained as bulk folders.

I mod operating systems and occasionally games, plus write software. I can’t just dump all text files into a single folder; that would put every readme.txt file into a single TXT folder, losing the association with the project folders they came from.

billwashere@lemmy.world · 2 months ago

Isn’t all the code in git somewhere? I would totally do that for code projects.

I do the same thing with Arduino code so I know where you’re coming from.

over_clox@lemmy.world · 2 months ago

Not my code, I didn’t even have internet access when I started programming.

billwashere@lemmy.world · 2 months ago

I feel you. I started coding before the internet even existed (well technically it existed, just nobody had access to it)

serenissi@lemmy.world · 2 months ago

Not recommending software, but since you mentioned old hard disks, it’s better to copy the files, or better yet dd them, onto an SSD first. That way indexing and finding duplicates will be faster, because you only have to read everything off the old disks once, and if you dd you don’t have to care about fragmentation.

just_another_person@lemmy.world · 2 months ago

Deduping only works for a single target or context at a time, so if you’re working with many drives, you’ll need to sort your data into unified locations on the backup target first, THEN run dedupe tools against it all.

Second, if all of your data from these drives fits uncompressed on the target drive, rsync will be the fastest to get the data from A to B.

over_clox@lemmy.world · 2 months ago

Of course.

Goal #1 is to migrate what data I can (which is a fucking lot) over to the 4TB, in separate folders for each drive. Only after that will I worry about scanning for dupes and organizing things.

I’m just looking for advice in advance on what software is recommended for dealing with such large tasks.

I’ve actually got 2X 4TB drives plus a single 2TB drive. But yeah, I know the best and easiest way is to consolidate it all on one drive first.

just_another_person@lemmy.world · 2 months ago

Then rsync is your friend, like so: rsync -av /drive1/ /target2/drive1/

That will copy all the files from drive1 to a destination folder in the backup drive called ‘drive1’.
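With 10+ drives, the same per-drive pattern can be scripted so each source lands in its own folder named after its mount point. Below is a minimal Python sketch that builds one rsync command per drive and only prints them unless `--run` is passed. The mount points under /media are hypothetical placeholders, and it uses `-av` (archive mode already implies `-p`). Two drives whose mount points share the same final name would collide in one folder, so rename those first.

```python
import os
import subprocess
import sys

# Hypothetical mount points and backup target; replace with your own.
DRIVES = ["/media/drive1", "/media/drive2"]
TARGET = "/media/backup"


def build_rsync_commands(sources, target_root):
    """One rsync invocation per source, each into target_root/<source name>/."""
    commands = []
    for src in sources:
        name = os.path.basename(os.path.normpath(src))
        # Trailing slash on the source means "copy its contents", not the folder itself.
        commands.append(["rsync", "-av", src.rstrip("/") + "/",
                         os.path.join(target_root, name) + "/"])
    return commands


if __name__ == "__main__":
    run = "--run" in sys.argv  # default is a dry listing of the commands
    for cmd in build_rsync_commands(DRIVES, TARGET):
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop on the first failing drive
```

Because rsync skips files that are already up to date at the destination, rerunning the script after an interruption picks up roughly where it left off.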

over_clox@lemmy.world · 2 months ago

Joy oh joy, I got like 75+ optical discs and like 10+ hard drives (whatever still works) to back up.

This is already gonna take months I know, just my free time at the end of the day.

This is gonna be fun. /s

Thank you and everyone for the advice though.

Side note, I think one of my drives has almost all the SNES game ROMS…