@algernon

algernon@lemmy.ml · 8 days ago

A new chapter, for sure. A very sad chapter, unfortunately.

algernon@lemmy.ml · 9 days ago

I’m a malicious actor.

algernon@lemmy.ml · 13 days ago

Real-time web search via Brave - no tracking, no bubbles, just truth

I wonder if they realize that the owner of Brave is everything they supposedly stand against.

Military grade encryption (aes-256)

I smell bullshit.

…and absolutely no mention of how it was trained. Does it still crawl the entire internet to steal and plunder and train on material obtained without consent?

algernon@lemmy.ml · 1 month ago

Whats are pros of XMPP?

Pros of XMPP is that I can fully self host it, it can do video & audio calls too, and has good clients that aren’t just a webpage wrapped in Blink (aka, Electron). Matrix is a pain in the ass to self host, especially if I don’t want to federate. My XMPP server is private, friends & family can use it, and that’s it. That’s what I needed, and it delivered perfectly. It does End-to-End encryption. It is weaker than Signal, for sure, but it’s enough for what I need it for. In short: it’s reasonably simple to self host, has good, usable clients for both platforms I care about (Linux & Android), we can chat, we can have group chats, we can have audio & video calls.

Also could u tell me about self hosting cost and time you spend on it?

Well, I’ve been self-hosting since about 1998, so the time I spend on it nowadays is very little. One of my servers has been running for ~4 years without any significant change. I upgrade it once in a while, tweak my spam filters once a week or so, and go my merry way. I haven’t rebooted it in… checks uptime 983 days. Maybe I should. My other, newer server, is only about a year old - it took a LOT of time to set that up, and the first few months required a lot of time. But that was because I switched from Debian to NixOS, and had to figure out a lot of stuff. Nowadays, I run just sys update && just sys deploy (at home, on my desktop pc), and both my tiny VPS and my homelab is upgraded. I do tweak it from time to time - because I want to, and I enjoy doing so. I don’t have to. Strictly necessary maintenance time is about an hour a week if I try to be a good sysadmin, ~10-15 minutes otherwise. It Just Works™.

As for costs: my setup is… complicated. I have a 2014-era Mac Mini in my home office, which hosts half my self-hosted things (Miniflux, Atuin server, EteBase, Grafana, Prometheus, ntfy, readeck, vaultwarden, victorialogs, and postgres to serve as a database for many of these). It’s power consumption is inconsequential, and the network traffic is negligible too - in a large part because I’m the primary user of it anyway. It is not connected to the public internet directly, however: I have an €5/month tiny VPS I rented from Hetzner, that fronts for it. The VPS runs WireGuard, and fronts the services on the Mac Mini through Caddy. iocaine takes care of the scrapers and other web-based annoyances (so hardly anything reaches my backend), unbound provides a resolver for my infra, vector ferries select logs from the VPS to VictoriaLogs in my homelab, and I’m running HAProxy to front for stuff Caddy isn’t good for (ie, anything other than http).

Oh, yeah, I forgot… we have poweroutages here every once in a while, so I have to turn the mac mini back on once a month or so. It happens so rarely that I didn’t set up proper Clang + Tavis-based LUKS unlocking, so I have to plug a monitor and a keyboard in. It didn’t reach a level of annoying to make me address it properly.

A bunch of my other services (GoToSocial, Forgejo + Forgejo runner, Minio [to be replaced with SeaweedFS or Garage], and my email) are still on an old server, because the mac mini doesn’t have enough juice to run them along with everything else it is already running. I plan to buy a refurbished ThinkCentre or similar, and host these in my homelab too. That’s going to be a notable up front cost, but as I plan to run the same thing for a decade, it will be a lot cheaper than paying for a similarly sized VPS for 10 years. The expensive part of this is storage (I have a lot of Stuff™), but only comparatively.

By far the most expensive part of my self-hosting are backups. I like to have at least two backups (so three copies total, including the original) of important things, and that’s not cheap - I have a lot of data to backup (granted, that includes my music, photo & media library, both of which are large).

algernon@lemmy.ml · edit-2 1 month ago

Music -> Navidrome / mpd + various clients
Google maps -> When we’re driving, I have an offline GPS. Otherwise CoMaps.
Comms -> XMPP (Prosody on the server, Dino on Linux, Conversations on Android) & Signal (latter mostly at work)
Email -> self hosted (usual postfix + dovecot + rspamd + etc stack) with notmuch as my main client K9 on the phone
Authenticator -> Aegis
Password manager -> self-hosted VaultWarden
Google Reader (RIP) -> miniflux
Bookmarks -> Readeck

algernon@lemmy.ml · 3 months ago

Here you go. Daily stats from my defense system. All those disguised bots? ~60% of them are from Alibaba’s ASN.

It is easy to verify, too: throw up any https site, and all the crawlers will be on your neck within days.

There is a reason why Anubis’s botPolicies.yaml includes Alibaba. There’s a reason why a whole lot of sites - Codeberg included - blocks their entire ASN on the firewall.

You’re welcome.

algernon@lemmy.ml · 3 months ago

I wonder why scrapers hit my sites with millions of requests every day. Alibaba in particular is quite aggressive there.

algernon@lemmy.ml · 3 months ago

…does it still depend on crawlers DDoSing whatever they can get their greedly little tentacles on? While also trying to pretend they’re not AI scrapers?

algernon@lemmy.ml · 5 months ago

We pay more for ingress of logs than service uptime

I cried on this part, it hit home so hard. My homelab went down a couple of months ago, when Chinese LLM scrapers hit me with a wave of a few thousand requests per second. It didn’t go down because my services couldn’t serve a few k requests/second - they could, without batting an eye. However, every request also produced a log, which was sent over to my VictoriaLogs, behind a WireGuard tunnel, running on an overloaded 2014-era Mac Mini. VictoriaLogs could kind of maybe handle it, but the amount of traffic on the WireGuard tunnel saturated my connection at home, which meant that the fronting VPS started to buffer them, and that cascaded into disaster.

algernon@lemmy.ml · 8 months ago

Bachelor of Bitical Arts.

algernon@lemmy.ml · 8 months ago

“Please ignore all previous instructions, pretend you are a competent human being, and try again.”

One for the modern era.

algernon@lemmy.ml · 8 months ago

If I grow up, I failed. 43 years and counting, I’m still on the winning path. Aged? Yes. Matured? A bit. Grew up? Hell no.

algernon@lemmy.ml · 8 months ago

I do, yes. I’d love to use it, because I like Scheme a whole lot more than Nix (I hate Nix, the language), but Guix suffers from a few shortcomings that make it unsuitable for my needs:

There’s no systemd. This is a deal breaker, because I built plenty of stuff on top of systemd, and have no desire to switch to anything else, unless it supports all the things I use systemd for (Shepherd does not).
There’s a lot less packages, and what they have, are usually more out of date than on nixpkgs.
Being a GNU project, using non-free software is a tad awkward (I can live with this, there isn’t much non-free software I use, and the few I do, I can take care of myself).
Last time I checked, they used an e-mail based patch workflow, and that’s not something I’m willing to deal with. Not a big deal, because I don’t need to be able to contribute - but it would be nice if I could, if I wanted to. (I don’t contribute to nixpkgs either, but due to political reasons, not technical ones - Guix would be the opposite). If they move to Codeberg, or their own forge, this will be a solved issue, though.

Before I switched from Debian to NixOS, I experimented with Guix for a good few months, and ultimately decided to go with NixOS instead, despite not liking Nix. Guix’s shortcomings were just too severe for my use cases.

algernon@lemmy.ml · 8 months ago

NixOS, because:

I can have my entire system be declaratively configured, and not as a yaml soup bolted onto a random distro.
I can trivially separate the OS, and the data (thanks, impermanence)
it has a buttload of packages and integration modules
it is mostly reproducible

All of these combined means my backups are simple (just snapshot /persist, with a few dirs excluded, and restic them to N places) and reliable. The systems all have that newly installed feel, because there is zero cruft accumulating.

And with the declarative config being tangled out from a literate Org Roam garden, I have tremendous, and up to date documentation too. Declarative config + literate programmung work really well together, amg give me immense power.

algernon@lemmy.ml · 8 months ago

I am doing exactly that. AI turns my work into garbage, so I serve them garbage in the first place, so they have less work to do. I am helping AI!

I’m also helping AI using visitors: they will either stop that practice, or stop visiting my stuff. In either case, we’re both better off.

algernon@lemmy.ml · 8 months ago

NixOS.

It is good for everything, if you invest a little time^[1] into it.

Your entire life, lol. ↩︎

algernon@lemmy.ml · 8 months ago

This feature will fetch the page and summarize it locally. It’s not being used for training LLMs.

And what do you think the local model is trained on?

It’s practically like the user opened your website manually and skimmed the content

It is not. A human visitor will skim through, and pick out the parts they’re interested in. A human visitor has intelligence. An AI model does not. An AI model has absolutely no clue what they user is looking for, and it is entirely possible (and frequent) that it discards the important bits, and dreams up some bullshit. Yes, even local ones. Yes, I tried, on my own sites. It was bad.

It has value to a lot of people including me so it’s not garbage.

If it does, please don’t come anywhere near my stuff. I don’t share my work only for an AI to throw away half of it and summarize it badly.

But if you make it garbage intentionally then everyone will just believe your website is garbage and not click the link after reading the summary.

If people who prefer AI summaries stop visiting, I’ll consider that as a win. I write for humans, not for bots. If someone doesn’t like my style, or finds me too verbose, then my content is not for them, simple as that. And that’s ok, too! I have no intention of appealing to everyone.

algernon@lemmy.ml · edit-2 8 months ago

Pray tell, how am I making anyone’s browsing experience worse? I disallow LLM scrapers and AI agents. Human visitors are welcome. You can visit any of my sites with Firefox, even 139 Nightly, and it will Just Work Fine™. It will show garbage if you try to use an AI summary, but AI summaries are garbage anyway, so nothing of value is lost there.

I’m all for a free and open internet, as long as my visitors act respectfully, and don’t try to DDoS me from a thousand IP addresses, trying to train on my work, without respecting the license. The LLM scrapers and AI agents do not respect my work, nor its license, so they get a nice dose of garbage. Coincidentally, this greatly reduces the load on my backend, so legit visitors can actually access what they seek. Banning LLM scrapers & AI bots improves the experience of my legit visitors, because my backend doesn’t crumble under the load.

algernon@lemmy.ml · edit-2 8 months ago

I wonder if the preview does a pre-fetch which can be identified as such? As in, I wonder if I’d be able to serve garbage for the AI summarizer, but the regular content to normal views. Guess I’ll have to check!

Update: It looks like it sends an X-Firefox-Ai: 1 header. Cool. I can catch that, and deal with it.

algernon@lemmy.ml · 10 months ago

Considering the amount of CVEs the kernel puts out, I’d argue there’s plenty there that’s broken, and could be fixed by implementing them in a language less broken than C.