• 0 Posts
  • 11 Comments
Joined 2 years ago
Cake day: July 7th, 2023

  • Some details. One of the major players using the tar pit strategy is Cloudflare. They’re a giant in networking and infrastructure, and they use AI (the more traditional kind, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.

    Nonsense is indeed detectable, but that objection misunderstands the purpose: economics. Scraping bots are used because they’re a cheap way to get training data. If a nonzero portion of that data is poisonous, the scraper has to spend more and more resources filtering it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, so telling it apart requires systems at least that complex.

    So in short, the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do-not-scrape instructions.
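
To make the economics above concrete, here is a minimal toy model in Python. Everything in it is an assumption for illustration: the function name, the parameters, and all of the numbers are made up, and it assumes the filter never throws away clean pages. It is not a description of how Cloudflare or any scraper actually accounts for cost.

```python
def net_value_per_page(poison_fraction, clean_value=1.0, poison_penalty=2.0,
                       filter_recall=0.0, filter_cost=0.0):
    """Expected net value of one scraped page under a toy model.

    All parameters are hypothetical:
      poison_fraction - share of scraped pages that are generated nonsense
      clean_value     - value of one genuine page
      poison_penalty  - damage done by one nonsense page that reaches training
      filter_recall   - share of nonsense pages the filter catches
      filter_cost     - cost of running the filter on one page
    """
    clean = 1.0 - poison_fraction
    # Nonsense pages that slip past the filter and end up in the training set.
    missed = poison_fraction * (1.0 - filter_recall)
    return clean * clean_value - missed * poison_penalty - filter_cost


# No poison, no filter: every page is worth its full value.
baseline = net_value_per_page(0.0)

# 10% poison, no filter: the scraper eats the damage.
unfiltered = net_value_per_page(0.1)

# 10% poison with a filter that catches 90% of it but costs something per page.
filtered = net_value_per_page(0.1, filter_recall=0.9, filter_cost=0.05)

print(baseline, unfiltered, filtered)
```

With these made-up numbers, filtering beats not filtering, but both come out below the unpoisoned baseline. Better nonsense would show up in this model as a lower `filter_recall` or a higher `filter_cost`, which pushes the value down further.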






  • A major caveat I’ve noticed some people misunderstand: it’s corporate CLAs that are problematic. The Apache Foundation also requires contributors to sign a CLA, but that’s to provide a legal failsafe and a way to update to, say, Apache 3.0 if need be one day. Apache’s nonprofit, open source mission aligns with respecting the rights of contributors and the community. Corporations, on the other hand, not so much.



  • If you want vertical tabs with the ability to organize them in trees, I suggest the Sideberry extension. It improves my productivity so much that it legitimately makes me nervous the functionality might ever go away.

    You can bookmark trees, collapse them, search them, load/unload them manually, I could go on. It makes it easy to organize dozens or hundreds of tabs. I have some trees for emails, news, forums, projects, etc. When I’m done with one, I just fold it up. The top tab bar can hide tabs that aren’t in the active tree, so you can still navigate tabs normally.