Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages?
Don’t blame Wikipedia for that wtf
Yes! I mean, blame the people who post AI-generated translations as if they were their own, or blame the AI scrapers that use those poorly generated pages for training, but it makes no sense to blame Wikipedia when all it has done is exist and offer a platform for knowledge sharing.
In fact, this problem is hardly exclusive to Wikipedia. Every platform with crowdsourced content is to some degree susceptible to AI poisoning, which ultimately ends up feeding other AIs; the loop exists on all platforms. Though I understand wanting to highlight the risk to endangered languages in particular, since they have less content available, so the AI models have a smaller dataset, which makes them worse and more sensitive to bad data.
If you build the infrastructure for a certain thing to happen, you’re responsible for the thing. For the same reason we hold Facebook accountable for the rise of the far-right, we should hold Wikipedia accountable for this stuff. Infrastructure is never neutral.
That is a completely unfair comparison. For starters, Facebook is a for-profit advertising company and Wikipedia is a community-driven encyclopedia, so they should be judged by different standards.
Second, both admins and users can edit Wikipedia when there’s a problem. Everyone is “responsible” for fixing it - or at the very least equally at fault.
Next, the content in question. Facebook was (rightfully) given hell for hosting gore, CSAM, adult porn, etc., things that are immoral, illegal, or outright dangerous. The offending content on Wikipedia is bad translations.
Lastly, the bigger issue is always enforcement against said content. Facebook was made aware of the problem users/pages/uploads and slacked off on doing anything. These Wikipedia pages have very low traffic and weren’t getting reported. And even with reports, Wikipedia then has to consult with people who speak the rare language.
They’re similar problems of vastly different scales.