Scibot!

fossilesque@mander.xyz · 22 days ago

Scibot!

Deebster@programming.dev · 21 days ago

Have they taken out the AI-generated papers? We know that training LLMs on LLM-generated text leads to an absolute collapse in quality, and we also know that AI has been showing up in papers so if they haven’t, then this will be quite unreliable.

brucethemoose@lemmy.world · edit-2 21 days ago

“We know that training LLMs on LLM-generated text leads to an absolute collapse in quality.”

This is often repeated, and true, but it needs to be qualified.

Modern LLMs use tons and tons of “augmented” data, which is code for LLM-generated or LLM-massaged data. Some is even generated during training, and judged; papers on that are what made DeepSeek famous.

Training on LLM trash will, of course, yield greater trash, and obviously good text has to come from something real. But that’s because slop is slop. And there are issues with “deep frying” LLMs, yes, but simply training an LLM on LLM output does not necessarily reduce quality. It often helps, significantly.

“And we also know that AI has been showing up in papers so if they haven’t, then this will be quite unreliable.”

Now this is a problem.

TBH LLMs would be pretty good at flagging papers for humans to check, similar to what Wikipedia is already doing. But yeah, if you just feed bad papers into a prompt, LLMs generally assume the context is true, and that’s a tremendous problem.

T156@lemmy.world · 21 days ago

I would be surprised if it was something that they trained themselves, and not an off-the-shelf model hooked up to a search.

brucethemoose@lemmy.world · edit-2 21 days ago

It’s probably their own search/RAG backend, or at least their configuration of some open source project.

And that’s the important part. Get the article retrieval right, and the LLM performance isn’t that important; they could self-host Qwen 27B or something and it’d work fine.
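To make the "retrieval matters more than the model" point concrete, here is a minimal sketch of the retrieval half of a search/RAG setup. Everything in it is an assumption for illustration: the toy `PAPERS` corpus, the word-overlap scoring, and the names `retrieve` and `build_prompt` are hypothetical and say nothing about how the actual site works. A real backend would likely use a proper search index or embeddings, but the shape is the same: pick the right articles first, then hand only those to whatever LLM you host.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real system would index full papers.
PAPERS = {
    "paper-a": "Model collapse occurs when language models are trained recursively on generated text.",
    "paper-b": "Retrieval augmented generation grounds language model answers in retrieved documents.",
    "paper-c": "Soil microbes influence carbon storage in temperate forests.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Bag-of-words overlap: count query tokens that also occur in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, papers, k=2):
    """Return up to k (id, text) pairs with the highest non-zero overlap score."""
    ranked = sorted(papers.items(), key=lambda item: score(query, item[1]), reverse=True)
    return [(pid, text) for pid, text in ranked[:k] if score(query, text) > 0]


def build_prompt(query, hits):
    """Assemble the text that would be sent to the LLM (the model call itself is omitted)."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in hits)
    return (
        "Answer using only the sources below and cite them by id.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "how does retrieval ground language model answers"
    print(build_prompt(question, retrieve(question, PAPERS)))
```

If `retrieve` hands the model an irrelevant or bogus paper, no amount of model quality fixes the answer, which is why the retrieval step is the part worth getting right.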

Oriion@jlai.lu · 21 days ago

And without hallucinations??? That sounds freaking awesome.

a_non_monotonic_function@lemmy.world · 21 days ago

Of course not.

Madrigal@lemmy.world · 21 days ago

Yeah they added “Don’t hallucinate” to the prompt.

fartographer@lemmy.world · 21 days ago

Seems like the kind of prompt a hallucination would say

TrackinDaKraken@lemmy.world · 21 days ago

What fun would that be?

MithranArkanere@lemmy.world · 21 days ago

If research was funded with public money, be it government money or from people buying their products, then that research belongs to the people.

gh0stb4tz@lemmy.world · 21 days ago

Why does the URL have a Russian government domain (.ru)? Consider me highly skeptical.

exixx@lemmy.world · 21 days ago

Because Alexandra Elbakyan lives in Russia. One of the official Sci-Hub homes is also on .ru.

fossilesque@mander.xyz · 21 days ago

⬆️⬆️⬆️⬆️⬆️⬆️⬆️⬆️⬆️

melsaskca@lemmy.ca · 21 days ago

Those chilling FBI warnings on old videotapes mean absolutely nothing to me now.

foiledAgain@lemmy.world · 21 days ago

Getting hugged to death

TrackinDaKraken@lemmy.world · 21 days ago

I stared at it, and didn’t know what to ask, so I closed it.

Psychodelic@lemmy.world · 21 days ago

Uh… it gave me a ~45 min wait time and then gave up. lol

Sounds neat tho

SnarkoPolo@lemmy.world · edit-2 21 days ago

Too right! Why, if regular people can get science for free, Capitalism might not profit!

Iusedtobeanalien@lemmy.world · 21 days ago

Could have just called it Claude

Avicenna@programming.dev · 20 days ago

Academic publishers that charge thousands of euros for publishing articles are the scum of the earth.