For Data-Guzzling AI Companies, the Internet Is Too Small

0nekoneko7@lemmy.world · 8 months ago

elshandra@lemmy.world · edit-2 8 months ago

Idk, I find this hard to believe. I would think the challenge is more access to the information (gates, bandwidth), a speedy vault to store that information, and improving their models.

Think about what’s available on the internet, and how much of human knowledge and propaganda is out there. With enough/deus ex tech, there’s no way AI shouldn’t be able to learn most of anything from the knowledge available, given the right trainers.

General_Effort@lemmy.world · 8 months ago

Yes, it’s BS, like most of the AI takes here.

The kernel of truth is scaling laws:

[T]he Chinchilla scaling law for training Transformer language models suggests that when given an increased budget (in FLOPs), to achieve compute-optimal, the number of model parameters (N) and the number of tokens for training the model (D) should scale in approximately equal proportions.
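As a rough back-of-the-envelope sketch of what that quote implies (not the paper's actual fitting procedure): it assumes the common approximation that training compute is about 6 × N × D FLOPs, plus the often-cited rule of thumb of roughly 20 training tokens per parameter. Both constants are assumptions for illustration, and the function name is made up.

```python
import math

# Assumption: training compute C is approximately 6 * N * D FLOPs.
FLOPS_PER_PARAM_TOKEN = 6
# Assumption: rule-of-thumb compute-optimal ratio of ~20 tokens per parameter.
TOKENS_PER_PARAM = 20


def compute_optimal(flops):
    """Return (params N, tokens D) for a FLOP budget under the assumptions above."""
    # Solve C = 6 * N * D with D = 20 * N, i.e. N = sqrt(C / (6 * 20)).
    n = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    d = TOKENS_PER_PARAM * n
    return n, d


# Each 100x increase in budget raises both N and D by 10x.
for budget in (1e21, 1e23, 1e25):
    n, d = compute_optimal(budget)
    print(f"{budget:.0e} FLOPs -> {n:.2e} params, {d:.2e} tokens")
```

Under these assumptions the token requirement grows with the square root of the compute budget, which is why ever-larger training runs keep asking for more text.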

isles@lemmy.world · 8 months ago

and propaganda

Well, that’s the rub, right? Garbage in, garbage out. For an LLM, the value is in predicting the next token, but we’ve seen how racist current datasets can be. If you filter it, there’s not a lot of high-quality data left.

So yes, we have a remarkable amount of (often wrong) information to pull from.

elshandra@lemmy.world · 8 months ago

Mhm, I wonder when we’ll have the resources to build one that can tell the truth from other lies. I suppose you have to learn to crawl before you learn to walk, but these things are still having trouble rolling over.

Lvxferre@mander.xyz · 8 months ago

Good. The current approach towards generative models is basically brute-forcing; a constraint on the amount of data available might encourage those companies to refine the approach.

0nekoneko7@lemmy.world · 8 months ago

“Companies also are experimenting with using AI-generated, or synthetic, data as training material – an approach many researchers say could actually cause crippling malfunctions. These efforts are often secret, because executives think solutions could be a competitive advantage.”

AI was supposed to be assistance for people. Now it’s a competition over who’s got the better AI for company profits.

Even_Adder@lemmy.dbzer0.com · 8 months ago

Support Open Source developers. Corporations aren’t the only game in town, they just want you to think that.

An easy way to get started with local LMs is LM Studio, and Stable Diffusion for images.

Sanctus@lemmy.world · 8 months ago

All will bow to profits as long as Mammon still tops the paradigm. Change the main motivator of the world from profit and you will see this go away. I hate it more than anything. It consumes all and leaves nothing.

elshandra@lemmy.world · 8 months ago

Everything’s a competition for company profits.

QuandaleDingle@lemmy.world · 8 months ago

Everything’s too small for power-hungry corporations with an irrational need for infinite expansion.

isles@lemmy.world · 8 months ago

From Lemmy, this link took me to Slashdot, which took me to The Verge, which took me to the Wall Street Journal, each with a section where I can discuss this article.