Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds

leninmummy@lemmy.ml · 1 year ago

TheSaneWriter@lemmy.thesanewriter.com · 1 year ago

I’m not too surprised, they’re probably downgrading the publicly available version of ChatGPT because of how expensive it is to run. Math was never its strong suit, but it could do it with enough resources. Without those resources, it’s essentially guessing random numbers.

PupBiru@kbin.social · 1 year ago

from what i understand, the big change in chat-gpt4 was that the model could “ask for help” from other tools: for maths, it knew it was a maths problem, transformed it to something a specialised calculation app could do, and then passed it off to that other code to do the actual calculation

same thing for a lot of its new features; it was asking specialised software to do the bits it wasn’t good at
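
A toy sketch of that hand-off idea, for illustration only. It is not how OpenAI's plugins are actually implemented, and `calculate` and `answer` are hypothetical names invented here. The "router" tries to treat the prompt as plain arithmetic and passes it to an exact calculator; anything else falls through to the text-generating path, which is stubbed out.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Hypothetical 'calculator tool': evaluates plain arithmetic exactly."""
    def walk(node):
        # Accept only int/float literals (type() check excludes True/False).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expression, mode="eval").body)

def answer(prompt: str) -> str:
    """Hypothetical router: use the calculator when it applies, else fall back."""
    try:
        return f"calculator: {calculate(prompt)}"
    except (ValueError, SyntaxError, ZeroDivisionError):
        # Stand-in for the language model's own free-text answer.
        return "language model: (free-text answer, no tool used)"

print(answer("17 * 23 + 4"))
print(answer("why is the sky blue?"))
```

The point of the design is that the arithmetic never goes through the text predictor at all, so the result is exact whenever the prompt can be recognised as a calculation.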

whyrat@lemmy.ml · 1 year ago

So ChatGPT will just become a front end for Wolfram Alpha?

Excel@lemmy.megumin.org · 1 year ago

It literally can do that, yes. But the plug-in version is separate and requires a subscription.

PupBiru@kbin.social · 1 year ago

that would actually be great

reverie@lemmy.world · 1 year ago

And those plugins are like beta release quality at best. Even the web searching capability is just meh

DrMux@kbin.social · 1 year ago

My guess is that it’s more a result of overfitting for alignment. Fine-tuning for “safety” (rather, more corporate-friendly outputs).

That is, by focusing on that specific outcome in training the model, they’ve compromised its ability to give well-“reasoned” “intelligent” sounding answers. A tradeoff between aspects of the model.

It’s something that can happen even in simple statistical models. Say you have a scatter plot of data that loosely follows some trend, and you come up with two equations to describe that trend. One is a simple equation that loosely follows it but makes a good general approximation, and the other is a more complicated equation that very tightly fits the existing data. Then you use those two models to predict future data. But you find that the complicated equation is making predictions way off the mark that no longer fit the trend, and the simple one still has a wide error (how far its prediction is from the actual data) but still more or less accurately fits the general trend. In the more complicated equation, you’ve traded predictive power for explanatory power. It describes the data you originally had but it’s not useful for forecasting data that follows.

That’s an example of overfitting. It can happen in super-advanced statistical models like GPT, too: training the “equation” (or as it’s been called, spicy autocorrect) to predict outcomes that favor “safety” can cost the model some of its power to predict accurate, “well-reasoned” outcomes.
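
A minimal sketch of the scatter-plot example, on made-up data (the trend `y = 2x + 1`, the noise level, and all function names are assumptions for illustration). The "simple equation" is a least-squares straight line; the "complicated equation" is a polynomial forced through every training point. It only illustrates overfitting in a small statistical model and says nothing about what actually happened to GPT.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_interpolant(xs, ys):
    """Lagrange polynomial that passes exactly through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mean_abs_error(model, xs, ys):
    """Average distance between the model's predictions and the data."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_trend(x):
    # Assumed underlying trend plus Gaussian noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.3)

train_x = [i / 9 for i in range(10)]           # existing data, x in [0, 1]
train_y = [noisy_trend(x) for x in train_x]
future_x = [1.1 + 0.1 * i for i in range(5)]   # "future" data, x in [1.1, 1.5]
future_y = [noisy_trend(x) for x in future_x]

simple = fit_line(train_x, train_y)
complicated = fit_interpolant(train_x, train_y)

simple_train = mean_abs_error(simple, train_x, train_y)
complicated_train = mean_abs_error(complicated, train_x, train_y)
simple_future = mean_abs_error(simple, future_x, future_y)
complicated_future = mean_abs_error(complicated, future_x, future_y)

print("error on existing data:", simple_train, complicated_train)
print("error on future data:  ", simple_future, complicated_future)
```

The complicated fit has essentially no error on the data it was built from, because it chases the noise in every point, and that same noise is what throws its forecasts off once it leaves the original range.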

If that makes any sense.

I’m not an ML researcher or statistician (I just went through a phase in college), so if this is inaccurate I’m open to corrections.

DR_Hero@programming.dev · 1 year ago

I’ve definitely experienced this.

I’ve used ChatGPT before to write cover letters based on my resume, among other tasks.

I used to give it data and tell ChatGPT to “do X with this data”. It worked great.
In a separate chat, I told it to “do Y with this data”, and it also knocked it out of the park.

Weeks later, excited about the tech, I repeat the process. I tell it to “do X with this data”. It does fine.

In a completely separate chat, I tell it to “do Y with this data”… and instead it gives me X. I tell it to “do Z with this data”, and it once again would really rather just do X with it.

For a while now, I have had to feed it more context and tailored prompts than I previously had to.

givesomefucks@lemmy.world · 1 year ago

Yep.

Standard VC bullshit.

Burn money providing a lot for nothing to build brand recognition. Then cut free service before bringing out “premium” that at first works better than the original.

Until a bunch of people start paying and the resources aren’t scaled up to match.

chaogomu@kbin.social · 1 year ago

The important note: the “premium” service works just a bit better than (or maybe identically to) the original did before the company cut features in order to develop that “premium” service.

zurohki@aussie.zone · 1 year ago

Stage one and stage three enshittification. You forgot the bit in the middle where they chase business customers.