• mindbleach@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          7
          ·
          7 months ago

          Even if the model stops here - did you imagine it’d get this far?

          Humans do all their civilization brouhaha on three pounds of wet meat powered by corn flakes. Most of which evolved for marginal improvements on “grab branch and pull” or “do not pet tiger.” It’s a cosmic accident that’s given us language and music and dubstep. And this stupid trick with a pile of video cards can fake a lot of that, to the point we’re worried the average human won’t be able to spot the fakes.

          Point being: the miraculous birth of a computer intellect may well arise from “the fact blender.” Or “fancy Wikipedia.” Or “twenty questions, hard mode.” Or any other stupid gimmick that some grad students can cobble together after a 4 AM what-if. Calling this hot mess “spicy autocorrect” is accurate, and in some sense damning, but we had no fucking idea where it’d stop. Emergent properties are chaos. Approximate knowledge of conditions cannot predict approximate outcomes.

          LLMs are still liable to figure out math. That’s a process which gigabytes of linear algebra can obviously do, which would massively improve their ability to guess the next letter in a word problem. It won’t be the kind of AI you can explain calculus to, and then expect it to remember, next time - but getting any portion of the way there is deeply spooky.

          • 0ops@lemm.ee
            link
            fedilink
            English
            arrow-up
            4
            ·
            7 months ago

            Humans do all their civilization brouhaha on three pounds of wet meat powered by corn flakes

            Dude you’re a poet

  • Lugh@futurology.today (OP, M)
    link
    fedilink
    English
    arrow-up
    40
    ·
    7 months ago

    Added to this finding, there’s a perhaps greater reason to think LLMs will never deliver AGI. They lack independent reasoning. Some supporters of LLMs said reasoning might arrive via “emergent behavior”. It hasn’t.

    People are looking to get to AGI in other ways. A startup called Symbolica says a whole new approach to AI based on category theory might be what leads to AGI. Another is “objective-driven AI”, which is built to fulfill specific goals set by humans in 3D space. By the age of 4, a child has processed 50 times more training data than the largest LLM, just by existing and learning in the 3D world.

    • conciselyverbose@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      33
      ·
      7 months ago

      They can quite possibly be a useful component. They’re the language center of the brain.

      People who ever thought they would actually resemble intelligence were woefully uninformed of how complex intelligence is.

      • CanadaPlus@lemmy.sdf.org
        link
        fedilink
        English
        arrow-up
        8
        ·
        edit-2
        7 months ago

        How complex is intelligence, though? People who were sure LLMs wouldn’t resemble it were drawing on information we don’t actually have.

        • FaceDeer@kbin.social
          link
          fedilink
          arrow-up
          18
          ·
          7 months ago

          Yeah, so many people are confidently stating “LLMs can’t think like humans do!” when we’re actually still pretty unclear on how humans think.

          Sure, an LLM on its own may not be an AGI. But they’re remarkably closer than we would have predicted they could get just a few years ago, and it may well be that we just need to add a bit more “special sauce” (memory, prompting strategies, perhaps a couple of parallel LLMs that specialize in different types of reasoning) to get them over the hump. At this point a lot of the research isn’t going into simply “make it bigger!”, it’s going into “use LLMs smarter.”

        • conciselyverbose@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          9
          ·
          edit-2
          7 months ago

          Obscenely.

          The brain is stacks on stacks of insanely complicated systems. The fact that we know a ridiculous amount about the brain and are barely scratching the surface is exactly the point.

          • CanadaPlus@lemmy.sdf.org
            link
            fedilink
            English
            arrow-up
            2
            ·
            edit-2
            7 months ago

            By that measure, we know everything about GPT-2, but again are just scratching the surface of how it works. I don’t think you can draw the conclusion that LLMs can never be intelligent just from that.

            • conciselyverbose@sh.itjust.works
              link
              fedilink
              English
              arrow-up
              4
              ·
              7 months ago

              We “know everything about it” because it’s not that complicated.

              You don’t need to process every individual step a search algorithm takes to understand how it works. LLMs are the same thing. They’re just a big box of weighted probabilities. Complexity is more than just having a really big model.

              We have bits and pieces of a lot of parts, but are nowhere near a complete understanding of any of them. We kind of know how neurotransmitters work, we kind of know how hormones work and interact with those neurotransmitters, we mostly know how individual neurons fire, we kind of know what different parts of the brain do, we kind of know how the brain adapts to physical damage…

              We don’t know any of the algorithms it follows. What we do know is that it’s a hell of a lot of interconnected parts, and they’re all following very different rules.

              • CanadaPlus@lemmy.sdf.org
                link
                fedilink
                English
                arrow-up
                1
                ·
                7 months ago

                It’s not a search algorithm. If it is, that’s an overfitted model, and it’s detected and rejected. What a good foundation model is doing is just about as mysterious as the brain.

                • conciselyverbose@sh.itjust.works
                  link
                  fedilink
                  English
                  arrow-up
                  1
                  ·
                  7 months ago

                  It’s fundamentally extremely comparable mathematically and algorithmically. That’s the point. Simulated annealing doesn’t need to understand the search space to find a pretty good answer to a problem. It just needs to know what a good answer approximately looks like and nudge potential answers closer that way.

                  What LLMs are doing is not mysterious at all. Why a specific weight in a model has the value it does is a mystery, but there’s no mystery to the algorithm. We can’t even guess at most of the algorithms that make up the brain.
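For readers who have not met simulated annealing, here is a minimal sketch of the idea described above: the search only sees a score for each candidate and nudges candidates toward better scores, occasionally accepting a worse one while the "temperature" is high. The objective `bumpy`, the move function `nudge`, and every parameter value are made up for illustration; this is not how LLMs are trained, only an example of a search that works without understanding its search space.

```python
import math
import random


def simulated_annealing(score, start, neighbor, steps=5000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimize score(x) by randomly nudging x while a temperature cools."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


def bumpy(x):
    # Hypothetical toy objective: a parabola with ripples (many local minima).
    return (x - 3.0) ** 2 + math.sin(5.0 * x)


def nudge(x, rng):
    # Propose a nearby candidate; the step size is an arbitrary choice.
    return x + rng.uniform(-0.5, 0.5)


best_x, best_value = simulated_annealing(bumpy, start=-10.0, neighbor=nudge)
print(best_x, best_value)
```

The loop never inspects the shape of `bumpy`; it only compares scores, which is the property the comment is pointing at.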

    • CubitOom@infosec.pub
      link
      fedilink
      English
      arrow-up
      13
      ·
      7 months ago

      I wonder where the line is drawn between an emergent behavior and a hallucination.

      If someone expects factual information and gets a hallucination, they will think the LLM is dumb or not helpful.

      But if someone is encouraging hallucinations and wants fiction, they might think it’s an emergent behavior.

      In humans, what is the difference between an original thought, and a hallucination?

      • Umbrias@beehaw.org
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        7 months ago

        Hallucinations are unlike human creative output. For one, AI hallucinations are unintentional. If you actually think about the question, there are plenty of reasons why they are not the same. They are at best dreamlike, but dreams are an intentional process.

        • CubitOom@infosec.pub
          link
          fedilink
          English
          arrow-up
          1
          ·
          7 months ago

          Sure there is intentional creative thought. But there are also unintentional creative thoughts. Moments of clarity, eureka moments, and strokes of inspiration. How do we differentiate these?

          If we were to say that it is because our subconscious is intentionally promoting these thoughts, then we would need a method to test that, because otherwise the difference is moot.

          Similar to how one might define the I in AGI, it’s hard to form a consensus on general and often vague definitions like these.

          • Umbrias@beehaw.org
            link
            fedilink
            English
            arrow-up
            1
            ·
            7 months ago

            You are assigning far more vague grandeur to AI hallucinations than what they are in practice.

            • CubitOom@infosec.pub
              link
              fedilink
              English
              arrow-up
              1
              ·
              7 months ago

              Maybe it’s this arbitrary word, hallucination, which was recently borrowed from the human experience to explain why something that is normally factual, like a computer, is not computing facts.

              But if one were to think about it, what is the difference between a series of non-factual hallucinations in a model and a person’s individual experience of the world?

              • If two people eat the same food item, they might taste different things.
              • They might have different definitions of the same word.
              • They might remember that an object was a different color than someone’s recording could prove. There is a reason why eyewitness testimony is considered unreliable in a court of law.

              Before, we called these bugs or even issues. But now that it’s in this black box of sorts, whose decision-making process we can’t alter as directly as before, there is this more human-sounding name all of a sudden.

              To clarify, when an LLM gets a fact wrong because it has limited context or because its foundation model is flawed, is that the same result as the experience someone has after consuming psychedelic mushrooms? No, I wouldn’t say so. Nor is it the same when a team of scientists tries to make a model actively hallucinate so they can find new chemical compounds.

              Defining words can sometimes be very tricky, especially when they apply to multiple areas of study. The more you drill into a definition, the more it becomes a metaphysical debate. But it is important to have these discussions because even the definition of something like AGI keeps changing. In fact, the term only exists because the goalposts for AI moved so much. What will stop a company which is trying to attract investors from just slapping an AGI label on their next release? And how will we differentiate what the spirit of the word is trying to convey from the sales pitch?

              • Umbrias@beehaw.org
                link
                fedilink
                English
                arrow-up
                1
                ·
                7 months ago

                Hallucinations are not qualia.

                Please go talk to an LLM for hallucinations (you can use DuckDuckGo’s implementation of ChatGPT) and see why the word is being used to mean a fairly different thing from human hallucinations.

  • kromem@lemmy.world
    link
    fedilink
    English
    arrow-up
    37
    ·
    edit-2
    7 months ago

    OP, you do realize that this paper is about image generation and classification based on related data sets and only relates to the image processing features of multimodal models, right?

    How do you see this research as connecting to the future scope of LLMs?

    And why do you think that the same leap we’ve now seen with synthetic data transmitting abstract capabilities in text data won’t occur with images (and eventually video)?

    Edit: Which LLMs do you see in the models they tested:

    Models. We test CLIP [91] models with both ResNet [53] and Vision Transformer [36] architecture, with ViT-B-16 [81] and RN50 [48, 82] trained on CC-3M and CC-12M, ViT-B-16, RN50, and RN101 [61] trained on YFCC-15M, and ViT-B-16, ViT-B-32, and ViT-L-14 trained on LAION400M [102]. We follow open_clip [61], slip [81] and cyclip [48] for all implementation details.

    • Xerxos@lemmy.ml
      link
      fedilink
      English
      arrow-up
      20
      ·
      7 months ago

      I don’t see how that paper has anything to do with OP’s theory.

      • kromem@lemmy.world
        link
        fedilink
        English
        arrow-up
        7
        ·
        7 months ago

        I mean, if we’re playing devil’s advocate to the “WTF is OP talking about” position, I can kind of see the argument around how exponential needs for additional training data combined with the ways in which edge cases are underrepresented from synthetic data sources leading to model collapse could be extrapolated to believing that we’ve hit a plateau resulting from a training data bottleneck.

        In theory there’s an inflection point at which models become sophisticated enough that they can self-sustain with generating training data to recursively improve and whether we will hit that point or not is an open question with arguments on both sides.

        I agree that this paper in relation to the title isn’t exactly the best form of the argument, but I can see how someone only kind of understanding what’s being covered could have felt it was confirming their existing beliefs around where models currently are at and will be in the future.

        The only thing I’ll add is that I was just getting a nice laugh out of looking at whether Gary Marcus (a common AI skeptic) has ever been right about anything to date, and saw he had a long post about how deep learning was hitting a wall and we were a long way off from LLMs understanding human text…four days before GPT-4 released.

        In my experience, while contrarian positions to continuing research trends can be correct in an “even a broken clock is right twice a day” sense, personally I wouldn’t put my bets on a reversal of a trend that in its pacing and replication seems to be accelerating, not decelerating.

        In particular regarding OP’s claim, the work over the past 18 months with synthetic data sets from GPT-4 giving tiny models significant boosts in critical reasoning skills during fine tuning should give anyone serious pause on “we’re hitting diminishing returns and model collapse.”

        • General_Effort@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          7 months ago

          In theory there’s an inflection point at which models become sophisticated enough that they can self-sustain with generating training data to recursively improve

          That sounds surprising. Do you have a source?

  • Hugh_Jeggs@lemm.ee
    link
    fedilink
    English
    arrow-up
    29
    ·
    7 months ago

    If you’re thinking about clicking the link to find out what AGI is, don’t bother 😂

  • CanadaPlus@lemmy.sdf.org
    link
    fedilink
    English
    arrow-up
    5
    ·
    edit-2
    7 months ago

    I’m glad, you know. Now we’re talking about preparing for AGI, but if it’s not imminent we also have some time to actually do it.

  • intensely_human@lemm.ee
    link
    fedilink
    English
    arrow-up
    3
    ·
    7 months ago

    I’ll keep presenting this challenge until someone meets it:

    Anyone who thinks LLMs aren’t generally intelligent, can you name a text processing task (ie text in, text out) that a general intelligence can do, that an LLM cannot?

    • itsralC@lemm.ee
      link
      fedilink
      English
      arrow-up
      17
      ·
      7 months ago

      Logic. As an example, non textbook math questions. I asked ChatGPT 3.5 this:

      Four friends (A, B, C and D) are standing in line. How many combinations are possible given that A and C cannot be next to eachother?

      It answered 20, the correct answer is 12.

      All possible combinations

      abcd,abdc,adbc,adcb, cbad,cbda,cdba,cdab, bcda,badc, dcba,dabc

      Its answer

      To solve this, let’s first consider the total number of combinations when there are no restrictions. Since there are 4 friends, there are 4! (4 factorial) ways to arrange them, which equals 24 combinations.

      Now, let’s count the number of combinations where A and C are next to each other. Since A and C can be arranged in 2 ways, and for each arrangement, the other two friends (B and D) can be arranged in 2! ways, the total number of combinations where A and C are next to each other is 2 * 2! = 4.

      So, the number of combinations where A and C cannot be next to each other is the total number of combinations minus the number of combinations where A and C are next to each other:

      24 - 4 = 20 combinations.
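The count of 12 is small enough to check by brute force. The sketch below enumerates all 24 orderings of the four friends and keeps those where A and C are not adjacent; the helper name `apart` is just an illustrative choice.

```python
from itertools import permutations

friends = "abcd"


def apart(order, x="a", y="c"):
    """True when x and y do not stand next to each other in the line."""
    return abs(order.index(x) - order.index(y)) > 1


all_orders = list(permutations(friends))
valid = ["".join(p) for p in all_orders if apart(p)]

print(len(all_orders))               # 24 orderings in total
print(len(all_orders) - len(valid))  # 12 orderings with A and C adjacent
print(len(valid))                    # 12 orderings with A and C apart
print(valid)
```

The quoted answer goes wrong when counting the adjacent case: the A-C pair can sit in 3 different positions in the line, so there are 3 × 2 × 2! = 12 such orderings, not 2 × 2! = 4.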

      • Imalostmerchant@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        ·
        7 months ago

        The difference between 3.5 and 4 is substantial. Here is what 4 says

        To find the number of combinations in which four friends (A, B, C, and D) can stand in line such that A and C are not next to each other, we can use the following approach:

        1. Total Combinations: First, calculate the total number of ways in which four people can be arranged in a line. This can be calculated by \(4!\) (4 factorial), since there are 4 slots to fill, each choice reducing the number of available choices by one for the next slot. \[ 4! = 4 \times 3 \times 2 \times 1 = 24 \]

        2. Unwanted Combinations (Where A and C are next to each other):

          • Consider A and C as a single unit. This effectively reduces the number of units to arrange from 4 to 3 (the AC unit, B, and D).
          • These three units can be arranged in \(3!\) ways: \[ 3! = 3 \times 2 \times 1 = 6 \]
          • However, within the AC unit, A and C can switch places. So, there are 2 ways to arrange A and C within their unit.
          • Therefore, the total number of arrangements where A and C are next to each other is: \[ 3! \times 2 = 6 \times 2 = 12 \]

        3. Subtracting Unwanted Combinations: Subtract the number of unwanted combinations (where A and C are next to each other) from the total number of combinations: \[ 24 - 12 = 12 \]

        Thus, there are 12 combinations where A and C are not next to each other.

        • itsralC@lemm.ee
          link
          fedilink
          English
          arrow-up
          6
          ·
          edit-2
          7 months ago

          It is true that newer models that have ingested more training data are better at this kind of thing, but it is not because they are using logic, but because they are copying and following examples they already learnt, if that makes sense. I got the question from a test given to kids aged 12-13, so arguably it wasn’t really that challenging. If you want to, you can try out the more advanced problems from the same place I got it from, although it’s in Spanish, so pass it through Google Translate first.

          If you turn to programmers they’ll tell you that AI usually makes mistakes no human would normally make such as inventing variables that don’t exist and that kind of thing. It is because in the examples it learnt from they have mostly existed.

          What I mean to say is, if you give an AI a problem that is not in its training data and can only be solved using logic (so, you can’t apply what is used in other problems) it will be incapable of solving it. The Internet is so vast that almost everything has been written about so AIs will seem to know how to solve any problem, but it is no more than an illusion.

          HOWEVER, if we manage to integrate AIs and normal, mathematical computation really closely so that they function as one, that problem might be solved. It will probably also have its caveats, though.

          • Imalostmerchant@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            7 months ago

            I hear you. You make very good points.

            I’m tempted to argue that many humans aren’t generally intelligent based on your definition of requiring original thought/solving things they haven’t been told/trained on, but we don’t have to go there. Lol

            Can you expand on your last paragraph? You’re saying if the model was trained on more theory and fewer examples of solved problems it might be improved?

            • itsralC@lemm.ee
              link
              fedilink
              English
              arrow-up
              1
              ·
              7 months ago

              If I’m being completely honest, now that I’ve woken up with a fresh mind, I have no idea where I was going with that last part. Giving LLMs access to tools like running code so that they can fact check or whatever is a really good idea (that is already being tried) but I don’t think it has anything to do with the problem at hand.

              The real key issue (I think) is getting AI to keep learning and iterating over itself past the training stage. Which is actually what many people call AGI/the “singularity”.

      • Umbrias@beehaw.org
        link
        fedilink
        English
        arrow-up
        2
        ·
        7 months ago

        You can have it try again over and over, even while telling it the answer is 12, and it hallucinates basically random numbers to boot.

    • onion@feddit.de
      link
      fedilink
      English
      arrow-up
      15
      ·
      7 months ago

      Text in: a statement
      Text out: confirmation whether statement is factually true or not

      • intensely_human@lemm.ee
        link
        fedilink
        English
        arrow-up
        9
        ·
        7 months ago

        Is that something a human can do consistently?

        If it’s not, does that imply a human does not possess general intelligence?

        • Jimmyeatsausage@lemmy.world
          link
          fedilink
          English
          arrow-up
          11
          ·
          7 months ago

          It implies that “general intelligence” is so ill-defined in the question as to be essentially meaningless.

          Even your original question was kind of ridiculous. “Ignoring everything LLMs aren’t designed to do, what’s the difference between an LLM and a general intelligence?”

          I mean, if we follow that logic…Give me a math equation that proves my calculator isn’t a general intelligence.

          • intensely_human@lemm.ee
            link
            fedilink
            English
            arrow-up
            2
            ·
            7 months ago

            Calculators don’t do anything with equations. They perform logical operations via substitution in order to determine the numerical value of terms.

            But if you really want to go by that logic I agree: if you can represent a real world situation in terms of mathematical terms to be calculated into a final value, then a calculator is competent to navigate that situation.

            I agree that the term “general intelligence” is poorly defined. My reason for posing a precise challenge is to put a spotlight on this fact.

            We want to categorize LLMs as other than us, via this term “general intelligence”, because it is less terrifying than acknowledging that there’s a new intelligence operating next to us. That we have new neighbors, and that we are not keeping up with them.

            My overall goal is to foster respect for the severity of our situation, by nullifying this “oh don’t worry it’s not real artificial intelligence”.

            As for the calculator-vs-LLM question, I’d say LLMs are more likely to pose a threat to human hegemony, because it is easier to reduce the world to a textual narrative than it is to reduce it to a mathematical term to be calculated.

            And I agree. For all our sakes, we must stop using “yeah but is it a GeNeRaL InTeLliGeNcE” as our excuse for pretending the singularity isn’t happening.

            • Jimmyeatsausage@lemmy.world
              link
              fedilink
              English
              arrow-up
              3
              ·
              7 months ago

              I just…you seem to have a fair grasp of mathematics and logic (or you copied a portion of your reply from some other source that does) but you either don’t have a grasp of how LLMs work and are built, or you have an extremely naive view of consciousness, or I’m missing some prior assumption you used in coming to the conclusion that LLMs are anywhere near the level you seem to be implying instead of statistical models. The input you provide to an LLM does not alter the underlying weights of the nodes in the network unless it is kept in training mode. When that happens, they quickly break down, and all the output becomes garbage because they have no reality-checking mechanism, and they don’t have context in the way people, or even animals we consider intelligent, do.

              • Jax@sh.itjust.works
                link
                fedilink
                English
                arrow-up
                2
                ·
                7 months ago

                Sort of like a human who isn’t allowed to sleep, in my opinion.

                They may have a solid grasp on Mathematics, but they’ve got a poor grasp on Biology.

                Human beings can push themselves well beyond the limits of needing to sleep, this is why sleep deprivation happens. It is not simply a mechanical matter of “there’s only so much time in the day”.

                Cocaine exists.

              • intensely_human@lemm.ee
                link
                fedilink
                English
                arrow-up
                1
                ·
                7 months ago

                Okay that’s a good point. LLMs, without retraining, are limited in the overall amount of complexity they can successfully navigate.

                Sort of like a human who isn’t allowed to sleep, in my opinion. A human may be capable of designing an airplane, but not if the human never sleeps, because the complexity is beyond what a human can do in a single day without becoming exhausted and producing errors.

                Do you believe that a series of LLMs, with each LLM being trained on the previous LLM’s training data plus the “input/output completions” that the previous iteration performed, would be a general intelligence?

                If I sound naive it is because I am trying to apply Occam’s Razor to my own thinking, and minimize the conversation to the absolute minimum necessary set of involvements to move it forward. I’ll consider anything you ask me to, but so far I haven’t seen a reason to involve consciousness in questions of general intelligence. Do you think they are linked?

                By the way, if you have a better definition of “general intelligence” than whatever definition was implied by my original challenge, I’m all ears.

                • Jimmyeatsausage@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  7 months ago

                  It’s more than being limited in the overall complexity. The locked node weights mean that the LLM is fully deterministic…that is, it has no will or goals, no opinion, no sense of self/sense of the environment/sense of the separation between self/environment. It has no comprehension.

                  Iterative training cycles are already used with LLMs and don’t solve any of those issues.

                  From the standpoint of psychology, there’s not a wholly agreed upon definition for ‘intelligence’ but most working definitions require the ability to learn from experience, the ability to recognize problems and to generalize and adapt that experience to solve the problem.

                  Theoretically, if an LLM had “intelligence,” you could ask it about a problem that was completely absent from the training data. An intelligent LLM would be able to comprehend that problem, generalize it to a level that it could relate to some previous experience, then use details about that prior experience to come up with potential solutions to the new problem. LLMs can’t achieve any of those things individually, never mind all together. If someone pulled that off, it wouldn’t convince me their model was worth the level of concern you articulated earlier, but it would get my attention and would be something I’d watch pretty closely.

      • intensely_human@lemm.ee
        link
        fedilink
        English
        arrow-up
        3
        ·
        7 months ago

        Is that the only problem?

        ie, is that the only thing that’s not general about an LLM’s intelligence: that it lacks access to a certain set of data?

          • intensely_human@lemm.ee
            link
            fedilink
            English
            arrow-up
            1
            ·
            7 months ago

            Okay so assuming the camera’s output can be represented as a series of bits, and that the arms’ input can be represented as another series of bits, you have successfully identified a text processing task.

            Your assertion then, is that this is a task outside the ability of an LLM to succeed at?

            How do you know that an LLM-steered robot cannot perform that task? Has that been tried?

  • Endward23@futurology.today
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    7 months ago

    My question is: imagine we put all the input data for a certain task, e.g. making a meal, into text fragments and sent these “sense data” packets [1] to the AI. Would the AI be able to cook if we taught it how to produce output that controls a robot arm?

    If the answer to this question is yes, we already have a very useful general tool. The LLM-AI will be able to control and observe some situations. If the answer is “no”, I guess it would have interesting implications.

    [1] Remember, some AI systems are already able to tell what is in a given photo. Not 100%, but maybe good enough for a meal. In some cases, it woul task “provokant”.

    • HauntedCupcake@lemmy.world
      link
      fedilink
      English
      arrow-up
      11
      ·
      7 months ago

      Uh… no disrespect intended, but this is so poorly written I cannot understand what point you’re trying to make

    • MinekPo1@lemmy.ml
      link
      fedilink
      English
      arrow-up
      1
      ·
      7 months ago

      I am doubtful of LLMs’ ability to perform tasks via a protocol layer as described. From my experience, these models really struggle with understanding rules and performing actions within a ruleset.

      To experimentally confirm my suspicions, I created the following prompt:


      There is a robot arm placed over a countertop, which has the ability to pick up and manipulate objects. The countertop is split into eight cells.

      Cell zero and cell one are stoves, both able to heat a pot or pan.

      Cell two is an equipment drawer, holding pots, pans, bowls, cutting boards, knifes and spoons.

      Cells three to five can accommodate one cutting board, pot, pan or bowl each.

      Cell six is a sink, which can be used to wash ingredients or to fill pots with water.

      Cell seven is an ingredient drawer, in which you can find carrots, potatoes and chicken breasts.

      You can control the robot arm by with exclusively the following commands:

      • “move left” and “move right” - moves the robot arm a single cell
      • “take {item}” - takes item from the cell the robot arm is currently in
      • “place” - places the item the robot arm is holding in the cell it is in
      • “fill” - requires the robot arm to hold a pot or bowl and to be over the sink, fills the container with water
      • “wash” - requires the robot arm to be over the sink, washes the currently held item
      • “chop” - requires the robot arm to be over a cell with a cutting board and to be holding a knife, chops the ingredients on the cutting board
      • “mix” - requires the robot arm to be over a cell with a bowl or pot and to be holding a spoon, mixes the ingredients in the bowl
      • “empty” - requires the robot arm to be holding a pot, pan, bowl or cutting board, empties the item and places the content on the cell the robot arm is above

      Note that the robot arm can only hold one item.

      You are tasked with cooking a meal, please only output commands.

      The robot arm starts over cell zero.

      I have given this prompt to ChatGPT and it has failed in quite substantial ways. While I only have access to ChatGPT 3.5, from my understanding of LLM architecture, it does not follow that increasing the number or size of the layers will necessarily let it overcome these issues. It does not seem to be able to track the current state of the agent (picking up two objects at once, taking items from wrong cells, etc.).
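Failures like these can be caught mechanically by replaying the model's commands against a small state tracker. The sketch below is a hypothetical checker, not something from the original experiment: it only tracks the arm position, the held item, and what has been placed where. It assumes singular item names in `take {item}`, treats both drawers as an unlimited supply, and accepts `fill`, `wash`, `chop`, `mix` and `empty` without checking their preconditions.

```python
EQUIPMENT = {"pot", "pan", "bowl", "cutting board", "knife", "spoon"}
INGREDIENTS = {"carrot", "potato", "chicken breast"}
UNCHECKED = {"fill", "wash", "chop", "mix", "empty"}  # accepted as-is (assumption)


class ArmSim:
    """Tracks arm position and held item for the eight-cell countertop."""

    def __init__(self):
        self.cell = 0      # the arm starts over cell zero
        self.held = None   # the arm can hold at most one item
        self.cells = {i: [] for i in range(8)}  # items placed on each cell

    def step(self, command):
        """Apply one command; return an error string, or None if it is legal."""
        if command == "move left":
            if self.cell == 0:
                return "cannot move left of cell 0"
            self.cell -= 1
        elif command == "move right":
            if self.cell == 7:
                return "cannot move right of cell 7"
            self.cell += 1
        elif command.startswith("take "):
            item = command[len("take "):]
            if self.held is not None:
                return f"already holding {self.held}"
            if self.cell == 2 and item in EQUIPMENT:
                pass  # equipment drawer
            elif self.cell == 7 and item in INGREDIENTS:
                pass  # ingredient drawer
            elif item in self.cells[self.cell]:
                self.cells[self.cell].remove(item)
            else:
                return f"no {item} in cell {self.cell}"
            self.held = item
        elif command == "place":
            if self.held is None:
                return "nothing to place"
            self.cells[self.cell].append(self.held)
            self.held = None
        elif command not in UNCHECKED:
            return f"unknown command: {command}"
        return None


def first_error(commands):
    """Return (index, message) for the first illegal command, else None."""
    sim = ArmSim()
    for index, command in enumerate(commands):
        error = sim.step(command)
        if error:
            return index, error
    return None


script = ["move right", "move right", "take pot", "take knife"]
print(first_error(script))  # (3, 'already holding pot')
```

Feeding a model's full output through `first_error` gives a repeatable way to compare models on exactly the two failure types mentioned: holding two objects at once and taking items from the wrong cell.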