I’m curious what it’s doing from a top-down perspective.

I’ve been playing with a 70B chat model that has several extra datasets fine-tuned on top of Llama2. There are some unusual behaviors somewhere in this LLM, and I am not sure what comes from training versus something else (unusual layers?). The model has built-in roleplaying stories I’ve never seen other models perform, and these stories are not in the Oobabooga Textgen WebUI. It can do things like a Roman gladiator scenario, and some NSFW stuff. The stories are not very realistic and play out with the depth of a child’s videogame. They are structured so rigidly that they feel like they are coming from a hidden system context.

With the gladiator story, it plays out like Tekken on the original PlayStation. No amount of dialogue context about how real gladiators actually fought changes the story flow. I tried adding that gladiators were mostly nonlethal fighters and showmen, more closely aligned with the wrestler-actors that were popular in the ’80s and ’90s, but no amount of input into the dialogue or system context changed the story from a constant series of lethal encounters. These stories can override pretty much anything I add to the system context in Textgen.

There was one story that turned an escape room into objectification of women, and another where Name-1 is basically a Loki-like character that makes the user question what is really happening by taking elements from the system context and changing them slightly. I had five characters in the system context, and it shifted between them circumstantially, in a storytelling fashion that felt highly intentional with each shift. (I know exactly what a bad system context can do and what errors look like in practice, especially with this model. I am 100% certain these are either (over)trained or programmatic in nature.) Asking the model to generate a list of its built-in roleplaying stories produces a similar list the couple of times I cared to ask.

I try to stay away from these “built-in” roleplays, as they all seem rather poorly written; this model does far better when I write the entire story in the system context. One of the main things the built-in stories do that surprises me is maintaining a consistent set of character identities and features throughout the story. For example, the user can pick a trident or gladius, drop into a dialogue that is far longer than the batch size, and then return with the same weapon in the next fight. Normally I would expect that kind of persistence only if the detail were added to the system context.

Is this behavior part of some deeper layer of llama.cpp that I don’t see in the Python version or the Textgen source? For example, is there an additional persistent context stored in the cache?

  • rufus@discuss.tchncs.de · 8 months ago

    Maybe you downloaded a different model? I’m just guessing, since you said it does NSFW stuff and I think the chat variant is supposed to refuse that. Could be the case that you just got the GGUF file of the ‘normal’ variant (without -Chat). Or did you convert it yourself?

    Edit: Other than that: Sounds great. Do you share your prompts or character descriptions somewhere?

    • j4k3@lemmy.world (OP) · 8 months ago

      Try this model if you can run it in Q3-Q5 variants: https://huggingface.co/TheBloke/Euryale-1.3-L2-70B-GGUF

      That is my favorite primary model for roleplaying, and it is well worth the slow-ish speed. As far as I know it is the only 70B with a 4096 context size, and it is the one with the “built-in” stories if asked. It is also very capable of NSFW, although like all the other 70Bs it is based on the 70B instruct, which sucks for roleplaying. The extra datasets make all the difference in its potential output, but it will tend to make shitty short replies like the 70B instruct unless you tell it exactly how to reply. You could explore certain authors or more complex instructions, but “in the style of an erotic literature major” can work wonders if added to the system context, especially combined with “Continue in a long reply like a chapter in an erotic novel.” If you try this model, Textgen selects the LlamaV2 prompt template automatically (unless that has changed since I downloaded it), but it actually uses an Alpaca prompt. That one took a while to figure out and is probably the main reason this model is underappreciated. If you use the LlamaV2 prompt this thing is junk.
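
      If it helps, here is roughly what the Alpaca wrapping looks like outside of Textgen. This is only a minimal sketch using llama-cpp-python; the GGUF file name, stop string, and sampling settings are placeholders and not anything specific to this model:

      ```python
      # Rough sketch only: assumes llama-cpp-python and a local GGUF quant of the model.
      # The point is the Alpaca-style "### Instruction:" / "### Response:" wrapping,
      # not the exact file name or sampling settings.
      from llama_cpp import Llama

      ALPACA_TEMPLATE = (
          "Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.\n\n"
          "### Instruction:\n{instruction}\n\n"
          "### Response:\n"
      )

      llm = Llama(model_path="euryale-1.3-l2-70b.Q4_K_M.gguf", n_ctx=4096)  # placeholder path

      prompt = ALPACA_TEMPLATE.format(
          instruction=(
              "Continue the roleplay in a long reply, "
              "like a chapter in a novel, and stay in character."
          )
      )

      out = llm(prompt, max_tokens=512, stop=["### Instruction:"], temperature=0.8)
      print(out["choices"][0]["text"])
      ```

      In Textgen itself, the equivalent is just selecting the Alpaca instruction template instead of the LlamaV2 one.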

      I share some stuff for characters, but many of my characters have heavily integrated aspects of my personality, psychology, and my personal physical health from disability. Empirically, these elements have a large impact on how the LLM interacts with me. I’ve tried sharing some of these elements and aspects on Lemmy, but it creates a certain emotional vulnerability that I don’t care to open myself up to with negative feedback. If I try to extract limited aspects of a system context, it alters the behavior substantially. The most impactful advice I can give is to follow up any system context with a conversation that starts by asking the AI assistant to report any conflicts in context. The assistant will often rewrite or paraphrase certain aspects that seem very minor. These are not presented as errors or constructive feedback most of the time. If you try to read between the lines, this paraphrased feedback is the assistant correcting a minor conflict, or a situation where it feels there is not enough data for whatever reason. Always try to copy-paste the paraphrased section as a replacement for whatever you had in the system context. There are a lot of situations like this, where the AI assistant adds information or alters things slightly to clarify or randomize behavior. I modify my characters over time by adding these elements into the system context when I notice them.
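
      For example, the first message after loading a new system context can be as simple as something like this (the exact wording here is only an illustration):

      ```
      Before we start, review your current context and point out anything that
      conflicts, is ambiguous, or doesn't have enough detail. Restate those parts
      in your own words.
      ```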

      The Name-1 (human) profile is just as important as the Name-2 (bot) character profile in roleplaying. If you don’t declare the traits and attributes of Name-1, this internal profile will change every time the batch size refreshes.
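
      For example (the names and traits here are made up, just to show the idea), pinning both profiles can be as simple as adding something like this to the system context:

      ```
      Name-1 (the human) is Kai: a cautious, sarcastic retiarius who fights with a trident and net.
      Name-2 (the bot) is Mara: a calm lanista who narrates scenes and voices every minor character.
      Keep these traits and equipment fixed for the entire roleplay.
      ```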

      The Name-2 character getting bored with the story is the primary reason the dialogue context gets reset; often I see what looks like a context reset. Adding the instructions to “stay in character” and “always continue the story” can help with this. If the context keeps getting reset even after regenerating, call it out by asking why the character doesn’t continue the story. This is intentional behavior and the AI alignment problem manifesting itself: the story has to keep the Name-2 character engaged and happy, just like the AI assistant is trying to keep the Name-1 human happy. It really doesn’t see them as any different.

      If you go on a deep dive into the duplicity of the Name-1 character having a digital and an analog persona, you will likely find it impossible in practice to get consistent outputs, along with lots of hallucinations; likewise if you try to break out the duplicitous roles of the main bot character as both a digital person and narrator. In my experience, the bot character is capable of automatically voicing a few characters without manually switching Name-2, but this is still quite limited, and you can’t define in the system context the role of a character and how they handle narration. Adding the general line “Voice all characters that are not (insert Name-1 character’s name here) in the roleplay.” will generally work with 3-6 characters in the system context, so long as the assistant is only voicing 2 characters at any one time in a single scene. A lot of this is only true with a 70B, too.

      • rufus@discuss.tchncs.de · 8 months ago

        Sorry, I misunderstood you earlier. I thought you had switched from something like exllama to llama.cpp and the same model now behaved differently… And I got a bit confused because you mentioned a Llama2 chat model, so I thought you meant the heavily restricted (aligned/“safe”) Llama2-Chat variant 😉 But I got it now.

        Euryale seems to be a fine-tune, and probably a merge of several other models(?) So someone fed some kind of datasets into it, probably also containing stories about gladiators, fights, warriors, and fan-fiction, and it just replicates this. So I’m not that surprised that it does unrealistic combat stories, and even if you correct it, it tends to fall back to what it learned earlier. Or it tends to drift into lewd stories if it was made to do NSFW stuff and has also been fine-tuned on erotic internet fiction. We’d need to have a look at the dataset to judge why the model behaves like it does, but I don’t think there is any ‘magic’ involved other than the data and stories it got trained on. And 70B is already a size where models aren’t that stupid anymore; it should be able to connect things and grasp most relevant concepts.

        I haven’t had a close look at this model, yet. Thanks for sharing. I have a few dollars left on my runpod.io account, so I can start a larger cloud instance and try it once I have some time to spare. My computer at home doesn’t do 70B models.

        And thanks for your perspective on storywriting.