Hi there. If I’m looking to use an LLM the same way I use Stable Diffusion, i.e. running it on my own PC with pre-trained models (checkpoints?), where would I start?

If I wanted to access it from my mobile devices, is that a possibility?

If I later wanted to create workflows with these AI tools, say using the LLM to generate prompts and automatically running them through Stable Diffusion, is that a possibility?

I’m consistently frustrated with ChatGPT seemingly not being able to remember a chat history past a certain point. Would a self-run model be better in that regard (i.e. will I be able to reference something in a chat thread that happened 2 weeks ago)?

Are there tools that would allow cross-thread referencing?

I have no expert knowledge whatsoever, but I don’t shy away from spending hours learning new stuff. Will I be able to take steps towards my own personal AI assistant? Or would this be way out of scope for a hobbyist?

  • Prophet@lemmy.world
    1 year ago

    Depends on your hardware and how far you’re willing to go. For serious development I think you need at least 12-16 GB of VRAM, but there are still some things you can do with ~8. If you only have a CPU, you can still test some models, but generation will be slow.
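To put rough numbers on the VRAM figures above, here is a back-of-the-envelope sketch. It only counts the model weights (parameter count times bytes per weight); the 7B and 13B sizes and the bit widths are illustrative assumptions, and real usage is higher because of context cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores the context (KV) cache, activations and framework overhead,
    so treat the result as a lower bound.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    # Illustrative model sizes and precisions (assumptions, not recommendations).
    for size in (7, 13):
        for bits in (16, 8, 4):
            print(f"{size}B model at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate a 7B model at 16-bit needs about 14 GB for weights alone, while a 4-bit quantized version of the same model needs about 3.5 GB, which is why quantized models are the usual choice on ~8 GB cards.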

    I’d recommend trying out the oobabooga web UI (text-generation-webui). It should work with quite a few models on Hugging Face. Hopefully I don’t get in trouble for recommending a subreddit, but r/LocalLLaMA has a lot of other great resources and is a very active community. They’re doing exactly what you want.

    As far as your other questions…

    1. Accessing it on your phone is going to be tricky. You would most likely want to host it somewhere, but I’m not sure how easy that is for someone without a bit of software background. Maybe there is a good service for this; Hugging Face might offer something.

    2. Cross-thread referencing is an interesting idea. I think you would need to create a log store of all your conversations and then embed those into a vector store (like Milvus, Weaviate or Qdrant). This is a little tricky since you have to decide how to chunk your conversations, but it is doable. The next step is somewhat open-ended. You could always query your vector store with any question that you are already sending to your model, and then pass any hits to the model along with your original question. Alternatively, you could tell the model to check for other conversations and trigger a function call to do this on command. A good starting point might be this example, which makes references to a hardware manual in a Q&A style chatbot.

    3. Using an LLM with Stable Diffusion: I’m not entirely sure what you are hoping to get out of this. Maybe to reduce boilerplate prompt writing? But yes, you can fine-tune a model to handle this and then have the model execute a function that calls Stable Diffusion and returns the results. I am pretty sure LangChain provides a framework for this. LangChain is almost certainly a tool you will want to become familiar with.
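For the cross-thread idea in point 2, the "always query the store, then pass hits to the model" variant could look roughly like the sketch below. To stay runnable without extra installs it uses a toy bag-of-words embedding and an in-memory list instead of a real embedding model and a vector database such as Qdrant; `ConversationStore`, `embed` and `build_prompt` are hypothetical names made up for this illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ConversationStore:
    """In-memory stand-in for a vector store holding chunks of old chats."""

    def __init__(self) -> None:
        self.chunks = []  # list of (thread_id, text, vector)

    def add(self, thread_id: str, text: str) -> None:
        self.chunks.append((thread_id, text, embed(text)))

    def search(self, query: str, top_k: int = 2):
        """Return up to top_k (thread_id, text) pairs, most similar first."""
        q = embed(query)
        scored = [(cosine(q, vec), tid, text) for tid, text, vec in self.chunks]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(tid, text) for score, tid, text in scored[:top_k] if score > 0]


def build_prompt(store: ConversationStore, question: str) -> str:
    """Prepend any retrieved chunks to the question before sending it to the model."""
    hits = store.search(question)
    context = "\n".join(f"[{tid}] {text}" for tid, text in hits)
    return f"Relevant earlier conversations:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ConversationStore()
    store.add("thread-1", "We decided to name the cat Biscuit.")
    store.add("thread-2", "The GPU has 8 GB of VRAM, so use a 4-bit 7B model.")
    store.add("thread-3", "Recipe notes: sourdough needs 12 hours to proof.")
    print(build_prompt(store, "How much VRAM does my GPU have?"))
```

Here each stored message is one chunk, which sidesteps the chunking decision mentioned above; longer conversations would need to be split into smaller pieces before embedding.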
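For point 3, a simpler alternative to fine-tuning plus function calling is to chain the two steps by hand: ask the LLM for a prompt, then send that prompt to Stable Diffusion. The sketch below assumes the AUTOMATIC1111 Stable Diffusion web UI is running locally with its API enabled (`--api`) on the default port; the URL, step count and helper names are assumptions for illustration, and the LLM call is left as a function you plug in yourself.

```python
import base64
import json
import urllib.request
from typing import Callable

# Assumption: AUTOMATIC1111 web UI started with --api on its default port.
SD_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def txt2img_payload(prompt: str, steps: int = 20) -> dict:
    """Build the JSON body for a text-to-image request."""
    return {"prompt": prompt, "steps": steps}


def render_with_sd(prompt: str, url: str = SD_URL) -> bytes:
    """Send the prompt to Stable Diffusion and return the first image as raw bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The response carries the generated images as base64 strings.
    return base64.b64decode(body["images"][0])


def idea_to_image(
    idea: str,
    write_prompt: Callable[[str], str],
    render: Callable[[str], bytes] = render_with_sd,
) -> bytes:
    """Turn a short idea into an image: LLM writes the prompt, SD renders it."""
    prompt = write_prompt(idea)  # e.g. a call to your locally hosted LLM
    return render(prompt)
```

`write_prompt` can be any function that takes your short idea and returns a full image prompt, for example a thin wrapper around whatever API your local LLM server exposes. Frameworks like LangChain wrap this same pattern with more tooling.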