• FaceDeer@fedia.io
    5 months ago

    Reddit is actually extremely good for AI. It’s a vast trove of examples of people talking to each other.

    When it comes to factual data, there are better sources, sure, but factual data has never been the key deficiency of AI. We’ve long had search engines for that kind of thing. What AIs had trouble with was human interaction, which is what Reddit and Facebook are all about. These datasets train the AI to communicate.

    If the Fediverse were larger, we’d be a significant source of AI training material too. I’d be surprised if it’s not being collected already.

      • FaceDeer@fedia.io
        5 months ago

        The “glue on pizza” thing wasn’t a result of the AI’s training; the AI was working fine. It was the search result that gave it a goofy answer to summarize.

        The problem here is that it seems people don’t really understand what goes into training an LLM or how the training data is used.