Google Bard recently gained the ability to watch YouTube videos and then answer questions about the video. I asked it to watch a video from a maker who doesn’t share the recipes directly in the description (though he links to it), Joshua Wiseman, specifically the Popeyes Chicken Sandwich But Better video. I then asked Bard to give the recipe, which it did, ingredients and steps! I double checked it and it was perfect, including the optional mushroom powder.
I then dropped in a url of a recipe with the ingredients in volume and asked it to covert it into grams, and finally gave it simply text of a recipe and asked it to do the same thing. It did both okay, with errors coming from the websites it crawled for the conversions.
Insane and revolutionary, especially the video transcription. Try it for yourself and let me know your experience.
It’s kind of “watching” as it views visual information (don’t want to make this a semantics discussion), though at the moment it does a pretty poor job with watching as I haven’t been able to get it to answer a simple question correctly, such as what color shirt is the host wearing. It tries to answer, though. The exciting part is this will be the worst it’ll ever get.
This is Bard’s response to my query about how it “views” YouTube videos:
“I am able to process and understand the information from YouTube videos in two ways:
Transcript analysis: I can access and process the transcripts of YouTube videos, which are text versions of the spoken audio in the videos. This allows me to understand the content of the videos, even if I cannot directly see the visual elements.
Limited visual processing: I have some limited ability to process visual information from YouTube videos. I can identify basic objects and scenes, and I can track the movement of objects in the videos. However, my ability to process visual information is not as sophisticated as that of a human, and I may not be able to understand all of the visual information in a video.
Overall, I am able to understand YouTube videos through a combination of transcript analysis and limited visual processing. This allows me to provide helpful and informative responses to questions about YouTube videos, even if I cannot directly see the videos myself.”
I wouldn’t trust an AI to explain how itself works. Also there’s no way it could respond in a reasonable amount of time if it was analyzing every frame of a video in real time.
I don’t trust most humans either, but here we are, having discussions, exchanging ideas.
I don’t automatically trust that the system knows exactly how it works, but it seemed to know what it was talking about. Or, at the very least, a response to my question was preprogrammed, as it seems to be a major feature, and there’s bound to be many people asking about it.