Amidst the frenzy that is the generative AI market, major players are fiercely vying for the shiniest product. For its part, Google, traditionally a more measured participant in this race, unveiled a teaser video for its Gemini large language model this week. However, things took a controversial turn when reports revealed the video was not actually a real-time representation of the AI in action.
In the demo video released by Google, the AI model shows off its multimodal capabilities, demonstrating an ability to deftly interpret and handle information from live video and audio. It’s a formidable achievement for Google, particularly as it competes against the likes of OpenAI, which it has lagged behind. However, as reported by Bloomberg, the demo was made by “using still image frames from the footage, and prompting via text,” rather than through the real-time voice and video processing it appeared to show.
On stage at Fortune’s Brainstorm AI conference in San Francisco on Monday, Sissie Hsiao, vice president and general manager of Google Assistant and Bard, addressed the contentious demo video, focusing on the benchmarks Gemini reached as a model and how it will propel Google’s chatbot Bard.
“The video is completely real. All the prompts and the model responses are real,” Hsiao said. “We did shorten parts for brevity, which we put in the video as information on making the video,” she noted.
The demo video displays the new AI model’s multimodal capabilities as it identifies a squiggly line, then the curves of new lines, which eventually form a drawing of a duck. Throughout this process, the model consistently recognizes each element, offering duck-related facts and answers in real time.
Hsiao highlighted the milestones Gemini has reached, pointing to its performance on benchmarks that test AI models on subjects spanning high school physics, professional legal questions, and moral scenarios. According to The Verge, Gemini Ultra beat OpenAI’s GPT-4 in 30 out of 32 benchmarks, an achievement worth boasting about, although Gemini Ultra will not be released until next year. For now, Bard uses the less advanced Gemini Pro, which is roughly akin to GPT-3.5.
Hsiao said these Gemini models will continue to improve Google Search as well as the Google Bard chatbot, which she said is “the most preferred free chat bot now in the market.”