
Part 2: Hands-on Experience with Gemini – Unlocking Multimodal Magic

Google just released its latest AI model: Gemini, opening a whole new range of creative possibilities with AI for businesses. What’s changing for AI with this new model? And what’s the value organisations can get out of it? We interviewed Jason Quek, Devoteam G Cloud’s Global CTO, and Tristan Van Thielen, Devoteam G Cloud’s Machine Learning Tribe Lead to get you up to speed.


Initial Impressions

In this second interview, Jason Quek and Tristan Van Thielen share their hands-on experience with Gemini, Google’s latest AI model. Their first impressions highlight how seamlessly Gemini integrates with the Google Cloud platform, leveraging familiar tools such as service accounts and single sign-on.

I feel that it is a really good experience because it’s something that I’m used to: single sign-on with my Google Workspace login. It really integrates the whole Google Cloud experience.

Real-World Applications

A noteworthy use case is the application of Gemini in retail intelligence. The scenario involves capturing images of SIM card plans from various retailers spread across numerous stalls. Traditionally, individuals were sent to photograph the plans on display, such as one offering five gigabytes for 50 euros, among many different pricing options. The conventional pipeline relied on Optical Character Recognition (OCR), followed by custom software to match each price to the corresponding plan. With Gemini, a transformative shift occurred: when the team experimented with its capabilities, they found that it understands context well enough to discern where a price sits in relation to a product, returning a comprehensive answer in a single pass.
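As a rough illustration of this shift, the sketch below builds the single multimodal request that replaces the OCR pass plus the custom price-matching software: the photo and the extraction instructions travel together in one prompt. The payload shape follows the public Gemini REST API’s `contents`/`parts` structure, but the prompt wording and field details here are assumptions, and the actual API call is omitted:

```python
import base64
import json

def build_plan_request(image_bytes: bytes) -> dict:
    """Pair the retail photo with extraction instructions in one request;
    the model can use where each price sits relative to a plan to pair them."""
    instructions = (
        "This photo shows a retailer's SIM card plans. "
        "For each plan, return the data allowance in gigabytes and the "
        "price in euros, matching each price to the plan it appears next to."
    )
    return {
        "contents": [{
            "parts": [
                {"text": instructions},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }

# The whole extraction is one request: no separate OCR pass and no
# custom software to match prices to plans afterwards.
body = build_plan_request(b"\xff\xd8 placeholder jpeg bytes")
print(json.dumps(body)[:48])
```

The key design point is that the pairing logic, previously custom code downstream of OCR, is now expressed in plain language inside the prompt itself.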

A noteworthy enhancement is the seamless combination of image and text data. This stands out as a considerable advancement, particularly in contrast to the complexity of earlier multimodal pipelines. Previously, describing an image required a dedicated model such as Imagen, followed by a separate model to process the resulting text. Gemini streamlines this by consolidating image and text data into a single prompt, handled by one high-quality machine-learning model. Its impact shows in examples like virtual try-ons in retail and potential applications in manufacturing, where it can streamline processes and save technicians time.

The ease with which I can combine image and text data, that’s really a huge difference. I just took a picture of my dog and asked it: hey, what would be good food for this dog? And it knew it was a German Shepherd, and it knew what kind of things it should take into account.

Comparing Gemini to Its Competitors

A notable aspect of Gemini is its superiority over competitors, such as ChatGPT, in handling multimodal inputs. Gemini’s capability to process both text and images in a single prompt is a feature lacking in some of its counterparts.

Gemini’s low inference time for image and video data is a significant advantage that enhances its usability in real-time conversations.

The inference time for Gemini is really low, especially for image and video data. That’s something that would have been hard to use in a conversation before, but with Gemini it’s very quick and usable.

Addressing Challenges and Potential Limitations

A key challenge with any AI model is responsible use, which calls for filtering prompts and adopting a rule-based approach as a first step towards ensuring ethical usage.

There should be a filter for prompts fed into Gemini, initially adopting a rule-based approach before trusting AI-based decisions blindly.

Google’s commitment to responsible AI is also welcome: its responsible AI filter flags potentially offensive or discriminatory output. Even so, constraining models and prompts effectively remains a challenge.

In a retail context, deploying a chatbot introduces a challenge when users divert from the intended purpose. For instance, inquiries about unrelated topics incur a cost for the business. To address this, establishing clear boundaries and defining the chatbot’s specific task becomes crucial. Setting constraints ensures user interactions align with the intended scope, preventing misuse and keeping the chatbot focused on the retailer’s objectives. This management is essential for optimising user engagement and maintaining the chatbot’s effectiveness.
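The rule-based first step described above can be sketched very simply: before a prompt reaches the model (and incurs inference cost), plain rules check that it stays within the retailer’s scope. The keyword lists here are purely illustrative assumptions, not a production filter:

```python
# Minimal sketch of a rule-based prompt filter for a retail chatbot.
# Both word lists are illustrative placeholders; a real deployment
# would maintain these per use case and likely add model-based checks later.
ON_TOPIC = {"sim", "plan", "data", "price", "roaming", "top-up", "contract"}
BLOCKED = {"politics", "medical"}

def allow_prompt(prompt: str) -> bool:
    """Return True only if the prompt hits no blocklist term
    and mentions at least one on-topic keyword."""
    text = prompt.lower()
    if any(term in text for term in BLOCKED):
        return False
    return any(word in ON_TOPIC for word in text.split())

print(allow_prompt("What does the 5 GB plan cost?"))  # -> True
print(allow_prompt("Tell me about politics"))         # -> False
```

Rules like these are crude, but they are cheap, auditable, and run before any paid model call, which is exactly why they make sense as the initial layer before trusting AI-based filtering decisions.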

With Gemini, it might become a little harder, because you also have that multimodal input, images for example, which you’ll have to constrain.

Future Outlook and Predictions

Anticipating the future of AI language models, particularly Gemini, there is a shared sense of optimism expressed by experts in the field. The flexibility in deployment options, including edge deployment and customisable model sizes, stands out as a key feature, hinting at a future where the emphasis is on utility and customisation rather than sheer model size. In addition, the concept of LLMOps, introduced by one of the experts, brings attention to the operational aspects of large language models. This includes advancements in managing prompts and knowledge bases and implementing ongoing monitoring for quality improvement. The vision painted by these insights suggests a dynamic future for AI language models, emphasising practicality, customisation, and efficient operational management.

What is interesting, in my opinion, is that they are able to deploy these models not only in the cloud but on the edge. This also gives the ability to deploy in more places: places without Wi-Fi, or places that need additional privacy considerations.

Positive Impact on Society and Businesses

Gemini also opens intriguing possibilities in accessibility. For individuals with visual impairments, or conditions such as Sensory Processing Disorder (SPD), its ability to convert image content to text could meaningfully enhance everyday life.

How can you use Google Assistant, like those Google Home devices, to help with assisted living? For example, if someone has a heart attack, the AI could call an ambulance for them or assess the severity of the case.

Time-saving matters most in professions where efficiency is paramount. Medical professionals, first responders, and public servants stand to benefit from streamlined access to vital information, and the cumulative effect of these efficiency gains could free up time on a global scale.

We will be able to focus on a lot of other things and move forward on a bunch of different fronts; there are more opportunities for us to grow as a society. I think that’s, for me, the biggest excitement.

Your future with AI? Experience the possibilities with a Gen AI Hackathon.