The Potential of ChatGPT-4o

by Joe Lonergan

Recently you might have heard that OpenAI has upped its game again and released a new version of ChatGPT called ChatGPT-4o.

This version of ChatGPT has a capability called multimodality. Multimodality allows a model to take in and respond with text, audio, or images. If you were impressed by the OpenAI model integrated into the Be My Eyes app as Be My AI, then you will be blown away by what’s coming soon in the form of OpenAI’s new flagship model, ChatGPT-4o.

The developers of Be My Eyes are reportedly going to introduce this version of ChatGPT into their Be My AI feature, and Andy Lane of Be My Eyes has said it is going to be life-changing. That is a big statement, but if it is as good as the promotional videos lead us to believe, I agree with him. Andy can be seen in the video pointing his phone camera at scenes and objects and having a natural conversation with the AI. The AI replies to him in a pleasant, smiling voice, describing the action around him in great detail and with personality. He can be seen doing some sightseeing around London, and at the end he uses ChatGPT-4o to flag down a taxi. Naturally, we expect this to make it onto smart glasses in the future, as it would work well hands-free. We look forward to trying this feature when it is released.

Follow Andy’s adventures around London with OpenAI’s new GPT-4o model, as tested by Be My Eyes:

OpenAI announced ChatGPT-4o (the ‘o’ stands for ‘omni’) on Monday 13th May, a day before the Google I/O event. OpenAI probably did this to get one up on its rivals: how do you follow a feature like that? On stage they demonstrated two ChatGPT-4o models chatting to each other. They also showed how ChatGPT-4o can now do live translation and speak back to you in a natural-sounding voice, which is going to be so handy. OpenAI also loves mentioning virtual assistance for blind users as a use case, and they are clearly proud of this as a practical use for their product.

The future of virtual assistance is going to be amazing.

Google I/O Developer Conference

The day after the OpenAI event it was Google’s turn to announce the next generation of developments coming to Google Gemini AI. Google announced that, later this year, an upgrade to its AI model Gemini Nano will bring full multimodal capabilities, meaning the new Pixel 8 phone will be able to understand more information in context beyond text, such as sights, sounds, and spoken language.

For now, Gemini Nano will be integrated into the new Pixel 8 phones. The great thing is it will not need an internet connection to do most tasks. For example, TalkBack users will now get descriptions of details in a photo sent by family or friends, or a description of products when shopping online, such as the style and look of clothes. Because Gemini Nano runs on the device, these descriptions will happen quickly and will work without a network connection. If you didn’t know already, TalkBack is Android’s screen reader, and it is getting better all the time. The Pixel is starting to sound like a go-to option for more and more blind and low-vision users, especially if momentum continues in the accessibility space.

Google Lookout is getting a new find mode feature

Google Lookout is an app from Google that uses an Android phone’s camera to help blind and low-vision users recognise objects, text, currency, images, labels, and food.

The new update, called ‘Find mode’, is rolling out in beta and can identify objects in the user’s surroundings. Users will be able to select from a list of seven categories of items, such as seating, tables, and bathrooms, and as they move their camera around the room, Lookout will notify them of the direction and distance to the item. What a cool feature. We now have a great list of apps in this sphere for object finding, text recognition, and more, with some great channels and features being added all the time. AI is developing at a frightening pace, but that is a positive thing when it comes to assistive technology.

Do you use any of these apps, for example Seeing AI, Be My Eyes, or Google Lookout? If so, tell us about your experience and which one you like best by sending a voice note to the Talking Technology podcast on WhatsApp at 086 199 0011.

Please Subscribe to our Talking Technology Podcast