Google has been sitting on a big secret: a few days ago, they showed us the impressive capabilities of their new AI, Gemini. Is it really a GPT killer? Is there even a future for ChatGPT, or will Gemini take over the world of AI completely?

Let's brush aside buzzwords like multimodality for a moment and see what it can really do. Here's what you need to know before we start testing the model. Google has shown us three model sizes, all under the umbrella of Gemini 1.0: Nano, Pro and Ultra. Ultra is the largest and most capable, with the broadest feature set; Pro is the best model for scaling across a wide range of tasks; and Nano is the smallest one, built for on-device tasks.

Google Gemini AI

Gemini Nano is already rolling out on Google Pixel smartphones, and Pro is already at the core of Google Bard, which is what we are going to test. As for Gemini Ultra, it is by far the most interesting of the bunch. According to Google's own tests, Gemini Ultra is the highest-scoring AI model to date, exceeding current state-of-the-art results on 30 of the 32 benchmarks.

If we look at the benchmark table, we see that Gemini beats GPT-4 in every category except one. Gemini answers questions better, has improved reasoning capabilities, comprehends text with more precision, does math more accurately and even codes better. This benchmark data looks really promising, especially alongside the multimedia test results, where Gemini again outperforms GPT-4.

In some tests the margin is bigger, in others narrower, but Gemini did a better job in every test conducted. What especially interests me is not the improved text and writing capabilities; I am more excited about the new abilities with images, video and audio. Working with images, video and audio is at the core of Gemini's training, which is why comparing it to GPT matters so much. I will say "multimodal" once, okay? Gemini uses a multimodal training method, meaning it was trained on a massive dataset of text, audio, images, video and computer code simultaneously. This allows it to understand and reason about information from various sources, making it more versatile in handling complex tasks.

GPT-4, on the other hand, was trained primarily on text data, which limits its ability to comprehend and utilize information from other modalities.

While GPT excels at text-based tasks like generating creative text formats and summarizing factual topics, it may struggle with tasks that require understanding and processing images. This difference in training methodology basically gives the two models different strengths.

Gemini's training and overall capabilities make it a more versatile and potentially groundbreaking LLM compared to GPT. While GPT excels in text-based tasks, Gemini's ability to handle multimedia data really opens up new possibilities. 
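If you'd rather poke at this multimodal behavior programmatically than through the Bard interface, here is a minimal sketch using Google's `google-generativeai` Python SDK. Treat it as an illustration under assumptions: the model names (`gemini-pro`, `gemini-pro-vision`), the placeholder API key and the image filename are mine, and availability may have changed since the time of writing.

```python
# Minimal sketch: text-only vs. text + image prompts to Gemini.
# Assumes: pip install google-generativeai pillow, plus a valid API key.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

# Text-only prompt, roughly what chatting with Bard does.
text_model = genai.GenerativeModel("gemini-pro")
print(text_model.generate_content("Summarize the plot of The Hobbit.").text)

# Multimodal prompt: the same call accepts images alongside text.
vision_model = genai.GenerativeModel("gemini-pro-vision")
img = PIL.Image.open("breakfast.jpg")  # hypothetical local file
response = vision_model.generate_content(
    ["List the ingredients you see and estimate the total calories.", img]
)
print(response.text)
```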

Now get ready to open up Gemini; I will show you a few things. As I already mentioned, right now the public has access only to Gemini Pro, with Gemini Ultra coming next year. And although Gemini Pro isn't as advanced and mighty as Gemini Ultra, it is still capable of things you simply cannot do with GPT-4.

So I'm going to start with something simple and ask Bard which AI model it is using right now. The answer is simple: Gemini. But it doesn't really say which one exactly, so let's ask that. Which Gemini exactly? Gemini Ultra.

Let's clarify whether it's really Gemini Ultra and not Gemini Pro. And would you look at that: it uses Gemini Pro. This doesn't give me the optimism and shock I was expecting, but you know, this is just simple chatting. Can Bard explain how this Gemini Pro is better than other models? Okay: better training, faster processing, improved reasoning and creativity, better accuracy and nuance. That sounds like a lot.

Basically, this is a summary of everything I told you before, so let's take it to another level and ask whether Bard can solve math problems now, because this was one of my tests. I have prepared a couple of images.

The first one has three different tasks, each with three subtasks inside. I want Gemini to solve at least something, so I will upload this photo and ask it to solve them. After a little bit of waiting, the result is negative. Somehow, Gemini decided to read only part of the uploaded image, just the stuff in the middle, and said that it's impossible to solve. I think a clearer image would have made the job easier for Gemini.

But okay, I have a handwritten solution to one math problem. Let's ask whether it is correct. Now Gemini has no problem reading all of it, even the handwritten text, and it concludes that the equation is correct. I know that it is, so Gemini passes this test, but I still want to make it more interesting. That's why I have a fully handwritten document that contains a mistake. Can Gemini find it? I ask: is this one correct? Now it seems Gemini has some trouble reading everything; it produces a very strange equation and concludes that everything in the photo is correct, but I know for sure that it isn't. So let's rephrase the question and ask whether there is a mistake. Now we're getting somewhere: this looks more like an explanation, though the equations are written in a weird format, which is really strange, because Bard never had any issues with proper formatting before. I ask Bard to solve the problem for me, which results in exactly that: a solution, weirdly formatted again.

But I can ask it to write the solution in a form I could copy by hand. This version is a little better, but still not perfect. I think this is not my lucky day; it seems that for solving math problems and properly reading text, we still have to wait for Gemini Ultra to go live.

Still, what Bard can do now is already impressive. But this is not everything I wanted to try: I want to properly test how good Gemini is at understanding images, so let's ask Bard to compare two breakfast images.

Okay, the images are uploaded, so let's ask which of these two breakfasts is the healthier one. How would a person approach this? They would look at all the ingredients and try to estimate the approximate number of calories. If Gemini is as good as Google claims, it will do the same thing. And looking at the results, I can say that I'm shocked. Our first breakfast was the really unhealthy one, with sausages, fat and eggs. Gemini did a really good job of identifying the individual ingredients of each breakfast. Here is what it says: eggs, avocado, tomatoes and bacon, and all of these are indeed present in the image. And you know what's really impressive? It managed to estimate the number of calories for each breakfast: apparently all those sausages and eggs in the first breakfast came to only 300 calories, and the much healthier avocado and tomatoes to only 250.

I do like how detailed this explanation is, but let's throw in another breakfast image: which one is better now? Again, Gemini has no problem correctly identifying all the ingredients: granola, fruit, yoghurt, all that stuff. Apparently this breakfast has even fewer calories, only 200. I must say this is mighty impressive; I don't think I have ever seen an AI this good at identifying objects and drawing informed conclusions. And of course I can do the simplest thing and ask how I can make the first breakfast healthier. For Bard, this is not a problem: the result arrives in less than a couple of seconds, and it even calculates how many calories the new breakfast will have. This is truly something.

Now I want to take a little break from analyzing images and do some simple text-based stuff. Google says Gemini is better at understanding, so can it guess a movie from a vague description of its plot?

Alright: a group of unlikely heroes, brought together by a shared destiny, on a perilous quest to save their world from an ancient evil that threatens to plunge them into eternal darkness. Gemini, can you guess what movie that is? Gemini replies that a lot of movies fit this description. But I still want to see what Gemini comes up with. Okay, I push further and ask for five suggestions, and it delivers. One of them, The Lord of the Rings, is exactly what I was thinking of, and it's the first one on the list, so I count it as a job well done.

How about we bring back images and ask it to guess a song? I'll tell you right now: I'm thinking of "Blinding Lights" by The Weeknd, so as the first image I upload a blind man, and as the second, Christmas lights. This should be pretty easy, right? Gemini thinks this is "String Lights" by Billy Strings; this time it doesn't want to analyze the two images together, and I'd say that's only 50% of what I wanted. Let's remind it to take both images into account: no, not this time. And if we give it a hint, it now thinks the song is "Starboy". I suppose that's one way to put it: there is the image of a boy, and the lights kind of look like stars. But maybe the image of a clearly blind man will help. Now it just refuses to work with images of people; there are still shortcomings that Bard and Gemini cannot overcome. Then again, all those images were pretty complex.
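For what it's worth, the underlying model can accept several images in a single request, so the two-image version of this game can be expressed directly through the API. Another hedged sketch with the same SDK; the filenames, prompt wording and model name are again my own assumptions:

```python
# Sketch: passing two images in one prompt so the model reasons about both.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder

model = genai.GenerativeModel("gemini-pro-vision")
blind_man = PIL.Image.open("blind_man.jpg")      # hypothetical image 1
lights = PIL.Image.open("christmas_lights.jpg")  # hypothetical image 2

response = model.generate_content(
    [
        "These two images together hint at one song title. "
        "Consider BOTH images before answering.",
        blind_man,
        lights,
    ]
)
print(response.text)
```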

So I have something simpler in mind: let's play hangman. And please note that I'm purposely using an incorrect name for the game, hoping that Gemini can figure it out based on the image. No, it doesn't: it understood that there was a stick man on the paper, but the four lines at the bottom didn't help a bit. Maybe it was my fault; I should have been clearer in my drawing. No problem, since I can always update the prompt, and now it says hangman, as it should. Just to make things extremely clear, I ask Gemini to once again look at the image and tell me what the first letter is going to be. Let's draw it and upload the updated image. The next letter it guesses is E, which is incorrect. Gemini also mistook my spelling mistake for a letter, which shouldn't have happened if it were as advanced as I hoped, but you know, my expectations are always higher than they should be. With the next updated image, it seems I have fried Gemini's brain, and it resets the game.

Bard is much better than the free version of ChatGPT (GPT-3.5), and it's just as quick with its responses. But let me give Gemini one more try. I'm going to upload this image of paper airplanes and ask which one is the fastest. This image is an easy one because it has text on it, making it super easy for Gemini to just search the web for information about each type of paper airplane, and the response is well informed and well written. So I take it to the next level and upload a similar image without any text, and here, surprisingly, it has no issues identifying the fastest one.

I think Gemini Pro needs more testing before I make a final conclusion on whether it is a worthy competitor to ChatGPT. Early next year, Gemini Ultra will become available to the public, and I hope we're going to see something more special. By the way, to demonstrate the full power of Gemini Ultra, Google made a series of videos, and this one is really impressive: here, Gemini Ultra accurately identifies objects and drawings.

Gemini Ultra was reportedly trained on a dataset of over 175 trillion words, significantly larger than the dataset GPT-4 was trained on, so OpenAI would have to use far more data to overcome Google's creation. Right now, Gemini Ultra has access to more data to learn from, which gives it an edge in terms of accuracy and fluency. OpenAI would also have to train GPT-5 on different types of data, just as Google did: text, code, images, audio and video. Only then would GPT-5 be able to understand and generate more complex and nuanced responses than GPT-4 and Gemini.
