Released about a year ago, OpenAI’s GPT-4o has been refined and improved with new features. The latest is Image Generation – the AI model can generate high-quality, detailed images and can follow your natural language instructions to modify them until you get just the image you were picturing in your head.
You know how older AI models struggled with text – ask one to generate a sign and, at best, you got a sign with gibberish words; at worst, you got squiggles that weren’t even letters. But check this out:
GPT-4o can create images with perfectly legible text
Image generation typically starts with a text prompt, and you improve the result by rewriting that prompt and generating again. GPT-4o works differently – you ask it for an image, tell it what to change, then ask for further changes and so on until you get the result you want. Here are some examples:
Generating and modifying an image through plain English
You can follow the Source link below to examine the prompts that created these images. Note that OpenAI did some cherry-picking – a lot of the images are “best of 2” or even “best of 8”, meaning the model needed a few tries to get it right. Still, the results look quite impressive and the UI is as simple as it gets.
Here is another example. GPT-4o can start from scratch or modify an image you give it. Here, the user supplies a photo of a cat and asks the AI to add a detective hat and a monocle. The user then keeps refining the image, turning it into something that could be a screenshot from an RPG.
Prototyping a cat detective RPG
You can also start with multiple images and combine elements from each into the final result. OpenAI says that GPT-4o is great at following detailed instructions – it can handle 10-20 different objects in a scene without getting tripped up, while other AI models can only manage 5-8 objects, according to the company.
GPT-4o is not perfect and OpenAI is the first to admit it. It sometimes crops images at the bottom, hallucinations are still an issue, scenes with more than 10-20 objects can trip it up and rendering text in non-Latin scripts needs work, among other limitations.
Examples of GPT-4o getting it wrong
Finally, here are some video demonstrations showing off GPT-4o’s new image generation skills: