Google Unveils Latest AI Image Generation Tech 'Imagen 4' - Expands Developer Access

Google has officially launched “Imagen 4,” its highest-performing text-to-image generation model. The release marks a significant step up in the quality of text-to-image generation, most notably in how accurately the model renders text inside images.

With this announcement, Google has made Imagen 4 available through the Gemini API and Google AI Studio. It has also introduced ‘Imagen 4 Fast,’ a model that emphasizes fast processing and low cost, which makes the lineup more practical for everyday use.

The “Imagen 4” series comprises three models for different purposes and needs. The most eye-catching, ‘Imagen 4 Fast,’ is designed for quick image production. It costs $0.02 (approximately 27 won) per image, making it suitable for high-speed processing and bulk workloads.

The standard model, ‘Imagen 4,’ renders text far better than its predecessor. This is an advantage where important phrases must be shown accurately, such as in advertising images or informational graphics. It is reported to cost $0.04 per image.
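
To make the pricing concrete, here is a minimal sketch that estimates the cost of a batch from the two per-image prices quoted above. The function name `batch_cost` and the batch size of 1,000 are illustrative assumptions, and Imagen 4 Ultra is left out because no price is given for it here.

```python
# Per-image prices quoted in the article, in USD.
# Imagen 4 Ultra is omitted because the article does not state its price.
PRICE_PER_IMAGE_USD = {
    "Imagen 4 Fast": 0.02,
    "Imagen 4": 0.04,
}


def batch_cost(model: str, image_count: int) -> float:
    """Return the estimated cost in USD of generating image_count images."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    if model not in PRICE_PER_IMAGE_USD:
        raise KeyError(f"no price listed for model: {model}")
    # Round to cents to avoid floating-point noise in the result.
    return round(PRICE_PER_IMAGE_USD[model] * image_count, 2)


if __name__ == "__main__":
    # Hypothetical batch size, chosen only for illustration.
    for model in PRICE_PER_IMAGE_USD:
        print(f"{model}: 1,000 images cost ${batch_cost(model, 1000):.2f}")
```

At these prices, a batch on the Fast model costs half as much as the same batch on the standard model, which is why it is the one pitched at bulk operations.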

The top-tier model, ‘Imagen 4 Ultra,’ offers the highest level of detail and the most precise reflection of the input text. It targets users in professional design, artwork, and marketing who have complex visual requirements.

Google unveils next-generation image generation model ‘Imagen 4’ (photo=@Google AI Developers X account)

Imagen 4 and Imagen 4 Ultra support image generation at up to 2K resolution. ‘2K resolution’ typically refers to high-quality images about 2,000 pixels wide, suitable for web content as well as print materials. Google explained that the higher resolution provides the fine detail needed for marketing materials and artistic compositions.
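
As a rough illustration of what “suitable for print” means, the sketch below converts pixel dimensions into a physical print size. The 2048 x 2048 pixel image and the 300 DPI print density are assumptions chosen for the example and are not figures from Google.

```python
def print_size_inches(width_px: int, height_px: int, dpi: int = 300) -> tuple[float, float]:
    """Return the (width, height) in inches of an image printed at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


if __name__ == "__main__":
    # Assumed 2K image size; the article only says "about 2,000 pixels" wide.
    width_in, height_in = print_size_inches(2048, 2048)
    print(f"{width_in:.1f} x {height_in:.1f} inches at 300 DPI")
```

Under these assumptions a 2K image prints at roughly 6.8 inches (about 17 cm) per side, enough for a flyer or magazine graphic but not a large poster.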

The quality of text embedded in images has also improved significantly over previous models. For example, given a prompt such as “draw a happy family going on a picnic,” the generated image now reflects the wording of the sentence more faithfully. Previously, the meaning of the sentence was only roughly captured, or text appeared awkwardly within the image.

Google has applied a watermarking technology called ‘SynthID’ to the Imagen 4 model series. It embeds a hidden identifier in each generated image, invisible to the naked eye, so that the image can easily be confirmed as AI-created. As concerns over the misuse of AI images grow, the ability to verify where an image came from and how it was generated is becoming increasingly important. Google stated that it is strengthening its AI ethics standards through SynthID.

A breathtaking dawn landscape unfolding across mountain ranges (provided by Google Blog)

The Imagen 4 series is currently available via Google’s Gemini API and Google AI Studio. Developers can integrate the models using Google’s documentation and its ‘cookbook’ (usage examples organized by feature). As a result, developers at organizations of any size, from startups to enterprises, can build services and apps on Imagen 4.
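
For readers who want a sense of what such an integration might look like, here is a minimal sketch using the `google-genai` Python SDK. The model IDs, the tier names, and the helper functions `build_request` and `generate_images` are assumptions made for illustration and do not come from the announcement, so check them against Google's current documentation before use.

```python
# Assumed model IDs; verify against the current Gemini API model list.
MODEL_IDS = {
    "fast": "imagen-4.0-fast-generate-001",
    "standard": "imagen-4.0-generate-001",
    "ultra": "imagen-4.0-ultra-generate-001",
}


def build_request(prompt: str, tier: str = "standard", number_of_images: int = 1) -> dict:
    """Validate the inputs and return the parameters for an image request."""
    if tier not in MODEL_IDS:
        raise ValueError(f"unknown tier: {tier}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if number_of_images < 1:
        raise ValueError("number_of_images must be at least 1")
    return {
        "model": MODEL_IDS[tier],
        "prompt": prompt,
        "number_of_images": number_of_images,
    }


def generate_images(prompt: str, tier: str = "standard", number_of_images: int = 1) -> list[bytes]:
    """Call the Gemini API and return the raw bytes of each generated image."""
    # Imported lazily so build_request works without the SDK installed.
    from google import genai
    from google.genai import types

    request = build_request(prompt, tier, number_of_images)
    # The client reads the API key from the GEMINI_API_KEY environment variable.
    client = genai.Client()
    response = client.models.generate_images(
        model=request["model"],
        prompt=request["prompt"],
        config=types.GenerateImagesConfig(
            number_of_images=request["number_of_images"],
        ),
    )
    return [generated.image.image_bytes for generated in response.generated_images]
```

With an API key set, a call such as `generate_images("a happy family going on a picnic", tier="fast")` would request one image from the Fast model. Keeping the tier as a parameter makes it easy to prototype on the cheaper model and switch to a higher tier for final output.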

Google described the release of Imagen 4 not just as a new product announcement but as a step toward democratizing advanced AI image generation. Text-to-image technology is rapidly being adopted across industries such as education, media, marketing, and e-commerce. By making it easily accessible to more developers, Google aims to strengthen its market competitiveness while fostering a culture of responsible technology use.

Imagen 4 shows how far AI has come in creating professional-level images from text input alone. By offering three models that balance image quality, generation speed, and cost while meeting ethical standards, Google is reaching a diverse user base. The series is expected to be a practical tool for developers who need mass content production or rapid prototyping.