AI / Machine Learning / Big DataDecember 3, 2024December 3, 2024

Amazon Unveils Nova: A New Era in Multimodal AI Models

Amazon Web Services (AWS) has introduced a groundbreaking suite of multimodal AI models, dubbed Amazon Nova, at its annual re:Invent conference. This new family of state-of-the-art foundation models (FMs) is designed to deliver frontier intelligence and industry-leading price performance, exclusively available in Amazon Bedrock.

Amazon Nova encompasses a range of models tailored for various generative AI tasks, from document analysis and video understanding to creative content generation. The suite includes both understanding models and creative content generation models, each optimized for different enterprise workloads.

The Amazon Nova understanding models are designed to accept text, image, or video inputs and generate text outputs. The suite currently includes three models, with a fourth in development:

Amazon Nova Micro: A text-only model optimized for speed and cost, ideal for tasks such as text summarization, translation, and content classification.
Amazon Nova Lite: A low-cost multimodal model capable of processing image, video, and text inputs, suitable for real-time customer interactions and document analysis.
Amazon Nova Pro: A highly capable multimodal model balancing accuracy, speed, and cost, excelling in tasks like financial document analysis and code processing.
Amazon Nova Premier: The most advanced model, still in training, aimed at complex reasoning tasks and custom model distillation, expected to be available in early 2025.

These models support fine-tuning and customization, allowing enterprises to tailor them to specific industry terminology and use cases. For instance, a legal firm could customize Amazon Nova to better understand legal documents and terminology.

Amazon Nova also introduces two creative content generation models:

Amazon Nova Canvas: An advanced image generation model producing studio-quality images with precise control over style and content.
Amazon Nova Reel: A state-of-the-art video generation model capable of creating short videos from text prompts and images, with advanced camera control features.

Both models include built-in safety controls and watermarking capabilities to promote responsible AI use.

Amazon Nova Pro has demonstrated its capabilities in document and video analysis. For example, it can summarize lengthy documents and create decision trees, as well as analyze videos to describe their content and extract specific information.

Amazon Nova Reel allows for the creation of videos from text prompts and reference images, with features like camera zoom and panning. The model supports asynchronous invocation, enabling users to check the status of video generation tasks.

Amazon has emphasized the importance of responsible AI use. All Nova models include comprehensive safety features and content moderation capabilities. The models are built with protections against misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, or nuclear (CBRN) risks.

Amazon Nova models are available in Amazon Bedrock in the US East (N. Virginia) AWS region, with additional regions supported via cross-Region inference. Pricing follows a pay-as-you-go model, and the models support over 200 languages, making them suitable for global applications.

Amazon plans to expand the Nova family in 2025 with a speech-to-speech model and an any-to-any modality model, capable of processing and generating text, images, audio, and video. These developments aim to enhance the capabilities of AI assistants and content editors.