Latest Features Added in Azure OpenAI

Azure OpenAI is constantly evolving, with new features and models being released regularly. This article highlights some of the latest additions to the service that enhance its capabilities and user experience.

New Models

GPT-4o Model Updates

  • GPT-4o and GPT-4o mini: The GPT-4o series models, launched in 2024, are multimodal and support text, images, and audio. GPT-4o mini is a smaller, faster version of GPT-4o, ideal for cost-effective, lightweight applications.
  • GPT-4o-audio-preview: Released in December 2024, this model is the latest option for audio completions.
  • gpt-4o-realtime: Also released in December 2024, this model enables the development of fast speech-to-speech experiences.

o3-mini

The o3-mini model, announced in January 2025, builds upon the foundation of the o1 model, offering enhanced efficiency, cost-effectiveness, and reasoning capabilities. Key features of o3-mini include:

  • Reasoning effort parameter: Lets users tune the model's cognitive load with low, medium, and high reasoning levels, trading response depth against latency. This parameter is also available in the o1 models, and gives developers fine-grained control over the balance between response speed and the depth of reasoning applied by the model.
  • Structured outputs: Supports JSON Schema constraints, enabling the generation of well-defined, structured outputs for automated workflows. This streamlines integrating o3-mini into applications that require structured data, such as databases or APIs.
  • Functions and tools support: Integrates with functions and external tools, facilitating AI-powered automation. Developers can extend o3-mini's capabilities by connecting it to external services and data sources.
  • Developer messages: Introduces the "role": "developer" attribute, replacing the system message of previous models and offering more flexible, structured instruction handling. This provides a more intuitive way to guide the model's behavior and responses.

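To make the reasoning effort parameter and structured outputs concrete, here is a minimal sketch of how a request combining both might be composed with the OpenAI Python SDK. The deployment name "o3-mini" and the answer schema are illustrative assumptions, not values from the article; the helper only builds the request arguments, which would then be passed to a configured AzureOpenAI client.

```python
import json

def build_request(question: str, effort: str = "medium") -> dict:
    """Compose Chat Completions kwargs for an o3-mini call with a
    reasoning_effort level and a JSON Schema structured output."""
    if effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    return {
        "model": "o3-mini",             # assumed Azure deployment name
        "reasoning_effort": effort,     # trades latency vs. reasoning depth
        "messages": [
            # o3-mini uses a developer message instead of a system message
            {"role": "developer", "content": "Answer with the requested JSON only."},
            {"role": "user", "content": question},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "answer",       # illustrative schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"answer": {"type": "string"}},
                    "required": ["answer"],
                    "additionalProperties": False,
                },
            },
        },
    }

# With an AzureOpenAI client this would be sent as:
#   client.chat.completions.create(**build_request("What is 2 + 2?", "low"))
params = build_request("What is 2 + 2?", "low")
print(json.dumps(params, indent=2))
```

Because the schema is declared strict, the model's reply is constrained to a single JSON object with a required "answer" string, which downstream code can parse without defensive checks.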
GPT-4 Turbo with Vision

The preview version of GPT-4 Turbo with Vision was released in November 2023. The GA version (turbo-2024-04-09) was released in April 2024. This model accepts both text and image input, and supports enhancements, JSON Mode, parallel function calling, and reproducible output (preview). However, it's important to note that Azure AI-specific Vision enhancements, such as Optical Character Recognition (OCR) and object grounding, are not supported for the turbo-2024-04-09 version.

Realtime API

OpenAI introduced the Realtime API in November 2024, enabling developers to build fast speech-to-speech experiences into their applications using a WebSockets interface. The API also offers a WebRTC connection method and supports features like the "reasoning_effort" parameter for o1 models.

Enhanced Features

Data Zones

Introduced in October 2024, Data Zones offer optimized global routing for both pay-as-you-go and provisioned throughput units (PTUs). This feature allows for deployment to specific regions like the US and EU, ensuring better throughput, reduced latency, and adherence to data sovereignty requirements. Data Zones provide a significant advantage for businesses operating in multiple regions or those with strict data residency requirements.

Provisioned Throughput Units (PTUs)

Enhancements to the Azure OpenAI Service Provisioned offering were announced in October 2024. These enhancements include:

  • Reduced costs: Lowered the cost of provisioned throughput units, making them more accessible for smaller applications and easier for businesses of all sizes to leverage dedicated capacity for their AI workloads.
  • Lowered minimums: Decreased the minimum PTU requirements, further increasing accessibility. Businesses can start with a smaller capacity and scale up as needed, optimizing resource utilization.
  • Latency SLA: Introduced a 99% latency service level agreement (SLA) for token generation in provisioned throughput, giving businesses a guaranteed level of performance and a consistent, reliable experience for their users.

Fine-tuning

Fine-tuning capabilities have been expanded to include GPT-4, GPT-4o, and o1 series models, allowing models to be tailored to specific business needs. Additionally, fine-tuning billing is now based on the number of tokens in the training file, making costs lower and easier to estimate. This shift in billing simplifies cost management and makes fine-tuning a more accessible option for businesses looking to customize models.

Furthermore, OpenAI launched a model distillation platform in November 2024. This platform allows developers to fine-tune cost-efficient models using the outputs from larger, more powerful frontier models. This enables businesses to achieve a balance between performance and cost-effectiveness by leveraging the knowledge of larger models while deploying smaller, more efficient models for specific tasks.

Global Batch

Azure OpenAI global batch is now generally available. This feature is designed to handle large-scale and high-volume processing tasks efficiently, offering a 50% cost reduction compared to global standard with a 24-hour turnaround target. The Batch API also supports embeddings models, allowing for efficient processing of large embedding workloads. This makes it an ideal solution for businesses with large datasets or those requiring high-throughput processing of AI tasks.
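As a rough illustration, a batch job is driven by a JSONL input file with one request per line. The sketch below builds such lines; the deployment name "gpt-4o-mini-deployment", the custom_id scheme, and the prompts are assumptions for the example, and the resulting file would be uploaded and referenced when creating the batch.

```python
import json

def batch_line(custom_id: str, deployment: str, user_prompt: str) -> str:
    """Serialize one request in the JSONL shape the Batch API expects."""
    return json.dumps({
        "custom_id": custom_id,        # echoed back so results can be matched
        "method": "POST",
        "url": "/chat/completions",    # endpoint each batched request targets
        "body": {
            "model": deployment,       # assumed Azure deployment name
            "messages": [{"role": "user", "content": user_prompt}],
        },
    })

# Two illustrative requests; in practice this could be thousands of lines.
lines = [
    batch_line(f"task-{i}", "gpt-4o-mini-deployment", prompt)
    for i, prompt in enumerate(["Summarize document A.", "Summarize document B."])
]
print("\n".join(lines))  # write this out as batch_input.jsonl and upload it
```

Because each line carries its own custom_id, the asynchronous results file can be joined back to the original requests regardless of completion order.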

Prompt Caching

Prompt caching reuses previously computed prompt prefixes when new prompts share the same initial tokens, reducing overall compute and improving efficiency. This optimization technique can significantly speed up response times, especially for applications with repetitive or similar prompts.
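A practical consequence is that prompts should place long, static content first and variable content last, so repeated calls share a cacheable prefix. The sketch below shows that structuring; the system text and questions are illustrative placeholders.

```python
# Keep the long, unchanging instructions identical across calls: that shared
# prefix is what prompt caching can reuse.
STATIC_INSTRUCTIONS = (
    "You are a support assistant for the Contoso product line. "  # hypothetical
    "Always answer concisely and cite the relevant manual section."
)

def build_messages(question: str) -> list:
    """Static prefix first, variable user content last."""
    return [
        {"role": "system", "content": STATIC_INSTRUCTIONS},
        {"role": "user", "content": question},
    ]

a = build_messages("How do I reset the device?")
b = build_messages("Where is the serial number?")
assert a[0] == b[0]  # identical leading message -> cache-friendly prefix
```

Had the variable question been interpolated into the system message instead, every call would start with different tokens and the cache would rarely hit.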

Abuse Monitoring

New forms of abuse monitoring leverage LLMs to improve the efficiency of detecting potentially abusive use of the Azure OpenAI service. This feature enables abuse monitoring without human review of prompts and completions. This automated approach helps ensure responsible use of the service while minimizing the need for manual intervention.

Model Evaluation and Monitoring

OpenAI introduced the Evals feature in November 2024, allowing developers to create and run custom evaluations to measure model performance on specific tasks. This feature provides valuable insights into the strengths and weaknesses of different models, enabling developers to choose the best model for their specific needs and optimize their AI applications for better accuracy and efficiency.

Azure OpenAI Studio Updates

Generate in Playground

The "Generate in playground" feature, released in November 2024, allows developers to easily generate prompts, function definitions, and structured output schemas in the playground using the Generate button. This simplifies the process of experimenting with different prompts and configurations, making it easier to explore the capabilities of the models and fine-tune their behavior.

Logic Apps and Function Calling

Azure OpenAI Studio now allows developers to experiment with Logic Apps and Function Calling. Developers can import their REST APIs implemented in Logic Apps as functions, and the studio invokes the function (as a Logic Apps workflow) automatically based on the user prompt. This integration streamlines the process of connecting Azure OpenAI models to external services and data sources, enabling the development of more complex and dynamic AI applications.

User Interface Updates

As of September 19, 2024, the legacy Azure OpenAI Studio was replaced with an updated user interface. This new interface provides a more modern and intuitive experience for developers, making it easier to navigate the studio and access its various features.

Assistants API Updates

The Assistants API has received several updates, including:

  • File search results and ranking behavior: Developers can now include the file search results used by the file search tool and customize ranking behavior. This provides more control over how the assistant interacts with external data sources and presents information to users.
  • Filtering messages by run_id: Developers can filter messages by run_id, allowing for more efficient management and analysis of conversation histories.
  • Integration with AutoGen: Azure OpenAI assistants are now integrated into AutoGen, a multi-agent conversation framework developed by Microsoft Research. This integration enables complex LLM workflows across a wide range of applications, allowing multiple Azure OpenAI assistants to collaborate on complex tasks.
  • "tool_choice" parameter: Allows developers to force the Assistant to use a specified tool, providing more control over the assistant's actions and ensuring the desired tool is used for a given task.
  • Support for temperature, top_p, and response_format parameters: These parameters provide more control over response generation, letting developers fine-tune the randomness, diversity, and format of the output.
  • Streaming and polling support: Developers can now stream responses and use polling SDK helpers to receive object status updates without continuous polling. This improves the efficiency and responsiveness of applications that interact with the Assistants API.
  • Max completion and max prompt token support: Developers can manage token usage by setting limits on the number of tokens used for completions and prompts, helping control costs and optimize resource utilization.
  • "max_num_results" parameter: Specifies the maximum number of results the file search tool should return, managing the amount of information presented to the user and improving the efficiency of file search.
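Several of these run-level options can appear together in a single Assistants run request. The sketch below assembles such a payload; the assistant and thread IDs are placeholders, and the specific limits chosen are illustrative rather than recommended values.

```python
# Options for creating an Assistants run that force the file search tool,
# cap its result count, and bound token usage in both directions.
run_options = {
    "assistant_id": "asst_placeholder",          # hypothetical assistant ID
    "tool_choice": {"type": "file_search"},      # force the file search tool
    "tools": [{
        "type": "file_search",
        "file_search": {"max_num_results": 5},   # cap results returned
    }],
    "temperature": 0.2,              # low randomness for factual retrieval
    "max_completion_tokens": 512,    # bound output token usage
    "max_prompt_tokens": 4096,       # bound prompt token usage
}

# With the OpenAI SDK this would be passed as (thread ID also a placeholder):
#   client.beta.threads.runs.create(thread_id="thread_placeholder", **run_options)
print(sorted(run_options))
```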

Embeddings API Updates

The Embeddings API has been updated with two new parameters:

  • "encoding_format" parameter: Specifies the format to generate embeddings in, either "float" or "base64", providing more flexibility in how embeddings are generated and used within applications.
  • "dimensions" parameter: Sets the number of output embedding dimensions. This is particularly useful for managing costs and performance, as larger embeddings require more compute, memory, and storage resources.
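Both parameters slot straight into an embeddings request, as in the sketch below. The deployment name "text-embedding-3-large", the input text, and the dimension count of 256 are assumed example values.

```python
# An Embeddings request using the two new parameters described above.
embed_request = {
    "model": "text-embedding-3-large",     # assumed Azure deployment name
    "input": ["Azure OpenAI release notes"],
    "encoding_format": "float",            # or "base64" for a compact wire format
    "dimensions": 256,                     # shorter vectors cut storage and compute
}

# client.embeddings.create(**embed_request) would return one 256-dimensional
# vector per input string.
print(embed_request["dimensions"])
```

Shrinking the dimension count is a trade-off: smaller vectors are cheaper to store and search, at some cost in retrieval quality.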

Chat Completions API Updates

Several updates have been made to the Chat Completions API, including:

  • "Predicted Outputs" feature: Allows developers to provide predicted text content to optimize operation efficiency for scenarios like code completion, significantly speeding up responses when the model's output can be anticipated.
  • "max_completion_tokens" parameter: Supports the o1-preview and o1-mini models by letting developers cap the number of output tokens, essential for managing costs and keeping responses within desired length limits.
  • "parallel_tool_calls" parameter: Enables parallel execution of tool calls, potentially improving the efficiency and responsiveness of applications that utilize multiple tools.
  • "completion_tokens_details" & "reasoning_tokens" parameters: Provide more detailed information about token usage, including a breakdown of tokens used for reasoning, helping developers understand how the model uses tokens and optimize for cost-efficiency.
  • "stream_options" & "include_usage" parameters: Offer more control over streaming and usage tracking, letting developers fine-tune API behavior and gather detailed information about resource consumption.
  • Structured outputs support: The API now supports structured outputs, allowing developers to define the format of the model's response using JSON Schema. This improves the consistency and usability of responses, especially for applications that require structured data.
  • "timestamp_granularities" parameter: Provides more control over the granularity of timestamps in audio transcriptions, useful for applications that require precise timing or need to align audio with other data sources.
  • "audioWord" object: Provides detailed information about individual words in audio transcriptions, including their start and end times and text content. This is valuable for applications that perform speech analysis, transcription, or other audio processing tasks.
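The Predicted Outputs feature is easiest to see in a code-edit scenario: the unchanged file is supplied as the prediction so only the edited tokens need to be generated fresh. In the sketch below, the deployment name "gpt-4o" and the code snippet are illustrative assumptions.

```python
# The text we expect most of the response to match: the current file contents.
existing_code = "def add(a, b):\n    return a + b\n"

predicted_request = {
    "model": "gpt-4o",  # assumed Azure deployment name
    "messages": [
        {
            "role": "user",
            "content": "Rename the function to sum_two; change nothing else:\n"
                       + existing_code,
        },
    ],
    # Supplying the expected content lets the service skip regenerating the
    # spans that come back unchanged, cutting latency for small edits.
    "prediction": {"type": "content", "content": existing_code},
}

# client.chat.completions.create(**predicted_request) would return the edited
# file, with most tokens validated against the prediction rather than sampled.
print(predicted_request["prediction"]["type"])
```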

Other Updates

  • Admin API key rotations: Launched in December 2024, this feature enables customers to programmatically rotate their admin API keys, enhancing security through regular key updates and minimizing the risk of unauthorized access.
  • Usage API: Also launched in December 2024, this API enables customers to programmatically query activities and spending across OpenAI APIs, providing a convenient way to track usage, monitor costs, and optimize resource allocation.
  • Large file upload API: Allows developers to upload large files in multiple parts, essential for applications that work with large datasets or need to process large files efficiently.
  • Customer-managed key (CMK) support for Assistants: Provides enhanced security and control over data by allowing customers to manage their own encryption keys, crucial for businesses with strict security and compliance requirements.
  • "user_security_context" for Microsoft Defender for Cloud integration: This parameter enables integration with Microsoft Defender for Cloud, providing enhanced security monitoring and threat protection for AI applications.
  • Additional TTS response_formats (wav & pcm): These formats provide more flexibility in how text-to-speech output is generated and consumed, useful for applications that require specific audio formats or integrate with different audio processing systems.

Azure OpenAI On Your Data updates

Azure OpenAI On Your Data has received several updates, including:

  • Full VPN and private endpoint support: Ensures secure access to data and models, meeting the needs of businesses with strict security and compliance requirements.
  • Elasticsearch vector database connection: Developers can now connect an Elasticsearch vector database for use with Azure OpenAI On Your Data, expanding the options for storing and retrieving data and providing more flexibility in how data is managed within AI applications.
  • Chunk size parameter for data ingestion: Sets the maximum number of tokens in any given chunk of data in the index, helping optimize data storage and retrieval efficiency.
  • On Your Data changes: These include MongoDB integration, removal of the "role_information" parameter, addition of "rerank_score" to the citation object, removal of the AML data source, and AI Search vectorization integration improvements. These updates provide more flexibility and control over how data is integrated and used within Azure OpenAI.
  • Vector store chunking strategy parameters: Provide more control over how data is chunked and stored in the vector database, letting developers optimize storage and retrieval efficiency.

Conclusion

Azure OpenAI continues to evolve with new models and features that enhance its capabilities and user experience. These updates provide users with more flexibility, efficiency, and control over their AI solutions. By staying informed about the latest additions, users can leverage the full potential of Azure OpenAI to drive innovation and achieve their business goals.

The recent updates to Azure OpenAI demonstrate Microsoft's commitment to providing a comprehensive and secure platform for developing and deploying AI solutions. The focus on responsible AI, with features like abuse monitoring and content safety, ensures that the service is used ethically and responsibly. The expansion of fine-tuning capabilities and the introduction of new models like o3-mini and GPT-4o provide developers with more options to tailor AI solutions to their specific needs. The enhancements to PTUs and the availability of Global Batch make Azure OpenAI more accessible and cost-effective for businesses of all sizes.

Overall, Azure OpenAI is well-positioned to be a leader in the AI space, offering a powerful combination of cutting-edge models, enterprise-grade features, and a commitment to responsible AI development. By continuing to innovate and expand its capabilities, Azure OpenAI empowers businesses to unlock the full potential of AI and transform their operations.
