Highlights:
- The company is also introducing a new generation of its watsonx Code Assistant for application development and modernization.
- IBM stated that the new Granite models are intended to serve as enterprise "workhorses" for tasks like retrieval-augmented generation (RAG), classification, summarization, agent training, entity extraction, and tool utilization.
Strengthening its push to carve out a distinctive position in enterprise artificial intelligence, IBM Corp. has launched a range of new Granite models along with tools aimed at promoting their responsible use.
The company is also introducing a new generation of its watsonx Code Assistant designed for application development and modernization. These new features are being integrated into a multimodel platform for use by the company’s 160,000 consultants.
The new Granite 3.0 8B and 2B models are available in "Instruct" and "Guardian" variants, which are intended for training and risk/harm detection, respectively. Both models will be offered under an Apache 2.0 license, which Rob Thomas, IBM's senior vice president of software and chief commercial officer, described as "the most permissive license for enterprises and partners to create value on top." Under the open-source license, the models can be deployed for as little as $100 per server, and IBM is providing intellectual property indemnification to give enterprise customers confidence when integrating their data with its models.
Thomas said, “We’ve gone from a world of ‘plus AI,’ where clients were running their business and adding AI on top of it, to a notion of AI first, which is companies building their business model based on AI.”
Thomas said IBM plans to spearhead the use of AI for information technology automation through both organic development and acquisitions, including the infrastructure-focused companies Turbonomic Inc. and Apptio Inc. and its pending deal for HashiCorp Inc.
“The book of business that we have built on generative AI is now $2 billion-plus across technology and consulting. I’m not sure we’ve ever had a business that has scaled at this pace,” Thomas said.
The Instruct versions of Granite, designed for training, are available in 8 billion- and 2 billion-parameter configurations. They were trained on over 12 trillion tokens of data in 12 languages and 116 programming languages, enabling them to perform tasks such as coding, documentation, and translation.
IBM said it intends to extend the foundation models to a 128,000-token context length and add multimodal support by the end of the year. The enhancement will let the models process much longer input sequences and handle multiple data types at once. Context length refers to the number of tokens, such as words, symbols, or other units of input data, that a model can process and retain at one time. Typical models have context lengths ranging from 1,000 to 8,000 tokens.
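The effect of a context window can be sketched in a few lines. This toy example uses a whitespace "tokenizer" purely for illustration; real models use subword tokenizers, and the cutoff behavior varies by implementation.

```python
# Illustrative only: a toy whitespace "tokenizer" showing why context
# length caps how much input a model can attend to at once.
def truncate_to_context(text: str, context_length: int) -> list:
    tokens = text.split()           # real models use subword tokenizers
    return tokens[:context_length]  # tokens beyond the window are dropped

doc = "word " * 10_000                       # a 10,000-token document
window = truncate_to_context(doc, 8_000)     # an 8,000-token context window
print(len(window))                           # anything past the window never reaches the model
```

With a 128,000-token window, the same document would fit entirely, which is the practical payoff of the planned expansion.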
Enterprise Workhorses
IBM stated that the new Granite models are designed as enterprise “workhorses” for tasks such as retrieval-augmented generation (RAG), classification, summarization, agent training, entity extraction, and tool utilization. They can be trained with enterprise data to provide task-specific performance comparable to much larger models, at a cost up to 60 times lower. Internal benchmarks demonstrated that the Granite 8B model outperformed similar models from Google LLC and Mistral AI SAS while matching the performance of comparable models from Meta Platforms Inc.
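Retrieval-augmented generation, the first of the workhorse tasks listed above, can be illustrated with a bare-bones loop: fetch the documents most relevant to a query, then prepend them to the prompt so the model answers from enterprise data rather than from memory alone. The word-overlap scoring and the stand-in model below are simplifications for illustration; production RAG systems use vector embeddings and a real LLM.

```python
# Minimal RAG sketch: retrieve by naive word overlap, then build a
# context-augmented prompt for the model.
def retrieve(query, docs, k=2):
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]  # the k most relevant documents

def rag_answer(query, docs, model):
    context = "\n".join(retrieve(query, docs))
    return model(f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "Granite 3.0 ships in 8B and 2B sizes.",
    "The cafeteria opens at 8 a.m.",
    "Guardian models screen prompts for risk.",
]
show_prompt = lambda p: p  # stand-in "model" that just returns its prompt
print(rag_answer("What sizes does Granite 3.0 ship in?", docs, show_prompt))
```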
An accompanying technical report and responsible use guide offer comprehensive documentation of the training datasets used for the models, along with details on the filtering, cleansing, and curation processes that were implemented, as well as comparative benchmark data.
An updated version of the pretrained Granite Time Series models that IBM launched earlier this year has been trained on three times more data and offers enhanced modeling flexibility, including support for external variables and rolling forecasts.
The Granite Guardian 3.0 models are designed to enhance safety by evaluating user prompts and model responses for various risks. "You can concatenate both on the input before you make the inference query and the output to prevent the core model from jailbreaks and to prevent violence, profanity, et cetera. We've done everything possible to make it as safe as possible," said Dario Gil, IBM's senior vice president and director of research.
Jailbreaks refer to malicious efforts to circumvent the restrictions or safety measures implemented on an AI system, enabling it to function in unintended or potentially harmful manners. Guardian also conducts RAG-specific evaluations, including context relevance, answer relevance, and “groundedness,” which measures the degree to which the model is informed by and connected to real-world data, facts, or context.
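The concatenation pattern Gil describes amounts to screening text on both sides of the model call. The sketch below assumes a stand-in keyword classifier; in practice, a Guardian model would score the text for jailbreaks, violence, profanity, and the other risk categories.

```python
# Guardrail pattern: check the user prompt before inference and the
# model's answer after it. BLOCKLIST and is_risky are illustrative
# stand-ins for a Guardian-style safety classifier.
BLOCKLIST = {"jailbreak", "ignore previous instructions"}

def is_risky(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def guarded_generate(prompt: str, model) -> str:
    if is_risky(prompt):                       # input-side check
        return "Request blocked by input guardrail."
    answer = model(prompt)                     # call the core model
    if is_risky(answer):                       # output-side check
        return "Response blocked by output guardrail."
    return answer

echo = lambda p: f"echo: {p}"                  # dummy "model" for the demo
print(guarded_generate("Summarize this report", echo))
print(guarded_generate("Please jailbreak the system", echo))
```

Because the guard wraps the model rather than modifying it, the same pattern works with any core model behind the call.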
AI on the Edge
A series of smaller models known as Granite Accelerators and Mixture of Experts are designed for low-latency and CPU-only applications. MoE is a machine learning architecture that integrates several specialized models, dynamically selecting and activating only a subset of them to improve efficiency.
Gil said, "Accelerator allows you to implement speculative decoding so you can achieve twice the throughput of the core model with no loss of quality." The MoE model can be trained on 10 trillion tokens but activates only 800 million parameters during inference, optimizing efficiency for edge applications.
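The routing idea behind MoE can be shown in a toy forward pass: a router scores every expert, but only the top-k actually compute, which is why an MoE model can hold many parameters yet activate only a small fraction per token. The scalar "experts" and hand-set router scores below are illustrative assumptions, not IBM's architecture.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    # Pick the k highest-scoring experts for this input.
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    # Only the chosen experts compute; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Four tiny "experts", each just a scalar multiplier for the demo.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, router_scores=[0.1, 0.9, 0.3, 0.8], k=2)
print(round(y, 2))  # a weighted blend of the two top-scoring experts only
```

Here only experts 1 and 3 run; experts 0 and 2 consume no compute, mirroring how an MoE keeps inference cheap relative to its total parameter count.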
The Instruct and Guardian versions of the Granite 8B and 2B models are now available for commercial use on IBM's watsonx platform. Additionally, select Granite 3.0 models will be accessible on partner platforms such as Nvidia Corp.'s NIM stack and Google's Vertex. The complete Granite 3.0 model suite, along with the updated Time Series models, can be downloaded from Hugging Face Inc.'s open-source platform and Red Hat Enterprise Linux.
The new Granite 3.0-based watsonx Code Assistant offers support for C, C++, Go, Java, and Python, along with enhanced application modernization features for enterprise Java applications. According to IBM, the assistant has delivered a 90% increase in code documentation speed for specific tasks within its own software development operations. The coding features can be accessed via a Visual Studio Code extension named IBM Granite.Code.
Enhanced Agents
New tools for developers feature agentic frameworks, integrations with existing environments, and low-code automation for common use cases like retrieval-augmented generation (RAG) and agents.
As agentic AI—systems capable of autonomous behavior or decision-making—emerges as the next major wave in AI development, IBM announced that it is equipping its consulting division with a multimodal agentic platform. The new Consulting Advantage for Cloud Transformation and Management, along with Consulting Advantage for Business Operations, will feature domain-specific AI agents, applications, and methods trained on IBM’s intellectual property and best practices. These tools will enable consultants to apply AI and cloud solutions more effectively in their clients’ projects.
Approximately 80,000 IBM consultants are currently using Consulting Advantage, with the majority deploying only one or two agents at a time, according to Mohamad Ali, IBM's senior vice president and head of IBM Consulting. As usage expands, IBM Consulting will need to support more than 1.5 million agents, making Granite's cost-efficiency "absolutely essential because we will continue to scale this platform and we needed to be very cost-efficient," he said.