IBM watsonx.ai
LiteLLM supports all IBM watsonx.ai foundation models and embedding models.
Environment Variables
os.environ["WATSONX_URL"] = "" # (required) Base URL of your WatsonX instance
# (required) either one of the following:
os.environ["WATSONX_APIKEY"] = "" # IBM cloud API key
os.environ["WATSONX_TOKEN"] = "" # IAM auth token
# optional - can also be passed as params to completion() or embedding()
os.environ["WATSONX_PROJECT_ID"] = "" # Project ID of your WatsonX instance
os.environ["WATSONX_DEPLOYMENT_SPACE_ID"] = "" # ID of your deployment space to use deployed models
os.environ["WATSONX_ZENAPIKEY"] = "" # Zen API key (use for long-term api token)
See here for more information on how to get an access token to authenticate to watsonx.ai.
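If you only have an IBM Cloud API key, you can also generate an IAM token yourself and pass it via WATSONX_TOKEN. A minimal sketch, assuming the standard IBM Cloud IAM token endpoint and the requests package:
import os
import requests

# Exchange an IBM Cloud API key for a short-lived IAM access token
# (standard IBM Cloud IAM token endpoint - adjust if your account uses a different endpoint)
resp = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": "<your-ibm-cloud-api-key>",
    },
)
resp.raise_for_status()

# Pass the token to LiteLLM instead of an API key
os.environ["WATSONX_TOKEN"] = resp.json()["access_token"]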
Usage
import os
from litellm import completion
os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
## Call WATSONX `/text/chat` endpoint - supports function calling
response = completion(
model="watsonx/meta-llama/llama-3-1-8b-instruct",
messages=[{ "content": "what is your favorite colour?","role": "user"}],
project_id="<my-project-id>" # or pass with os.environ["WATSONX_PROJECT_ID"]
)
## Call WATSONX `/text/generation` endpoint - not all models support the /chat route.
response = completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{ "content": "what is your favorite colour?","role": "user"}],
project_id="<my-project-id>"
)
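Because the chat route supports function calling, you can also pass tools in the OpenAI format. A minimal sketch (the get_current_weather tool below is a hypothetical example):
import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""

# hypothetical tool definition, in the standard OpenAI tools format
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

response = completion(
    model="watsonx/meta-llama/llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)  # may be None if the model answers directly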
Usage - Streaming
import os
from litellm import completion
os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""
response = completion(
model="watsonx/meta-llama/llama-3-1-8b-instruct",
messages=[{ "content": "what is your favorite colour?","role": "user"}],
stream=True
)
for chunk in response:
print(chunk)
Example Streaming Output Chunk
{
"choices": [
{
"finish_reason": null,
"index": 0,
"delta": {
"content": "I don't have a favorite color, but I do like the color blue. What's your favorite color?"
}
}
],
"created": null,
"model": "watsonx/ibm/granite-13b-chat-v2",
"usage": {
"prompt_tokens": null,
"completion_tokens": null,
"total_tokens": null
}
}
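Streaming also works with LiteLLM's async API. A minimal sketch using acompletion with the same credentials:
import asyncio
import os
from litellm import acompletion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""

async def main():
    response = await acompletion(
        model="watsonx/meta-llama/llama-3-1-8b-instruct",
        messages=[{"content": "what is your favorite colour?", "role": "user"}],
        stream=True,
    )
    # iterate over the stream asynchronously
    async for chunk in response:
        print(chunk)

asyncio.run(main())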
Usage - Models in deployment spaces
Models that have been deployed to a deployment space (e.g. tuned models) can be called using the deployment/<deployment_id> format (where <deployment_id> is the ID of the deployed model in your deployment space).
The ID of your deployment space must also be set in the environment variable WATSONX_DEPLOYMENT_SPACE_ID or passed to the function as space_id=<deployment_space_id>.
import litellm
response = litellm.completion(
model="watsonx/deployment/<deployment_id>",
messages=[{"content": "Hello, how are you?", "role": "user"}],
space_id="<deployment_space_id>"
)
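Equivalently, set the deployment space ID once in the environment instead of passing space_id on every call:
import os
import litellm

# same call as above, with the space ID read from the environment
os.environ["WATSONX_DEPLOYMENT_SPACE_ID"] = "<deployment_space_id>"

response = litellm.completion(
    model="watsonx/deployment/<deployment_id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)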
Usage - Embeddings
LiteLLM also supports making requests to IBM watsonx.ai embedding models. The credentials needed for this are the same as for completion.
from litellm import embedding
response = embedding(
model="watsonx/ibm/slate-30m-english-rtrvr",
input=["What is the capital of France?"],
project_id="<my-project-id>"
)
print(response)
# EmbeddingResponse(model='ibm/slate-30m-english-rtrvr', data=[{'object': 'embedding', 'index': 0, 'embedding': [-0.037463713, -0.02141933, -0.02851813, 0.015519324, ..., -0.0021367231, -0.01704561, -0.001425816, 0.0035238306]}], object='list', usage=Usage(prompt_tokens=8, total_tokens=8))
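For async code, the same call is available as aembedding; a minimal sketch with the same credentials:
import asyncio
from litellm import aembedding

async def main():
    response = await aembedding(
        model="watsonx/ibm/slate-30m-english-rtrvr",
        input=["What is the capital of France?"],
        project_id="<my-project-id>",
    )
    print(response)

asyncio.run(main())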
OpenAI Proxy Usage
Here's how to call IBM watsonx.ai with the LiteLLM Proxy Server
1. Save keys in your environment
export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_PROJECT_ID=""
2. Start the proxy
- CLI
- config.yaml
$ litellm --model watsonx/meta-llama/llama-3-8b-instruct
# Server running on http://0.0.0.0:4000
model_list:
- model_name: llama-3-8b
litellm_params:
# all params accepted by litellm.completion()
model: watsonx/meta-llama/llama-3-8b-instruct
      api_key: "os.environ/WATSONX_APIKEY" # does os.getenv("WATSONX_APIKEY")
3. Test it
- Curl Request
- OpenAI v1.0.0+
- Langchain
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "llama-3-8b",
"messages": [
{
"role": "user",
"content": "what is your favorite colour?"
}
]
}
'
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="llama-3-8b", messages=[
{
"role": "user",
"content": "what is your favorite colour?"
}
])
print(response)
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000", # set openai_api_base to the LiteLLM Proxy
model = "llama-3-8b",
temperature=0.1
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Authentication
Passing credentials as parameters
You can also pass the credentials as parameters to the completion and embedding functions.
import os
from litellm import completion
response = completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{ "content": "What is your favorite color?","role": "user"}],
url="",
api_key="",
project_id=""
)
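The same keyword arguments can be passed to the embedding function, for example:
from litellm import embedding

response = embedding(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input=["What is the capital of France?"],
    url="",
    api_key="",
    project_id=""
)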
Using Zen API Key
You can use a Zen API key for long-term authentication instead of generating IAM tokens. Pass it either as an environment variable or as a parameter:
import os
from litellm import completion
# Option 1: Set as environment variable
os.environ["WATSONX_ZENAPIKEY"] = "your-zen-api-key"
response = completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{"content": "What is your favorite color?", "role": "user"}],
project_id="your-project-id"
)
# Option 2: Pass as parameter
response = completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{"content": "What is your favorite color?", "role": "user"}],
zen_api_key="your-zen-api-key",
project_id="your-project-id"
)
Using with LiteLLM Proxy via OpenAI client:
import openai
client = openai.OpenAI(
api_key="sk-1234", # LiteLLM proxy key
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="watsonx/ibm/granite-3-3-8b-instruct",
messages=[{"role": "user", "content": "What is your favorite color?"}],
max_tokens=2048,
extra_body={
"project_id": "your-project-id",
"zen_api_key": "your-zen-api-key"
}
)
See IBM documentation for more information on generating Zen API keys.
Supported IBM watsonx.ai Models
Here are some examples of models available in IBM watsonx.ai that you can use with LiteLLM:
| Model Name | Command |
|---|---|
| Flan T5 XXL | completion(model="watsonx/google/flan-t5-xxl", messages=messages) |
| Flan Ul2 | completion(model="watsonx/google/flan-ul2", messages=messages) |
| Mt0 XXL | completion(model="watsonx/bigscience/mt0-xxl", messages=messages) |
| Gpt Neox | completion(model="watsonx/eleutherai/gpt-neox-20b", messages=messages) |
| Mpt 7B Instruct2 | completion(model="watsonx/ibm/mpt-7b-instruct2", messages=messages) |
| Starcoder | completion(model="watsonx/bigcode/starcoder", messages=messages) |
| Llama 2 70B Chat | completion(model="watsonx/meta-llama/llama-2-70b-chat", messages=messages) |
| Llama 2 13B Chat | completion(model="watsonx/meta-llama/llama-2-13b-chat", messages=messages) |
| Granite 13B Instruct | completion(model="watsonx/ibm/granite-13b-instruct-v1", messages=messages) |
| Granite 13B Chat | completion(model="watsonx/ibm/granite-13b-chat-v1", messages=messages) |
| Flan T5 XL | completion(model="watsonx/google/flan-t5-xl", messages=messages) |
| Granite 13B Chat V2 | completion(model="watsonx/ibm/granite-13b-chat-v2", messages=messages) |
| Granite 13B Instruct V2 | completion(model="watsonx/ibm/granite-13b-instruct-v2", messages=messages) |
| Elyza Japanese Llama 2 7B Instruct | completion(model="watsonx/elyza/elyza-japanese-llama-2-7b-instruct", messages=messages) |
| Mixtral 8X7B Instruct V01 Q | completion(model="watsonx/ibm-mistralai/mixtral-8x7b-instruct-v01-q", messages=messages) |
For a list of all available models in watsonx.ai, see here.
Supported IBM watsonx.ai Embedding Models
| Model Name | Function Call |
|---|---|
| Slate 30m | embedding(model="watsonx/ibm/slate-30m-english-rtrvr", input=input) |
| Slate 125m | embedding(model="watsonx/ibm/slate-125m-english-rtrvr", input=input) |
For a list of all available embedding models in watsonx.ai, see here.