LLM Gateway API Documentation

Version: 1.0.7

Table of Contents

  1. Introduction
  2. Providers
  3. Additional Configuration Options
  4. Code Examples

Introduction

Access your preferred model and build applications seamlessly with the comprehensive pool of LLMs available through the LLM Gateway, which supports a diverse range of capabilities: text generation, multimodal input, image generation, speech-to-text transcription, and embedding generation. The platform safeguards your applications with robust guardrails and speeds up prompt responses through advanced caching. Simply generate an LM-API key via the Kadal platform, and you're ready to deploy. To accelerate your development, we also offer a curated collection of demo implementation scripts for all supported models.
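
The general calling pattern is the same for every provider: send your LM-API key in an api-key header to the provider-specific gateway endpoint. As a quick orientation, here is a minimal sketch reusing the Azure OpenAI endpoint, deployment, and API version from the Code Examples section below:

Python
import requests

# Minimal chat request through the gateway; endpoint, deployment name, and
# API version are taken from the Azure OpenAI examples later in this document.
lm_key = 'YOUR_LM_KEY'  # generated on the Kadal platform
url = ('https://api.kadal.ai/proxy/api/v1/azure/openai/deployments/'
       'gpt-4o/chat/completions?api-version=2024-02-15-preview')

response = requests.post(
    url,
    headers={'Content-Type': 'application/json', 'api-key': lm_key},
    json={
        'model': 'gpt-4o',
        'messages': [{'role': 'user', 'content': 'Hello World'}],
    },
)
print(response.json())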

Architecture

[Image: LLM Gateway architecture diagram]

Providers

Azure OpenAI

| Models | Context Length | Price (per 1M Tokens) | Active / Deprecated | Model Support |
| --- | --- | --- | --- | --- |
| gpt-35-turbo | 16K | Input: $1, Output: $2 | Active | text |
| gpt-35-turbo-1106 | 16K | Input: $1, Output: $2 | Active | text |
| gpt-4 | 128K | Input: $10, Output: $30 | Active | text |
| gpt-4-0125-preview | 128K | Input: $10, Output: $30 | Active | text |
| gpt-4o | 128K | Input: $2.75, Output: $11 | Active | text, image |
| gpt4-vision-preview | 128K | Input: $10, Output: $30 | Active | text, image |
| gpt-4-vision-preview | 128K | Input: $10, Output: $30 | Active | text, image |
| gpt-4o-mini | 128K | Input: $0.165, Output: $0.66 | Active | text, image |
| o1-preview | 128K | Input: $16.50, Output: $66 | Active | text, image |
| o1-mini | 128K | Input: $3.30, Output: $13.20 | Active | text, image |
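
Prices in this table are per one million tokens, so the cost of a call is (tokens / 1,000,000) times the listed rate. For example, a gpt-4o request with 1,000 input tokens and 500 output tokens costs roughly $0.00825. A small sketch of that arithmetic, with the rates hard-coded from the table above:

Python
# Estimate the cost of one gpt-4o call from the per-1M-token rates above.
input_rate = 2.75    # USD per 1M input tokens (from the table)
output_rate = 11.0   # USD per 1M output tokens (from the table)

input_tokens = 1_000
output_tokens = 500

cost = (input_tokens / 1_000_000) * input_rate \
     + (output_tokens / 1_000_000) * output_rate
print(f"Estimated cost: ${cost:.6f}")  # ~$0.008250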

Vertex AI

| Models | Context Length | Price (per 1K Characters) | Active / Deprecated | Model Support |
| --- | --- | --- | --- | --- |
| gemini-1.5-flash | Max input tokens: 1,048,576; Max output tokens: 8,192 | Input: $0.00001875, Output: $0.000075 | Active | text, image |
| gemini-1.5-pro | Max input tokens: 2,097,152; Max output tokens: 8,192 | Input: $0.0003125, Output: $0.00125 | Active | text, image |
| gemini-1.0-pro | Max input tokens: 32,760; Max output tokens: 8,192 | Input: $0.000125, Output: $0.000375 | Active | text, image |

AWS Bedrock

| Models | Context Length | Price (per 1K Tokens) | Active / Deprecated | Model Support |
| --- | --- | --- | --- | --- |
| anthropic.claude-3-haiku-20240307-v1:0 | 200K | Input: $0.00025, Output: $0.00125 | Active | text, image |
| anthropic.claude-3-sonnet-20240229-v1:0 | 200K | Input: $0.003, Output: $0.015 | Active | text, image |
| anthropic.claude-3-5-sonnet-20240620-v1:0 | 200K | Input: $0.003, Output: $0.015 | Active | text, image |
| anthropic.claude-3-opus-20240229-v1:0 | 200K | Input: $0.015, Output: $0.075 | Active | text, image |
| meta.llama3-8b-instruct-v1:0 | 128K | Input: $0.0003, Output: $0.0003 | Active | text |
| meta.llama3-70b-instruct-v1:0 | 128K | Input: $0.00265, Output: $0.0035 | Active | text |
| meta.llama3-1-405b-instruct-v1:0 | 128K | Input: $0.0024, Output: $0.0024 | Active | text |
| meta.llama3-1-70b-instruct-v1:0 | 128K | Input: $0.00072, Output: $0.00072 | Active | text |
| meta.llama3-1-8b-instruct-v1:0 | 128K | Input: $0.00022, Output: $0.00022 | Active | text |
| us.meta.llama3-2-90b-instruct-v1:0 | 128K | Input: $0.00072, Output: $0.00072 | Active | text, image |
| us.meta.llama3-2-11b-instruct-v1:0 | 128K | Input: $0.00016, Output: $0.00016 | Active | text, image |
| us.amazon.nova-micro-v1:0 | 128K | Input: $0.000035, Output: $0.00014 | Active | text |
| us.amazon.nova-lite-v1:0 | 300K | Input: $0.00006, Output: $0.00024 | Active | text, image, docs, video |
| us.amazon.nova-pro-v1:0 | 300K | Input: $0.0008, Output: $0.0032 | Active | text, image, docs, video |

Additional Configuration Options

Individual User/Group

Configuration Options

is_cache
  • Default: 1 (caching is enabled).
  • Set to 0: Disables caching for API responses, ensuring each request fetches fresh results.
use_guard
  • Default: 1 (guardrails are enabled for input or output by default).
  • Set to 0: Completely disables guardrails.
  • Set to 2: Explicitly enables both input and output guardrails, providing maximum safeguards.
app_name
  • A string specifying the application context.
  • Include this only when the application is DSPY.
Additional settings
lm_key = 'YOUR_LM_KEY'
userdata = {
    "is_cache": 0,
    "use_guard": 2,
    "app_name": "DSPY",
}
User_LM_Key = f"{lm_key}/{userdata}"
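
The composite key is then used anywhere the code examples below use lm_key, assuming the gateway parses the /{userdata} suffix as described above. A minimal sketch with the AzureOpenAI client from the Code Examples section:

Python
from openai import AzureOpenAI

# Pass the composite key in place of the plain lm_key; the settings in the
# /{userdata} suffix are applied by the gateway.
client = AzureOpenAI(
    azure_endpoint='https://api.kadal.ai/proxy/api/v1/azure',
    api_version='2024-02-15-preview',
    api_key=User_LM_Key,  # composite key built above
)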

For Kadal Tenant:

is_cache
  • Default: 1 (caching is enabled).
  • Set to 0: Disables caching for API responses, ensuring each request fetches fresh results.
use_guard
  • Default: 1 (guardrails are enabled for input or output by default).
  • Set to 0: Completely disables guardrails.
  • Set to 2: Explicitly enables both input and output guardrails, providing maximum safeguards.
app_name
  • A string specifying the application context.
  • Include this only when the application is DSPY.

Bot_Id

  • A string representing the unique identifier for the bot.
  • Example: "cf72e8cb-ef5b-4c7f-9792-685e7100a94f".

Query_Id

  • A string or integer representing the unique identifier for the current query.
  • Example: "24".

word_count_sent

  • The number of words sent in the request.
  • Example: 50.

word_count_received

  • The number of words received in the response.
  • Example: 100.

toxicity

  • Indicates the toxicity level of the response.
  • Example: "Toxic".

sentiment

  • Indicates the sentiment of the response.
  • Example: "Positive".

app_name

  • A string specifying the application context.
  • Include this key only when the application is DSPY.

userid

  • A string or integer representing the unique identifier for the user.
  • Example: "100".

is_system

  • A boolean value indicating if the request is system-generated.
  • Default: True.
Additional settings
lm_key = 'YOUR_LM_KEY'
userdata = {
    "is_cache": 0,
    "use_guard": 0,
    "Bot_Id": "cf72e8cb-ef5b-4c7f-9792-685e7100a94f",
    "Query_Id": "24",
    "word_count_sent": 50,
    "word_count_received": 100,
    "toxicity": "Toxic",
    "sentiment": "Positive",
    "app_name": "DSPY",  # use only when the app is DSPY
    "userid": "100",
    "is_system": False
}
Tenant_LM_Key = f"{lm_key}/{userdata}"

Code Examples

Azure OpenAI Code Examples

Azure OpenAI
from openai import AzureOpenAI

azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'
api_version = "2024-02-15-preview"

client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    api_key=lm_key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.7,
    messages=[
        {"role": "user", "content": "Hello World"}
    ]
)
print(response)
Python requests
import requests  # Or use httpx
import json

# Azure OpenAI configuration
azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'
headers = {
    'Content-Type': 'application/json',
    'api-key': lm_key
}

# Request body
body = {
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Hello! How can I improve my productivity?'}
    ],
    'model': 'gpt-4',
    'temperature': 0.7
}

# Send the request
try:
    response = requests.post(
        f'{azure_endpoint}/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview',
        headers=headers,
        json=body
    )

    # Check for a successful response
    if response.status_code == 200:
        print("Response:")
        print(json.dumps(response.json(), indent=4))
    else:
        print(f"Error: {response.status_code} - {response.text}")

except Exception as e:
    print(f"An error occurred: {e}")
cURL method
curl --location 'https://api.kadal.ai/proxy/api/v1/azure/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview' \
--header 'Content-Type: application/json' \
--header 'api-key: <YOUR_LM_KEY>' \
--data '{
  "model": "gpt-4o",
  "temperature": 0.7,
  "messages": [
    {"role": "user", "content": "Hello World"}
  ]
}'

Multimodal Azure OpenAI Models

Azure OpenAI
import base64  # Used to encode the image into Base64 format

# Function to encode an image into Base64
def encode_image(image_path):
    """
    Encodes the image at the given path into a Base64 string.

    Args:
        image_path (str): Path to the image file to be encoded.

    Returns:
        str: Base64-encoded string of the image.
    """
    with open(image_path, "rb") as image_file:  # Open the image file in binary mode
        # Read the file, encode its content to Base64, and decode it to a UTF-8 string
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode the image at the specified path
base64_image = encode_image("sample.png")

from openai import AzureOpenAI  # SDK for interacting with Azure OpenAI services

# Define the Azure OpenAI endpoint and API key
azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'

# Specify the API version to use
api_version = "2024-02-15-preview"

# Initialize the AzureOpenAI client
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,  # Azure OpenAI endpoint
    api_version=api_version,        # API version
    api_key=lm_key                  # API key for authentication
)

# Prepare the chat request
response = client.chat.completions.create(
    model="gpt-4o",      # Multimodal-capable model
    temperature=0.7,     # Controls the randomness of the response
    messages=[
        {
            "role": "user",  # Role of the message sender
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",  # Specify the type as an image URL
                    "image_url": {
                        # Embed the Base64-encoded PNG as a data URL
                        "url": f"data:image/png;base64,{base64_image}",
                    },
                },
            ],
        }
    ]
)

# Print the response from the Azure OpenAI service
print(response)


Embedding Models

Azure OpenAI
from openai import AzureOpenAI  # Importing the AzureOpenAI class

azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'
api_version = "2024-02-15-preview"

# Initialize the AzureOpenAI client
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    api_key=lm_key
)

# Generate embeddings
response = client.embeddings.create(
    input="Your text string",
    model="text-embedding-ada-002"
)

print(response.data[0].embedding)
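
The gateway returns the raw embedding vector; a common next step on the client side (not a gateway feature) is comparing two embeddings with cosine similarity. A short sketch reusing the client above:

Python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

emb_1 = client.embeddings.create(
    input="Your text string", model="text-embedding-ada-002"
).data[0].embedding
emb_2 = client.embeddings.create(
    input="A different text string", model="text-embedding-ada-002"
).data[0].embedding

print(cosine_similarity(emb_1, emb_2))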

DALL-E Models

Azure OpenAI
from openai import AzureOpenAI  # Importing the AzureOpenAI class

azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'
api_version = "2024-02-15-preview"

# Initialize the AzureOpenAI client with the endpoint, API version, and API key
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    api_key=lm_key
)

response = client.images.generate(
    model="dall-e-3",         # The model to use (DALL-E 3)
    prompt="generate a cat",  # Text prompt describing the image to generate
    size="1024x1024",         # Desired size of the image
    quality="standard",       # Quality setting for the generated image
    n=1,                      # Number of images to generate
)

# Extract the URL of the generated image from the response
image_url = response.data[0].url
print(image_url)  # Print the URL of the generated image

Note: For dall-e-3, only n=1 is supported.
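
Generated image URLs are typically short-lived, so a common follow-up (a client-side sketch, not part of the gateway API) is to download the image right away:

Python
import requests

# Download the generated image before the signed URL expires.
img = requests.get(image_url, timeout=30)  # image_url from the example above
img.raise_for_status()
with open("generated_cat.png", "wb") as f:
    f.write(img.content)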

Whisper Models

Azure OpenAI
from openai import AzureOpenAI  # Importing the AzureOpenAI class

azure_endpoint = 'https://api.kadal.ai/proxy/api/v1/azure'
lm_key = 'YOUR_LM_KEY'
api_version = "2024-02-15-preview"

# Initialize the AzureOpenAI client with the endpoint, API version, and API key
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    api_key=lm_key
)

# Open the audio file to transcribe
audio_file = open("sample.mp3", "rb")

# Request a transcription using the Whisper model
transcription = client.audio.transcriptions.create(
    model="whisper",                 # Whisper model for audio transcription
    temperature=0,                   # 0 for deterministic output (less randomness)
    response_format="verbose_json",  # Format of the transcription response
    file=audio_file                  # The audio file to transcribe
)

# Print the transcribed text from the response
print(transcription.text)

Gemini Models

Generate Content

Python Requests
import requests  # Importing the requests library to make HTTP requests

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'
# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# Name of the model you want to use
model_name = 'gemini-1.5-pro'

# API endpoint for generating content with the specified model
url = f"{LLM_Gateway_SERVER_URL}gemini/models/{model_name}:generateContent"

# Headers required for the API request
headers = {
    "Content-Type": "application/json",  # Setting the content type to JSON
    "api-key": lm_key,                   # API key for authentication
}

# The data payload for the API request
data = {
    "contents": [
        {
            "role": "user",                     # This content comes from the user
            "parts": [{"text": "Hello World"}]  # The user's text input
        }
    ]
}

# Make a POST request to the API to generate content
response = requests.post(url, headers=headers, json=data)

# Print the JSON response received from the API
print(response.json())

Image Requests

Python Requests
import base64    # Used to encode the image into Base64 format
import requests  # Used to make HTTP requests

# Function to encode an image into Base64
def encode_image(image_path):
    """
    Encodes the image at the given path into a Base64 string.

    Args:
        image_path (str): Path to the image file to be encoded.

    Returns:
        str: Base64-encoded string of the image.
    """
    with open(image_path, "rb") as image_file:  # Open the image file in binary mode
        # Read the file, encode its content to Base64, and decode it to a UTF-8 string
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode the image at the specified path
base64_image = encode_image("sample.png")

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'
# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# Name of the model you want to use
model_name = 'gemini-1.5-pro'

# API endpoint for generating content with the specified model
url = f"{LLM_Gateway_SERVER_URL}gemini/models/{model_name}:generateContent"

headers = {
    "Content-Type": "application/json",
    "api-key": lm_key,
}

data = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "inlineData": {
                        "mimeType": "image/png",  # PNG - image/png, JPEG - image/jpeg, WebP - image/webp
                        "data": base64_image
                    }
                },
                {
                    "text": "What's in this image?"
                }
            ]
        }
    ]
}

# Make a POST request to the API to generate content
response = requests.post(url, headers=headers, json=data)

# Print the JSON response received from the API
print(response.json())

Bedrock Anthropic Models

Python Requests
import requests  # Importing the requests library to make HTTP requests

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'

# Identifier for the specific model you want to use
modelId = "anthropic.claude-3-haiku-20240307-v1:0"

# Headers required for the API request
headers = {
    "api-key": lm_key,  # API key for authentication
}

# The request body containing parameters for the API call
body = {
    "anthropic_version": "bedrock-2023-05-31",  # Version of the Anthropic API
    "max_tokens": 512,                          # Maximum number of tokens to generate
    "messages": [
        {
            "role": "user",                # This message comes from the user
            "content": "say my name bot."  # The user's text input
        }
    ]
}

# Make a POST request to the API to generate a response from the model
response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/anthropic?modelId={modelId}",
    headers=headers,
    json=body
)

# Check for a successful response from the API
if response.status_code == 200:
    data = response.json()  # Parse the JSON response
    print(data)             # Print the response data
else:
    # Print an error message if the API call fails
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Bedrock Llama Models

Python Requests
import requests

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'

# Identifier for the specific model you want to use
modelId = "meta.llama3-1-8b-instruct-v1:0"

headers = {
    "api-key": lm_key,  # API key for authentication
}

# The request body containing parameters for the API call
body = {
    'messages': [{'role': 'user', 'content': [{'text': 'Hello World'}]}],
    'inferenceConfig': {'maxTokens': 512, 'temperature': 0.5, 'topP': 0.9},
    'system': [{'text': 'This is a valid system text'}],
}

# Make a POST request to the API to generate a response from the model
response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/llama?modelId={modelId}",
    headers=headers,
    json=body
)

# Check for a successful response
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Bedrock Nova Models

Text Generation

Python Requests
import requests

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'

modelId = "us.amazon.nova-lite-v1:0"

body = {
    "messages": [
        {"role": "user", "content": [{"text": "Hello World"}]},
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    "system": [{"text": "This is a valid system text"}],
}
headers = {"api-key": lm_key}

response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/nova?modelId={modelId}",
    headers=headers,
    json=body,
)

# Check for a successful response
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Multimodal

Python Requests
import requests
import base64

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'

modelId = "us.amazon.nova-lite-v1:0"

IMAGE_NAME = "sample.png"

# Read and Base64-encode the image
with open(IMAGE_NAME, "rb") as f:
    image = f.read()
encoded_image = base64.b64encode(image).decode("utf-8")

user_message = "What is in this image? Describe the image as well."

body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": user_message},
                {"image": {"format": "png", "source": {"bytes": encoded_image}}},
            ],
        }
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    "system": [{"text": "This is a valid system text"}],
}

headers = {"api-key": lm_key}

response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/nova?modelId={modelId}",
    headers=headers,
    json=body,
)

# Check for a successful response
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Document Input

Python Requests
import requests
import base64

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'
modelId = "us.amazon.nova-pro-v1:0"
Docs_NAME = "sample.pdf"

# Read and Base64-encode the document
with open(Docs_NAME, "rb") as file:
    doc_bytes = file.read()
encoded_docs = base64.b64encode(doc_bytes).decode("utf-8")

user_message = "Tell me the title of the document."

body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "document": {
                        "format": "pdf",
                        "name": "DocumentPDFmessages",
                        "source": {"bytes": encoded_docs},
                    }
                },
                {"text": user_message},
            ],
        },
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    "system": [{"text": "This is a valid system text"}],
}

headers = {"api-key": lm_key}

response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/nova?modelId={modelId}",
    headers=headers,
    json=body,
)

# Check for a successful response
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Video Input

Python Requests
import requests
import base64

# Base URL for the Gateway server handling API requests
LLM_Gateway_SERVER_URL = 'https://api.kadal.ai/proxy/api/v1/'

# API key for authentication with the service
lm_key = 'YOUR_LM_KEY'

modelId = "us.amazon.nova-pro-v1:0"

Video_Name = "sample.mp4"

# Read and Base64-encode the video
with open(Video_Name, "rb") as video:
    video_bytes = video.read()
encoded_video = base64.b64encode(video_bytes).decode("utf-8")

user_message = "What is in this video?"

body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": user_message},
                {"video": {"format": "mp4", "source": {"bytes": encoded_video}}},
            ],
        },
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    "system": [{"text": "This is a valid system text"}],
}

headers = {"api-key": lm_key}

response = requests.post(
    f"{LLM_Gateway_SERVER_URL}bedrock/v1/messages/nova?modelId={modelId}",
    headers=headers,
    json=body,
)

# Check for a successful response
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: API call failed with status code {response.status_code}, {response.text}")

Responses:

| Status Code | Description | Content-Type |
| --- | --- | --- |
| 200 | Successful | application/json |
| 400 | Bad Request | application/json |
| 401 | Unauthorized | application/json |
| 403 | Forbidden | application/json |
| 404 | Resource Not Found | application/json |
| 422 | Validation Error | application/json |
| 500 | Internal Server Error | application/json |
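
These codes apply uniformly to the examples above. A small client-side sketch that branches on them (the retry policy is illustrative, not a documented gateway behavior):

Python
import time
import requests

def post_with_retry(url, headers, body, retries=3):
    """POST to the gateway, retrying only on 5xx responses."""
    for attempt in range(retries):
        response = requests.post(url, headers=headers, json=body, timeout=60)
        if response.status_code == 200:
            return response.json()
        if response.status_code in (400, 401, 403, 404, 422):
            # Client-side errors from the table above; retrying will not help.
            raise RuntimeError(f"{response.status_code}: {response.text}")
        time.sleep(2 ** attempt)  # 500-level error: back off and retry
    raise RuntimeError(f"Gave up after {retries} attempts")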