LLM Gateway

Vision Support

Learn how to send images to vision-enabled models using URLs or inline base64 data.

Vision Support

LLMGateway supports vision-enabled models that can analyze and describe images. You can provide images via HTTPS URLs or inline base64-encoded data.

Vision-Enabled Models

You can find all vision-enabled models on our models page with vision filter. These models can process both text and image content in the same request.

Image Formats

Using HTTPS URLs

You can provide any publicly accessible HTTPS URL pointing to an image:

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Using Base64 Inline Data

You can also provide images as base64-encoded data URIs:

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "..."
            }
          }
        ]
      }
    ]
  }'

Content Array Format

When using vision models, the content field should be an array containing both text and image content blocks:

  • Text content: {"type": "text", "text": "Your message"}
  • Image content: {"type": "image_url", "image_url": {"url": "image_url_or_data_uri"}}

Multiple Images

You can include multiple images in a single request:

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Compare these two images"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image1.jpg"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image2.jpg"
            }
          }
        ]
      }
    ]
  }'

Simple String Content

For vision models, you can still use simple string content for text-only messages. The array format is only required when including images.

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How can you help me today?"
      }
    ]
  }'

Supported Image Types

Vision models typically support common image formats including:

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • WebP (.webp)
  • GIF (.gif)

The specific formats supported may vary by model provider. Check the individual model documentation for format limitations and file size restrictions.

Error Handling

If an image URL is inaccessible or the image format is unsupported, the gateway will handle the error gracefully and may substitute a placeholder or error message in the request to the underlying model.

Vision Support