uform-gen2-qwen-500m Beta
Image-to-Text • unum
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
Usage
Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a sample image and read it as raw bytes.
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();

    // The model accepts the image as an array of 8-bit unsigned integers.
    const input = {
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    const response = await env.AI.run(
      "@cf/unum/uform-gen2-qwen-500m",
      input
    );
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;

Parameters
Input
- 0 string: Binary string representing the image contents.
- 1 object
  - temperature number: Controls the randomness of the output; higher values produce more random results.
  - prompt string: The input text prompt for the model to generate a response.
  - raw boolean: If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - image one of
    - 0 array: An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
      - items number: A value between 0 and 255.
    - 1 string: Binary string representing the image contents.
  - max_tokens integer, default 512: The maximum number of tokens to generate in the response.
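The object form of the input can be assembled with a small helper. The following is a minimal sketch: the `UformInput` interface and the `buildInput` function are hypothetical names introduced here for illustration, not part of the Workers AI SDK, and the interface covers only the array variant of `image`.

```typescript
// Object-form input, transcribed from the parameter list above.
// Assumption: only the array variant of `image` is modelled here.
interface UformInput {
  image: number[]; // each item is a byte value between 0 and 255
  prompt?: string;
  max_tokens?: number; // documented default is 512
  temperature?: number;
  raw?: boolean;
}

// Hypothetical helper: packs raw image bytes into the object form.
function buildInput(
  bytes: Uint8Array,
  prompt: string,
  maxTokens: number = 512
): UformInput {
  return {
    // Array.from on a Uint8Array yields plain numbers in the range 0..255.
    image: Array.from(bytes),
    prompt,
    max_tokens: maxTokens,
  };
}
```

Inside a Worker, this could be called as `buildInput(new Uint8Array(await res.arrayBuffer()), "Generate a caption for this image")` and the result passed to `env.AI.run`.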
 
Output
- description string
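Since the output schema does not mark `description` as required, it may be worth checking the response shape before using it. The sketch below shows one way to do that; `UformOutput`, `isUformOutput`, and `captionOf` are hypothetical names introduced for illustration.

```typescript
// Output shape per the parameter list: an object with a string `description`.
interface UformOutput {
  description: string;
}

// Hypothetical type guard: checks an unknown value against the documented shape.
function isUformOutput(value: unknown): value is UformOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}

// Returns the caption, or the fallback when the response does not match.
function captionOf(response: unknown, fallback: string = ""): string {
  return isUformOutput(response) ? response.description : fallback;
}
```

In the Worker above, `captionOf(response)` could replace `JSON.stringify(response)` when only the caption text is needed.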
API Schemas
The following schemas are based on JSON Schema.

Input
{
    "oneOf": [
        {
            "type": "string",
            "format": "binary",
            "description": "Binary string representing the image contents."
        },
        {
            "type": "object",
            "properties": {
                "temperature": {
                    "type": "number",
                    "description": "Controls the randomness of the output; higher values produce more random results."
                },
                "prompt": {
                    "type": "string",
                    "description": "The input text prompt for the model to generate a response."
                },
                "raw": {
                    "type": "boolean",
                    "default": false,
                    "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
                },
                "image": {
                    "oneOf": [
                        {
                            "type": "array",
                            "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
                            "items": {
                                "type": "number",
                                "description": "A value between 0 and 255"
                            }
                        },
                        {
                            "type": "string",
                            "format": "binary",
                            "description": "Binary string representing the image contents."
                        }
                    ]
                },
                "max_tokens": {
                    "type": "integer",
                    "default": 512,
                    "description": "The maximum number of tokens to generate in the response."
                }
            },
            "required": [
                "image"
            ]
        }
    ]
}

Output
{
    "type": "object",
    "contentType": "application/json",
    "properties": {
        "description": {
            "type": "string"
        }
    }
}
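The constraints in the input schema can be checked locally before a request is sent. The sketch below covers only the object form and only the rules stated in the schema (required `image`, byte-range items, property types). `validateInput` is a hypothetical helper, not part of any SDK, and it is not a full JSON Schema validator.

```typescript
// Hypothetical helper: returns a list of problems found in an object-form input.
// An empty list means the input passed these checks.
function validateInput(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const image = input.image;

  if (image === undefined) {
    errors.push("image is required");
  } else if (Array.isArray(image)) {
    // Each item must fit in an 8-bit unsigned integer.
    const bad = image.some(
      (v) => typeof v !== "number" || !Number.isInteger(v) || v < 0 || v > 255
    );
    if (bad) errors.push("image items must be integers between 0 and 255");
  } else if (typeof image !== "string") {
    errors.push("image must be an array of numbers or a binary string");
  }

  if (input.max_tokens !== undefined && !Number.isInteger(input.max_tokens)) {
    errors.push("max_tokens must be an integer");
  }
  if (input.temperature !== undefined && typeof input.temperature !== "number") {
    errors.push("temperature must be a number");
  }
  if (input.prompt !== undefined && typeof input.prompt !== "string") {
    errors.push("prompt must be a string");
  }
  if (input.raw !== undefined && typeof input.raw !== "boolean") {
    errors.push("raw must be a boolean");
  }
  return errors;
}
```

Returning a list rather than throwing on the first failure lets a caller report every problem at once.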