Module huggingface

avi0ra/huggingface

1.1.0

Hugging Face Connector for Ballerina

Connects Ballerina applications to the Hugging Face Inference API for running state-of-the-art machine learning models hosted on the Hugging Face Hub.

This package provides a typed Client with strongly-typed request and response records covering 17+ AI/ML operations. On top of the typed operations, it offers:

a universal ModelRunner
a generic inferModel helper for models without a typed mapping
a built-in Retrieval-Augmented Generation (RAG) pipeline
stateful Conversation management
batch inference
streaming chat completions
server-side model waiting (waitForModel)
automatic retries with exponential backoff for cold-starting models
helpers for loading images and audio from files and URLs

Supported AI Capabilities

Capability	Resource Path	Example Model
Chat Completion	`/v1/chat/completions`	`Qwen/Qwen2.5-7B-Instruct`
Streaming Chat	`/v1/chat/completions/streamed`	`Qwen/Qwen2.5-7B-Instruct`
Text Generation	`/hf-inference/models/{model}`	`openai-community/gpt2`
Fill Mask	`/hf-inference/models/{model}/fill-mask`	`google-bert/bert-base-uncased`
Text Classification	`/hf-inference/models/{model}/text-classification`	`distilbert-base-uncased-finetuned-sst-2-english`
Token Classification (NER)	`/hf-inference/models/{model}/token-classification`	`dslim/bert-base-NER`
Feature Extraction	`/hf-inference/models/{model}/feature-extraction`	`intfloat/multilingual-e5-large`
Sentence Similarity	`/hf-inference/models/{model}/sentence-similarity`	`sentence-transformers/all-MiniLM-L6-v2`
Question Answering	`/hf-inference/models/{model}/question-answering`	`deepset/roberta-base-squad2`
Summarization	`/hf-inference/models/{model}/summarization`	`facebook/bart-large-cnn`
Translation	`/hf-inference/models/{model}/translation`	`Helsinki-NLP/opus-mt-en-fr`
Zero-Shot Classification	`/hf-inference/models/{model}/zero-shot-classification`	`facebook/bart-large-mnli`
Text-to-Image	`/hf-inference/models/{model}/text-to-image`	`black-forest-labs/FLUX.1-schnell`
Text-to-Speech	`/hf-inference/models/{model}/text-to-speech`	`facebook/mms-tts-eng`
Image Classification	`/hf-inference/models/{model}/image-classification`	`google/vit-base-patch16-224`
Image Captioning (Image-to-Text)	`/hf-inference/models/{model}/image-to-text`	`Salesforce/blip-image-captioning-large`
Automatic Speech Recognition	`/hf-inference/models/{model}/automatic-speech-recognition`	`openai/whisper-large-v3-turbo`
Batch Operations	`/hf-inference/models/{model}/.../batch`	Any compatible model

Any model available on the Hugging Face Hub can be used — not just the examples above. Browse by task at huggingface.co/models.

Setup

1. Get a Hugging Face token

Create a free account at huggingface.co
Go to Settings → Access Tokens
Click New token, choose Read type, enable Inference Providers under the Inference section
Copy the token

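2. Add the connector

Import the package in your code (import avi0ra/huggingface;). On the first build, bal build resolves it from Ballerina Central. To download it ahead of time, run:


bal pull avi0ra/huggingface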

3. Configure the token

In Config.toml:


token = "<YOUR_HF_TOKEN>"

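Alternatively, leave token out of Config.toml and set the HF_TOKEN environment variable. The examples below use os:getEnv("HF_TOKEN") as the configurable's default value: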


export HF_TOKEN="<YOUR_HF_TOKEN>"

Quickstart

Chat Completion


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
        model: "Qwen/Qwen2.5-7B-Instruct",
        messages: [{role: "user", content: "What is Ballerina?"}],
        maxTokens: 100,
        topP: 0.9
    });

    io:println(resp?.choices);
    io:println("Tokens used: ", resp?.usage?.totalTokens);
}

Streaming Chat Completion

Iterate chunks from the parsed SSE response:


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    stream<huggingface:ChatCompletionChunk, error?> tokenStream =
        check hf->/v1/chat/completions/streamed.post({
            model: "Qwen/Qwen2.5-7B-Instruct",
            messages: [{role: "user", content: "Count from 1 to 5."}],
            maxTokens: 50
        });

    check from huggingface:ChatCompletionChunk chunk in tokenStream do {
        huggingface:ChatCompletionChunkChoice[]? choices = chunk?.choices;
        if choices is huggingface:ChatCompletionChunkChoice[] && choices.length() > 0 {
            string? content = choices[0].delta?.content;
            if content is string {
                io:print(content);
            }
        }
    };
    io:println();
}

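Note: The current implementation collects the full SSE response before returning the stream. As a result, chunks only become available after the whole reply has been generated, not token by token as the server produces them.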
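The 1.1.0 changelog notes that the SSE parser accepts \r\n line endings. The sketch below illustrates what that parsing involves; it is not the connector's code. It assumes OpenAI-style framing, where each event line starts with `data:` and the stream ends with a `data: [DONE]` sentinel. The name `sseDataPayloads` is hypothetical.

```ballerina
import ballerina/io;

// Hypothetical helper, not part of avi0ra/huggingface: collects the payloads of
// `data:` lines from a complete SSE body. Lines may end in "\n" or "\r\n"; the
// "[DONE]" sentinel (assumed OpenAI-style framing) is skipped.
function sseDataPayloads(string body) returns string[] {
    string[] payloads = [];
    int pos = 0;
    int len = body.length();
    while pos < len {
        // Find the end of the current line, or the end of the body.
        int? newline = body.indexOf("\n", pos);
        int lineEnd = newline ?: len;
        string line = body.substring(pos, lineEnd);
        // Drop the carriage return left behind by a "\r\n" ending.
        if line.endsWith("\r") {
            line = line.substring(0, line.length() - 1);
        }
        if line.startsWith("data:") {
            string payload = line.substring(5).trim();
            if payload != "[DONE]" {
                payloads.push(payload);
            }
        }
        pos = lineEnd + 1;
    }
    return payloads;
}

public function main() {
    string body = "data: {\"id\":1}\r\n\r\ndata: {\"id\":2}\r\n\r\ndata: [DONE]\r\n";
    io:println(sseDataPayloads(body));
}
```

An incremental implementation would apply the same per-line logic as bytes arrive, rather than waiting for the complete body.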

Stateful Chat Conversation

Maintain cross-turn chat history automatically using the Conversation class:


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:Conversation conv = new (
        hf,
        "Qwen/Qwen2.5-7B-Instruct",
        systemPrompt = "You are a helpful assistant."
    );

    string reply1 = check conv.chat("What is Ballerina?");
    io:println("Assistant: ", reply1);

    string reply2 = check conv.chat("Who created it?");
    io:println("Assistant: ", reply2);

    io:println("Turns completed: ", conv.turnCount());
}

RAG Pipeline

End-to-end Retrieval Augmented Generation in a single function call:


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:RagDocument[] documents = [
        {
            id: "doc1",
            content: "Ballerina is an open-source language for cloud-native integration by WSO2.",
            metadata: {"source": "ballerina.io"}
        },
        {
            id: "doc2",
            content: "WSO2 is a Sri Lankan technology company founded in 2005.",
            metadata: {"source": "wso2.com"}
        }
    ];

    huggingface:RagResult result = check huggingface:ragQuery(
        hf,
        "Who created Ballerina?",
        documents
    );

    io:println("Answer: ", result.answer);
    io:println("Sources used: ", result.sources.length());
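    // scores can be empty if no document clears the similarity threshold
    if result.scores.length() > 0 {
        io:println("Top relevance score: ", result.scores[0]);
    }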
}

Auto-Retry for Cold Models

Models on the free tier go cold after inactivity and return 503 while loading. The connector retries automatically with exponential backoff:


huggingface:Client hf = check new (
    {auth: {token}},
    retryConfig = {
        maxRetries: 5,
        initialDelay: 2.0,
        maxDelay: 30.0
    }
);
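Writing out the delay schedule shows how long a cold model can keep a call waiting. The sketch below assumes each delay doubles the previous one and is capped at maxDelay. The connector's exact multiplier, and whether it adds jitter, is not documented here, so treat the numbers as illustrative. The sketch also applies the validation rules listed in the 1.1.0 changelog (maxRetries >= 1, initialDelay <= maxDelay). `backoffDelays` is a hypothetical name.

```ballerina
import ballerina/io;

// Hypothetical helper: the wait before each retry (same unit as the config, assumed seconds),
// doubling from initialDelay and capped at maxDelay.
// Doubling is an assumption; the connector may use a different factor or add jitter.
function backoffDelays(int maxRetries, float initialDelay, float maxDelay) returns float[]|error {
    // The checks the 1.1.0 changelog says RetryConfig enforces at init.
    if maxRetries < 1 {
        return error("maxRetries must be >= 1");
    }
    if initialDelay > maxDelay {
        return error("initialDelay must be <= maxDelay");
    }
    float[] delays = [];
    foreach int attempt in 0 ..< maxRetries {
        // attempt 0 waits initialDelay, attempt 1 waits 2 * initialDelay, and so on.
        delays.push(float:min(initialDelay * float:pow(2.0, <float>attempt), maxDelay));
    }
    return delays;
}

public function main() returns error? {
    // Same values as the retryConfig example above.
    float[] delays = check backoffDelays(5, 2.0, 30.0);
    io:println(delays);
}
```

Under the doubling assumption, those settings wait 2, 4, 8, 16 and 30 seconds: at most 60 seconds in total before the last retry.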

Waiting for Cold Models with `waitForModel`


huggingface:Client hf = check new ({
    auth: {token},
    waitForModel: true,
    timeout: 120
});

Setting waitForModel: true sends x-wait-for-model: true on every request so the server waits for cold models to load instead of immediately returning 503. This eliminates most retry cycles. Pair it with a higher timeout to cover the model load time.

Image and Audio Helpers

Load images and audio directly from files or URLs:


// Image from file
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/file.post(
        "path/to/image.jpg"
    );

// Image captioning from URL
huggingface:ImageToTextResult[] captions =
    check hf->/hf\-inference/models/["Salesforce/blip-image-captioning-large"]/image\-to\-text/url.post(
        "https://example.com/photo.jpg"
    );

// Audio from file
huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "path/to/audio.flac"
    );

All Supported Operations

Fill Mask


huggingface:FillMaskResult[] res =
    check hf->/hf\-inference/models/["google-bert/bert-base-uncased"]/fill\-mask.post({
        inputs: "Paris is the [MASK] of France."
    });
io:println(res[0]?.tokenStr, " (", res[0]?.score, ")");

Text Classification


huggingface:ClassificationLabel[][] res =
    check hf->/hf\-inference/models/["distilbert-base-uncased-finetuned-sst-2-english"]/text\-classification.post({
        inputs: "Ballerina makes integration elegant!"
    });
io:println(res[0][0]?.label, " (", res[0][0]?.score, ")");

Token Classification (NER)


huggingface:TokenClassificationEntity[] entities =
    check hf->/hf\-inference/models/["dslim/bert-base-NER"]/token\-classification.post({
        inputs: "WSO2 is based in Sri Lanka."
    });
io:println(entities);

Feature Extraction (Embeddings)


float[] embeddings =
    check hf->/hf\-inference/models/["intfloat/multilingual-e5-large"]/feature\-extraction.post({
        inputs: "Ballerina cloud-native integration."
    });
io:println("Dimensions: ", embeddings.length());

Sentence Similarity


float[] scores =
    check hf->/hf\-inference/models/["sentence-transformers/all-MiniLM-L6-v2"]/sentence\-similarity.post({
        inputs: {
            source_sentence: "What is Ballerina?",
            sentences: ["Ballerina is a cloud-native language.", "Python is for data science."]
        }
    });
io:println("Scores: ", scores);

Question Answering


huggingface:QuestionAnsweringResponse ans =
    check hf->/hf\-inference/models/["deepset/roberta-base-squad2"]/question\-answering.post({
        inputs: {
            question: "What is Ballerina?",
            context: "Ballerina is an open-source language for cloud-native integration by WSO2."
        }
    });
io:println(ans?.answer);

Summarization


huggingface:SummarizationResult[] res =
    check hf->/hf\-inference/models/["facebook/bart-large-cnn"]/summarization.post({
        inputs: "Ballerina is a modern open-source programming language designed for cloud-native integration...",
        parameters: {maxLength: 40, minLength: 15}
    });
io:println(res[0].summaryText);

Translation


huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-fr"]/translation.post({
        inputs: "Hello, how are you?"
    });
io:println(res[0].translationText);

Zero-Shot Classification


huggingface:ZeroShotClassificationResponse res =
    check hf->/hf\-inference/models/["facebook/bart-large-mnli"]/zero\-shot\-classification.post({
        inputs: "Ballerina is a programming language for cloud integration.",
        parameters: {candidateLabels: ["technology", "sports", "politics"]}
    });
io:println(res);

Text-to-Image Generation


byte[] imageBytes =
    check hf->/hf\-inference/models/["black-forest-labs/FLUX.1-schnell"]/text\-to\-image.post({
        inputs: "A robot writing Ballerina code"
    });
check io:fileWriteBytes("output.png", imageBytes);

Text-to-Speech


byte[] audioBytes =
    check hf->/hf\-inference/models/["facebook/mms-tts-eng"]/text\-to\-speech.post({
        inputs: "Hello from Ballerina!"
    });
check io:fileWriteBytes("speech.wav", audioBytes);

Image Classification


byte[] payload = check io:fileReadBytes("image.jpg");
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification.post(payload);
io:println(res[0]?.label, " (", res[0]?.score, ")");

Image Captioning (Image-to-Text)


byte[] payload = check io:fileReadBytes("photo.jpg");
huggingface:ImageToTextResult[] captions =
    check hf->/hf\-inference/models/["Salesforce/blip-image-captioning-large"]/image\-to\-text.post(payload);
io:println(captions[0]?.generatedText);

Automatic Speech Recognition


huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "audio.flac"
    );
io:println(resp?.text);

Universal Model Runner

The ModelRunner class works with any Hugging Face model. Given a model ID, it looks up the model's pipeline task on the Hub once. It then routes each call to the matching typed endpoint, falling back to generic inference when the task is unknown or unmapped.


// Summarisation — just name the model
huggingface:ModelRunner runner = new (hf, "facebook/bart-large-cnn");
io:println("Task: ", runner.getPipelineTag()); // "summarization"

json summary = check runner.run(
    "Ballerina is a modern open-source language designed for cloud-native integration."
);
io:println(summary);

// NER — same API, different model
huggingface:ModelRunner ner = new (hf, "dslim/bert-base-NER");
json entities = check ner.run("WSO2 is based in Sri Lanka.");

// Translation
huggingface:ModelRunner xlat = new (hf, "Helsinki-NLP/opus-mt-en-fr");
json translated = check xlat.run("Hello, how are you?");

// Question Answering — structured JSON input
huggingface:ModelRunner qa = new (hf, "deepset/roberta-base-squad2");
json answer = check qa.runWithJson({
    inputs: {question: "What is Ballerina?", context: "Ballerina is..."}
});

// Image classification from file
huggingface:ModelRunner clf = new (hf, "google/vit-base-patch16-224");
json labels = check clf.runImageFile("photo.jpg");

// Image generation — returns raw bytes
huggingface:ModelRunner img = new (hf, "black-forest-labs/FLUX.1-schnell");
byte[] png = check img.generateMedia("A robot writing Ballerina code");
check io:fileWriteBytes("output.png", png);

// ASR from audio file
huggingface:ModelRunner whisper = new (hf, "openai/whisper-large-v3-turbo");
json transcript = check whisper.runAudioFile("audio.flac", huggingface:AUDIO_FLAC);

`ModelRunner` method reference

Method	Input	Output	Auto-routed tasks
`run(string)`	Plain text	`json`	text-generation, fill-mask, text-classification, token-classification, feature-extraction, summarization, translation
`runWithJson(json)`	Custom JSON	`json`	question-answering, zero-shot-classification, sentence-similarity, chat-completion
`runBytes(byte[], contentType)`	Binary	`json`	image-classification, image-to-text, automatic-speech-recognition
`generateMedia(string)`	Prompt	`byte[]`	text-to-image, text-to-speech
`runImageFile(path)`	File path	`json`	Same as `runBytes`
`runImageUrl(url)`	Public URL	`json`	Same as `runBytes`
`runAudioFile(path)`	File path	`json`	ASR
`runAudioUrl(url)`	Public URL	`json`	ASR

One-shot convenience functions


// Auto-detect + run in one line
json result = check huggingface:autoRun(hf, "facebook/bart-large-cnn", "Long article...");

// With structured JSON payload
json answer = check huggingface:autoRunJson(hf, "deepset/roberta-base-squad2", {
    inputs: {question: "What is Ballerina?", context: "Ballerina is..."}
});

// Binary media generation
byte[] png = check huggingface:autoGenerateMedia(
    hf, "black-forest-labs/FLUX.1-schnell", "A robot coding in Ballerina"
);

Tip: Reuse a ModelRunner instance for repeated calls; the Hub lookup happens only once, at construction. autoRun(), autoRunJson(), and autoGenerateMedia() repeat the lookup on every call.

Generic Inference Helper

Call any Hugging Face model not covered by the typed operations. Calls are retried on HTTP 503 (cold start) according to the client's RetryConfig:


json result = check huggingface:inferModel(
    hf,
    "openai-community/gpt2",
    {inputs: "Ballerina is designed for"}
);
io:println(result);

Using Custom Models

The connector works with any model on the Hugging Face Hub. Pass any model ID whose task matches the endpoint you call:


huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-si"]/translation.post({
        inputs: "Hello"
    });

Browse available models by task at huggingface.co/models.

Model Metadata & Batch Helpers

Retrieve model information and check inference availability:


huggingface:ModelInfo info = check huggingface:getModelInfo(hf, "gpt2");
io:println("Downloads: ", info.downloads);

huggingface:ModelAvailability availability = check huggingface:checkModelAvailability(hf, "gpt2");
io:println("Available for inference: ", availability.available);

Run inference on several inputs in a single request:


json[] batchResults = check huggingface:batchInfer(
    hf,
    ["Hello world", "Ballerina is great"],
    "openai-community/gpt2"
);

Compute semantic similarity with embedding-based scoring:


float[] scores = check huggingface:sentenceSimilarity(
    hf,
    "What is Ballerina?",
    ["Ballerina is a cloud-native language.", "Python is for data science."]
);
io:println("Scores: ", scores);

Changelog

1.1.0

Added ModelRunner class — universal model runner that auto-detects the pipeline task from the Hub and routes to the correct typed endpoint. Works with any Hugging Face model.
Added autoRun(), autoRunJson(), autoGenerateMedia() convenience functions.
Added waitForModel flag to ConnectionConfig — sends x-wait-for-model: true header to eliminate most cold-start 503 round-trips.
Added fill-mask endpoint for BERT-style masked token prediction.
Added image-to-text (captioning) endpoint with bytes, file, and URL variants.
Added text-to-speech endpoint for audio synthesis.
Added sentence-similarity typed endpoint.
Added sentenceSimilarity embedding-based helper function.
Added topP, stop, seed, frequencyPenalty, presencePenalty to ChatCompletionRequest.
Added doSample, topK, topP, repetitionPenalty to TextGenerationParameters.
Added guidanceScale, negativePrompt, seed to TextToImageParameters.
Added UsageStats type and usage field to ChatCompletionResponse.
Fixed inferModel and batchInfer to use postWithRetry — now honour retry config on 503.
Fixed RetryConfig validation: maxRetries >= 1 and initialDelay <= maxDelay enforced at init.
Fixed SSE streaming parser to handle \r\n line endings.
Increased default timeout in ConnectionConfig from 30 s to 60 s.

1.0.0

Added stateful Conversation class for automated chat history management.
Added batch inference operations (batchInfer and typed /batch endpoints).
Added Model Metadata APIs (getModelInfo, checkModelAvailability).
Upgraded ragQuery to use batch embeddings and RagConfig.

0.3.0

Added streaming chat completions via /v1/chat/completions/streamed.
Added RAG pipeline helper ragQuery (initial version).
Added automatic retry with exponential backoff for cold-starting models (503).
Added image classification from file path and URL.
Added ASR from file path and URL.
Introduced RetryConfig, RagDocument, RagResult, ImageContentType, AudioContentType types.
Improved generic inferModel helper with rich error handling.

0.2.0

Initial release of the avi0ra/huggingface connector.
Native support for 12 AI/ML inference operations.
Generic inferModel helper.

Issues and contributions

Report issues at github.com/HasithaErandika/module-ballerinax-huggingface/issues.

For Ballerina community support: Discord · Stack Overflow #ballerina

Functions

autoGenerateMedia

Isolated Function

function autoGenerateMedia(Client hfClient, string model, string prompt) returns byte[]|error

Auto-detect a model's task and generate binary media (image or audio) from text.

Works for text-to-image and text-to-speech models. Returns raw binary bytes.


byte[] png = check huggingface:autoGenerateMedia(
    hfClient,
    "stabilityai/stable-diffusion-xl-base-1.0",
    "A Ballerina dancer on the moon"
);
check io:fileWriteBytes("art.png", png);

Parameters

hfClient Client - A configured Client instance

model string - Any Hugging Face model ID for a generative media task

prompt string - Text prompt for the media generation model

Return Type

byte[]|error - Raw binary bytes (PNG, WAV, etc.) or an error

autoRun

Isolated Function

function autoRun(Client hfClient, string model, string input) returns json|error

Auto-detect a model's task and run text inference in a single call.

This is a convenience shorthand for:


huggingface:ModelRunner runner = new (hfClient, model);
json result = check runner.run(input);

For repeated calls against the same model, prefer creating a ModelRunner instance once and reusing it — that avoids the Hub metadata fetch on every invocation.


json result = check huggingface:autoRun(hfClient, "facebook/bart-large-cnn",
    "Ballerina is a modern open-source programming language for cloud-native integration.");
io:println(result);

Parameters

hfClient Client - A configured Client instance

model string - Any Hugging Face model ID

input string - Plain text input

Return Type

json|error - JSON inference result or an error

autoRunJson

Isolated Function

function autoRunJson(Client hfClient, string model, json payload) returns json|error

Auto-detect a model's task and run inference with a custom JSON payload.

Equivalent to creating a ModelRunner and calling runWithJson(payload).


json answer = check huggingface:autoRunJson(hfClient, "deepset/roberta-base-squad2", {
    inputs: {
        question: "What is Ballerina?",
        context: "Ballerina is an open-source language for cloud-native integration by WSO2."
    }
});

Parameters

hfClient Client - A configured Client instance

model string - Any Hugging Face model ID

payload json - Custom JSON payload for the inference endpoint

Return Type

json|error - Raw JSON response or an error

batchInfer

Isolated Function

function batchInfer(Client hfClient, string[] inputs, string model, map<string|string[]> headers) returns json[]|error

Perform batch inference on multiple inputs in a single API call.

More efficient than calling inferModel repeatedly when processing large numbers of inputs against the same model.

Respects the client's RetryConfig — automatically retries on HTTP 503.

Parameters

hfClient Client - A configured Client instance

inputs string[] - Array of input strings to process in one request

model string - The model ID

headers map<string|string[]> (default {}) - Optional additional headers

Return Type

json[]|error - Array of JSON results one per input, or an error

checkModelAvailability

Isolated Function

function checkModelAvailability(Client hfClient, string model) returns ModelAvailability|error

Check whether a model is available on the Hugging Face Inference API.

Returns a ModelAvailability record with availability status and metadata. If the model is not found, the function does not return an error; instead, the record has available: false.

Parameters

hfClient Client - A configured Client instance

model string - The model ID to check

Return Type

ModelAvailability|error - A ModelAvailability record, or an error if the Hub API fails

getModelInfo

Isolated Function

function getModelInfo(Client hfClient, string model) returns ModelInfo|error

Retrieve metadata for a model from the Hugging Face Hub API.

Works for public models without authentication. Requests for private models may fail with HTTP 401 unless your token has access to the model.

Parameters

hfClient Client - A configured Client instance

model string - The model ID (e.g. "gpt2", "facebook/bart-large-cnn")

Return Type

ModelInfo|error - A ModelInfo record with model details, or an error

inferModel

Isolated Function

function inferModel(Client hfClient, string model, json payload, map<string|string[]> headers) returns json|error

Perform a generic inference call against any Hugging Face model.

Useful when the model or endpoint does not match one of the strongly-typed operations in the generated client. The request is sent to /hf-inference/models/{model} with no task suffix, and the task is determined by the model itself.

Respects the client's RetryConfig — automatically retries on HTTP 503 (model cold-start) with exponential backoff.

Parameters

hfClient Client - A configured Client instance

model string - The model ID (e.g. "gpt2", "Qwen/Qwen2.5-7B-Instruct")

payload json - JSON payload sent to the inference endpoint

headers map<string|string[]> (default {}) - Optional additional HTTP headers

Return Type

json|error - The raw JSON response or an error

ragQuery

Isolated Function

function ragQuery(Client hfClient, string query, RagDocument[] documents, RagConfig config) returns RagResult|error

Retrieval Augmented Generation (RAG) pipeline.

Embeds the query and all documents, ranks documents by cosine similarity, filters by similarity threshold, then generates a grounded answer using the top-K documents as context. Uses batch embedding for efficiency.

Basic usage


huggingface:RagDocument[] docs = [
    {id: "1", content: "Ballerina is created by WSO2."},
    {id: "2", content: "Python is used for data science."}
];
huggingface:RagResult result = check huggingface:ragQuery(hfClient, "Who made Ballerina?", docs);
io:println(result.answer);
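The ranking and filtering steps can be pictured as follows. This is a sketch, not the connector's implementation. It assumes each document already has a cosine-similarity score against the query. It keeps the documents scoring at or above a threshold and returns the best topK in descending order. `Scored` and `selectTopK` are hypothetical names, and the actual RagConfig field names may differ.

```ballerina
import ballerina/io;

// Hypothetical record: a document ID paired with its similarity to the query.
type Scored record {|
    string id;
    float score;
|};

// Keeps candidates whose score meets the threshold, then returns at most topK of them,
// highest score first.
function selectTopK(Scored[] candidates, int topK, float threshold) returns Scored[] {
    return from Scored c in candidates
        where c.score >= threshold
        order by c.score descending
        limit topK
        select c;
}

public function main() {
    Scored[] candidates = [
        {id: "1", score: 0.82},
        {id: "2", score: 0.31},
        {id: "3", score: 0.67}
    ];
    // With threshold 0.5 and topK 2, documents "1" and "3" are kept, in that order.
    io:println(selectTopK(candidates, 2, 0.5));
}
```

If no document clears the threshold, the selection is empty. Code that reads the first score should therefore check the length first.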

Parameters

hfClient Client - A configured Client instance

query string - The natural language question to answer

documents RagDocument[] - The corpus of documents to search through

config RagConfig (default {}) - RAG configuration (models, topK, threshold, system prompt)

Return Type

RagResult|error - A RagResult with the answer, source documents, and scores, or an error

sentenceSimilarity

Isolated Function

function sentenceSimilarity(Client hfClient, string sourceSentence, string[] sentences, string embeddingModel) returns float[]|error

Compute semantic similarity scores between a source sentence and candidate sentences.

Uses embedding-based cosine similarity so the result is purely numerical — no LLM generation is involved. Suitable for semantic search, deduplication, and ranking.

Example


float[] scores = check huggingface:sentenceSimilarity(
    hfClient,
    "What is Ballerina?",
    ["Ballerina is a cloud-native language.", "Python is for data science."]
);
io:println("Scores: ", scores);  // e.g., [0.91, 0.23]
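Cosine similarity is the dot product of two embedding vectors divided by the product of their Euclidean norms. It therefore ranges from -1 to 1 and ignores vector length. The sketch below shows that computation for illustration only. `cosineSimilarity` is a hypothetical helper; the connector computes its scores internally from the embedding model's vectors.

```ballerina
import ballerina/io;

// Hypothetical helper: cosine similarity of two equal-length vectors,
// i.e. dot(a, b) / (|a| * |b|).
function cosineSimilarity(float[] a, float[] b) returns float|error {
    if a.length() == 0 || a.length() != b.length() {
        return error("vectors must be non-empty and of equal length");
    }
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if normA == 0.0 || normB == 0.0 {
        return error("cosine similarity is undefined for a zero vector");
    }
    return dot / (float:sqrt(normA) * float:sqrt(normB));
}

public function main() returns error? {
    // Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
    io:println(check cosineSimilarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]));
    io:println(check cosineSimilarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]));
}
```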

Parameters

hfClient Client - A configured Client instance

sourceSentence string - The reference sentence to compare against

sentences string[] - Candidate sentences to score

embeddingModel string (default "intfloat/multilingual-e5-large") - Embedding model ID

Return Type

float[]|error - A float array of cosine similarity scores (one per candidate sentence), or an error

Classes

huggingface: Conversation

Isolated

A stateful conversation manager that maintains full chat history across turns.

Handles message history automatically so callers only need to provide the next user message and receive the assistant reply. Thread-safe via lock statements.

Basic usage


huggingface:Conversation conv = new (hfClient, "katanemo/Arch-Router-1.5B:hf-inference");
string reply1 = check conv.chat("What is Ballerina?");
string reply2 = check conv.chat("Who created it?");
io:println("Turns: ", conv.turnCount());
conv.reset();

With system prompt


huggingface:Conversation conv = new (
    hfClient,
    "katanemo/Arch-Router-1.5B:hf-inference",
    systemPrompt = "You are a helpful Ballerina programming assistant.",
    maxTokens = 150
);
string reply = check conv.chat("How do I write a REST service?");

Constructor

Creates a new Conversation with the given client and model.

init (Client hfClient, string model, string systemPrompt, int maxTokens)

hfClient Client - A configured Client instance

model string - The chat model ID to use for generation

systemPrompt string (default "") - Optional system prompt that sets the assistant's behaviour

maxTokens int (default 200) - Maximum tokens per response

chat

Isolated Function

function chat(string userMessage) returns string|error

Send a user message and receive the assistant reply.

The conversation history is updated automatically after each call.

Parameters

userMessage string - The user message to send

Return Type

string|error - The assistant reply as a plain string, or an error

getHistory

Isolated Function

function getHistory() returns ChatMessage[]

Get the full conversation history including all turns.

Return Type

ChatMessage[] - Ordered array of all messages in the conversation

snapshot

Isolated Function

function snapshot() returns ConversationSnapshot

Get a snapshot of the current conversation state.

Return Type

ConversationSnapshot - A ConversationSnapshot record with history, model, and turn count

reset

Isolated Function

function reset()

Reset the conversation history.

If a system prompt was provided at initialization it is preserved. All user and assistant messages are cleared.

turnCount

Isolated Function

function turnCount() returns int

Get the number of completed user/assistant exchange pairs.

Return Type

int - Number of completed turns (a user message and its assistant reply together count as one turn)
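The documented behaviour of reset() and turnCount() can be modelled with a flat list of role/content messages. The sketch below is hypothetical and not the Conversation class itself. It keeps the system prompt across resets and counts one turn per assistant reply that completes a user/assistant pair. `HistorySketch`, `Message`, and `addExchange` are illustrative names; the connector's own message type is ChatMessage.

```ballerina
import ballerina/io;

// Hypothetical message shape; the connector's ChatMessage may carry more fields.
type Message record {|
    string role;
    string content;
|};

// Hypothetical model of Conversation's history handling (no network calls).
class HistorySketch {
    private Message[] messages = [];
    private final string systemPrompt;

    function init(string systemPrompt = "") {
        self.systemPrompt = systemPrompt;
        if systemPrompt != "" {
            self.messages.push({role: "system", content: systemPrompt});
        }
    }

    // Appends one completed exchange; Conversation.chat() would get `reply` from the model.
    function addExchange(string userMessage, string reply) {
        self.messages.push({role: "user", content: userMessage});
        self.messages.push({role: "assistant", content: reply});
    }

    // One turn per assistant reply, i.e. per completed user/assistant pair.
    function turnCount() returns int {
        int turns = 0;
        foreach Message m in self.messages {
            if m.role == "assistant" {
                turns += 1;
            }
        }
        return turns;
    }

    // Clears user and assistant messages but keeps the system prompt.
    function reset() {
        Message[] fresh = [];
        if self.systemPrompt != "" {
            fresh.push({role: "system", content: self.systemPrompt});
        }
        self.messages = fresh;
    }

    function history() returns Message[] {
        return self.messages.clone();
    }
}

public function main() {
    HistorySketch conv = new ("You are a helpful assistant.");
    conv.addExchange("What is Ballerina?", "A cloud-native programming language.");
    io:println(conv.turnCount());
    conv.reset();
    io:println(conv.history());
}
```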

huggingface: ModelRunner

Isolated

A universal model runner that auto-detects the pipeline task of any Hugging Face model and routes inference requests to the correct typed endpoint.

Create once per model — the Hub metadata fetch only happens at construction time. Reuse the same instance for all subsequent calls to avoid repeated network round-trips.

Basic example — auto-detected task


huggingface:ModelRunner runner = new (hfClient, "facebook/bart-large-cnn");
io:println("Detected task: ", runner.getPipelineTag()); // "summarization"

json result = check runner.run(
    "Ballerina is a modern open-source language designed for cloud-native integration."
);
io:println(result); // [{"summary_text": "..."}]
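Because run() returns plain json, callers usually convert the result to a record before using it. The sketch below assumes the summarization output has the [{"summary_text": "..."}] shape shown in the comment above. `Summary` and `firstSummary` are hypothetical names; the typed client's SummarizationResult exposes the same value as summaryText.

```ballerina
import ballerina/io;

// Hypothetical local record mirroring the JSON shape returned for summarization models.
// It is an open record, so extra fields in the response are tolerated.
type Summary record {
    string summary_text;
};

// Converts the json result of ModelRunner.run() into typed records and returns the first summary.
function firstSummary(json result) returns string|error {
    Summary[] summaries = check result.cloneWithType();
    if summaries.length() == 0 {
        return error("no summary returned");
    }
    return summaries[0].summary_text;
}

public function main() returns error? {
    // Stand-in for the value returned by runner.run(...); no network call is made here.
    json result = [{"summary_text": "Ballerina is a language for cloud-native integration."}];
    io:println(check firstSummary(result));
}
```

If the shape does not match, cloneWithType() returns an error. This surfaces an unexpected response early instead of failing later on a missing field.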

Any model — zero boilerplate


// NER model
huggingface:ModelRunner ner = new (hfClient, "dslim/bert-base-NER");
json entities = check ner.run("WSO2 is based in Sri Lanka.");

// Embedding model
huggingface:ModelRunner emb = new (hfClient, "intfloat/multilingual-e5-large");
json vector = check emb.run("Cloud-native integration");

// Translation model
huggingface:ModelRunner xlat = new (hfClient, "Helsinki-NLP/opus-mt-en-fr");
json translated = check xlat.run("Hello, how are you?");

// Image generation
huggingface:ModelRunner img = new (hfClient, "stabilityai/stable-diffusion-xl-base-1.0");
byte[] png = check img.generateMedia("A robot writing Ballerina code");

Constructor

Creates a new ModelRunner for the given Hugging Face model.

Fetches the model's pipeline tag from the Hub to enable automatic task routing. If the Hub is unreachable or the model is unknown, the runner silently falls back to generic inference for all run() calls — no error is returned.

init (Client hfClient, string model)

hfClient Client - A configured Client instance

model string - Any Hugging Face model ID (e.g. "facebook/bart-large-cnn")

getPipelineTag

Isolated Function

function getPipelineTag() returns string

Returns the pipeline task detected from the Hub (e.g. "summarization"). Returns an empty string when the task could not be determined.

Return Type

string - The pipeline tag string, or "" if unknown

getModelId

Isolated Function

function getModelId() returns string

Returns the model ID this runner was constructed for.

Return Type

string - The model ID string

getAuthor

Isolated Function

function getAuthor() returns string?

Returns the model author/organisation cached at construction time, or ().

Return Type

string? - Author string or ()

getDownloads

Isolated Function

function getDownloads() returns int?

Returns the download count cached at construction time, or ().

Return Type

int? - Download count or ()

describe

Isolated Function

function describe() returns string

Returns a human-readable description of this runner — useful for debugging.

Return Type

string - A description string, e.g. "ModelRunner[model=gpt2, task=text-generation]"

run

Isolated Function

function run(string input) returns json|error

Run inference with a plain-text input and receive a JSON result.

The pipeline tag detected at construction time determines which typed endpoint is called. Unknown or unsupported tasks fall back to inferModel() automatically.

Pipeline tag	Endpoint used
`text-generation`, `text2text-generation`	`/hf-inference/models/{model}`
`fill-mask`	`.../fill-mask`
`text-classification`, `sentiment-analysis`	`.../text-classification`
`token-classification`, `ner`	`.../token-classification`
`feature-extraction`, `sentence-embeddings`	`.../feature-extraction`
`summarization`	`.../summarization`
`translation*` (any variant)	`.../translation`
`sentence-similarity`	`.../sentence-similarity`
`text-to-image`, `image-generation`	returns descriptive error — use generateMedia()
`text-to-speech`, `audio-generation`	returns descriptive error — use generateMedia()
(anything else)	generic inferModel()

For tasks that require structured input (question-answering, zero-shot, etc.), use runWithJson() instead.
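The dispatch rule in the table above is essentially a mapping from pipeline tag to endpoint. The function below is a hypothetical sketch of that logic, not the connector's source: `endpointFor` is an illustrative name, the empty string stands for the bare `/hf-inference/models/{model}` endpoint, and `()` signals the fallback to inferModel().

```ballerina
// Hypothetical sketch of how run() could choose an endpoint from the pipeline tag.
// Returns the path suffix appended to /hf-inference/models/{model} ("" for the
// bare text-generation endpoint), an error for media tasks, or () to signal the
// generic inferModel() fallback.
function endpointFor(string pipelineTag) returns string|error? {
    match pipelineTag {
        "text-generation"|"text2text-generation" => {
            return "";
        }
        "fill-mask" => {
            return "/fill-mask";
        }
        "text-classification"|"sentiment-analysis" => {
            return "/text-classification";
        }
        "token-classification"|"ner" => {
            return "/token-classification";
        }
        "feature-extraction"|"sentence-embeddings" => {
            return "/feature-extraction";
        }
        "summarization" => {
            return "/summarization";
        }
        "sentence-similarity" => {
            return "/sentence-similarity";
        }
        "text-to-image"|"image-generation"|"text-to-speech"|"audio-generation" => {
            return error("Task '" + pipelineTag + "' produces binary media; use generateMedia()");
        }
    }
    // Every translation variant (e.g. "translation_en_to_fr") shares one endpoint.
    if pipelineTag.startsWith("translation") {
        return "/translation";
    }
    // Anything else falls back to the generic inferModel() call.
    return;
}
```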

Parameters

input string - Plain text input for the model

Return Type

json|error - JSON inference result, or an error

runWithJson

Isolated Function

function runWithJson(json payload) returns json|error

Run inference with a fully custom JSON payload and receive a JSON result.

Use this when the task requires structured input that cannot be expressed as a plain string:

Task	Example payload
Question Answering	`{inputs: {question: "What?", context: "..."}}`
Zero-Shot Classification	`{inputs: "...", parameters: {candidate_labels: [...]}}`
Sentence Similarity	`{inputs: {source_sentence: "...", sentences: [...]}}`
Any other model	Any valid JSON accepted by the model's endpoint


// Question answering
json answer = check runner.runWithJson({
    inputs: {question: "What is Ballerina?", context: "Ballerina is ..."}
});

// Zero-shot classification
json labels = check runner.runWithJson({
    inputs: "Ballerina is a programming language.",
    parameters: {candidate_labels: ["technology", "sports", "food"]}
});

Parameters

payload json - JSON payload to send to the model's inference endpoint

Return Type

json|error - Raw JSON response or an error

runBytes

Isolated Function

function runBytes(byte[] data, string contentType) returns json|error

Run inference on raw binary data (image or audio bytes) and receive a JSON result.

Routes to the correct typed endpoint based on the detected pipeline tag:

Pipeline tag	Returns
`image-classification`	`ImageClassificationResult[]` as JSON
`image-to-text`, `image-captioning`	`ImageToTextResult[]` as JSON
`automatic-speech-recognition`	`AutomaticSpeechRecognitionResponse` as JSON
(any other)	descriptive error — use runWithJson() or the typed client


byte[] imgBytes = check io:fileReadBytes("photo.jpg");
json result = check runner.runBytes(imgBytes, huggingface:IMAGE_JPEG);
io:println(result);

Parameters

data byte[] - Raw binary input (image or audio bytes)

contentType string (default IMAGE_JPEG) - MIME type of the binary data (default: image/jpeg)

Return Type

json|error - JSON inference result, or an error

generateMedia

Isolated Function

function generateMedia(string prompt) returns byte[]|error

Generate binary media (image or audio) from a text prompt.

Pipeline tag	Output
`text-to-image`, `image-generation`	PNG/JPEG image bytes
`text-to-speech`, `audio-generation`	WAV/MP3 audio bytes


byte[] png = check runner.generateMedia("A robot writing Ballerina code");
check io:fileWriteBytes("output.png", png);

Parameters

prompt string - Text prompt describing the media to generate

Return Type

byte[]|error - Raw binary bytes or an error

runImageFile

Isolated Function

function runImageFile(string filePath, string contentType) returns json|error

Run inference on an image loaded from a local file path.

Convenience wrapper around runBytes() that handles file I/O.

Parameters

filePath string - Path to the image file (JPEG, PNG, etc.)

contentType string (default IMAGE_JPEG) - MIME type of the image (default: image/jpeg)

Return Type

json|error - JSON inference result or an error

runImageUrl

Isolated Function

function runImageUrl(string imageUrl, string contentType) returns json|error

Run inference on an image fetched from a public URL.

Parameters

imageUrl string - Public URL of the image

contentType string (default IMAGE_JPEG) - MIME type of the image (default: image/jpeg)

Return Type

json|error - JSON inference result or an error

runAudioFile

Isolated Function

function runAudioFile(string filePath, string contentType) returns json|error

Transcribe audio loaded from a local file path (ASR models only).

Parameters

filePath string - Path to the audio file (WAV, FLAC, MP3, etc.)

contentType string (default AUDIO_FLAC) - MIME type of the audio (default: audio/flac)

Return Type

json|error - JSON AutomaticSpeechRecognitionResponse or an error

runAudioUrl

Isolated Function

function runAudioUrl(string audioUrl, string contentType) returns json|error

Transcribe audio fetched from a public URL (ASR models only).

Parameters

audioUrl string - Public URL of the audio file

contentType string (default AUDIO_FLAC) - MIME type of the audio (default: audio/flac)

Return Type

json|error - JSON AutomaticSpeechRecognitionResponse or an error

Clients

huggingface: Client

Isolated

Client for the Hugging Face Inference API.

Provides type-safe access to Hugging Face hosted models including chat completion, text generation, classification, embeddings, image generation, speech recognition, and more. Supports automatic retries for cold-starting models.


huggingface:Client hf = check new ({auth: {token: "<HF_TOKEN>"}});
huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
    model: "Qwen/Qwen2.5-7B-Instruct",
    messages: [{role: "user", content: "Hello!"}]
});

Constructor

Initializes the Hugging Face Inference API client.

init (ConnectionConfig config, string serviceUrl, RetryConfig retryConfig)

config ConnectionConfig - Connection configuration including authentication credentials

serviceUrl string (default "https://router.huggingface.co") - Base URL of the Hugging Face Inference API

retryConfig RetryConfig (default {}) - Retry settings for handling cold-starting models (HTTP 503)
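A sketch of constructing a client tuned for slow, cold-starting models, passing all three constructor arguments positionally. It assumes the package is imported by its Ballerina Central name `avi0ra/huggingface` and that the token arrives through a `configurable` variable set in Config.toml; the timeout and retry values are illustrative, not recommendations from the package.

```ballerina
import avi0ra/huggingface;

// Supplied through Config.toml (hfToken = "...").
configurable string hfToken = ?;

// Illustrative values: a longer timeout for image generation, server-side
// waiting for cold models, and a smaller client-side cold-start retry budget.
function newTunedClient() returns huggingface:Client|error {
    huggingface:Client hf = check new (
        {auth: {token: hfToken}, timeout: 120, waitForModel: true},
        "https://router.huggingface.co",
        {maxRetries: 3, initialDelay: 1.0, maxDelay: 20.0}
    );
    return hf;
}
```

With `waitForModel` enabled, the server holds the request while the model loads, so the client-side retry budget mostly matters when that wait still ends in HTTP 503.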

post v1/chat/completions

Isolated Resource Function

function post v1/chat/completions(ChatCompletionRequest payload, map<string|string[]> headers) returns ChatCompletionResponse|error

Generates a chat completion using a conversational model.

Parameters

payload ChatCompletionRequest - Chat completion request body containing messages and model ID

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ChatCompletionResponse|error - A ChatCompletionResponse with the generated reply, or an error

post v1/chat/completions/streamed

Isolated Resource Function

function post v1/chat/completions/streamed(ChatCompletionRequest payload, map<string|string[]> headers) returns stream<ChatCompletionChunk, error?>|error

Streams a chat completion as a sequence of parsed server-sent event (SSE) chunks.

Note: The connector reads the complete Hugging Face Inference API response body before returning the stream, so every chunk is available immediately rather than arriving incrementally. Iterate the returned stream as usual to read each token chunk.
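Each chunk carries only a token fragment in `delta.content`, so the full reply is the in-order concatenation of those fragments. The sketch below shows that assembly against local copies of the documented chunk records (declared here only so the example compiles on its own); in real code the argument would be the stream returned by this resource function, and `assembleReply` is a hypothetical helper name.

```ballerina
// Local mirrors of the documented ChatCompletionChunk* records, declared only so
// this sketch is self-contained; real code would use the huggingface: types.
type ChunkDelta record {
    string role?;
    string content?;
};

type ChunkChoice record {
    int index?;
    ChunkDelta delta?;
    string? finishReason?;
};

type Chunk record {
    string id?;
    ChunkChoice[] choices?;
};

// Concatenates the content fragments of the first choice across all chunks.
function assembleReply(stream<Chunk, error?> chunks) returns string|error {
    string reply = "";
    check from Chunk chunk in chunks
        do {
            ChunkChoice[] choices = chunk?.choices ?: [];
            if choices.length() > 0 {
                // The role-only first chunk and the final "stop" chunk may have no content.
                reply += choices[0]?.delta?.content ?: "";
            }
        };
    return reply;
}
```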

Parameters

payload ChatCompletionRequest - Chat completion request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

stream<ChatCompletionChunk, error?>|error - A stream of ChatCompletionChunk records, or an error

post hf-inference/models/[string model]

Isolated Resource Function

function post hf\-inference/models/[string model](TextGenerationRequest payload, map<string|string[]> headers) returns TextGenerationResult[]|error

Generates text from a prompt using a language model.

Parameters

payload TextGenerationRequest - Text generation request body with the prompt and parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TextGenerationResult[]|error - An array of TextGenerationResult, or an error

post hf-inference/models/[string model]/fill-mask

Isolated Resource Function

function post hf\-inference/models/[string model]/fill\-mask(FillMaskRequest payload, map<string|string[]> headers) returns FillMaskResult[]|error

Predicts the masked token(s) in a sentence (e.g., BERT-style fill-mask).

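The input must contain the model's mask token. BERT-style models such as google-bert/bert-base-uncased use [MASK], e.g. "Paris is the [MASK] of France."; RoBERTa-style models use <mask> instead.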

Parameters

payload FillMaskRequest - Fill-mask request body containing the masked sentence

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

FillMaskResult[]|error - An array of FillMaskResult predictions sorted by score, or an error

post hf-inference/models/[string model]/text-classification

Isolated Resource Function

function post hf\-inference/models/[string model]/text\-classification(TextClassificationRequest payload, map<string|string[]> headers) returns ClassificationLabel[][]|error

Classifies text into predefined categories (e.g., sentiment analysis).

Parameters

payload TextClassificationRequest - Text classification request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ClassificationLabel[][]|error - A nested array of ClassificationLabel results, or an error
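The nested `ClassificationLabel[][]` shape holds one inner array of labels per input text. A minimal sketch of picking the highest-scoring label for the first input, using a local copy of the record; `topLabel` is a hypothetical helper, and it does not assume the inner array is already sorted.

```ballerina
// Local mirror of huggingface:ClassificationLabel for a self-contained sketch.
type Label record {
    float score?;
    string label?;
};

// Returns the highest-scoring label name from the first inner array, or ()
// if the result is empty or no entry carries a label name.
function topLabel(Label[][] results) returns string? {
    if results.length() == 0 {
        return ();
    }
    string? best = ();
    float bestScore = -1.0;
    foreach Label l in results[0] {
        float s = l?.score ?: 0.0;
        string? name = l?.label;
        if name is string && s > bestScore {
            best = name;
            bestScore = s;
        }
    }
    return best;
}
```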

post hf-inference/models/[string model]/token-classification

Isolated Resource Function

function post hf\-inference/models/[string model]/token\-classification(TokenClassificationRequest payload, map<string|string[]> headers) returns TokenClassificationEntity[]|error

Performs token-level classification such as Named Entity Recognition (NER).

Parameters

payload TokenClassificationRequest - Token classification request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TokenClassificationEntity[]|error - An array of TokenClassificationEntity records, or an error

post hf-inference/models/[string model]/feature-extraction

Isolated Resource Function

function post hf\-inference/models/[string model]/feature\-extraction(FeatureExtractionRequest payload, map<string|string[]> headers) returns float[]|error

Extracts feature embeddings from text using an embedding model.

Parameters

payload FeatureExtractionRequest - Feature extraction request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

float[]|error - A float array representing the embedding vector, or an error

post hf-inference/models/[string model]/sentence-similarity

Isolated Resource Function

function post hf\-inference/models/[string model]/sentence\-similarity(SentenceSimilarityRequest payload, map<string|string[]> headers) returns float[]|error

Computes similarity scores between a source sentence and a list of candidate sentences.

Parameters

payload SentenceSimilarityRequest - Sentence similarity request body with source and candidate sentences

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

float[]|error - A float array of similarity scores (one per candidate sentence), or an error

post hf-inference/models/[string model]/text-classification/batch

Isolated Resource Function

function post hf\-inference/models/[string model]/text\-classification/batch(string[]|BatchTextClassificationRequest payload, map<string|string[]> headers) returns ClassificationLabel[][]|error

Classifies multiple texts into predefined categories.

Parameters

payload string[]|BatchTextClassificationRequest - Batch text classification request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

ClassificationLabel[][]|error - A nested array of ClassificationLabel results for each input, or an error

post hf-inference/models/[string model]/feature-extraction/batch

Isolated Resource Function

function post hf\-inference/models/[string model]/feature\-extraction/batch(string[]|BatchFeatureExtractionRequest payload, map<string|string[]> headers) returns float[][]|error

Extracts feature embeddings from multiple texts.

Parameters

payload string[]|BatchFeatureExtractionRequest - Batch feature extraction request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

float[][]|error - An array of float arrays representing the embedding vectors, or an error
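Embedding vectors such as the rows returned here are usually compared with cosine similarity, the same score the RAG pipeline's `similarityThreshold` refers to. A self-contained sketch of that computation; `cosineSimilarity` is an illustrative function, not part of the connector.

```ballerina
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns an error when the vectors
// differ in length or either one has zero magnitude.
function cosineSimilarity(float[] a, float[] b) returns float|error {
    if a.length() != b.length() {
        return error("vectors must have the same dimension");
    }
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if normA == 0.0 || normB == 0.0 {
        return error("cannot compare a zero vector");
    }
    return dot / (normA.sqrt() * normB.sqrt());
}
```

Scores near 1.0 indicate texts with similar meaning; scores near 0.0 indicate unrelated texts.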

post hf-inference/models/[string model]/token-classification/batch

Isolated Resource Function

function post hf\-inference/models/[string model]/token\-classification/batch(string[]|BatchTokenClassificationRequest payload, map<string|string[]> headers) returns TokenClassificationEntity[][]|error

Performs token-level classification on multiple texts.

Parameters

payload string[]|BatchTokenClassificationRequest - Batch token classification request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

TokenClassificationEntity[][]|error - An array of TokenClassificationEntity arrays for each input, or an error

post hf-inference/models/[string model]/text-to-image

Isolated Resource Function

function post hf\-inference/models/[string model]/text\-to\-image(TextToImageRequest payload, map<string|string[]> headers) returns byte[]|error

Generates an image from a text prompt using a diffusion model.

Tip: Set timeout to 120 or higher in ConnectionConfig for large or high-step generations, as image synthesis can take 30–120 seconds.

Parameters

payload TextToImageRequest - Text-to-image request body with prompt and optional parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

byte[]|error - Raw image bytes (typically PNG), or an error

post hf-inference/models/[string model]/text-to-speech

Isolated Resource Function

function post hf\-inference/models/[string model]/text\-to\-speech(TextToSpeechRequest payload, map<string|string[]> headers) returns byte[]|error

Synthesises speech audio from the provided text.

Returns raw audio bytes whose format depends on the model (commonly WAV or MP3).

Parameters

payload TextToSpeechRequest - Text-to-speech request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

byte[]|error - Raw audio bytes, or an error

post hf-inference/models/[string model]/question-answering

Isolated Resource Function

function post hf\-inference/models/[string model]/question\-answering(QuestionAnsweringRequest payload, map<string|string[]> headers) returns QuestionAnsweringResponse|error

Extracts an answer from a context paragraph given a question.

Parameters

payload QuestionAnsweringRequest - Question answering request body with question and context

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

QuestionAnsweringResponse|error - A QuestionAnsweringResponse with the extracted answer, or an error

post hf-inference/models/[string model]/summarization

Isolated Resource Function

function post hf\-inference/models/[string model]/summarization(SummarizationRequest payload, map<string|string[]> headers) returns SummarizationResult[]|error

Generates a summary of the given text.

Parameters

payload SummarizationRequest - Summarization request body with text and optional length parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

SummarizationResult[]|error - An array of SummarizationResult records, or an error

post hf-inference/models/[string model]/translation

Isolated Resource Function

function post hf\-inference/models/[string model]/translation(TranslationRequest payload, map<string|string[]> headers) returns TranslationResult[]|error

Translates text from one language to another.

Parameters

payload TranslationRequest - Translation request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TranslationResult[]|error - An array of TranslationResult records, or an error

post hf-inference/models/[string model]/zero-shot-classification

Isolated Resource Function

function post hf\-inference/models/[string model]/zero\-shot\-classification(ZeroShotClassificationRequest payload, map<string|string[]> headers) returns ZeroShotClassificationResponse|error

Classifies text against a set of candidate labels supplied at request time, without the model having been trained on those specific labels.

Parameters

payload ZeroShotClassificationRequest - Zero-shot classification request body with candidate labels

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ZeroShotClassificationResponse|error - A ZeroShotClassificationResponse with scores per label, or an error

post hf-inference/models/[string model]/image-classification

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-classification(byte[] payload, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image provided as raw bytes.

Parameters

payload byte[] - Raw image bytes (JPEG, PNG, etc.)

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/image-classification/file

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-classification/file(string filePath, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image loaded from a local file path.

Parameters

filePath string - Absolute or relative path to the image file

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/image-classification/url

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-classification/url(string imageUrl, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image fetched from a public URL.

Parameters

imageUrl string - Public URL of the image to classify

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/image-to-text

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-to\-text(byte[] payload, string contentType, map<string|string[]> headers) returns ImageToTextResult[]|error

Generates a textual caption or description for an image provided as raw bytes.

Parameters

payload byte[] - Raw image bytes (JPEG, PNG, etc.)

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageToTextResult[]|error - An array of ImageToTextResult records, or an error

post hf-inference/models/[string model]/image-to-text/file

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-to\-text/file(string filePath, string contentType, map<string|string[]> headers) returns ImageToTextResult[]|error

Generates a caption for an image loaded from a local file path.

Parameters

filePath string - Absolute or relative path to the image file

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageToTextResult[]|error - An array of ImageToTextResult records, or an error

post hf-inference/models/[string model]/image-to-text/url

Isolated Resource Function

function post hf\-inference/models/[string model]/image\-to\-text/url(string imageUrl, string contentType, map<string|string[]> headers) returns ImageToTextResult[]|error

Generates a caption for an image fetched from a public URL.

Parameters

imageUrl string - Public URL of the image to caption

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageToTextResult[]|error - An array of ImageToTextResult records, or an error

post hf-inference/models/[string model]/automatic-speech-recognition

Isolated Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition(byte[] payload, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio to text using a speech recognition model.

Parameters

payload byte[] - Raw audio bytes

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error

post hf-inference/models/[string model]/automatic-speech-recognition/file

Isolated Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition/file(string filePath, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio loaded from a local file path.

Parameters

filePath string - Absolute or relative path to the audio file

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error

post hf-inference/models/[string model]/automatic-speech-recognition/url

Isolated Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition/url(string audioUrl, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio fetched from a public URL.

Parameters

audioUrl string - Public URL of the audio file to transcribe

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error

Enums

huggingface: AudioContentType

Supported audio content types for speech recognition and synthesis.

Members

AUDIO_FLAC

AUDIO_WAV

AUDIO_MPEG

AUDIO_OGG

AUDIO_WEBM

AUDIO_M4A

huggingface: ImageContentType

Supported image content types for vision tasks.

Members

IMAGE_JPEG

IMAGE_PNG

IMAGE_WEBP

IMAGE_BMP

IMAGE_GIF

IMAGE_TIFF
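When images are read from disk (runImageFile() or the `/file` resources), the `contentType` argument should match the file. A small sketch that guesses the MIME type from the file extension. It returns standard IANA type strings rather than `ImageContentType` members, because this reference states only IMAGE_JPEG's value (image/jpeg); the mapping from the other members to these strings is an assumption.

```ballerina
// Standard image MIME types keyed by lower-case file extension.
const map<string> IMAGE_TYPES = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "webp": "image/webp",
    "bmp": "image/bmp",
    "gif": "image/gif",
    "tif": "image/tiff",
    "tiff": "image/tiff"
};

// Guesses the MIME type from the file name; returns () for unknown or missing
// extensions so the caller can fall back to an explicit value.
function imageContentTypeFor(string path) returns string? {
    int? dot = path.lastIndexOf(".");
    if dot is () {
        return ();
    }
    return IMAGE_TYPES[path.substring(dot + 1).toLowerAscii()];
}
```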

Records

huggingface: AutomaticSpeechRecognitionResponse

Response from the automatic speech recognition endpoint.

Fields

text? string - The transcribed text from the audio input

huggingface: BatchFeatureExtractionRequest

Request body for batch feature extraction.

Fields

inputs string[] - Array of input strings to process

huggingface: BatchTextClassificationRequest

Request body for batch text classification.

Fields

inputs string[] - Array of input strings to process

huggingface: BatchTokenClassificationRequest

Request body for batch token classification.

Fields

inputs string[] - Array of input strings to process

huggingface: ChatCompletionChoice

A single completion choice returned by the chat API.

Fields

finishReason? string - Why the model stopped generating (e.g., "stop", "length")

index? int - The index of this choice in the list of choices

message? ChatMessage - The generated message content

huggingface: ChatCompletionChunk

A single chunk in a streaming chat completion response.

Fields

id? string - Unique identifier shared across all chunks of the same completion

'object? string - The object type (typically "chat.completion.chunk")

created? int - Unix timestamp when the chunk was created

model? string - The model that generated this chunk

choices? ChatCompletionChunkChoice[] - The list of chunk choices

huggingface: ChatCompletionChunkChoice

A single choice within a streaming chat completion chunk.

Fields

index? int - The index of this choice

delta? ChatCompletionChunkDelta - The incremental content for this chunk

finishReason? string? - Present only in the final chunk (e.g., "stop")

huggingface: ChatCompletionChunkDelta

Delta content in a streaming chat completion chunk.

Fields

role? string - The role of the message author (present only in the first chunk)

content? string - A token fragment of the generated content

huggingface: ChatCompletionRequest

Request body for the chat completion endpoint.

Fields

model string - The model ID to use (e.g., "Qwen/Qwen2.5-7B-Instruct")

messages ChatMessage[] - The conversation history as an array of messages

maxTokens? int - Maximum number of tokens to generate

temperature? float - Sampling temperature (0.0 = deterministic, higher = more random)

topP? float - Nucleus sampling threshold: sample only from the smallest set of tokens whose cumulative probability reaches this value (e.g., 0.9)

stop? string[]|string - One or more sequences at which to stop generation

seed? int - Random seed for reproducible outputs

frequencyPenalty? float - Penalises tokens in proportion to how often they have already appeared, reducing verbatim repetition (range: -2.0 to 2.0)

presencePenalty? float - Penalises tokens that have already appeared at least once, encouraging the model to introduce new content (range: -2.0 to 2.0)
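As an illustration of how the optional sampling fields combine, here is a request value aimed at focused, repeatable output. The values are arbitrary examples, and whether a given provider honours `seed` for the chosen model is an assumption worth verifying.

```ballerina
import avi0ra/huggingface;

// Low temperature plus nucleus sampling keeps answers focused; the fixed seed
// asks the backend for repeatable sampling; "\n\n" stops at the first blank line.
final huggingface:ChatCompletionRequest focusedRequest = {
    model: "Qwen/Qwen2.5-7B-Instruct",
    messages: [
        {role: "system", content: "Answer in one short paragraph."},
        {role: "user", content: "What is Ballerina?"}
    ],
    maxTokens: 150,
    temperature: 0.2,
    topP: 0.9,
    stop: "\n\n",
    seed: 42
};
```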

huggingface: ChatCompletionResponse

Response from the chat completion endpoint.

Fields

id? string - Unique identifier for the completion

model? string - The model that generated the response

created? int - Unix timestamp when the completion was created

choices? ChatCompletionChoice[] - The list of generated completion choices

usage? UsageStats - Token usage statistics for this request

huggingface: ChatMessage

A single message in a chat conversation.

Fields

role string - The role of the message author (e.g., "user", "assistant", "system")

content string - The text content of the message

huggingface: ClassificationLabel

A classification label with its confidence score.

Fields

score? float - Confidence score between 0.0 and 1.0

label? string - The predicted label name

huggingface: ConnectionConfig

Closed record

Configuration that controls how the client communicates with the Hugging Face Inference API.

Fields

auth BearerTokenConfig - Bearer token configuration for API authentication

httpVersion HttpVersion (default http:HTTP_2_0) - HTTP protocol version (default: HTTP/2)

http1Settings ClientHttp1Settings (default {}) - HTTP/1.x specific configurations

http2Settings ClientHttp2Settings (default {}) - HTTP/2 specific configurations

timeout decimal (default 60) - Request timeout in seconds. Increase it for image generation, which can take 1–2 minutes.

forwarded string (default "disable") - Handling mode for Forwarded/X-Forwarded headers

followRedirects? FollowRedirects - Redirect following configuration

poolConfig? PoolConfiguration - Connection pool configuration

cache CacheConfig (default {}) - HTTP response cache configuration

compression Compression (default http:COMPRESSION_AUTO) - Request/response compression setting

circuitBreaker? CircuitBreakerConfig - Circuit breaker configuration for fault tolerance

retryConfig? RetryConfig - HTTP-level retry configuration, separate from the cold-start (HTTP 503) retries configured through the Client constructor's retryConfig parameter

cookieConfig? CookieConfig - Cookie handling configuration

responseLimits ResponseLimitConfigs (default {}) - Response size limit configurations

secureSocket? ClientSecureSocket - SSL/TLS configuration for HTTPS connections

proxy? ProxyConfig - HTTP proxy configuration

socketConfig ClientSocketConfig (default {}) - Low-level socket configuration

validation boolean (default true) - Whether to validate constraints on request/response payloads

laxDataBinding boolean (default true) - Whether to use relaxed data binding for responses

waitForModel boolean (default false) - When true, the client sends the header x-wait-for-model: true on every request, so the server waits for a cold model to load instead of returning HTTP 503. This avoids most retry cycles at the cost of a slower first request.

huggingface: ConversationSnapshot

Closed record

A snapshot of the current conversation state.

Fields

history ChatMessage[] - All messages in the conversation including system, user, and assistant turns

model string - The model being used for this conversation

turnCount int - Number of user/assistant exchange pairs

huggingface: FeatureExtractionRequest

Request body for the feature extraction (embeddings) endpoint.

Fields

inputs string - The text to generate embeddings for

huggingface: FillMaskRequest

Request body for the fill-mask endpoint.

Fields

inputs string - A string containing the model's mask token to be filled in (e.g., "Paris is the [MASK] of France." for BERT-style models)

huggingface: FillMaskResult

A single fill-mask prediction result.

Fields

score? float - Confidence score for this candidate token

token? int - Token ID of the predicted token

tokenStr? string - String form of the predicted token

sequence? string - The full input string with [MASK] replaced by this prediction

huggingface: ImageClassificationResult

A single image classification result.

Fields

score? float - Confidence score for the predicted class

label? string - The predicted class label

huggingface: ImageToTextResult

A single image-to-text (captioning) result.

Fields

generatedText? string - The generated text description of the image

huggingface: ModelAvailability

Closed record

Summary of model availability for inference.

Fields

modelId string - The model ID checked

available boolean - Whether the model is available on the Inference API

pipelineTag string? - The task type this model performs (e.g. "text-classification")

downloads int? - Number of downloads, a rough indicator of popularity

huggingface: ModelInfo

Metadata about a Hugging Face model retrieved from the Hub API.

Fields

modelId? string - The unique model identifier

pipelineTag? string - The primary task category of the model

'private? boolean - Whether the model is private

downloads? int - Total download count

likes? int - Total likes count

tags? string[] - List of tags associated with the model

author? string - Model author or organization

createdAt? string - Model creation timestamp

lastModified? string - Model last modified timestamp

huggingface: QuestionAnsweringInputs

The question and context pair for question answering.

Fields

question string - The question to answer

context string - The context paragraph from which to extract the answer

huggingface: QuestionAnsweringRequest

Request body for the question answering endpoint.

Fields

inputs QuestionAnsweringInputs - The question and context pair

huggingface: QuestionAnsweringResponse

Response from the question answering endpoint.

Fields

score? float - Confidence score of the extracted answer

answer? string - The extracted answer text

'start? int - Start character offset of the answer in the context

end? int - End character offset of the answer in the context
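Because `start` and `end` are character offsets into the context, they can be used to locate the answer in the original text, for example to highlight it. A sketch using a local copy of the record (note that `start` needs the quoted identifier `'start` because it is a Ballerina keyword); `answerSpan` is a hypothetical helper.

```ballerina
// Local mirror of huggingface:QuestionAnsweringResponse for a self-contained sketch.
type QaResponse record {
    float score?;
    string answer?;
    int 'start?;
    int end?;
};

// Returns the slice of the context delimited by the offsets, or () when the
// offsets are missing or fall outside the context.
function answerSpan(string context, QaResponse resp) returns string? {
    int? s = resp?.'start;
    int? e = resp?.end;
    if s is int && e is int && s >= 0 && s <= e && e <= context.length() {
        return context.substring(s, e);
    }
    return ();
}
```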

huggingface: RagConfig

Closed record

Configuration for the RAG pipeline.

Fields

embeddingModel string(default "intfloat/multilingual-e5-large") - Embedding model ID

generationModel string(default "Qwen/Qwen2.5-7B-Instruct") - Generation model ID

topK int(default 3) - Number of top-ranked documents to use as context

similarityThreshold float(default 0.0) - Minimum cosine similarity score a document needs in order to be included

systemPrompt string(default "") - Optional system prompt to guide the generation model

maxTokens int(default 300) - Maximum number of tokens in the generated answer
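The way topK and similarityThreshold interact can be illustrated with a standalone sketch of a typical retrieval step. It scores each document embedding against the query embedding by cosine similarity, discards documents below the threshold, and keeps the best topK. This is not the connector's internal code. ScoredDoc, cosine, and retrieve are hypothetical names, the two-dimensional vectors are toy values (real embeddings would come from embeddingModel), and treating the threshold comparison as inclusive is an assumption.

```ballerina
import ballerina/io;

// Hypothetical result pairing a document ID with its cosine similarity to the query.
type ScoredDoc record {|
    string id;
    float score;
|};

// Cosine similarity dot(a, b) / (|a| * |b|); assumes both vectors have the same length.
function cosine(float[] a, float[] b) returns float {
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if normA == 0.0 || normB == 0.0 {
        return 0.0; // avoid dividing by zero for all-zero vectors
    }
    return dot / (float:sqrt(normA) * float:sqrt(normB));
}

// Drops documents below `threshold`, sorts the rest best-first, and keeps at most `topK`.
function retrieve(float[] queryVec, map<float[]> docVecs, int topK, float threshold) returns ScoredDoc[] {
    return from var [id, vec] in docVecs.entries()
        let float score = cosine(queryVec, vec)
        where score >= threshold
        order by score descending
        limit topK
        select {id, score};
}

public function main() {
    // Toy 2-D "embeddings"; real ones would come from the configured embeddingModel.
    map<float[]> docVecs = {
        "d1": [1.0, 0.0],
        "d2": [0.6, 0.8],
        "d3": [0.0, 1.0]
    };
    io:println(retrieve([1.0, 0.0], docVecs, 2, 0.5));
}
```

With a threshold of 0.5, d3 (similarity 0) is filtered out before the top-K cut, so the threshold can leave fewer than topK sources in a RagResult.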

huggingface: RagDocument

Closed record

A document with its content and optional metadata for RAG operations.

Fields

id string - Unique identifier for the document

content string - The document text content used for embedding and context

metadata? map<string> - Optional key-value metadata (e.g., source URL, author)

huggingface: RagResult

Closed record

Result from a RAG query including the answer and source documents used.

Fields

answer string - The generated answer grounded in the source documents

sources RagDocument[] - The top-K most relevant documents used as context

scores float[] - Cosine similarity scores corresponding to each source document

huggingface: RetryConfig

Closed record

Configuration for automatic retry behaviour when a model is cold-starting (HTTP 503).

Validation rules (enforced at Client initialisation):

initialDelay must be greater than 0.
maxRetries must be at least 1.
initialDelay must not exceed maxDelay.

Fields

maxRetries int(default 5) - Maximum number of retry attempts

initialDelay decimal(default 2.0) - Delay in seconds before the first retry

maxDelay decimal(default 30.0) - Upper bound in seconds on the delay between retries as it grows under exponential backoff
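The backoff behaviour these fields describe can be sketched as follows. validateRetryConfig and backoffSchedule are hypothetical helpers, not connector functions. The first mirrors the validation rules listed above. The second assumes the delay doubles after each attempt and is capped at maxDelay, because the record documents exponential backoff but not the exact multiplier or any jitter.

```ballerina
import ballerina/io;

// Hypothetical helper mirroring the documented RetryConfig validation rules.
function validateRetryConfig(int maxRetries, decimal initialDelay, decimal maxDelay) returns error? {
    if initialDelay <= 0.0d {
        return error("initialDelay must be greater than 0");
    }
    if maxRetries < 1 {
        return error("maxRetries must be at least 1");
    }
    if initialDelay > maxDelay {
        return error("initialDelay must not exceed maxDelay");
    }
}

// Delay (in seconds) before each retry.
// ASSUMPTION: the delay doubles per attempt and is capped at maxDelay.
function backoffSchedule(int maxRetries, decimal initialDelay, decimal maxDelay) returns decimal[]|error {
    check validateRetryConfig(maxRetries, initialDelay, maxDelay);
    decimal[] delays = [];
    decimal delay = initialDelay;
    while delays.length() < maxRetries {
        delays.push(delay);
        decimal doubled = delay * 2.0d;
        delay = doubled > maxDelay ? maxDelay : doubled;
    }
    return delays;
}

public function main() returns error? {
    // RetryConfig defaults: maxRetries 5, initialDelay 2.0, maxDelay 30.0.
    decimal[] delays = check backoffSchedule(5, 2.0, 30.0);
    io:println(delays);
}
```

Under the doubling assumption, the defaults produce delays of 2, 4, 8, 16 and 30 seconds, with the last value capped by maxDelay.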

huggingface: SentenceSimilarityInputs

Input structure for the sentence similarity endpoint.

Fields

source_sentence string - The reference sentence to compare against

sentences string[] - The candidate sentences to score against the source

huggingface: SentenceSimilarityRequest

Request body for the sentence similarity endpoint.

Fields

inputs SentenceSimilarityInputs - The source sentence and candidate sentences to compare

huggingface: SummarizationParameters

Parameters for controlling summarization behaviour.

Fields

minLength? int - Minimum length of the generated summary in tokens

maxLength? int - Maximum length of the generated summary in tokens

huggingface: SummarizationRequest

Request body for the summarization endpoint.

Fields

inputs string - The text to summarize

parameters? SummarizationParameters - Optional summarization parameters

huggingface: SummarizationResult

A single summarization result.

Fields

summaryText? string - The generated summary text

huggingface: TextClassificationRequest

Request body for the text classification endpoint.

Fields

inputs string - The text to classify

huggingface: TextGenerationParameters

Parameters for controlling text generation behaviour.

Fields

maxNewTokens? int - Maximum number of new tokens to generate

temperature? float - Sampling temperature (0.0 = deterministic, higher = more random)

returnFullText? boolean - If true, returns the prompt concatenated with the generated text

doSample? boolean - Whether to use sampling; if false uses greedy decoding

topK? int - Keep only the top-K tokens with highest probability for sampling

topP? float - Nucleus sampling: sample only from the smallest set of most probable tokens whose cumulative probability reaches topP

repetitionPenalty? float - Penalty for repeating tokens (> 1.0 discourages repetition)
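topK and topP both shrink the candidate pool before a token is sampled, but they do it differently. topK keeps a fixed number of tokens, while topP keeps as many of the most probable tokens as are needed to reach the probability mass topP. The standalone sketch below applies both filters to a made-up distribution. TokenProb, topKFilter, and topPFilter are hypothetical names, and the server-side implementation may differ in details such as tie-breaking or renormalising the kept probabilities.

```ballerina
import ballerina/io;

// Hypothetical candidate token with its model probability.
type TokenProb record {|
    string token;
    float prob;
|};

// Top-K: keep the k most probable tokens.
function topKFilter(TokenProb[] dist, int k) returns TokenProb[] {
    return from var t in dist
        order by t.prob descending
        limit k
        select t;
}

// Top-P (nucleus): keep the smallest prefix of the sorted tokens whose cumulative probability reaches p.
function topPFilter(TokenProb[] dist, float p) returns TokenProb[] {
    TokenProb[] sortedDist = from var t in dist order by t.prob descending select t;
    TokenProb[] kept = [];
    float cumulative = 0.0;
    foreach TokenProb t in sortedDist {
        kept.push(t);
        cumulative += t.prob;
        if cumulative >= p {
            break;
        }
    }
    return kept;
}

public function main() {
    TokenProb[] dist = [
        {token: "cat", prob: 0.5},
        {token: "dog", prob: 0.3},
        {token: "fish", prob: 0.15},
        {token: "yak", prob: 0.05}
    ];
    io:println(topKFilter(dist, 2));    // the two most probable tokens
    io:println(topPFilter(dist, 0.75)); // smallest set reaching 75% of the probability mass
}
```

Here both filters happen to keep cat and dog. With topP set to 0.9, the nucleus would grow to include fish as well, while topK stays at two tokens regardless of the distribution's shape.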

huggingface: TextGenerationRequest

Request body for the text generation endpoint.

Fields

inputs string - The text prompt to continue generating from

parameters? TextGenerationParameters - Optional generation parameters

huggingface: TextGenerationResult

A single text generation result.

Fields

generatedText? string - The generated continuation text

huggingface: TextToImageParameters

Parameters for the text-to-image generation endpoint.

Fields

width? int - Width of the generated image in pixels

height? int - Height of the generated image in pixels

numInferenceSteps? int - Number of diffusion inference steps (higher = better quality, slower)

guidanceScale? float - Classifier-free guidance scale; higher values make the image follow the prompt more closely

negativePrompt? string - Text describing what to exclude from the generated image

seed? int - Random seed for reproducible image generation

huggingface: TextToImageRequest

Request body for the text-to-image generation endpoint.

Fields

inputs string - The text prompt describing the image to generate

parameters? TextToImageParameters - Optional image generation parameters

huggingface: TextToSpeechRequest

Request body for the text-to-speech endpoint.

Fields

inputs string - The text to synthesise into speech

huggingface: TokenClassificationEntity

A named entity recognised by token classification.

Fields

score? float - Confidence score of the entity detection

entityGroup? string - The entity category (e.g., "PER", "ORG", "LOC")

'start? int - Start character offset of the entity in the input text

end? int - End character offset of the entity in the input text

word? string - The entity text as it appears in the input

huggingface: TokenClassificationRequest

Request body for the token classification (NER) endpoint.

Fields

inputs string - The text to analyse for named entities

huggingface: TranslationRequest

Request body for the translation endpoint.

Fields

inputs string - The text to translate

huggingface: TranslationResult

A single translation result.

Fields

translationText? string - The translated text

huggingface: UsageStats

Token usage statistics returned by the chat completion endpoint.

Fields

promptTokens? int - Number of tokens in the input messages

completionTokens? int - Number of tokens in the generated output

totalTokens? int - Total tokens consumed (prompt + completion)

huggingface: ZeroShotClassificationItem

A single zero-shot classification result with label and score.

Fields

label? string - The candidate label

score? float - Confidence score for this label

huggingface: ZeroShotClassificationRequest

Request body for the zero-shot classification endpoint.

Fields

inputs string - The text to classify

parameters ZeroShotClassificationRequestParameters - Parameters including candidate labels

huggingface: ZeroShotClassificationRequestParameters

Parameters for the zero-shot classification endpoint.

Fields

candidateLabels string[] - The list of candidate labels to classify against

Array types

huggingface: ZeroShotClassificationResponse

ZeroShotClassificationItem[]

Response from the zero-shot classification endpoint (array of scored labels).
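Because the response is a plain array and both fields of ZeroShotClassificationItem are optional, picking the winning label means guarding against missing values instead of relying on the array's order. This is a minimal sketch that uses a local ZeroShotItem record mirroring the documented fields so the block compiles on its own. With the connector you would iterate over a huggingface:ZeroShotClassificationResponse instead. bestLabel is a hypothetical helper.

```ballerina
import ballerina/io;

// Local stand-in mirroring huggingface:ZeroShotClassificationItem (both fields optional).
type ZeroShotItem record {
    string label?;
    float score?;
};

// Hypothetical helper: returns the highest-scoring label, or () if no item carries both fields.
function bestLabel(ZeroShotItem[] items) returns string? {
    string? best = ();
    float bestScore = -1.0;
    foreach ZeroShotItem item in items {
        string? candidateLabel = item?.label;
        float? candidateScore = item?.score;
        if candidateLabel is string && candidateScore is float && candidateScore > bestScore {
            best = candidateLabel;
            bestScore = candidateScore;
        }
    }
    return best;
}

public function main() {
    ZeroShotItem[] response = [
        {label: "sports", score: 0.12},
        {label: "politics", score: 0.81},
        {label: "finance", score: 0.07}
    ];
    io:println(bestLabel(response) ?: "no label");
}
```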

Import

import avi0ra/huggingface;

Other versions

1.1.0, 1.0.1, 1.0.0, 0.3.0, 0.2.1

Metadata

Release date: 20 days ago

Version: 1.1.0

License: Apache-2.0

Compatibility

Platform: any

Ballerina version: 2201.13.1

GraalVM compatible: Yes

Pull count

Total: 186

Current version: 3

Keywords

huggingface, llm, inference, machine-learning, nlp, rag, streaming

Dependencies

ballerina/regex/1.3.2 ballerina/os/1.10.1 ballerina/data.jsondata/1.1.3
