Note: A newer version (1.1.0) of this package is available.

Module huggingface

avi0ra/huggingface

1.0.1

Hugging Face Connector for Ballerina

Connects Ballerina applications to the Hugging Face Inference API for running state-of-the-art machine learning models hosted on the Hugging Face Hub.

This package provides a typed Client with strongly typed request and response records for 14+ AI/ML operations. It also includes a generic inferModel helper for models that have no typed operation, a Retrieval-Augmented Generation (RAG) pipeline, a stateful Conversation class, batch inference, streaming chat completions (in this version the stream is produced after the full response has been received; see the note on the streamed operation under Clients), automatic retries with exponential backoff for cold-starting models, and helpers for loading images and audio from file paths and URLs.

Supported AI Capabilities

Capability	Resource Path	Example Model
Chat Completion	`/v1/chat/completions`	`katanemo/Arch-Router-1.5B:hf-inference`
Streaming Chat	`/v1/chat/completions/streamed`	`katanemo/Arch-Router-1.5B:hf-inference`
Text Generation	`/hf-inference/models/{model}`	`openai-community/gpt2`
Text Classification	`/hf-inference/models/{model}/text-classification`	`BAAI/bge-reranker-v2-m3`
Token Classification (NER)	`/hf-inference/models/{model}/token-classification`	`dslim/bert-base-NER`
Feature Extraction	`/hf-inference/models/{model}/feature-extraction`	`intfloat/multilingual-e5-large`
Question Answering	`/hf-inference/models/{model}/question-answering`	`deepset/roberta-base-squad2`
Summarization	`/hf-inference/models/{model}/summarization`	`facebook/bart-large-cnn`
Translation	`/hf-inference/models/{model}/translation`	`Helsinki-NLP/opus-mt-en-fr`
Zero-Shot Classification	`/hf-inference/models/{model}/zero-shot-classification`	`facebook/bart-large-mnli`
Text-to-Image	`/hf-inference/models/{model}/text-to-image`	`stabilityai/stable-diffusion-xl-base-1.0`
Image Classification	`/hf-inference/models/{model}/image-classification`	`google/vit-base-patch16-224`
Automatic Speech Recognition	`/hf-inference/models/{model}/automatic-speech-recognition`	`openai/whisper-large-v3-turbo`
Batch Operations	`/hf-inference/models/{model}/.../batch`	Any compatible model

Any model available on the Hugging Face Hub can be used — not just the examples above. Browse by task at huggingface.co/models.

Setup

1. Get a Hugging Face token

Create a free account at huggingface.co
Go to Settings → Access Tokens
Click New token, choose Read type, enable Inference Providers under the Inference section
Copy the token

2. Add the connector


bal add avi0ra/huggingface

3. Configure the token

In Config.toml:


token = "<YOUR_HF_TOKEN>"

Or via environment variable:


export HF_TOKEN="<YOUR_HF_TOKEN>"

Quickstart

Chat Completion


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
        model: "katanemo/Arch-Router-1.5B:hf-inference",
        messages: [{role: "user", content: "What is Ballerina?"}],
        maxTokens: 100
    });

    io:println(resp?.choices);
}

Streaming Chat Completion

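The streamed endpoint returns the completion as a stream of chunks. In this version the full response body is collected before the stream is returned (see the note on the streamed operation under Clients), so the chunks are not delivered in real time: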


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    stream<huggingface:ChatCompletionChunk, error?> tokenStream =
        check hf->/v1/chat/completions/streamed.post({
            model: "katanemo/Arch-Router-1.5B:hf-inference",
            messages: [{role: "user", content: "Count from 1 to 5."}],
            maxTokens: 50
        });

    check from huggingface:ChatCompletionChunk chunk in tokenStream do {
        huggingface:ChatCompletionChunkChoice[]? choices = chunk?.choices;
        if choices is huggingface:ChatCompletionChunkChoice[] && choices.length() > 0 {
            string? content = choices[0].delta?.content;
            if content is string {
                io:print(content);
            }
        }
    };
    io:println();
}

Stateful Chat Conversation

Maintain cross-turn chat history automatically using the Conversation class:


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:Conversation conv = new (
        hf,
        "katanemo/Arch-Router-1.5B:hf-inference",
        systemPrompt = "You are a helpful assistant."
    );

    string reply1 = check conv.chat("What is Ballerina?");
    io:println("Assistant: ", reply1);

    string reply2 = check conv.chat("Who created it?");
    io:println("Assistant: ", reply2);

    io:println("Turns completed: ", conv.turnCount());
}

RAG Pipeline

End-to-end Retrieval Augmented Generation in a single function call:


import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:RagDocument[] documents = [
        {
            id: "doc1",
            content: "Ballerina is an open-source language for cloud-native integration by WSO2.",
            metadata: {"source": "ballerina.io"}
        },
        {
            id: "doc2",
            content: "WSO2 is a Sri Lankan technology company founded in 2005.",
            metadata: {"source": "wso2.com"}
        }
    ];

    huggingface:RagResult result = check huggingface:ragQuery(
        hf,
        "Who created Ballerina?",
        documents
    );

    io:println("Answer: ", result.answer);
    io:println("Sources used: ", result.sources.length());
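    // scores can be empty if no document passes the similarity threshold.
    if result.scores.length() > 0 {
        io:println("Top relevance score: ", result.scores[0]);
    }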
}

Auto-Retry for Cold Models

Models on the free tier go cold after inactivity and return 503 while loading. The connector retries automatically with exponential backoff:


huggingface:Client hf = check new (
    {auth: {token}},
    retryConfig = {
        maxRetries: 5,
        initialDelay: 2.0,
        maxDelay: 30.0
    }
);
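How these three settings interact can be illustrated with a small standalone sketch. The backoffDelays function below is hypothetical and not part of the connector. It assumes the delay doubles after every attempt, starts at initialDelay, and is capped at maxDelay, which is the usual meaning of exponential backoff; the connector's actual schedule may differ (for example, it may add jitter).

```ballerina
import ballerina/io;

// Hypothetical helper, not part of avi0ra/huggingface.
// Assumption: the delay doubles on each retry and never exceeds maxDelay.
function backoffDelays(int maxRetries, decimal initialDelay, decimal maxDelay) returns decimal[] {
    decimal[] delays = [];
    decimal delay = initialDelay;
    int attempt = 0;
    while attempt < maxRetries {
        // Record the capped delay for this attempt, then double it for the next one.
        delays.push(decimal:min(delay, maxDelay));
        delay = delay * 2d;
        attempt += 1;
    }
    return delays;
}

public function main() {
    // Same values as the retryConfig example above.
    io:println(backoffDelays(5, 2.0, 30.0));
}
```

Under that assumption, the configuration above would wait 2, 4, 8, 16, and then 30 units between attempts, with the last value clipped by maxDelay.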

Multi-Modal Helpers

Load images and audio from files or URLs directly:


// Image from file
huggingface:ImageClassificationResult[] fileRes =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/file.post(
        "path/to/image.jpg"
    );

// Image from URL
huggingface:ImageClassificationResult[] urlRes =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/url.post(
        "https://example.com/image.jpg"
    );

// Audio from file
huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "path/to/audio.flac"
    );

All Supported Operations

Text Classification


huggingface:ClassificationLabel[][] res =
    check hf->/hf\-inference/models/["BAAI/bge-reranker-v2-m3"]/text\-classification.post({
        inputs: "Ballerina makes integration elegant!"
    });
io:println(res[0][0]?.label, " (", res[0][0]?.score, ")");

Token Classification (NER)


huggingface:TokenClassificationEntity[] entities =
    check hf->/hf\-inference/models/["dslim/bert-base-NER"]/token\-classification.post({
        inputs: "WSO2 is based in Sri Lanka."
    });
io:println(entities);

Feature Extraction (Embeddings)


float[] embeddings =
    check hf->/hf\-inference/models/["intfloat/multilingual-e5-large"]/feature\-extraction.post({
        inputs: "Ballerina cloud-native integration."
    });
io:println("Dimensions: ", embeddings.length());

Question Answering


huggingface:QuestionAnsweringResponse ans =
    check hf->/hf\-inference/models/["deepset/roberta-base-squad2"]/question\-answering.post({
        inputs: {
            question: "What is Ballerina?",
            context: "Ballerina is an open-source language for cloud-native integration by WSO2."
        }
    });
io:println(ans?.answer);

Summarization


huggingface:SummarizationResult[] res =
    check hf->/hf\-inference/models/["facebook/bart-large-cnn"]/summarization.post({
        inputs: "Ballerina is a modern open-source programming language designed for cloud-native integration...",
        parameters: {maxLength: 40, minLength: 15}
    });
io:println(res[0].summaryText);

Translation


huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-fr"]/translation.post({
        inputs: "Hello, how are you?"
    });
io:println(res[0].translationText);

Zero-Shot Classification


huggingface:ZeroShotClassificationResponse res =
    check hf->/hf\-inference/models/["facebook/bart-large-mnli"]/zero\-shot\-classification.post({
        inputs: "Ballerina is a programming language for cloud integration.",
        parameters: {candidateLabels: ["technology", "sports", "politics"]}
    });
io:println(res);

Text-to-Image Generation


byte[] imageBytes =
    check hf->/hf\-inference/models/["stabilityai/stable-diffusion-xl-base-1.0"]/text\-to\-image.post({
        inputs: "A robot writing Ballerina code",
        parameters: {width: 512, height: 512, numInferenceSteps: 4}
    });
check io:fileWriteBytes("output.png", imageBytes);

Image Classification


byte[] payload = check io:fileReadBytes("image.jpg");
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification.post(payload);
io:println(res[0]?.label, " (", res[0]?.score, ")");

Automatic Speech Recognition


huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "audio.flac"
    );
io:println(resp?.text);

Generic Inference Helper

Call any Hugging Face model not covered by the typed operations:


json result = check huggingface:inferModel(
    hf,
    "openai-community/gpt2",
    {inputs: "Ballerina is designed for"}
);
io:println(result);

Using Custom Models

The connector works with any model on the Hugging Face Hub. Pass any model ID as long as it matches the task:


// Same translation operation, different model ID
huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-si"]/translation.post({
        inputs: "Hello"
    });

Browse available models by task at huggingface.co/models.

Model Metadata & Batch Helpers

Retrieve model information and check inference availability:


huggingface:ModelInfo info = check huggingface:getModelInfo(hf, "gpt2");
io:println("Downloads: ", info.downloads);

huggingface:ModelAvailability availability = check huggingface:checkModelAvailability(hf, "gpt2");
io:println("Available for inference: ", availability.available);

Run batch inference efficiently:


json[] batchResults = check huggingface:batchInfer(
    hf,
    ["Hello world", "Ballerina is great"],
    "openai-community/gpt2"
);
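The typed batch endpoints listed in the Client reference can be called in the same style as the single-input operations. The sketch below is based only on the documented signature of the feature-extraction/batch resource, which accepts a string[] payload and returns float[][]; the input strings are illustrative, and the program needs a valid HF_TOKEN to run.

```ballerina
import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    // One request; the documented return type holds one embedding vector per input.
    float[][] vectors =
        check hf->/hf\-inference/models/["intfloat/multilingual-e5-large"]/feature\-extraction/batch.post(
            ["Hello world", "Ballerina is great"]
        );
    io:println("Vectors returned: ", vectors.length());
}
```

Compared with batchInfer, which returns untyped json[], the typed endpoint gives a float[][] that can be passed on without further conversion.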

Changelog

1.0.0

Added stateful Conversation class for automated chat history management.
Added batch inference operations (batchInfer and typed /batch endpoints).
Added Model Metadata APIs (getModelInfo, checkModelAvailability).
Upgraded ragQuery to use batch embeddings and RagConfig.

0.3.0

Added streaming chat completions via /v1/chat/completions/streamed.
Added RAG pipeline helper ragQuery (initial version).
Added automatic retry with exponential backoff for cold-starting models (503).
Added image classification from file path and URL.
Added ASR from file path and URL.
Introduced RetryConfig, RagDocument, RagResult, ImageContentType, AudioContentType types.
Improved generic inferModel helper with rich error handling.

0.2.0

Initial release of the avi0ra/huggingface connector.
Native support for 12 AI/ML inference operations.
Generic inferModel helper.

Issues and contributions

Report issues at github.com/HasithaErandika/module-ballerinax-huggingface/issues.

For Ballerina community support: Discord · Stack Overflow #ballerina

Functions

batchInfer

Isolated Function

function batchInfer(Client hfClient, string[] inputs, string model, map<string|string[]> headers) returns json[]|error

Perform batch inference on multiple inputs in a single API call.

More efficient than calling inferModel repeatedly when processing large numbers of inputs against the same model.

Parameters

hfClient Client - A configured Client instance

inputs string[] - Array of input strings to process in one request

model string - The model ID

headers map<string|string[]> (default {}) - Optional additional headers

Return Type

json[]|error - Array of JSON results one per input, or an error

checkModelAvailability

Isolated Function

function checkModelAvailability(Client hfClient, string model) returns ModelAvailability|error

Check whether a model is available on the Hugging Face Inference API.

Returns a ModelAvailability record with availability status and metadata. If the model is not found, the function does not return an error; it returns a record with available: false.

Parameters

hfClient Client - A configured Client instance

model string - The model ID to check

Return Type

ModelAvailability|error - A ModelAvailability record, or an error if the Hub API fails
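Because a missing model is reported through the record rather than as an error, callers can branch on the result before spending an inference call. The following is a minimal sketch of that pattern; it assumes that available is a plain boolean field, which this page implies but does not state, and it needs a valid HF_TOKEN to run.

```ballerina
import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});
    string model = "openai-community/gpt2";

    // An error here means the Hub API call itself failed, not that the model is missing.
    huggingface:ModelAvailability availability = check huggingface:checkModelAvailability(hf, model);

    // Assumption: `available` is declared as a boolean field.
    if availability.available {
        json result = check huggingface:inferModel(hf, model, {inputs: "Ballerina is designed for"});
        io:println(result);
    } else {
        io:println("Model is not available for inference: ", model);
    }
}
```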

getModelInfo

Isolated Function

function getModelInfo(Client hfClient, string model) returns ModelInfo|error

Retrieve metadata for a model from the Hugging Face Hub API.

Parameters

hfClient Client - A configured Client instance

model string - The model ID (e.g. "gpt2", "facebook/bart-large-cnn")

Return Type

ModelInfo|error - A ModelInfo record with model details, or an error

inferModel

Isolated Function

function inferModel(Client hfClient, string model, json payload, map<string|string[]> headers) returns json|error

Perform a generic inference call against any Hugging Face model.

Useful when the model or endpoint does not match one of the strongly-typed operations in the generated client. The task is determined automatically by the model — no suffix needed in the URL.

Parameters

hfClient Client - A configured Client instance

model string - The model ID (e.g. "gpt2", "meta-llama/Llama-3.2-3B-Instruct")

payload json - JSON payload sent to the inference endpoint

headers map<string|string[]> (default {}) - Optional additional HTTP headers

Return Type

json|error - The raw JSON response or an error

ragQuery

Isolated Function

function ragQuery(Client hfClient, string query, RagDocument[] documents, RagConfig config) returns RagResult|error

Retrieval Augmented Generation (RAG) pipeline.

Embeds the query and all documents, ranks documents by cosine similarity, filters by similarity threshold, then generates a grounded answer using the top-K documents as context. Uses batch embedding for efficiency.

Basic usage


huggingface:RagDocument[] docs = [
    {id: "1", content: "Ballerina is created by WSO2."},
    {id: "2", content: "Python is used for data science."}
];
huggingface:RagResult result = check huggingface:ragQuery(hfClient, "Who made Ballerina?", docs);
io:println(result.answer);
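The ranking step described above (cosine similarity, threshold filter, top-K) can be sketched in plain Ballerina. The code below is a hypothetical, standalone illustration and not the connector's implementation: the Doc and Scored types, the function names, and the two-dimensional toy vectors are all made up for the example, and real embeddings would come from the feature-extraction endpoint.

```ballerina
import ballerina/io;

// Hypothetical types for the sketch; not part of avi0ra/huggingface.
type Doc record {|
    string id;
    float[] embedding;
|};

type Scored record {|
    string id;
    float score;
|};

// Cosine similarity of two vectors. Assumes both have the same length.
function cosineSimilarity(float[] a, float[] b) returns float {
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if normA == 0.0 || normB == 0.0 {
        return 0.0;
    }
    return dot / (float:sqrt(normA) * float:sqrt(normB));
}

// Scores each document against the query, drops those below the threshold,
// and keeps the topK best in descending order of score.
function rank(float[] query, Doc[] docs, float threshold, int topK) returns Scored[] {
    return from Doc doc in docs
        let float score = cosineSimilarity(query, doc.embedding)
        where score >= threshold
        order by score descending
        limit topK
        select {id: doc.id, score: score};
}

public function main() {
    // Toy vectors chosen so the result is easy to check by hand.
    Doc[] docs = [
        {id: "a", embedding: [1.0, 0.0]},
        {id: "b", embedding: [0.0, 1.0]},
        {id: "c", embedding: [1.0, 1.0]}
    ];
    io:println(rank([1.0, 0.0], docs, 0.5, 2));
}
```

With these toy vectors, document a matches the query exactly, c scores about 0.71, and b scores 0 and is removed by the 0.5 threshold, so the ranking is a followed by c.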

Parameters

hfClient Client - A configured Client instance

query string - The natural language question to answer

documents RagDocument[] - The corpus of documents to search through

config RagConfig (default {}) - RAG configuration (models, topK, threshold, system prompt)

Return Type

RagResult|error - A RagResult with the answer, source documents, and scores, or an error

Classes

huggingface: Conversation

Isolated

A stateful conversation manager that maintains full chat history across turns.

Handles message history automatically so callers only need to provide the next user message and receive the assistant reply. Thread-safe via lock statements.

Basic usage


huggingface:Conversation conv = new (hfClient, "katanemo/Arch-Router-1.5B:hf-inference");
string reply1 = check conv.chat("What is Ballerina?");
string reply2 = check conv.chat("Who created it?");
io:println("Turns: ", conv.turnCount());
conv.reset();

With system prompt


huggingface:Conversation conv = new (
    hfClient,
    "katanemo/Arch-Router-1.5B:hf-inference",
    systemPrompt = "You are a helpful Ballerina programming assistant.",
    maxTokens = 150
);
string reply = check conv.chat("How do I write a REST service?");

Constructor

Creates a new Conversation with the given client and model.

init (Client hfClient, string model, string systemPrompt, int maxTokens)

hfClient Client - A configured Client instance

model string - The chat model ID to use for generation

systemPrompt string "" - Optional system prompt to set assistant behaviour

maxTokens int 200 - Maximum tokens per response (default: 200)

chat

Isolated Function

function chat(string userMessage) returns string|error

Send a user message and receive the assistant reply.

The conversation history is updated automatically after each call.

Parameters

userMessage string - The user message to send

Return Type

string|error - The assistant reply as a plain string, or an error

getHistory

Isolated Function

function getHistory() returns ChatMessage[]

Get the full conversation history including all turns.

Return Type

ChatMessage[] - Ordered array of all messages in the conversation

snapshot

Isolated Function

function snapshot() returns ConversationSnapshot

Get a snapshot of the current conversation state.

Return Type

ConversationSnapshot - A ConversationSnapshot record with history, model, and turn count

reset

Isolated Function

function reset()

Reset the conversation history.

If a system prompt was provided at initialization it is preserved. All user and assistant messages are cleared.

turnCount

Isolated Function

function turnCount() returns int

Get the number of completed user/assistant exchange pairs.

Return Type

int - Number of turns (each user message counts as one turn)
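The history-related methods can be combined as in the following sketch, which uses only the methods documented above (chat, getHistory, reset, turnCount). The prompt text is illustrative, the exact printed form of each ChatMessage depends on the record definition, and a valid HF_TOKEN is required to run it.

```ballerina
import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});
    huggingface:Conversation conv = new (
        hf,
        "katanemo/Arch-Router-1.5B:hf-inference",
        systemPrompt = "You are a helpful assistant."
    );

    // The reply is not needed here; only the recorded history is inspected.
    _ = check conv.chat("What is Ballerina?");

    // getHistory returns every message recorded so far, in order.
    foreach huggingface:ChatMessage message in conv.getHistory() {
        io:println(message);
    }

    // reset clears user and assistant messages; a system prompt, if set, is kept.
    conv.reset();
    io:println("Turns after reset: ", conv.turnCount());
}
```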

Clients

huggingface: Client

Isolated

Client for the Hugging Face Inference API.

Provides type-safe access to Hugging Face hosted models including chat completion, text generation, classification, embeddings, image generation, speech recognition, and more. Supports automatic retries for cold-starting models.


huggingface:Client hf = check new ({auth: {token: "<HF_TOKEN>"}});
huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
    model: "meta-llama/Llama-3.2-3B-Instruct",
    messages: [{role: "user", content: "Hello!"}]
});

Constructor

Initializes the Hugging Face Inference API client.

init (ConnectionConfig config, string serviceUrl, RetryConfig retryConfig)

config ConnectionConfig - Connection configuration including authentication credentials

serviceUrl string "https://router.huggingface.co" - Base URL of the Hugging Face Inference API

retryConfig RetryConfig {} - Retry settings for handling cold-starting models (HTTP 503)

post v1/chat/completions

Isolated Function · Resource Function

function post v1/chat/completions(ChatCompletionRequest payload, map<string|string[]> headers) returns ChatCompletionResponse|error

Generates a chat completion using a conversational model.

Parameters

payload ChatCompletionRequest - Chat completion request body containing messages and model ID

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ChatCompletionResponse|error - A ChatCompletionResponse with the generated reply, or an error

post v1/chat/completions/streamed

Isolated Function · Resource Function

function post v1/chat/completions/streamed(ChatCompletionRequest payload, map<string|string[]> headers) returns stream<ChatCompletionChunk, error?>|error

Streaming chat completion. Parses Server-Sent Events (SSE) chunks from the Hugging Face streaming response.

Note: Due to limitations of the Ballerina HTTP client, this implementation collects the full response body before parsing SSE data lines into a stream. All chunks are available immediately upon return; this is not true real-time streaming. A future version will use byte-stream-based chunked I/O for genuine incremental delivery.
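The buffered approach described in the note can be pictured with a short standalone sketch. The parseSseData function below is hypothetical and is not the connector's code. It assumes each event arrives as a line beginning with "data:" and that the stream ends with a "[DONE]" sentinel, as in OpenAI-compatible streaming APIs; the sample body is made up for the example.

```ballerina
import ballerina/io;

// Hypothetical helper, not the connector's implementation.
// Extracts JSON payloads from a fully buffered SSE body: keeps lines that
// start with "data:", strips the prefix, and stops at the "[DONE]" sentinel.
function parseSseData(string body) returns json[]|error {
    json[] chunks = [];
    foreach string rawLine in re `\n`.split(body) {
        string line = rawLine.trim();
        if !line.startsWith("data:") {
            // Blank separator lines and other SSE fields are skipped.
            continue;
        }
        string data = line.substring(5).trim();
        if data == "[DONE]" {
            break;
        }
        chunks.push(check data.fromJsonString());
    }
    return chunks;
}

public function main() returns error? {
    // Made-up sample body with one content chunk followed by the sentinel.
    string body = "data: {\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hi\"}}]}\n\n" +
        "data: [DONE]\n\n";
    json[] chunks = check parseSseData(body);
    io:println("Chunks parsed: ", chunks.length());
}
```

Because the whole body is already in memory when parsing starts, every chunk is available at once, which is why the operation cannot deliver tokens incrementally in this version.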

Parameters

payload ChatCompletionRequest - Chat completion request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

stream<ChatCompletionChunk, error?>|error - A stream of ChatCompletionChunk records, or an error

post hf-inference/models/[string model]

Isolated Function · Resource Function

function post hf\-inference/models/[string model](TextGenerationRequest payload, map<string|string[]> headers) returns TextGenerationResult[]|error

Generates text from a prompt using a language model.

Parameters

payload TextGenerationRequest - Text generation request body with the prompt and parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TextGenerationResult[]|error - An array of TextGenerationResult, or an error

post hf-inference/models/[string model]/text-classification

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/text\-classification(TextClassificationRequest payload, map<string|string[]> headers) returns ClassificationLabel[][]|error

Classifies text into predefined categories (e.g., sentiment analysis).

Parameters

payload TextClassificationRequest - Text classification request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ClassificationLabel[][]|error - A nested array of ClassificationLabel results, or an error

post hf-inference/models/[string model]/token-classification

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/token\-classification(TokenClassificationRequest payload, map<string|string[]> headers) returns TokenClassificationEntity[]|error

Performs token-level classification such as Named Entity Recognition (NER).

Parameters

payload TokenClassificationRequest - Token classification request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TokenClassificationEntity[]|error - An array of TokenClassificationEntity records, or an error

post hf-inference/models/[string model]/feature-extraction

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/feature\-extraction(FeatureExtractionRequest payload, map<string|string[]> headers) returns float[]|error

Extracts feature embeddings from text using an embedding model.

Parameters

payload FeatureExtractionRequest - Feature extraction request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

float[]|error - A float array representing the embedding vector, or an error

post hf-inference/models/[string model]/text-classification/batch

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/text\-classification/batch(string[]|BatchTextClassificationRequest payload, map<string|string[]> headers) returns ClassificationLabel[][]|error

Classifies multiple texts into predefined categories.

Parameters

payload string[]|BatchTextClassificationRequest - Batch text classification request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

ClassificationLabel[][]|error - A nested array of ClassificationLabel results for each input, or an error

post hf-inference/models/[string model]/feature-extraction/batch

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/feature\-extraction/batch(string[]|BatchFeatureExtractionRequest payload, map<string|string[]> headers) returns float[][]|error

Extracts feature embeddings from multiple texts.

Parameters

payload string[]|BatchFeatureExtractionRequest - Batch feature extraction request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

float[][]|error - An array of float arrays representing the embedding vectors, or an error

post hf-inference/models/[string model]/token-classification/batch

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/token\-classification/batch(string[]|BatchTokenClassificationRequest payload, map<string|string[]> headers) returns TokenClassificationEntity[][]|error

Performs token-level classification on multiple texts.

Parameters

payload string[]|BatchTokenClassificationRequest - Batch token classification request body or array of strings

headers map<string|string[]> (default {}) - Optional HTTP headers

Return Type

TokenClassificationEntity[][]|error - An array of TokenClassificationEntity arrays for each input, or an error

post hf-inference/models/[string model]/text-to-image

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/text\-to\-image(TextToImageRequest payload, map<string|string[]> headers) returns byte[]|error

Generates an image from a text prompt using a diffusion model.

Parameters

payload TextToImageRequest - Text-to-image request body with prompt and optional parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

byte[]|error - Raw image bytes (typically PNG), or an error

post hf-inference/models/[string model]/question-answering

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/question\-answering(QuestionAnsweringRequest payload, map<string|string[]> headers) returns QuestionAnsweringResponse|error

Extracts an answer from a context paragraph given a question.

Parameters

payload QuestionAnsweringRequest - Question answering request body with question and context

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

QuestionAnsweringResponse|error - A QuestionAnsweringResponse with the extracted answer, or an error

post hf-inference/models/[string model]/summarization

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/summarization(SummarizationRequest payload, map<string|string[]> headers) returns SummarizationResult[]|error

Generates a summary of the given text.

Parameters

payload SummarizationRequest - Summarization request body with text and optional length parameters

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

SummarizationResult[]|error - An array of SummarizationResult records, or an error

post hf-inference/models/[string model]/translation

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/translation(TranslationRequest payload, map<string|string[]> headers) returns TranslationResult[]|error

Translates text from one language to another.

Parameters

payload TranslationRequest - Translation request body

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

TranslationResult[]|error - An array of TranslationResult records, or an error

post hf-inference/models/[string model]/zero-shot-classification

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/zero\-shot\-classification(ZeroShotClassificationRequest payload, map<string|string[]> headers) returns ZeroShotClassificationResponse|error

Classifies text against a set of candidate labels without prior training.

Parameters

payload ZeroShotClassificationRequest - Zero-shot classification request body with candidate labels

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ZeroShotClassificationResponse|error - A ZeroShotClassificationResponse with scores per label, or an error

post hf-inference/models/[string model]/image-classification

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/image\-classification(byte[] payload, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image provided as raw bytes.

Parameters

payload byte[] - Raw image bytes (JPEG, PNG, etc.)

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/image-classification/file

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/image\-classification/file(string filePath, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image loaded from a local file path.

Parameters

filePath string - Absolute or relative path to the image file

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/image-classification/url

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/image\-classification/url(string imageUrl, string contentType, map<string|string[]> headers) returns ImageClassificationResult[]|error

Classifies an image fetched from a public URL.

Parameters

imageUrl string - Public URL of the image to classify

contentType string (default IMAGE_JPEG) - Image MIME type (default: image/jpeg)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

ImageClassificationResult[]|error - An array of ImageClassificationResult records, or an error

post hf-inference/models/[string model]/automatic-speech-recognition

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition(byte[] payload, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio to text using a speech recognition model.

Parameters

payload byte[] - Raw audio bytes

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error

post hf-inference/models/[string model]/automatic-speech-recognition/file

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition/file(string filePath, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio loaded from a local file path.

Parameters

filePath string - Absolute or relative path to the audio file

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error

post hf-inference/models/[string model]/automatic-speech-recognition/url

Isolated Function · Resource Function

function post hf\-inference/models/[string model]/automatic\-speech\-recognition/url(string audioUrl, string contentType, map<string|string[]> headers) returns AutomaticSpeechRecognitionResponse|error

Transcribes audio fetched from a public URL.

Parameters

audioUrl string - Public URL of the audio file to transcribe

contentType string (default AUDIO_FLAC) - Audio MIME type (default: audio/flac)

headers map<string|string[]> (default {}) - Optional HTTP headers to include in the request

Return Type

AutomaticSpeechRecognitionResponse|error - An AutomaticSpeechRecognitionResponse with the transcribed text, or an error
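As a rough illustration of how the file and URL variants might be called, the sketch below takes an already-initialised `huggingface:Client` as a parameter and returns the transcribed text. The model ID `openai/whisper-large-v3`, the function names, and the fallback to an empty string are assumptions made for illustration; the resource paths and parameters follow the signatures documented above.

```ballerina
import avi0ra/huggingface;

// Hypothetical model ID; substitute any speech recognition model you have access to.
const string ASR_MODEL = "openai/whisper-large-v3";

// Transcribes a local WAV file through the /file resource.
function transcribeFile(huggingface:Client hf, string filePath) returns string|error {
    huggingface:AutomaticSpeechRecognitionResponse resp =
        check hf->/hf\-inference/models/[ASR_MODEL]/automatic\-speech\-recognition/file.post(
            filePath, huggingface:AUDIO_WAV);
    // `text` is an optional field, so fall back to an empty string when it is absent.
    return resp?.text ?: "";
}

// Transcribes audio from a public URL, relying on the default content type (AUDIO_FLAC).
function transcribeUrl(huggingface:Client hf, string audioUrl) returns string|error {
    huggingface:AutomaticSpeechRecognitionResponse resp =
        check hf->/hf\-inference/models/[ASR_MODEL]/automatic\-speech\-recognition/url.post(audioUrl);
    return resp?.text ?: "";
}
```

Passing the client in as a parameter keeps the helpers independent of how the client was configured.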

Enums

huggingface: AudioContentType

Supported audio content types for speech recognition.

Members

AUDIO_FLAC

AUDIO_WAV

AUDIO_MPEG

AUDIO_OGG

AUDIO_WEBM

AUDIO_M4A

huggingface: ImageContentType

Supported image content types for vision tasks.

Members

IMAGE_JPEG

IMAGE_PNG

IMAGE_WEBP

IMAGE_BMP

IMAGE_GIF

IMAGE_TIFF

Records

huggingface: AutomaticSpeechRecognitionResponse

Response from the automatic speech recognition endpoint.

Fields

text? string - The transcribed text from the audio input

huggingface: BatchFeatureExtractionRequest

Request body for batch feature extraction.

Fields

inputs string[] - Array of input strings to process

huggingface: BatchTextClassificationRequest

Request body for batch text classification.

Fields

inputs string[] - Array of input strings to process

huggingface: BatchTokenClassificationRequest

Request body for batch token classification.

Fields

inputs string[] - Array of input strings to process

huggingface: ChatCompletionChoice

A single completion choice returned by the chat API.

Fields

finishReason? string - Why the model stopped generating (e.g., "stop", "length")

index? int - The index of this choice in the list of choices

message? ChatMessage - The generated message content

huggingface: ChatCompletionChunk

A single chunk in a streaming chat completion response.

Fields

id? string - Unique identifier shared across all chunks of the same completion

'object? string - The object type (typically "chat.completion.chunk")

created? int - Unix timestamp when the chunk was created

model? string - The model that generated this chunk

choices? ChatCompletionChunkChoice[] - The list of chunk choices

huggingface: ChatCompletionChunkChoice

A single choice within a streaming chat completion chunk.

Fields

index? int - The index of this choice

delta? ChatCompletionChunkDelta - The incremental content for this chunk

finishReason? string? - Present only in the final chunk (e.g., "stop")

huggingface: ChatCompletionChunkDelta

Delta content in a streaming chat completion chunk.

Fields

role? string - The role of the message author (present only in the first chunk)

content? string - A token fragment of the generated content
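Because every field on the chunk records is optional, reassembling a streamed reply means skipping chunks that carry no content. The helper below is a minimal sketch of that step; it assumes the chunks have already been collected into an array (how the streaming endpoint delivers them is not covered here), and the function name `joinDeltas` is hypothetical.

```ballerina
import avi0ra/huggingface;

// Concatenates the `content` fragments of all chunk choices, in order.
function joinDeltas(huggingface:ChatCompletionChunk[] chunks) returns string {
    string text = "";
    foreach huggingface:ChatCompletionChunk chunk in chunks {
        huggingface:ChatCompletionChunkChoice[]? choices = chunk?.choices;
        if choices is () {
            continue; // chunk without choices, nothing to append
        }
        foreach huggingface:ChatCompletionChunkChoice choice in choices {
            // `delta` and `content` are both optional; the chain yields () if either is absent.
            string? piece = choice?.delta?.content;
            if piece is string {
                text += piece;
            }
        }
    }
    return text;
}
```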

huggingface: ChatCompletionRequest

Request body for the chat completion endpoint.

Fields

maxTokens? int - Maximum number of tokens to generate

temperature? float - Sampling temperature (0.0 = deterministic, higher = more random)

messages ChatMessage[] - The conversation history as an array of messages

model string - The model ID to use (e.g., "katanemo/Arch-Router-1.5B:hf-inference")

huggingface: ChatCompletionResponse

Response from the chat completion endpoint.

Fields

id? string - Unique identifier for the completion

choices? ChatCompletionChoice[] - The list of generated completion choices
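The request and response records can be handled without reference to the transport. The following sketch builds a `ChatCompletionRequest` from the documented fields and extracts the first reply from a `ChatCompletionResponse`; the helper names and the chosen `maxTokens` and `temperature` values are illustrative assumptions.

```ballerina
import avi0ra/huggingface;

// Builds a single-turn request using the documented ChatCompletionRequest fields.
function buildRequest(string prompt) returns huggingface:ChatCompletionRequest {
    return {
        model: "katanemo/Arch-Router-1.5B:hf-inference",
        messages: [{role: "user", content: prompt}],
        maxTokens: 128,
        temperature: 0.2
    };
}

// Returns the content of the first choice, or () when an optional field is absent.
function firstReply(huggingface:ChatCompletionResponse resp) returns string? {
    huggingface:ChatCompletionChoice[]? choices = resp?.choices;
    if choices is () {
        return ();
    }
    if choices.length() == 0 {
        return ();
    }
    huggingface:ChatMessage? message = choices[0]?.message;
    return message is () ? () : message.content;
}
```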

huggingface: ChatMessage

A single message in a chat conversation.

Fields

role string - The role of the message author (e.g., "user", "assistant", "system")

content string - The text content of the message

huggingface: ClassificationLabel

A classification label with its confidence score.

Fields

score? float - Confidence score between 0.0 and 1.0

label? string - The predicted label name

huggingface: ConnectionConfig

Closed record

Provides configurations for controlling the behaviours when communicating with the Hugging Face Inference API.

Fields

auth BearerTokenConfig - Bearer token configuration for API authentication

httpVersion HttpVersion (default http:HTTP_2_0) - HTTP protocol version (default: HTTP/2)

http1Settings ClientHttp1Settings (default {}) - HTTP/1.x specific configurations

http2Settings ClientHttp2Settings (default {}) - HTTP/2 specific configurations

timeout decimal (default 30) - Request timeout in seconds (default: 30)

forwarded string (default "disable") - Handling mode for Forwarded/X-Forwarded headers

followRedirects? FollowRedirects - Redirect following configuration

poolConfig? PoolConfiguration - Connection pool configuration

cache CacheConfig (default {}) - HTTP response cache configuration

compression Compression (default http:COMPRESSION_AUTO) - Request/response compression setting

circuitBreaker? CircuitBreakerConfig - Circuit breaker configuration for fault tolerance

retryConfig? RetryConfig - HTTP-level retry configuration (separate from model loading retries)

cookieConfig? CookieConfig - Cookie handling configuration

responseLimits ResponseLimitConfigs (default {}) - Response size limit configurations

secureSocket? ClientSecureSocket - SSL/TLS configuration for HTTPS connections

proxy? ProxyConfig - HTTP proxy configuration

socketConfig ClientSocketConfig (default {}) - Low-level socket configuration

validation boolean (default true) - Whether to validate constraints on request/response payloads

laxDataBinding boolean (default true) - Whether to use relaxed data binding for responses
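Since only `auth` lacks a default, a configuration can be quite small. The sketch below is one possible way to build it; it assumes that `BearerTokenConfig` exposes a `token` field (as the `ballerina/http` record of the same name does) and that the token is supplied through a configurable variable named `hfToken`, which is a hypothetical name.

```ballerina
import avi0ra/huggingface;

// Supplied at runtime, for example through Config.toml (variable name is an assumption).
configurable string hfToken = ?;

// Builds a ConnectionConfig that overrides only the request timeout.
function buildConfig() returns huggingface:ConnectionConfig {
    return {
        auth: {token: hfToken}, // assumes BearerTokenConfig has a `token` field
        timeout: 60 // seconds; the documented default is 30
    };
}
```

All other fields keep the defaults listed above.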

huggingface: ConversationSnapshot

Closed record

A snapshot of the current conversation state.

Fields

history ChatMessage[] - All messages in the conversation including system, user, and assistant turns

model string - The model being used for this conversation

turnCount int - Number of user/assistant exchange pairs

huggingface: FeatureExtractionRequest

Request body for the feature extraction (embeddings) endpoint.

Fields

inputs string - The text to generate embeddings for

huggingface: ImageClassificationResult

A single image classification result.

Fields

score? float - Confidence score for the predicted class

label? string - The predicted class label

huggingface: ModelAvailability

Closed record

Summary of model availability for inference.

Fields

modelId string - The model ID checked

available boolean - Whether the model is available on the Inference API

pipelineTag string? - The task type this model performs (e.g., "text-classification")

downloads int? - Number of downloads, a rough indicator of popularity

huggingface: ModelInfo

Metadata about a Hugging Face model retrieved from the Hub API.

Fields

modelId? string - The unique model identifier

pipelineTag? string - The primary task category of the model

'private? boolean - Whether the model is private

downloads? int - Total download count

likes? int - Total likes count

tags? string[] - List of tags associated with the model

author? string - Model author or organization

createdAt? string - Model creation timestamp

lastModified? string - Model last modified timestamp

huggingface: QuestionAnsweringInputs

The question and context pair for question answering.

Fields

question string - The question to answer

context string - The context paragraph from which to extract the answer

huggingface: QuestionAnsweringRequest

Request body for the question answering endpoint.

Fields

inputs QuestionAnsweringInputs - The question and context pair

huggingface: QuestionAnsweringResponse

Response from the question answering endpoint.

Fields

score? float - Confidence score of the extracted answer

answer? string - The extracted answer text

'start? int - Start character offset of the answer in the context

end? int - End character offset of the answer in the context

huggingface: RagConfig

Closed record

Configuration for the RAG pipeline.

Fields

embeddingModel string (default "intfloat/multilingual-e5-large") - Embedding model ID (default: intfloat/multilingual-e5-large)

generationModel string (default "katanemo/Arch-Router-1.5B:hf-inference") - Generation model ID (default: katanemo/Arch-Router-1.5B:hf-inference)

topK int (default 3) - Number of top documents to use as context (default: 3)

similarityThreshold float (default 0.0) - Minimum cosine similarity score to include a document (default: 0.0)

systemPrompt string (default "") - Optional system prompt to guide the generation model

maxTokens int (default 300) - Maximum tokens for the generated answer (default: 300)
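To make the roles of `topK` and `similarityThreshold` concrete, the sketch below shows how cosine similarity scores could be filtered and ranked against a `RagConfig`. It is an illustration of the general technique, not the connector's internal implementation; the `ScoredDoc` type and both function names are hypothetical.

```ballerina
import avi0ra/huggingface;

// Hypothetical helper type pairing a document ID with its similarity score.
type ScoredDoc record {|
    string id;
    float score;
|};

// Cosine similarity of two embedding vectors; returns 0.0 if either has zero norm.
function cosineSimilarity(float[] a, float[] b) returns float {
    int n = int:min(a.length(), b.length());
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< n {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    float denom = float:sqrt(normA) * float:sqrt(normB);
    return denom == 0.0 ? 0.0 : dot / denom;
}

// Keeps documents at or above the threshold, best first, capped at topK.
function selectContext(ScoredDoc[] scored, huggingface:RagConfig cfg) returns ScoredDoc[] {
    return from ScoredDoc doc in scored
        where doc.score >= cfg.similarityThreshold
        order by doc.score descending
        limit cfg.topK
        select doc;
}
```

With the default `similarityThreshold` of 0.0, only documents with a negative similarity are excluded, so `topK` is usually the effective limit.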

huggingface: RagDocument

Closed record

A document with its content and optional metadata for RAG operations.

Fields

id string - Unique identifier for the document

content string - The document text content used for embedding and context

metadata? map<string> - Optional key-value metadata (e.g., source URL, author)

huggingface: RagResult

Closed record

Result from a RAG query including the answer and source documents used.

Fields

answer string - The generated answer grounded in the source documents

sources RagDocument[] - The top-K most relevant documents used as context

scores float[] - Cosine similarity scores corresponding to each source document

huggingface: RetryConfig

Closed record

Configuration for automatic retry behaviour when a model is cold-starting (HTTP 503).

Fields

maxRetries int (default 5) - Maximum number of retry attempts (default: 5)

initialDelay decimal (default 2.0) - Initial delay in seconds before the first retry (default: 2.0)

maxDelay decimal (default 30.0) - Maximum delay in seconds between retries after exponential backoff (default: 30.0)
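The three fields together define a capped backoff schedule. The sketch below computes the delays under the assumption that the delay doubles after each attempt; the documentation only states "exponential backoff", so the growth factor of 2 and the function name `backoffSchedule` are assumptions.

```ballerina
import avi0ra/huggingface;

// Returns the wait time (in seconds) before each retry attempt.
// Assumes a doubling factor; the connector's actual factor is not documented here.
function backoffSchedule(huggingface:RetryConfig cfg) returns decimal[] {
    decimal[] delays = [];
    decimal delay = cfg.initialDelay;
    foreach int attempt in 0 ..< cfg.maxRetries {
        // Never wait longer than maxDelay, however large the raw value grows.
        delays.push(decimal:min(delay, cfg.maxDelay));
        delay = delay * 2d;
    }
    return delays;
}
```

Under that assumption, the default configuration (`huggingface:RetryConfig cfg = {};`) yields waits of 2, 4, 8, 16, and 30 seconds, the last one capped by `maxDelay`.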

huggingface: SummarizationParameters

Parameters for controlling summarization behaviour.

Fields

minLength? int - Minimum length of the generated summary in tokens

maxLength? int - Maximum length of the generated summary in tokens

huggingface: SummarizationRequest

Request body for the summarization endpoint.

Fields

inputs string - The text to summarize

parameters? SummarizationParameters - Optional summarization parameters

huggingface: SummarizationResult

A single summarization result.

Fields

summaryText? string - The generated summary text

huggingface: TextClassificationRequest

Request body for the text classification endpoint.

Fields

inputs string - The text to classify

huggingface: TextGenerationParameters

Parameters for controlling text generation behaviour.

Fields

maxNewTokens? int - Maximum number of new tokens to generate

temperature? float - Sampling temperature (0.0 = deterministic, higher = more random)

returnFullText? boolean - If true, returns the prompt concatenated with the generated text

huggingface: TextGenerationRequest

Request body for the text generation endpoint.

Fields

inputs string - The text prompt to continue generating from

parameters? TextGenerationParameters - Optional generation parameters

huggingface: TextGenerationResult

A single text generation result.

Fields

generatedText? string - The generated continuation text

huggingface: TextToImageParameters

Parameters for the text-to-image generation endpoint.

Fields

width? int - Width of the generated image in pixels

height? int - Height of the generated image in pixels

numInferenceSteps? int - Number of diffusion inference steps (higher = better quality, slower)

huggingface: TextToImageRequest

Request body for the text-to-image generation endpoint.

Fields

inputs string - The text prompt describing the image to generate

parameters? TextToImageParameters - Optional image generation parameters

huggingface: TokenClassificationEntity

A named entity recognised by token classification.

Fields

score? float - Confidence score of the entity detection

entityGroup? string - The entity category (e.g., "PER", "ORG", "LOC")

'start? int - Start character offset of the entity in the input text

end? int - End character offset of the entity in the input text

word? string - The entity text as it appears in the input

huggingface: TokenClassificationRequest

Request body for the token classification (NER) endpoint.

Fields

inputs string - The text to analyse for named entities

huggingface: TranslationRequest

Request body for the translation endpoint.

Fields

inputs string - The text to translate

huggingface: TranslationResult

A single translation result.

Fields

translationText? string - The translated text

huggingface: ZeroShotClassificationItem

A single zero-shot classification result with label and score.

Fields

label? string - The candidate label

score? float - Confidence score for this label

huggingface: ZeroShotClassificationRequest

Request body for the zero-shot classification endpoint.

Fields

inputs string - The text to classify

parameters ZeroShotClassificationRequestParameters - Parameters including candidate labels

huggingface: ZeroShotClassificationRequestParameters

Parameters for the zero-shot classification endpoint.

Fields

candidateLabels string[] - The list of candidate labels to classify against

Array types

huggingface: ZeroShotClassificationResponse

ZeroShotClassificationItem[]

Response from the zero-shot classification endpoint (array of scored labels).
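The zero-shot records nest the candidate labels inside `parameters`, and the response is a plain array of scored items. The helpers below sketch how a request might be assembled and how the best label could be picked from a response; both function names are hypothetical, and items missing either optional field are simply skipped.

```ballerina
import avi0ra/huggingface;

// Builds a request from the input text and the candidate labels.
function buildZeroShotRequest(string text, string[] labels)
        returns huggingface:ZeroShotClassificationRequest {
    return {inputs: text, parameters: {candidateLabels: labels}};
}

// Returns the label with the highest score, or () if no item carries both fields.
function topLabel(huggingface:ZeroShotClassificationResponse items) returns string? {
    string? best = ();
    float bestScore = -1.0;
    foreach huggingface:ZeroShotClassificationItem item in items {
        string? label = item?.label;
        float? score = item?.score;
        if label is string && score is float && score > bestScore {
            best = label;
            bestScore = score;
        }
    }
    return best;
}
```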

Import

import avi0ra/huggingface;

Other versions

1.1.0

1.0.1

1.0.0

0.3.0

0.2.1

Metadata

Released date: 3 months ago

Version: 1.0.1

License: Apache-2.0

Compatibility

Platform: any

Ballerina version: 2201.13.1

GraalVM compatible: Yes

Pull count

Total: 186

Current version: 64

Keywords

huggingface

llm

inference

machine-learning

nlp

rag

streaming

Dependencies

ballerina/regex/1.3.2 ballerina/io/1.8.0 ballerina/os/1.10.1
