Module huggingface

avi0ra/huggingface

1.1.0
Hugging Face Connector for Ballerina

Connects Ballerina applications to the Hugging Face Inference API for running state-of-the-art machine learning models hosted on the Hugging Face Hub.

This package provides a typed Client with strongly typed request and response records for 17+ AI/ML operations. It also includes a generic inferModel helper for models without a typed endpoint, a built-in Retrieval-Augmented Generation (RAG) pipeline, stateful Conversation management, batch inference, streaming chat completions, server-side model waiting (waitForModel), automatic retries with exponential backoff for cold-starting models, and multi-modal helpers for images and audio.


Supported AI Capabilities

| Capability | Resource path | Example model |
|---|---|---|
| Chat Completion | /v1/chat/completions | Qwen/Qwen2.5-7B-Instruct |
| Streaming Chat | /v1/chat/completions/streamed | Qwen/Qwen2.5-7B-Instruct |
| Text Generation | /hf-inference/models/{model} | openai-community/gpt2 |
| Fill Mask | /hf-inference/models/{model}/fill-mask | google-bert/bert-base-uncased |
| Text Classification | /hf-inference/models/{model}/text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| Token Classification (NER) | /hf-inference/models/{model}/token-classification | dslim/bert-base-NER |
| Feature Extraction | /hf-inference/models/{model}/feature-extraction | intfloat/multilingual-e5-large |
| Sentence Similarity | /hf-inference/models/{model}/sentence-similarity | sentence-transformers/all-MiniLM-L6-v2 |
| Question Answering | /hf-inference/models/{model}/question-answering | deepset/roberta-base-squad2 |
| Summarization | /hf-inference/models/{model}/summarization | facebook/bart-large-cnn |
| Translation | /hf-inference/models/{model}/translation | Helsinki-NLP/opus-mt-en-fr |
| Zero-Shot Classification | /hf-inference/models/{model}/zero-shot-classification | facebook/bart-large-mnli |
| Text-to-Image | /hf-inference/models/{model}/text-to-image | black-forest-labs/FLUX.1-schnell |
| Text-to-Speech | /hf-inference/models/{model}/text-to-speech | facebook/mms-tts-eng |
| Image Classification | /hf-inference/models/{model}/image-classification | google/vit-base-patch16-224 |
| Image Captioning (Image-to-Text) | /hf-inference/models/{model}/image-to-text | Salesforce/blip-image-captioning-large |
| Automatic Speech Recognition | /hf-inference/models/{model}/automatic-speech-recognition | openai/whisper-large-v3-turbo |
| Batch Operations | /hf-inference/models/{model}/.../batch | Any compatible model |

Any model available on the Hugging Face Hub can be used — not just the examples above. Browse by task at huggingface.co/models.


Setup

1. Get a Hugging Face token

  1. Create a free account at huggingface.co
  2. Go to Settings → Access Tokens
  3. Click New token, choose Read type, enable Inference Providers under the Inference section
  4. Copy the token

2. Add the connector

bal add avi0ra/huggingface

3. Configure the token

In Config.toml:

token = "<YOUR_HF_TOKEN>"

Or via environment variable:

export HF_TOKEN="<YOUR_HF_TOKEN>"

Quickstart

Chat Completion

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
        model: "Qwen/Qwen2.5-7B-Instruct",
        messages: [{role: "user", content: "What is Ballerina?"}],
        maxTokens: 100,
        topP: 0.9
    });

    io:println(resp?.choices);
    io:println("Tokens used: ", resp?.usage?.totalTokens);
}

Streaming Chat Completion

Iterate chunks from the parsed SSE response:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    stream<huggingface:ChatCompletionChunk, error?> tokenStream =
        check hf->/v1/chat/completions/streamed.post({
            model: "Qwen/Qwen2.5-7B-Instruct",
            messages: [{role: "user", content: "Count from 1 to 5."}],
            maxTokens: 50
        });

    check from huggingface:ChatCompletionChunk chunk in tokenStream do {
        huggingface:ChatCompletionChunkChoice[]? choices = chunk?.choices;
        if choices is huggingface:ChatCompletionChunkChoice[] && choices.length() > 0 {
            string? content = choices[0].delta?.content;
            if content is string {
                io:print(content);
            }
        }
    };
    io:println();
}

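Note: The current implementation buffers the full SSE response before returning the stream, so chunks are not delivered incrementally: nothing is printed until the model has finished generating, and then all chunks arrive at once.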
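Because the stream is built from a buffered body, it can help to see what parsing that body involves. The sketch below uses a hypothetical helper, `extractSseData`, which is not part of avi0ra/huggingface. It assumes OpenAI-style server-sent events, where each chunk is a `data: {...}` line and the stream ends with `data: [DONE]`, and it normalises CRLF and lone CR line endings first, the case the 1.1.0 changelog mentions for the connector's own parser.

```ballerina
import ballerina/io;

// Hypothetical helper, not part of avi0ra/huggingface: returns the payload of
// every `data:` line in a fully buffered SSE body, stopping at `[DONE]`.
function extractSseData(string body) returns string[] {
    // Normalise CRLF and lone CR line endings to LF before splitting.
    string normalised = re `\r\n?`.replaceAll(body, "\n");
    string[] payloads = [];
    foreach string line in re `\n`.split(normalised) {
        if !line.startsWith("data:") {
            // Skip blank event separators, comments and other SSE fields.
            continue;
        }
        string data = line.substring(5).trim();
        if data == "[DONE]" {
            // Assumed OpenAI-style end-of-stream sentinel.
            break;
        }
        payloads.push(data);
    }
    return payloads;
}

public function main() {
    string body = "data: {\"n\":1}\r\n\r\ndata: {\"n\":2}\r\n\r\ndata: [DONE]\r\n\r\n";
    foreach string payload in extractSseData(body) {
        io:println(payload);
    }
}
```

Each returned string is one JSON chunk that would then be converted into a ChatCompletionChunk; with a buffered body, all of them become available only after generation ends.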

Stateful Chat Conversation

Maintain cross-turn chat history automatically using the Conversation class:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:Conversation conv = new (
        hf,
        "Qwen/Qwen2.5-7B-Instruct",
        systemPrompt = "You are a helpful assistant."
    );

    string reply1 = check conv.chat("What is Ballerina?");
    io:println("Assistant: ", reply1);

    string reply2 = check conv.chat("Who created it?");
    io:println("Assistant: ", reply2);

    io:println("Turns completed: ", conv.turnCount());
}
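A conversation over /v1/chat/completions stays coherent only if each request resends the earlier turns. The sketch below shows that bookkeeping with hypothetical `Message` and `History` types. It assumes, without confirming it for this package, that Conversation keeps the system prompt plus every user and assistant message and resends them all, and that turnCount() counts completed user/assistant exchanges. It makes no API calls.

```ballerina
import ballerina/io;

// Hypothetical message shape, matching the {role, content} objects sent to
// /v1/chat/completions in the examples above.
type Message record {|
    string role;
    string content;
|};

// Hypothetical stand-in for Conversation's history bookkeeping (no API calls).
class History {
    private Message[] messages = [];

    function init(string? systemPrompt = ()) {
        if systemPrompt is string {
            self.messages.push({role: "system", content: systemPrompt});
        }
    }

    // Appends the user's message and returns the full list to send this turn.
    function prepare(string userText) returns Message[] {
        self.messages.push({role: "user", content: userText});
        return self.messages.clone();
    }

    // Stores the assistant's reply so the next turn includes it as context.
    function addReply(string reply) {
        self.messages.push({role: "assistant", content: reply});
    }

    // Counts completed exchanges, i.e. assistant replies recorded so far.
    function turnCount() returns int {
        return self.messages.filter(m => m.role == "assistant").length();
    }
}

public function main() {
    History history = new ("You are a helpful assistant.");
    Message[] first = history.prepare("What is Ballerina?");
    history.addReply("A language for cloud-native integration.");
    Message[] second = history.prepare("Who created it?");
    io:println("Messages sent on turn 1: ", first.length());
    io:println("Messages sent on turn 2: ", second.length());
    io:println("Turns completed: ", history.turnCount());
}
```

On the second turn the request carries four messages (system, user, assistant, user), which is what lets the model resolve "it" in "Who created it?".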

RAG Pipeline

End-to-end Retrieval-Augmented Generation (RAG) in a single function call:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:RagDocument[] documents = [
        {
            id: "doc1",
            content: "Ballerina is an open-source language for cloud-native integration by WSO2.",
            metadata: {"source": "ballerina.io"}
        },
        {
            id: "doc2",
            content: "WSO2 is a Sri Lankan technology company founded in 2005.",
            metadata: {"source": "wso2.com"}
        }
    ];

    huggingface:RagResult result = check huggingface:ragQuery(
        hf,
        "Who created Ballerina?",
        documents
    );

    io:println("Answer: ", result.answer);
    io:println("Sources used: ", result.sources.length());
    io:println("Top relevance score: ", result.scores[0]);
}
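The generation half of RAG comes down to ranking the retrieved passages and placing the best ones in the prompt. The sketch below illustrates that step with a hypothetical `ScoredDoc` record and `buildRagPrompt` function. The prompt wording, the top-k cut-off, and the idea that ragQuery works this way are all assumptions; RagConfig may expose different settings. The scores in `main` are placeholders, not model output.

```ballerina
import ballerina/io;

// Hypothetical record: a document paired with a relevance score (for example,
// the similarity between the query embedding and the document embedding).
type ScoredDoc record {|
    string id;
    string content;
    float score;
|};

// Keeps the k highest-scoring documents and builds a grounded prompt from them.
function buildRagPrompt(string question, ScoredDoc[] docs, int k) returns string {
    ScoredDoc[] top = from ScoredDoc d in docs
        order by d.score descending
        limit k
        select d;
    string prompt = "Answer using only the context below.\n\nContext:\n";
    foreach ScoredDoc doc in top {
        prompt += "[" + doc.id + "] " + doc.content + "\n";
    }
    return prompt + "\nQuestion: " + question;
}

public function main() {
    // Placeholder scores for illustration only.
    ScoredDoc[] docs = [
        {id: "doc1", content: "Ballerina is an open-source language for cloud-native integration by WSO2.", score: 0.8},
        {id: "doc2", content: "WSO2 is a Sri Lankan technology company founded in 2005.", score: 0.4}
    ];
    io:println(buildRagPrompt("Who created Ballerina?", docs, 1));
}
```

Restricting the context to the top-ranked passages keeps the prompt short and makes it easier to report which sources the answer drew on, as RagResult does with its sources and scores fields.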

Auto-Retry for Cold Models

Models on the free tier go cold after a period of inactivity and return 503 while they load. The connector retries such requests automatically with exponential backoff, which you can tune through retryConfig:

huggingface:Client hf = check new (
    {auth: {token}},
    retryConfig = {
        maxRetries: 5,
        initialDelay: 2.0,
        maxDelay: 30.0
    }
);
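To get a feel for what these numbers mean, the sketch below computes a retry schedule. It assumes the delay starts at initialDelay, doubles after each failed attempt, is capped at maxDelay, and is measured in seconds; the connector's exact formula (for example, whether it adds jitter) is not documented here. The validation mirrors the rules the 1.1.0 changelog says are enforced at init. `BackoffConfig`, `validateBackoff`, and `backoffSchedule` are hypothetical names.

```ballerina
import ballerina/io;

// Hypothetical mirror of the retryConfig fields above (delays assumed in seconds).
type BackoffConfig record {|
    int maxRetries;
    decimal initialDelay;
    decimal maxDelay;
|};

// The two constraints the 1.1.0 changelog lists for RetryConfig.
function validateBackoff(BackoffConfig c) returns error? {
    if c.maxRetries < 1 {
        return error("maxRetries must be >= 1");
    }
    if c.initialDelay > c.maxDelay {
        return error("initialDelay must be <= maxDelay");
    }
}

// Delay before each retry: doubles per attempt, capped at maxDelay (assumed formula).
function backoffSchedule(BackoffConfig c) returns decimal[]|error {
    check validateBackoff(c);
    decimal[] delays = [];
    decimal delay = c.initialDelay;
    while delays.length() < c.maxRetries {
        delays.push(delay);
        delay = delay * 2.0d;
        if delay > c.maxDelay {
            delay = c.maxDelay;
        }
    }
    return delays;
}

public function main() returns error? {
    decimal[] delays = check backoffSchedule({maxRetries: 5, initialDelay: 2.0, maxDelay: 30.0});
    io:println("Retry delays (s): ", delays);
}
```

Under these assumptions, the configuration above waits 2, 4, 8, 16 and then 30 seconds (the doubled 32 is capped), 60 seconds in total across five retries.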

Avoiding Cold-Start 503s with waitForModel

huggingface:Client hf = check new ({
    auth: {token},
    waitForModel: true,
    timeout: 120
});

Setting waitForModel: true sends the x-wait-for-model: true header on every request, so the server holds the request until a cold model has loaded instead of immediately returning 503. This avoids most retry cycles, but the request itself still takes as long as the model load, so pair it with a higher timeout (in seconds; the default is 60 s as of 1.1.0).

Multi-Modal Helpers

Load images and audio from files or URLs directly:

// Image from file
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/file.post(
        "path/to/image.jpg"
    );

// Image captioning from URL
huggingface:ImageToTextResult[] captions =
    check hf->/hf\-inference/models/["Salesforce/blip-image-captioning-large"]/image\-to\-text/url.post(
        "https://example.com/photo.jpg"
    );

// Audio from file
huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "path/to/audio.flac"
    );

All Supported Operations

Fill Mask

huggingface:FillMaskResult[] res =
    check hf->/hf\-inference/models/["google-bert/bert-base-uncased"]/fill\-mask.post({
        inputs: "Paris is the [MASK] of France."
    });
io:println(res[0]?.tokenStr, " (", res[0]?.score, ")");

Text Classification

huggingface:ClassificationLabel[][] res =
    check hf->/hf\-inference/models/["distilbert-base-uncased-finetuned-sst-2-english"]/text\-classification.post({
        inputs: "Ballerina makes integration elegant!"
    });
io:println(res[0][0]?.label, " (", res[0][0]?.score, ")");

Token Classification (NER)

huggingface:TokenClassificationEntity[] entities =
    check hf->/hf\-inference/models/["dslim/bert-base-NER"]/token\-classification.post({
        inputs: "WSO2 is based in Sri Lanka."
    });
io:println(entities);

Feature Extraction (Embeddings)

float[] embeddings =
    check hf->/hf\-inference/models/["intfloat/multilingual-e5-large"]/feature\-extraction.post({
        inputs: "Ballerina cloud-native integration."
    });
io:println("Dimensions: ", embeddings.length());

Sentence Similarity

float[] scores =
    check hf->/hf\-inference/models/["sentence-transformers/all-MiniLM-L6-v2"]/sentence\-similarity.post({
        inputs: {
            source_sentence: "What is Ballerina?",
            sentences: ["Ballerina is a cloud-native language.", "Python is for data science."]
        }
    });
io:println("Scores: ", scores);

Question Answering

huggingface:QuestionAnsweringResponse ans =
    check hf->/hf\-inference/models/["deepset/roberta-base-squad2"]/question\-answering.post({
        inputs: {
            question: "What is Ballerina?",
            context: "Ballerina is an open-source language for cloud-native integration by WSO2."
        }
    });
io:println(ans?.answer);

Summarization

huggingface:SummarizationResult[] res =
    check hf->/hf\-inference/models/["facebook/bart-large-cnn"]/summarization.post({
        inputs: "Ballerina is a modern open-source programming language designed for cloud-native integration...",
        parameters: {maxLength: 40, minLength: 15}
    });
io:println(res[0].summaryText);

Translation

huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-fr"]/translation.post({
        inputs: "Hello, how are you?"
    });
io:println(res[0].translationText);

Zero-Shot Classification

huggingface:ZeroShotClassificationResponse res =
    check hf->/hf\-inference/models/["facebook/bart-large-mnli"]/zero\-shot\-classification.post({
        inputs: "Ballerina is a programming language for cloud integration.",
        parameters: {candidateLabels: ["technology", "sports", "politics"]}
    });
io:println(res);

Text-to-Image Generation

byte[] imageBytes =
    check hf->/hf\-inference/models/["black-forest-labs/FLUX.1-schnell"]/text\-to\-image.post({
        inputs: "A robot writing Ballerina code"
    });
check io:fileWriteBytes("output.png", imageBytes);

Text-to-Speech

byte[] audioBytes =
    check hf->/hf\-inference/models/["facebook/mms-tts-eng"]/text\-to\-speech.post({
        inputs: "Hello from Ballerina!"
    });
check io:fileWriteBytes("speech.wav", audioBytes);

Image Classification

byte[] payload = check io:fileReadBytes("image.jpg");
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification.post(payload);
io:println(res[0]?.label, " (", res[0]?.score, ")");

Image Captioning (Image-to-Text)

byte[] payload = check io:fileReadBytes("photo.jpg");
huggingface:ImageToTextResult[] captions =
    check hf->/hf\-inference/models/["Salesforce/blip-image-captioning-large"]/image\-to\-text.post(payload);
io:println(captions[0]?.generatedText);

Automatic Speech Recognition

huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "audio.flac"
    );
io:println(resp?.text);

Universal Model Runner

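The ModelRunner class runs a Hub model from its ID alone. At construction it looks up the model's pipeline task on the Hub, then routes each call to the matching typed endpoint. It works with any model whose pipeline task is one of the auto-routed tasks in the method reference below.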

// Summarization: just name the model
huggingface:ModelRunner runner = new (hf, "facebook/bart-large-cnn");
io:println("Task: ", runner.getPipelineTag()); // "summarization"

json summary = check runner.run(
    "Ballerina is a modern open-source language designed for cloud-native integration."
);
io:println(summary);

// NER — same API, different model
huggingface:ModelRunner ner = new (hf, "dslim/bert-base-NER");
json entities = check ner.run("WSO2 is based in Sri Lanka.");

// Translation
huggingface:ModelRunner xlat = new (hf, "Helsinki-NLP/opus-mt-en-fr");
json translated = check xlat.run("Hello, how are you?");

// Question Answering — structured JSON input
huggingface:ModelRunner qa = new (hf, "deepset/roberta-base-squad2");
json answer = check qa.runWithJson({
    inputs: {question: "What is Ballerina?", context: "Ballerina is..."}
});

// Image classification from file
huggingface:ModelRunner clf = new (hf, "google/vit-base-patch16-224");
json labels = check clf.runImageFile("photo.jpg");

// Image generation — returns raw bytes
huggingface:ModelRunner img = new (hf, "black-forest-labs/FLUX.1-schnell");
byte[] png = check img.generateMedia("A robot writing Ballerina code");
check io:fileWriteBytes("output.png", png);

// ASR from audio file
huggingface:ModelRunner whisper = new (hf, "openai/whisper-large-v3-turbo");
json transcript = check whisper.runAudioFile("audio.flac", huggingface:AUDIO_FLAC);

ModelRunner method reference

| Method | Input | Output | Auto-routed tasks |
|---|---|---|---|
| run(string) | Plain text | json | text-generation, fill-mask, text-classification, token-classification, feature-extraction, summarization, translation |
| runWithJson(json) | Custom JSON | json | question-answering, zero-shot-classification, sentence-similarity, chat-completion |
| runBytes(byte[], contentType) | Binary | json | image-classification, image-to-text, automatic-speech-recognition |
| generateMedia(string) | Prompt | byte[] | text-to-image, text-to-speech |
| runImageFile(path) | File path | json | Same as runBytes |
| runImageUrl(url) | Public URL | json | Same as runBytes |
| runAudioFile(path) | File path | json | ASR |
| runAudioUrl(url) | Public URL | json | ASR |
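The routing table above amounts to a lookup from the model's Hub pipeline tag to the method that accepts its input. The hypothetical `runnerMethodFor` function below restates that table as a `match` statement, reading runImageFile/runImageUrl as the image counterparts of runBytes and runAudioFile/runAudioUrl as the audio ones. It illustrates the routing rule only and is not ModelRunner's actual code; tags missing from the table return nil.

```ballerina
import ballerina/io;

// Hypothetical restatement of the ModelRunner method reference: maps a Hub
// pipeline tag to the method(s) that accept input for that task.
function runnerMethodFor(string pipelineTag) returns string? {
    match pipelineTag {
        "text-generation"|"fill-mask"|"text-classification"|"token-classification"
                |"feature-extraction"|"summarization"|"translation" => {
            return "run";
        }
        "question-answering"|"zero-shot-classification"|"sentence-similarity"|"chat-completion" => {
            return "runWithJson";
        }
        "image-classification"|"image-to-text" => {
            return "runBytes, runImageFile or runImageUrl";
        }
        "automatic-speech-recognition" => {
            return "runBytes, runAudioFile or runAudioUrl";
        }
        "text-to-image"|"text-to-speech" => {
            return "generateMedia";
        }
    }
    // Tag not listed in the method reference.
    return ();
}

public function main() {
    string[] tags = ["summarization", "question-answering", "text-to-image", "depth-estimation"];
    foreach string tag in tags {
        io:println(tag, " -> ", runnerMethodFor(tag) ?: "not in the routing table");
    }
}
```

Keeping the routing in one lookup means a single Hub query at construction is enough to decide every later call, which is why the tip below recommends reusing a ModelRunner instance.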

One-shot convenience functions

// Auto-detect + run in one line
json result = check huggingface:autoRun(hf, "facebook/bart-large-cnn", "Long article...");

// With structured JSON payload
json answer = check huggingface:autoRunJson(hf, "deepset/roberta-base-squad2", {
    inputs: {question: "What is Ballerina?", context: "Ballerina is..."}
});

// Binary media generation
byte[] png = check huggingface:autoGenerateMedia(
    hf, "black-forest-labs/FLUX.1-schnell", "A robot coding in Ballerina"
);

Tip: Reuse a ModelRunner instance for repeated calls, because the Hub lookup happens only once, at construction. autoRun(), autoRunJson(), and autoGenerateMedia() repeat the lookup on every call.


Generic Inference Helper

Call any Hugging Face model that has no typed operation. Since 1.1.0, inferModel also respects the client's retry configuration for cold-start 503 responses:

json result = check huggingface:inferModel(
    hf,
    "openai-community/gpt2",
    {inputs: "Ballerina is designed for"}
);
io:println(result);

Using Custom Models

The connector works with any model on the Hugging Face Hub. Pass any model ID, as long as the model supports the task of the endpoint you call:

huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-si"]/translation.post({
        inputs: "Hello"
    });

Browse available models by task at huggingface.co/models.


Model Metadata & Batch Helpers

Retrieve model information and check inference availability:

huggingface:ModelInfo info = check huggingface:getModelInfo(hf, "gpt2");
io:println("Downloads: ", info.downloads);

huggingface:ModelAvailability availability = check huggingface:checkModelAvailability(hf, "gpt2");
io:println("Available for inference: ", availability.available);

Run one model over several inputs with a single function call:

json[] batchResults = check huggingface:batchInfer(
    hf,
    ["Hello world", "Ballerina is great"],
    "openai-community/gpt2"
);

Compute semantic similarity with embedding-based scoring:

float[] scores = check huggingface:sentenceSimilarity(
    hf,
    "What is Ballerina?",
    ["Ballerina is a cloud-native language.", "Python is for data science."]
);
io:println("Scores: ", scores);
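Embedding-based similarity scores are commonly computed as the cosine of the angle between two embedding vectors, such as the float[] returned by the feature-extraction endpoint. The sketch below assumes sentenceSimilarity uses cosine similarity, which this page does not state. `cosineSimilarity` and `scoreAll` are hypothetical helpers, and the short vectors in `main` are toy stand-ins for real embeddings.

```ballerina
import ballerina/io;

// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
// Returns an error for empty, mismatched, or zero vectors.
function cosineSimilarity(float[] a, float[] b) returns float|error {
    if a.length() == 0 || a.length() != b.length() {
        return error("vectors must be non-empty and the same length");
    }
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if normA == 0.0 || normB == 0.0 {
        return error("cannot compare a zero vector");
    }
    return dot / (normA.sqrt() * normB.sqrt());
}

// Scores one source embedding against each candidate embedding.
function scoreAll(float[] source, float[][] candidates) returns float[]|error {
    float[] scores = [];
    foreach float[] candidate in candidates {
        scores.push(check cosineSimilarity(source, candidate));
    }
    return scores;
}

public function main() returns error? {
    // Toy vectors standing in for real sentence embeddings.
    float[] source = [1.0, 0.0, 1.0];
    float[] scores = check scoreAll(source, [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]);
    io:println("Scores: ", scores);
}
```

A score of 1.0 means the vectors point the same way and 0.0 means they are orthogonal; real sentence embeddings usually land somewhere in between.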

Changelog

1.1.0

  • Added ModelRunner class — universal model runner that auto-detects the pipeline task from the Hub and routes to the correct typed endpoint. Works with any Hugging Face model.
  • Added autoRun(), autoRunJson(), autoGenerateMedia() convenience functions.
  • Added waitForModel flag to ConnectionConfig — sends x-wait-for-model: true header to eliminate most cold-start 503 round-trips.
  • Added fill-mask endpoint for BERT-style masked token prediction.
  • Added image-to-text (captioning) endpoint with bytes, file, and URL variants.
  • Added text-to-speech endpoint for audio synthesis.
  • Added sentence-similarity typed endpoint.
  • Added sentenceSimilarity embedding-based helper function.
  • Added topP, stop, seed, frequencyPenalty, presencePenalty to ChatCompletionRequest.
  • Added doSample, topK, topP, repetitionPenalty to TextGenerationParameters.
  • Added guidanceScale, negativePrompt, seed to TextToImageParameters.
  • Added UsageStats type and usage field to ChatCompletionResponse.
  • Fixed inferModel and batchInfer to use postWithRetry — now honour retry config on 503.
  • Fixed RetryConfig validation: maxRetries >= 1 and initialDelay <= maxDelay enforced at init.
  • Fixed SSE streaming parser to handle \r\n line endings.
  • Increased default timeout in ConnectionConfig from 30 s to 60 s.

1.0.0

  • Added stateful Conversation class for automated chat history management.
  • Added batch inference operations (batchInfer and typed /batch endpoints).
  • Added Model Metadata APIs (getModelInfo, checkModelAvailability).
  • Upgraded ragQuery to use batch embeddings and RagConfig.

0.3.0

  • Added streaming chat completions via /v1/chat/completions/streamed.
  • Added RAG pipeline helper ragQuery (initial version).
  • Added automatic retry with exponential backoff for cold-starting models (503).
  • Added image classification from file path and URL.
  • Added ASR from file path and URL.
  • Introduced RetryConfig, RagDocument, RagResult, ImageContentType, AudioContentType types.
  • Improved generic inferModel helper with rich error handling.

0.2.0

  • Initial release of the avi0ra/huggingface connector.
  • Native support for 12 AI/ML inference operations.
  • Generic inferModel helper.

Issues and contributions

Report issues at github.com/HasithaErandika/module-ballerinax-huggingface/issues.

For Ballerina community support: Discord · Stack Overflow #ballerina

Import

import avi0ra/huggingface;


Metadata

Released date: 20 days ago

Version: 1.1.0

License: Apache-2.0


Compatibility

Platform: any

Ballerina version: 2201.13.1

GraalVM compatible: Yes


Pull count

Total: 186

Current version: 3



Keywords

huggingface, ai, llm, inference, machine-learning, nlp, rag, streaming

