Module huggingface

avi0ra/huggingface

1.0.0
Hugging Face Connector for Ballerina

Connects Ballerina applications to the Hugging Face Inference API for running state-of-the-art machine learning models hosted on the Hugging Face Hub.

This package provides a Client with strongly typed request and response records for 12 AI/ML tasks, a generic inferModel helper for any model, a built-in RAG pipeline, stateful Conversation management, batch inference operations, streaming chat completions, automatic retry for cold-starting models, and multi-modal helpers for loading images and audio.


Supported AI Tasks

Task | Resource Path | Example Model
--- | --- | ---
Chat Completion | /v1/chat/completions | katanemo/Arch-Router-1.5B:hf-inference
Streaming Chat | /v1/chat/completions/streamed | katanemo/Arch-Router-1.5B:hf-inference
Text Generation | /hf-inference/models/{model} | openai-community/gpt2
Text Classification | /hf-inference/models/{model}/text-classification | BAAI/bge-reranker-v2-m3
Token Classification (NER) | /hf-inference/models/{model}/token-classification | dslim/bert-base-NER
Feature Extraction | /hf-inference/models/{model}/feature-extraction | intfloat/multilingual-e5-large
Question Answering | /hf-inference/models/{model}/question-answering | deepset/roberta-base-squad2
Summarization | /hf-inference/models/{model}/summarization | facebook/bart-large-cnn
Translation | /hf-inference/models/{model}/translation | Helsinki-NLP/opus-mt-en-fr
Zero-Shot Classification | /hf-inference/models/{model}/zero-shot-classification | facebook/bart-large-mnli
Text-to-Image | /hf-inference/models/{model}/text-to-image | stabilityai/stable-diffusion-xl-base-1.0
Image Classification | /hf-inference/models/{model}/image-classification | google/vit-base-patch16-224
Automatic Speech Recognition | /hf-inference/models/{model}/automatic-speech-recognition | openai/whisper-large-v3-turbo

Any model available on the Hugging Face Hub can be used, not just the examples above. Browse by task at huggingface.co/models.


Setup

1. Get a Hugging Face token

  1. Create a free account at huggingface.co
  2. Go to Settings → Access Tokens
  3. Click New token, choose the Read type, and enable Inference Providers under the Inference section
  4. Copy the token

2. Add the connector

bal add avi0ra/huggingface

3. Configure the token

In Config.toml:

token = "<YOUR_HF_TOKEN>"

Or via environment variable:

export HF_TOKEN="<YOUR_HF_TOKEN>"

Quickstart

Chat Completion

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:ChatCompletionResponse resp = check hf->/v1/chat/completions.post({
        model: "katanemo/Arch-Router-1.5B:hf-inference",
        messages: [{role: "user", content: "What is Ballerina?"}],
        maxTokens: 100
    });

    io:println(resp?.choices);
}

Streaming Chat Completion

Tokens arrive in real time as the model generates them:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    stream<huggingface:ChatCompletionChunk, error?> tokenStream =
        check hf->/v1/chat/completions/streamed.post({
            model: "katanemo/Arch-Router-1.5B:hf-inference",
            messages: [{role: "user", content: "Count from 1 to 5."}],
            maxTokens: 50
        });

    check from huggingface:ChatCompletionChunk chunk in tokenStream do {
        huggingface:ChatCompletionChunkChoice[]? choices = chunk?.choices;
        if choices is huggingface:ChatCompletionChunkChoice[] && choices.length() > 0 {
            string? content = choices[0].delta?.content;
            if content is string {
                io:print(content);
            }
        }
    };
    io:println();
}

Stateful Chat Conversation

Maintain cross-turn chat history automatically using the Conversation class:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:Conversation conv = new (
        hf,
        "katanemo/Arch-Router-1.5B:hf-inference",
        systemPrompt = "You are a helpful assistant."
    );

    string reply1 = check conv.chat("What is Ballerina?");
    io:println("Assistant: ", reply1);

    string reply2 = check conv.chat("Who created it?");
    io:println("Assistant: ", reply2);

    io:println("Turns completed: ", conv.turnCount());
}

RAG Pipeline

End-to-end Retrieval-Augmented Generation in a single function call:

import ballerina/io;
import ballerina/os;
import avi0ra/huggingface;

configurable string token = os:getEnv("HF_TOKEN");

public function main() returns error? {
    huggingface:Client hf = check new ({auth: {token}});

    huggingface:RagDocument[] documents = [
        {
            id: "doc1",
            content: "Ballerina is an open-source language for cloud-native integration by WSO2.",
            metadata: {"source": "ballerina.io"}
        },
        {
            id: "doc2",
            content: "WSO2 is a Sri Lankan technology company founded in 2005.",
            metadata: {"source": "wso2.com"}
        }
    ];

    huggingface:RagResult result = check huggingface:ragQuery(
        hf,
        "Who created Ballerina?",
        documents
    );

    io:println("Answer: ", result.answer);
    io:println("Sources used: ", result.sources.length());
    io:println("Top relevance score: ", result.scores[0]);
}
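
The relevance scores in the result come from comparing the query against each document. This page does not say how ragQuery computes them; the changelog only mentions batch embeddings. As a rough sketch of one common approach, the snippet below ranks toy vectors by cosine similarity. The cosine function and the 3-dimensional vectors are hypothetical stand-ins, not part of the connector.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of the connector): cosine similarity
// between two equal-length, non-zero vectors.
function cosine(float[] a, float[] b) returns float {
    float dot = 0.0;
    float normA = 0.0;
    float normB = 0.0;
    foreach int i in 0 ..< a.length() {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (float:sqrt(normA) * float:sqrt(normB));
}

public function main() {
    // Toy vectors standing in for real embeddings of the query and documents.
    float[] query = [1.0, 0.0, 1.0];
    float[][] docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]];
    foreach int i in 0 ..< docs.length() {
        io:println("doc", i + 1, " score: ", cosine(query, docs[i]));
    }
}
```

With these inputs the first document points in the same direction as the query and scores close to 1, while the second is orthogonal and scores 0. In a real pipeline the vectors would come from an embedding model such as the feature-extraction example listed in the task table.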

Auto-Retry for Cold Models

Models on the free tier go cold after inactivity and return 503 while loading. The connector retries automatically with exponential backoff:

huggingface:Client hf = check new (
    {auth: {token}},
    retryConfig = {
        maxRetries: 5,
        initialDelay: 2.0,
        maxDelay: 30.0
    }
);
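
To get a feel for what these settings could mean in practice, here is a minimal sketch of a capped exponential backoff schedule. It assumes the delay doubles after each attempt and is clamped to maxDelay; the connector's actual growth factor, and whether it adds jitter, are not stated on this page. The backoffSchedule function is hypothetical and not part of the connector.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of the connector): returns the wait in
// seconds before each retry, assuming the delay doubles per attempt and
// never exceeds maxDelay.
function backoffSchedule(int maxRetries, decimal initialDelay, decimal maxDelay) returns decimal[] {
    decimal[] delays = [];
    decimal delay = initialDelay;
    while delays.length() < maxRetries {
        // Clamp the current delay to the configured ceiling.
        delays.push(delay > maxDelay ? maxDelay : delay);
        delay = delay * 2d;
    }
    return delays;
}

public function main() {
    // Same values as the retryConfig above.
    io:println(backoffSchedule(5, 2.0, 30.0));
}
```

Under the doubling assumption, the five waits are 2, 4, 8, 16, and 30 seconds: the fifth would be 32 but is capped by maxDelay, for a worst case of about one minute before the call fails.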

Multi-Modal Helpers

Load images and audio from files or URLs directly:

// Image from file
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/file.post(
        "path/to/image.jpg"
    );

// Image from URL (distinct variable name so both snippets can share one scope)
huggingface:ImageClassificationResult[] urlRes =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification/url.post(
        "https://example.com/image.jpg"
    );

// Audio from file
huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "path/to/audio.flac"
    );

All Supported Operations

Text Classification

huggingface:ClassificationLabel[][] res =
    check hf->/hf\-inference/models/["BAAI/bge-reranker-v2-m3"]/text\-classification.post({
        inputs: "Ballerina makes integration elegant!"
    });
io:println(res[0][0]?.label, " (", res[0][0]?.score, ")");

Token Classification (NER)

huggingface:TokenClassificationEntity[] entities =
    check hf->/hf\-inference/models/["dslim/bert-base-NER"]/token\-classification.post({
        inputs: "WSO2 is based in Sri Lanka."
    });
io:println(entities);

Feature Extraction (Embeddings)

float[] embeddings =
    check hf->/hf\-inference/models/["intfloat/multilingual-e5-large"]/feature\-extraction.post({
        inputs: "Ballerina cloud-native integration."
    });
io:println("Dimensions: ", embeddings.length());

Question Answering

huggingface:QuestionAnsweringResponse ans =
    check hf->/hf\-inference/models/["deepset/roberta-base-squad2"]/question\-answering.post({
        inputs: {
            question: "What is Ballerina?",
            context: "Ballerina is an open-source language for cloud-native integration by WSO2."
        }
    });
io:println(ans?.answer);

Summarization

huggingface:SummarizationResult[] res =
    check hf->/hf\-inference/models/["facebook/bart-large-cnn"]/summarization.post({
        inputs: "Ballerina is a modern open-source programming language designed for cloud-native integration...",
        parameters: {maxLength: 40, minLength: 15}
    });
io:println(res[0].summaryText);

Translation

huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-fr"]/translation.post({
        inputs: "Hello, how are you?"
    });
io:println(res[0].translationText);

Zero-Shot Classification

huggingface:ZeroShotClassificationResponse res =
    check hf->/hf\-inference/models/["facebook/bart-large-mnli"]/zero\-shot\-classification.post({
        inputs: "Ballerina is a programming language for cloud integration.",
        parameters: {candidateLabels: ["technology", "sports", "politics"]}
    });
io:println(res);

Text-to-Image Generation

byte[] imageBytes =
    check hf->/hf\-inference/models/["stabilityai/stable-diffusion-xl-base-1.0"]/text\-to\-image.post({
        inputs: "A robot writing Ballerina code",
        parameters: {width: 512, height: 512, numInferenceSteps: 4}
    });
check io:fileWriteBytes("output.png", imageBytes);

Image Classification

byte[] payload = check io:fileReadBytes("image.jpg");
huggingface:ImageClassificationResult[] res =
    check hf->/hf\-inference/models/["google/vit-base-patch16-224"]/image\-classification.post(payload);
io:println(res[0]?.label, " (", res[0]?.score, ")");

Automatic Speech Recognition

huggingface:AutomaticSpeechRecognitionResponse resp =
    check hf->/hf\-inference/models/["openai/whisper-large-v3-turbo"]/automatic\-speech\-recognition/file.post(
        "audio.flac"
    );
io:println(resp?.text);

Generic Inference Helper

Call any Hugging Face model not covered by the typed operations:

json result = check huggingface:inferModel(
    hf,
    "openai-community/gpt2",
    {inputs: "Ballerina is designed for"}
);
io:println(result);

Using Custom Models

The connector works with any model on the Hugging Face Hub. Pass any model ID as long as it matches the task:

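// Assign the result: a bare `check` statement only compiles when the call returns error?.
huggingface:TranslationResult[] res =
    check hf->/hf\-inference/models/["Helsinki-NLP/opus-mt-en-si"]/translation.post({
        inputs: "Hello"
    });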

Browse available models by task at huggingface.co/models.


Model Metadata & Batch Helpers

Retrieve model information and check inference availability:

huggingface:ModelInfo info = check huggingface:getModelInfo(hf, "gpt2");
io:println("Downloads: ", info.downloads);

huggingface:ModelAvailability availability = check huggingface:checkModelAvailability(hf, "gpt2");
io:println("Available for inference: ", availability.available);

Run batch inference efficiently:

json[] batchResults = check huggingface:batchInfer(
    hf,
    ["Hello world", "Ballerina is great"],
    "openai-community/gpt2"
);

Changelog

1.0.0

  • Added stateful Conversation class for automated chat history management
  • Added batch inference operations (batchInfer and typed/batch endpoints)
  • Added Model Metadata APIs (getModelInfo, checkModelAvailability)
  • Upgraded ragQuery to use batch embeddings and RagConfig

0.3.0

  • Added streaming chat completions via /v1/chat/completions/streamed
  • Added RAG pipeline helper ragQuery
  • Added automatic retry with exponential backoff for cold-starting models (503)
  • Added image classification from file path and URL
  • Added ASR from file path and URL
  • Introduced RetryConfig, RagDocument, RagResult, ImageContentType, AudioContentType types

0.2.x

  • Initial 12 AI/ML operations
  • Generic inferModel helper

Issues and contributions

Report issues at github.com/HasithaErandika/module-ballerinax-huggingface/issues.

For Ballerina community support: Discord · Stack Overflow #ballerina

Import

import avi0ra/huggingface;

Metadata

Released date: 3 months ago

Version: 1.0.0

License: Apache-2.0


Compatibility

Platform: any

Ballerina version: 2201.13.1

GraalVM compatible: Yes


Pull count

Total: 186

Current version: 0


Keywords

huggingface

ai

llm

inference

machine-learning

nlp

rag

streaming

