Note: A newer version (0.4.0) of this package is available.

wso2/pgvector

0.3.5

PgVector - PostgreSQL Vector Database Client for Ballerina

Overview

This module provides functionality to interact with PostgreSQL using the pgvector extension, enabling vector similarity search capabilities in Ballerina applications. It supports storing and searching high-dimensional vectors alongside metadata, making it ideal for AI/ML applications, semantic search, and recommendation systems.

Prerequisites

Before using this module, ensure you have:

- A PostgreSQL server with the pgvector extension installed.
- PostgreSQL database credentials.
- The vector dimension for your use case (e.g., 1536 for OpenAI embeddings).

To install the pgvector extension in PostgreSQL:


CREATE EXTENSION vector;

Features

- Vector storage with metadata
- Multiple similarity search types (Cosine, Euclidean, Inner Product)
- Metadata filtering
- Automatic index creation for performance
- HNSW indexing support
- Type-safe operations

Quick Start

1. Import the Module


import wso2/pgvector;

2. Initialize Vector Store


// Configure the vector store
pgvector:VectorStore vectorStore = check new (
    host = "localhost",
    user = "postgres",
    password = "password",
    database = "vectordb",
    vectorDimension = 1536  // must match the length of every embedding you store
);

3. Add Vectors


// Create vector data
pgvector:VectorData data = {
    embedding: [0.1, 0.2, 0.3], // shortened for readability; the length must equal vectorDimension
    document: "Sample document text",
    metadata: {
        "name": "Example",
        "category": "test"
    }
};

// Add to vector store
pgvector:VectorDataWithId result = check vectorStore.addVector(data);

4. Search Vectors


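// Define search configuration
pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE_DISTANCE,  // or EUCLIDEAN_DISTANCE, NEGATIVE_INNER_PRODUCT
    'limit: 10,  // `limit` is a reserved word in Ballerina, hence the quoted identifier
    threshold: 0.8,
    metadata: {
        "category": "test"
    }
};

// Perform similarity search; each result carries its distance score
float[] queryVector = [0.1, 0.2, 0.3];
pgvector:VectorDataWithDistance[] results = check vectorStore.searchVector(queryVector, config);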

Types and Enums

SimilarityType

Defines the type of similarity measure to use:


public enum SimilarityType {
    EUCLIDEAN_DISTANCE = "<->",     // L2 distance
    COSINE_DISTANCE = "<=>",        // Cosine distance (1 - cosine similarity)
    NEGATIVE_INNER_PRODUCT = "<#>"  // Negative inner product
}
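To make the three operators concrete, here is a minimal sketch of the arithmetic behind them in plain Ballerina. It does not call the module or the database. The function names (`dot`, `l2Distance`, `cosineDistance`, `negativeInnerProduct`) are illustrative and not part of the package, and the sketch assumes both vectors are non-zero and of equal length.

```ballerina
import ballerina/io;

// Dot product of two equal-length vectors.
function dot(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        sum += a[i] * b[i];
    }
    return sum;
}

// L2 (Euclidean) distance, the quantity behind the <-> operator.
function l2Distance(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum.sqrt();
}

// Cosine distance (1 - cosine similarity), the quantity behind the <=> operator.
function cosineDistance(float[] a, float[] b) returns float {
    return 1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt());
}

// Negative inner product, the quantity behind the <#> operator.
function negativeInnerProduct(float[] a, float[] b) returns float {
    return -dot(a, b);
}

public function main() {
    float[] a = [1.0, 2.0, 3.0];
    float[] b = [2.0, 4.0, 6.5];
    io:println("L2 distance: ", l2Distance(a, b));
    io:println("Cosine distance: ", cosineDistance(a, b));
    io:println("Negative inner product: ", negativeInnerProduct(a, b));
}
```

For all three measures a smaller value means a closer match, which is why the inner product is negated.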

VectorData

Structure for vector data without an ID:


public type VectorData record {|
    float[] embedding;    // Vector embedding
    string document;      // Associated document text
    map<json> metadata?;  // Optional metadata
|};

VectorDataWithId

Structure for vector data with an ID:


public type VectorDataWithId record {|
    int id;
    *VectorData;
|};

SearchConfig

Configuration for vector search:


public type SearchConfig record {|
    SimilarityType similarityType = COSINE_DISTANCE;
    int 'limit = 10;        // Quoted because `limit` is a reserved word
    float? threshold = ();  // Optional similarity threshold
    map<sql:Value> metadata = {};  // Optional metadata filters
|};

Advanced Usage

1. Custom Index Creation

The module automatically creates an HNSW index for better performance:


CREATE INDEX vector_store_embedding_idx
ON vector_store
USING hnsw(embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 100);

2. Metadata Filtering


// Search with metadata filters (requires `import ballerina/sql;`)
map<sql:Value> metadata = {
    "category": "technology",
    "author": "John Doe"
};

pgvector:VectorDataWithId[] results = check vectorStore.fetchVectorByMetadata(metadata);

3. Combining Search with Metadata


pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE_DISTANCE,
    'limit: 5,
    threshold: 0.7,
    metadata: {
        "category": "technology"
    }
};

pgvector:VectorDataWithDistance[] results = check vectorStore.searchVector(queryVector, config);

Error Handling

Every VectorStore operation returns an error on failure. Use check inside a do block and handle failures in the on fail clause:


// Requires `import ballerina/log;`
do {
    pgvector:VectorDataWithDistance[] results = check vectorStore.searchVector(queryVector);
    // Process results
} on fail var e {
    // Handle errors
    log:printError("Error during vector search", e);
}

Best Practices

Close connections properly:


check vectorStore.close();

Use appropriate vector dimensions:
- OpenAI embeddings: 1536
- Custom embeddings: As per your model
Choose appropriate similarity measures:
- COSINE_DISTANCE: Compares direction and ignores magnitude (recommended for most cases)
- EUCLIDEAN_DISTANCE: Straight-line (L2) distance
- NEGATIVE_INNER_PRODUCT: Dot product, negated so that smaller values mean closer matches
Index optimization:
- HNSW index is created by default
- Adjust m and ef_construction parameters based on your needs
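Ballerina has no `finally` block, so a `check` that fails before `close()` would skip the close. One way to handle this, shown as a sketch, is to capture the result instead of checking it. `searchThenClose` is a hypothetical helper and not part of the package; it relies only on the `searchVector` and `close` signatures listed in the API reference.

```ballerina
import ballerina/log;
import wso2/pgvector;

// Runs one search and closes the store afterwards, whether or not the search failed.
function searchThenClose(pgvector:VectorStore store, float[] queryVector) returns error? {
    // Capture the outcome instead of using `check`, so that close() still runs on failure.
    pgvector:VectorDataWithDistance[]|error results = store.searchVector(queryVector);
    error? closeResult = store.close();

    if results is error {
        // The search error takes precedence over any error from close().
        return results;
    }
    log:printInfo(string `Search returned ${results.length()} results`);
    return closeResult;
}
```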

Examples

Complete Example


import wso2/pgvector;

public function main() returns error? {
    // Initialize store
    pgvector:VectorStore store = check new (
        host = "localhost",
        user = "postgres",
        password = "password",
        database = "vectordb",
        vectorDimension = 1536
    );

    // Add vector (shortened here; a real embedding must have vectorDimension elements)
    pgvector:VectorData data = {
        embedding: [0.1, 0.2, 0.3],
        document: "Sample text",
        metadata: {
            "category": "test"
        }
    };
    pgvector:VectorDataWithId added = check store.addVector(data);

    // Search vectors
    pgvector:SearchConfig config = {
        similarityType: pgvector:COSINE_DISTANCE,
        'limit: 10
    };
    pgvector:VectorDataWithDistance[] results = check store.searchVector([0.1, 0.2, 0.3], config);

    // Close connection
    check store.close();
}

Use Cases

This module is ideal for:

- Semantic search applications
- AI/ML applications requiring vector similarity search
- Recommendation systems
- Document similarity analysis
- Image similarity search (using image embeddings)

Classes

pgvector: VectorStore

Isolated

addVector

Isolated Function

function addVector(VectorData data) returns VectorDataWithId|error

Add data to vector store

Parameters

data VectorData - Data to be added to the vector store

Return Type

VectorDataWithId|error - Data with ID if successful, or an error

searchVector

Isolated Function

function searchVector(float[] queryVector, SearchConfig config) returns VectorDataWithDistance[]|error

Performs vector similarity search

Parameters

queryVector float[] - Query vector for similarity search

config SearchConfig (default {}) - Search configuration

Return Type

VectorDataWithDistance[]|error - List of vector data results or error
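Because the return type is `VectorDataWithDistance[]`, each result exposes a `distance` field in addition to the fields of `VectorDataWithId`. The following sketch shows one way to read it. `printMatches` is a hypothetical helper and not part of the package.

```ballerina
import ballerina/io;
import wso2/pgvector;

// Prints the ID, distance score, and document text of every search result.
function printMatches(pgvector:VectorDataWithDistance[] matches) {
    foreach pgvector:VectorDataWithDistance item in matches {
        io:println(string `id=${item.id} distance=${item.distance} document=${item.document}`);
    }
}
```

A caller would pass it the value returned by `searchVector`, for example `printMatches(check store.searchVector(queryVector))`.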

fetchVectorByMetadata

Isolated Function

function fetchVectorByMetadata(map<Value> metadata) returns VectorDataWithId[]|error

Fetch data by metadata filters

Parameters

metadata map<Value> (default {}) - Metadata filters. If empty, fetches all data.

Return Type

VectorDataWithId[]|error - List of vector data results or error
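Since an empty filter fetches all data, `fetchVectorByMetadata` can also serve as a simple way to inspect what is stored. The sketch below tallies rows per `category` value. `countByCategory` and the `"category"` metadata key are illustrative assumptions taken from the examples above, not part of the package. Fetching everything loads every row into memory, so this is only reasonable for small tables.

```ballerina
import wso2/pgvector;

// Counts stored rows per "category" metadata value.
// Rows without a string-valued "category" are grouped under "(none)".
function countByCategory(pgvector:VectorStore store) returns map<int>|error {
    // An empty filter returns every row.
    pgvector:VectorDataWithId[] all = check store.fetchVectorByMetadata({});
    map<int> counts = {};
    foreach pgvector:VectorDataWithId item in all {
        // `metadata` is an optional field, so it may be absent.
        map<json>? metadata = item.metadata;
        json category = metadata is () ? () : metadata["category"];
        string key = category is string ? category : "(none)";
        counts[key] = (counts[key] ?: 0) + 1;
    }
    return counts;
}
```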

existsByMetadata

Isolated Function

function existsByMetadata(map<Value> metadata) returns boolean|error

Check if data exists based on metadata criteria

Parameters

metadata map<Value> - Metadata to check against

Return Type

boolean|error - Returns true if matching data exists, false otherwise, or error
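A typical use of `existsByMetadata` is to avoid inserting the same document twice. The sketch below assumes that each record carries a `"sourceId"` entry in its metadata. That key and the `addIfAbsent` helper are hypothetical and not part of the package, and the `map<sql:Value>` type follows the Metadata Filtering example.

```ballerina
import ballerina/sql;
import wso2/pgvector;

// Adds the record only if no stored row already carries the given source ID.
// Returns () when a matching row exists and nothing was inserted.
// The caller is expected to include the same "sourceId" in data.metadata.
function addIfAbsent(pgvector:VectorStore store, pgvector:VectorData data, string sourceId)
        returns pgvector:VectorDataWithId|error? {
    map<sql:Value> filter = {"sourceId": sourceId};
    if check store.existsByMetadata(filter) {
        return ();
    }
    return store.addVector(data);
}
```

The check and the insert are two separate calls, so this sketch does not protect against concurrent writers adding the same record.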

execute

Isolated Function

function execute(ParameterizedQuery query) returns ExecutionResult|error

Parameters

query ParameterizedQuery -
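The reference gives no description for `execute`, so the following is only a sketch of how it might be called. It rests on several assumptions: that `ParameterizedQuery` and `ExecutionResult` are the types from `ballerina/sql`, that the table is named `vector_store` (as in the index statement shown earlier), and that it has an `id` column matching `VectorDataWithId.id`. `deleteById` is a hypothetical helper.

```ballerina
import ballerina/sql;
import wso2/pgvector;

// Deletes a single row by ID through the generic execute() call.
// Assumed, not confirmed by the docs: table name vector_store and column name id.
function deleteById(pgvector:VectorStore store, int id) returns int?|error {
    // The ${id} insertion is sent as a query parameter, not concatenated into the SQL text.
    sql:ParameterizedQuery query = `DELETE FROM vector_store WHERE id = ${id}`;
    sql:ExecutionResult result = check store.execute(query);
    return result.affectedRowCount;
}
```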

close

Isolated Function

function close() returns error?

Close the database connection

Return Type

error? - Error if closing fails

deleteVectorsByMetadata

Isolated Function

function deleteVectorsByMetadata(map<Value> metadata) returns int|error

Delete vectors based on metadata criteria

Parameters

metadata map<Value> - Metadata criteria for deletion

Return Type

int|error - Number of rows deleted or error
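As a usage sketch, the returned row count can be logged or checked after a bulk delete. `purgeCategory` and the `"category"` key are illustrative assumptions and not part of the package.

```ballerina
import ballerina/log;
import ballerina/sql;
import wso2/pgvector;

// Deletes every vector whose metadata has the given category and reports how many rows were removed.
function purgeCategory(pgvector:VectorStore store, string category) returns int|error {
    map<sql:Value> filter = {"category": category};
    int deleted = check store.deleteVectorsByMetadata(filter);
    log:printInfo(string `Deleted ${deleted} vectors in category ${category}`);
    return deleted;
}
```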

Enums

pgvector: SimilarityType

Represents similarity search types for vector comparisons

Members

EUCLIDEAN_DISTANCE - L2 distance (Euclidean distance)

COSINE_DISTANCE - Cosine distance (1 - cosine similarity)

NEGATIVE_INNER_PRODUCT - Negative inner product

Records

pgvector: ConnectionConfig

Closed record

Connection configuration for the vector store

Fields

host string - Database host

user string - Database username

password string - Database password

database string - Database name

port int(default 5432) - Database port

pgvector: SearchConfig

Closed record

Search configuration for vector queries

Fields

similarityType SimilarityType(default COSINE_DISTANCE) - Type of similarity measure to use

'limit int(default 10) - Maximum number of results to return

threshold float?(default ()) - Optional similarity threshold

metadata map<Value>(default {}) - Optional metadata filters

pgvector: VectorData

Closed record

Represents vector data without ID

Fields

embedding float[] - Vector embedding array

document string - Document text

metadata? map<json> - Optional metadata

pgvector: VectorDataWithDistance

Closed record

Represents vector data with the distance score

Fields

Fields Included from *VectorDataWithId

id int
embedding float[]
document string
metadata map<json>

distance float - Distance score

pgvector: VectorDataWithId

Closed record

Represents vector data with ID

Fields

id int - Unique identifier

Fields Included from *VectorData

embedding float[]
document string
metadata map<json>

Import

import wso2/pgvector;

Metadata

Released date: 3 months ago

Version: 0.3.5

Compatibility

Platform: any

Ballerina version: 2201.11.0

GraalVM compatible: Yes

Pull count

Total: 214

Current version: 4

Other versions

0.4.0 0.3.12 0.3.11 0.3.10 0.3.9

Dependencies

ballerina/log/2.11.0 ballerina/sql/1.15.0 ballerinax/postgresql.driver/1.6.0
