wso2/pgvector

0.3.5
PgVector - PostgreSQL Vector Database Client for Ballerina

Overview

This module provides functionality to interact with PostgreSQL using the pgvector extension, enabling vector similarity search capabilities in Ballerina applications. It supports storing and searching high-dimensional vectors alongside metadata, making it ideal for AI/ML applications, semantic search, and recommendation systems.

Prerequisites

Before using this module, ensure you have:

  1. PostgreSQL server with the pgvector extension installed.
  2. PostgreSQL database credentials.
  3. Knowledge of vector dimensions for your use case (e.g., 1536 for OpenAI's text-embedding-3-small embeddings).

To enable the pgvector extension in your database (the extension must already be installed on the PostgreSQL server):

CREATE EXTENSION vector;

Features

  • Vector storage with metadata
  • Multiple similarity search types (Cosine, Euclidean, Inner Product)
  • Metadata filtering
  • Automatic index creation for performance
  • HNSW indexing support
  • Type-safe operations

Quick Start

1. Import the Module

import wso2/pgvector;

2. Initialize Vector Store

// Configure vector store (types are qualified with the pgvector: module prefix)
pgvector:VectorStore vectorStore = check new(
    host = "localhost",
    user = "postgres",
    password = "password",
    database = "vectordb",
    vectorDimension = 1536  // dimension size of your vectors
);
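
Hardcoded credentials are convenient for a first run but awkward to deploy. The following is a minimal sketch that moves the connection settings into Ballerina `configurable` variables, assuming the constructor accepts the named arguments shown above. The variable names and default values are illustrative and not part of the module.

```ballerina
import wso2/pgvector;

// Illustrative configurable variables (not defined by the module).
// Values can be supplied through Config.toml at run time.
configurable string host = "localhost";
configurable string user = "postgres";
configurable string password = ?; // required: no default is provided
configurable string database = "vectordb";
configurable int vectorDimension = 1536;

public function main() returns error? {
    // Same initialization as above, with values taken from configuration
    pgvector:VectorStore vectorStore = check new (
        host = host,
        user = user,
        password = password,
        database = database,
        vectorDimension = vectorDimension
    );

    // ... use the store ...

    check vectorStore.close();
}
```

With this layout, a `Config.toml` entry such as `password = "..."` keeps the secret out of the source file.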

3. Add Vectors

// Create vector data
pgvector:VectorData data = {
    embedding: [0.1, 0.2, 0.3], // your vector embedding; its length must equal vectorDimension
    document: "Sample document text",
    metadata: {
        "name": "Example",
        "category": "test"
    }
};

// Add to vector store
pgvector:VectorDataWithId result = check vectorStore.addVector(data);
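
pgvector rejects a vector whose length differs from the dimension declared for its column, so a mismatch between the embedding and `vectorDimension` is likely to surface as a database error. A minimal sketch of a client-side check that fails earlier with a clearer message is shown below; `checkDimension` is a hypothetical helper, not part of the module.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of the pgvector module): returns an error
// when the embedding length differs from the expected dimension.
function checkDimension(float[] embedding, int expected) returns error? {
    if embedding.length() != expected {
        return error(string `Expected ${expected} dimensions, got ${embedding.length()}`);
    }
}

public function main() returns error? {
    float[] embedding = [0.1, 0.2, 0.3];

    // Passes: the sample embedding has three elements
    check checkDimension(embedding, 3);
    io:println("dimension ok");

    // Fails: 1536 was expected, so an error value is returned
    error? mismatch = checkDimension(embedding, 1536);
    if mismatch is error {
        io:println(mismatch.message());
    }
}
```

Calling such a check before `addVector` keeps dimension problems out of the database round trip.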

4. Search Vectors

// Define search configuration
pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE,  // COSINE, EUCLIDEAN, or INNER_PRODUCT
    limit: 10,
    threshold: 0.8,
    metadata: {
        "category": "test"
    }
};

// Perform similarity search (the query vector must have the same dimension as the stored vectors)
float[] queryVector = [0.1, 0.2, 0.3];
pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector, config);

Types and Enums

SimilarityType

Defines the type of similarity measure to use:

public enum SimilarityType {
    EUCLIDEAN = "<->",    // L2 distance
    COSINE = "<=>",       // Cosine distance
    INNER_PRODUCT = "<#>" // Negative inner product
}
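
To make the three operators concrete, the sketch below computes the same quantities in plain Ballerina for in-memory vectors. It illustrates the definitions only and is not how the module or pgvector evaluates them; the function names are made up for this example, and both vectors are assumed to have the same length.

```ballerina
import ballerina/io;

// Dot product of two equal-length vectors.
function dotProduct(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        sum += a[i] * b[i];
    }
    return sum;
}

// L2 (Euclidean) distance: the quantity behind `<->`.
function euclideanDistance(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return float:sqrt(sum);
}

// Cosine distance (1 - cosine similarity): the quantity behind `<=>`.
// Assumes neither vector is all zeros.
function cosineDistance(float[] a, float[] b) returns float {
    float norms = float:sqrt(dotProduct(a, a)) * float:sqrt(dotProduct(b, b));
    return 1.0 - dotProduct(a, b) / norms;
}

// Negative inner product: the quantity behind `<#>`.
function negativeInnerProduct(float[] a, float[] b) returns float {
    return -dotProduct(a, b);
}

public function main() {
    float[] a = [1.0, 2.0];
    float[] b = [3.0, 4.0];
    io:println(euclideanDistance(a, b));
    io:println(cosineDistance(a, b));
    io:println(negativeInnerProduct(a, b));
}
```

In all three cases a smaller value means a closer match, which is why the inner product is negated.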

VectorData

Structure for vector data without an ID:

public type VectorData record {|
    float[] embedding;    // Vector embedding
    string document;      // Associated document text
    map<json> metadata?;  // Optional metadata
|};

VectorDataWithId

Structure for vector data with an ID:

public type VectorDataWithId record {|
    int id;
    *VectorData;
|};

SearchConfig

Configuration for vector search:

public type SearchConfig record {|
    SimilarityType similarityType = COSINE;
    int limit = 10;
    float? threshold = ();  // Optional threshold
    map<sql:Value> metadata = {};
|};

Advanced Usage

1. Custom Index Creation

The module automatically creates an HNSW index on the embedding column for better performance, using a statement like the following:

CREATE INDEX vector_store_embedding_idx
ON vector_store
USING hnsw(embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 100);

2. Metadata Filtering

// Search with metadata filters (requires: import ballerina/sql;)
map<sql:Value> metadata = {
    "category": "technology",
    "author": "John Doe"
};

pgvector:VectorDataWithId[] results = check vectorStore.fetchVectorByMetadata(metadata);

3. Combining Search with Metadata

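// Similarity search restricted to entries whose metadata matches the filter
pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE,
    limit: 5,
    threshold: 0.7,
    metadata: {
        "category": "technology"
    }
};

pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector, config);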

Error Handling

Operations return an error on failure. Use check inside a do/on fail block to handle errors in one place:

// Requires: import ballerina/log;
do {
    pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector);
    // Process results
} on fail var e {
    // Handle errors
    log:printError("Error during vector search", e);
}

Best Practices

  • Close connections properly:

    check vectorStore.close();
  • Use appropriate vector dimensions:

    • OpenAI embeddings: 1536
    • Custom embeddings: As per your model
  • Choose appropriate similarity measures:

    • COSINE: Normalized similarity (recommended for most cases)
    • EUCLIDEAN: Distance-based similarity
    • INNER_PRODUCT: Dot product similarity
  • Index optimization:

    • HNSW index is created by default
    • Adjust m and ef_construction parameters based on your needs
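
On the choice of similarity measure: for vectors scaled to unit length, cosine similarity equals the inner product, so COSINE and INNER_PRODUCT rank results identically. A minimal sketch of normalizing an embedding before storing it is shown below; `normalize` is a hypothetical helper, not part of the module.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of the pgvector module): scales a vector
// to unit length. A zero vector is returned unchanged (as a copy).
function normalize(float[] v) returns float[] {
    float sumSquares = 0.0;
    foreach float x in v {
        sumSquares += x * x;
    }
    float norm = float:sqrt(sumSquares);
    if norm == 0.0 {
        return v.clone();
    }
    return from float x in v
        select x / norm;
}

public function main() {
    float[] unit = normalize([3.0, 4.0]);
    io:println(unit);
}
```

Many embedding models already return unit-length vectors, in which case this step is unnecessary.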

Examples

Complete Example

import wso2/pgvector;

public function main() returns error? {
    // Initialize store; vectorDimension must equal the length of every embedding
    pgvector:VectorStore store = check new(
        host = "localhost",
        user = "postgres",
        password = "password",
        database = "vectordb",
        vectorDimension = 3  // matches the 3-element sample embedding below
    );

    // Add vector
    pgvector:VectorData data = {
        embedding: [0.1, 0.2, 0.3],
        document: "Sample text",
        metadata: {
            "category": "test"
        }
    };
    pgvector:VectorDataWithId added = check store.addVector(data);

    // Search vectors
    pgvector:SearchConfig config = {
        similarityType: pgvector:COSINE,
        limit: 10
    };
    pgvector:VectorDataWithId[] results = check store.searchVector([0.1, 0.2, 0.3], config);

    // Close connection
    check store.close();
}

Use Cases

This module is ideal for:

  • Semantic search applications
  • AI/ML applications requiring vector similarity search
  • Recommendation systems
  • Document similarity analysis
  • Image similarity search (using image embeddings)

Metadata

Released date: 3 months ago

Version: 0.3.5


Compatibility

Platform: any

Ballerina version: 2201.11.0

GraalVM compatible: Yes


Pull count

Total: 214

Current version: 4

