Module pgvector
wso2/pgvector
PgVector - PostgreSQL Vector Database Client for Ballerina
Overview
This module provides functionality to interact with PostgreSQL using the pgvector extension, enabling vector similarity search capabilities in Ballerina applications. It supports storing and searching high-dimensional vectors alongside metadata, making it well suited to AI/ML applications, semantic search, and recommendation systems.
Prerequisites
Before using this module, ensure you have:
- PostgreSQL server with the pgvector extension installed.
- PostgreSQL database credentials.
- Knowledge of vector dimensions for your use case (e.g., 1536 for OpenAI embeddings).
To install the pgvector extension in PostgreSQL:
CREATE EXTENSION vector;
Features
- Vector storage with metadata
- Multiple similarity search types (Cosine, Euclidean, Inner Product)
- Metadata filtering
- Automatic index creation for performance
- HNSW indexing support
- Type-safe operations
Quick Start
1. Import the Module
import wso2/pgvector;
2. Initialize Vector Store
// Configure vector store
pgvector:VectorStore vectorStore = check new (
    host = "localhost",
    user = "postgres",
    password = "password",
    database = "vectordb",
    vectorDimension = 1536 // dimension size of your vectors
);
3. Add Vectors
// Create vector data
pgvector:VectorData data = {
    // Your vector embedding; shortened here for readability.
    // A real embedding must contain exactly vectorDimension values.
    embedding: [0.1, 0.2, 0.3],
    document: "Sample document text",
    metadata: {
        "name": "Example",
        "category": "test"
    }
};

// Add to vector store
pgvector:VectorDataWithId result = check vectorStore.addVector(data);
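Since the store is configured with a fixed vectorDimension, it can help to check an embedding's length before calling addVector. The following is a minimal sketch, not part of the module: validateDimension is a hypothetical helper, and it assumes that mismatched lengths should be treated as an error on the client side.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of wso2/pgvector): returns an error when the
// embedding length differs from the dimension the store was configured with.
function validateDimension(float[] embedding, int expectedDimension) returns error? {
    if embedding.length() != expectedDimension {
        return error(string `Expected ${expectedDimension} dimensions, got ${embedding.length()}`);
    }
}

public function main() {
    float[] embedding = [0.1, 0.2, 0.3];

    // A 3-element embedding does not fit a store configured for 1536 dimensions.
    error? result = validateDimension(embedding, 1536);
    if result is error {
        io:println("Invalid embedding: ", result.message());
    }
}
```

Calling such a check before addVector surfaces a dimension mismatch with a clear message instead of a database-level failure.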
4. Search Vectors
// Define search configuration
pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE, // COSINE, EUCLIDEAN, or INNER_PRODUCT
    'limit: 10,
    threshold: 0.8,
    metadata: {
        "category": "test"
    }
};

// Perform similarity search
float[] queryVector = [0.1, 0.2, 0.3];
pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector, config);
Types and Enums
SimilarityType
Defines the type of similarity measure to use:
public enum SimilarityType {
    EUCLIDEAN = "<->", // L2 distance
    COSINE = "<=>", // Cosine distance
    INNER_PRODUCT = "<#>" // Negative inner product
}
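To make the three measures concrete, the sketch below computes the same quantities in plain Ballerina that the pgvector operators `<->`, `<=>`, and `<#>` compute inside PostgreSQL. The function names are hypothetical and not part of this module, and the code assumes both vectors have the same length.

```ballerina
import ballerina/io;

// Hypothetical helpers (not part of wso2/pgvector) mirroring the three operators.
// All of them assume that a and b have the same length.

// Dot product of two vectors.
function dot(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        sum += a[i] * b[i];
    }
    return sum;
}

// `<->`: Euclidean (L2) distance.
function l2Distance(float[] a, float[] b) returns float {
    float sum = 0.0;
    foreach int i in 0 ..< a.length() {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return float:sqrt(sum);
}

// `<=>`: cosine distance, i.e. 1 - cosine similarity.
function cosineDistance(float[] a, float[] b) returns float {
    return 1.0 - dot(a, b) / (float:sqrt(dot(a, a)) * float:sqrt(dot(b, b)));
}

// `<#>`: negative inner product (smaller means more similar).
function negativeInnerProduct(float[] a, float[] b) returns float {
    return -dot(a, b);
}

public function main() {
    float[] a = [1.0, 2.0, 3.0];
    float[] b = [2.0, 1.0, 0.5];
    io:println("L2 distance: ", l2Distance(a, b));
    io:println("Cosine distance: ", cosineDistance(a, b));
    io:println("Negative inner product: ", negativeInnerProduct(a, b));
}
```

In all three cases a smaller value means a closer match, which is why the inner product is negated.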
VectorData
Structure for vector data without an ID:
public type VectorData record {|
    float[] embedding; // Vector embedding
    string document; // Associated document text
    map<json> metadata?; // Optional metadata
|};
VectorDataWithId
Structure for vector data with an ID:
public type VectorDataWithId record {|
    int id;
    *VectorData;
|};
SearchConfig
Configuration for vector search:
public type SearchConfig record {|
    SimilarityType similarityType = COSINE;
    int 'limit = 10;
    float? threshold = (); // Optional threshold
    map<sql:Value> metadata = {};
|};
Advanced Usage
1. Custom Index Creation
The module automatically creates an HNSW index for better performance:
CREATE INDEX vector_store_embedding_idx
ON vector_store
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 100);
2. Metadata Filtering
// Search with metadata filters
// Requires: import ballerina/sql;
map<sql:Value> metadata = {
    "category": "technology",
    "author": "John Doe"
};
pgvector:VectorDataWithId[] results = check vectorStore.fetchVectorByMetadata(metadata);
3. Combining Search with Metadata
pgvector:SearchConfig config = {
    similarityType: pgvector:COSINE,
    'limit: 5,
    threshold: 0.7,
    metadata: {
        "category": "technology"
    }
};
pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector, config);
Error Handling
Operations return an error on failure. Use check inside a do block and handle the error in the on fail clause:
// Requires: import ballerina/log;
do {
    pgvector:VectorDataWithId[] results = check vectorStore.searchVector(queryVector);
    // Process results
} on fail error e {
    // Handle errors
    log:printError("Error during vector search", e);
}
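An alternative to do/on fail is to inspect the returned value with an `is error` check, which keeps the success path free of nesting. The sketch below illustrates the pattern only: findNearest is a hypothetical stand-in for a store call that can fail, not a function of this module.

```ballerina
import ballerina/log;

// Hypothetical stand-in for a store call that can fail.
function findNearest(float[] query) returns int[]|error {
    if query.length() == 0 {
        return error("Query vector must not be empty");
    }
    return [1, 2, 3];
}

public function main() {
    int[]|error result = findNearest([]);
    if result is error {
        // Log the failure and stop; the error is passed as the second argument.
        log:printError("Error during vector search", result);
        return;
    }
    // From here on, result is narrowed to int[].
    log:printInfo("Search succeeded", count = result.length());
}
```

Use check when the caller should receive the error unchanged, and an explicit `is error` branch when the failure can be handled locally.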
Best Practices
- Close connections properly:
  check vectorStore.close();
- Use appropriate vector dimensions:
  - OpenAI embeddings: 1536
  - Custom embeddings: as per your model
- Choose appropriate similarity measures:
  - COSINE: compares direction only, ignoring vector magnitude (recommended for most cases)
  - EUCLIDEAN: straight-line (L2) distance between vectors
  - INNER_PRODUCT: dot product similarity; ranks the same as COSINE when vectors are normalized to unit length
- Index optimization:
  - An HNSW index is created by default
  - Adjust the m and ef_construction parameters based on your needs
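When choosing between COSINE and INNER_PRODUCT, it helps to know whether your embeddings are already unit length, because for unit-length vectors the inner product equals the cosine similarity. The sketch below shows one way to normalize a vector in plain Ballerina. The normalize function is a hypothetical helper, not part of this module, and it returns a copy of the input unchanged when the vector has zero length.

```ballerina
import ballerina/io;

// Hypothetical helper (not part of wso2/pgvector): scales a vector to unit length.
function normalize(float[] v) returns float[] {
    float sumOfSquares = 0.0;
    foreach float x in v {
        sumOfSquares += x * x;
    }
    float norm = float:sqrt(sumOfSquares);
    if norm == 0.0 {
        // A zero vector has no direction; return a copy unchanged.
        return v.clone();
    }
    return from float x in v
        select x / norm;
}

public function main() {
    float[] unit = normalize([3.0, 4.0]);
    io:println(unit);
}
```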
Examples
Complete Example
import wso2/pgvector;

public function main() returns error? {
    // Initialize store
    pgvector:VectorStore store = check new (
        host = "localhost",
        user = "postgres",
        password = "password",
        database = "vectordb",
        vectorDimension = 3 // must match the length of the embeddings below
    );

    // Add vector
    pgvector:VectorData data = {
        embedding: [0.1, 0.2, 0.3],
        document: "Sample text",
        metadata: {
            "category": "test"
        }
    };
    pgvector:VectorDataWithId added = check store.addVector(data);

    // Search vectors
    pgvector:SearchConfig config = {
        similarityType: pgvector:COSINE,
        'limit: 10
    };
    pgvector:VectorDataWithId[] results = check store.searchVector([0.1, 0.2, 0.3], config);

    // Close connection
    check store.close();
}
Use Cases
This module is ideal for:
- Semantic search applications
- AI/ML applications requiring vector similarity search
- Recommendation systems
- Document similarity analysis
- Image similarity search (using image embeddings)