Integrator of Open Source Generative AI Software & Models

Generative AI & RAG

Chat with an LLM and get answers enriched by the documents and data you already have — through Retrieval Augmented Generation.

Databases & Data Pipelines

Extract, transform, and load (ETL) data from various sources into knowledge graph or vector embeddings to use with AI models.

On-Prem & Cloud Infrastructure

Host complete AI-powered applications, or AI endpoints for existing applications in secure, privacy-respecting environments.

AI Technology Stack

Frontend / AI Middleware

BionicGPT, LibreChat, or Open WebUI Chat Platform
Nextcloud Assistant 2.0 (MS365 Copilot Alternative)
LangChain Retrieval Augmented Generation (RAG)
Ollama, Hugging Face TGI/TEI Model Serving
LiteLLM AI API Endpoint Proxy
Airbyte or n8n Data Pipeline Automation

Databases for AI Applications

PostgreSQL & pgVector Extension
MongoDB & MongoDB Atlas
Supabase Platform-as-a-Service
Neo4j Graph Database
Redis Semantic Cache
Elasticsearch Vector Search

Compute Infrastructure & AIaaS

Amazon Web Services (Bedrock)
Google Cloud (Vertex AI)
Azure AI Services (AI Studio)
Paperspace by DigitalOcean
OctoAI Media & Text Gen Solution
Docker, Swarm, & Kubernetes

Foundation Open Source AI Models

LLaMA 3.2, 3.1, and 3

Llama 3.2, 3.1, and 3 are a suite of AI models by Meta, including Llama Chat, Instruct, Llama Guard, and Code Llama.

Mistral AI

Mistral is an AI model fine-tuned for chat applications, developed in Europe and the first to partner with Microsoft after OpenAI.

Falcon LLM

Falcon LLM was among the first models made available for research & commercial applications by Abu Dhabi’s Advanced Technology Research Council.

Google Gemma 2

Google Gemma 2 is the latest open access model made available by Google, based on its flagship Gemini model, but optimized for edge AI.

Microsoft Phi-3

Microsoft Phi-3 is a 3.8B parameter small language model (SLM) that can be deployed & fine-tuned with low resource usage but high quality results.

Technical Deep Dive Into Nextcloud Context Chat

Integrating Nextcloud AI Assistant with Inference Engines & APIs

AI Data Loss Prevention (DLP) with Microsoft Presidio

Private Enterprise GPT on Any Cloud with Inference APIs

OctoAI Acquired by NVIDIA (Shuts Down) – Predictions & Alternatives

Deploy n8n with Docker Compose for Automating AI Workflows

Open WebUI + Ollama with Azure Kubernetes Service & Ingress TLS

AI Document Data Pipelines with S3 or Azure Blob Storage

ELT Data Pipelines with Airbyte & BionicGPT for AI RAG

Nextcloud Assistant 2.0 – AI Text Gen with Phi-3 & Transcription with Whisper

AlloyDB Vector Database for Retrieval Augmented Generation

Llama 3 on Cloudflare Workers AI – AI at the Edge

Local Embeddings with Hugging Face Text Embedding Inference

Deploy Anthropic Claude 3 with AI Models-as-a-Service

Llama 3 AI Model Serving with Ollama & LiteLLM

RAG with any AI Model using Postgres pgVector + LibreChat

Run at the Edge, in the Datacenter, or Cloud

Self-Hosted AI Models

Integrate applications, such as chatbots, with local endpoints so that your users’ prompts and confidential data never leaves your environment. Enjoy predictable, flat costs of provisioning as many GPU instances as you require. Best for sensitive use cases, where data sovereignty is a concern.

AI Models as a Service (AI MaaS)

Integrate with external AI endpoints where you “pay as you go” with simple, per-token pricing and no up-front hardware cost. Take advantage of open models that are multiple times more cost effective than OpenAI GPT models. Best for general use cases and application prototyping.

Benefits of Composable AI Infrastructure

Generative AI (GenAI)

Generate images and text based on prompts using a Large Language Model (LLM).

Flat or Per-Token Pricing

Provision the number of GPUs you desire, or get started with no upfront hardware cost.

OpenAI Compatible API

Expose Hugging Face AI models with an OpenAI-compatible endpoint via LiteLLM Proxy.

Edge AI

Run AI models locally on the device where the application is used, or on IoT devices.

Secure by Default

Manage costs, and ensure appropriate use of AI applications with audit logs and integration with SSO.

Containerized Architecture

Deploy & scale with composable AI architecture, using pre-built stacks on Docker Swarm and Kubernetes.

Private AI

Ensure your proprietary data remains confidential, and prevent leaks of personally identifiable information (PII).

No Internet Connection Required

Enjoy the benefits of AI inferencing without transmitting any prompts or embedding data to an external provider.

Scalable Load Balancing

Scale out with multiple compute nodes for your AI endpoints as your inferencing demands grow.

Built on Open Software

Integrate an open source stack with open access AI models to build complete AI solutions with no licensing cost.

Use any Large Language Model

“Plug and play” AI models from repositories such as Hugging Face with one unified technology stack.

Integrate with Existing Apps

Use open AI models as a “drop-in replacement” for OpenAI’s API, with minimal to no refactoring.