AI Infrastructure

RAG with any AI Model using Postgres pgVector + LibreChat

The addition of the RAG API microservice in version 0.7.0 of LibreChat, the most rapidly trending open-source ChatGPT clone, opens the door to chatting with PDFs and documents using any supported AI model in a private, self-hosted environment.…
Read More
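The retrieval step behind this kind of setup can be sketched in a few lines. Below is a minimal illustration of pgvector-style nearest-neighbor search: the SQL shows the `<=>` cosine-distance operator, and the Python emulates the same ranking in memory. The table and column names (`documents`, `embedding`) are hypothetical placeholders, not LibreChat's actual schema.

```python
# Hypothetical query against a pgvector-enabled Postgres table;
# <=> is pgvector's cosine-distance operator.
SQL = """
SELECT content
FROM documents
ORDER BY embedding <=> %(query_embedding)s
LIMIT 3;
"""

def cosine_distance(a, b):
    """Same metric as pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=3):
    """rows: (content, embedding) pairs; emulates the ORDER BY ... LIMIT above."""
    ranked = sorted(rows, key=lambda r: cosine_distance(query_vec, r[1]))
    return [content for content, _ in ranked[:k]]
```

The retrieved chunks are then prepended to the user's prompt before it is sent to whichever model the chat frontend is configured to use.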

Serverless Deployment of AI Middleware, LiteLLM, with Google Cloud Run

AI middleware is an emerging term for the layer of the technology stack that connects AI end-user applications to the Large Language Models and the GPU-accelerated machines that drive them. Here are the major sub-categories of this…
Read More
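In practice, middleware like LiteLLM exposes an OpenAI-compatible HTTP endpoint, so an application only has to build one request shape no matter which backend model serves it. The sketch below assembles such a request with the standard library; the proxy URL is a hypothetical placeholder standing in for whatever HTTPS URL a Cloud Run deployment would expose.

```python
import json
import urllib.request

# Hypothetical URL; substitute the HTTPS URL of your own proxy deployment.
PROXY_URL = "https://litellm-proxy.example.run.app/v1/chat/completions"

def build_request(model, prompt, api_key):
    """Build an OpenAI-compatible chat-completion request.

    Middleware that speaks this wire format can route the call to any
    configured backend model without the client changing its code.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        PROXY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(build_request(...))` (or any HTTP client) returns the familiar OpenAI-style JSON response.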

Integrating Azure OpenAI with Search & Retrieval Plugins for RAG

If you have ever used ChatGPT Plus, OpenAI’s SaaS GenAI offering, you are likely familiar with the Browsing extension, which retrieves current information from the Internet using Bing search to inform the GPT model’s response to the user’s prompts. One…
Read More

Run OpenAI Models for Enterprise with a ChatGPT “Clone”

Are you looking for an alternative to ChatGPT Enterprise with no minimum number of seats or annual contract, with enterprise features such as single sign-on (SSO) through Google Workspace, OpenID, or Microsoft Entra ID? It is worth considering running an…
Read More

Proxies & Load Balancers for AI LLMs (AI Middleware)

The Cambrianesque explosion of capable, open Large Language Models represents an opportunity to extend virtually any application with AI capabilities, but a strategy for managing multiple AI endpoints is clearly needed. Hosting open models in your own environment requires…
Read More
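The core of the endpoint-management problem can be illustrated with a toy round-robin balancer. This is a minimal sketch, not how any particular proxy is implemented; production middleware layers on health checks, retries, rate limits, and per-model routing.

```python
import itertools

class EndpointBalancer:
    """Toy round-robin distribution of requests across LLM endpoints.

    Real AI middleware adds failover, health checks, and per-model
    routing on top of this basic rotation.
    """

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        # itertools.cycle yields the endpoints in order, forever.
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        """Return the endpoint that should serve the next request."""
        return next(self._cycle)
```

A client would call `next_endpoint()` before each request, so load spreads evenly across self-hosted and cloud-hosted model servers alike.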