Generative AI

Nextcloud Assistant 2.0 is the self-hosted file sync & share and groupware suite’s answer to Microsoft 365 Copilot and Gemini for Google Workspace. Compared to Copilot and Gemini from its Big Tech brethren, Nextcloud Assistant has a number of unmatched…

AlloyDB Vector Database for Retrieval Augmented Generation

AI Infrastructure

AlloyDB, Embeddings Models, Generative AI, Google Cloud, LibreChat, PostgreSQL, Retrieval Augmented Generation, Vector Databases

AlloyDB is a fork of PostgreSQL on Google Cloud, optimized for high performance with vector embedding & retrieval workloads. As a PostgreSQL-compatible database, AlloyDB can be used as a drop-in replacement for any application that relies on a Postgres backend.…

Llama 3 on Cloudflare Workers AI – AI at the Edge

AI Infrastructure

AI as a Service, Cloud Run, Cloudflare, Docker, Edge AI, Generative AI, LibreChat, LiteLLM, Llama 3, Nextcloud, Open LLM Models

Cloudflare Workers is a serverless computing platform leveraging Cloudflare’s global network of datacenters, also known as edge locations, in over 300 cities and 100 countries around the world. The Workers service is used by Cloudflare’s customers to host & run…

Local Embeddings with Hugging Face Text Embedding Inference

AI Infrastructure

Embeddings Models, Generative AI, LibreChat, Open Source AI, pgVector, Retrieval Augmented Generation, Vector Databases

Embeddings models convert data into numerical representations that can be stored in a vector database, and retrieved by a large language model through a framework such as LangChain for retrieval augmented generation. A common use case is embedding unstructured data,…

Deploy Anthropic Claude 3 with AI Models-as-a-Service

AI Infrastructure

AI as a Service, Amazon Bedrock, Anthropic, ChatGPT Alternative, Claude 3, Generative AI, LibreChat, LiteLLM, Models as a Service, Vertex AI

Anthropic Claude 3 Opus debuted in March 2024 as a GPT-4 class AI model that outperforms OpenAI GPT-4 in some synthetic benchmarks & real-world tests, such as attaining competitive scores on standardized exams like the MBE (Multistate Bar…

RAG with any AI Model using Postgres pgVector + LibreChat

AI Infrastructure

Anthropic, Azure, Generative AI, LibreChat, LiteLLM, Ollama, OpenAI Alternative, pgVector, PostgreSQL, Retrieval Augmented Generation, Vector Databases

The addition of the RAG API microservice in version 0.7.0 of LibreChat, the most rapidly trending open source ChatGPT clone, swings the door open to chatting with PDFs and documents using any supported AI model, in a private, self-hosted environment.…

Serverless Deployment of AI Middleware, LiteLLM, with Google Cloud Run

AI Infrastructure

AI Middleware, AI Proxy, Azure, Cloud Run, Generative AI, Google Cloud, LiteLLM, Microservices, Open Source AI, PostgreSQL, Serverless

AI middleware is an emerging term for the layer of the technology stack that facilitates the interfacing of AI end user applications with the Large Language Models and GPU-accelerated machines that drive them. Here are the major sub-categories of this…

Integrating Azure OpenAI with Search & Retrieval Plugins for RAG

AI Infrastructure

Azure, Azure AI Search, Cognitive Search, Generative AI, LangChain, LibreChat, OpenAI Compatible API, Retrieval Augmented Generation

If you have ever used ChatGPT Plus, OpenAI’s SaaS GenAI offering, you are likely familiar with the Browsing extension, which retrieves current information from the Internet using Bing search to inform the GPT model’s response to the user’s prompts. One…

AI Retrieval Augmented Generation in the (Virtual) Private Cloud

AI Infrastructure

ChatGPT Alternative, GenAI, Generative AI, Open LLM Models, Open Source AI, Private AI, Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is an AI technique for getting more tailored responses from a foundation model, without the need to train a model from scratch or fine-tune an existing one. Large Language Models, such as Meta’s LLaMA 2, provide high-quality responses…

GPU Inference with Ollama or TGI on Google Cloud

AI Infrastructure

Container Orchestration, GenAI, Generative AI, GPUs, Locally Hosted AI, NVIDIA CUDA, Ollama

GPUs typically increase the speed of LLM inference and embedding by an order of magnitude or more, enabling a model to generate tokens as quickly as a user can read. As a typical reading speed is two to…

Run Open AI Models for Enterprise with a ChatGPT “Clone”

AI Infrastructure

Bionic-GPT, ChatGPT Alternative, GenAI, Generative AI, LibreChat, Open AI Models, Open LLM Models, Open Source AI, OpenAI Compatible API, Self-Hosted Apps

Are you looking for an alternative to ChatGPT Enterprise with no minimum number of seats or annual contract, and with enterprise features such as single sign-on (SSO) through Google Workspace, OpenID, or Microsoft Entra ID? It is worth considering running an…

Retrieval Augmented Generation (RAG) with Local Embeddings

AI Infrastructure

Bionic-GPT, GenAI, Generative AI, Graph Databases, LiteLLM, Neo4j, Open LLM Models, pgVector, Retrieval Augmented Generation, Vector Databases

Retrieval Augmented Generation (RAG) could very well be the hottest topic in generative AI right now. It has been clear since ChatGPT took the world by storm that RAG is one of the best use cases of GenAI for enterprises.…