Nextcloud Context Chat is a retrieval-augmented generation (RAG) feature of Assistant that grounds a large language model’s responses to users’ prompts in the documents and data they already store in Nextcloud. From Context Chat in the Nextcloud Assistant interface, users can ask the configured AI model for information and insight about all of their indexed files or, with the “Selective Context” option, only the files or folders they select.
Compared to AI assistants such as Microsoft 365 Copilot or Gemini for Google Workspace, where the documents are vectorized by the service provider, Context Chat indexes the documents locally through the Context Chat Backend external app. The backend can be deployed reproducibly as a Docker container, ideally on a machine with a GPU for best performance. The vector database, ChromaDB, also runs locally in your environment, meaning that your indexed data is as secure as your Nextcloud instance, which alleviates potential privacy concerns. Furthermore, you can choose an inference model that runs locally or at an external provider, and it can be either an open model like LLaMA 3.2 or a commercially available model like GPT-4o.
For the most privacy-sensitive use cases, we recommend running both the embedding model and the inference model locally. For general corporate use cases, an external inference provider may be more cost-effective than purchasing or renting a GPU, since you only pay for the tokens you use. You also gain the option of proprietary models, such as GPT, Claude, and Gemini, which are not available for local deployment.
In this deep dive, we will explore the components of Context Chat and how they work together to deliver this experience. We recommend installing and enabling the apps in the following order, and keeping the Context Chat backend and frontend apps at matching version numbers.
- AppAPI Deploy Daemon (Docker Socket Proxy)
- Context Chat Backend External App
- Nextcloud Assistant Context Chat (Frontend) App
- “LLM2” or “LocalAI or OpenAI Integration” App
- Nextcloud Assistant App
- Background Job Workers
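As a rough illustration, the installation order above translates into occ commands along these lines. This is a sketch only: the app IDs and the deploy daemon name docker_socket_proxy are assumptions to verify against the App Store listings and your own AppAPI configuration, and the deploy daemon itself is typically registered through Administration settings → AppAPI rather than occ.

```bash
# Sketch of the install/enable order on an existing Nextcloud server.
# Run as the web server user, e.g. prefix each command with: sudo -u www-data php

# 1. AppAPI (ships with recent Nextcloud releases); the deploy daemon itself is
#    configured under Administration settings -> AppAPI.
occ app:enable app_api

# 2. Context Chat Backend ExApp, registered against the deploy daemon
#    ("docker_socket_proxy" is a placeholder for your daemon's name).
occ app_api:app:register context_chat_backend docker_socket_proxy

# 3. Context Chat frontend app.
occ app:enable context_chat

# 4. An inference provider: either the llm2 ExApp for a local model...
occ app_api:app:register llm2 docker_socket_proxy
#    ...or the OpenAI-compatible integration for LocalAI / external providers.
occ app:enable integration_openai

# 5. Nextcloud Assistant.
occ app:enable assistant

# 6. Background job workers are configured at the operating-system level
#    (see the Background Job Workers section below).
```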
If you require technical advice on how to properly deploy Nextcloud Context Chat or any other Nextcloud AI features, our team of consultants for the open source version of Nextcloud would be pleased to help with architecting your deployment, ensuring privacy & security, and tuning it for performance.
AppAPI Deploy Daemon
The Nextcloud core, like its predecessor ownCloud, is based on PHP. AppAPI is a framework introduced in Nextcloud 27 that enables Nextcloud apps (add-ons) to be written in other programming languages while remaining tightly integrated with Nextcloud; these are the so-called “external apps,” or ExApps. To make deployment as seamless as possible, and to save the Nextcloud admin the effort of gathering all the non-PHP dependencies, ExApps are packaged and shipped as Docker images. Through the Docker Engine API, which the Docker daemon exposes on every machine where Docker is installed, AppAPI can instruct the daemon to pull the images and run the containers, but it doesn’t talk to the Docker socket directly.
A Docker container known as the “deploy daemon” acts as a go-between: an HAProxy instance running inside the container proxies all authenticated requests to the host machine’s Docker API socket, which is bind-mounted into the container.
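To make this concrete, one common way to bring up the Docker Socket Proxy deploy daemon looks roughly like the following. Treat the image name, published port, and the NC_HAPROXY_PASSWORD variable as assumptions to be checked against the AppAPI documentation for your Nextcloud version.

```bash
# Sketch: run the Docker Socket Proxy (deploy daemon) on the Docker host,
# bind-mounting the Docker API socket into the container so that HAProxy can
# forward authenticated requests from AppAPI to it.
docker run -d \
  --name nextcloud-appapi-dsp \
  -e NC_HAPROXY_PASSWORD='choose-a-long-random-password' \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -p 2375:2375 \
  --restart unless-stopped \
  ghcr.io/nextcloud/nextcloud-appapi-dsp:release
```

The daemon is then registered in Administration settings → AppAPI, pointing at this host and port.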
Context Chat Backend External App
The Context Chat Backend external app is deployed with the app_api:app:register occ command once AppAPI is connected to a working deploy daemon. Once running, the Context Chat Backend container serves a Python RAG API, which the frontend Context Chat app calls to index user data through a background job or to index particular files on demand when a user includes them as “Selective Context.” As an app designed specifically for Nextcloud, Context Chat respects the ownership and permissions of data stored in Nextcloud: only files that are accessible to the user (i.e. created by them or shared with them) may be included in the context of a chat, ensuring proper internal controls over the data.
The container also includes a built-in Chroma database, where both the text and the vectorized representations of the embedded documents are stored. Because this store sits outside the Nextcloud data directory and can become quite voluminous, it may be necessary to mount a volume with additional storage into the context_chat_backend container using a Complex Install.
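As a concrete sketch, registering the backend and giving its vector store room to grow might look like the following. The deploy daemon name, volume name, and bind path are assumptions; AppAPI normally creates a persistent-storage volume for each ExApp itself, so check the Context Chat Backend admin documentation before overriding it.

```bash
# Hypothetical sketch, to be adapted to your environment.

# Pre-create the ExApp's persistent-storage volume on a larger disk, so the
# Chroma data (text + vectors) lands there instead of the default Docker volume.
# The volume name assumes AppAPI's usual nc_app_<appid>_data naming convention.
docker volume create \
  --driver local \
  --opt type=none \
  --opt o=bind \
  --opt device=/mnt/bigdisk/context_chat \
  nc_app_context_chat_backend_data

# Register the backend ExApp against a deploy daemon named "docker_socket_proxy".
sudo -u www-data php occ app_api:app:register context_chat_backend docker_socket_proxy
```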
Nextcloud Assistant Context Chat (Frontend) App
Installing the Nextcloud Assistant Context Chat app, either through the Nextcloud App Store or with the app:enable occ command, adds the “Context Chat” option to Nextcloud Assistant.
By default, Context Chat considers all of the user-accessible data as possible context for a user’s prompt. Using vector similarity search in Chroma, Context Chat surfaces the documents most likely to contain relevant context, retrieves those snippets, and hands them off to the large language model for further processing.
Users of Context Chat can narrow down a RAG prompt by ticking the “Selective Context” checkbox and choosing only the files or folders they wish the model to consider. Nextcloud apps other than Files are also planned to act as context providers; currently, one other first-party app, Analytics (a data visualization tool similar to Looker Studio), can already serve as a provider to Context Chat.
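For administrators, the retrieval pipeline can also be exercised from the command line. The occ commands below reflect what the Context Chat app has shipped in recent versions, but treat the exact names and arguments as assumptions and confirm them with occ list:

```bash
# Hypothetical, verify-first: index a user's files on demand instead of waiting
# for the background job, then run a test question against the indexed content.
sudo -u www-data php occ context_chat:scan alice
sudo -u www-data php occ context_chat:prompt alice "What does the Q3 budget spreadsheet say about travel?"
```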
“LLM2” or “LocalAI or OpenAI Integration” App
LLM2 is a Nextcloud ExApp for deploying a local large language model in a container running the llama.cpp library. During the nascent rise of open models such as LLaMA, there was no standardized way to run a model, and the setup depended on the specific hardware you owned. llama.cpp changed that, making it simple to run any LLM in GGUF format on a variety of hardware, including CPUs, GPUs, and Apple Silicon.
We recommend LLM2 as the way to run a local LLM if you wish to stay within the Nextcloud ecosystem, have at least 8 GB of VRAM and 12 GB of system RAM, and are happy to download models quantized in GGUF format from sources such as Hugging Face, LocalAI, or GPT4All.
If you plan to deploy a model locally using a model-serving engine other than llama.cpp, such as Ollama, Hugging Face TGI, or vLLM, it is better to install the “LocalAI or OpenAI Integration” app in Nextcloud instead. The same applies if you plan to use an external model provider such as Azure OpenAI, Bedrock, or Vertex AI, which will be necessary for acceptable performance if you don’t have a GPU, or if you wish to use commercially available models such as GPT, Claude, or Gemini. To integrate non-OpenAI providers such as Bedrock or Vertex AI, an LLM gateway proxy is placed between Nextcloud and the inference service for API compatibility.
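As one hedged example of that setup, the integration app can be pointed at any OpenAI-compatible endpoint, whether that is LocalAI, a local model server’s OpenAI-compatible API, or a gateway in front of Bedrock or Vertex AI. The config key name and URL below are illustrative assumptions; the same setting is also exposed in the administration UI.

```bash
# Hypothetical sketch: point the "LocalAI or OpenAI Integration" app (app ID
# integration_openai) at a self-hosted, OpenAI-compatible endpoint instead of
# api.openai.com. Verify the exact app-config key for your app version.
sudo -u www-data php occ config:app:set integration_openai url --value="http://llm-gateway.internal:4000/v1"
```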
Note that the model configured in “LLM2” or “LocalAI or OpenAI Integration” is only used for interpreting the user’s prompt and context and for generating the response. The model that the Context Chat Backend relies on for indexing data is separate: with the default implementation, the embedding model always runs locally.
Nextcloud Assistant App
This one is self-explanatory: you must have the Nextcloud Assistant app itself installed through the App Store to use the Nextcloud Assistant Context Chat app. Besides Context Chat, you also gain access to a GPT-style chat experience called “Chat with AI,” text generation, and other features such as audio transcription if you install the Local Whisper Speech-to-Text Backend.
Background Job Workers
Background jobs matter to Context Chat for two reasons. First, the worker configuration determines how quickly new user requests in Assistant are picked up by the next available worker process. This influences how responsive Assistant feels: responses are near instantaneous when there are enough workers that requests are not left queued, but running too many workers simultaneously can exhaust the hardware resources.
Second, workers are responsible for silently indexing users’ Nextcloud data in the background, so that everything a user can access is available as possible context whenever “Selective Context” is not checked.
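A common pattern is to run several dedicated worker processes (for example as templated systemd units) that poll for AI tasks. The command below follows the approach described in the Nextcloud administration documentation for recent releases; the job class name and the -t polling interval are assumptions to verify against your version.

```bash
# Sketch: one AI task worker process; run several of these in parallel
# (e.g. via systemd template units), sized to your hardware.
sudo -u www-data php occ background-job:worker -t 60 'OC\TaskProcessing\SynchronousBackgroundJob'
```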
For more information about this, see the “Optimizing the Nextcloud AI Worker Service for Performance” section of this article.