Integrating Nextcloud AI Assistant with Inference Engines & APIs

Nextcloud Assistant, featured in the latest versions of Nextcloud Hub, extends the cloud storage & collaboration suite with a range of AI-powered capabilities, including GPT-style chat, text generation, and speech-to-text transcription. With the Smart Picker, you can use these features inline within Nextcloud apps such as Calendar, Mail, Talk, Text, and Office. The chat & text generation features can also optionally use documents you select from your Nextcloud storage to provide contextually aware responses & outputs with Context Chat and Context Write. It accomplishes all of this while respecting your privacy, in the following ways:

  • The large language model can run locally on your server or at a trusted inference provider. Inputs are never used for model training, and will never appear in responses to other customers.
  • Both open source & proprietary AI models, including GPT-4o, Claude 3.5, Gemini Pro, and Llama 3.2, are supported. A middleware proxy translates Nextcloud Assistant’s OpenAI-compatible API calls into requests supported by Azure OpenAI, Bedrock, and Vertex AI.
  • Document indexing for Context Chat is carried out locally, and the resulting embeddings are stored in Chroma DB, an open source vector database.
  • Audio transcription is handled by the best-in-class Whisper model, which supports 99 languages, with different model sizes to choose from to balance speed & accuracy.
  • If you are running a foundation or fine-tuned model locally on a GPU-accelerated server, Nextcloud Assistant can be integrated with Ollama or another model serving engine such as Hugging Face TGI or vLLM.
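
For example, with Ollama running on the same host as Nextcloud, the “OpenAI and LocalAI integration” app can simply be pointed at Ollama’s OpenAI-compatible endpoint. The model name below is illustrative, and 11434 is Ollama’s default port:

    # Pull a local model; the Ollama server listens on port 11434 by default
    ollama pull llama3.2
    # In Nextcloud, set the integration's service URL to Ollama's
    # OpenAI-compatible API:  http://localhost:11434/v1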
Nextcloud Assistant “Chat with AI” feature

Nextcloud Assistant is an open and extensible alternative to subscription-based add-ons such as Microsoft 365 Copilot and Gemini for Google Workspace. Unlike those competitors, there are no per-seat licensing costs: you can create as many accounts with access to Assistant as you need, and simply pay as you go for the tokens used. You can specify which users have access to Assistant, and integrate with an existing LDAP/AD or Entra directory to bring in your groups & organizational structure. This is supported through the LDAP, Nextcloud SSO & SAML, Social Login, and OpenID Connect Nextcloud apps.

These features require additional configuration after Nextcloud is installed, which our consultants for Nextcloud can assist with. Although the documentation for SSO & SAML configuration is paywalled for Enterprise customers, the code itself is open source and available to Nextcloud Community users; a subscription is not required.

Nextcloud Assistant integration with Azure OpenAI, Amazon Bedrock, Google Vertex AI

Because the Nextcloud “OpenAI and LocalAI integration” backend expects an OpenAI-compatible API endpoint that supports the /v1/models and /v1/chat/completions paths, out of the box it can only be integrated directly with models served behind an OpenAI-compatible API, such as OpenAI, Azure OpenAI, and Ollama.
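
As a quick sanity check before pointing Nextcloud at an endpoint, you can query both paths directly. The URL and key below are placeholders for your own endpoint:

    # List the models the endpoint serves (OpenAI-compatible path)
    curl -s -H "Authorization: Bearer $API_KEY" \
      https://inference.example.com/v1/models

    # Send a minimal chat completion request
    curl -s https://inference.example.com/v1/chat/completions \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'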

To integrate other models (e.g. Claude) or inference providers (e.g. Bedrock or Vertex AI) that use a different API, it is necessary to deploy a middleware proxy such as LiteLLM Proxy as a “translation layer.” With this LLM gateway, Nextcloud Assistant can plug into 100+ LLMs from inference providers including Anthropic, Amazon Bedrock, and Google Vertex AI.
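
As an illustrative sketch, a minimal LiteLLM config.yaml mapping a Bedrock-hosted Claude model and a Vertex AI Gemini model to OpenAI-compatible model names could look like the following. The model IDs, region, and project are example values to replace with your own:

    model_list:
      - model_name: claude-3-5-sonnet        # the name Nextcloud Assistant will see
        litellm_params:
          model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
          aws_region_name: us-east-1         # example region
      - model_name: gemini-pro
        litellm_params:
          model: vertex_ai/gemini-1.5-pro
          vertex_project: my-gcp-project     # placeholder GCP project ID
          vertex_location: us-central1

    litellm_settings:
      drop_params: true    # drop OpenAI parameters a provider does not support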

The LiteLLM gateway can be deployed as a container on the same server as your Nextcloud instance, or as a standalone service on a separate server. The ideal deployment method depends on the scale of your Nextcloud deployment (single server or clustered) and whether it resides on virtual machines or pods in a Kubernetes cluster. For assistance with deploying LiteLLM as a proxy to your preferred LLM and inference provider, contact our AI and Data Infrastructure consultants.
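
For a single-server deployment, one possible approach is to run the proxy in Docker next to Nextcloud and point the “OpenAI and LocalAI integration” app at it. The credentials and paths below are placeholders:

    # Run LiteLLM Proxy with the config above (it listens on port 4000 by default)
    docker run -d --name litellm -p 4000:4000 \
      -v $(pwd)/config.yaml:/app/config.yaml \
      -e AWS_ACCESS_KEY_ID=<key-id> \
      -e AWS_SECRET_ACCESS_KEY=<secret-key> \
      -e LITELLM_MASTER_KEY=sk-<random-string> \
      ghcr.io/berriai/litellm:main-latest --config /app/config.yaml

    # In Nextcloud: Administration settings > Artificial intelligence >
    # OpenAI and LocalAI integration, set the service URL to
    # http://<litellm-host>:4000/v1 and the API key to the master key above.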

Optimizing the Nextcloud AI Worker Service for Performance

By default, Nextcloud Assistant runs AI tasks using the single cron job (cron.php) that is set up with every Nextcloud installation. This is less than ideal: because cron.php only runs every 5 minutes, it can take up to 5 minutes for a user’s request to even begin processing. That might be acceptable for background tasks such as text generation or audio transcription, but it is too slow for the “chat with AI” experience. Also, as the number of users simultaneously interacting with Nextcloud Assistant increases, it becomes more likely that tasks will queue up behind this single path of execution.

To provide a more responsive experience, Nextcloud 30 & above supports configuring AI workers as a systemd service to speed up AI task pick-up between the regular cron execution intervals. The standard recommendation in the documentation is to run 4 workers in parallel with a 60-second timeout, but these values can be tuned for your environment’s expected load.
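
A sketch of such a template unit, assuming a standard installation under /var/www/nextcloud running as www-data (adjust the paths, user, worker count, and timeout for your environment, and treat the Nextcloud admin documentation as the authoritative version):

    # /etc/systemd/system/nextcloud-ai-worker@.service
    [Unit]
    Description=Nextcloud AI worker %i
    After=network.target

    [Service]
    ExecStart=/usr/bin/php /var/www/nextcloud/occ background-job:worker -t 60 'OC\TaskProcessing\SynchronousBackgroundJob'
    Restart=always
    User=www-data

    [Install]
    WantedBy=multi-user.target

Four parallel workers can then be enabled and started with:

    for i in {1..4}; do systemctl enable --now nextcloud-ai-worker@$i.service; done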

We recommend that every Nextcloud administrator deploying Assistant configure the AI worker service before making the feature available to any users, to prevent users from mistakenly thinking that their request is “hung” and repeatedly resubmitting their prompt.

Nextcloud AI worker systemd service

Context Chat and its Dependency on AppAPI (ExApps)

Context Chat, a retrieval-augmented generation (RAG) feature of Nextcloud Assistant that enables your users to ask questions about their documents through natural-language prompts, also requires a deploy daemon to be configured for AppAPI.

AppAPI, introduced in Nextcloud 27, allows apps that extend Nextcloud to be written in languages other than PHP, by deploying external apps (ExApps) as Docker containers. The deploy daemon is a “helper container” which exposes the Docker Engine socket on your server to your Nextcloud instance through a bind mount, so that AppAPI can install ExApps from the Nextcloud app store. Because it allows anyone with access to it to run arbitrary code on your server, the deploy daemon should not be exposed to the network at all (for a single Nextcloud server), or exposed only to a trusted, local network (for a clustered Nextcloud deployment).
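
For the single-server case where the socket never leaves the machine, registering a local deploy daemon can be as simple as the following (argument order is taken from the AppAPI documentation; verify against occ app_api:daemon:register --help, and substitute your own Nextcloud URL):

    # Register a deploy daemon that talks to the local Docker socket
    php occ app_api:daemon:register docker_local "Docker Local" docker-install \
      http /var/run/docker.sock https://nextcloud.example.com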

In a networked setup, the deploy daemon should be secured with a strong HAProxy password, and have TLS certificates configured to encrypt the connection. If the certificates are self-signed, they should be manually added to the certificate store of the Nextcloud servers so that they are trusted.
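
As a sketch of this hardened setup, the AppAPI deploy daemon container can be launched with a password and the socket bind-mounted, then registered over HTTPS. The image tag, port mapping, cert paths, and flags below are assumptions to verify against the AppAPI documentation:

    # Run the AppAPI deploy daemon (HAProxy in front of the Docker socket)
    docker run -d --name nextcloud-appapi-dsp \
      -e NC_HAPROXY_PASSWORD="<strong-password>" \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /path/to/certs:/certs \
      -p 8443:2375 \
      --restart unless-stopped \
      ghcr.io/nextcloud/nextcloud-appapi-dsp:release

    # Register the remote daemon in Nextcloud over HTTPS
    php occ app_api:daemon:register docker_remote "Remote Docker" docker-install \
      https <daemon-host>:8443 https://nextcloud.example.com \
      --haproxy_password="<strong-password>"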

Nextcloud AppAPI “Deploy Daemon” for external apps

Professional Nextcloud & Nextcloud Assistant Deployment

If you are looking at deploying Nextcloud Assistant as a self-hosted AI solution for your customers, get in touch with our consultants for Nextcloud, who can help you plan the architecture of all the services needed and configure them properly. This ensures the best experience for your users and minimizes the errors they may encounter in some apps during the initial deployment. We can also recommend inference engines or services that meet the privacy & security requirements of your use case or industry.

Nextcloud Assistant is currently the only AI assistant on the market that is integrated with an open source groupware suite which can be self-hosted in any environment. Since 2016, Autoize has been assisting organizations, including international NGOs, top research universities, and companies in the financial services sector, with their Nextcloud deployments in public & hybrid cloud environments. Get in touch with us and we would be pleased to discuss your project and provide unbiased advice on which Nextcloud apps and hosting options are most suitable for your needs.
