GPUaaS: Private, serverless transcription tool for confidential audio recordings

July 1, 2026

GPUaaS, Azure, Cloud, Azure Bicep, Azure Container app, Scale-to-zero, OpenAI/Whisper

Automatic speech recognition (ASR)

AI transcription has become a standard part of the workflow in Norwegian journalism, used for turning raw audio into text, generating subtitles, and translating. Most platforms on the market bundle those functions with editing and publishing, and they are aimed at newsrooms that can send their recordings wherever the tool happens to be hosted.

The problem with third-party transcription services

For many companies, confidentiality is the real constraint, and it rules out most of the obvious options before quality even comes into it. Public transcription services typically require some combination of the same things: an account, a file upload, and storage on their servers. That is a risk on its own, no matter how good the transcription is. Most of them also work far better in English than in Norwegian, especially once dialects and local nuance are involved. They run on general, publicly available models such as OpenAI’s, accessed through an API, and those models struggle with Norwegian dialects the way any non-native system does.

In-house solution

NB-AI-Lab’s models, trained specifically for Norwegian, are the ones that actually handle the language well. Their license would have allowed a hosted API, but the issue was sending confidential audio to any third party at all, licensed or not, and that was off the table from the start.

Schibsted had already solved a version of this with JoJo, a publicly known app that wraps NB-AI-Lab’s model in a UI shell running locally on a MacBook. It works, but only if you have a MacBook, and even then performance depends on the machine’s specs. It is a single-user, single-machine tool, which doesn’t scale to a team unless everyone owns matching hardware. Running a dedicated cloud GPU 24/7, or a full VM just to cover that gap, wasn’t affordable either for something used only intermittently.

GPU-as-a-Service

The result was a web application hosted on a private Azure environment, backed by serverless GPU compute with scale-to-zero. The model only runs, and only costs money, while it’s actually transcribing. File storage, authentication, and the job queue sit on infrastructure controlled internally, with access locked to verified accounts.
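The scale-to-zero behaviour can be illustrated with a small scaling rule. This is a minimal sketch and not the platform's actual autoscaler: the function name `desired_replicas` and its parameters are hypothetical, and it assumes that the replica count is derived from the number of active jobs (queued plus running).

```python
import math


def desired_replicas(active_jobs: int, jobs_per_replica: int = 1, max_replicas: int = 1) -> int:
    """Return how many GPU replicas should run for the given number of active jobs.

    active_jobs counts queued and running jobs together, so a replica is not
    removed while it is still transcribing. All names here are illustrative.
    """
    if active_jobs < 0:
        raise ValueError("active_jobs cannot be negative")
    if jobs_per_replica < 1 or max_replicas < 1:
        raise ValueError("jobs_per_replica and max_replicas must be at least 1")
    if active_jobs == 0:
        # Nothing to transcribe: scale to zero so no GPU time is consumed.
        return 0
    # Otherwise run just enough replicas for the load, capped at the maximum.
    return min(max_replicas, math.ceil(active_jobs / jobs_per_replica))
```

With the defaults, an empty queue yields zero replicas and any non-empty queue yields exactly one, which matches a single GPU that exists only while there is work to do.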

The GPU takes a short while to spin up from a cold state, which is the direct tradeoff for not paying to keep it running around the clock. Accepting that cold start is also what makes serious hardware, an H100, affordable, since there is no cost for leaving it idle. Once it’s up, transcription itself runs fast, and the app handles multiple incoming jobs.
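The cost tradeoff can be made concrete with some simple arithmetic. The sketch below is illustrative only: the hourly rate, job counts, and durations are hypothetical placeholders rather than figures from this project, and it assumes the worst case in which every job pays for a full cold start.

```python
def always_on_cost(hourly_rate: float, hours_in_month: float = 730.0) -> float:
    """Monthly cost of a GPU that is billed around the clock."""
    return hourly_rate * hours_in_month


def scale_to_zero_cost(
    hourly_rate: float,
    jobs_per_month: int,
    minutes_per_job: float,
    cold_start_minutes: float,
) -> float:
    """Monthly cost when the GPU is billed only while starting up or transcribing.

    Assumes the worst case: no two jobs share a warm GPU, so each job
    is charged its own cold start in addition to its run time.
    """
    billed_hours = jobs_per_month * (minutes_per_job + cold_start_minutes) / 60.0
    return hourly_rate * billed_hours


if __name__ == "__main__":
    rate = 10.0  # hypothetical price per GPU hour, not a quoted Azure rate
    print("always on:    ", always_on_cost(rate))
    print("scale to zero:", round(scale_to_zero_cost(rate, 200, 5, 2), 2))
```

The gap between the two numbers shrinks as usage grows, so the pattern pays off mainly when the GPU would otherwise sit idle for most of the month.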

The interface shows what’s happening at each stage (starting, queued, running), with live progress on the transcription itself. Output is available in several formats, and every job gets a saved link to its results. Files and transcripts are retained only for a limited period before automatic deletion, so nothing confidential sits around indefinitely.
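One way to model the job stages and the retention rule is a small state machine with an expiry check. This is a sketch under stated assumptions, not the application's actual code: the `DONE` and `FAILED` states, the transition order, and the seven-day retention window are hypothetical, since the article names only the three visible stages and a "limited period".

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class JobState(Enum):
    STARTING = "starting"
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"      # assumed terminal state, not named in the article
    FAILED = "failed"  # assumed terminal state, not named in the article


# Assumed transition order; any stage may also fail.
ALLOWED = {
    JobState.STARTING: {JobState.QUEUED, JobState.FAILED},
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, rejecting transitions the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_expired(finished_at: datetime, now: datetime,
               retention: timedelta = timedelta(days=7)) -> bool:
    """True once a job's files and transcript are due for automatic deletion."""
    return now - finished_at >= retention


if __name__ == "__main__":
    state = advance(JobState.STARTING, JobState.QUEUED)
    state = advance(state, JobState.RUNNING)
    print(state.value)
    finished = datetime(2026, 7, 1, tzinfo=timezone.utc)
    print(is_expired(finished, finished + timedelta(days=8)))
```

A periodic cleanup task could call `is_expired` for each finished job and delete the stored audio and transcript when it returns true.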

Impact

The result is a browser-based tool usable by the whole team, with no dependency on individual hardware, no files leaving internal infrastructure, and a cost that scales with actual use instead of uptime. It replaced hours of manual transcription on material that couldn’t go through a public service in the first place, without the recurring cost of a GPU sitting idle most of the day.

Future possibilities with GPUaaS

The serverless GPU pattern isn’t tied to transcription specifically. It works for any model that’s too expensive to keep online 24/7 but too sensitive to send through a third party. The same scale-to-zero approach could support other confidential AI workloads: document classification or redaction on sensitive files; image or video analysis that can’t leave internal infrastructure; fine-tuned models for internal search or summarization; or heavier one-time jobs like batch OCR that only need a GPU for a few hours a month. Any team with a similar shape of problem, a real GPU need but low and irregular usage, could reuse the same job queue, authentication, and storage pattern instead of building it from scratch.