
How to Auto-Transcribe Audio Files with Whisper in Make.com

Drop audio into a cloud folder and let Make transcribe it with Whisper, then store the text automatically.

7 min · Intermediate

Voice notes and call recordings pile up untranscribed. This scenario watches a Google Drive folder, sends each new audio file to OpenAI Whisper, and saves the transcript as a text file in a separate transcripts folder.

A Make.com account
A Google Drive folder for audio uploads
An OpenAI API key (the whisper-1 model is available through the standard API and is billed by audio length)
Audio files in a supported format such as mp3, m4a, or wav

Step 1: Watch the upload folder

Add the Google Drive Watch Files in a Folder module. Select the folder where you will drop recordings. Choose to watch by Created date so each new file triggers exactly once.
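To see why watching by Created date fires once per file, here is a rough Python sketch of a polling trigger that remembers the newest creation time it has seen. It is not Make's actual implementation; `DriveFile`, `poll_new_files`, and the sample folder contents are hypothetical names and data used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DriveFile:
    """Hypothetical stand-in for a file record returned by the watcher."""
    file_id: str
    name: str
    created: datetime


def poll_new_files(files, last_seen):
    """Return files created after last_seen (oldest first) and the new cursor."""
    fresh = sorted(
        (f for f in files if f.created > last_seen),
        key=lambda f: f.created,
    )
    # Advance the cursor to the newest creation time so these files
    # are not returned again on the next poll.
    new_cursor = fresh[-1].created if fresh else last_seen
    return fresh, new_cursor


# Made-up folder contents for the example.
folder = [
    DriveFile("a1", "standup.m4a", datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)),
    DriveFile("b2", "client-call.mp3", datetime(2024, 5, 1, 10, 30, tzinfo=timezone.utc)),
]
cursor = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)

first_run, cursor = poll_new_files(folder, cursor)   # picks up client-call.mp3
second_run, cursor = poll_new_files(folder, cursor)  # nothing new, so empty
```

A file's creation time stays fixed, so renaming or editing a recording later would not move it past the cursor again in this sketch. That is the reason to prefer Created over Modified here.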

Step 2: Download the file binary

Whisper needs the actual file data, not just a link. Add a Google Drive Download a File module right after the watcher and map the file ID from the trigger. This passes the binary along the flow.

Make.com - data flow

Drive Watch Folder -> Drive Download -> OpenAI Whisper -> Drive Create file
new file              get binary        transcribe        save .txt

Step 3: Add the Whisper module

Add OpenAI Create a Transcription (Whisper). For the file input, map the data from the Download module. Leave the model as whisper-1. If you know the language, set it as an ISO 639-1 code (for example, en for English) so Whisper can skip auto detection, which can improve both speed and accuracy.
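For readers curious what this module roughly does behind the scenes, the sketch below builds (but does not send) a multipart request to OpenAI's audio transcription endpoint using only the Python standard library. It is an approximation under stated assumptions, not Make's internal code: `build_transcription_request`, the placeholder key, and the dummy bytes are illustrative.

```python
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(api_key, filename, audio_bytes, language=None):
    """Build (but do not send) a multipart POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    fields = {"model": "whisper-1"}
    if language:
        fields["language"] = language  # ISO 639-1 code such as "en"

    parts = []
    for key, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio travels as raw bytes in the "file" part, not as a link.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder key and dummy bytes; nothing is sent over the network here.
request = build_transcription_request(
    "sk-placeholder", "memo.mp3", b"\x00\x01", language="en"
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` with a real key and real audio bytes. With the default response format, the reply is JSON whose `text` field holds the transcript, which is the value the later file-creation step maps.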

Mind the size limit

Whisper rejects files larger than 25 MB per request. For long recordings, add a filter that only passes smaller files, or split long audio before uploading to the watched folder.
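The filter condition can be pictured as a simple size check. The Python sketch below is illustrative only: it assumes the 25 MB limit means 25 * 1024 * 1024 bytes, and `fits_in_one_request`, `chunks_needed`, and the 20 MB chunk size are hypothetical choices rather than anything Make or OpenAI prescribes.

```python
import math

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_in_one_request(size_bytes):
    """True when the file is small enough to send in a single request."""
    return size_bytes <= MAX_UPLOAD_BYTES


def chunks_needed(size_bytes, chunk_bytes=20 * 1024 * 1024):
    """Rough count of pieces if the audio is split into chunks of chunk_bytes."""
    return max(1, math.ceil(size_bytes / chunk_bytes))


voice_memo = 8 * 1024 * 1024      # 8 MB clip
long_meeting = 64 * 1024 * 1024   # 64 MB recording

memo_ok = fits_in_one_request(voice_memo)
meeting_ok = fits_in_one_request(long_meeting)
meeting_parts = chunks_needed(long_meeting)
```

In the scenario itself, the same comparison goes into a filter between the watcher and the Whisper module, using the file size reported by the trigger. The chunk count is only an estimate for planning: real audio should be split at time boundaries with an audio tool, not by cutting bytes.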

Step 4: Save the transcript

Add a Google Drive Create a File module. Set the file name to the original name with a .txt suffix, set the content to the Whisper text output, and point it at a transcripts folder.

Create file mapping

File name: {{1.name}}.txt
Content:   {{3.text}}
Folder:    /Transcripts
Convert a document: No
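Note that appending .txt to the full original name keeps the audio extension, so memo.mp3 becomes memo.mp3.txt. The short Python sketch below mirrors that naming rule and shows one possible alternative that swaps the extension instead. Both function names are hypothetical and exist only to make the behavior concrete.

```python
import os.path


def transcript_name(original_name):
    """Mirror the mapping {{1.name}}.txt: append .txt to the full original name."""
    return f"{original_name}.txt"


def transcript_name_without_audio_ext(original_name):
    """Alternative: replace the audio extension instead of appending to it."""
    stem, _extension = os.path.splitext(original_name)
    return f"{stem}.txt"


appended = transcript_name("memo.mp3")
swapped = transcript_name_without_audio_ext("memo.mp3")
dotted = transcript_name_without_audio_ext("team.sync.m4a")
```

Keeping the audio extension in the transcript name has one practical benefit: memo.mp3 and memo.wav would not collide on the same memo.txt.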

Step 5: Test it

Upload a short clip to the watched folder and run the scenario once. Confirm a matching .txt file appears in your transcripts folder with readable text inside.

Result: any recording you drop in the folder becomes a searchable text transcript the next time the scenario runs, with no manual typing. How quickly that happens depends on the schedule interval you set for the scenario.