AutomationIntermediate

How to Transcribe and Summarize Audio with AI in n8n

Send an audio file to a speech-to-text model, then summarize the transcript with an LLM to turn meetings and voice notes into action items.

8 minIntermediate

Voice notes and meeting recordings are full of decisions nobody writes down. This workflow takes an audio file, transcribes it with a speech-to-text model, and summarizes the transcript into key points and action items. Drop a file in and get notes out.

What you need

A running n8n instance with an OpenAI credential
An audio file (mp3, m4a, or wav) under the model's size limit
A destination such as Notion, a doc, or email for the notes

Step 1: Bring the audio into the workflow

Use a trigger that supplies the file as binary data: a webhook upload, a Google Drive Trigger watching a folder, or a Read Files from Disk node for testing. The audio should arrive as a binary property on the item.

Step 2: Transcribe with the OpenAI node

Add an OpenAI node, set Resource to Audio and Operation to Transcribe a Recording. Choose the binary property holding the audio (for example data). The node returns the spoken words as plain text.

n8n - OpenAI transcription

Node: OpenAI

Resource Audio

Operation Transcribe a Recording

Input binary: data (meeting.m4a)

Output -> text: "Okay so for the launch we agreed..."

Transcribing the uploaded audio binary into text.

Split long recordings

Transcription has a per-file size limit. For long meetings, split the audio into chunks with an earlier node and transcribe each, then concatenate the transcripts before summarizing.

Step 3: Summarize into notes and actions

Feed the transcript into a second OpenAI node set to Message a Model. Ask for a structured summary so the output is consistent and easy to scan.

OpenAI node - User message

Summarize this transcript into:
- TL;DR (2 sentences)
- Key decisions (bullets)
- Action items (owner -> task)

Transcript:
{{ $json.text }}

Step 4: Deliver the notes

Add a destination node such as Notion (Create Page) or Gmail (Send) and map the summary into it. Run the workflow on a real recording and confirm the notes arrive with decisions and action items separated.

Notes output

Agent

TL;DR: Launch moves to the 14th. Pricing page needs a rewrite first.

Agent

Action items: Maria -> rewrite pricing copy. Sam -> update the launch checklist.

The finished summary with decisions and owners.

Result

Any recording you drop in comes back as a clean summary with decisions and owned action items. Meetings and voice memos turn into searchable notes without anyone taking minutes.