How to Transcribe and Summarize Audio with AI in n8n
Send an audio file to a speech-to-text model, then summarize the transcript with an LLM to turn meetings and voice notes into action items.
Voice notes and meeting recordings are full of decisions nobody writes down. This workflow takes an audio file, transcribes it with a speech-to-text model, and summarizes the transcript into key points and action items. Drop a file in and get notes out.
What you need
- A running n8n instance with an OpenAI credential
- An audio file (mp3, m4a, or wav) under the model's upload size limit (OpenAI documents a 25 MB cap for its transcription endpoint)
- A destination for the notes, such as Notion, a document, or email
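A quick local check before uploading can save a failed run. The sketch below is a standalone Python helper, not part of n8n. It flags files whose extension is outside the three formats listed above, and files that are empty or larger than an assumed 25 MB limit. `MAX_BYTES` and `check_audio_file` are illustrative names, and the limit is an assumption to verify against your model's current documentation.

```python
from pathlib import Path

# Assumption: 25 MB mirrors OpenAI's documented transcription upload cap.
# Adjust if your model or provider allows something different.
MAX_BYTES = 25 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}  # formats named in this guide


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems with the file; an empty list means it looks usable."""
    problems = []
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
        return problems
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append(
            f"file is {size / 1_048_576:.1f} MB, "
            f"over the assumed {MAX_BYTES // 1_048_576} MB limit"
        )
    return problems
```

Run it on a recording before dropping it into the workflow. Files over the limit need to be compressed or split first.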
Step 1: Bring the audio into the workflow
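Use a trigger or node that supplies the file as binary data. Options include a Webhook node receiving an upload, a Google Drive Trigger watching a folder, or a Read Files from Disk node for testing. If you use the Google Drive Trigger, follow it with a Google Drive node that downloads the file, because the trigger itself reports file metadata rather than the file contents. Whichever source you pick, the audio should arrive as a binary property on the item.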
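If you go the webhook route, something has to upload the file. The sketch below uses only the Python standard library to build a multipart/form-data POST carrying one audio file. It assumes the Webhook node accepts POST requests. `WEBHOOK_URL` and the form field name `data` are hypothetical placeholders. How n8n names the resulting binary property can vary with node version and options, so check the Webhook node's output once before wiring Step 2.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Hypothetical URL: copy the real test or production URL from your Webhook node.
WEBHOOK_URL = "https://your-n8n.example.com/webhook/transcribe"


def build_upload_request(
    path: str, url: str = WEBHOOK_URL, field: str = "data"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one audio file."""
    p = Path(path)
    boundary = uuid.uuid4().hex  # random separator that will not appear in the audio
    mime = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{p.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + p.read_bytes() + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To actually send it, pass the request to `urllib.request.urlopen(build_upload_request("standup.m4a"))` and read the response.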
Step 2: Transcribe with the OpenAI node
Add an OpenAI node and set Resource to Audio and Operation to Transcribe a Recording. Point it at the binary property holding the audio (for example data). The node returns the transcript as plain text in the text field of its output, which is what the next step reads with $json.text.
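Under the hood, transcription is a multipart POST to OpenAI's `/v1/audio/transcriptions` endpoint. By default, the endpoint replies with JSON shaped like `{"text": "..."}`. The standalone Python sketch below builds that request without sending it, which is handy for testing a recording outside n8n. It is not how the n8n node is implemented. The model name `whisper-1` is an assumption, because the node may use a different model or extra options. `build_transcription_request` and `transcript_text` are illustrative names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# OpenAI's speech-to-text endpoint; inside n8n the OpenAI node calls it for you.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    path: str, api_key: str, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build (but do not send) the multipart POST that asks OpenAI to transcribe a file."""
    p = Path(path)
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    # Plain form field naming the speech-to-text model (assumed default: whisper-1).
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field that carries the raw audio bytes.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{p.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=model_part + file_head + p.read_bytes() + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcript_text(response_body: bytes) -> str:
    """Return the transcript from the default JSON reply, shaped like {"text": "..."}."""
    return json.loads(response_body)["text"]
```

To call the API for real, open the request with `urllib.request.urlopen` and pass the response body to `transcript_text`. The string it returns is the same text the Step 3 prompt picks up from `$json.text`.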
Step 3: Summarize into notes and actions
Feed the transcript into a second OpenAI node set to Message a Model. Ask for a structured summary so the output is consistent and easy to scan, for example:
Summarize this transcript into:
- TL;DR (2 sentences)
- Key decisions (bullets)
- Action items (owner -> task)
Transcript:
{{ $json.text }}
Step 4: Deliver the notes
Add a destination node such as Notion (Create Page) or Gmail (Send) and map the summary into it. Run the workflow on a real recording and confirm the notes arrive with decisions and action items separated.
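The check at the end of this step can also be scripted. The minimal standalone Python sketch below assumes the model keeps the three headings from the Step 3 prompt. It splits the reply into a TL;DR string, a list of decisions, and (owner, task) pairs, and reports any section that came back empty. The header and bullet patterns are heuristics for common formatting such as "TL;DR:", "## Key decisions" or "**Action items**". They do not describe how the OpenAI node formats its output. `parse_summary` and `missing_sections` are illustrative names.

```python
import re

# Headers come from the Step 3 prompt. Matching is heuristic and tolerates
# markdown decoration, a "(owner -> task)" style note, and an optional colon.
HEADER = re.compile(
    r"^[#*_\s]*(tl;dr|key decisions|action items)(?:\s*\([^)]*\))?[*_\s]*:?[*_\s]*(.*)$",
    re.IGNORECASE,
)
# Leading list markers: "-", "*", "•", "1." or "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")
# "owner -> task" as requested in the prompt; a Unicode arrow is accepted too.
ARROW = re.compile(r"^(.+?)\s*(?:->|→)\s*(.+)$")


def parse_summary(summary: str) -> dict:
    """Split the model's reply into a TL;DR string, decisions, and (owner, task) pairs."""
    sections = {"tl;dr": [], "key decisions": [], "action items": []}
    current = None
    for raw in summary.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(1).lower()
            # Keep text that follows the header on the same line, e.g. "TL;DR: ...".
            if header.group(2).strip():
                sections[current].append(header.group(2).strip())
        elif current is not None:
            sections[current].append(BULLET.sub("", line))
    # Action lines without an arrow are skipped, so a malformed list shows up as missing.
    actions = []
    for item in sections["action items"]:
        pair = ARROW.match(item)
        if pair:
            actions.append((pair.group(1).strip(), pair.group(2).strip()))
    return {
        "tldr": " ".join(sections["tl;dr"]),
        "decisions": sections["key decisions"],
        "actions": actions,
    }


def missing_sections(parsed: dict) -> list:
    """Names of sections that came back empty; a non-empty result means the reply needs a look."""
    return [name for name, value in parsed.items() if not value]
```

If `missing_sections` returns anything for a real recording, tighten the prompt before trusting the notes that land in Notion or Gmail.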
Result
Any recording you drop in comes back as a clean summary with decisions and owned action items. Meetings and voice memos turn into searchable notes without anyone taking minutes.