Cursor
Bot that handles voice notes and images, not just text

Multimodal Telegram Bot: Voice, Photos, Commands

setuproll (@setuproll)
84.0 overall score

A Gemini 2.5 Pro Telegram bot that transcribes voice notes, reads photos, and replies with natural speech via ElevenLabs, all wired behind clean slash commands. Great for builders who want a hands-free, multimodal assistant that feels modern on a phone.

1.2k votes
5 components

Install this build

terminal
pip install aiogram google-genai elevenlabs && python bot.py

Components

Model

  • Gemini 2.5 Pro

Stack

  • aiogram
  • ElevenLabs
  • Python

Modalities

  • Voice transcription
  • Image understanding
  • Text-to-speech reply

How it works

  • Register /ask, /image, and /voice commands with BotFather
  • Send voice notes and photos straight to Gemini 2.5 Pro for understanding
  • Generate the answer, then synthesize an audio reply with ElevenLabs
  • Return text plus an optional voice message in the same chat
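
The flow above can be sketched without any SDK calls, which makes the routing easy to test before wiring in the real services. This is a minimal illustration, not the build's actual bot.py: `handle`, `Reply`, `COMMAND_MIME`, and the two injected callables are hypothetical names, and the MIME types assume Telegram's usual OGG/Opus voice notes and JPEG photos. In a real bot, `understand` would wrap the Gemini 2.5 Pro call and `synthesize` would wrap the ElevenLabs call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins for the real SDK calls: in an actual bot these would
# wrap Gemini 2.5 Pro (understanding) and ElevenLabs (speech synthesis).
Understand = Callable[[str, Optional[bytes], Optional[str]], str]
Synthesize = Callable[[str], bytes]

# Attachment type each slash command expects (an assumed mapping).
COMMAND_MIME = {"/ask": None, "/image": "image/jpeg", "/voice": "audio/ogg"}


@dataclass
class Reply:
    text: str
    audio: Optional[bytes] = None  # set only when a voice reply is requested


def handle(
    command: str,
    prompt: str,
    attachment: Optional[bytes],
    understand: Understand,
    synthesize: Synthesize,
    speak: bool = False,
) -> Reply:
    """Route one command through understanding, then optional speech."""
    if command not in COMMAND_MIME:
        return Reply(text="Unknown command. Try /ask, /image, or /voice.")

    mime = COMMAND_MIME[command]
    if mime is not None and attachment is None:
        return Reply(text=f"{command} needs an attachment.")

    # Text-only commands pass no media to the model.
    media = attachment if mime is not None else None
    answer = understand(prompt, media, mime)

    # The voice reply is optional, so synthesis runs only on request.
    audio = synthesize(answer) if speak else None
    return Reply(text=answer, audio=audio)
```

Passing the two services in as callables keeps the handler independent of any particular SDK version, so the same function can sit behind aiogram handlers for each command.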

Deploy

  • VPS with long polling
  • systemd service
  • Per-user rate limits
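
Per-user rate limits can be as simple as a sliding window keyed by Telegram user ID. The sketch below is one possible approach, not necessarily what this build ships: the class name `PerUserRateLimiter` is hypothetical, and the defaults of 5 requests per 60 seconds are assumed values. State lives in memory, which fits a single long-polling process on one VPS but would reset on restart.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerUserRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per user."""

    def __init__(
        self,
        limit: int = 5,          # assumed default, tune per deployment
        window: float = 60.0,    # assumed default, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[int, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: int) -> bool:
        """Return True and record the request if the user is under the limit."""
        now = self.clock()
        hits = self._hits[user_id]

        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        if len(hits) >= self.limit:
            return False

        hits.append(now)
        return True
```

A handler would call `allow(user_id)` before sending anything to the model and reply with a short "slow down" message when it returns False, so rejected requests never incur API cost.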
