Cursor
A bot that handles voice notes and images, not just text
Multimodal Telegram Bot: Voice, Photos, Commands
setuproll (@setuproll) · 84.0 overall score
A Gemini 2.5 Pro Telegram bot that transcribes voice notes, reads photos, and replies with natural speech via ElevenLabs, all wired behind clean slash commands. Great for builders who want a hands-free, multimodal assistant that feels modern on a phone.
1.2k votes · 5 components
Install this build
pip install aiogram google-genai elevenlabs && python bot.py
Components
Model
- Gemini 2.5 Pro
Stack
- aiogram
- ElevenLabs
- Python
Modalities
- Voice transcription
- Image understanding
- Text-to-speech reply
How it works
- Register /ask, /image, and /voice commands with BotFather
- Send voice notes and photos straight to Gemini 2.5 Pro for transcription and image understanding
- Generate the answer, then synthesize an audio reply with ElevenLabs
- Return text plus an optional voice message in the same chat
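The four steps above can be sketched as a single dispatch function. This is a minimal sketch, not the build's actual bot.py: the command-to-attachment mapping (/voice carries a voice note, /image a photo, /ask is text-only) is an assumption, and the `model` and `tts` callables are hypothetical stand-ins injected so the routing logic runs without network access. In a real bot, `model` could wrap google-genai's `client.models.generate_content`, with the attachment passed via `types.Part.from_bytes(data=..., mime_type=...)`, and `tts` would wrap an ElevenLabs text-to-speech call. That wiring is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for the SDK calls: (prompt, media bytes, MIME type) -> answer text,
# and answer text -> audio bytes. Injecting them keeps the routing testable offline.
ModelCall = Callable[[str, Optional[bytes], Optional[str]], str]
TtsCall = Callable[[str], bytes]

# Assumed mapping: Telegram voice notes are OGG/Opus, photos arrive as JPEG,
# and /ask is text-only.
MIME_BY_COMMAND: Dict[str, Optional[str]] = {
    "/voice": "audio/ogg",
    "/image": "image/jpeg",
    "/ask": None,
}


@dataclass
class Reply:
    text: str
    audio: Optional[bytes] = None  # set only when a TTS callable is supplied


def handle(command: str, prompt: str, media: Optional[bytes],
           model: ModelCall, tts: Optional[TtsCall] = None) -> Reply:
    """Send one slash command's input to the model, then optionally voice the answer."""
    if command not in MIME_BY_COMMAND:
        raise ValueError(f"unknown command: {command}")
    mime = MIME_BY_COMMAND[command]
    if mime is not None and media is None:
        raise ValueError(f"{command} needs an attachment")
    # For text-only commands, any attachment is dropped before calling the model.
    answer = model(prompt, media if mime is not None else None, mime)
    audio = tts(answer) if tts is not None else None
    return Reply(text=answer, audio=audio)
```

Keeping the SDK calls behind plain callables means each aiogram handler only has to download the attachment, call `handle`, and send `reply.text` plus, when present, `reply.audio` back to the same chat.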
Deploy
- VPS with long polling
- systemd service
- Per-user rate limits
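Per-user rate limiting can be sketched as a token bucket keyed by Telegram user id. The build does not say which limiting scheme it uses, so the class name, the in-memory dict, and the `capacity`/`rate` parameters are hypothetical choices for illustration. The clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class PerUserRateLimiter:
    """Token bucket per user id: bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        # user_id -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[int, Tuple[float, float]] = {}

    def allow(self, user_id: int) -> bool:
        """Consume one token for this user if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[user_id] = (tokens - 1, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

In an aiogram handler this would be keyed on `message.from_user.id`, replying with a short notice when `allow` returns False. Because the buckets live in process memory, they reset on restart; that suits a single long-polling process under systemd, but running several workers would need shared storage.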