How to Extract Data from Images and Screenshots with Gemini
Hand Gemini a photo of a receipt, whiteboard, or table and get clean structured data back as JSON or a spreadsheet.
Gemini does more than transcribe the text in an image. It can also understand the structure of a receipt, a handwritten note, or a screenshot of a table. This guide turns a photo into structured data you can drop into a database or spreadsheet, first in the chat app and then through the API.
What you need
- A photo or screenshot containing the data (receipt, table, form)
- A Gemini account, or an API key for the programmatic route
- A target format in mind: JSON, CSV, or a table
Step 1: Attach the image and ask for structure
In the chat app, attach the image with the plus icon, then ask for the data in a specific shape. The key is to name every field you want (for a receipt, something like "Extract merchant, date, and total as JSON") so Gemini does not have to guess at the schema.
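If you send the same kind of image often, a small helper keeps the field list and the "leave it empty, never guess" rule from Step 3 identical every time. This is an illustrative sketch: `build_prompt` and its wording are our own choices, not part of any Gemini SDK.

```python
def build_prompt(fields: list[str], fmt: str = "JSON") -> str:
    """Build an extraction prompt that names every field explicitly."""
    if not fields:
        raise ValueError("name at least one field")
    field_list = ", ".join(fields)
    # Naming the fields up front means the model does not invent its own schema.
    return (
        f"Extract the following fields from this image as {fmt}: {field_list}. "
        "If a value is unreadable, leave it empty instead of guessing."
    )
```

For example, `build_prompt(["merchant", "date", "total"])` gives a prompt you can paste into the chat box or pass as the text part of `contents` in an API call.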
Step 2: Enforce a strict JSON schema (API)
For automation you want JSON every time, not prose. The API accepts a response schema (response_schema, together with response_mime_type="application/json") that constrains the model to return data in exactly your structure, so you never have to pick JSON out of free-form text.
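import json
import os
from google import genai
from google.genai import types
# Read the API key from the GEMINI_API_KEY environment variable.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
# Upload the image through the Files API so the request can reference it.
img = client.files.upload(file="receipt.jpg")
# The exact shape the model must return; "required" stops it from omitting fields.
schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["merchant", "date", "total"],
}
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[img, "Extract the receipt details."],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",  # JSON output, not prose
        response_schema=schema,                 # constrained to the schema above
    ),
)
# resp.text is a JSON string that matches the schema; parse it into a dict.
receipt = json.loads(resp.text)
print(receipt)
Step 3: Handle multi-row tables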
For a screenshot of a table, ask for an array of row objects and name every column. Tell Gemini to leave a field empty rather than invent a value when a cell is blurry or cut off.
Read this table screenshot. Return an array of objects with keys:
name, role, email. If a cell is unreadable, use an empty string,
never guess.
Result
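You get structured, parseable data from a plain photo. With response_schema in the API, the output consistently has the shape you defined, so it can feed straight into a spreadsheet, an invoice tool, or a database insert with little manual cleanup. The schema controls structure, not accuracy, so it is still worth spot-checking values such as totals and dates read from blurry images.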
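As a sketch of the spreadsheet route for the Step 3 table, the code below assumes an array schema (our own `TABLE_SCHEMA`, written in the same dict style as the receipt schema in Step 2) and a hypothetical `rows_to_csv` helper built on the standard library. It converts the returned JSON array to CSV and also returns any row with an empty cell, since that is how the Step 3 prompt marks unreadable values, so a person can review those rows.

```python
import csv
import io
import json

# Hypothetical array schema for the Step 3 table; pass it as response_schema
# the same way the receipt schema is passed in Step 2.
TABLE_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "role": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name", "role", "email"],
    },
}

# Column order follows the schema's property order: name, role, email.
FIELDS = list(TABLE_SCHEMA["items"]["properties"])


def rows_to_csv(raw_json: str) -> tuple[str, list[dict]]:
    """Turn a JSON array of row objects into CSV text.

    Returns (csv_text, rows_to_review). A row is added to rows_to_review
    when any of its cells is empty.
    """
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of row objects")

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()

    rows_to_review = []
    for row in rows:
        # Normalize each cell to a trimmed string; a missing key or null
        # becomes "" so every CSV line has all three columns.
        clean = {}
        for key in FIELDS:
            value = row.get(key)
            clean[key] = "" if value is None else str(value).strip()
        writer.writerow(clean)
        if any(cell == "" for cell in clean.values()):
            rows_to_review.append(clean)

    return buf.getvalue(), rows_to_review
```

In practice `raw_json` would be `resp.text` from a `generate_content` call configured with `response_schema=TABLE_SCHEMA`. Keeping flagged rows in the CSV, rather than dropping them, means nothing from the screenshot silently disappears.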