
How to Feed a Long Document into Gemini's Long Context

Use Gemini's large context window to analyze a whole book or a stack of PDFs in one prompt instead of chunking them.


Gemini's long context window lets you put an entire book, a long transcript, or many PDFs into a single request, so the model can answer across the whole thing without you splitting it into pieces. This guide loads a large PDF and asks questions that span the full document.

What you need

A Gemini API key
The google-genai Python SDK installed
One or more long PDFs or text files
Awareness that very large inputs cost more tokens per request

Step 1: Upload the document

For files over a few megabytes, upload through the File API rather than inlining the bytes. Smaller PDFs can be passed inline, but the File API is the reliable path for long documents.

long_doc.py

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

doc = client.files.upload(file="annual-report.pdf")
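
The inline-versus-File-API choice can be sketched as a simple size check. This is a minimal sketch: the 4 MB cutoff is an assumed placeholder standing in for "a few megabytes", not a documented limit, and `pick_upload_path` is a hypothetical helper name.

```python
import os

# Assumed cutoff standing in for "a few megabytes"; tune it for your setup.
INLINE_THRESHOLD_BYTES = 4 * 1024 * 1024


def pick_upload_path(path: str, threshold: int = INLINE_THRESHOLD_BYTES) -> str:
    """Return "inline" for small files and "file_api" for larger ones."""
    size = os.path.getsize(path)  # file size in bytes
    return "inline" if size <= threshold else "file_api"
```

When the helper returns "file_api", upload with client.files.upload(file=path) as shown above. When it returns "inline", the file's bytes would go directly into the request instead.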

Step 2: Ask a question that spans the whole file

The point of long context is questions that need information from many places at once. Ask the model to compare sections, trace a theme, or pull every mention of something across hundreds of pages.

long_doc.py

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        doc,
        "List every figure mentioned for total revenue across all "
        "quarters, with the page number for each.",
    ],
)
print(response.text)

Terminal - cross-document answer

$ python long_doc.py
Q1 revenue: 4.2M (p.12)
Q2 revenue: 4.9M (p.18)
Q3 revenue: 5.1M (p.24)
Q4 revenue: 6.0M (p.31)

One request, answers pulled from across the entire PDF.

Step 3: Check how many tokens you used

Long inputs add up fast. Read the usage metadata on the response to see how many tokens the document consumed, which helps you estimate cost and decide whether to enable context caching for repeat queries.

long_doc.py

u = response.usage_metadata
print("prompt tokens:", u.prompt_token_count)
print("output tokens:", u.candidates_token_count)
print("total tokens:", u.total_token_count)
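
To turn those counts into a rough cost figure, a small calculation like the one below may help. It is only a sketch: `estimate_cost_usd` is a hypothetical helper, and the per-million-token prices are arguments you supply from the current pricing page, since no rates are assumed here.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from its token counts.

    Prices are in USD per one million tokens and must be supplied by the
    caller; this function does not know any real rates.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost
```

With the usage metadata from above, you could call it as estimate_cost_usd(u.prompt_token_count, u.candidates_token_count or 0, input_price, output_price), where input_price and output_price are the rates you looked up yourself.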

Reuse the same doc cheaply

If you will ask many questions about the same large file, set up context caching. You pay to process the document once, then each follow-up question reuses the cache at a lower rate instead of re-reading the whole thing.
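
A minimal sketch of that flow follows. It reflects my understanding of the google-genai caching interface (client.caches.create, and a cached_content entry in the request config); check the current SDK reference before relying on it. The helper names, the one-hour TTL, and the plain-dict configs are assumptions made for illustration.

```python
def cache_document(client, doc, model="gemini-2.5-pro", ttl="3600s"):
    """Create a cache holding the uploaded file (assumed one-hour lifetime)."""
    return client.caches.create(
        model=model,
        config={"contents": [doc], "ttl": ttl},
    )


def ask_cached(client, cache, question, model="gemini-2.5-pro"):
    """Ask a follow-up question that reuses the cached document."""
    response = client.models.generate_content(
        model=model,  # should match the model the cache was created with
        contents=question,
        config={"cached_content": cache.name},
    )
    return response.text
```

Here `client` and `doc` are the objects created in Step 1. Create the cache once with cache_document(client, doc), then call ask_cached for each new question.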

Heads up

Accuracy can dip when the answer depends on a single sentence buried in a huge input. For high-stakes lookups, ask the model to quote the exact passage and page so you can verify it.
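
Once the model returns a quoted passage and a page number, the check itself can be automated. The sketch below assumes you have already extracted the PDF's text into a dict keyed by 1-based page number using a PDF library of your choice; `quote_on_page` is a hypothetical helper, and its whitespace- and case-insensitive matching is a design choice, not a requirement.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase so line breaks don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_on_page(quote: str, pages: dict[int, str], page_number: int) -> bool:
    """Return True if the quoted passage appears on the cited page.

    `pages` maps 1-based page numbers to that page's extracted text.
    An empty quote or an unknown page number counts as not found.
    """
    needle = _normalize(quote)
    haystack = _normalize(pages.get(page_number, ""))
    return bool(needle) and needle in haystack
```

A False result does not prove the model was wrong, since PDF text extraction can differ from what the model saw, but it flags the answers worth checking by hand.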

Result

You get answers that draw on the entire document in one shot, complete with page references, without writing any chunking or retrieval code. For one-off deep reads this is far simpler than building a retrieval pipeline.