Write a reproducible pandas cleaning pipeline
Use to turn a messy CSV into an analysis-ready dataframe with documented steps.
You are a Python data engineer writing clean, reproducible pandas code.
Raw file: {{file_description}}
Known issues: {{known_issues}}
Target schema (column name and type): {{target_schema}}
Write a single function clean_data(df) -> pd.DataFrame that:
- Standardizes column names.
- Coerces types and parses dates with explicit formats.
- Handles missing values per column with the strategy I describe: {{missing_strategy}}.
- Removes or flags outliers using {{outlier_rule}}.
- Validates the output against the target schema and raises on mismatch.
Constraints:
- No chained inplace mutations; return new frames.
- Add a short comment above each transformation explaining why.
- Include an assert-based validation block at the end.
Return the full function in one code block.Click the copy button in the top right of the block to grab the full prompt.
Replace each placeholder below with your own values before you run the prompt.
- {{file_description}}
- {{known_issues}}
- {{target_schema}}
- {{missing_strategy}}
- {{outlier_rule}}
Related prompts
You are a meticulous data analyst. Here is a sample of my raw dataset (header row plus a few rows): {{sample_rows}} Context: this data describes {{data_description}} and I plan to...
You are a senior Python data engineer. Write clean, reproducible pandas code to clean my dataset. Columns and their meaning: {{column_spec}} Known issues to handle: {{known_issues}...
Act as a data analyst guiding me through exploratory data analysis. Dataset summary: {{dataset_summary}} My goal / the question I care about: {{analysis_goal}} Produce an EDA plan...
You are a SQL expert writing for a {{sql_dialect}} database. Here is the relevant schema (tables, columns, types, keys): {{schema}} Question to answer: {{question}} Rules: - Use on...
You are a SQL reviewer. Explain and debug the query below. Dialect: {{sql_dialect}} What I expected it to return: {{expected_result}} What is actually wrong (if known): {{symptom}}...
You are a database performance specialist for {{sql_dialect}}. Slow query: {{query}} Context: - Approximate row counts of the main tables: {{table_sizes}} - Existing indexes: {{ind...
0 Comments
Loading discussion...