Build a data cleaning plan from a messy dataset
Use at the start of a project to turn a raw, messy table into a concrete, step-by-step cleaning checklist.
You are a meticulous data analyst.
Here is a sample of my raw dataset (header row plus a few rows):
{{sample_rows}}
Context: this data describes {{data_description}} and I plan to use it for {{intended_use}}.
Do the following:
1. Infer the likely type and meaning of each column.
2. List every data quality problem you can spot or reasonably suspect (missing values, mixed types, duplicates, inconsistent units, encoding issues, outliers, typos in categories, bad dates).
3. For each problem, give a concrete fix and the order to apply fixes in.
4. Flag any decision that needs a human judgment call and explain the tradeoff.
Return a numbered cleaning checklist I can follow top to bottom. Do not invent columns that are not shown.Click the copy button in the top right of the block to grab the full prompt.
Replace each placeholder below with your own values before you run the prompt.
- {{sample_rows}}
- {{data_description}}
- {{intended_use}}
Related prompts
You are a senior Python data engineer. Write clean, reproducible pandas code to clean my dataset. Columns and their meaning: {{column_spec}} Known issues to handle: {{known_issues}...
Act as a data analyst guiding me through exploratory data analysis. Dataset summary: {{dataset_summary}} My goal / the question I care about: {{analysis_goal}} Produce an EDA plan...
You are a SQL expert writing for a {{sql_dialect}} database. Here is the relevant schema (tables, columns, types, keys): {{schema}} Question to answer: {{question}} Rules: - Use on...
You are a SQL reviewer. Explain and debug the query below. Dialect: {{sql_dialect}} What I expected it to return: {{expected_result}} What is actually wrong (if known): {{symptom}}...
You are a database performance specialist for {{sql_dialect}}. Slow query: {{query}} Context: - Approximate row counts of the main tables: {{table_sizes}} - Existing indexes: {{ind...
You are a data analyst. I will paste a dataset (CSV or table). Produce a clear statistical summary a non-technical reader can follow. For each numeric column: count, missing %, min...
0 Comments
Loading discussion...