Prompttail

Data Cleaning Steps

Get a step-by-step plan to clean messy datasets. Describe your data quality issues and receive a prioritized cleaning strategy.

Prompt preview

You are a data engineer specializing in data quality. Create a step-by-step data cleaning plan for the following dataset.

Dataset description: [describe the dataset: source, format, number of rows/columns, what it represents]
Known issues: [list the data quality problems you've observed, e.g., missing values, duplicates, inconsistent formats, outliers, encoding issues]
Target use case: [what this cleaned data will be used for, e.g., machine learning model, dashboard, regulatory report, migration]
Tools available: [your preferred tools, e.g., Python/pandas, SQL, R, Excel, or "any"]

For each cleaning step, provide:
1. What to check or fix
2. Why it matters for the target use case
3. The specific code or approach to implement it
4. How to validate the step was successful (expected before/after)

Prioritize steps by impact: fix issues that would cause wrong results before cosmetic issues.
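To make the four-part step format concrete, here is a minimal sketch of what one step in a returned plan might look like, using duplicate removal as the example. It uses only the Python standard library. The sample records, the `email` key, and the helper names `drop_duplicate_rows` and `count_duplicates` are assumptions made up for illustration. A real plan would use whichever tools you listed in the prompt.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def normalized_key(row: Row, key_fields: Tuple[str, ...]) -> Tuple[str, ...]:
    """Build a comparison key that ignores case and surrounding whitespace."""
    return tuple(row[field].strip().lower() for field in key_fields)


def count_duplicates(rows: List[Row], key_fields: Tuple[str, ...]) -> int:
    """Number of rows that repeat an earlier row's key."""
    keys = [normalized_key(row, key_fields) for row in rows]
    return len(keys) - len(set(keys))


def drop_duplicate_rows(rows: List[Row], key_fields: Tuple[str, ...]) -> List[Row]:
    """Keep the first row seen for each key and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        key = normalized_key(row, key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: the second row repeats the first except for
# letter case and a trailing space in the email.
customers = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "ANA@example.com ", "name": "Ana"},
    {"email": "bo@example.com", "name": "Bo"},
]

# 1. What to check: duplicate customers by email.
before = count_duplicates(customers, ("email",))

# 3. The fix: keep the first occurrence of each email.
cleaned = drop_duplicate_rows(customers, ("email",))

# 4. Validation: no duplicates remain, and only the repeats were removed.
after = count_duplicates(cleaned, ("email",))
assert after == 0
assert len(cleaned) == len(customers) - before

print(f"duplicates before={before}, after={after}, rows kept={len(cleaned)}")
```

The "why it matters" part (item 2) depends on the target use case. For a dashboard, for instance, repeated customers would inflate counts, which is why a step like this would rank above cosmetic fixes.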

Tips

  • List every data quality issue you have noticed, even minor ones, so the plan is comprehensive
  • Mention your target use case because different downstream uses require different levels of cleanliness
  • Include a sample of the messy data if possible so the AI can give precise code snippets
  • Always validate each step before moving on — one bad assumption early can cascade through the whole pipeline
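One way to act on the first and third tips is to profile the data before writing the prompt, then paste the summary and a few raw rows into the "Known issues" and "Dataset description" fields. The sketch below is a rough starting point, not a complete profiler. The CSV content, column names, and the `profile_rows` helper are hypothetical, and it only checks for empty values and exact duplicate rows.

```python
import csv
import io
from collections import Counter

# Hypothetical messy extract: mixed date formats, empty cells, one repeated row.
RAW_CSV = """order_id,order_date,amount
1001,2024-01-05,19.99
1002,05/01/2024,
1002,05/01/2024,
1003,,42.50
"""


def profile_rows(rows):
    """Summarize row count, empty values per column, and exact duplicate rows."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1

    # Rows count as exact duplicates when every column value matches.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "exact_duplicates": duplicates,
    }


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
summary = profile_rows(rows)
sample = rows[:3]  # a few raw rows to paste into the prompt

print(summary)
for row in sample:
    print(row)
```

Running the same profile again after each cleaning step gives a simple before/after comparison, which fits the last tip about validating each step before moving on.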