I will clean, wrangle, and prepare your dataset in R or Python
Statistical Consultant and Data Analyst: R, Python, Power BI
About this Gig
Is your dataset messy, inconsistent, or hard to use?
I will clean and structure it so it's ready for analysis.
I specialize in data cleaning and preparation for complex, real-world datasets, including multi-wave surveys, administrative records, and large-scale longitudinal data.
What I deliver:
- Removal of duplicates, outliers, and inconsistencies
- Correct data types and formatting
- Missing value handling (removal, imputation, flagging)
- Merging and joining multiple datasets
- Reshaping (wide <-> long format)
- Variable recoding and standardization
- Clean, analysis-ready output file (CSV, Excel, RDS, or similar)
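To give a sense of what a few of these steps look like in practice, here is a minimal Python sketch using only the standard library. The sample data and the column names (id, city, income) are made up for illustration, and the rules shown (trim and title-case text, drop exact duplicates, flag missing values rather than fill them) are just one reasonable set of choices. A real project is tailored to your dataset and would typically use a library such as pandas, or R.

```python
import csv
import io

# Hypothetical raw data: a duplicated row, inconsistent text, and a missing value.
RAW = """id,city,income
1, Lisbon ,52000
2,porto,
2,porto,
3,LISBON,61000
"""


def clean_rows(text):
    """Return de-duplicated rows with standardized text, typed numbers, and a missing flag."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Standardize text first, so " Lisbon " and "LISBON" end up identical.
        city = row["city"].strip().title()
        raw_income = row["income"].strip()

        # Drop rows that are exact duplicates after standardization.
        key = (row["id"].strip(), city, raw_income)
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({
            "id": int(row["id"]),
            "city": city,
            # Keep missing values as None and flag them instead of guessing a value.
            "income": float(raw_income) if raw_income else None,
            "income_missing": raw_income == "",
        })
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Flagging missing values in a separate column keeps the decision about removal or imputation open until the analysis stage.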
Perfect for Excel files, CSV datasets, survey data, and business data.
You'll receive a clean file, ready to be analyzed.
Script in R or Python available with Standard and Premium packages.
Have a particularly complex dataset? Message me before ordering; I'm happy to assess your case first.
FAQ
How do I know which package is right for my dataset?
- Basic: single file, standard cleaning.
- Standard: multiple files, merging and reshaping, plus a script.
- Premium: large-scale, longitudinal, or multi-wave data, with imputation and a full pipeline.
Not sure? Message me and I'll recommend the right one.
What kind of datasets can you clean?
Excel, CSV, survey data, business datasets, and more — from small files to large, complex, multi-source datasets. If unsure whether your data qualifies, just message me before ordering.
Do you deliver only the clean file, or also the code?
Basic: clean file only. Standard and Premium include a documented R or Python script with every step clearly explained, so you can reproduce or modify the pipeline yourself.
Will my data be kept confidential?
Your data is used exclusively to complete your order and never shared. If needed, I'm open to signing an NDA before you share any files.
Can you handle large or complex datasets?
Yes. I have experience with large-scale, multi-source, longitudinal, and multi-wave datasets — including data with 400k+ rows and 100+ variables.

