I will build an ML classifier for cancer subtype prediction from gene expression data
About this Gig
Do you have labeled gene expression data and need
a machine learning classifier to predict cancer
subtypes or patient outcomes?
I will build a complete ML classification pipeline
tailored to your genomics dataset.
WHAT YOU GET:
- Data preprocessing and normalization
- Feature selection to identify the most informative genes
- Multiple algorithm comparison (Random Forest, SVM,
Gradient Boosting, KNN)
- Cross-validation accuracy assessment
- Confusion matrix and classification report
- Feature importance visualization
- Trained model saved with joblib, ready to reuse on new samples
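
To give an idea of how the deliverables above fit together, here is a minimal sketch in scikit-learn. It is not the exact pipeline you will receive: the data is synthetic (generated with make_classification as a stand-in for a real expression matrix), and the number of selected genes (k=30), the 5-fold split, and the file name subtype_classifier.joblib are assumptions chosen only for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 120 samples x 200 "genes", 4 classes.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=20, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# The four algorithm families named in the list above (hyperparameters are assumptions).
candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear", C=1.0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def build_pipeline(model, k=30):
    """Scaling and gene selection live inside the pipeline, so they are
    re-fitted on each training fold and never see the held-out samples."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif, k=k)),
        ("model", model),
    ])


# Compare the algorithms by mean cross-validated accuracy.
scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(build_pipeline(model), X, y, cv=cv, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")

# Confusion matrix and classification report from out-of-fold predictions.
best_name = max(scores, key=scores.get)
best_pipe = build_pipeline(candidates[best_name])
y_pred = cross_val_predict(best_pipe, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)
print(cm)
print(classification_report(y, y_pred))

# Fit on all samples, save with joblib, and reload to confirm the round trip.
best_pipe.fit(X, y)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "subtype_classifier.joblib")
    joblib.dump(best_pipe, path)
    reloaded = joblib.load(path)
```

Keeping normalization and feature selection inside the Pipeline matters with expression data: selecting genes on the full dataset before cross-validation would leak label information and inflate the reported accuracy.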
MY EXPERIENCE:
I built a breast cancer subtype classifier on gene
expression data that reached 85.2% cross-validation
accuracy with an SVM. It classifies 4 subtypes:
LuminalA, LuminalB, HER2, TripleNegative.
Full pipeline on GitHub.
WHAT I NEED FROM YOU:
- Gene expression matrix (samples x genes)
- Subtype or outcome labels for each sample
- Number of classes to predict
- Any known important genes or pathways
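
If it helps when preparing your files, the expected input layout can be sketched as below. The sample IDs, the expression values, and the check_inputs helper are all hypothetical and made up for illustration. The only assumptions are that samples are rows, genes are columns, and labels are keyed by the same sample IDs.

```python
import pandas as pd

# Toy expression matrix: rows are samples, columns are genes (values are made up).
expr = pd.DataFrame(
    {"ESR1": [8.1, 2.3, 7.9], "ERBB2": [3.0, 9.4, 2.8], "MKI67": [4.2, 6.1, 5.5]},
    index=["S1", "S2", "S3"],
)

# One label per sample, keyed by the same sample IDs.
labels = pd.Series({"S1": "LuminalA", "S2": "HER2", "S3": "LuminalB"}, name="subtype")


def check_inputs(expr, labels):
    """Return a list of human-readable problems; an empty list means the inputs line up."""
    problems = []
    if not expr.index.is_unique:
        problems.append("duplicate sample IDs in the expression matrix")
    missing = expr.index.difference(labels.index)
    if len(missing) > 0:
        problems.append(f"samples without a label: {list(missing)}")
    non_numeric = expr.select_dtypes(exclude="number").columns
    if len(non_numeric) > 0:
        problems.append(f"non-numeric gene columns: {list(non_numeric)}")
    if expr.isna().any().any():
        problems.append("missing expression values")
    return problems


print(check_inputs(expr, labels))
print("Number of classes:", labels.nunique())
```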
TOOLS: Python, scikit-learn, pandas, numpy,
matplotlib, seaborn, joblib, Linux, Git
Expertise: Classification, Clustering, Predictive analysis
Programming language: Python, R
Frameworks: Scikit-learn, Pandas
APIs: Other
Tools: Jupyter Notebook, RStudio

