I will build an ML classifier for cancer subtype prediction from gene expression data

Zambia

I speak English

Bioinformatics Pipeline Developer

I build reproducible bioinformatics pipelines for NGS data analysis, specializing in RNA-seq differential expression and cancer genomics. Recent project: RNA-seq pipeline on NCBI breast cancer dat...
About this Gig

Do you have labeled gene expression data and need a machine learning classifier to predict cancer subtypes or patient outcomes?


I will build a complete ML classification pipeline tailored to your genomics dataset.


WHAT YOU GET:

- Data preprocessing and normalization

- Feature selection to identify the most informative genes

- Multiple algorithm comparison (Random Forest, SVM, Gradient Boosting, KNN)

- Cross-validation accuracy assessment

- Confusion matrix and classification report

- Feature importance visualization

- Production-ready saved model
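
As a rough illustration of the deliverables above, here is a minimal sketch of what the algorithm comparison could look like. It runs on synthetic data from scikit-learn's make_classification, not a real expression matrix, and the pipeline steps (StandardScaler, SelectKBest with k=50), the helper name compare_models, and all model settings are assumptions chosen for illustration. The actual choices depend on your dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 120 samples x 200 "genes", 4 classes.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# The four algorithm families named in the list above (settings are illustrative).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear"),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(X, y, models, k_genes=50):
    """Return the mean cross-validated accuracy for each model (hypothetical helper)."""
    results = {}
    for name, model in models.items():
        # Scaling and gene selection sit inside the pipeline, so they are
        # refit on the training part of every fold.
        pipe = Pipeline([
            ("scale", StandardScaler()),
            ("select", SelectKBest(f_classif, k=k_genes)),
            ("clf", model),
        ])
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return results


results = compare_models(X, y, models)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping feature selection inside the cross-validated pipeline means the held-out samples never influence which genes are selected, which would otherwise inflate the reported accuracy.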


MY EXPERIENCE:

I built a breast cancer subtype classifier on gene expression data that achieved 85.2% cross-validation accuracy using an SVM. It classified 4 subtypes: LuminalA, LuminalB, HER2, TripleNegative. The full pipeline is on GitHub.
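
For a sense of what a four-subtype evaluation can look like, here is a small sketch that produces a confusion matrix and classification report for an SVM and then saves the fitted model with joblib. This is not the code from the GitHub project: the data is synthetic, the subtype names are attached to random classes only so the report is readable, and the file name subtype_svm.joblib is a placeholder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SUBTYPES = ["LuminalA", "LuminalB", "HER2", "TripleNegative"]

# Synthetic data only: 160 samples x 100 features, 4 classes mapped to subtype names.
X, y_idx = make_classification(
    n_samples=160, n_features=100, n_informative=15,
    n_classes=4, n_clusters_per_class=1, random_state=1,
)
y = np.array(SUBTYPES)[y_idx]

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Each sample is predicted by a model that did not see it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)

cm = confusion_matrix(y, y_pred, labels=SUBTYPES)
print(cm)
print(classification_report(y, y_pred, labels=SUBTYPES))

# Fit on all data, save to disk, and reload to confirm the saved model works.
model.fit(X, y)
model_path = os.path.join(tempfile.mkdtemp(), "subtype_svm.joblib")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)
```

Rows of the confusion matrix are true subtypes and columns are predicted subtypes, both in the order given by SUBTYPES.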


WHAT I NEED FROM YOU:

- Gene expression matrix (samples x genes)

- Subtype or outcome labels for each sample

- Number of classes to predict

- Any known important genes or pathways
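
If it helps when preparing your files, the sketch below shows one way the expression matrix and labels could be checked before modeling. It assumes the matrix is a pandas DataFrame with sample IDs as the index and genes as columns, and that labels are a Series indexed by the same sample IDs. The function name align_inputs, the gene names, and the sample IDs are all made up for illustration.

```python
import pandas as pd


def align_inputs(expr: pd.DataFrame, labels: pd.Series):
    """Keep only samples present in both inputs, in matrix order (hypothetical helper)."""
    shared = expr.index.intersection(labels.index)
    if shared.empty:
        raise ValueError("No sample IDs are shared between the matrix and the labels.")
    expr = expr.loc[shared]
    labels = labels.loc[shared]
    if expr.isna().any().any():
        raise ValueError("Expression matrix contains missing values.")
    # The number of classes is derived from the labels that survive alignment.
    return expr, labels, labels.nunique()


# Toy inputs: 3 samples x 2 genes, with one label (S4) that has no matching sample.
expr = pd.DataFrame(
    {"GENE_A": [5.1, 2.3, 7.8], "GENE_B": [0.4, 3.3, 1.2]},
    index=["S1", "S2", "S3"],
)
labels = pd.Series({"S1": "LuminalA", "S2": "HER2", "S4": "LuminalB"}, name="subtype")

expr_aligned, labels_aligned, n_classes = align_inputs(expr, labels)
print(expr_aligned.shape, n_classes)
```

Samples without a label (S3) and labels without a sample (S4) are dropped, so it is worth checking that sample IDs match exactly in both files before sending them.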


TOOLS: Python, scikit-learn, pandas, numpy, matplotlib, seaborn, joblib, Linux, Git

Expertise:

Classification

Clustering

Predictive analysis

Programming language:

Python

R

Frameworks:

Scikit-learn

Pandas

APIs:

Other

Tools:

Jupyter Notebook

RStudio