Open to Data Science roles

Data Scientist · Machine Learning · Decision Intelligence

Turning data into reliable models, clear insights, and decision-support tools.

I’m Amir Honardoust, a Data Scientist focused on explainable machine learning, forecasting, NLP, analytics, and practical AI systems people can understand and use.

Risk ML · RAG Systems · Synthetic Data · Recommenders

What I do

Data science that moves from analysis to action.

My work connects statistical thinking, machine learning, product sense, and clear communication. I care about models that are evaluated, explainable, and useful beyond a notebook.

01

Risk Modeling & Decision Safety

Underwriting and fraud-risk workflows with calibration, threshold policies, abstention, validation, explainability, and review-focused reporting.

02

RAG, NLP & Recommendation Systems

Retrieval pipelines, knowledge-graph augmentation, text classification, recommender evaluation, and AI systems built for traceability.

03

Synthetic Data & Applied ML

Synthetic tabular-data evaluation, business prediction tools, reproducible model workflows, dashboard outputs, and portfolio-grade documentation.

Featured projects

Selected proof of work.

Explore the technical lab ↗

Fraud Detection · Risk ML

GitHub ↗

Financial Fraud Risk Engine

A cost-sensitive fraud-detection system with SHAP explanations, threshold optimization, batch scoring, and an interactive review dashboard.

Problem

Fraud detection has to balance missed fraud, false alarms, and analyst review time.

Method

Cost-sensitive thresholds, SHAP reason codes, batch scoring, and a triage dashboard.

Proves

Risk-modeling workflow design, explainability, and operational ML thinking.

Risk ML · SHAP · Thresholds · Streamlit
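To illustrate what a cost-sensitive threshold means in practice, here is a minimal sketch. It is not the project's code: the cost values, the toy labels and scores, and the function names `total_cost` and `best_threshold` are all assumptions made for the example. The idea is to price a missed fraud higher than a false alarm and pick the score cutoff with the lowest total cost on labelled data.

```python
def total_cost(y_true, scores, threshold, cost_fn=10.0, cost_fp=1.0):
    """Cost of flagging every score >= threshold as fraud (costs are assumed)."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += cost_fn  # missed fraud
        elif label == 0 and flagged:
            cost += cost_fp  # false alarm that an analyst must review
    return cost


def best_threshold(y_true, scores, cost_fn=10.0, cost_fp=1.0):
    """Scan every observed score (plus 'flag nothing') and return the cheapest cutoff."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        ((t, total_cost(y_true, scores, t, cost_fn, cost_fp)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
threshold, cost = best_threshold(y_true, scores)
print(threshold, cost)
```

With a 10:1 cost ratio the cheapest cutoff sits low enough to catch the weakest fraud score, accepting one false alarm in exchange. Changing the ratio moves the cutoff, which is the trade-off between missed fraud and review time described above.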

RAG · LLM Systems

GitHub ↗

Graph-RAG Engine

An explainable graph + vector RAG system with FAISS retrieval, knowledge-graph reasoning paths, a FastAPI backend, and a Streamlit UI.

Problem

Plain vector RAG misses connected context and is hard to trace.

Method

FAISS retrieval combined with knowledge-graph reasoning paths behind a FastAPI + Streamlit app.

Proves

Applied AI architecture, retrieval design, and explainable answer generation.

RAG · FAISS · FastAPI · Streamlit
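The combination of vector retrieval and graph paths can be sketched in a few lines. This is an illustration under stated assumptions, not the engine itself: brute-force cosine similarity stands in for a FAISS index, the graph is a plain dictionary, and the document names, vectors, and the `retrieve` function are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, graph, k=1):
    """Return (doc_id, path) pairs: top-k vector hits, then their one-hop graph neighbours.

    The path records how each document was reached, so an answer can be traced.
    """
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    seeds = ranked[:k]
    results = [(doc_id, [doc_id]) for doc_id in seeds]
    seen = set(seeds)
    for seed in seeds:
        for neighbour in graph.get(seed, []):
            if neighbour not in seen:
                seen.add(neighbour)
                results.append((neighbour, [seed, neighbour]))
    return results


# Toy embeddings and edges, invented for illustration only.
doc_vecs = {
    "loan_policy": [1.0, 0.0],
    "fraud_rules": [0.0, 1.0],
    "appeals": [0.1, 0.9],
}
graph = {"loan_policy": ["appeals"]}

hits = retrieve([0.9, 0.1], doc_vecs, graph, k=1)
print(hits)
```

Here "appeals" is a weak vector match for the query but is still returned, because it is linked to the top hit. The stored path is what makes that connected context traceable.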

Synthetic Data · Generative ML

GitHub ↗

Synthetic Data Artist

A research-style comparison of Gaussian Copula and VAE methods with distribution checks, correlation analysis, PCA diagnostics, and visual reports.

Problem

Synthetic data needs evidence of quality, not just generated rows.

Method

Compared Gaussian Copula and VAE outputs with distribution, correlation, and PCA diagnostics.

Proves

Statistical evaluation maturity, data-quality analysis, and reproducible ML tooling.

VAE · Gaussian Copula · PCA · Evaluation
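One simple form of correlation analysis is to compare pairwise correlations in the real and synthetic tables. The sketch below is an assumed, simplified version of that check, not the project's implementation; the column names, the toy values, and the helper `max_corr_gap` are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def max_corr_gap(real, synth):
    """Largest absolute difference in pairwise correlation between two tables.

    Both tables are dicts mapping the same column names to lists of numbers.
    """
    cols = sorted(real)
    gap = 0.0
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            diff = abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
            gap = max(gap, diff)
    return gap


# Toy tables, invented for illustration: the synthetic one flips the x-y relationship.
real = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]}
synth = {"x": [1, 2, 3, 4], "y": [8, 6, 4, 2]}

print(max_corr_gap(real, synth))
print(max_corr_gap(real, real))
```

A synthetic table can match every column's marginal distribution and still fail this check, which is why rows alone are not evidence of quality.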

NLP · Responsible AI

GitHub ↗

Fake News Detector

A TF-IDF + Logistic Regression style-risk detector with a Streamlit app, CLI prediction, uncertainty handling, leakage analysis, tests, and CI.

Problem

Text classifiers can quietly learn leakage instead of real signal.

Method

TF-IDF + logistic regression with leakage analysis, uncertainty handling, tests, and CI.

Proves

End-to-end NLP workflow, careful validation, and responsible-AI framing.

NLP · TF-IDF · Streamlit · CI
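Uncertainty handling can be as simple as refusing to label borderline cases. The sketch below shows one such rule, assuming the classifier outputs a probability; the band limits, the label strings, and the function name `triage` are hypothetical and not taken from the project.

```python
def triage(prob, low=0.35, high=0.65):
    """Map a predicted probability to a label, abstaining inside the (low, high) band.

    The band limits are assumed values; in practice they would be tuned on validation data.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if prob >= high:
        return "high style-risk"
    if prob <= low:
        return "low style-risk"
    return "uncertain"


for p in (0.10, 0.50, 0.90):
    print(p, triage(p))
```

Widening the band sends more items to the "uncertain" bucket and fewer confident mistakes to the user, which suits a tool that flags writing style rather than verifying facts.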

SQL · Machine Learning

GitHub ↗

Coffee Shop Profit Predictor

An end-to-end site-selection workflow with SQL feature engineering, regression modeling, model comparison, candidate ranking, tests, and CI.

Problem

Site-selection decisions need realistic evaluation and a decision-ready ranking.

Method

SQL feature engineering, regression modeling, model comparison, and candidate ranking.

Proves

SQL + ML workflow, honest evaluation, and business-oriented communication.

SQL · Regression · Model Comparison · CI
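As a rough picture of the SQL feature-engineering step, the sketch below builds one feature per candidate site in an in-memory SQLite database and ranks sites by it. Everything here is assumed for illustration: the table names, columns, numbers, and the `visitors_per_rent` feature are invented, and ranking by a raw feature stands in for ranking by a regression model's predictions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and toy rows, invented for this example.
conn.executescript(
    """
    CREATE TABLE sites (site_id TEXT PRIMARY KEY, rent REAL);
    CREATE TABLE foot_traffic (site_id TEXT, day TEXT, visitors INTEGER);
    INSERT INTO sites VALUES ('A', 3000), ('B', 2000);
    INSERT INTO foot_traffic VALUES
        ('A', 'mon', 400), ('A', 'tue', 600),
        ('B', 'mon', 300), ('B', 'tue', 500);
    """
)

# Aggregate daily traffic per site, then derive a cost-adjusted feature and rank by it.
rows = conn.execute(
    """
    SELECT s.site_id,
           AVG(f.visitors) AS avg_visitors,
           AVG(f.visitors) / s.rent AS visitors_per_rent
    FROM sites AS s
    JOIN foot_traffic AS f ON f.site_id = s.site_id
    GROUP BY s.site_id, s.rent
    ORDER BY visitors_per_rent DESC
    """
).fetchall()

for site_id, avg_visitors, visitors_per_rent in rows:
    print(site_id, avg_visitors, round(visitors_per_rent, 3))

conn.close()
```

Keeping the aggregation in SQL means the same query can feed both model training and the final candidate ranking.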

More experiments live on honardoust.codes

Technical notes, project breakdowns, reproducible workflows, and deeper implementation details.

Visit technical lab

About

A Data Scientist with a builder’s mindset.

I focus on practical data science: understanding the problem, shaping the data, building the right model, evaluating it honestly, and communicating the result clearly.

My strongest interests are risk modeling, retrieval-augmented generation, synthetic data evaluation, recommender systems, explainability, and analytics systems that help people make better decisions.

“Good data science is not just a model. It is a reliable path from messy evidence to a decision someone can trust.”

Skills

Tools and capabilities.

Languages

Python, SQL, Solidity, MQL5.

Data

pandas, NumPy, SQLite, SQLAlchemy.

Machine Learning

scikit-learn, XGBoost, LightGBM, joblib.

Deep Learning / NLP

PyTorch, TensorFlow / Keras, Hugging Face Transformers, BERT.

RAG / AI Systems

FAISS, Sentence Transformers, FastAPI, Streamlit.

Visualization

matplotlib, Plotly, Streamlit dashboards.

Explainability

SHAP, feature importance, calibration, threshold analysis.

Workflow Quality

Tests, CI, reproducible outputs, model artifacts, documentation.

Contact

Open to Data Science roles, collaborations, and applied ML projects.

The fastest way to reach me is through LinkedIn or GitHub. For technical details, visit my lab at honardoust.codes.