MS Data Science  ·  Columbia University

Liang-Jie Chiu

ML Researcher · Deep Learning · XAI

Transforming data into actionable insights with AI and deep learning

About Me

I'm a Master's student in Data Science at Columbia University with a passion for leveraging AI to solve real-world problems.

Building on my B.B.A. in Management Information Systems from National Sun Yat-sen University (GPA: 3.97/4.0), I've developed expertise in deep learning, machine learning, data mining, and big data analytics.

At Academia Sinica, I applied deep learning to healthcare and agriculture research, using frameworks such as PyTorch to improve medical imaging analysis and to build models for agricultural problems. This work strengthened my technical skills and my ability to solve problems and collaborate across disciplines.

Beyond research, I enjoy sports analytics, especially baseball, where I use data to support decision-making and strategy.

Education

MS in Data Science, Columbia University

BBA in MIS, NSYSU — GPA 3.97/4.0

Experience

Research Assistant, Academia Sinica

Healthcare & Agriculture AI Research

Interests

Deep Learning, XAI, MLOps

Sports Analytics & Data Visualization

Featured Projects

Showcase of my machine learning and data science work

Published (Elsevier, 2026) · Federated Learning · Agriculture

Orchid Double Spike Prediction with Federated Learning

Privacy-preserving AI system predicting double-spike rates in Phalaenopsis orchids for small-scale farms, enabling collaboration without data sharing.

  • Devised a dual federated learning system combining YOLOv8 and TabNet
  • Achieved 0.82 prediction AUC; federated learning improved small-farm accuracy from 0.69 to 0.775
  • Published in Engineering Applications of Artificial Intelligence (Elsevier, 2026)
Federated Learning · YOLOv8 · TabNet · Computer Vision

Production Ready · NLP · RAG · Cloud

Explainable AI for News Integrity

A production-ready fact-checking system using a 5-stage pipeline that analyzes news articles to detect misinformation with transparent, evidence-backed verdicts.

  • Architected hybrid pipeline integrating RoBERTa and Llama 3.1 for credibility analysis
  • Built retrieval engine with PostgreSQL/pgvector and Perplexity Search API
  • Deployed serverless infrastructure on Google Cloud Run with GCS FUSE
PyTorch · NLP · RAG · GCP · PostgreSQL

Applied Research · Computer Vision · XAI

Deepfake Image Detection with XAI

Advanced computer vision system detecting deepfake images using CNNs and Vision Transformers with integrated Explainable AI for forensic analysis.

  • Constructed Multi-Source Domain Generalization framework (ResNet34)
  • Improved detection on unseen architectures by 20.6%, reaching 96.9% accuracy
  • Combined Grad-CAM with Gemini 2.5 Flash to generate natural-language forensic reports
Computer Vision · PyTorch · XAI · Transformers · Streamlit

Applied Research · Time Series · Forecasting

Electricity Load Forecasting

End-to-end forecasting pipeline for 156 residential/commercial clients using the UCI LD2011-2014 dataset, covering preprocessing, user clustering, and six models across three complexity levels.

  • Benchmarked 6 models across 3 levels: AutoARIMA/AutoETS → SARIMAX/Prophet → iTransformer
  • AutoARIMA achieved the best test MAPE (15.33%) with a cluster-based disaggregation pipeline
  • Built Streamlit dashboard with Gemini 2.5 Flash AI agent for natural-language forecasting
Time Series · Forecasting · statsforecast · iTransformer · Streamlit

Kaggle · Sports Analytics · Ensemble

MLB Hit Prediction with Swing Data

Predictive modeling system for MLB hit outcomes using advanced ensemble methods and deep learning on 50,000+ swing records.

  • Developed ensemble of DNN, LightGBM, and CatBoost models achieving 0.74 AUC
  • Implemented automatic hyperparameter optimization with Optuna
  • Conducted feature importance analysis with SHAP explanations
Sports Analytics · Ensemble Learning · XGBoost · SHAP

Analytics · Statistical Modeling · GAM

MLB Position Player Aging Analysis

Statistical analysis of MLB player performance decline using Generalized Additive Models to predict aging curves across key metrics.

  • Analyzed 16,625 player-seasons spanning 1980–2024
  • Applied GAM with tensor product splines for multi-feature interactions
  • Provided actionable insights for player evaluation and team strategy
Statistical Modeling · GAM · Sports Analytics · Python

Technical Skills

Tools and technologies I work with

Programming Languages

Python · C/C++ · R · SQL · JavaScript · HTML/CSS · PHP

Machine Learning & AI

PyTorch · TensorFlow · Scikit-learn · Keras · Transformers · XGBoost · LightGBM · SHAP

Data Engineering & Big Data

Apache Spark · Hadoop · Airflow · Pandas · Polars · PostgreSQL · ChromaDB

Cloud & DevOps

Google Cloud Platform · Cloud Run · Docker · Git/GitHub · CI/CD

Data Visualization

Matplotlib · Seaborn · Plotly · Streamlit · Power BI

Specialized Areas

Computer Vision · NLP · Explainable AI · Federated Learning · RAG Systems · MLOps · Time Series

Get In Touch

Open to collaboration, knowledge-sharing, and discussing the latest trends in data science