Hi, I'm Tino
I'm a software engineer with a specialization in applied machine learning, data-centric AI, and agentic systems. I completed my B.S. in Electrical Engineering and Computer Science at UC Berkeley and my M.S. in Data Science at UC San Diego. My work spans production ML systems, full-stack software engineering, and empirical research on LLM reasoning, evaluation, and safety.
I'm most interested in breaking down real-world, customer-facing problems into solutions that are both technically robust and human-centered, with a particular focus on interpretability and trustworthiness. See About for more about my experience, or browse my Projects.
Featured
- Database Reporting Agent: a multi-agent text-to-SQL pipeline
A multi-agent text-to-SQL pipeline over an enterprise database of 15+ tables, with schema resolution, query generation, guardrails, validation, caching, and an evaluation suite.
- Data extraction after exact unlearning
Reproducing and extending Wu et al.'s Reversed Model Guidance (RMG) attack against exact unlearning. Across WMDP and a synthetic medical dataset, RMG reliably outperforms unguided pre-unlearning generation, lifting A-ESR by up to ~63%. The experiments also reveal a "sweet spot" in the forget-set ratio, as well as an inverse relationship between memorization and the optimal guidance scale.
Recent Posts
- Does differential privacy solve copyright?
A walkthrough of why generative AI scrambles two centuries of US copyright doctrine, the proposed technical fixes — differential privacy, near access-freeness, clean-room training — and why none of them are actually copyright protection. Memorization ≠ infringement. Privacy ≠ copyright.
- Importance-weighted fine-tuning for relation extraction
Implementing ATLANTIS (Liu et al., ACL 2025) — an importance-weighted weak-to-strong fine-tuning method — from scratch and applying it to sentence-level relation extraction across encoder–decoder (Flan-T5) and decoder-only (Qwen2) models on SemEval-2010 Task 8 and CoNLL2004.
- Scalable oversight via adversarial deception in resume screening
Applying the Engels et al. (2025) scalable oversight framework to resume screening. We model the task as an adversarial Houdini–Guard game and measure how well a weaker Guard can detect a stronger Houdini's deceptive selections, fitting domain-specific Elo curves across 8 models with 200 games per pair.
- Steering chain-of-thought length, and what it does to faithfulness
Reproducing ThinkEdit's interpretable weight edits to mitigate overly short chain-of-thought reasoning, then extending the analysis with ChainScope's IPHR faithfulness evaluation across the Qwen3 family (0.6B–8B).