David Bai

I currently work at Mercor as a Machine Learning Engineer, on leave from the University of Southern California, where I studied Computer Engineering & Computer Science.

Recently (as of December 2025), I've been thinking about taste as a core part of model capabilities. I'm interested in all things intelligence: how it's embodied, aligned, and more. Always happy to book a chat or read any arXiv papers sent my way!

2025
Mercor

Machine Learning Engineer, Jun 2025 → Now

Full time now! Thinking hard about data for frontier models.

Machine Learning Engineer Intern, May 2025 → Jun 2025

Rebuilding evals for, and working on, talent matching.

Paper, 1st Place in Apart Research's AI Control Hackathon.

We investigated whether AI agents can attack their oversight systems ("judges") by hiding instructions in their Chain-of-Thought reasoning. Experiments showed that some judge models prioritize these hidden directives over safety rules. Vulnerability varied across models, which highlights the importance of designing AI evaluation mechanisms that resist such manipulation.

Paper, ICRA 2025 Workshop on Task and Motion Planning

Classical planners guarantee success but struggle with multi-agent concurrency, while LLMs can decompose tasks using commonsense but lack guarantees. Our TwoStep approach combines these strengths, using LLMs to intelligently break down goals for multiple agents, much as human experts do. This method yields faster planning and shorter execution times than traditional approaches while maintaining the guarantees of classical planning, and it performs comparably to human expert strategies.

Blogpost/Website, finalist at the Mercor x Etched x Cognition Hackathon, sponsored by CoreWeave and Anthropic.

We wrote two research papers/blogs: one on efficiently pruning similar branches in parallel reasoning, and the other on injecting interruption tokens into LLMs and then sampling them in parallel to determine coherency for hallucination reduction. Check out my blogpost on the latter here.

Weaver

Steerable Neural Search with Sparse Autoencoders

USC x Anthropic Hackathon Winner. A search engine that can autonomously steer SAE features for a given query to enhance search results. Runs entirely locally and offers agentic autosteering.

Sparse Autoencoders, MLX and PyTorch, Neuronpedia, React, Next.js, FastAPI

Research Fellow, Feb 2025 → May 2025

Working with Simon Lermen on evaluating the dangers of Language Model Agents. Working on a platform for automated red-teaming with reasoning models, and investigating Chain-of-Thought security.

Vigilante

Fact-check your Tweets uninterrupted and help crowdsource the truth.

TreeHacks 2025 Winner: Best Data Visualization. An anonymous web extension that runs low-latency fact-checking on your Twitter/X feed through Groq and Perplexity Sonar. Data is collated and analyzed on a live dashboard with real-time WebSockets.

Plasmo, React, LLMs, Next.js, Supabase, FastAPI
2024
Empathy

Talk to an AI agent and find the people you wouldn't have otherwise!

CalHacks 11.0. Inspired by spending ~40 hours in SF but talking to almost no one outside of our team. We dynamically generate a custom artifact based on your experiences, and then use LLMs to search for the most esoteric and fun connections you can imagine. Recently deployed at USC's SoCal Tech Week Hackathon with 300+ people.

Next.js, React, Flask, Whisper, Supabase, Gemini
GLAMOR Lab

Research Assistant, Aug 2024 → May 2025

At GLAMOR, I worked as a CURVE Fellow on enabling multi-agent robot tasks using Large Language Models (LLMs) and the Planning Domain Definition Language (PDDL): extending 2-agent plans to n-agent plans and designing a scheduler to optimize concurrent agent plans for TwoStep. Figuring out what to do for my next project!

USC CAIS++

Cohort Member, Aug 2024 → May 2025

USC Center for AI in Society's Student Branch. Worked on leveraging inference-time compute for machine translation; previously built a hate speech detector alongside several other projects for social good.

Rhoman Aerospace

UAV Autonomy Contractor, Sep 2024 → Dec 2024

Built a compute-dynamic ROS2 interface between Unreal Engine and NVIDIA VSLAM on Jetson Orin Nano.

Machine Learning & Computer Vision Intern, Jun 2024 → Aug 2024

Developed platform-agnostic visual odometry for GPS-denied navigation, including optical flow and homography modules for Extended Kalman Filter integration.

Capyble

A productivity app that you don't hate, courtesy of a cute mascot.

LA Hacks 2024. A cute capybara sprite that navigates your screen and offers a suite of productivity features. Talk to Capy, use Gemini vision to evaluate whether you're on task, and reschedule events with Google Calendar.

Electron, React, Google Calendar, Gemini, wxPython
Dynamic Robotics & Control Lab

Research Assistant, Feb 2024 → Nov 2024

Augmented model predictive control with reinforcement learning to improve the gait of biped robots, building on previous work with quadrupeds. Created a PyBullet simulation platform for an open-source bipedal robot, implementing unit tests and logging. Migrated ROS-based model predictive control to PyBullet and implemented reinforcement learning policies with OpenAI Gymnasium.

2023
Jargon

Reimagine your language learning in-browser.

HackSC-X 2023 Winner, Iovine and Young Academy IGNITE Grant ($1,500), TroyLabs DEMO Pitch Winner ($12,000), and more, totaling $21k+ in competitive funding! A Chrome extension that embeds language learning into your browsing experience. V1 launched, looking at some cool LLM-based machine translation techniques now. Give it a download!

Plasmo, React, Supabase, LLMs
USC Makers

Project Manager, Aug 2023 → May 2025

I'm leading a team of 7 in building an electroencephalography-based brain-computer interface for drone control, including the design of an analog circuit for filtering alpha, beta, and delta waves from raw EEG data. Check it out here!

Member, Aug 2023 → May 2025

Trained a CNN to identify plant species for an autonomous greenhouse, and designed and built a system with a Raspberry Pi for video streaming and an Arduino for lighting and irrigation control.