From audio tapes to AI

AI funding has spent the last two years rewarding the appearance of general intelligence. Robotics, meanwhile, keeps reminding investors that embodiment is a different problem. The latest signal comes from TDK Ventures, where investment director Ankur Saxena argues that foundation models and generative AI will not, by themselves, produce capable robots. In his framing, robotics needs physical AI: systems that connect model outputs to the messier realities of sensors, motion, and control.

That distinction matters because robotics is now where software narratives meet hardware constraints. A language model can answer quickly, but a robot still has to know where its joints are, what its cameras can actually see, how much latency its control stack can tolerate, and whether a grasp will hold once the object shifts. The difference between a demo and a deployable product is often the difference between abstract intelligence and embodied reliability.

Saxena’s argument lands as a useful market signal because it reflects a shift in how capital may evaluate robotics startups. The winners are unlikely to be the teams with the most fluent model demos. They are more likely to be the teams that can show sensor fusion maturity, kinematics that match the physical platform, and closed-loop feedback that survives contact with the real world.

The 4Ps of physical AI

Saxena’s framework is simple enough to remember and technical enough to matter. He breaks physical AI into four pillars: Perception, Planning, Proprioception, and Physical embodiment. Read literally, the framework is a reminder that a robot is not just an inference engine with motors attached.

  • Perception is the machine’s ability to sense the world through cameras, lidar, microphones, force sensors, depth sensors, and other modalities. In practice, this is not about one sensor being “good enough.” It is about sensor fusion: combining imperfect inputs into a coherent estimate of the environment.
  • Planning turns that sensory picture into action selection. For robotics, planning cannot be evaluated only by whether the plan looks elegant in software. It has to account for latency, collision risk, uncertainty, and the possibility that the world changes between one cycle and the next.
  • Proprioception is the robot’s internal awareness of its own state: joint positions, torque, speed, balance, and posture. Without it, even a strong perception stack can fail because the machine does not know how its own kinematic chain is currently configured.
  • Physical embodiment is the final constraint. Hardware design, actuation, power management, stiffness, compliance, and closed-loop feedback all shape whether the intended motion can actually be executed.

The value of the 4Ps framework is that it turns “AI for robotics” from a vague category into a systems problem. It also helps product teams separate what can be improved with larger models from what requires changes in the robot itself.

Why foundation models alone aren’t enough for robots

The current AI cycle has encouraged a dangerous simplification: if models can reason across text, images, code, and audio, then robotic competence should follow as a straightforward extension. Saxena pushes back on that assumption, and the technical reason is clear. Robotics is governed by timing, friction, uncertainty, and error accumulation. Those are not the native strengths of a general-purpose foundation model.

A robot does not operate in isolated prompts. It operates in continuous time. Every action depends on the last sensed state, and every sensed state is already slightly stale by the time the controller acts on it. That is why closed-loop feedback is not optional. A plan that is good in simulation or on a whiteboard can fail if the control loop is too slow, the calibration drifts, the actuator saturates, or the object being manipulated moves unpredictably.
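To make the closed-loop point concrete, here is a minimal sketch in Python. It is not drawn from Saxena's remarks; the function `simulate`, the gain `kp`, the constant `disturbance`, and all numeric values are illustrative assumptions. It drives a one-dimensional axis toward a target either by replaying a precomputed command (open loop) or by correcting from a measurement that is `delay_steps` ticks old (closed loop).

```python
from collections import deque


def simulate(closed_loop: bool, delay_steps: int = 0, steps: int = 200,
             dt: float = 0.01, kp: float = 5.0, target: float = 1.0,
             disturbance: float = -0.5) -> float:
    """Drive a 1-D velocity-controlled axis toward `target`.

    Returns the absolute position error after `steps` ticks.
    All parameters are hypothetical values chosen for illustration.
    """
    position = 0.0
    # The buffer models sensing latency: the controller only sees the
    # position from `delay_steps` ticks ago, never the current one.
    history = deque([position] * (delay_steps + 1), maxlen=delay_steps + 1)
    for _ in range(steps):
        if closed_loop:
            sensed = history[0]                 # oldest entry = stale measurement
            command = kp * (target - sensed)    # proportional correction
        else:
            # Open loop: the constant velocity that would reach the target
            # exactly if nothing disturbed the axis.
            command = target / (steps * dt)
        # An unmodelled constant disturbance (e.g. friction or a slope).
        position += (command + disturbance) * dt
        history.append(position)
    return abs(target - position)


if __name__ == "__main__":
    print("open loop error:        ", simulate(closed_loop=False))
    print("closed loop error:      ", simulate(closed_loop=True))
    print("closed loop, 5-tick lag:", simulate(closed_loop=True, delay_steps=5))
```

With these assumed values the disturbance cancels the open-loop command, so the axis never moves, while the feedback version settles close to the target. The remaining offset is the usual steady-state error of proportional-only control, and larger values of `delay_steps` or `kp` can make the loop oscillate rather than settle.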

This is where sensor fusion becomes more than an architecture buzzword. A robot working in a warehouse, lab, factory, or hospital must reconcile conflicting inputs from multiple sources. Vision may be occluded, depth may be noisy, force readings may lag, and environmental changes may invalidate a previous estimate. If the system cannot integrate those signals in real time, the model’s intelligence never reaches the level where physical action happens.
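One common textbook way to reconcile such inputs is inverse-variance weighting, where each sensor's estimate counts in proportion to how much it is trusted. The sketch below is an illustration only, not a method attributed to Saxena or TDK Ventures; the `Reading` type, the `fuse` function, and the variances are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float        # estimated distance to an object, in metres
    variance: float     # assumed sensor noise variance, in m^2
    valid: bool = True  # False when, for example, the camera is occluded


def fuse(readings: list[Reading]) -> Optional[tuple[float, float]]:
    """Inverse-variance weighted fusion of independent estimates.

    Returns (fused_value, fused_variance), or None if no reading is usable.
    """
    usable = [r for r in readings if r.valid and r.variance > 0]
    if not usable:
        return None
    weights = [1.0 / r.variance for r in usable]
    total = sum(weights)
    value = sum(w * r.value for w, r in zip(weights, usable)) / total
    # The fused variance is never larger than the best single sensor's.
    return value, 1.0 / total


if __name__ == "__main__":
    camera = Reading(value=1.00, variance=0.04)
    lidar = Reading(value=1.20, variance=0.01)
    print(fuse([camera, lidar]))
    camera.valid = False          # simulate occlusion
    print(fuse([camera, lidar]))  # falls back to the lidar alone
```

The design choice worth noting is that an occluded or invalid channel is dropped rather than averaged in, so the estimate degrades to the remaining sensors instead of being corrupted by a bad one. A production stack would more likely use a Kalman-style filter that also tracks how the state evolves over time.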

Kinematics presents another hard boundary. Robots are not generic bodies. Each platform has its own joint geometry, range of motion, payload limits, and motion constraints. A model that generates a plausible action sequence still has to respect the platform’s kinematic structure. That means any serious robotics stack needs explicit mapping between high-level intent and low-level actuation.
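The mapping from intent to actuation can be illustrated with the simplest non-trivial case, a planar two-link arm. The sketch below is a generic textbook formulation, and the link lengths, joint limits, and function names are all hypothetical. It converts a requested hand position into joint angles and rejects targets the platform cannot physically reach.

```python
import math
from typing import Optional

# Hypothetical platform description: link lengths in metres and
# per-joint limits in radians (shoulder, elbow).
UPPER_ARM = 0.30
FOREARM = 0.25
JOINT_LIMITS = [(-math.pi / 2, math.pi / 2), (0.0, 2.6)]


def inverse_kinematics(x: float, y: float) -> Optional[tuple[float, float]]:
    """Return (shoulder, elbow) angles that place the hand at (x, y).

    Returns None when the target is out of reach or violates a joint limit.
    Only one of the two mirror-image elbow solutions is considered.
    """
    dist_sq = x * x + y * y
    cos_elbow = (dist_sq - UPPER_ARM**2 - FOREARM**2) / (2 * UPPER_ARM * FOREARM)
    if abs(cos_elbow) > 1.0:
        return None  # farther than the arm can stretch, or too close
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        FOREARM * math.sin(elbow), UPPER_ARM + FOREARM * math.cos(elbow)
    )
    for angle, (low, high) in zip((shoulder, elbow), JOINT_LIMITS):
        if not low <= angle <= high:
            return None  # geometrically reachable, but not by this robot
    return shoulder, elbow


def forward_kinematics(shoulder: float, elbow: float) -> tuple[float, float]:
    """Hand position for the given joint angles."""
    x = UPPER_ARM * math.cos(shoulder) + FOREARM * math.cos(shoulder + elbow)
    y = UPPER_ARM * math.sin(shoulder) + FOREARM * math.sin(shoulder + elbow)
    return x, y


if __name__ == "__main__":
    print(inverse_kinematics(0.40, 0.10))   # reachable
    print(inverse_kinematics(1.00, 0.00))   # beyond full extension
    print(inverse_kinematics(-0.40, 0.10))  # blocked by the shoulder limit
```

The same high-level request ("put the hand here") yields different joint commands, or none at all, depending on the constants at the top, which is the sense in which a plausible action sequence still has to be checked against the specific platform.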

This is the central point in the current debate over AI and robotics: foundation models can improve interfaces, perception, and task decomposition, but they do not erase the physics of embodiment. They may make robots easier to instruct. They do not automatically make them reliable.

Product, engineering, and deployment implications

For builders, the practical conclusion is that robotics product strategy has to move upstream into hardware-software co-design. Teams that treat the model as the product will likely run into the same limits repeatedly: unreliable grasping, brittle navigation, poor adaptation to changing environments, and expensive field failures. Teams that treat the system as a coupled stack can design around those limits earlier.

That has several technical implications.

First, product rollout needs to be staged around what the hardware can verify. A robotics system that performs well in a controlled demo may still fail under variable lighting, floor conditions, payload shifts, or sensor degradation. Rollout plans should therefore include instrumentation, telemetry, and recovery logic from the start, not as post-launch patches.

Second, sensor selection matters as much as model selection. A robust system may need redundant sensing, carefully calibrated timestamp alignment, and fusion logic that can degrade gracefully when one channel becomes unreliable. In practice, that often means designing for observability before optimizing for raw autonomy.
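As a rough illustration of timestamp checks and graceful degradation, the sketch below filters sensor samples by age before they are used and reports an operating mode. The `Sample` type, the mode names, the 50 ms age limit, and the two-channel minimum are assumptions made for the example, not values from the article's source.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    channel: str      # e.g. "camera", "lidar", "force"
    timestamp: float  # seconds on a clock shared by all sensors
    value: float


def select_fresh(samples: list[Sample], now: float,
                 max_age: float = 0.05, min_channels: int = 2):
    """Split samples into fresh and stale, and pick an operating mode.

    Returns (mode, fresh_samples, stale_channel_names).
    """
    def is_fresh(sample: Sample) -> bool:
        age = now - sample.timestamp
        # A negative age means the clocks disagree; treat it as unusable.
        return 0.0 <= age <= max_age

    fresh = [s for s in samples if is_fresh(s)]
    stale = sorted(s.channel for s in samples if not is_fresh(s))
    if len(fresh) >= min_channels:
        mode = "nominal"
    elif fresh:
        mode = "degraded"   # keep operating, but more conservatively
    else:
        mode = "safe_stop"  # nothing trustworthy left to act on
    return mode, fresh, stale


if __name__ == "__main__":
    now = 10.00
    samples = [
        Sample("camera", 9.99, 1.02),
        Sample("lidar", 9.80, 1.05),   # 200 ms old: too stale
        Sample("force", 9.97, 0.30),
    ]
    print(select_fresh(samples, now))
```

Returning the list of stale channels alongside the mode is the observability part: the same check that protects the controller also produces a signal that telemetry can record.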

Third, control architecture has to be measurable. Teams should ask not only whether the robot “works,” but whether the closed-loop system can be audited: how quickly it responds, how it handles drift, how it recovers from missed detections, and what its failure envelopes look like.
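One way such an audit could start is by summarizing logged control-loop latencies against a timing budget. The sketch below is an assumption-laden example: the `audit` function, the nearest-rank percentile method, the sample latencies, and the 20 ms budget are all invented for illustration.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least
    `pct` percent of the data at or below it."""
    if not values:
        raise ValueError("no samples to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def audit(latencies_ms: list[float], budget_ms: float) -> dict:
    """Summarize how a control loop performed against its timing budget."""
    misses = sum(1 for latency in latencies_ms if latency > budget_ms)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "worst_ms": max(latencies_ms),
        "deadline_misses": misses,
        "miss_rate": misses / len(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical sense-to-actuate latencies from ten control cycles.
    logged = [10, 12, 11, 50, 13, 12, 11, 14, 12, 11]
    print(audit(logged, budget_ms=20))
```

In this invented log the median looks healthy while a single 50 ms cycle dominates the tail, which is why tail and worst-case figures tend to say more about a failure envelope than an average does.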

Fourth, deployment environments should shape the roadmap. A warehouse picker, surgical assistant, and consumer service robot do not share the same risk profile, even if they all sit under the robotics umbrella. The more dynamic and unstructured the environment, the more central physical AI becomes.

For product managers and systems engineers, this reframes the benchmark conversation. Success is not just model accuracy, benchmark wins, or demo virality. It is whether a platform can sustain repeatable behavior under real-world conditions. That is a much stricter standard, but it is the one robotics has always had to meet.

Investment and market positioning in a hardware-forward era

Saxena’s perspective also has implications for capital allocation. If physical AI is the gating factor for useful robots, then investors will need to adjust how they judge software-heavy robotics narratives. A startup with a compelling model layer but weak sensing, weak actuation, or weak control integration may look better in a slide deck than in a deployment.

In a hardware-forward market, capital should increasingly prize evidence of three things: mature sensor fusion, reliable kinematics, and demonstrated closed-loop feedback. Those capabilities are not easy to fake, and they usually require time, iteration, and platform-specific expertise. That makes them better indicators of durable moat than model novelty alone.

This also affects market positioning. Robotics companies that can show they understand the full stack of physical AI will likely be better placed to win enterprise trust, especially where downtime, safety, and maintenance costs matter. By contrast, companies that overindex on generalized AI branding may struggle if their systems cannot tolerate the variability of actual deployment environments.

There is a funding implication here as well. Corporate investors such as TDK Ventures are structurally closer to the component and systems side of the industry, which may make them more sensitive to the bottlenecks that pure software investors can miss. That does not guarantee better outcomes, but it does suggest a different lens: one grounded in hardware constraints, not just model scaling narratives.

The broader market question is whether capital will follow that lens. If it does, the next wave of winners may not be the companies promising that AI alone will transform robotics. They may be the ones doing the slower work of making intelligence physically legible, mechanically reliable, and deployable in the real world.