Search Results

MarkTechPost
marktechpost.com > 07/18/2026 > nvidia-released-deepstream-9-1-bringing-agentic-ai-to-vision-ai-with-13-skills-and-multi-view-3d-tracking

NVIDIA Released DeepStream 9.1: Bringing Agentic AI to Vision AI With 13 Skills and Multi-View 3D Tracking

9+ hour, 43+ min ago (357+ words) Describe a multi-camera pipeline in plain language. A coding agent then builds and deploys it. Building on that base, version 9.1 adds five notable items. Among those additions, MV3DT (multi-view 3D tracking) is the main skill. At its core, MV3DT projects…...

MarkTechPost
marktechpost.com > 07/17/2026 > sakana-ais-error-diffusion-trains-dale-compliant-dual-stream-networks-reaching-96-7-mnist-and-61-7-cifar-10-without-backpropagation

Sakana AI's Error Diffusion Trains Dale-Compliant Dual-Stream Networks, Reaching 96.7% on MNIST and 61.7% on CIFAR-10 Without Backpropagation

22+ hour, 26+ min ago (160+ words) A biologically plausible learning rule reaches 96.7% on MNIST and a 61.7% CIFAR-10 baseline, then extends to reinforcement learning with PPO — all while keeping weights non-negative. Backpropagation dominates deep learning, yet it uses a mechanism the brain likely cannot. Specifically, the backward…...

MarkTechPost
marktechpost.com > 07/14/2026 > mistral-ai-releases-robostral-navigate-an-8b-model-enabling-robots-to-navigate-complex-environments-using-a-single-rgb-camera

Mistral AI Releases Robostral Navigate: An 8B Model Enabling Robots to Navigate Complex Environments Using a Single RGB Camera

4+ day, 21+ hour ago (360+ words) Mistral AI has released Robostral Navigate, its first model built for embodied navigation. The 8B model takes RGB images and a plain-language instruction, then moves a robot. Notably, it reaches a 76.6% success rate on the R2R-CE val-unseen split using only a single RGB camera....

MarkTechPost
marktechpost.com > 07/11/2026 > ant-groups-robbyant-unveils-lingbot-va-2-0

Ant Group's Robbyant Unveils LingBot-VA 2.0: A Causal Video-Action Model Built Natively for Physical AI

1+ week, 21+ hour ago (571+ words) Robbyant, the embodied AI unit inside Ant Group, has released LingBot-VA 2.0, which it describes as the first embodied-native foundation model: a video-action foundation model for generalist robot manipulation. The research team pretrains the whole stack for embodiment instead of fine-tuning a…...

MarkTechPost
marktechpost.com > 07/09/2026 > meet-lingbot-world-infinity-an-open-causal-world-model-with-an-agentic-harness

Meet LingBot-World-Infinity: An Open Causal World Model With An Agentic Harness

1+ week, 2+ day ago (668+ words) Robbyant, Ant Group’s embodied-intelligence unit, has released LingBot-World-Infinity (LingBot-World 2.0). It is a causal video generation model that behaves as an interactive world simulator. With it, the team targets two failure modes: long-horizon drift and interactive latency. An interactive world…...

MarkTechPost
marktechpost.com > 07/07/2026 > ant-groups-robbyant-open-sources-lingbot-vision-a-1b-boundary-centric-vision-foundation-model-for-dense-spatial-perception

Ant Group’s Robbyant Open-Sources LingBot-Vision: A 1B Boundary-Centric Vision Foundation Model for Dense Spatial Perception

1+ week, 4+ day ago (435+ words) Robbyant, the embodied-AI company within Ant Group, has open-sourced LingBot-Vision, a family of self-supervised Vision Transformers built for dense spatial perception. The weights ship under Apache-2.0 on Hugging Face in four sizes — ViT-giant, ViT-large, ViT-base, and ViT-small — together with a…...

MarkTechPost
marktechpost.com > 07/03/2026 > nvidia-ai-introduces-aspire-a-self-improving-robotics-framework-reaching-31-zero-shot-on-libero-pro-long-tasks

NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks

2+ week, 22+ hour ago (429+ words) A team of researchers from NVIDIA, University of Michigan, UIUC, UC Berkeley, and CMU introduces ASPIRE (Agentic Skill Programming through Iterative Robot Exploration). It is a continual learning system that writes and refines robot control programs. It also distills validated…...

MarkTechPost
marktechpost.com > 06/19/2026 > nvidia-ai-introduce-spatialclaw-a-training-free-agent-that-treats-code-as-the-action-interface-for-spatial-reasoning

NVIDIA AI Introduces SpatialClaw: A Training-Free Agent That Treats Code as the Action Interface for Spatial Reasoning

4+ week, 1+ day ago (483+ words) NVIDIA Research has released SpatialClaw, a training-free framework for spatial reasoning. It targets a persistent weakness in vision-language models (VLMs). These models still struggle to judge where objects are, how they relate, and how they move in 3D. SpatialClaw does not…...

MarkTechPost
marktechpost.com > 06/16/2026 > meet-qwen-robotsuite-three-embodied-ai-models-for-vla-manipulation-video-world-modeling-and-navigation

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

1+ mon, 2+ day ago (1335+ words) The Qwen team has released three embodied AI models, grouped as Qwen-RobotSuite. The three are Qwen-RobotManip, Qwen-RobotWorld, and Qwen-RobotNav. Each is built on a Qwen vision-language backbone and targets a different robotics problem. Qwen-RobotManip is a Vision-Language-Action model for manipulation,…...

MarkTechPost
marktechpost.com > 06/03/2026 > nvidia-releases-cosmos-3-a-two-tower-mixture-of-transformers-foundation-model-unifying-physical-reasoning-world-generation-and-action-generation

NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

1+ mon, 2+ week ago (716+ words) Physical AI systems must understand the world before acting in it. Robots and vehicles need to perceive, predict, and then act. Earlier Cosmos releases split these jobs across separate models. Cosmos 3 unifies them with a Mixture-of-Transformers (MoT) architecture. The architecture…...