News

MarkTechPost
marktechpost.com > 04/15/2026 > google-deepmind-releases-gemini-robotics-er-1-6-bringing-enhanced-embodied-reasoning-and-instrument-reading-to-physical-ai

Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI

3 hours, 31 minutes ago (1,097 words) The Google DeepMind research team introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model designed to serve as the "cognitive brain" of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual…

MarkTechPost
marktechpost.com > 04/12/2026 > a-coding-implementation-of-molmoact-for-depth-aware-spatial-reasoning-visual-trajectory-tracing-and-robotic-action-prediction

A Coding Implementation of MolmoAct for Depth-Aware Spatial Reasoning, Visual Trajectory Tracing, and Robotic Action Prediction

2 days, 14 hours ago (274 words) We set up the tutorial and prepare the environment needed to run MolmoAct in Google Colab. We install all required packages, import the core libraries, and configure the runtime to detect whether GPU acceleration is available. We also define…

MarkTechPost
marktechpost.com > 04/10/2026 > a-coding-guide-to-markerless-3d-human-kinematics-with-pose2sim-rtmpose-and-opensim

A Coding Guide to Markerless 3D Human Kinematics with Pose2Sim, RTMPose, and OpenSim

4 days, 14 hours ago (1,038 words) In this tutorial, we build and run a complete Pose2Sim pipeline on Colab to understand how markerless 3D kinematics works in practice. We begin with environment setup, configure the project for Colab's headless runtime, and then walk through calibration, 2D…

MarkTechPost
marktechpost.com > 04/03/2026 > tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts

TII Releases Falcon Perception: A 0.6B-Parameter Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts

1 week, 5 days ago (376 words) In the current landscape of computer vision, the standard operating procedure involves a modular "Lego-brick" approach: a pre-trained vision encoder for feature extraction paired with a separate decoder for task prediction. While effective, this architectural separation complicates scaling and bottlenecks…

MarkTechPost
marktechpost.com > 03/23/2026 > yann-lecuns-new-leworldmodel-lewm-research-targets-jepa-collapse-in-pixel-based-predictive-world-modeling

Yann LeCun's New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling

3 weeks, 1 day ago (202 words) LeWM consists of two primary components learned jointly: an Encoder and a Predictor. The model is optimized using a streamlined objective function consisting of only two loss terms. As per the research paper, applying a dropout rate of 0.1 in…

MarkTechPost
marktechpost.com > 03/03/2026 > physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks

Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3 4B VLAs 15-Minute Context for Complex Tasks

1 month, 1 week ago (194 words) MEM factorizes robotic memory into two distinct scales to balance semantic context with real-time control constraints. The computational complexity is reduced from O(n²K²) to O(Kn² + nK²), where n is the number of spatial patches and K is the number of…
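The scale of that complexity reduction is easy to check numerically. The sketch below plugs hypothetical values of n and K into the two cost expressions quoted in the snippet; the specific numbers are illustrative and not taken from the MEM paper.

```python
# Compare the two attention-cost expressions quoted above:
# joint attention O(n² K²) vs. the factorized O(K n² + n K²).
# n = number of spatial patches, K = number of memory frames.
# The sample values below are hypothetical, not from the paper.

def joint_cost(n: int, k: int) -> int:
    return (n ** 2) * (k ** 2)

def factorized_cost(n: int, k: int) -> int:
    return k * n ** 2 + n * k ** 2

n, k = 256, 64  # illustrative patch and memory-frame counts
print(joint_cost(n, k))                         # 268435456
print(factorized_cost(n, k))                    # 5242880
print(joint_cost(n, k) // factorized_cost(n, k))  # ~51x fewer operations
```

Because the factorized form is additive rather than multiplicative in the two squared terms, the gap widens further as the memory horizon K grows.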

MarkTechPost
marktechpost.com > 02/20/2026 > nvidia-releases-dreamdojo-an-open-source-robot-world-model-trained-on-44711-hours-of-real-world-human-video-data

NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data

1 month, 3 weeks ago (333 words) Building simulators for robots has been a long-term challenge. Traditional engines require manual coding of physics and perfect 3D models. NVIDIA is changing this with DreamDojo, a fully open-source, generalizable robot world model. Instead of using a physics…

MarkTechPost
marktechpost.com > 02/08/2026 > meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world

Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World

2 months, 6 days ago (150 words) A team of researchers from Harvard University and Stanford University has released a new framework called Ordered Action Tokenization (OAT) to bridge this gap. Tokenization turns complex data into a sequence of discrete numbers (tokens). For robots, these actions are…
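The core idea of action tokenization can be sketched in a few lines. The example below uses plain uniform binning to map a continuous action vector to integer tokens and back; this illustrates tokenization in general, not the ordered scheme OAT actually uses, and the action bounds and bin count are made-up values.

```python
# Minimal sketch of action tokenization via uniform binning.
# NOTE: this is NOT OAT's actual ordered scheme; the range [-1, 1]
# and 256 bins are hypothetical choices for illustration only.

def tokenize(action, low=-1.0, high=1.0, bins=256):
    """Map each continuous action dimension to an integer token in [0, bins-1]."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)                      # clamp to the valid range
        t = int((a - low) / (high - low) * (bins - 1) + 0.5)  # round to nearest bin
        tokens.append(t)
    return tokens

def detokenize(tokens, low=-1.0, high=1.0, bins=256):
    """Invert the mapping back to approximate continuous values."""
    return [low + t / (bins - 1) * (high - low) for t in tokens]

print(tokenize([0.0, -1.0, 1.0]))   # -> [128, 0, 255]
print(detokenize([0, 255]))          # -> [-1.0, 1.0]
```

Once actions are integer sequences like this, they can be modeled with the same next-token machinery used for LLMs, which is the scaling connection the article draws.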

MarkTechPost
marktechpost.com > 01/29/2026 > a-coding-deep-dive-into-differentiable-computer-vision-with-kornia-using-geometry-optimization-loftr-matching-and-gpu-augmentations

A Coding Deep Dive into Differentiable Computer Vision with Kornia Using Geometry Optimization, LoFTR Matching, and GPU Augmentations

2 months, 2 weeks ago (673 words) We implement an advanced, end-to-end Kornia tutorial and demonstrate how modern, differentiable computer vision can be built entirely in PyTorch. We start by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by…

MarkTechPost
marktechpost.com > 01/29/2026 > ant-group-releases-lingbot-vla-a-vision-language-action-foundation-model-for-real-world-robot-manipulation

Ant Group Releases LingBot-VLA, a Vision-Language-Action Foundation Model for Real-World Robot Manipulation

2 months, 2 weeks ago (294 words) The pre-training dataset is built from real-world teleoperation on 9 popular dual-arm configurations. These include AgiBot G1, AgileX, Galaxea R1 Lite, Galaxea R1 Pro, Realman RS02, Leju KUAVO 4 Pro, Qinglong humanoid, ARX Lift2, and a bimanual Franka setup. All systems have…