Sensor-invariant Tactile Representation For Zero-shot Transfer Across Vision-based Tactile Sensors

Tactile sensing is an important modality for intelligent systems to understand and interact with the physical world. The GelSight sensor and its variants have emerged as influential tactile technologies, providing detailed information about contact surfaces by transforming tactile information into visual images. However, vision-based tactile sensing lacks transferability between sensors due to design and manufacturing variations, which result in significant differences in tactile signals. Minor differences in optical design or manufacturing processes can create significant discrepancies in sensor output, causing machine learning models trained on one sensor to perform poorly when applied to others.

Computer vision models have been widely applied to vision-based tactile images due to their inherently visual nature. Researchers have adapted representation learning methods from the vision community, with contrastive learning being popular for developing tactile and visual-tactile representations for specific tasks. Auto-encoding representation approaches have also been explored, with some researchers using Masked Auto-Encoders (MAE) to learn tactile representations. Methods such as general-purpose multimodal representations utilize multiple tactile datasets in LLM frameworks, encoding sensor types as tokens. Despite these efforts, current methods often require large datasets, treat sensor types as fixed categories, and lack the flexibility to generalize to unseen sensors.

Researchers from the University of Illinois Urbana-Champaign proposed Sensor-Invariant Tactile Representations (SITR), a tactile representation that transfers across various vision-based tactile sensors in a zero-shot manner. It is based on the premise that achieving sensor transferability requires learning effective sensor-invariant representations through exposure to diverse sensor variations. It relies on three core innovations: using easy-to-acquire calibration images to characterize individual sensors with a transformer encoder, using supervised contrastive learning to emphasize the geometric aspects of tactile information across multiple sensors, and developing a large-scale synthetic dataset that contains 1M examples across 100 sensor configurations.
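The supervised contrastive idea can be sketched as follows: embeddings of tactile images that share a label (e.g., the same contact geometry captured by different sensors) are pulled together, and all others are pushed apart. This is a minimal numpy version of the standard supervised contrastive loss (Khosla et al., 2020), not the authors' implementation; the embedding dimension, temperature, and toy labels are assumptions.

```python
import numpy as np

def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, positives are
    all other samples carrying the same label."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = f @ f.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # Mean log-probability over each anchor's positives, averaged over anchors.
    per_anchor = np.where(same, log_prob, 0.0).sum(1) / np.maximum(same.sum(1), 1)
    return -per_anchor.mean()

# Toy embeddings: samples 0,1 point one way, samples 2,3 another.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_matched = sup_con_loss(features, np.array([0, 0, 1, 1]))
loss_mismatched = sup_con_loss(features, np.array([0, 1, 0, 1]))
print(loss_matched < loss_mismatched)  # True: aligned labels give lower loss
```

The loss is small when same-label embeddings already cluster and large when they do not, which is what drives sensor-invariant geometry into the representation.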

The researchers used the tactile image and a set of calibration images for the sensor as inputs to the network. The sensor background is subtracted from all input images to isolate the pixel-wise color changes. Following the Vision Transformer (ViT), these images are linearly projected into tokens, with calibration images requiring tokenization only once per sensor. Two supervision signals guide the training process: a pixel-wise normal map reconstruction loss for the output patch tokens and a contrastive loss for the class token. During pre-training, a lightweight decoder reconstructs the contact surface as a normal map from the encoder's output. Moreover, SITR employs Supervised Contrastive Learning (SCL), extending traditional contrastive approaches by using label information to define similarity.
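The input preparation described above can be sketched as follows, assuming toy image sizes: the sensor background is subtracted from both the tactile and calibration images, and each result is split into ViT-style patch tokens via a linear projection. The image size, patch size, token dimension, and random projection weights here are illustrative, not the authors' values.

```python
import numpy as np

def patch_tokens(image, w, patch=8):
    """Split an HxWx3 image into non-overlapping patches and linearly
    project each flattened patch to a token (ViT-style)."""
    h, wid, c = image.shape
    patches = (image.reshape(h // patch, patch, wid // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))
    return patches @ w  # (num_patches, token_dim)

rng = np.random.default_rng(0)
w = rng.standard_normal((8 * 8 * 3, 16)) * 0.02   # shared linear projection
background = rng.random((32, 32, 3))              # sensor at rest
tactile = rng.random((32, 32, 3))                 # sensor under contact
calibration = rng.random((4, 32, 32, 3))          # tokenized once per sensor

# Background subtraction isolates the pixel-wise color changes.
tac_tok = patch_tokens(tactile - background, w)
cal_tok = np.concatenate([patch_tokens(c - background, w) for c in calibration])
print(tac_tok.shape, cal_tok.shape)  # (16, 16) (64, 16)
```

Because calibration tokens depend only on the sensor, they can be cached and reused for every tactile frame from that sensor.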

In object classification tests on the researchers' real-world dataset, SITR outperforms all baseline models when transferred across different sensors. While most models perform well in no-transfer settings, they fail to generalize when tested on distinct sensors. This shows SITR's ability to capture meaningful, sensor-invariant features that remain robust despite changes in the sensor domain. In pose estimation tasks, where the goal is to estimate 3-DoF position changes using initial and final tactile images, SITR reduces the Root Mean Square Error by approximately 50% compared to baselines. Unlike the classification results, ImageNet pre-training only marginally improves pose estimation performance, suggesting that features learned from natural images may not transfer effectively to tactile domains for precise regression tasks.
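The pose-estimation metric can be illustrated with a toy example; the predictions and ground-truth values below are invented, and only the RMSE formula itself is standard.

```python
import numpy as np

def rmse(pred, target):
    """Root Mean Square Error over all 3-DoF pose components."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Hypothetical 3-DoF pose changes (x shift, y shift, rotation) for a
# small batch: ground truth plus two models' error patterns.
truth = np.array([[1.0, 0.5, 5.0], [0.2, -0.3, -2.0]])
baseline = truth + np.array([[0.4, 0.4, 2.0], [0.4, 0.4, 2.0]])
better = truth + np.array([[0.2, 0.2, 1.0], [0.2, 0.2, 1.0]])

# Halving every error component halves the RMSE; a model whose RMSE is
# half the baseline's corresponds to the ~50% reduction reported.
print(rmse(baseline, truth) / rmse(better, truth))
```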

In this paper, the researchers introduced SITR, a tactile representation framework that transfers across various vision-based tactile sensors in a zero-shot manner. They constructed large-scale, sensor-aligned datasets from synthetic and real-world data and developed a method to train SITR to capture dense, sensor-invariant features. SITR represents a step toward a unified approach to tactile sensing, where models can generalize seamlessly across different sensor types without retraining or fine-tuning. This advance has the potential to accelerate progress in robotic manipulation and tactile research by removing a key barrier to the adoption of these promising sensor technologies.


Check out the Paper and Code. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
