A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV


Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel’s MiDaS, a state-of-the-art model designed for high-quality depth prediction from a single image. Using Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial lets you upload your own image and visualize the corresponding depth map.

!pip install -q timm opencv-python matplotlib

First, we install the necessary Python libraries: timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS

Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transform utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files

from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import all the libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, so the notebook runs on either runtime.

# Despite its name, model_path holds the loaded model object, not a file path.
model_path = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model = model_path.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model through torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.

transform = Compose([
    Resize(
        384, 384,
        resize_target=None,
        keep_aspect_ratio=True,
        ensure_multiple_of=32,
        resize_method="upper_bound",
    ),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

We define MiDaS’s image preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it appropriately for model inference.
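To make the resize step more concrete, here is a minimal sketch of how an aspect-preserving "upper_bound" resize with a multiple-of-32 constraint could pick its output size. The helper `upper_bound_size` is hypothetical and written for illustration only; it approximates the behavior described above and is not the repository's own implementation.

```python
def upper_bound_size(width, height, target_w=384, target_h=384, multiple=32):
    """Hypothetical helper: estimate the output size of an 'upper_bound' resize."""
    # Use the smaller scale factor so that neither side exceeds its target.
    scale = min(target_w / width, target_h / height)

    def snap(value, max_val):
        # Round to the nearest multiple; round down instead if that overshoots.
        snapped = round(value / multiple) * multiple
        if snapped > max_val:
            snapped = int(value // multiple) * multiple
        return int(snapped)

    return snap(scale * width, target_w), snap(scale * height, target_h)


# A 640x480 landscape image keeps its 4:3 ratio and fits inside 384x384.
print(upper_bound_size(640, 480))  # (384, 288)
```

Both output sides are multiples of 32, which is what the ensure_multiple_of=32 setting asks for.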

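uploaded = files.upload()
for filename in uploaded:
    # OpenCV loads images in BGR order; convert to RGB for the model and for plotting.
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    break  # only the first uploaded file is used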

We let the user upload an image in Colab, read it using OpenCV, and convert it from BGR to RGB format for accurate color representation.
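As a small illustration of what the BGR-to-RGB conversion does, the NumPy sketch below reverses the channel axis of a tiny hand-made image. For a 3-channel image this yields the same pixel values as cv2.cvtColor with cv2.COLOR_BGR2RGB; the array itself is made up for the example.

```python
import numpy as np

# A 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the blue and red channels.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # blue pixel in RGB order: [0, 0, 255]
print(rgb[0, 1].tolist())  # red pixel in RGB order: [255, 0, 0]
```

Note that cv2.imread returns None when a file cannot be read, so checking for None before converting is a reasonable safeguard.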

# Scale pixel values to [0, 1] before normalization; img itself stays uint8 for plotting.
img_input = transform({"image": img / 255.0})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

with torch.no_grad():
    prediction = model(input_tensor)
    # Upsample the prediction back to the original image height and width.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()

Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.
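The array layout that the model expects can be sketched in plain NumPy. The helper `to_network_input` below is hypothetical: it assumes pixel values are scaled to [0, 1] before normalization (which the mean and std values suggest) and mimics the normalize, channel-first, and batch-dimension steps without using the MiDaS classes.

```python
import numpy as np


def to_network_input(rgb_uint8, mean, std):
    """Hypothetical helper: HWC uint8 image -> normalized 1xCxHxW float32 array."""
    scaled = rgb_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    mean = np.array(mean, dtype=np.float32)
    std = np.array(std, dtype=np.float32)
    normalized = (scaled - mean) / std             # per-channel normalization
    chw = np.transpose(normalized, (2, 0, 1))      # HWC -> CHW
    return np.ascontiguousarray(chw)[np.newaxis, ...]  # add the batch dimension


# A blank 384x288 (width x height) image stands in for a resized upload.
dummy = np.zeros((288, 384, 3), dtype=np.uint8)
batch = to_network_input(dummy, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
print(batch.shape, batch.dtype)  # (1, 3, 288, 384) float32
```

The leading batch dimension corresponds to the unsqueeze(0) call in the tutorial code.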

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")

plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed using the ‘inferno’ colormap for better contrast.
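If you want to save the depth map as an ordinary 8-bit image rather than only plotting it, one option is to min-max normalize it first. The sketch below uses a hypothetical helper, `depth_to_uint8`, and a made-up 2x2 array in place of a real prediction. MiDaS predicts relative inverse depth, so after this scaling brighter pixels correspond to nearer surfaces.

```python
import numpy as np


def depth_to_uint8(depth_map):
    """Hypothetical helper: min-max scale a float depth map to the 0..255 range."""
    depth = np.asarray(depth_map, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant map: avoid dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return np.round(scaled * 255.0).astype(np.uint8)


# Made-up values standing in for a model prediction.
demo = np.array([[0.0, 1.0], [2.0, 5.0]])
print(depth_to_uint8(demo).tolist())
```

The resulting array can then be written to disk, for example with cv2.imwrite.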

In conclusion, by completing this tutorial, we’ve successfully deployed Intel’s MiDaS model on Google Colab to perform monocular depth estimation using just an RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we’ve built a pipeline that generates high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
