Create Portrait Mode Effect with Segment Anything Model 2 (SAM2)

Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle blur to the background based on depth? This “portrait mode” effect gives photographs a professional look by simulating shallow depth-of-field similar to DSLR cameras. In this tutorial, we’ll recreate this effect programmatically using open-source computer vision models, like SAM2 from Meta and MiDaS from Intel ISL.

To build our pipeline, we’ll use:

  1. Segment Anything Model 2 (SAM2): To segment the subject and separate the foreground from the background.
  2. Depth Estimation Model (MiDaS): To compute a relative depth map, enabling depth-based blurring.
  3. Gaussian Blur: To blur the background with intensity varying based on depth.

Step 1: Setting Up the Environment

To get started, install the following dependencies:

pip install matplotlib samv2 pytest opencv-python timm pillow

Step 2: Loading a Target Image

Choose a picture to apply this effect to and load it into Python using the Pillow library.

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

image_path = "<path to your image>.jpg"
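img = Image.open(image_path).convert("RGB")  # force 3 channels; later steps index image.shape[2]
img_array = np.array(img)

# Display the image
plt.imshow(img_array)
plt.axis("off")
plt.show()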

Step 3: Initialize SAM2

To initialize the model, download the pretrained checkpoint. SAM2 offers four variants based on performance and inference speed: tiny, small, base_plus, and large. In this tutorial, we’ll use tiny for faster inference.

Download the model checkpoint from:<model_type>.pt

Replace <model_type> with your desired model type.

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

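# Assumed arguments: the config for the chosen variant and a placeholder
# path to the checkpoint you downloaded; adjust both to your setup.
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "<path to your checkpoint>.pt",
)
image_predictor = SAM2ImagePredictor(model)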

Step 4: Feed Image into SAM and Select the Subject

Set the image in SAM and provide points that lie on the subject you want to isolate. SAM predicts a binary mask of the subject and background.

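image_predictor.set_image(img_array)  # compute the image embedding once
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])  # (x, y) pixels on the subject
input_label = np.array([1, 1, 1])  # 1 = foreground point

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]  # mask indices, highest score first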
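
Point prompts are (x, y) pixel coordinates, so the hard-coded values above only make sense for an image at least that large. As a rough sketch, the hypothetical helper `relative_to_pixel_points` below (not part of SAM2) converts fractional positions into pixel coordinates for any image size. The last lines show how `np.argsort(scores)[::-1]` ranks candidate masks, using made-up scores rather than real model output.

```python
import numpy as np


def relative_to_pixel_points(rel_points, image_shape):
    """Convert (x, y) fractions in [0, 1] to integer pixel coordinates.

    Hypothetical helper; `image_shape` is (height, width, ...) as in NumPy.
    """
    rel = np.asarray(rel_points, dtype=np.float64)
    if rel.ndim != 2 or rel.shape[1] != 2:
        raise ValueError("expected an (N, 2) array of (x, y) fractions")
    if rel.min() < 0.0 or rel.max() > 1.0:
        raise ValueError("fractions must lie in [0, 1]")
    height, width = image_shape[:2]
    # x scales with width, y with height; the largest valid index is size - 1
    scale = np.array([width - 1, height - 1], dtype=np.float64)
    return np.rint(rel * scale).astype(np.int64)


# Three points down the vertical centre line of a 4000x3000 (W x H) image
points = relative_to_pixel_points(
    [[0.5, 0.4], [0.5, 0.5], [0.5, 0.6]], (3000, 4000, 3)
)
labels = np.ones(len(points), dtype=np.int64)  # 1 marks a foreground point
print(points)

# Ranking candidate masks: illustrative scores, not real model output
example_scores = np.array([0.62, 0.91, 0.77])
order = np.argsort(example_scores)[::-1]
print(order[0])  # index of the highest-scoring mask
```

Specifying points as fractions keeps the prompt valid if you later swap in an image with a different resolution.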

Step 5: Initialize the Depth Estimation Model

For depth estimation, we use MiDaS by Intel ISL. As with SAM2, you can choose among variants that trade accuracy for speed.

Note: MiDaS predicts inverse relative depth, so larger values correspond to closer objects. We’ll invert the map in the next step so that larger values mean farther from the camera.

import torch
import torchvision.transforms as transforms

model_type = "DPT_Large"  # MiDaS v3 - Large (highest accuracy)

# Load MiDaS model and switch it to inference mode
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()

# Load and preprocess image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)

# Perform depth estimation
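with torch.no_grad():
    prediction = model(input_batch)
    # Resize the prediction back to the original image resolution
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

prediction = prediction.cpu().numpy()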

# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
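
Because the MiDaS output is relative and unitless, it can help to see what inverting and rescaling does to the values. The sketch below uses a made-up 2x3 array, not a real prediction, and turns "larger means closer" into a [0, 1] map where 1 is the farthest pixel. `normalize_inverse_depth` is a hypothetical helper, not part of MiDaS.

```python
import numpy as np


def normalize_inverse_depth(inverse_depth, eps=1e-8):
    """Map inverse relative depth (larger = closer) to [0, 1] with 1 = farthest."""
    d = np.asarray(inverse_depth, dtype=np.float64)
    inverted = d.max() - d  # now larger = farther
    span = inverted.max() - inverted.min()
    if span < eps:
        # Flat map: no depth variation, treat everything as nearest
        return np.zeros_like(inverted)
    return (inverted - inverted.min()) / span


# Illustrative values only: 10.0 is the closest pixel, 0.0 the farthest
toy = np.array([[10.0, 8.0, 2.0],
                [9.0, 5.0, 0.0]])
far = normalize_inverse_depth(toy)
print(far)
```

The closest pixel maps to 0 (no blur) and the farthest to 1 (maximum blur), which is the ordering the blur step relies on.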

Step 6: Apply Depth-Based Gaussian Blur

Here we optimize the depth-based blurring using an iterative Gaussian blur approach. Instead of applying a single large kernel, we apply a smaller kernel multiple times for pixels with higher depth values.

import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # GaussianBlur requires an odd kernel size
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1

    # Invert depth map so that larger values mean farther from the camera
    depth_map = np.max(depth_map) - depth_map

    # Normalize depth to integer blur levels in [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)

    blurred_image = image.copy()
    progressive = image.copy()  # whole image, blurred once more per iteration

    for repeat in range(1, max_repeats + 1):
        # After this pass, `progressive` has been blurred `repeat` times
        progressive = cv2.GaussianBlur(progressive, (base_kernel_size, base_kernel_size), 0)
        mask = (depth_normalized == repeat)
        # Pixels at this depth level take the image blurred `repeat` times;
        # level 0 (the nearest pixels) stays sharp
        blurred_image[mask] = progressive[mask]

    return blurred_image

blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)
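
The integer depth levels decide which pixels each blur pass touches. As a sketch, the NumPy code below approximates the min-max normalization followed by the `uint8` cast (which truncates toward zero) and compares it with rounding on a synthetic ramp. It is an approximation of what `cv2.normalize` does in the function above, and `depth_to_levels` is a hypothetical name.

```python
import numpy as np


def depth_to_levels(far_depth, max_repeats, rounding=False):
    """Quantize a 'larger = farther' depth map into levels 0..max_repeats."""
    d = np.asarray(far_depth, dtype=np.float64)
    span = d.max() - d.min()
    if span == 0:
        return np.zeros(d.shape, dtype=np.uint8)
    scaled = (d - d.min()) / span * max_repeats
    if rounding:
        scaled = np.rint(scaled)
    # Casting floats to uint8 truncates toward zero
    return scaled.astype(np.uint8)


far_depth = np.linspace(0.0, 1.0, 11)  # synthetic ramp from near to far
truncated = depth_to_levels(far_depth, max_repeats=4)
rounded = depth_to_levels(far_depth, max_repeats=4, rounding=True)

# Number of pixels assigned to each level 0..4
print(np.bincount(truncated, minlength=5))
print(np.bincount(rounded, minlength=5))
```

With truncation, only pixels at the exact maximum depth reach the top level, so the strongest blur is rarely used. Rounding spreads pixels more evenly across levels, which may be worth trying if the far background looks under-blurred.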

# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img_array)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.axis("off")
plt.show()
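
A small kernel applied several times can stand in for one large kernel because Gaussian blurs compose: n passes with standard deviation σ behave like a single pass with σ√n. The sketch below checks this numerically in 1-D using NumPy only. The helper names are illustrative, the σ formula is the one OpenCV documents for deriving sigma from the kernel size when sigma is 0, and the model ignores border handling and the `uint8` rounding that happens between real passes.

```python
import numpy as np


def opencv_default_sigma(ksize):
    """Sigma that OpenCV derives from the kernel size when sigma is 0."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


def gaussian_kernel_1d(ksize, sigma):
    """Normalized 1-D Gaussian kernel with `ksize` taps."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def kernel_std(kernel):
    """Standard deviation of a kernel viewed as a discrete distribution."""
    x = np.arange(len(kernel)) - (len(kernel) - 1) / 2.0
    mean = np.sum(x * kernel)
    return np.sqrt(np.sum(((x - mean) ** 2) * kernel))


ksize, passes = 35, 20
sigma = opencv_default_sigma(ksize)
single = gaussian_kernel_1d(ksize, sigma)

# Convolving the kernel with itself models applying the blur repeatedly
combined = single.copy()
for _ in range(passes - 1):
    combined = np.convolve(combined, single)

measured = kernel_std(combined)
predicted = kernel_std(single) * np.sqrt(passes)
print(f"one pass: {kernel_std(single):.2f}, "
      f"{passes} passes: {measured:.2f}, predicted: {predicted:.2f}")
```

The measured and predicted spreads agree, so the strength of the blur grows with the square root of the number of passes rather than linearly. This is useful to keep in mind when tuning `base_kernel_size` and `max_repeats`.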

Step 7: Combine Foreground and Background

Finally, use the SAM mask to extract the sharp foreground and combine it with the blurred background.

def combine_foreground_background(foreground, background, mask):
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

mask = masks[sorted_ind[0]].astype(np.uint8)
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))
foreground = img_array
background = blurred_image

combined_image = combine_foreground_background(foreground, background, mask)

plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img_array)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
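
A hard binary mask can leave a visible seam around the subject. One common variation, shown here as a sketch rather than the method used above, is to treat the mask as a soft alpha matte and blend the two images linearly. `alpha_blend` is a hypothetical helper. In practice the soft matte might come from blurring the binary mask, which is not shown here.

```python
import numpy as np


def alpha_blend(foreground, background, alpha):
    """Blend two uint8 images with a per-pixel alpha in [0, 1] (1 = foreground)."""
    alpha = np.clip(np.asarray(alpha, dtype=np.float32), 0.0, 1.0)
    if alpha.ndim == 2:
        # Add a channel axis so the matte broadcasts over RGB
        alpha = alpha[..., np.newaxis]
    fg = foreground.astype(np.float32)
    bg = background.astype(np.float32)
    blended = alpha * fg + (1.0 - alpha) * bg
    return np.rint(blended).astype(np.uint8)


# Tiny synthetic example: white foreground, black background
fg = np.full((2, 3, 3), 255, dtype=np.uint8)
bg = np.zeros((2, 3, 3), dtype=np.uint8)
alpha = np.array([[1.0, 0.5, 0.0],
                  [1.0, 0.5, 0.0]])  # soft edge in the middle column
out = alpha_blend(fg, bg, alpha)
print(out[0, :, 0])
```

Where alpha is 1 the sharp foreground is kept, where it is 0 the blurred background shows through, and values in between give a gradual transition at the subject's edge.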


With just a few tools, we’ve recreated the portrait mode effect programmatically. This technique can be extended for photo editing applications, simulating camera effects, or creative projects.

Future Enhancements:

  1. Use edge detection algorithms for better refinement of subject edges.
  2. Experiment with kernel sizes to enhance the blur effect.
  3. Create a user interface to upload images and select subjects dynamically.



Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast, passionate about research and the latest advancements in deep learning, computer vision, and related fields.
