Building Your Own Real-Time Object Detection App: Roboflow (YOLOv8) and Streamlit (Part 5)

How to deploy the app in Streamlit Cloud

Eduardo Padron
10 min read · Aug 15, 2023

In this Medium post, we’ll embark on a journey to deploy a cutting-edge Object Detection and Tracking App using Streamlit Share. We’ll explore how to leverage the potential of Streamlit to create an intuitive and user-friendly interface that not only showcases your object detection model but also provides real-time tracking capabilities.

Whether you’re a seasoned developer looking to enhance your deployment skills or someone new to the world of web applications, this tutorial will equip you with the knowledge and tools to showcase your object detection and tracking prowess to the world. Let’s dive in and unlock the potential of Streamlit Cloud for deploying our Object Detection and Tracking App that we have been creating together in this series.

Why is this different from part 3? We didn’t face library problems that required changes until part 4. Also, when you deploy a Streamlit app, it runs in the cloud, so when you call cv2.VideoCapture() it looks for a camera on the server instead of in the user’s browser. When you run the app locally, the server is your own machine, which has a camera, so it works perfectly fine.

So, what is the solution? You need a tool named streamlit-webrtc, a component that enables Streamlit to handle real-time media streams over a network. In this part, I’ll also briefly introduce you to WebRTC (check out the article from the creator of the library here for more in-depth info on WebRTC). If you want to jump right to playing with the component, here is a sample app.

Ready? Let’s dive in.

The problem with existing approaches

Streamlit is actively used by many developers and researchers to prototype apps backed with computer vision and machine learning models, but it can’t yet natively support real-time video processing.

One existing approach to achieve real-time video processing with Streamlit is to use OpenCV to capture video streams. However, this only works when the Python process can access the video source — in other words, only when the camera is connected to the same host the app is running on.

Due to this limitation, there have always been problems with deploying the app to remote hosts and using it with video streams from local webcams. cv2.VideoCapture(0) consumes a video stream from the first (indexed as 0) locally connected device, and when the app is hosted on a remote server, the video source is a camera device connected to the server, not a local webcam. On a Raspberry Pi we can confirm this by running ls /dev/video* in the command line.
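For example, on a Raspberry Pi with a single camera connected, that command typically shows one device (exact names vary by setup):

$ ls /dev/video*
/dev/video0

Here /dev/video0 is the device that cv2.VideoCapture(0) opens; on a remote Streamlit server there is usually no such device at all.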

How WebRTC resolves this issue

WebRTC (Web Real-Time Communication) enables web servers and clients, including web browsers, to send and receive video, audio, and arbitrary data streams over the network with low latency.

It is now supported by major browsers like Chrome, Firefox, and Safari, and its specs are open and standardized. Browser-based real-time video chat apps like Google Meet are common examples of WebRTC usage.

WebRTC extends Streamlit’s powerful capabilities to transmit video, audio, and arbitrary data streams between frontend and backend processes, like browser JavaScript and server-side Python.

If you want to know more about these WebRTC concepts, read this article.

This tutorial will use Streamlit-WebRTC and Pytube.

Install dependencies

Install the necessary packages. Note that this tutorial works with the latest versions of all the libraries mentioned before, but now that we have our project, create a copy of it, and in the requirements.txt file add the following lines:

git+https://github.com/fullmakeralchemist/pytube/
git+https://github.com/gatagat/lap@new-packaging
streamlit==1.2.0
streamlit-webrtc==0.35.1
torch==2.0.1
ultralytics==8.0.150

We will focus on the first line. In part 4 we made changes to one file inside the library, but how can we do that in a Streamlit deployment? Unfortunately, Streamlit share doesn’t provide a terminal to work in, so how can we modify the library? I forked the repository of the library.

Once I had the fork in my GitHub profile, I looked for the file Cipher.py and then clicked the edit file button to change line 30 from var_regex = re.compile(r"^\w+\W") to var_regex = re.compile(r"^\$*\w+\W"). With this change we can save the file and add the fork’s URL as the first line in the requirements.txt file.
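For reference, the fork’s one-line patch looks like this (the surrounding code is pytube’s Cipher.py; only the regular expression changes):

# pytube Cipher.py, line 30
# before:
var_regex = re.compile(r"^\w+\W")
# after:
var_regex = re.compile(r"^\$*\w+\W")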

Changing the code from OpenCV to streamlit-webrtc

The advantages of web-based apps

We have typically used OpenCV to build real-time demos of image or video processing. Some of you may have seen the following code, or something similar, many times.

import cv2

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:  # Stop if the frame could not be read
        break
    img = cv2.Canny(frame, 100, 200)  # Some image processing
    cv2.imshow('frame', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

Compared to GUI apps like the one above, which use cv2.VideoCapture and cv2.imshow and run in a local environment, web-based apps have some advantages, listed below.

Easy to share and run:

  • If we deploy the apps on the cloud, we can share the apps with our users simply by sending the URLs.
  • The users can use the apps only by accessing them through web browsers. It does not require any set-ups or external dependencies.

Usable on smartphones:

  • Because all the users need is web browsers, the users can use the apps on their smartphones. It’s convenient if we can show demos on such portable devices.

Install necessary packages

Next, we have to install the packages necessary for this guide. Strictly speaking, this is only necessary if you are going to run the app locally; since this guide targets Streamlit share, having the correct package names in the requirements.txt file is all we need.

$ pip install streamlit-webrtc
  • streamlit-webrtc: A custom component of Streamlit which deals with real-time video and audio streams.

Introduce the real-time video/audio streaming component

Create a file app.py as below.

import streamlit as st
from streamlit_webrtc import webrtc_streamer

st.title("My first Streamlit app")
st.write("Hello, world")
webrtc_streamer(key="example")

We have added a single line with webrtc_streamer(). The web app would be like the screenshot below.

At the first trial, it may take some time to compile the package so that the page keeps showing the “running” message for a while after clicking the “Rerun” button. In such a case, wait for the process to finish.

Click the “START” button to start the video and audio streaming. You may be asked for permission to access the webcam and microphone at the first trial. Allow permission in that case.

The webrtc_streamer(key="example") above is a Streamlit component which deals with video and audio real-time I/O through web browsers. The key argument is a unique ID in the script to identify the component instance. We have set it as "example" here, but you can use any string for it. The component in this example only receives video and audio from the client-side webcam and microphone and outputs the raw streams. It's the most basic version of the component. We are going to enhance its functionality by adding other options in the following sections.
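Before we do, here is a minimal sketch of how image processing is injected into the stream. It uses the same transformer-style API (VideoTransformerBase and video_transformer_factory) that the helper code later in this post relies on; newer streamlit-webrtc releases also offer VideoProcessorBase / video_processor_factory equivalents. The vertical flip is just a stand-in for real processing:

import av
from streamlit_webrtc import VideoTransformerBase, webrtc_streamer


class FlipTransformer(VideoTransformerBase):
    def recv(self, frame: av.VideoFrame) -> av.VideoFrame:
        img = frame.to_ndarray(format="bgr24")  # the frame as a NumPy BGR image
        img = img[::-1, :, :]  # placeholder processing: flip vertically
        return av.VideoFrame.from_ndarray(img, format="bgr24")


webrtc_streamer(key="flip-example", video_transformer_factory=FlipTransformer)

Each frame arrives as an av.VideoFrame, is converted to a NumPy array, processed, and returned as a new av.VideoFrame that the component renders in the browser.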

Development of a real-time video processing application

We will use the code from part 4, and we only have to update helper.py as follows.

from ultralytics import YOLO
import streamlit as st
from streamlit_webrtc import webrtc_streamer, VideoTransformerBase
import numpy as np
from PIL import Image
import av
import cv2
from pytube import YouTube

import settings


def load_model(model_path):
    """
    Loads a YOLO object detection model from the specified model_path.

    Parameters:
        model_path (str): The path to the YOLO model file.

    Returns:
        A YOLO object detection model.
    """
    model = YOLO(model_path)
    return model


def display_tracker_options():
    display_tracker = st.radio("Display Tracker", ('Yes', 'No'))
    is_display_tracker = True if display_tracker == 'Yes' else False
    if is_display_tracker:
        tracker_type = st.radio("Tracker", ("bytetrack.yaml", "botsort.yaml"))
        return is_display_tracker, tracker_type
    return is_display_tracker, None


def _display_detected_frames(conf, model, st_frame, image, is_display_tracking=None, tracker=None):
    """
    Display the detected objects on a video frame using the YOLOv8 model.

    Args:
    - conf (float): Confidence threshold for object detection.
    - model (YOLO): A YOLOv8 object detection model.
    - st_frame (Streamlit object): A Streamlit object to display the detected video.
    - image (numpy array): A numpy array representing the video frame.
    - is_display_tracking (bool): A flag indicating whether to display object tracking (default=None).
    - tracker (str): The tracker configuration file to use (default=None).

    Returns:
    None
    """

    # Resize the image to a standard size
    image = cv2.resize(image, (720, int(720 * (9 / 16))))

    # Display object tracking, if specified
    if is_display_tracking:
        res = model.track(image, conf=conf, persist=True, tracker=tracker)
    else:
        # Predict the objects in the image using the YOLOv8 model
        res = model.predict(image, conf=conf)

    # Plot the detected objects on the video frame
    res_plotted = res[0].plot()
    st_frame.image(res_plotted,
                   caption='Detected Video',
                   channels="BGR",
                   use_column_width=True
                   )


def play_youtube_video(conf, model):
    """
    Plays a YouTube video stream. Detects objects in real-time using the YOLOv8 object detection model.

    Parameters:
        conf: Confidence of YOLOv8 model.
        model: An instance of the `YOLO` class containing the YOLOv8 model.

    Returns:
        None

    Raises:
        None
    """
    source_youtube = st.sidebar.text_input("YouTube Video url")

    is_display_tracker, tracker = display_tracker_options()

    if st.sidebar.button('Detect Objects'):
        try:
            yt = YouTube(source_youtube)
            # pytube expects resolution strings like "720p"
            stream = yt.streams.filter(file_extension="mp4", res="720p").first()
            vid_cap = cv2.VideoCapture(stream.url)

            st_frame = st.empty()
            while vid_cap.isOpened():
                success, image = vid_cap.read()
                if success:
                    _display_detected_frames(conf,
                                             model,
                                             st_frame,
                                             image,
                                             is_display_tracker,
                                             tracker
                                             )
                else:
                    vid_cap.release()
                    break
        except Exception as e:
            st.sidebar.error("Error loading video: " + str(e))


def play_rtsp_stream(conf, model):
    """
    Plays an RTSP stream. Detects objects in real-time using the YOLOv8 object detection model.

    Parameters:
        conf: Confidence of YOLOv8 model.
        model: An instance of the `YOLO` class containing the YOLOv8 model.

    Returns:
        None

    Raises:
        None
    """
    source_rtsp = st.sidebar.text_input("rtsp stream url")
    is_display_tracker, tracker = display_tracker_options()
    if st.sidebar.button('Detect Objects'):
        try:
            vid_cap = cv2.VideoCapture(source_rtsp)
            st_frame = st.empty()
            while vid_cap.isOpened():
                success, image = vid_cap.read()
                if success:
                    _display_detected_frames(conf,
                                             model,
                                             st_frame,
                                             image,
                                             is_display_tracker,
                                             tracker
                                             )
                else:
                    vid_cap.release()
                    break
        except Exception as e:
            st.sidebar.error("Error loading RTSP stream: " + str(e))


def play_webcam(conf, model):
    """
    Plays a webcam stream. Detects objects in real-time using the YOLO object detection model.

    Returns:
        None

    Raises:
        None
    """
    st.sidebar.title("Webcam Object Detection")

    webrtc_streamer(
        key="example",
        video_transformer_factory=lambda: MyVideoTransformer(conf, model),
        rtc_configuration={"iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]},
        media_stream_constraints={"video": True, "audio": False},
    )


class MyVideoTransformer(VideoTransformerBase):
    def __init__(self, conf, model):
        self.conf = conf
        self.model = model

    def recv(self, frame):
        image = frame.to_ndarray(format="bgr24")
        processed_image = self._display_detected_frames(image)
        st.image(processed_image, caption='Detected Video', channels="BGR", use_column_width=True)

    def _display_detected_frames(self, image):
        orig_h, orig_w = image.shape[0:2]
        width = 720  # Set the desired width for processing

        # cv2.resize used in a forked thread may cause memory leaks
        input = np.asarray(Image.fromarray(image).resize((width, int(width * orig_h / orig_w))))

        if self.model is not None:
            # Perform object detection using YOLO model
            res = self.model.predict(input, conf=self.conf)

            # Plot the detected objects on the video frame
            res_plotted = res[0].plot()
            return res_plotted

        return input


def play_stored_video(conf, model):
    """
    Plays a stored video file. Tracks and detects objects in real-time using the YOLOv8 object detection model.

    Parameters:
        conf: Confidence of YOLOv8 model.
        model: An instance of the `YOLO` class containing the YOLOv8 model.

    Returns:
        None

    Raises:
        None
    """
    source_vid = st.sidebar.selectbox(
        "Choose a video...", settings.VIDEOS_DICT.keys())

    is_display_tracker, tracker = display_tracker_options()

    with open(settings.VIDEOS_DICT.get(source_vid), 'rb') as video_file:
        video_bytes = video_file.read()
    if video_bytes:
        st.video(video_bytes)

    if st.sidebar.button('Detect Video Objects'):
        try:
            vid_cap = cv2.VideoCapture(
                str(settings.VIDEOS_DICT.get(source_vid)))
            st_frame = st.empty()
            while vid_cap.isOpened():
                success, image = vid_cap.read()
                if success:
                    _display_detected_frames(conf,
                                             model,
                                             st_frame,
                                             image,
                                             is_display_tracker,
                                             tracker
                                             )
                else:
                    vid_cap.release()
                    break
        except Exception as e:
            st.sidebar.error("Error loading video: " + str(e))

We have to define a callback that receives an input frame and returns an output frame. We also need to put image processing code inside the callback. As a result, we have injected the image processing code into the real-time video app through the callback.

Detailed explanations about the code follow.

  1. play_webcam(conf, model): This function plays a webcam video stream. It uses the webrtc_streamer function from the streamlit_webrtc library to capture the webcam feed and process frames in real-time using the MyVideoTransformer class.
  2. MyVideoTransformer(VideoTransformerBase): This class is a subclass of VideoTransformerBase provided by streamlit_webrtc. It initializes with the confidence threshold and YOLOv8 model. The recv method processes each frame received from the webcam feed and displays the detected objects.
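To give an idea of how these helper functions are wired into the main script, here is a minimal sketch of the relevant part of app.py. The sidebar layout and the settings.DETECTION_MODEL name are assumptions based on parts 3 and 4 of this series, so adapt them to your own settings.py:

import streamlit as st

import helper
import settings

st.title("Object Detection and Tracking")

# Confidence slider in the sidebar (assumed layout from earlier parts)
conf = float(st.sidebar.slider("Select Model Confidence", 25, 100, 40)) / 100

# settings.DETECTION_MODEL is assumed to point to your trained YOLOv8 weights
model = helper.load_model(settings.DETECTION_MODEL)

helper.play_webcam(conf, model)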

Deploy the app to the cloud

We are going to make the web app available to everyone by deploying it to the cloud. To deploy the app, we have to add the rtc_configuration parameter to webrtc_streamer(), and also add media_stream_constraints to avoid requesting audio along with the webcam.

webrtc_streamer(
    key="example",
    video_frame_callback=callback,
    rtc_configuration={  # Add this line
        "iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]
    },
    media_stream_constraints={"video": True, "audio": False}
)

This configuration is necessary to establish the media streaming connection when the server is on a remote host.

streamlit_webrtc uses WebRTC for its video and audio streaming. It has to access a "STUN server" in the global network for the remote peers (precisely, peers over the NATs) to establish WebRTC connections. While we don't look at the details about STUN servers in this article, please google it with keywords such as STUN, TURN, or NAT traversal if interested.

We configured the code to use a free STUN server provided by Google in the example above. You can also use any other available STUN server. After these changes we are ready to run the app on Streamlit share, following the same steps as in part 3 to create a repo and then deploy it.

We can see in the image above, in Streamlit’s manage app panel, that the app is running, but unfortunately at this moment the webcam view is not showing the object frames. I need to investigate Streamlit-WebRTC more deeply to solve this, but the app does detect the objects, as shown below.
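My best guess about the cause, which I have not verified in this app yet: streamlit-webrtc expects recv() to return an av.VideoFrame that it then renders inside the START/STOP video element, while Streamlit calls like st.image() are not meant to be made from the worker thread that runs recv(). If that is right, the fix would be to return the processed frame instead of calling st.image():

# A possible fix for MyVideoTransformer.recv (a sketch, not yet verified):
def recv(self, frame):
    image = frame.to_ndarray(format="bgr24")
    processed_image = self._display_detected_frames(image)
    # Hand the annotated frame back to streamlit-webrtc so it can render it
    return av.VideoFrame.from_ndarray(processed_image, format="bgr24")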

I’ll update this post when I fix it. For the moment that’s everything, and with this you should be able to run the app without problems on Streamlit share. Check my demo and my GitHub repository, which you can clone to make the necessary changes for your project.

Keep an eye out: I added one extra part to upgrade your model. I recommend part 6, on enhancing active learning to improve your model with new data. In case you want to use the app on a Raspberry Pi, check part 7, part 8, and part 9. If you find errors while following this guide, or have feedback about it, let me know in the comments. Thank you for following this post, and good luck with your projects.
