The Future of Sketching: A Touchless Drawing Experience with Gesture Recognition
Ajith Kumar M
Abstract:
Advancements in Artificial Intelligence (AI) technologies, especially computer vision, are reshaping research and creative processes. As the demand for seamless human-machine interaction increases, this paper introduces a Touchless User Interface (TUI). This system interprets hand gestures — rich, expressive forms of communication — to control computers without any physical contact with a keyboard, mouse, or screen. The paper demonstrates the application’s capacity to interpret hand gestures to sketch various shapes like circles, rectangles, and lines, or to employ freehand drawing and erasing functions. The system employs a convolutional neural network to segment and detect the human hand in real time against complex backgrounds.
Objective:
Develop a virtual canvas for sketching.
Proposed System:
In this project, a live video stream, captured using OpenCV, serves as the input. The system then interprets hand gestures, as identified through MediaPipe, to dictate the subsequent actions in the application. The output, comprising the structures drawn by the user, is displayed in real time.
Let’s start with the code section:
We will learn how to create a touchless sketching application using hand gesture recognition. Our input is a live video stream, and we will use the MediaPipe library for hand gesture recognition and OpenCV for rendering and handling the video feed. The application recognizes different hand gestures to switch between various drawing tools. The entire code is in Python.
Let’s break down the code step by step:
Step 1: Import Necessary Libraries
We start by importing the necessary libraries:
import mediapipe as mp
import cv2
import numpy as np
import time
Step 2: Initialize Variables and Constants
ml = 150                      # left margin of the toolbar on the frame
max_x, max_y = 250 + ml, 50   # bottom-right corner of the toolbar region
curr_tool = "select tool"     # currently selected tool
time_init = True              # True when the tool-selection dwell timer must restart
rad = 40                      # radius of the shrinking selection circle
var_inits = False             # True once a shape's start point has been recorded
thick = 4                     # drawing thickness in pixels
prevx, prevy = 0, 0           # previous fingertip position, used for freehand drawing
These parameters control the toolbar geometry, the dwell-based tool selection, and the drawing state (line thickness, previous fingertip coordinates, and so on).
Step 3: Define Helper Functions
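The walkthrough skips over the helpers, so here they are, lifted from the complete listing at the end of the article, with comments added:

```python
ml = 150  # left margin of the toolbar on the frame

# Map an x coordinate inside the toolbar strip to a tool name.
# The strip is divided into five 50-pixel-wide cells starting at x = ml.
def getTool(x):
    if x < 50 + ml:
        return "line"
    elif x < 100 + ml:
        return "rectangle"
    elif x < 150 + ml:
        return "draw"
    elif x < 200 + ml:
        return "circle"
    else:
        return "erase"

# Gesture test used throughout the main loop: the hand counts as being in
# "pen down" position when landmark 12 (middle fingertip, y = yi) sits at
# least 40 pixels above landmark 9 (the middle knuckle, y = y9).
def index_raised(yi, y9):
    return (y9 - yi) > 40
```

`getTool` is only called while the fingertip hovers inside the toolbar region, so its argument is always between `ml` and `max_x`.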
Step 4: Prepare Hand Tracking Model
We initialize the hand tracking model with some basic parameters.
hands = mp.solutions.hands
hand_landmark = hands.Hands(min_detection_confidence=0.6, min_tracking_confidence=0.6, max_num_hands=1)
draw = mp.solutions.drawing_utils
Step 5: Load Drawing Tools Image
We load an image that represents the drawing toolbar (tools.png must sit next to the script; note that cv2.imread returns None if the file is missing):
tools = cv2.imread("tools.png")
tools = tools.astype('uint8')
Step 6: Create a Mask
We create a white mask of the same size as our output window. Strokes are drawn onto this mask in black (0) and later combined with the video frame.
mask = np.ones((480, 640)) * 255
mask = mask.astype('uint8')
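To see concretely what this mask does, here is a minimal NumPy sketch (no webcam or OpenCV required; the toy 4x4 frame and values are mine) of the keep/remove logic that cv2.bitwise_and(frm, frm, mask=mask) applies later in the loop:

```python
import numpy as np

# A tiny 4x4 stand-in for a video frame (constant gray) and an
# all-white single-channel mask, like the 480x640 one in the app.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.full((4, 4), 255, dtype=np.uint8)

# Drawing a stroke writes 0 (black) into the mask; "draw" one pixel here.
mask[1, 2] = 0

# Equivalent of cv2.bitwise_and(frame, frame, mask=mask):
# a pixel survives only where the mask is non-zero.
masked = frame * (mask[..., None] > 0).astype(np.uint8)

print(masked[1, 2])  # the stroke pixel is blanked: [0 0 0]
print(masked[0, 0])  # untouched pixels keep their value: [200 200 200]
```

In the application the blanked pixels are copied back only into the frame's green and red channels, so the strokes show up as dark blue lines over the live video.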
Step 7: Capture Video Feed and Process
This is the main loop of the program, where we read frames from the webcam, process them, detect the hand, and, based on the gesture, draw different shapes or switch tools.
cap = cv2.VideoCapture(0)
while True:
# ... (all the hand detection and drawing logic)
This loop reads a frame from the webcam, flips it for a more natural interaction, processes it to detect hand landmarks, and uses these landmarks to draw with the selected tool.
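MediaPipe reports each landmark in normalized [0, 1] coordinates, which the loop converts to pixels with expressions like int(i.landmark[8].x*640). A small helper makes that conversion explicit for any frame size (the name to_pixel and the clamping are my additions, not part of the original code):

```python
def to_pixel(nx, ny, width=640, height=480):
    """Convert a normalized MediaPipe landmark (nx, ny) to integer pixel
    coordinates for a width x height frame, clamped to the frame bounds."""
    px = min(max(int(nx * width), 0), width - 1)
    py = min(max(int(ny * height), 0), height - 1)
    return px, py

# landmark 8 is the index fingertip; (0.5, 0.5) is the frame center
print(to_pixel(0.5, 0.5))  # (320, 240)
```

Clamping matters because MediaPipe can report coordinates slightly outside [0, 1] when the fingertip leaves the frame, which would otherwise index past the image edge.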
Step 8: Drawing Tools
In this loop, based on the curr_tool variable, different shapes are drawn. For example, if curr_tool is set to "draw", we will draw freehand lines on the screen based on the position of our index finger.
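Before any shape is drawn, the tool itself is chosen by dwelling: holding the fingertip over the toolbar for more than 0.8 seconds commits the tool beneath it. That timing logic can be isolated into a small, testable sketch (the class name and the injected clock are mine, for illustration; the app inlines the same idea with time.time()):

```python
DWELL_SECONDS = 0.8

class DwellSelector:
    """Report a selection once the pointer has hovered for DWELL_SECONDS."""

    def __init__(self, clock):
        self.clock = clock   # injected time source, e.g. time.time
        self.start = None    # dwell start timestamp; None when not hovering

    def update(self, hovering):
        """Call once per frame; returns True on the frame the dwell completes."""
        if not hovering:
            self.start = None
            return False
        if self.start is None:
            self.start = self.clock()
            return False
        if self.clock() - self.start > DWELL_SECONDS:
            self.start = None   # reset, as the app does after selecting
            return True
        return False

# usage with a fake clock so the behavior is deterministic
t = [0.0]
sel = DwellSelector(lambda: t[0])
sel.update(True)          # dwell starts
t[0] = 1.0
print(sel.update(True))   # True: 1.0 s > 0.8 s, selection committed
```

Injecting the clock is what makes the 0.8-second threshold testable without actually waiting; the shrinking yellow circle in the app is just visual feedback for this same countdown.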
Step 9: Display the Output
We display the processed frame, which includes our drawings:
cv2.imshow("paint app", frm)
Step 10: Exit Condition
The loop continues until the ‘Esc’ key is pressed:
if cv2.waitKey(1) == 27:
    cv2.destroyAllWindows()
    cap.release()
    break
And that’s it! We’ve built a real-time touchless drawing application using Python, MediaPipe, and OpenCV. This application can recognize hand gestures to switch between various drawing tools. This technology can be very useful in various fields like interactive presentations, virtual reality, etc.
Complete Code:
import mediapipe as mp
import cv2
import numpy as np
import time

# constants
ml = 150
max_x, max_y = 250 + ml, 50
curr_tool = "select tool"
time_init = True
rad = 40
var_inits = False
thick = 4
prevx, prevy = 0, 0

# get tools function
def getTool(x):
    if x < 50 + ml:
        return "line"
    elif x < 100 + ml:
        return "rectangle"
    elif x < 150 + ml:
        return "draw"
    elif x < 200 + ml:
        return "circle"
    else:
        return "erase"

def index_raised(yi, y9):
    return (y9 - yi) > 40

hands = mp.solutions.hands
hand_landmark = hands.Hands(min_detection_confidence=0.6, min_tracking_confidence=0.6, max_num_hands=1)
draw = mp.solutions.drawing_utils

# drawing tools
tools = cv2.imread("tools.png")
tools = tools.astype('uint8')

mask = np.ones((480, 640)) * 255
mask = mask.astype('uint8')

'''
tools = np.zeros((max_y + 5, max_x + 5, 3), dtype="uint8")
cv2.rectangle(tools, (0, 0), (max_x, max_y), (0, 0, 255), 2)
cv2.line(tools, (50, 0), (50, 50), (0, 0, 255), 2)
cv2.line(tools, (100, 0), (100, 50), (0, 0, 255), 2)
cv2.line(tools, (150, 0), (150, 50), (0, 0, 255), 2)
cv2.line(tools, (200, 0), (200, 50), (0, 0, 255), 2)
'''

cap = cv2.VideoCapture(0)

while True:
    _, frm = cap.read()
    frm = cv2.flip(frm, 1)

    rgb = cv2.cvtColor(frm, cv2.COLOR_BGR2RGB)
    op = hand_landmark.process(rgb)

    if op.multi_hand_landmarks:
        for i in op.multi_hand_landmarks:
            draw.draw_landmarks(frm, i, hands.HAND_CONNECTIONS)
            x, y = int(i.landmark[8].x * 640), int(i.landmark[8].y * 480)

            # fingertip inside the toolbar: dwell for 0.8 s to select a tool
            if x < max_x and y < max_y and x > ml:
                if time_init:
                    ctime = time.time()
                    time_init = False
                ptime = time.time()

                cv2.circle(frm, (x, y), rad, (0, 255, 255), 2)
                rad -= 1

                if (ptime - ctime) > 0.8:
                    curr_tool = getTool(x)
                    print("your current tool is set to:", curr_tool)
                    time_init = True
                    rad = 40
            else:
                time_init = True
                rad = 40

            if curr_tool == "draw":
                xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
                y9 = int(i.landmark[9].y * 480)

                if index_raised(yi, y9):
                    cv2.line(mask, (prevx, prevy), (x, y), 0, thick)
                    prevx, prevy = x, y
                else:
                    prevx = x
                    prevy = y

            elif curr_tool == "line":
                xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
                y9 = int(i.landmark[9].y * 480)

                if index_raised(yi, y9):
                    if not var_inits:
                        xii, yii = x, y
                        var_inits = True
                    cv2.line(frm, (xii, yii), (x, y), (50, 152, 255), thick)
                else:
                    if var_inits:
                        cv2.line(mask, (xii, yii), (x, y), 0, thick)
                        var_inits = False

            elif curr_tool == "rectangle":
                xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
                y9 = int(i.landmark[9].y * 480)

                if index_raised(yi, y9):
                    if not var_inits:
                        xii, yii = x, y
                        var_inits = True
                    cv2.rectangle(frm, (xii, yii), (x, y), (0, 255, 255), thick)
                else:
                    if var_inits:
                        cv2.rectangle(mask, (xii, yii), (x, y), 0, thick)
                        var_inits = False

            elif curr_tool == "circle":
                xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
                y9 = int(i.landmark[9].y * 480)

                if index_raised(yi, y9):
                    if not var_inits:
                        xii, yii = x, y
                        var_inits = True
                    cv2.circle(frm, (xii, yii), int(((xii - x)**2 + (yii - y)**2)**0.5), (255, 255, 0), thick)
                else:
                    if var_inits:
                        # the mask is single-channel, so draw with 0, not a BGR tuple
                        cv2.circle(mask, (xii, yii), int(((xii - x)**2 + (yii - y)**2)**0.5), 0, thick)
                        var_inits = False

            elif curr_tool == "erase":
                xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
                y9 = int(i.landmark[9].y * 480)

                if index_raised(yi, y9):
                    cv2.circle(frm, (x, y), 30, (0, 0, 0), -1)
                    cv2.circle(mask, (x, y), 30, 255, -1)

    # apply the mask: drawn (black) mask pixels blank the frame's G and R channels
    op = cv2.bitwise_and(frm, frm, mask=mask)
    frm[:, :, 1] = op[:, :, 1]
    frm[:, :, 2] = op[:, :, 2]

    # overlay the toolbar and the current tool's name
    frm[:max_y, ml:max_x] = cv2.addWeighted(tools, 0.7, frm[:max_y, ml:max_x], 0.3, 0)
    cv2.putText(frm, curr_tool, (270 + ml, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow("paint app", frm)

    if cv2.waitKey(1) == 27:
        cv2.destroyAllWindows()
        cap.release()
        break
Download the code from the GitHub page: click here
Follow my LinkedIn page: https://www.dhirubhai.net/in/ajitharunai/
Conclusion:
The research introduces an effective hand identification system for in-air sketching, with an experimental accuracy of 98.48%. This system has the potential to transform traditional writing and teaching methodologies. Designed as a real-time application for spatial sketching onto a two-dimensional surface, this technology offers substantial benefits, particularly for individuals with disabilities, seniors, or those who struggle with conventional input devices such as keyboards.
Future Scope:
This system holds the promise of broad utility, including the control of IoT devices. As a natural fit for smart wearables, it could enable more intuitive interactions with digital environments, and augmented reality technologies could further enrich the text and visuals it produces. Importantly, future iterations should focus on securing the system, ensuring that air-writing only responds to authorized gestures. Additionally, newer object detection techniques, such as YOLOv3, may further improve fingertip recognition accuracy and processing speed.