The Future of Sketching: A Touchless Drawing Experience with Gesture Recognition

Abstract:

Advancements in Artificial Intelligence (AI), especially in computer vision, are reshaping research and creative processes. As the demand for seamless human-machine interaction increases, this paper introduces a Touchless User Interface (TUI): a system that interprets hand gestures, a rich and expressive form of communication, to control a computer without any physical contact with a keyboard, mouse, or screen. The application can sketch shapes such as circles, rectangles, and lines, or switch to freehand drawing and erasing. The system employs a convolutional neural network to segment and detect the human hand in real time against complex backgrounds.

Objective:

  • Develop a virtual canvas for sketching.
  • Utilize the human finger as a color marker in sketches.
  • Perform the necessary morphological operations.
  • Establish an interactive interface between the user and the system.


Existing System:

  1. The current system is restricted to finger inputs, with no support for additional tools like highlighters or paints.
  2. Isolating and identifying an object, such as a finger, from an RGB image without a depth sensor is challenging.
  3. Due to the absence of depth detection, tracking the vertical movements of the pen is not possible.

Proposed System:

In this project, a live video stream captured with OpenCV serves as the input. The system interprets hand gestures, identified through MediaPipe, to dictate the subsequent actions in the application. The output, comprising the structures drawn by the user, is displayed in real time.

Let’s start with the code section:

We will build a touchless sketching application using hand gesture recognition. Our input is a live video stream; we use the MediaPipe library for hand tracking and OpenCV for rendering and handling the video feed. The application recognizes different hand gestures to switch between various drawing tools. The entire code is in Python.

Let’s break down the code step by step:

Step 1: Import Necessary Libraries

We start by importing the necessary libraries:

import mediapipe as mp
import cv2
import numpy as np
import time        

  • mediapipe: provides the hand-tracking tools we need.
  • cv2: OpenCV (Open Source Computer Vision Library), used here to capture and render the video feed.
  • numpy: adds support for large, multi-dimensional arrays and matrices; we use it for the drawing mask.
  • time: used to create the dwell delay when selecting tools.
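If these libraries are not installed yet, they can be fetched from PyPI with pip (standard package names; no specific versions are pinned here):

pip install mediapipe opencv-python numpy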

Step 2: Initialize Variables and Constants

ml = 150                       # left margin (x-offset) of the toolbar
max_x, max_y = 250 + ml, 50    # bottom-right corner of the toolbar region
curr_tool = "select tool"      # currently active tool
time_init = True               # whether the selection timer needs (re)starting
rad = 40                       # radius of the shrinking selection indicator
var_inits = False              # whether a shape's starting point is recorded
thick = 4                      # stroke thickness
prevx, prevy = 0, 0            # previous fingertip coordinates

The comments above summarize each parameter; they will come into play in the main loop below.

Step 3: Define Helper Functions

  • getTool(x): Based on the fingertip’s x-coordinate inside the toolbar, this function determines which tool is selected: line, rectangle, freehand drawing, circle, or eraser.
  • index_raised(yi, y9): This function determines whether the drawing finger is raised by comparing the y-coordinates of two finger landmarks. Both helpers are shown below.
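Lightly reformatted from the complete code at the end of the article:

def getTool(x):
    # toolbar cells are 50 px wide, starting at the left margin ml
    if x < 50 + ml:
        return "line"
    elif x < 100 + ml:
        return "rectangle"
    elif x < 150 + ml:
        return "draw"
    elif x < 200 + ml:
        return "circle"
    else:
        return "erase"

def index_raised(yi, y9):
    # the finger counts as raised when its tip (yi) sits at least 40 px
    # above the reference knuckle (y9); image y grows downward
    return (y9 - yi) > 40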

Step 4: Prepare Hand Tracking Model

We initialize the hand tracking model: detection and tracking confidence thresholds of 0.6, and at most one hand tracked at a time.

hands = mp.solutions.hands
hand_landmark = hands.Hands(min_detection_confidence=0.6, min_tracking_confidence=0.6, max_num_hands=1)
draw = mp.solutions.drawing_utils        

Step 5: Load Drawing Tools Image

We load an image that serves as the drawing-tool toolbar (a 250 × 50 px strip with one 50 px cell per tool):

tools = cv2.imread("tools.png")
tools = tools.astype('uint8')        
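If tools.png is not available, a minimal placeholder can be generated instead. This is a sketch adapted from the commented-out block in the complete code below, with the size adjusted to match the 50 × 250 region the toolbar is blended into:

# minimal placeholder toolbar (50 x 250 px): five 50 px cells for
# line, rectangle, draw, circle, and erase, separated by red lines
tools = np.zeros((max_y, max_x - ml, 3), dtype="uint8")
cv2.rectangle(tools, (0, 0), (max_x - ml - 1, max_y - 1), (0, 0, 255), 2)
for k in range(1, 5):
    cv2.line(tools, (k * 50, 0), (k * 50, max_y), (0, 0, 255), 2)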

Step 6: Create a Mask

We create a white mask of the same size as our output window (480 × 640). Shapes are drawn onto this mask in black (0), and the eraser paints white (255) back over them.

mask = np.ones((480, 640)) * 255
mask = mask.astype('uint8')        
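Two lines from the main loop (shown in full later) illustrate the convention: drawing writes black (0) into the mask, while erasing writes white (255):

cv2.line(mask, (prevx, prevy), (x, y), 0, thick)   # draw: black stroke
cv2.circle(mask, (x, y), 30, 255, -1)              # erase: filled white circle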

Step 7: Capture Video Feed and Process

This is the main loop of the program: we read frames from the webcam, process them to detect the hand, and, based on the gesture, draw shapes or apply the selected tool.

cap = cv2.VideoCapture(0)
while True:
 # ... (all the hand detection and drawing logic)        

This loop reads a frame from the webcam, flips it for a more natural interaction, processes it to detect hand landmarks, and uses these landmarks to draw with the selected tool.
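Condensed from the complete code below, the skeleton of the loop looks like this:

cap = cv2.VideoCapture(0)
while True:
    _, frm = cap.read()
    frm = cv2.flip(frm, 1)                       # mirror for natural interaction
    rgb = cv2.cvtColor(frm, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
    op = hand_landmark.process(rgb)

    if op.multi_hand_landmarks:
        for i in op.multi_hand_landmarks:
            draw.draw_landmarks(frm, i, hands.HAND_CONNECTIONS)
            # index fingertip (landmark 8), scaled to the 640 x 480 frame
            x, y = int(i.landmark[8].x * 640), int(i.landmark[8].y * 480)
            # ... tool selection and drawing logic (see the complete code)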

Step 8: Drawing Tools

In this loop, different shapes are drawn based on the curr_tool variable. For example, if curr_tool is set to "draw", we draw freehand lines on the screen following the position of our index finger.
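For reference, here is the "draw" branch from the complete code: while the drawing gesture is held, each new fingertip position is connected to the previous one on the mask; otherwise the previous position is simply updated without drawing:

if curr_tool == "draw":
    xi, yi = int(i.landmark[12].x * 640), int(i.landmark[12].y * 480)
    y9 = int(i.landmark[9].y * 480)

    if index_raised(yi, y9):
        # pen down: connect the previous fingertip position to the current one
        cv2.line(mask, (prevx, prevy), (x, y), 0, thick)
        prevx, prevy = x, y
    else:
        # pen up: track the fingertip without drawing
        prevx, prevy = x, y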

Step 9: Display the Output

We display the processed frame, which includes our drawings:

cv2.imshow("paint app", frm)        

Step 10: Exit Condition

The loop continues until the ‘Esc’ key is pressed:

if cv2.waitKey(1) == 27:
 cv2.destroyAllWindows()
 cap.release()
 break        

And that’s it! We’ve built a real-time touchless drawing application using Python, MediaPipe, and OpenCV. The application recognizes hand gestures to switch between various drawing tools, a capability that is useful in fields such as interactive presentations and virtual reality.

Complete Code:

import mediapipe as mp
import cv2
import numpy as np
import time

# constants
ml = 150
max_x, max_y = 250+ml, 50
curr_tool = "select tool"
time_init = True
rad = 40
var_inits = False
thick = 4
prevx, prevy = 0,0
        

#get tools function

def getTool(x):
 if x < 50 + ml:
  return "line"

 elif x<100 + ml:
  return "rectangle"

 elif x < 150 + ml:
  return"draw"

 elif x<200 + ml:
  return "circle"

 else:
  return "erase"

def index_raised(yi, y9):
 if (y9 - yi) > 40:
  return True

 return False



hands = mp.solutions.hands
hand_landmark = hands.Hands(min_detection_confidence=0.6, min_tracking_confidence=0.6, max_num_hands=1)
draw = mp.solutions.drawing_utils


# drawing tools
tools = cv2.imread("tools.png")
tools = tools.astype('uint8')

mask = np.ones((480, 640))*255
mask = mask.astype('uint8')
# Alternative: generate a simple toolbar instead of loading tools.png
'''
tools = np.zeros((max_y+5, max_x+5, 3), dtype="uint8")
cv2.rectangle(tools, (0,0), (max_x, max_y), (0,0,255), 2)
cv2.line(tools, (50,0), (50,50), (0,0,255), 2)
cv2.line(tools, (100,0), (100,50), (0,0,255), 2)
cv2.line(tools, (150,0), (150,50), (0,0,255), 2)
cv2.line(tools, (200,0), (200,50), (0,0,255), 2)
'''

cap = cv2.VideoCapture(0)
while True:
 _, frm = cap.read()
 frm = cv2.flip(frm, 1)

 rgb = cv2.cvtColor(frm, cv2.COLOR_BGR2RGB)

 op = hand_landmark.process(rgb)

 if op.multi_hand_landmarks:
  for i in op.multi_hand_landmarks:
   draw.draw_landmarks(frm, i, hands.HAND_CONNECTIONS)
   x, y = int(i.landmark[8].x*640), int(i.landmark[8].y*480)

   # fingertip inside the toolbar region: run the dwell-to-select timer
   if x < max_x and y < max_y and x > ml:
    if time_init:
     ctime = time.time()
     time_init = False
    ptime = time.time()

    # shrinking yellow circle gives visual feedback while the selection dwells
    cv2.circle(frm, (x, y), rad, (0,255,255), 2)
    rad -= 1

    # after hovering for 0.8 s, select the tool under the fingertip
    if (ptime - ctime) > 0.8:
     curr_tool = getTool(x)
     print("Current tool set to:", curr_tool)
     time_init = True
     rad = 40

   else:
    time_init = True
    rad = 40

   if curr_tool == "draw":
    xi, yi = int(i.landmark[12].x*640), int(i.landmark[12].y*480)
    y9  = int(i.landmark[9].y*480)

    if index_raised(yi, y9):
     cv2.line(mask, (prevx, prevy), (x, y), 0, thick)
     prevx, prevy = x, y

    else:
     prevx = x
     prevy = y



   elif curr_tool == "line":
    xi, yi = int(i.landmark[12].x*640), int(i.landmark[12].y*480)
    y9  = int(i.landmark[9].y*480)

    if index_raised(yi, y9):
     if not(var_inits):
      xii, yii = x, y
      var_inits = True

     cv2.line(frm, (xii, yii), (x, y), (50,152,255), thick)

    else:
     if var_inits:
      cv2.line(mask, (xii, yii), (x, y), 0, thick)
      var_inits = False

   elif curr_tool == "rectangle":
    xi, yi = int(i.landmark[12].x*640), int(i.landmark[12].y*480)
    y9  = int(i.landmark[9].y*480)

    if index_raised(yi, y9):
     if not(var_inits):
      xii, yii = x, y
      var_inits = True

     cv2.rectangle(frm, (xii, yii), (x, y), (0,255,255), thick)

    else:
     if var_inits:
      cv2.rectangle(mask, (xii, yii), (x, y), 0, thick)
      var_inits = False

   elif curr_tool == "circle":
    xi, yi = int(i.landmark[12].x*640), int(i.landmark[12].y*480)
    y9  = int(i.landmark[9].y*480)

    if index_raised(yi, y9):
     if not(var_inits):
      xii, yii = x, y
      var_inits = True

     cv2.circle(frm, (xii, yii), int(((xii-x)**2 + (yii-y)**2)**0.5), (255,255,0), thick)

    else:
     if var_inits:
      cv2.circle(mask, (xii, yii), int(((xii-x)**2 + (yii-y)**2)**0.5), 0, thick)
      var_inits = False

   elif curr_tool == "erase":
    xi, yi = int(i.landmark[12].x*640), int(i.landmark[12].y*480)
    y9  = int(i.landmark[9].y*480)

    if index_raised(yi, y9):
     cv2.circle(frm, (x, y), 30, (0,0,0), -1)
     cv2.circle(mask, (x, y), 30, 255, -1)



 # burn the mask into the frame: where the mask is 0 (drawn strokes),
 # the green and red channels are zeroed, so strokes appear in blue
 op = cv2.bitwise_and(frm, frm, mask=mask)
 frm[:, :, 1] = op[:, :, 1]
 frm[:, :, 2] = op[:, :, 2]

 # overlay the semi-transparent toolbar in the top region of the frame
 frm[:max_y, ml:max_x] = cv2.addWeighted(tools, 0.7, frm[:max_y, ml:max_x], 0.3, 0)

 cv2.putText(frm, curr_tool, (270+ml,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,255), 2)
 cv2.imshow("paint app", frm)

 if cv2.waitKey(1) == 27:
  cv2.destroyAllWindows()
  cap.release()
  break        
Download the code from the GitHub page: click here
Follow my LinkedIn page: https://www.dhirubhai.net/in/ajitharunai/

Conclusion:

The research introduces an effective hand-identification system for in-air sketching, with an experimental accuracy of 98.48%. This system has the potential to revolutionize traditional writing and teaching methodologies. Designed as a real-time application for spatial sketching on a two-dimensional surface, the technology offers substantial benefits, particularly for individuals with disabilities, seniors, or anyone who struggles with conventional input devices such as keyboards.


Future Scope:

This system holds the promise of broad utility, including the control of IoT devices. As a tool for smart wearables, it could enable more intuitive interactions with digital environments. Augmented reality technologies could further enrich the text and visual information it produces. Importantly, future iterations should focus on securing the system, ensuring that air-writing responds only to authorized gestures. Additionally, emerging object detection techniques such as YOLOv3 may further improve fingertip recognition accuracy and processing speed.

