Kevin Jennings

BrowPrompt for Computer Vision-Based AI Interaction

BrowPrompt (click for demo)

🤔 The Problem: Why I Built BrowPrompt

Is it just me, or is something left to be desired in how we interface with generative AI?

Throughout my day, I routinely consult language models for answers. However, I have not found an effective way to access these tools without:

  1. constantly navigating to a web app, sifting through all my tabs, only to find that I’ve been automatically logged out.
  2. having my microphone always on and listening for “Hey Siri!” Even though consumers have resigned themselves to applications that are always listening, it is hard to believe this is the preferred status quo.

As AI becomes a more useful copilot, I am interested in holistically exploring new ways to interact with it. This inspired me to develop BrowPrompt, a solution designed to streamline this process with a unique approach.

💡 The Solution: Exploring Applications of Computer Vision for Improved Interaction

What if your facial expression could be used to identify when you are prompting an AI model?

Using facial expressions as a trigger enables hands-free interaction with the model, eliminating the need to type a prompt, press a button, or use a vocal command like “Hey Siri.”

BrowPrompt, my computer vision-based prototype, uses an eyebrow raise held for more than two seconds to trigger AI prompting. This facial cue is simple yet deliberate, minimizing false positives, since extended eyebrow raises are uncommon in everyday activity.

Here’s how it works:

🚀 User Benefits:

👨‍🏫 How to Use:

  1. User sets the sliding scale to calibrate the distance from their face to the camera (can be changed while the program is running).
  2. When the program detects a brow raise longer than 2 seconds, it will play an audible ping and the user can begin dictating their prompt.
  3. User can choose to copy the response generated in the text box using the “Copy Text” button.
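Step 2 above (holding the raise for two seconds before the ping) can be sketched as a small debounce class. `BrowTrigger`, its parameter names, and the threshold are illustrative assumptions, not BrowPrompt's actual implementation:

```python
import time


class BrowTrigger:
    """Fire once when the brow stays raised for `hold_s` seconds.

    Brief raises are ignored, and only a single trigger is emitted per
    sustained raise, so blinks of the brow don't start dictation.
    """

    def __init__(self, hold_s=2.0):
        self.hold_s = hold_s
        self.raised_since = None  # timestamp when the current raise began
        self.fired = False        # whether we already triggered this raise

    def update(self, raised, now=None):
        """Call once per video frame; returns True exactly once per hold."""
        now = time.monotonic() if now is None else now
        if not raised:
            # Raise ended: reset so the next sustained raise can fire again.
            self.raised_since = None
            self.fired = False
            return False
        if self.raised_since is None:
            self.raised_since = now
        if not self.fired and now - self.raised_since >= self.hold_s:
            self.fired = True
            return True  # caller plays the audible ping and starts dictation
        return False
```

The caller would feed this a per-frame "brow raised" boolean and, on a `True` return, play the ping and begin capturing the spoken prompt.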

💻 Tech Specs:

Technologies Used:

```python
import cv2
import dlib
import numpy as np

# Assumes earlier setup: `frame` is the current webcam image, `faces` comes
# from an OpenCV face detector, and `predictor` is a dlib.shape_predictor
# loaded with the 68-point facial landmark model.

# Draw rectangles around the detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Convert the OpenCV rectangle coordinates to a dlib rectangle
    dlib_rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    # Detect the facial landmarks within the rectangle
    detected_landmarks = predictor(frame, dlib_rect).parts()
    # Convert to a NumPy array for easier vector math
    landmarks = np.array([[p.x, p.y] for p in detected_landmarks])
    # Measure the distance between each eyebrow point and the eye below it
    LE_1 = np.linalg.norm(landmarks[39] - landmarks[21])  # left side
    LE_2 = np.linalg.norm(landmarks[38] - landmarks[20])
    LE_3 = np.linalg.norm(landmarks[37] - landmarks[19])
    RE_1 = np.linalg.norm(landmarks[42] - landmarks[22])  # right side
    RE_2 = np.linalg.norm(landmarks[43] - landmarks[23])
    RE_3 = np.linalg.norm(landmarks[44] - landmarks[24])
```

This code block shows how OpenCV, dlib, and NumPy interact: OpenCV detects faces, and dlib converts each detection into a format suitable for landmark prediction. The dlib predictor identifies key facial landmarks, which are converted into a NumPy array for easier manipulation. The code then calculates the Euclidean distances between specific eye and eyebrow landmarks to determine whether the eyebrows are raised, which leads to the microphone trigger event.
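One plausible way to turn those six distances into a raised/not-raised decision (a sketch, not BrowPrompt's exact logic) is to average them and normalize by the inter-ocular distance, so the threshold, which is what the calibration slider would effectively adjust, is independent of how far the face sits from the camera. The `threshold_ratio` value below is an illustrative assumption:

```python
import math

# Eyebrow/eye landmark pairs from dlib's 68-point model, as used above.
BROW_EYE_PAIRS = [(21, 39), (20, 38), (19, 37),
                  (22, 42), (23, 43), (24, 44)]


def brow_raised(landmarks, threshold_ratio=0.42):
    """Return True if the mean brow-to-eye gap, normalized by the
    inter-ocular distance, exceeds a calibrated threshold.

    `landmarks` is a sequence of (x, y) points indexed per dlib's
    68-point model; `threshold_ratio` is a hypothetical calibration
    value, analogous to what the on-screen slider would tune.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    mean_gap = sum(dist(landmarks[b], landmarks[e])
                   for b, e in BROW_EYE_PAIRS) / len(BROW_EYE_PAIRS)
    # Normalize by the distance between the inner eye corners (points
    # 39 and 42) so the decision is scale-invariant.
    inter_ocular = dist(landmarks[39], landmarks[42])
    return mean_gap / inter_ocular > threshold_ratio
```

Normalizing by a fixed facial distance means the same threshold works whether the user leans in or sits back, reducing how often the slider needs re-tuning.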

Architecture and Design:

Roadmap: