Deep Dive into the Code

Features of the Recognition Systems

  1. Dynamic Facial Recognition: Uses a real-time webcam feed for instant face detection (Haar Cascade classifier) and verification (the face_recognition library).

  2. Pre-Loaded Face Encodings: Handles known-face data through pre-loaded encodings, enabling quick matching and identification.

  3. Adaptive Sound Analysis: Employs YAMNet, a powerful sound event recognition model, to detect specific sounds like gunshots across different auditory environments.

  4. Continuous Monitoring and Processing: The system constantly processes both the visual and auditory data streams for immediate response (a minimal sketch of this two-stream loop follows this list).

  5. Robust Face Encoding Generation: Includes a dedicated module to generate and update face encodings, so the system adapts as new individuals are enrolled.

  6. Data Management for Individuals: Streamlines data handling by adding and updating individual profiles (such as students) with unique identifiers and relevant information.

  7. Versatile Application: Designed for integration in diverse environments, particularly educational settings, enhancing security and monitoring capabilities.
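
Since feature 4 means the visual and auditory pipelines must run side by side, here is a minimal sketch of one way to run both streams concurrently with Python threads. run_visual_monitoring and run_audio_monitoring are hypothetical stand-ins for the webcam and microphone loops shown later in this post.

import threading
import time

def run_visual_monitoring():
    # Hypothetical stand-in for the webcam loop in the Visual section
    while True:
        time.sleep(1)  # placeholder: capture a frame, run face recognition

def run_audio_monitoring():
    # Hypothetical stand-in for the microphone loop in the Auditory section
    while True:
        time.sleep(1)  # placeholder: capture an audio chunk, run YAMNet

# Each stream gets its own thread so a slow frame never blocks audio analysis
video_thread = threading.Thread(target=run_visual_monitoring, daemon=True)
audio_thread = threading.Thread(target=run_audio_monitoring, daemon=True)
video_thread.start()
audio_thread.start()
video_thread.join()
audio_thread.join()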

These are the important snippets for conceptual understanding; to see the full code, head over to the GitHub repository.

Visual

This section initializes the facial recognition process. It loads the pre-trained face encodings, initializes the Haar Cascade classifier for face detection, and sets up the webcam feed (plus the Firestore client that the inference loop writes attendance records to).

main.py - loader code
import os
import pickle
import numpy as np
import cv2
import face_recognition
from datetime import datetime
import firebase_admin
from firebase_admin import credentials, firestore

# Initialize Firebase for the attendance updates made in the inference loop
# (assumed setup; the service-account key file name is a placeholder)
cred = credentials.Certificate('serviceAccountKey.json')
firebase_admin.initialize_app(cred)
db = firestore.client()

# Load the encoding file: a pickled pair of (known encodings, student IDs)
print("Loading Encode File ...")
with open('EncodeFile.p', 'rb') as file:
    encodeListKnownWithIds = pickle.load(file)
encodeListKnown, studentIds = encodeListKnownWithIds
print("Encode File Loaded")

# Initialize the Haar Cascade classifier for face detection
face_cascade = cv2.CascadeClassifier('./haarcascade_frontalface_default.xml')

# Open the default webcam and request a 1920x1080 feed
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
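
As a quick sanity check after loading (assuming EncodeFile.p stores the [encodings, IDs] pair shown above), the two lists should line up one-to-one:

assert len(encodeListKnown) == len(studentIds)
print(f"Loaded {len(encodeListKnown)} known faces")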

Here, the code continually reads frames from the webcam, processes each frame to detect faces, and performs face recognition. If a face is recognized as known (verified), it's marked and logged; otherwise, it's flagged as unauthorized.

main.py - main inference code
 
while True:
    success, img = cap.read()
    if not success:
        continue  # skip frames the camera failed to deliver

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Perform face detection using the Haar Cascade classifier
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

    for (x, y, w, h) in faces:
        # Crop the detected face region
        face_img = img[y:y+h, x:x+w]
        face_img_rgb = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)

        # Perform face recognition using the face_recognition library
        face_encodings = face_recognition.face_encodings(face_img_rgb)

        if len(face_encodings) > 0:
            # Compare the face encoding with the known encodings
            face_encoding = face_encodings[0]
            matches = face_recognition.compare_faces(encodeListKnown, face_encoding)
            face_distances = face_recognition.face_distance(encodeListKnown, face_encoding)
            match_index = np.argmin(face_distances)

            if matches[match_index]:
                # Detected face is a known face
                cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
                cv2.putText(img, "Verified", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

                # Retrieve the person ID and update the last attendance time in
                # Firebase (new_school_id is assumed to be set elsewhere in main.py)
                person_id = studentIds[match_index]
                current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                person_ref = db.collection("Schools").document(new_school_id).collection("students").document(person_id)
                person_ref.update({"last_attendance_time": current_time})

            else:
                # Detected face is an unauthorized face
                cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 2)
                cv2.putText(img, "Unauthorized", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)

    # Show the annotated frame and allow a clean exit with 'q'
    cv2.imshow("Recognition", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

This function finds and encodes faces from a given set of images, essential for building the database of known faces.

encode-generator.py - face encodings code
import cv2
import face_recognition

def findEncodings(imagesList, imgPaths):
    encodeList = []
    for img, imgPath in zip(imagesList, imgPaths):
        try:
            if img is None:
                print(f"Empty image at path: {imgPath}. Skipping...")
                continue

            # face_recognition expects RGB; OpenCV loads images as BGR
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            faceEncodings = face_recognition.face_encodings(img)
            if len(faceEncodings) > 0:
                encode = faceEncodings[0]
                encodeList.append(encode)
            else:
                print(f"No face found in image: {imgPath}. Skipping...")
        except Exception as e:
            print(f"Error processing image: {imgPath}\n{str(e)}")
            continue
    return encodeList
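
To produce the EncodeFile.p consumed by main.py, the encodings are paired with their student IDs and pickled. A minimal sketch, assuming one image per student named after the student ID (e.g. Images/<studentId>.png) and that every image yields an encoding:

import os
import pickle
import cv2

folderPath = 'Images'  # assumed folder layout
imgPaths = [os.path.join(folderPath, f) for f in os.listdir(folderPath)]
imagesList = [cv2.imread(p) for p in imgPaths]
studentIds = [os.path.splitext(os.path.basename(p))[0] for p in imgPaths]

encodeListKnown = findEncodings(imagesList, imgPaths)
encodeListKnownWithIds = [encodeListKnown, studentIds]

with open('EncodeFile.p', 'wb') as f:
    pickle.dump(encodeListKnownWithIds, f)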

This script focuses on adding new individuals (students) to the recognition system, including generating unique IDs and storing relevant information like name and grade.

data.py
import uuid

# `db` is the Firestore client (see the loader code above)

def add_new_person(school_id, name, grade):
    person_id = str(uuid.uuid4())  # Generate a random person ID
    person_data = {
        "name": name,
        "grade": grade,
        "last_attendance_time": ""
    }
    school_ref = db.collection("Schools").document(school_id)
    school_ref.collection("students").document(person_id).set(person_data)
    print("New person added successfully.")

Auditory

This code initializes the environment for sound analysis, particularly for gunshot detection. It includes loading the YAMNet model, a pre-trained model for sound event recognition.

main.py - loader code
import csv
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import scipy.signal
import sounddevice as sd
from scipy.io import wavfile

# Helper from the TF Hub YAMNet tutorial: read class names from the class map CSV
def class_names_from_csv(class_map_csv_text):
    class_names = []
    with tf.io.gfile.GFile(class_map_csv_text) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            class_names.append(row['display_name'])
    return class_names

# Load the YAMNet model
model = hub.load('https://tfhub.dev/google/yamnet/1')
class_names = class_names_from_csv(model.class_map_path().numpy())
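
A quick way to confirm the class of interest exists in the loaded class map (assuming YAMNet's AudioSet display name 'Gunshot, gunfire'):

gunshot_idx = class_names.index('Gunshot, gunfire')
print(f"Gunshot class found at index {gunshot_idx} of {len(class_names)}")
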
The core functionality for detecting specific sounds, such as gunshots, is illustrated below. It involves processing the sound data and using the YAMNet model to identify whether the sounds match any known classes, like gunshots.

main.py - main inference code
# Helper from the TF Hub YAMNet tutorial: resample audio to the model's rate
def ensure_sample_rate(original_sample_rate, waveform, desired_sample_rate=16000):
    if original_sample_rate != desired_sample_rate:
        desired_length = int(round(float(len(waveform)) /
                                   original_sample_rate * desired_sample_rate))
        waveform = scipy.signal.resample(waveform, desired_length)
    return desired_sample_rate, waveform

def detect_gunshots(waveform, sample_rate=16000):
    # Resample to the 16 kHz mono audio the YAMNet model expects
    sample_rate, waveform = ensure_sample_rate(sample_rate, waveform)
    waveform = waveform / np.max(np.abs(waveform))  # Normalize waveform to [-1, 1]

    scores, embeddings, spectrogram = model(waveform)
    scores_np = scores.numpy()
    # Average the per-frame scores and report the most likely class overall
    inferred_class = class_names[scores_np.mean(axis=0).argmax()]

    return inferred_class

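Finally, a minimal sketch of driving detect_gunshots from a live microphone feed with sounddevice. The 2-second window and the alert logic are assumptions rather than part of the original code; 'Gunshot, gunfire' is the AudioSet display name YAMNet uses for this class.

SAMPLE_RATE = 16000   # matches what detect_gunshots expects
CHUNK_SECONDS = 2     # assumed analysis window

def listen_for_gunshots():
    while True:
        # Record one mono chunk from the default microphone
        recording = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                           samplerate=SAMPLE_RATE, channels=1, dtype='float32')
        sd.wait()  # Block until the chunk has been captured
        label = detect_gunshots(recording.flatten())
        if label == 'Gunshot, gunfire':
            print("ALERT: possible gunshot detected")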