Layered Voice Detection Systems - Sixth DevLog

This Week's Focus

Steady groundwork across multiple fronts: refactoring the old Cow.cs into ModularEnemy, polishing the blog system with DevBot Companion improvements, and designing a four-layer voice detection system that bridges academic research with practical game mechanics.

What Went Right

A satisfying week of small but meaningful victories. The Unity Asset Store sale timing was perfect: I picked up two VFX packs that should give Voice Chain Reaction the visual polish I struggle to create myself. Visual effects have never been my strong suit, so having quality assets to work with should make explosions and combat feedback much more impactful.

On the AI front, the layered voice detection system made real progress. Parts of Layer 1 (phonetic analysis) were already running, and work has now started on Layer 0, the foundational noise filtering that happens before any classification begins. This creates a nice alignment between my coursework and the game's technical depth, something worth sharing publicly.

The Four-Layer Voice Detection Architecture

The core of Voice Chain Reaction is its voice detection pipeline. Rather than one monolithic recognition system, we're building four distinct layers that each serve a specific purpose:

Voice Detection Pipeline

Layer 0: Noise Filtering - Filters out non-human noise before classification begins
Layer 1: Phonetic Analysis - Basic sound pattern recognition and intentionality detection
Layer 2: Personalization - User-specific templates and recognition patterns
Layer 3: Deep Learning - Local micro-model trained on spectrograms, using a lightweight GRU network
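
To make the layering concrete, here is a minimal Python sketch of how a stack like this could be wired together. It is not the game's actual code (that lives in Unity); the names `run_pipeline`, `LayerResult`, and `rms_energy`, the thresholds, and the placeholder phonetic scorer are all assumptions made for illustration. The idea it shows is only the ordering: cheap layers run first and can veto a frame before the expensive ones ever see it.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # one chunk of mono audio samples in [-1, 1]
# A layer is (name, scoring function, minimum score needed to pass).
Layer = Tuple[str, Callable[[Frame], float], float]


@dataclass
class LayerResult:
    layer: str
    accepted: bool
    score: float


def run_pipeline(frame: Frame, layers: List[Layer]) -> List[LayerResult]:
    """Run layers in order and stop at the first one that rejects the frame."""
    results = []
    for name, scorer, threshold in layers:
        score = scorer(frame)
        accepted = score >= threshold
        results.append(LayerResult(name, accepted, score))
        if not accepted:
            break  # later (more expensive) layers never run
    return results


def rms_energy(frame: Frame) -> float:
    """Hypothetical Layer 0 stand-in: root-mean-square loudness of the frame."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5 if frame else 0.0


# Hypothetical configuration; real layers and thresholds would differ.
layers = [
    ("noise_filter", rms_energy, 0.05),
    ("phonetic", lambda frame: 1.0, 0.5),  # placeholder that always passes
]

quiet = [0.001] * 256       # near-silence: rejected by the first layer
loud = [0.5, -0.5] * 128    # loud signal: passes both layers

quiet_results = run_pipeline(quiet, layers)
loud_results = run_pipeline(loud, layers)
print([r.layer for r in quiet_results])  # only the noise filter ran
print([r.layer for r in loud_results])   # both layers ran
```

A loudness gate is far too crude to be a real "non-human noise" filter, but it keeps the example short while showing the early-exit structure.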

Layer 3: The Deep Learning Component

The most technically interesting piece is Layer 3, which will use a local micro-model trained on spectrograms. The plan involves:

Lightweight GRU Network - Designed for real-time performance without heavy computational overhead
Colab Training Pipeline - Using Google Colab for dataset preparation and model training
Spectrogram Analysis - Converting audio input into time-frequency images for ML processing
Local Inference - Running the trained model locally to avoid network dependencies
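
As a rough illustration of the spectrogram step in the plan above, the sketch below turns raw samples into a log-magnitude spectrogram with NumPy. The frame length, hop size, and 16 kHz sample rate are assumptions chosen for the example, not the project's settings, and `log_spectrogram` is a hypothetical helper name.

```python
import numpy as np


def log_spectrogram(samples, frame_len=256, hop=128, eps=1e-10):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size < frame_len:
        # Zero-pad very short clips so at least one frame exists.
        samples = np.pad(samples, (0, frame_len - samples.size))
    num_frames = 1 + (samples.size - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.stack([
        samples[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) on silent bins


# Assumed test signal: one second of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_bin, peak_hz)
```

Each row of the result is one time slice and each column one frequency bin, which is the 2D "image" a model can be trained on. With these settings each bin is 62.5 Hz wide, so the 1 kHz tone lands in bin 16.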

Technical Deep Dive Available

For complete technical specifications, system architecture, and the interactive data flow diagram, visit the Voice Chain Reaction Technical Deep Dive. This living document covers the full four-layer architecture, GRU network design, and anti-exploit systems.

Keeping the Layer 3 model small and running it locally lets us maintain low latency while still providing sophisticated recognition. The model will be trained specifically on the types of vocalizations that work best in gameplay scenarios.
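
For a sense of how small a "lightweight GRU" can be at inference time, here is a NumPy sketch of a single-layer GRU that reads a spectrogram frame by frame and outputs class probabilities. Everything about it is an assumption for illustration: the `TinyGRU` name, the layer sizes, the three output classes, and above all the random weights, which stand in for whatever the Colab training run would eventually export. It uses the common formulation where the update gate blends the old state with the candidate state; some frameworks swap the roles of `z` and `1 - z`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRU:
    """Single-layer GRU plus a softmax head; weights are random, not trained."""

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.normal(0.0, 0.1, (3, hidden_size, input_size))
        self.U = rng.normal(0.0, 0.1, (3, hidden_size, hidden_size))
        self.b = np.zeros((3, hidden_size))
        self.W_out = rng.normal(0.0, 0.1, (num_classes, hidden_size))
        self.b_out = np.zeros(num_classes)
        self.hidden_size = hidden_size

    def step(self, x, h):
        """Advance the hidden state by one spectrogram frame."""
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * n

    def classify(self, spectrogram):
        """Run over all frames, then map the final state to probabilities."""
        h = np.zeros(self.hidden_size)
        for frame in spectrogram:
            h = self.step(frame, h)
        logits = self.W_out @ h + self.b_out
        exp = np.exp(logits - logits.max())  # shift for numerical stability
        return exp / exp.sum()


# Stand-in input: 124 frames of 129 frequency bins filled with random noise.
model = TinyGRU(input_size=129, hidden_size=16, num_classes=3)
fake_spectrogram = np.random.default_rng(1).normal(size=(124, 129))
probs = model.classify(fake_spectrogram)
print(probs)
```

Because the hidden state is carried from frame to frame, the same `step` function could process audio as it streams in rather than waiting for a full clip, which is part of what makes a GRU attractive for low-latency local inference.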

ModularEnemy Transformation

Meanwhile, the old Cow.cs finally got its long-overdue refactoring into ModularEnemy. This change should make the codebase much more flexible for future enemy designs. Committing to modularization now makes the foundations of Voice Chain Reaction far easier to expand later.

It's classic technical debt paydown - not glamorous, but essential infrastructure work that enables everything else to move faster.

Development Tool Insights

An interesting observation about AI development tools emerged this week. The decision to split responsibilities across different AI assistants is becoming clearer:

Claude for Web Development - Excels at front-end work and architectural discussions
GPT for Unity Projects - More natural for gameplay code and Unity-specific patterns

Claude tends to restructure files too aggressively and sometimes creates god objects, but it shines when shaping front-end work. GPT feels more natural for Unity and gameplay code. Knowing these strengths makes it easier to route each task to the right tool.

Personal Milestone

On a personal note, today marks my one-year anniversary with my girlfriend. It's a milestone that feels significant given the role she played in my studies. She lent me the money I needed to finish my last semester, which allowed me to focus fully on both coursework and side projects. That support turned into a High Distinction on the SDBT framework, a project I still consider a cornerstone of my portfolio.

Moments like this remind me that these devlogs are not just about code or visuals but about the people behind them.

What Sucked

The blog spacing bug remains a thorn in my side. No matter how many adjustments we make, the spacing stays stuck in place. It's a small UI problem, but a frustrating one when the rest of the system feels cleaner.

More significantly, hitting the usage cap with Codex has slowed development. Having to wait nearly four days before using it again shifts the load more toward Claude and GPT. This means website progress will taper off while I focus on Unity work.

[Chart: Development Tool Usage Distribution - how development focus shifts when different AI tools hit usage limits]

Next Steps

The priority for the coming stretch is to finalize the ModularEnemy system and keep refining the layered voice detection pipeline. Once Layers 0-2 are properly stitched together, Layer 3 becomes the focus - setting up the Colab training pipeline and preparing the spectrogram dataset.

The anti-exploit system architecture is designed to make Voice Chain Reaction both fairer and more viral: players can't game the system, but legitimate vocal skill still creates a genuine competitive advantage.

Beyond technical work, I still plan to return to that stubborn spacing bug on the blog. Sometimes the smallest UI issues are the most persistent.

Community Note

Outside of development, The International is in full swing. Falcons are struggling, but XG's recent victory over Team Spirit makes them the team to watch. There's something satisfying about following competitive gaming while building games - seeing the polish and balance that top-tier competition demands.

Technical Takeaway

The layered approach to voice detection demonstrates how complex problems benefit from architectural thinking. Rather than building one massive recognition system, we're creating specialized layers that each solve specific problems. This modularity makes testing easier, performance more predictable, and future enhancements more manageable.