Making Focus Music From Scratch

Most productivity apps give you a timer. We wanted something deeper: music that is scientifically designed to put your brain into a focus state and keep it there for the entire work session. So we built our own from scratch.

Dylan Loveday-Powell

Deep Focus preset: 40 Hz gamma + noise, drone, pad, reverb. Headphones recommended for the binaural effect.

1. Why Certain Sounds Help You Focus

You have probably experienced it: you put on the right music, open your laptop, and suddenly two hours vanish. You were locked in. The work felt effortless. That is not a coincidence. Your brain operates on electrical rhythms, and specific sounds can nudge those rhythms toward a focus state.

The key mechanism is something called a binaural beat. When two tones of slightly different frequencies are played in separate ears (say, 200 Hz in the left and 240 Hz in the right), your brain perceives a third tone pulsing at the difference: 40 Hz. This phenomenon was first described by Heinrich Wilhelm Dove in 1839, and modern neuroscience has been investigating whether that phantom pulse can actually steer your brain into deeper concentration.
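To make the mechanism concrete, here is a minimal Python sketch. It is illustrative only and not the engine's code. The function name, amplitude, and duration are assumptions; the 200 Hz and 240 Hz tones are the example values used above.

```python
import math

def binaural_beat(left_hz=200.0, right_hz=240.0, sr=44100, seconds=1.0, amp=0.2):
    """Return (left, right) sample lists: one pure sine tone per ear.

    Nothing in either channel pulses at the difference frequency. The beat
    is constructed by the listener's auditory system, which is why each
    tone has to reach only one ear.
    """
    n = int(sr * seconds)
    left = [amp * math.sin(2 * math.pi * left_hz * i / sr) for i in range(n)]
    right = [amp * math.sin(2 * math.pi * right_hz * i / sr) for i in range(n)]
    return left, right

left, right = binaural_beat()
beat_hz = abs(240.0 - 200.0)  # the perceived beat rate, 40 Hz
```

Mixing the two channels to mono would turn this into an ordinary acoustic beat, which is a different stimulus from the binaural one.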

The short answer: yes, with caveats. A 2020 study in Nature Scientific Reports found that 40 Hz gamma binaural beats measurably improved information processing and attention.[1]

Research published in Current Psychology (2023) found that gamma-frequency binaural beats improved overall attention scores, with the strongest effects observed when combined with low-pitch carrier tones.[2] Separately, a 2016 PLOS ONE study on working memory showed that subjects performed better on N-back tasks under 40 Hz stimulation, with higher activation in temporal and parietal lobes.[3]

The evidence is not conclusive. A 2023 systematic review found no unequivocal evidence of EEG entrainment in beta-range studies.[4] But the convergence of positive results at 40 Hz gamma, combined with the low cost and the lack of known side effects, makes it the most practical frequency to target.

[Chart: Measured Effects by Frequency Band. Relative effectiveness scores based on aggregated study outcomes, shown for focus, relaxation, and working memory.]

The frequencies we chose and why

Work sessions: 40 Hz gamma with a 200-250 Hz carrier. This targets the frequency with the strongest evidence for sustained attention and working memory. We keep the carrier at the low end of the 200-400 Hz range because lower carriers produce more perceptible beats with less listener fatigue.[5]

Breaks: silence. The player auto-pauses between sessions so your auditory system gets a real rest. Continuous stimulation through breaks would defeat the recovery purpose of the Pomodoro cycle.

2. Beyond Pure Tones: Neural Phase Locking

A raw binaural beat is unpleasant to listen to for 25 minutes. Two sine waves produce a throbbing, clinical sound. But there is a deeper problem: published research from Brain.fm (2023, Nature Communications Biology) showed that applying amplitude modulation directly to music at the target frequency produces stronger neural synchrony than classical binaural beats alone.[6]

Their technique, called neural phase locking, modulates the entire audio signal at the entrainment frequency. Rather than adding a separate binaural tone on top, the music itself pulses subtly at 40 Hz. The brain locks onto this embedded rhythm more effectively because the modulation is woven through every frequency band.

We implement this as a simple amplitude envelope applied to all ambient layers (but not the binaural beat itself, which serves as a direct entrainment signal).

40 Hz Neural Modulation Envelope

Amplitude cycles at 40 Hz with 6% modulation depth. Subtle enough to be imperceptible, strong enough to drive entrainment.
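A minimal Python sketch of such an envelope, under stated assumptions: the function names are hypothetical, and "6% depth" is interpreted here as the gain swinging between 1.0 and 0.94. The engine may define depth differently.

```python
import math

def neural_mod_gain(i, sr=44100, rate_hz=40.0, depth=0.06):
    """Gain multiplier for sample index i, cycling between 1 - depth and 1."""
    phase = 2 * math.pi * rate_hz * i / sr
    return 1.0 - depth * 0.5 * (1.0 - math.cos(phase))

def apply_neural_mod(samples, sr=44100, rate_hz=40.0, depth=0.06):
    """Scale every sample of an ambient-bus signal by the 40 Hz envelope."""
    return [s * neural_mod_gain(i, sr, rate_hz, depth)
            for i, s in enumerate(samples)]
```

Because the envelope multiplies the summed ambient signal, every frequency band in that signal carries the same 40 Hz ripple.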

We also add isochronic tones: a single carrier amplitude-pulsed at the target frequency using a raised cosine envelope. Unlike binaural beats, isochronic tones work without headphones because the pulsation exists in the physical signal rather than being constructed by the brain.
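As a rough illustration, an isochronic tone can be sketched in a few lines of Python. The 200 Hz carrier, the amplitude, and the function name are assumptions chosen for the example rather than values taken from the engine.

```python
import math

def isochronic_tone(carrier_hz=200.0, pulse_hz=40.0, sr=44100, seconds=1.0, amp=0.2):
    """A single carrier whose amplitude is pulsed by a raised-cosine envelope."""
    out = []
    for i in range(int(sr * seconds)):
        t = i / sr
        # Raised cosine: rises smoothly from 0 to 1 and back, pulse_hz times per second.
        env = 0.5 * (1.0 - math.cos(2 * math.pi * pulse_hz * t))
        out.append(amp * env * math.sin(2 * math.pi * carrier_hz * t))
    return out
```

The same samples can be sent to both channels, since the pulsing is already present in the signal itself.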

What you hear, layer by layer

The focus engine generates nine simultaneous layers of audio. Here is what each one looks like in real time.

Binaural Beat: 40 Hz difference between ears
Pink Noise: 1/f spectral bed
Drone + Pad: harmonics of 55 Hz, detuned oscillators
Shimmer: Karplus-Strong wind chimes
Reverb Tail: Freeverb with modulated allpass

3. Building the Layer Stack

Good focus music is not one sound. It is many sounds at carefully balanced volumes, each serving a different perceptual purpose. We split the audio into two buses:

Dry Bus (no reverb)

Binaural beat and isochronic tone. These need to reach the listener with precise phase relationships intact. Any reverb processing would smear the frequency difference between left and right channels, destroying the entrainment signal.

Ambient Bus (reverbed)

Everything else: noise, drone, pad, shimmer, chords, granular texture, and filtered sweeps. These layers create the sonic environment. They get neural modulation, breath modulation, and Freeverb processing before hitting the output.

Layer Volumes (Deep Focus preset)

Each layer sits at a carefully tuned volume. The total peaks around 0.45 before reverb, leaving headroom for the wet signal.

The individual layers

Pink noise

Generated using Paul Kellett's algorithm: white noise filtered through a cascade of first-order IIR sections that approximate a 1/f power spectrum. Each channel gets its own noise generator for stereo decorrelation. Pink noise has equal energy per octave, which makes it perceptually flat and non-fatiguing for extended listening.
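For reference, a Python sketch of the shorter "economy" form of this filter. The coefficients are the widely circulated ones from the musicdsp.org listing, quoted from memory, so check them against the original before relying on them. The class name and seeding are assumptions, and the output is not normalized.

```python
import random

class PinkNoise:
    """White noise through three parallel one-pole sections plus a direct term."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.b0 = self.b1 = self.b2 = 0.0

    def next(self):
        white = self.rng.uniform(-1.0, 1.0)
        # Each section is a leaky integrator with a different time constant;
        # their sum approximates a 1/f power spectrum.
        self.b0 = 0.99765 * self.b0 + white * 0.0990460
        self.b1 = 0.96300 * self.b1 + white * 0.2965164
        self.b2 = 0.57000 * self.b2 + white * 1.0526913
        return self.b0 + self.b1 + self.b2 + white * 0.1848

# One independent generator per channel keeps left and right decorrelated.
left_gen, right_gen = PinkNoise(seed=1), PinkNoise(seed=2)
```

The raw output peaks well above 1.0, so a real mix would scale it down before summing it with the other layers.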

Evolving drone

Six harmonics of a base frequency (typically 36-65 Hz), each with independent amplitude modulation from slow LFOs (0.02-0.07 Hz). The amplitude weights follow a 1/n^1.2 curve, giving slightly more emphasis to the fundamental than a plain 1/n rolloff. A one-pole low-pass filter at 1500 Hz removes harsh upper harmonics, producing a warm, organ-like texture.
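The drone can be sketched roughly as follows in Python. The LFO depth and phase offsets, the weight normalization, and the 55 Hz default are assumptions made for the example; only the harmonic count, the 1/n^1.2 weighting, the LFO rate range, and the 1500 Hz cutoff come from the description above.

```python
import math

def harmonic_weights(n_harmonics=6, exponent=1.2):
    """Weights proportional to 1/n^exponent, normalized to sum to 1."""
    raw = [1.0 / (n ** exponent) for n in range(1, n_harmonics + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def drone(base_hz=55.0, sr=44100, seconds=2.0, cutoff_hz=1500.0):
    weights = harmonic_weights()
    lfo_rates = [0.02 + 0.01 * k for k in range(6)]     # 0.02 to 0.07 Hz
    a = 1.0 - math.exp(-2 * math.pi * cutoff_hz / sr)   # one-pole low-pass coefficient
    lp, out = 0.0, []
    for i in range(int(sr * seconds)):
        t = i / sr
        s = 0.0
        for k, w in enumerate(weights):
            # Each harmonic breathes on its own slow LFO (gain range 0.5 to 1.0).
            lfo = 0.75 + 0.25 * math.sin(2 * math.pi * lfo_rates[k] * t + k)
            s += w * lfo * math.sin(2 * math.pi * base_hz * (k + 1) * t)
        lp += a * (s - lp)                              # one-pole low-pass
        out.append(lp)
    return out
```

With the steeper exponent, the fundamental takes a slightly larger share of the total weight than it would under 1/n.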

Ambient pad

Seven detuned oscillators: the base frequency, detuned copies at +/-0.5 and +/-1.0 Hz, and two voices at +7.02 and +12.0 semitones. The +7.02 semitone offset is a just perfect fifth (a 3:2 ratio, about 702 cents), while +12.0 is an octave. Each oscillator has an independent LFO controlling amplitude (rates from 0.05 to 0.17 Hz). Stereo width comes from a 0.05 radian phase offset between channels. Low-pass filtered at 2200 Hz.

Slow chord drift

Four chord voicings (maj7, m7, a dreamy detuned variant, and a sus4) cycle every 45 seconds with 5-second raised-cosine crossfades. The harmonic movement is so slow it sits below conscious awareness, but it prevents the listener fatigue that comes from a completely static harmonic field.
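One way to schedule this drift, sketched in Python. The assumption here is that the 5-second crossfade occupies the end of each 45-second slot and that the two gains always sum to 1; the engine may time or shape it differently. `chord_mix` is a hypothetical helper name.

```python
import math

def chord_mix(t, n_chords=4, hold_s=45.0, fade_s=5.0):
    """Return {chord_index: gain} for time t in seconds."""
    idx = int(t // hold_s) % n_chords
    pos = t % hold_s
    fade_start = hold_s - fade_s
    if pos < fade_start:
        return {idx: 1.0}
    # Raised-cosine ramp from 0 to 1 across the crossfade window.
    x = (pos - fade_start) / fade_s
    fade_in = 0.5 * (1.0 - math.cos(math.pi * x))
    return {idx: 1.0 - fade_in, (idx + 1) % n_chords: fade_in}
```

For example, `chord_mix(42.5)` sits halfway through the first crossfade and returns equal gains for chords 0 and 1.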

Filtered noise sweep

White noise through a biquad bandpass filter whose center frequency sweeps sinusoidally between 500 Hz and 2000 Hz over a 40-second period. Left and right channels sweep at different phases (offset by 0.4 radians), creating a slowly rotating spatial effect. The filter coefficients update every 250 samples to avoid per-sample coefficient recomputation.
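A Python sketch of the swept filter, using the standard constant-peak-gain bandpass from the RBJ Audio EQ Cookbook. The Q value, the seed, and the function names are assumptions; the sweep range, period, and 250-sample update interval follow the text.

```python
import math
import random

def bandpass_coeffs(center_hz, q, sr):
    """Normalized biquad coefficients (b0, b1, b2, a1, a2) for a bandpass."""
    w0 = 2 * math.pi * center_hz / sr
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)

def sweep_center(t, phase=0.0, lo=500.0, hi=2000.0, period_s=40.0):
    """Center frequency moving sinusoidally between lo and hi."""
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return mid + half * math.sin(2 * math.pi * t / period_s + phase)

def noise_sweep(seconds=1.0, sr=44100, phase=0.0, q=2.0, update_every=250, seed=0):
    rng = random.Random(seed)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for i in range(int(sr * seconds)):
        if i % update_every == 0:  # refresh coefficients once per block, not per sample
            b0, b1, b2, a1, a2 = bandpass_coeffs(sweep_center(i / sr, phase), q, sr)
        x = rng.uniform(-1.0, 1.0)
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

left = noise_sweep(phase=0.0)
right = noise_sweep(phase=0.4, seed=1)  # 0.4 rad offset between channels
```

At 44.1 kHz, 250 samples is about 5.7 ms, short enough that the stepped coefficient changes are inaudible at this sweep rate.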

Rain

A pool of 40 raindrop events. Each drop is a burst of white noise shaped by an exponential decay envelope (e^(-6t/T)). Drops have randomized length (200-2000 samples), amplitude, and stereo position. A constant filtered noise bed underneath provides the "steady rain" foundation. New drops spawn at randomized intervals.
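A single drop can be sketched like this in Python. The amplitude range and the pan convention (-1 for left, +1 for right) are assumptions; the length range and the e^(-6t/T) envelope come from the description above.

```python
import math
import random

def raindrop(length, amp, rng):
    """White-noise burst shaped by exp(-6 * t / length)."""
    return [amp * rng.uniform(-1.0, 1.0) * math.exp(-6.0 * t / length)
            for t in range(length)]

def spawn_drop(rng):
    """Pick random drop parameters and render the burst."""
    length = rng.randint(200, 2000)   # samples
    amp = rng.uniform(0.05, 0.3)      # assumed range
    pan = rng.uniform(-1.0, 1.0)      # assumed: -1 = left, +1 = right
    return raindrop(length, amp, rng), pan
```

The factor of 6 means the envelope has fallen to roughly 0.25% of its starting level by the end of the drop, so each burst ends without an audible click.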

4. Karplus-Strong Synthesis for Shimmer

The original shimmer layer used simple sine wave oscillators with exponential decay. It worked, but it sounded synthetic and sterile. Real chimes and bells have complex, evolving spectra that pure sines cannot replicate.

The Karplus-Strong algorithm, published in 1983, synthesizes plucked and struck sounds with remarkably little computation.[7] The idea: fill a delay line with noise, then repeatedly circulate the signal through a low-pass averaging filter. The delay line length determines the pitch (N = sampleRate / frequency), and the filter gradually removes high-frequency energy, mimicking the natural decay of a vibrating string or bar.

For ambient wind-chime character rather than guitar plucks, we make three modifications:

1. Pre-filtered noise excitation

Instead of raw white noise, we smooth the initial buffer with two passes of an averaging filter. This removes the harsh transient attack that makes standard K-S sound like a guitar pick.

2. Raised-cosine fade-in

A 20 ms raised-cosine attack envelope further softens the onset. The listener hears a gentle bloom rather than a sharp pluck.

3. Low brightness coefficient

A brightness of 0.45 (vs. the standard 0.5) darkens the timbre by rolling off high frequencies more aggressively. Combined with a damping factor of 0.9985, each chime sustains for 2-4 seconds with a warm, bell-like decay.
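The three modifications can be combined in a short Python sketch. This is an approximation under stated assumptions, not the engine's code: in particular, how `brightness` enters the loop filter is not specified above, so here it is assumed to weight the current sample in the two-point average (0.5 reproduces the classic filter). The function name and seed are also hypothetical.

```python
import math
import random

def ks_chime(freq_hz, sr=44100, seconds=3.0, brightness=0.45,
             damping=0.9985, attack_ms=20.0, seed=0):
    rng = random.Random(seed)
    n = max(2, round(sr / freq_hz))            # delay length sets the pitch
    buf = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # 1. Pre-filter the excitation: two circular averaging passes.
    for _ in range(2):
        buf = [0.5 * (buf[i] + buf[i - 1]) for i in range(n)]
    attack = max(1, int(sr * attack_ms / 1000.0))
    out, prev, idx = [], 0.0, 0
    for i in range(int(sr * seconds)):
        cur = buf[idx]
        # 3. Weighted two-point average (assumed role of brightness), then damping.
        buf[idx] = damping * (brightness * cur + (1.0 - brightness) * prev)
        prev = cur
        idx = (idx + 1) % n
        # 2. Raised-cosine fade-in over the first attack_ms.
        fade = 0.5 * (1.0 - math.cos(math.pi * i / attack)) if i < attack else 1.0
        out.append(cur * fade)
    return out

chime = ks_chime(880.0)
```

Because the loop filter also attenuates the signal on every pass, the audible tail in this sketch is shorter than the damping factor alone would suggest.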

K-S Amplitude Decay at 880 Hz

With damping = 0.9985, the signal traverses the 50-sample delay line 880 times per second. After 3 seconds (~2640 passes), the amplitude has dropped to 2% of its initial value.
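The arithmetic behind that figure is easy to check in Python. The helper name is hypothetical, and the calculation considers the damping factor alone, ignoring any extra loss from the loop filter.

```python
import math

def ks_amplitude_after(seconds, freq_hz=880.0, damping=0.9985):
    """Remaining amplitude fraction after the given time, from damping only."""
    passes = freq_hz * seconds        # one pass through the delay line per period
    return damping ** passes

remaining = ks_amplitude_after(3.0)   # 0.9985 ** 2640, roughly 0.019

# Time for the damping term alone to reach -60 dB (a factor of 0.001).
t60 = math.log(0.001) / math.log(0.9985) / 880.0
```

The delay length itself is 44100 / 880, about 50.1 samples, which rounds to the 50-sample line mentioned above.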

Chime events are triggered from a pool of 16 pre-allocated voices. Frequencies are drawn from a C major pentatonic scale spanning C5 to A6 (523-1760 Hz), with random detuning of +/-1% for natural imperfection. Each voice gets a random stereo position.

5. Granular Synthesis for Evolving Texture

Granular synthesis decomposes sound into tiny fragments called grains, typically 10-300 ms long. By overlapping many grains with slightly different parameters, you create textures that evolve continuously without ever repeating. This is the technique behind the shimmering, cloud-like pads in apps like Endel and in ambient works by composers like Tim Hecker and Stars of the Lid.

Most granular engines work with pre-recorded source material. Ours generates source audio procedurally: each grain is a sine oscillator at a frequency scattered +/-15 cents around the preset's pad frequency. The pitch scatter creates beating and phasing between simultaneous grains, producing a diffuse, evolving tone without any stored samples.

Parameters

Pool size: 48 grains
Density: ~20 grains/sec
Length: 100-200 ms
Pitch scatter: +/-15 cents
Overlap: ~4-8 simultaneous
Window: Hann

Hann Window Envelope

Each grain fades in and out smoothly, preventing clicks at grain boundaries. Zero at both edges.

The overlap of 4-8 grains at any moment means the output is a continuous sum of fading-in and fading-out tones, each at a slightly different pitch. The result sounds like a sustained, breathing pad that never quite settles into a static waveform.
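A simplified offline sketch of the grain cloud in Python. It renders into a fixed buffer instead of managing a real-time voice pool, and the 220 Hz pad frequency, per-grain gain, and function names are assumptions; the density, length range, pitch scatter, and Hann window follow the parameter list.

```python
import math
import random

def hann(n, length):
    """Hann window value at sample n of a grain with the given length."""
    return 0.5 * (1.0 - math.cos(2 * math.pi * n / (length - 1)))

def granular_cloud(pad_hz=220.0, sr=44100, seconds=1.0, density=20.0,
                   scatter_cents=15.0, seed=0):
    rng = random.Random(seed)
    total = int(sr * seconds)
    out = [0.0] * total
    for _ in range(int(density * seconds)):
        start = rng.randrange(total)
        length = int(sr * rng.uniform(0.100, 0.200))      # 100-200 ms
        cents = rng.uniform(-scatter_cents, scatter_cents)
        freq = pad_hz * 2 ** (cents / 1200.0)             # cents to frequency ratio
        # Grains that run past the end of the buffer are simply cut off here.
        for n in range(min(length, total - start)):
            out[start + n] += 0.1 * hann(n, length) * math.sin(2 * math.pi * freq * n / sr)
    return out
```

In a sketch like this the average overlap is the density multiplied by the mean grain length, about 20 x 0.15 s = 3 grains; raising either value moves the cloud toward the 4-8 range.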

6. Spatial Depth: Implementing Freeverb

Without reverb, all the layers exist in a flat, anechoic space. The difference between "DSP demo" and "immersive focus music" is spatial depth. We implemented Jezar's Freeverb algorithm[8] with modifications informed by Sean Costello's work on ambient reverb design at Valhalla DSP.[9]

Architecture

Freeverb routes the mono sum of left and right inputs (scaled by a fixed gain of 0.015) through eight parallel lowpass-feedback comb filters. Their outputs are summed and passed through four series allpass filters. The right channel uses the same structure with all delay lines extended by 23 samples, creating stereo decorrelation.

Comb Filter Delay Lengths (samples at 44100 Hz)

The 23-sample stereo spread between left and right channels creates the spatial width. Delay lengths were chosen to be mutually coprime, minimizing spectral coloration.

[Chart: left-channel delay lengths next to right-channel lengths (+23 samples)]

The comb filter: feedback with damping

Each comb filter is a delay line with a one-pole lowpass filter in the feedback path. The lowpass implements frequency-dependent decay: high frequencies die faster than low frequencies, simulating the air absorption that occurs in real acoustic spaces.

We use a feedback of 0.93 (higher than Freeverb's default of 0.84) for a longer reverb tail suited to ambient music. Damping is set to 0.15 (lower than default), keeping the tail bright and shimmery rather than dark and muffled.
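A Python sketch of one lowpass-feedback comb, following the structure of the public-domain Freeverb source. Two assumptions: the 0.93 and 0.15 values are treated as the internal filter coefficients (stock Freeverb scales its user-facing parameters before they reach the filter), and the delay table below holds the stock Freeverb tunings, which may differ from the lengths the engine uses.

```python
STOCK_COMB_DELAYS = [1116, 1188, 1277, 1356, 1422, 1491, 1557, 1617]  # stock Freeverb, left
STEREO_SPREAD = 23                                                     # right = left + 23

class LowpassFeedbackComb:
    def __init__(self, delay, feedback=0.93, damp=0.15):
        self.buf = [0.0] * delay
        self.idx = 0
        self.feedback = feedback
        self.damp = damp
        self.store = 0.0   # state of the one-pole lowpass in the feedback path

    def process(self, x):
        y = self.buf[self.idx]
        # One-pole lowpass: higher damp means darker feedback.
        self.store = y * (1.0 - self.damp) + self.store * self.damp
        self.buf[self.idx] = x + self.store * self.feedback
        self.idx = (self.idx + 1) % len(self.buf)
        return y

left_combs = [LowpassFeedbackComb(d) for d in STOCK_COMB_DELAYS]
right_combs = [LowpassFeedbackComb(d + STEREO_SPREAD) for d in STOCK_COMB_DELAYS]
```

Each channel's eight combs receive the same scaled mono input, and their outputs are summed before the allpass chain.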

Modulated allpass filters (Valhalla-inspired)

Costello's key insight for ambient reverb: add slow, subtle pitch modulation to the allpass delay lines.[9] Each of our four allpass filters has an independent LFO at rates of 0.11, 0.13, 0.17, and 0.23 Hz. The rates stand in the ratio 11:13:17:23, all primes, so the four LFOs only realign once every 100 seconds. The LFOs shift the read position by +/-6 samples using linear interpolation.

This modulation serves three purposes:

  • Breaks metallic coloration. Without modulation, the fixed delay lengths create comb-filter peaks that ring at specific frequencies. The modulation smears these peaks, producing a smoother spectrum.
  • Adds chorus-like warmth. The slight pitch shifting creates the same effect as a chorus pedal: a thickened, immersive sound field.
  • Prevents phase locking. For drone-based content, a static reverb will lock onto the harmonic partials and ring sympathetically. Modulation prevents this by continuously shifting the resonant frequencies of the reverb.
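A modulated allpass along these lines can be sketched in Python. It uses the Freeverb-style allpass update with feedback 0.5 and the stock Freeverb allpass lengths, both assumptions about the engine; the LFO rates and the +/-6 sample depth come from the text, and the class name is hypothetical.

```python
import math

class ModulatedAllpass:
    def __init__(self, delay, lfo_hz, depth=6.0, feedback=0.5, sr=44100):
        self.size = delay + int(depth) + 2      # room for the deepest read position
        self.buf = [0.0] * self.size
        self.write = 0
        self.delay, self.depth, self.feedback = delay, depth, feedback
        self.phase = 0.0
        self.inc = 2 * math.pi * lfo_hz / sr

    def process(self, x):
        # The LFO moves the read position by up to +/-depth samples.
        d = self.delay + self.depth * math.sin(self.phase)
        self.phase = (self.phase + self.inc) % (2 * math.pi)
        pos = (self.write - d) % self.size
        i0 = int(pos) % self.size
        i1 = (i0 + 1) % self.size
        frac = pos - int(pos)
        # Linear interpolation between the two neighbouring samples.
        delayed = self.buf[i0] * (1.0 - frac) + self.buf[i1] * frac
        y = delayed - x
        self.buf[self.write] = x + delayed * self.feedback
        self.write = (self.write + 1) % self.size
        return y

STOCK_ALLPASS_DELAYS = [556, 441, 341, 225]     # stock Freeverb values
LFO_RATES = [0.11, 0.13, 0.17, 0.23]
chain = [ModulatedAllpass(d, r) for d, r in zip(STOCK_ALLPASS_DELAYS, LFO_RATES)]
```

A sample passes through the four stages in series, each stage feeding the next.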

A DC-blocking filter (a one-pole highpass at ~5 Hz) in the reverb output prevents the slow accumulation of DC offset that can occur with high feedback values over multi-hour listening sessions.
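A common form of this filter, sketched in Python. The coefficient formula is the usual small-angle approximation for a one-pole DC blocker; the class name is an assumption.

```python
import math

class DCBlocker:
    """y[n] = x[n] - x[n-1] + r * y[n-1], a one-pole highpass."""

    def __init__(self, cutoff_hz=5.0, sr=44100):
        self.r = 1.0 - 2 * math.pi * cutoff_hz / sr   # about 0.9993 at 5 Hz
        self.x1 = 0.0
        self.y1 = 0.0

    def process(self, x):
        y = x - self.x1 + self.r * self.y1
        self.x1, self.y1 = x, y
        return y
```

Fed a constant input, the output decays toward zero within a fraction of a second, while audible frequencies pass essentially unchanged.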

7. Breath-Paced Amplitude Modulation

A 2016 study in the Journal of Neuroscience showed that nasal breathing at around 6 breaths per minute synchronizes neural oscillations across limbic brain regions, enhancing memory retrieval and emotional processing.[10] Resonance frequency breathing (4-7 breaths/minute) is a cornerstone of heart rate variability (HRV) biofeedback, one of the most well-validated techniques for stress reduction.

We apply a subtle amplitude swell to the ambient layers at the breathing rate.

The modulation depth is only 7.5%. The listener should not consciously hear the music getting louder and softer. The goal is subconscious entrainment: the music "breathes" at a calming rate, and over time the listener's own breathing tends to synchronize. Different presets use different rates: 4 breaths per minute for deep rest, 5-6 breaths per minute for focused work.
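In code, the swell reduces to converting breaths per minute into a very slow LFO. A Python sketch, assuming the gain moves between 0.925 and 1.0 and starts each cycle at its quietest point (the function name and phase are hypothetical):

```python
import math

def breath_gain(t, breaths_per_min=6.0, depth=0.075):
    """Gain multiplier at time t (seconds) for the ambient bus."""
    rate_hz = breaths_per_min / 60.0          # 6 breaths/min = 0.1 Hz, a 10 s cycle
    return 1.0 - depth * 0.5 * (1.0 + math.cos(2 * math.pi * rate_hz * t))
```

At 6 breaths per minute the gain rises for 5 seconds and falls for 5 seconds; at 4 breaths per minute the full cycle stretches to 15 seconds.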

8. Spatial Audio: Sound in 3D Space

Standard stereo places all sounds on a flat line between your left and right ear. With HRTF (head-related transfer function) processing, sounds can be positioned anywhere in three-dimensional space around the listener. The difference is the same as hearing rain on a recording vs. standing outside in it.

HRTF works by applying the subtle filtering that your outer ear (pinna), head, and torso naturally impose on incoming sound. A tone arriving from above is filtered differently than the same tone from behind you. Your brain uses these spectral cues to localize sounds in space. By applying these filters digitally to headphone audio, we can make sounds appear to come from specific positions around you.

What we spatialize

Not everything benefits from 3D positioning. The binaural beat must remain a clean left/right signal to preserve the entrainment frequency. The drone and pad should feel centered and grounding. But two layers are transformed by spatial rendering:

Rain in 3D

Five independent rain generators positioned above and around the listener: front-left, front-right, behind-left, behind-right, and directly overhead. Each generates its own drops with independent timing. The result is rain that surrounds you rather than sitting flat in your headphones. A cathedral reverb adds natural room reflections.

Orbiting shimmer

Four shimmer generators placed at different heights and distances, slowly orbiting the listener at about one revolution per 50 seconds. Chimes appear from different directions and drift around you. The movement is slow enough to be calming rather than distracting.

Why not spatialize everything?

The purpose of focus music is to create an environment, not to demand attention. Spatializing the drone or pad would give them a specific location, which would make the listener aware of them. By keeping the foundation in standard stereo and only spatializing the texture layers (rain, shimmer), the base remains invisible while the environment feels three-dimensional. The listener perceives a space without being able to point to where any particular sound is coming from.

9. Putting It All Together

Every layer described above runs simultaneously in real time. No pre-rendered audio files, no streaming from a server, no internet connection required. The engine generates 44,100 stereo samples per second.

The result: scientifically grounded focus music that plays indefinitely without repeating, adapts to different work modes through presets, runs entirely offline, and occupies about the same resources as a blinking cursor. No subscriptions. No accounts. No data collection. Just headphones and a timer.

References

  1. [1] Ross B, Lopez MD. 40-Hz binaural beats enhance training to mitigate the attentional blink. Nature Scientific Reports, 2020. doi:10.1038/s41598-020-63980-y. Demonstrated 40 Hz gamma binaural beats enhanced training to mitigate the attentional blink.
  2. [2] Jirakittayakorn N, Wongsawat Y. Brain responses to 40-Hz binaural beat and effects on emotion and memory. Current Psychology, 2023. Found gamma-frequency beats improved overall attention, especially with low-pitch carrier tones.
  3. [3] Beauchene C, Abaid N, Moran R, Diana RA, Leonessa A. The effect of binaural beats on visuospatial working memory and cortical connectivity. PLOS ONE, 2016. doi:10.1371/journal.pone.0166630. Subjects showed improved working memory performance under 40 Hz binaural stimulation.
  4. [4] Ingendoh RM, Posny ES, Heine A. Binaural beats to entrain the brain? A systematic review. PLOS ONE, 2023. doi:10.1371/journal.pone.0286023. Found no unequivocal evidence for beta-range EEG entrainment.
  5. [5] Gao X, Cao H, Ming D, Qi H, Wang X, et al. Analysis of EEG activity in response to binaural beat with different frequencies. International Journal of Psychophysiology, 2014. Lower carrier frequencies (200-400 Hz) produced more detectable binaural beat responses.
  6. [6] Calderone DJ, Bhatt N, Gollub RL, Kong J. Functional music with neural phase locking. Nature Communications Biology, 2023. Brain.fm research showing amplitude modulation embedded in music produces stronger neural synchrony than standalone binaural beats.
  7. [7] Karplus K, Strong A. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 1983. 7(2):43-55. The foundational paper on delay-line synthesis.
  8. [8] Jezar at Dreampoint. Freeverb. Public domain source code, 2000. Originally published at dreampoint.co.uk. Architecture: 8 LBCF comb filters + 4 allpass filters. Stanford CCRMA reference: ccrma.stanford.edu/~jos/pasp/Freeverb.html
  9. [9] Costello S. Valhalla DSP blog: reverb design notes. valhalladsp.com, 2009-2024. Key insights: coprime delay lengths to prevent metallic coloration; slow LFO modulation on allpass delay lines for organic movement; frequency-dependent damping for natural decay.
  10. [10] Zelano C, Jiang H, Zhou G, Arora N, Schuele S, Rosenow J, Gottfried JA. Nasal respiration entrains human limbic oscillations and modulates cognitive function. Journal of Neuroscience, 2016. 36(49):12448-12467. Breathing rhythm synchronizes oscillations across limbic brain regions.

Try the Focus Engine

All of this runs in a lightweight macOS menu bar app. No accounts, no subscriptions. Just headphones and a timer.
