Here's an overview document that discusses loudness in detail, along with techniques for maintaining it.
Despite the conclusion of the DTV transition, many broadcasters and the production community have been slow to effectively adapt to the changes required to transition from analog NTSC audio techniques to contemporary digital audio practices. With digital television’s expanded aural dynamic range (over 100 dB) comes the opportunity for excessive variation in content when DTV loudness is not managed properly. Consumers do not expect large changes in audio loudness from program to interstitials and from channel to channel. Inappropriate use of the available wide dynamic range has led to complaints from consumers and the need to keep their remote controls at hand to adjust the volume for their own listening comfort.
The NTSC analog television system uses conventional audio dynamic range processing at various stages of the signal path to manage audio loudness for broadcasts. This practice compensates for limitations in the dynamic range of analog equipment and controls the various loudness levels of audio received from suppliers. It also helps smooth the loudness of program to interstitial transitions. Though simple and effective, this practice permanently reduces dynamic range and changes the audio before it reaches the audience. It modifies the characteristics of the original sound, altering it from what the program provider intended, to fit within the limitations of the analog system.
The AC-3 audio system defined in the ATSC Digital Television Standard uses metadata or “data about the data” to control loudness and other audio parameters more effectively without permanently altering the dynamic range of the content. The content provider or DTV operator encodes metadata along with the audio content. From the audience’s perspective, the Dialog Normalization (dialnorm) metadata parameter sets different content to a uniform loudness transparently. It achieves results similar to a viewer using a remote control to set a comfortable volume between disparate TV programs, commercials, and channel changing transitions. The dialnorm and other metadata parameters are integral to the AC-3 audio bit stream. ATSC document A/53 Part 5:2007, which the FCC has incorporated into its Rules by reference, mandates the carriage of dialnorm and correctly set dialnorm values.
The industry has recognized that a new proficiency in loudness measurement, production monitoring, metadata usage, and contemporary dynamic range practices is critical for meeting the expectations of the content supplier, the broadcaster, the audience, and governing bodies. This document provides technical recommendations and information concerning:
• Loudness measurement using the ITU-R BS.1770 recommendation.
• Target loudness for content exchange without metadata.
• The set up of reference monitoring environments when producing for the expanded range
of digital television, with consideration for multiple listening environments in the home.
• Methods to effectively control program-to-interstitial loudness.
• Effective uses of audio metadata for production, distribution, and transmission of digital content.
• Dynamic range control within AC-3 audio and contemporary conventional dynamic range
control as an addition or alternative, including recommendations for loudness and
dynamics management at the boundaries of programs and interstitial content.
The goal is to set up your listening environment correctly once and to make sure you always monitor at that level when creating content. This is true even if you must use headphones to monitor.
With the monitor level set correctly, always mix relying on your hearing. Use a BS.1770 loudness monitoring tool to confirm what you hear.
When generating content and the program delivery level requirement is unknown or has not been specified, mix Dialog Level to -24 LKFS with true peaks below -2 dB TP.
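As a rough illustration of the -24 LKFS target and the -2 dB TP ceiling above, the sketch below computes the static gain that would move a measured mix to the target and checks whether the true peak would still sit under the ceiling afterwards. The function name and return shape are hypothetical, and the measured values are assumed to come from a BS.1770 loudness meter; this is not a replacement for mixing by ear at a calibrated monitor level.

```python
TARGET_LKFS = -24.0          # target Dialog Level from the text
MAX_TRUE_PEAK_DBTP = -2.0    # true-peak ceiling from the text

def delivery_check(measured_lkfs, measured_true_peak_dbtp):
    """Hypothetical helper: gain needed to reach the target, and the peak after that gain.

    Returns (gain_db, peak_after_dbtp, peak_ok). A static gain shifts
    loudness and true peak by the same number of dB.
    """
    gain_db = TARGET_LKFS - measured_lkfs
    peak_after_dbtp = measured_true_peak_dbtp + gain_db
    peak_ok = peak_after_dbtp < MAX_TRUE_PEAK_DBTP
    return gain_db, peak_after_dbtp, peak_ok

# A mix measured at -20 LKFS with a -1 dB TP peak needs 4 dB of attenuation.
print(delivery_check(-20.0, -1.0))   # (-4.0, -5.0, True)
# A quiet mix at -27 LKFS with a -4 dB TP peak would exceed the ceiling once raised.
print(delivery_check(-27.0, -4.0))   # (3.0, -1.0, False)
```

In the second case a simple gain change is not enough; the peaks would have to be dealt with in the mix before delivery.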
The station AC-3 encoder’s dialnorm will be set to match the average Dialog Level of the content.
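The relationship between a measured Dialog Level and the dialnorm value can be sketched as follows. This assumes the AC-3 convention that dialnorm is carried as an integer from 1 to 31 representing -1 to -31 dB, and that a decoder attenuates the audio so dialog lands at the -31 dB reference. The helper names are hypothetical and the rounding policy is an illustrative choice.

```python
def dialnorm_from_dialog_level(dialog_level_lkfs):
    """Hypothetical helper: nearest dialnorm value (1..31) for a measured Dialog Level."""
    value = round(-dialog_level_lkfs)    # -24.0 LKFS -> 24
    return max(1, min(31, value))        # clamp to the range the field can carry

def decoder_attenuation_db(dialnorm):
    """Attenuation a decoder applies so dialog plays back at the -31 dB reference."""
    return 31 - dialnorm

# Two pieces of content with different Dialog Levels end up close together
# after the decoder applies dialnorm.
for level in (-24.0, -18.2):
    dn = dialnorm_from_dialog_level(level)
    print(level, dn, decoder_attenuation_db(dn))
# -24.0 24 7
# -18.2 18 13
```

This is why a dialnorm value that does not match the content matters: the decoder applies the attenuation implied by the metadata, whatever the audio actually measures.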
Measure the loudness of all audio channels and all elements of the soundtrack integrated over the duration of the short form content.
Measure the long form content audio when typical dialog is present and record this value as the Dialog Level of the content.
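To make the measurement step more concrete, here is a minimal sketch of the channel summation used by BS.1770: the mean square of each channel is weighted (1.0 for left, right, and center, 1.41 for the surrounds, LFE excluded), summed, and converted to LKFS. It assumes the samples have already passed through the K-weighting pre-filter and applies no gating, so it is a simplification and not a compliant meter. The channel names and function name are hypothetical.

```python
import math

# BS.1770 channel weights; the LFE channel is not included in the measurement.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def integrated_loudness_lkfs(channels):
    """Ungated loudness over the whole input.

    channels: dict mapping a channel name to a sequence of K-weighted
    samples, with full scale at 1.0. Channels without a weight (such as
    "LFE") and empty channels are skipped.
    """
    total = 0.0
    for name, samples in channels.items():
        weight = CHANNEL_WEIGHTS.get(name)
        if weight is None or len(samples) == 0:
            continue
        mean_square = sum(s * s for s in samples) / len(samples)
        total += weight * mean_square
    if total <= 0.0:
        return float("-inf")    # digital silence has no finite loudness
    return -0.691 + 10.0 * math.log10(total)

# 100 full cycles of a full-scale sine in the left channel only.
sine = [math.sin(2 * math.pi * k / 48) for k in range(4800)]
print(round(integrated_loudness_lkfs({"L": sine}), 2))   # -3.7
```

For short form content the whole soundtrack would be passed in; for long form content the same calculation would be restricted to passages where typical dialog is present.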