Draft: Add many audio sources (including voice) #5870

rom1v · 2025-02-22T12:00:03Z

The existing audio sources were:

output (default): forwards the whole audio output, and disables playback on the device (mapped to REMOTE_SUBMIX).
playback: captures the audio playback (Android apps can opt out, so the whole output is not necessarily captured).
mic: captures the microphone (mapped to MIC).

This PR adds:

mic-unprocessed: captures the microphone unprocessed (raw) sound (mapped to UNPROCESSED).
mic-camcorder: captures the microphone tuned for video recording, with the same orientation as the camera if available (mapped to CAMCORDER).
mic-voice-recognition: captures the microphone tuned for voice recognition (mapped to VOICE_RECOGNITION).
mic-voice-communication: captures the microphone tuned for voice communications, for instance taking advantage of echo cancellation or automatic gain control if available (mapped to VOICE_COMMUNICATION).
voice-call: captures the voice call (mapped to VOICE_CALL).
voice-call-uplink: captures the voice call uplink only (mapped to VOICE_UPLINK).
voice-call-downlink: captures the voice call downlink only (mapped to VOICE_DOWNLINK).
voice-performance: captures audio meant to be processed for live performance (karaoke), including both the microphone and the device playback (mapped to VOICE_PERFORMANCE).
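
To make the mapping above easier to scan, here is a minimal Python sketch of a lookup from each source name to its android.media.MediaRecorder.AudioSource integer, using -1 when no constant is relevant. It is only an illustration, not the server code: the class layout, the field names (cli_name, direct_source) and the find_by_name helper are assumptions made for this example. The integer values are those of the Android constants.

```python
from enum import Enum

# Integer values of the android.media.MediaRecorder.AudioSource constants.
ANDROID_AUDIO_SOURCE = {
    "MIC": 1,
    "VOICE_UPLINK": 2,
    "VOICE_DOWNLINK": 3,
    "VOICE_CALL": 4,
    "CAMCORDER": 5,
    "VOICE_RECOGNITION": 6,
    "VOICE_COMMUNICATION": 7,
    "REMOTE_SUBMIX": 8,
    "UNPROCESSED": 9,
    "VOICE_PERFORMANCE": 10,
}

NOT_RELEVANT = -1  # used when a source is not backed by a single constant


class AudioSource(Enum):
    """Hypothetical mirror of the audio source list (names are illustrative)."""

    OUTPUT = ("output", ANDROID_AUDIO_SOURCE["REMOTE_SUBMIX"])
    PLAYBACK = ("playback", NOT_RELEVANT)
    MIC = ("mic", ANDROID_AUDIO_SOURCE["MIC"])
    MIC_UNPROCESSED = ("mic-unprocessed", ANDROID_AUDIO_SOURCE["UNPROCESSED"])
    MIC_CAMCORDER = ("mic-camcorder", ANDROID_AUDIO_SOURCE["CAMCORDER"])
    MIC_VOICE_RECOGNITION = (
        "mic-voice-recognition", ANDROID_AUDIO_SOURCE["VOICE_RECOGNITION"])
    MIC_VOICE_COMMUNICATION = (
        "mic-voice-communication", ANDROID_AUDIO_SOURCE["VOICE_COMMUNICATION"])
    VOICE_CALL = ("voice-call", ANDROID_AUDIO_SOURCE["VOICE_CALL"])
    VOICE_CALL_UPLINK = ("voice-call-uplink", ANDROID_AUDIO_SOURCE["VOICE_UPLINK"])
    VOICE_CALL_DOWNLINK = (
        "voice-call-downlink", ANDROID_AUDIO_SOURCE["VOICE_DOWNLINK"])
    VOICE_PERFORMANCE = (
        "voice-performance", ANDROID_AUDIO_SOURCE["VOICE_PERFORMANCE"])

    def __init__(self, cli_name, direct_source):
        self.cli_name = cli_name            # value passed to --audio-source
        self.direct_source = direct_source  # Android constant, or -1

    @classmethod
    def find_by_name(cls, name):
        """Return the source matching the command-line name, or None."""
        for source in cls:
            if source.cli_name == name:
                return source
        return None
```

With every source carrying its own integer, adding a new one amounts to adding one line to the table.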

Discontinuities

The existing audio sources always produce a continuous audio stream. A major issue is that some new audio sources (like the "voice call" source) do not produce packets on silence (they only capture during a voice call).

The audio regulator (the component responsible for maintaining a constant latency) assumed that the input audio stream was continuous. In this PR, it detects discontinuities based on the input PTS and adjusts its behavior accordingly. This only works if the input PTS are "correct", i.e. if they reflect the actual capture time.
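
As a rough illustration of PTS-based detection (not the actual regulator code, which lives in the client), the Python sketch below flags a discontinuity whenever a packet starts noticeably later than the end of the previous one. The Packet type, the 48 kHz sample rate and the 20 ms tolerance are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    pts_us: int      # presentation timestamp, in microseconds
    nb_samples: int  # number of audio samples in the packet


def find_discontinuities(packets, sample_rate=48000, tolerance_us=20000):
    """Return (packet index, gap in microseconds) for every detected gap.

    A gap is reported when a packet's PTS exceeds the expected PTS
    (previous PTS + previous duration) by more than tolerance_us.
    """
    gaps = []
    expected_pts = None
    for index, packet in enumerate(packets):
        if expected_pts is not None:
            gap_us = packet.pts_us - expected_pts
            if gap_us > tolerance_us:
                gaps.append((index, gap_us))
        duration_us = packet.nb_samples * 1_000_000 // sample_rate
        expected_pts = packet.pts_us + duration_us
    return gaps


# Two contiguous 20 ms packets, then one arriving after 1 s of silence.
packets = [Packet(0, 960), Packet(20_000, 960), Packet(1_040_000, 960)]
gaps = find_discontinuities(packets)
```

Here gaps contains a single entry for the third packet, with a gap of one second. If the PTS values do not reflect the capture time, this kind of check cannot see the silence at all.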

Another major problem is that, even if the capture timestamps are correct, some encoders (Opus) rewrite the PTS based on the number of samples, ignoring the input PTS. As a consequence, when encoding in Opus, the timings are broken: they represent a continuous audio stream from which the silences have been removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current receive time, since real-time playback itself does not depend on PTS). More importantly, it breaks the recording timings. For example:

scrcpy --audio-source=voice-call --record=file.mp4

If the voice call does not start immediately, the audio will not be played at the correct time in the recording.

With the AAC encoder, it works (the encoder on the device does not rewrite the PTS based only on the number of samples):

scrcpy --audio-source=voice-call --record=file.mp4 --audio-codec=aac
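
To see why sample-count-based timestamps break the recording, the following Python sketch mimics an encoder that ignores the input PTS and derives each output PTS from the number of samples emitted so far. It is a simplified model, not the behavior of any specific encoder implementation: the function name, the 48 kHz sample rate and the choice to start counting at 0 are assumptions for the example.

```python
def rewrite_pts_from_sample_count(packets, sample_rate=48000):
    """Model an encoder that recomputes PTS from the running sample count.

    packets is a list of (capture_pts_us, nb_samples) tuples. The capture
    PTS is deliberately ignored, which is the behavior being illustrated.
    """
    rewritten = []
    total_samples = 0
    for _capture_pts_us, nb_samples in packets:
        pts_us = total_samples * 1_000_000 // sample_rate
        rewritten.append((pts_us, nb_samples))
        total_samples += nb_samples
    return rewritten


# The call starts 5 s after capture begins, then pauses for about 4 s.
captured = [(5_000_000, 960), (5_020_000, 960), (9_000_000, 960)]
encoded = rewrite_pts_from_sample_count(captured)
```

In this model, the initial 5 s delay and the pause both vanish from the encoded timestamps, so a recorder relying on them would play the three packets back to back from the start of the file.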

This PR is in draft due to this unsolved issue.

Aims to fix #5670 and #5412.

The audio regulator assumed a continuous audio stream. But some audio sources (like the "voice call" audio source) do not produce any packets on silence, breaking this assumption. Use PTS to detect such discontinuities. TODO: if PTS values are broken, the detection is also broken.

rom1v added 5 commits February 22, 2025 12:26

Disable audio regulator underflow logs

ea4c076

Only enable them if SC_AUDIO_REGULATOR_DEBUG is set, as they may spam the output.

Report underflow samples in verbose mode

8925bdc

Report the number of silence samples inserted due to underflow every second, along with the other metrics.

Refactor audio sources

9fb7446

Store the target audio source integer (one of the constants in android.media.MediaRecorder.AudioSource) in the AudioSource enum (or -1 if not relevant). This will simplify adding new audio sources.

Add audio sources

1ebe2e2

rom1v mentioned this pull request Feb 22, 2025

Draft: Add many audio sources (including voice) #5869

Closed

rom1v changed the base branch from master to dev February 22, 2025 12:00
