Draft: Add many audio sources (including voice) #5870
The existing audio sources were:
- `output` (default): forwards the whole audio output, and disables playback on the device (mapped to `REMOTE_SUBMIX`).
- `playback`: captures the audio playback (Android apps can opt-out, so the whole output is not necessarily captured).
- `mic`: captures the microphone (mapped to `MIC`).

This PR adds:
- `mic-unprocessed`: captures the microphone unprocessed (raw) sound (mapped to `UNPROCESSED`).
- `mic-camcorder`: captures the microphone tuned for video recording, with the same orientation as the camera if available (mapped to `CAMCORDER`).
- `mic-voice-recognition`: captures the microphone tuned for voice recognition (mapped to `VOICE_RECOGNITION`).
- `mic-voice-communication`: captures the microphone tuned for voice communications (it will for instance take advantage of echo cancellation or automatic gain control if available) (mapped to `VOICE_COMMUNICATION`).
- `voice-call`: captures voice call (mapped to `VOICE_CALL`).
- `voice-call-uplink`: captures voice call uplink only (mapped to `VOICE_UPLINK`).
- `voice-call-downlink`: captures voice call downlink only (mapped to `VOICE_DOWNLINK`).
- `voice-performance`: captures audio meant to be processed for live performance (karaoke), includes both the microphone and the device playback (mapped to `VOICE_PERFORMANCE`).

Discontinuities

The existing audio sources always produce a continuous audio stream. A major issue is that some of the new audio sources (like the "voice call" source) do not produce packets during silence (they only capture during a voice call).
The audio regulator (the component responsible for maintaining a constant latency) assumed that the input audio stream was continuous. With this PR, it detects discontinuities based on the input PTS and adjusts its behavior accordingly. This only works if the input PTS are "correct", that is, if they reflect the actual capture times.
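As an illustration of PTS-based discontinuity detection, here is a minimal Python sketch. It is not the regulator code from this PR: the `Packet` type, the `find_discontinuities` function, the 48 kHz sample rate and the 20 ms tolerance are all assumptions made for the example. The idea is to compare each packet's PTS with the PTS expected from the previous packet's duration.

```python
from dataclasses import dataclass

SAMPLE_RATE = 48000    # assumed sample rate, in Hz
TOLERANCE_US = 20_000  # hypothetical threshold: gaps above 20 ms count as discontinuities


@dataclass
class Packet:
    pts_us: int   # presentation timestamp, in microseconds
    samples: int  # number of audio samples carried by the packet


def find_discontinuities(packets, sample_rate=SAMPLE_RATE, tolerance_us=TOLERANCE_US):
    """Return (index, gap_us) for every packet that starts later than expected."""
    gaps = []
    expected_pts = None
    for i, pkt in enumerate(packets):
        if expected_pts is not None:
            gap = pkt.pts_us - expected_pts
            if gap > tolerance_us:
                gaps.append((i, gap))
        # A continuous stream would deliver the next packet right after this one ends.
        expected_pts = pkt.pts_us + pkt.samples * 1_000_000 // sample_rate
    return gaps


# Three contiguous 20 ms packets, then 5 seconds of silence, then two more packets.
stream = [
    Packet(0, 960),
    Packet(20_000, 960),
    Packet(40_000, 960),
    Packet(5_060_000, 960),
    Packet(5_080_000, 960),
]
print(find_discontinuities(stream))
```

This approach depends entirely on the timestamps: if the PTS do not reflect the real capture times, the gap is invisible to the detector.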
Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples, ignoring the input PTS. As a consequence, when encoding in OPUS, the timings are broken: they describe a continuous audio stream from which the silences have been removed. This breaks the discontinuity detection in the audio regulator. We could work around that part by relying on the time at which packets are received, since real-time playback itself does not depend on PTS. But the most important problem is that it breaks recording timings. For example:
If the voice call does not start immediately, the audio in the recording will not be played at the correct time.
With the AAC encoder, it works (the encoder on the device does not rewrite the PTS based only on the number of samples).
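The difference between the two behaviors can be illustrated with a small Python sketch. It does not reproduce any actual encoder: `pts_from_sample_count` and `pts_passthrough` are hypothetical helpers, and the 48 kHz sample rate and the rewritten timeline starting at 0 are assumptions made for the example.

```python
def pts_from_sample_count(packets, sample_rate=48000):
    """Derive output PTS from the cumulative sample count only (input PTS ignored).

    Assumption for this sketch: the rewritten timeline starts at 0.
    """
    out = []
    total_samples = 0
    for _pts_us, samples in packets:
        out.append(total_samples * 1_000_000 // sample_rate)
        total_samples += samples
    return out


def pts_passthrough(packets):
    """Keep the capture PTS unchanged."""
    return [pts_us for pts_us, _samples in packets]


# (pts_us, samples): the call starts 3 s after capture begins,
# then nothing is captured between about 3.04 s and 8 s.
captured = [(3_000_000, 960), (3_020_000, 960), (8_000_000, 960)]

print(pts_from_sample_count(captured))  # silences collapsed
print(pts_passthrough(captured))        # gaps preserved
```

With sample-count-based timestamps, both the initial delay and the 5-second gap disappear, so a recording muxed with these PTS would play the audio too early relative to the video.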
This PR is in draft due to this unsolved issue.
Aims to fix #5670 and #5412.