Noise contributions contain very loud noise #252

I have taken a closer look at the noise contributions at [media.xiph.org/rnnoise/rnnoise_contributions.tar.gz](https://media.xiph.org/rnnoise/rnnoise_contributions.tar.gz). With the help of `sox` I skimmed through some of the loudest files and found many instances where the noise is so loud that I find it unreasonable to expect an AI model to recognize voice next to it. These are the most problematic files I've found:
- 1506612372095-other.raw
- 1506865846246-other.raw
- 1506890776920-other.raw
- 1506896387552-other.raw
- 1506904933605-coffee.raw
- 1506905761767-coffee.raw
- 1506931866078-other.raw
- 1506937851368-office.raw
- 1506942115691-office.raw
- 1507008551397-other.raw
- 1507024121772-other.raw
- 1507046472430-other.raw
- 1507051246600-street.raw
- 1507053038795-other.raw
- 1507225021633-other.raw
- 1507225705223-other.raw
- 1507256882651-other.raw
- 1507264564781-other.raw
- 1507279040493-train.raw
- 1507279110456-train.raw
- 1507288337806-other.raw
- 1506716634275-other.raw
- 1507372594108-office.raw
- 1508468651573-office.raw
- 1508504834575-car.raw
- 1508917528488-office.raw
- 1509685708555-none.raw
- 1509701170578-train.raw
- 1511050964203-none.raw

I think removing these files from the dataset will improve the quality of the AI model.

There are many more files containing loud noise, but I've tried not to include files where a human could at least make out some voice next to the noise.
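
For anyone who wants to repeat the ranking step without `sox`, here is a minimal sketch of how the loudest files could be found. It is not the exact procedure I used: it assumes the `.raw` files are headerless 16-bit signed little-endian PCM (the sample rate does not matter for an RMS level), and the function names `rms_dbfs`, `read_raw_s16le` and `rank_by_loudness` are made up for illustration.

```python
import array
import math
import sys
from pathlib import Path


def rms_dbfs(samples):
    """Return the RMS level of 16-bit samples in dB relative to full scale."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / (32768.0 ** 2))


def read_raw_s16le(path):
    """Read a headerless file as signed 16-bit little-endian PCM (assumed format)."""
    data = Path(path).read_bytes()
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    data = data[: len(data) - (len(data) % 2)]
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def rank_by_loudness(directory):
    """Return (level_dbfs, filename) pairs for all *.raw files, loudest first."""
    levels = [
        (rms_dbfs(read_raw_s16le(path)), path.name)
        for path in sorted(Path(directory).glob("*.raw"))
    ]
    levels.sort(reverse=True)
    return levels
```

Calling `rank_by_loudness()` on the directory where the archive was extracted gives a list sorted from loudest to quietest. The level alone does not tell whether voice would still be audible next to the noise, so the top entries still need to be listened to by hand.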
