Bark is a multilingual text-to-audio model that can generate various types of audio, including music and nonverbal sounds.
Bark is a transformer-based text-to-audio model created by Suno. It generates highly realistic, multilingual speech as well as other audio, including music, background noise, and sound effects. The model supports several languages out of the box: English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish, and Chinese. Bark can fully clone voices, and it handles foreign-language text, music, and speaker prompts. It has been tested on both CPU and GPU: it can generate audio in real time on modern GPUs, while inference may be slower on older GPUs or on CPU. Bark uses EnCodec as its neural codec backend and is released under a non-commercial license. Pretrained model checkpoints are available for research purposes.
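
As a rough illustration of how a language and a speaker prompt might be selected, the sketch below maps the languages listed above to two-letter codes and builds a preset name. It assumes the `bark` Python package is installed and exposes `generate_audio`, `preload_models`, and `SAMPLE_RATE`, and that speaker presets follow the `v2/<code>_speaker_<n>` naming scheme with ten presets per language. The `speaker_preset` and `synthesize` helpers are hypothetical names introduced here for illustration, not part of Bark.

```python
# Two-letter codes for the languages listed above (assumed preset naming).
LANGUAGE_CODES = {
    "English": "en", "German": "de", "Spanish": "es", "French": "fr",
    "Hindi": "hi", "Italian": "it", "Japanese": "ja", "Korean": "ko",
    "Polish": "pl", "Portuguese": "pt", "Russian": "ru", "Turkish": "tr",
    "Chinese": "zh",
}


def speaker_preset(language: str, index: int = 0) -> str:
    """Hypothetical helper: build a preset name such as 'v2/en_speaker_0'."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {language!r}")
    if not 0 <= index <= 9:
        # Assumption: ten presets per language, numbered 0 through 9.
        raise ValueError("speaker index must be between 0 and 9")
    return f"v2/{LANGUAGE_CODES[language]}_speaker_{index}"


def synthesize(text: str, language: str = "English", index: int = 0):
    """Hypothetical wrapper: return (audio_array, sample_rate) for the text."""
    # Imported lazily so speaker_preset() works without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()  # downloads and caches checkpoints on first use
    audio = generate_audio(text, history_prompt=speaker_preset(language, index))
    return audio, SAMPLE_RATE
```

Calling `synthesize("Hallo, wie geht es dir?", "German", 3)` would, under these assumptions, return a NumPy array of samples together with the sample rate, which can then be written to a WAV file with a library such as SciPy.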