Multilingual Speech-to-Text

Most multilingual speech services require a fixed input and target language pair (e.g. English speech in, French text out). Real-world multilingual speech, however, is rarely so clearly delineated. For example, an LLM agent used for language learning would typically hear mostly English with Spanish words and phrases sprinkled throughout.

We provide a model that automatically detects where the spoken language changes and marks each language boundary with a timestamp. Developers can use the model as an AI-as-a-Service offering.
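The output format of the service is not specified here. As a minimal sketch, assume a hypothetical interface in which the model emits one language label per fixed-length audio frame. Timestamped language boundaries could then be derived by merging consecutive frames that share a label. The names `LanguageSegment` and `merge_frames` are illustrative only and are not part of the service's API.

```python
from dataclasses import dataclass


@dataclass
class LanguageSegment:
    """A contiguous stretch of audio in one language (hypothetical format)."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    language: str  # language label, e.g. "en" or "es"


def merge_frames(labels: list[str], frame_seconds: float) -> list[LanguageSegment]:
    """Collapse per-frame language labels into timestamped segments.

    Assumes labels[i] covers the interval [i * frame_seconds, (i + 1) * frame_seconds).
    A new segment starts wherever the label differs from the previous frame.
    """
    segments: list[LanguageSegment] = []
    for i, lang in enumerate(labels):
        start = i * frame_seconds
        end = start + frame_seconds
        if segments and segments[-1].language == lang:
            # Same language as the previous frame: extend the current segment.
            segments[-1].end = end
        else:
            # Language changed (or first frame): open a new segment at this boundary.
            segments.append(LanguageSegment(start, end, lang))
    return segments


if __name__ == "__main__":
    frames = ["en", "en", "es", "es", "es", "en"]
    for seg in merge_frames(frames, frame_seconds=0.5):
        print(f"{seg.start:.1f}s-{seg.end:.1f}s: {seg.language}")
```

With 0.5-second frames, the example yields an English segment from 0.0 to 1.0 s, a Spanish segment from 1.0 to 2.5 s, and a final English segment from 2.5 to 3.0 s. A real system would likely also smooth out very short segments caused by single misclassified frames.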