Extract from the Register of European Patents

About this file: EP4537331

EP4537331 - USING ALIGNED TEXT AND SPEECH REPRESENTATIONS TO TRAIN AUTOMATIC SPEECH RECOGNITION MODELS WITHOUT TRANSCRIBED SPEECH DATA
Status: Request for examination was made
Status updated on 14.03.2025
Database last updated on 28.03.2026
Former: The international publication has been made
Status updated on 26.01.2024
Former: unknown
Status updated on 22.08.2023
Most recent event: 26.09.2025 Change: Validation states, published on 29.10.2025 [2025/44]
26.09.2025 Change: extension states, published on 29.10.2025 [2025/44]
Applicant(s): For all designated states
Google LLC
1600 Amphitheatre Parkway
Mountain View, CA 94043 / US
[2025/16]
Inventor(s): 01 / ROSENBERG, Andrew
Mountain View, California 94043 / US
02 / CHEN, Zhehuai
Mountain View, California 94043 / US
03 / BAPNA, Ankur
Mountain View, California 94043 / US
04 / ZHANG, Yu
Mountain View, California 94043 / US
05 / RAMABHADRAN, Bhuvana
Mountain View, California 94043 / US
[2025/16]
Representative(s): Shipp, Nicholas, et al
Kilburn & Strode LLP
Lacon London
84 Theobalds Road
London WC1X 8NL / GB
[2025/16]
Application number, filing date: 23754555.3, 20.07.2023
[2025/16]
WO2023US28267
Priority number, date: US202263369213P, 22.07.2022 (original published format: US 202263369213 P)
[2025/16]
Filing language: EN
Procedural language: EN
Publication: Type: A1 Application with search report
No.: WO2024020154
Date: 25.01.2024
Language: EN
[2024/04]
Type: A1 Application with search report
No.: EP4537331
Date: 16.04.2025
Language: EN
The application published by WIPO in one of the EPO official languages on 25.01.2024 takes the place of the publication of the European patent application.
[2025/16]
Search report(s): International search report - published on: 25.01.2024 (EP)
Classification: IPC: G10L15/26, G10L13/08, G10L15/16
[2025/16]
CPC:
G10L15/063 (EP,KR,US); G06N3/044 (KR); G10L13/08 (KR);
G10L15/16 (EP,KR); G10L15/26 (KR)
Designated contracting states: AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LI, LT, LU, LV, MC, ME, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR [2025/16]
Title: German: VERWENDUNG VON AUSGERICHTETEN TEXT- UND SPRACHDARSTELLUNGEN ZUM TRAINIEREN AUTOMATISCHER SPRACHERKENNUNGSMODELLE OHNE TRANSKRIBIERTE SPRACHDATEN [2025/16]
English: USING ALIGNED TEXT AND SPEECH REPRESENTATIONS TO TRAIN AUTOMATIC SPEECH RECOGNITION MODELS WITHOUT TRANSCRIBED SPEECH DATA [2025/16]
French: UTILISATION DE REPRÉSENTATIONS DE TEXTE ET DE PAROLE ALIGNÉES POUR ENTRAÎNER DES MODÈLES DE RECONNAISSANCE VOCALE AUTOMATIQUE SANS DONNÉES DE PAROLE TRANSCRITES [2025/16]
Entry into regional phase: 13.01.2025 National basic fee paid
13.01.2025 Designation fee(s) paid
13.01.2025 Examination fee paid
Examination procedure: 13.01.2025 Examination requested [2025/16]
13.01.2025Date on which the examining division has become responsible
02.06.2025Amendment by applicant (claims and/or description)
Fees paid: Renewal fee
28.07.2025Renewal fee patent year 03
Opt-out from the exclusive competence of the Unified Patent Court
See the Register of the Unified Patent Court for opt-out data
Responsibility for the accuracy, completeness or quality of the data displayed under the link provided lies entirely with the Unified Patent Court.
Cited in: International search: [Y] US2021350786 (CHEN ZHEHUAI et al.) [Y] 2,12,14,24 * paragraphs [0003], [0009], [0048], [0065] - paragraph [0068]; figures 1, 2A, 3A; claim 8 *
[E] WO2023183680 (GOOGLE LLC et al.) [E] 1,13 * paragraphs [0027], [0028], [0032], [0035], [0067]; figure 3B *
[Y] WENXIN HOU ET AL: "Exploiting Adapters for Cross-lingual Low-resource Speech Recognition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 May 2021 (2021-05-18), XP081969202 [Y] 1-24 * low resource target language, zero-shot learning; page 3, left column, line 6 - line 14; figure 1 * * conditioning the model on a language identifier; paragraphs [00II] - [000A] * * paragraphs [0III] - [000C] * * paragraphs [00IV] - [000C] * * paragraphs [0III] - [000A] * * training for each source language and for target language separately; page 5, paragraphs V-C * * hint to text: text classification tasks; page 3, paragraphs II-B - III-A * * page 1, right column, paragraph I * * [28] BERT: Pre-training of deep bidirectional transformers for language understanding; page 11, example [28] *
[Y] WANG WEI ET AL: "Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding", ICASSP 2022 - 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 23 May 2022 (2022-05-23), pages 7802-7806, XP034156949, DOI: 10.1109/ICASSP43922.2022.9747760 [Y] 1,3,5,6,8-11,13,15,17,18,20-23 * audio encoder, text encoder, shared decoder, embedding aligner, decoder; paragraphs [2.1], [2.2], [3.1], [3.1.1]; figure 1 *

DOI: http://dx.doi.org/10.1109/ICASSP43922.2022.9747760
[Y] ZHEHUAI CHEN ET AL: "MAESTRO: Matched Speech Text Representations through Modality Matching", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 April 2022 (2022-04-07), XP091201199 [Y] 4,5,7,11,12,16,17,19 * paragraphs [4.1], [4.2], [4.4], [3.2]; figure 1 *
The EPO accepts no responsibility for the accuracy of data originating from other authorities; in particular, it does not guarantee that it is complete, up to date or fit for specific purposes.