| EP4537331 - USING ALIGNED TEXT AND SPEECH REPRESENTATIONS TO TRAIN AUTOMATIC SPEECH RECOGNITION MODELS WITHOUT TRANSCRIBED SPEECH DATA | Status | Request for examination was made Status updated on 14.03.2025 Database last updated on 28.03.2026 | |
| Former | The international publication has been made Status updated on 26.01.2024 | ||
| Former | unknown Status updated on 22.08.2023 | Most recent event | 26.09.2025 | Change: Validation states | published on 29.10.2025 [2025/44] | 26.09.2025 | Change - extension states | published on 29.10.2025 [2025/44] | Applicant(s) | For all designated states Google LLC 1600 Amphitheatre Parkway Mountain View, CA 94043 / US | [2025/16] | Inventor(s) | 01 /
ROSENBERG, Andrew Mountain View, California 94043 / US | 02 /
CHEN, Zhehuai Mountain View, California 94043 / US | 03 /
BAPNA, Ankur Mountain View, California 94043 / US | 04 /
ZHANG, Yu Mountain View, California 94043 / US | 05 /
RAMABHADRAN, Bhuvana Mountain View, California 94043 / US | [2025/16] | Representative(s) | Shipp, Nicholas, et al Kilburn & Strode LLP Lacon London 84 Theobalds Road London WC1X 8NL / GB | [2025/16] | Application number, filing date | 23754555.3 | 20.07.2023 | [2025/16] | WO2023US28267 | Priority number, date | US202263369213P | 22.07.2022 Original published format: US 202263369213 P | [2025/16] | Filing language | EN | Procedural language | EN | Publication | Type: | A1 Application with search report | No.: | WO2024020154 | Date: | 25.01.2024 | Language: | EN | [2024/04] | Type: | A1 Application with search report | No.: | EP4537331 | Date: | 16.04.2025 | Language: | EN | The application published by WIPO in one of the EPO official languages on 25.01.2024 takes the place of the publication of the European patent application. | [2025/16] | Search report(s) | International search report - published on: | EP | 25.01.2024 | Classification | IPC: | G10L15/26, G10L13/08, G10L15/16 | [2025/16] | CPC: |
G10L15/063 (EP,KR,US);
G06N3/044 (KR);
G10L13/08 (KR);
G10L15/16 (EP,KR);
G10L15/26 (KR)
| Designated contracting states | AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LI, LT, LU, LV, MC, ME, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR [2025/16] | Title | German: | VERWENDUNG VON AUSGERICHTETEN TEXT- UND SPRACHDARSTELLUNGEN ZUM TRAINIEREN AUTOMATISCHER SPRACHERKENNUNGSMODELLE OHNE TRANSKRIBIERTE SPRACHDATEN | [2025/16] | English: | USING ALIGNED TEXT AND SPEECH REPRESENTATIONS TO TRAIN AUTOMATIC SPEECH RECOGNITION MODELS WITHOUT TRANSCRIBED SPEECH DATA | [2025/16] | French: | UTILISATION DE REPRÉSENTATIONS DE TEXTE ET DE PAROLE ALIGNÉES POUR ENTRAÎNER DES MODÈLES DE RECONNAISSANCE VOCALE AUTOMATIQUE SANS DONNÉES DE PAROLE TRANSCRITES | [2025/16] | Entry into regional phase | 13.01.2025 | National basic fee paid | 13.01.2025 | Designation fee(s) paid | 13.01.2025 | Examination fee paid | Examination procedure | 13.01.2025 | Examination requested [2025/16] | 13.01.2025 | Date on which the examining division has become responsible | 02.06.2025 | Amendment by applicant (claims and/or description) | Fees paid | Renewal fee | 28.07.2025 | Renewal fee patent year 03 |
| Opt-out from the exclusive competence of the Unified Patent Court | See the Register of the Unified Patent Court for opt-out data | ||
| Responsibility for the accuracy, completeness or quality of the data displayed under the link provided lies entirely with the Unified Patent Court. | Cited in | International search | [Y] US2021350786 (CHEN ZHEHUAI et al.) [Y] 2,12,14,24 * paragraphs [0003] , [0009] , [0048] , [0 65] - paragraph [0068]; figures 1, 2A, 3A; claim 8 * | [E] WO2023183680 (GOOGLE LLC et al.) [E] 1,13 * paragraphs [0027] , [0028] , [0032] , [0035] , [0067]; figure 3B * | [Y] WENXIN HOU ET AL: "Exploiting Adapters for Cross-lingual Low-resource Speech Recognition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 May 2021 (2021-05-18), XP081969202 [Y] 1-24 * low resource target language. zero-shot learning; page 3, column left, line 6 - line 14; figure 1 * * conditioning the model on a language identifier; paragraphs [00II] - [000A] * * paragraphs [0III] - [000C] * * paragraphs [00IV] - [000C] * * paragraphs [0III] - [000A] * * Training for each source language & for target language separately; page 5, paragraphs V-C * * Hint to text: Text classification tasks; page 3, paragraphs II-B - paragraphs III-A * * page 1, column right, paragraph I * * [28] Bert: Pre-training of deep bidirectional transformers for language understanding.; page 11; example [28] * | [Y] WANG WEI ET AL: "Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding", ICASSP 2022 - 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 23 May 2022 (2022-05-23), pages 7802 - 7806, XP034156949, DOI: 10.1109/ICASSP43922.2022.9747760 [Y] 1,3,5,6,8-11,13,15,17,18,20-23 * Audio encoder, text encoder; shared decoder, embedding aligner; decoder; paragraphs [02.1] , [ 2.2] , [ 3.1] , [ 3.1.1]; figure 1 * DOI: http://dx.doi.org/10.1109/ICASSP43922.2022.9747760 | [Y] ZHEHUAI CHEN ET AL: "MAESTRO: Matched Speech Text Representations through Modality Matching", ARXIV.ORG, CORNELL UNIVERSITY 
LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 April 2022 (2022-04-07), XP091201199 [Y] 4,5,7,11,12,16,17,19 * paragraphs [04.1] , [ 4.2] , [ 4.4] , [ 3.2]; figure 1 * |