Speech Processing

Centre: Faculty of Informatics
Degree: Bachelor's Degree in Artificial Intelligence
Academic year: 2024/25
Year of study: 4
No. of credits: 6
Languages: Spanish
Code: 28281

Teaching

Distribution of hours by type of teaching
Study type | Hours of face-to-face teaching | Hours of non-classroom-based work by the student
Lecture-based | 40 | 60
Applied laboratory-based groups | 20 | 30

Teaching guide

Description and Contextualization of the Subject

PLEASE NOTE THAT THIS SUBJECT IS TAUGHT ONLY IN SPANISH



This course is a fourth-year elective of the Artificial Intelligence Degree, taught during the first term.



The course introduces students to the theoretical and practical aspects needed to understand and apply Speech Processing techniques. It builds on concepts learned in previous courses in digital signal processing and programming. Students will explore the basic concepts associated with speech signals and learn the techniques and algorithms used to process them, putting them into practice through exercises and projects.



The fundamental objectives are:



- To introduce students to the basic concepts related to Speech Processing: production, perception, modeling and analysis.



- To introduce different practical applications (Synthesis, Recognition, ...) of these techniques and alternatives for their implementation.



- To put into practice the concepts studied, applying them in the laboratory to real cases of signal processing using the MATLAB platform.

Skills/Learning outcomes of the subject

The learning outcomes provided by the course are the following:



- To understand the fundamentals of speech signal processing.



- To know the main pre-processing and feature extraction algorithms applied to speech signals.



- To know the different modeling techniques used to represent and encode the speech signal.



- To know how to apply machine learning techniques to speech recognition.



- To know, and be able to apply, different speech production strategies or models for speech synthesis.

Theoretical and practical content

1- Production and perception

Physiology, articulation and acoustics of vocal sounds, perception.



2- Modeling and coding

Acoustics, discrete models, formants, LPC



3- Analysis

Time and frequency domains, feature extraction, cepstral coefficients



4- Synthesis

Concatenative synthesis, formant synthesis



5- Recognition

Deterministic methods (DTW), statistical methods (HMMs), language modeling



6- Other applications

Enhancement, transformation, speaker recognition, ...

Methodology

All topics will be taught through a combination of lectures and laboratory sessions specific to the content covered, each with an associated practical assignment. ...

Assessment systems

  • Continuous Assessment System
  • Final Assessment System
  • Tools and qualification percentages:
    • The percentages and types of assessment are specified in the following sections (%): 100

Ordinary Call: Guidelines and Waiver

The course has two modes of evaluation:



a) Continuous evaluation. This is the default mode of evaluation and will be used only in the ordinary call.

It requires active and continuous participation by students: attendance at classes and laboratories, submission of exercises and assignments, and completion of the corresponding evaluation tests, practical work and presentations. If these conditions are not fulfilled, the global evaluation model will be applied.

The evaluation will consist of written tests (40%) and practical work carried out in groups of two (60%). To pass the course it is necessary to pass both parts separately. An individual written evaluation will weight the overall grade of the practical part.

Students who fulfil the conditions to continue in the continuous evaluation system but decide to opt for the global evaluation must notify the teaching staff responsible for the subject in writing (by email).



To waive the call, it will be enough to abandon the continuous evaluation before the end of it and not to deliver any practical work or not to take any



b) Global (or overall) evaluation. Students who do not follow the course under continuous evaluation will be assessed under this model. They must hand in the practical work, with its corresponding technical reports, at least two weeks before the date of the ordinary exam. In this case, the exam taken on the date of the ordinary exam will have a weight of 60% and the practical part (based on the work previously handed in) 40%. To pass the course it is necessary to pass both parts separately.



In order to waive the exam, it will be enough not to take the written exam.
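
As an illustration only, the weighting of the two evaluation modes described above can be sketched in a few lines of Python. The weights come from this guide; the 0-10 scale, the pass mark of 5.0 and the names `PASS_MARK`, `WEIGHTS`, `Result` and `final_result` are assumptions introduced for the example, and the practical mark is assumed to already include the effect of the individual written evaluation, since the guide does not say how that adjustment is computed.

```python
from dataclasses import dataclass

# Assumption: marks are on a 0-10 scale and each part is passed at 5.0.
# The guide states neither value.
PASS_MARK = 5.0

# (written weight, practical weight) for each mode, as given in the guide.
WEIGHTS = {
    "continuous": (0.40, 0.60),
    "global": (0.60, 0.40),
}


@dataclass
class Result:
    weighted_mark: float  # weighted combination of the two parts
    passed: bool          # True only if both parts are passed separately


def final_result(mode: str, written: float, practical: float) -> Result:
    """Combine the written and practical marks for the given mode."""
    w_written, w_practical = WEIGHTS[mode]
    weighted = w_written * written + w_practical * practical
    # Both parts must be passed on their own, whatever the weighted mark is.
    passed = written >= PASS_MARK and practical >= PASS_MARK
    return Result(round(weighted, 2), passed)


if __name__ == "__main__":
    print(final_result("continuous", written=6.0, practical=8.0))
    print(final_result("global", written=4.0, practical=9.0))
```

Under these assumptions, the second call shows why the "pass both parts separately" rule matters: the weighted mark is above the assumed pass mark, yet the result is not a pass because the written part is below it. How the official record reports such a case is not specified in this guide.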

Extraordinary Call: Guidelines and Waiver

In the case of the extraordinary call, the final mark is calculated based on two parts:



- Theory (60%): Assessed by a knowledge test.



- Practical (40%): This is assessed on the basis of the technical reports corresponding to the specific and final projects, which must be submitted before the date of the theory test. There will be an individual written evaluation that will weight the overall mark of the practical part.



In order to pass the course it is necessary to pass both parts (theoretical and practical).



Compulsory materials

The subject requires:
- a personal computer (PC).
- specific software for signal processing (MATLAB, etc.) for the laboratory sessions.
The centre provides both resources. In addition, students may carry out the practical projects on their own computers using the UPV/EHU's MATLAB corporate licence.

Bibliography

Basic bibliography

L. Rabiner, R. W. Schafer, “Theory and Applications of Digital Speech Processing”. Pearson, 2011.

B. Gold, N. Morgan, D. Ellis, “Speech and Audio Signal Processing: Processing and Perception of Speech and Music”, 2nd Ed. Wiley, 2011.

D. O'Shaughnessy, “Speech Communications: Human and Machine”, 2nd Ed. IEEE Press, 2000.

X. Huang, A. Acero, H. Hon, “Spoken Language Processing”. Prentice Hall, 2001.

In-depth bibliography

T. F. Quatieri, “Discrete-Time Speech Signal Processing: Principles and Practice”. Pearson Education, 2001.
P. Taylor, “Text-to-Speech Synthesis”. Cambridge University Press, 2009.
C. Becchetti, L. P. Ricotti, “Speech Recognition”. John Wiley and Sons, 1999.
K. Sayood, “Introduction to Data Compression”, 2nd Ed. Morgan Kaufmann, 2000.

Groups

16 Theory (Spanish - Afternoon)

Calendar
Weeks 1-15: 17:00-18:30 (1), 14:00-15:30 (2)


16 Applied laboratory-based groups-1 (Spanish - Afternoon)

Calendar
Weeks 1-15: 15:30-17:00 (1)
