Subject

Machine Learning (II)

General details of the subject

Mode: Face-to-face
Language: English

Description and context of the subject

The course focuses on a set of techniques inspired by artificial intelligence and statistics. Over the last decade these fields have grown notably, particularly in connection with the analysis of large amounts of data using techniques and algorithms grounded in mathematics, statistics, and heuristic optimization. Machine learning techniques are now widely applied in areas such as bioinformatics, finance, and text processing.

Students will study the main data mining techniques and build their skills with popular software tools that implement them, all through demonstrations on real text-processing applications.

Teaching staff

Name	Institution	Category	Doctor	Teaching profile	Area	E-mail
INZA CANO, IÑAKI	University of the Basque Country	Full Professor	Doctor	Bilingual	Computer Science and Artificial Intelligence	inaki.inza@ehu.eus

Competencies

Name	Weight
Ability to use knowledge-based strategies and tools for human language processing.	30.0 %
Ability to use and adapt the most relevant symbolic and corpus-based (machine learning) methods for research in language technologies.	70.0 %

Types of teaching

Type	Classroom hours	Hours outside the classroom	Total hours
Lecture	10	15	25
Computer practicals	20	30	50

Types of teaching

Name	Hours	Percentage of classroom hours
Lectures	25.0	40 %
Computer practicals, field trips, visits	50.0	40 %

Assessment systems

Name	Minimum weighting	Maximum weighting
Practical work	0.0 %	100.0 %

Learning outcomes of the subject

* Know the main machine learning scenarios.

* Identify the type of technique to apply in each classification scenario.

* Know the basic, standard steps of a data analysis pipeline.

* Use R-project libraries to create a corpus and its associated "document-term matrix", and then apply machine learning techniques to it.

Ordinary call: guidelines and withdrawal

Continuous evaluation:

First, students must attend at least 80% of the sessions. The evaluation consists of an individual project, summarized in the following lines:

Starting from raw text (e.g. tweets or comments on social networks, HTML text, a set of text files, etc.), the student must import it and create a corpus. The corpus must be based on a supervised problem, composed of text documents with different labels. The corpus will be preprocessed with basic text-mining filters (e.g. stop-word removal, stemming, removal of sparse terms, etc.). R-project's “tm” (“text-mining”) package will be used for this purpose. The corpus will then be transformed into matrix format so that it can be processed by specialized machine learning software, in our case R's popular “caret” package. A classical supervised pipeline will be applied, consisting of at least the following steps: data loading and exploration, variable preprocessing, corpus partitioning for validation, feature extraction and selection, application of class-imbalance techniques, learning and tuning of classification models, and statistical comparison.

The output of the project will be a “notebook” that alternates the implemented code with descriptions of its functionality and the design decisions taken.
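
As a rough illustration of the corpus-to-matrix step in the project description, the sketch below builds a small document-term matrix with stop-word removal and sparse-term removal. It is written in plain Python only so that it is self-contained; the course itself uses R's “tm” package, and this is not its API. The toy corpus, the stop-word list, the `min_doc_freq` threshold, and the function names are all assumptions made for the example, and stemming is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (text, label) pairs standing in for real labeled documents.
CORPUS = [
    ("The service was great and the staff were friendly", "pos"),
    ("Great food, great prices", "pos"),
    ("The food was cold and the service was slow", "neg"),
    ("Slow service and rude staff", "neg"),
]

# A tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "was", "were", "a", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def document_term_matrix(texts, min_doc_freq=2):
    """Return (vocabulary, matrix): one row per document, one column per term.

    Terms that occur in fewer than min_doc_freq documents are dropped,
    which is a simple form of sparse-term removal.
    """
    counts = [Counter(tokenize(t)) for t in texts]
    doc_freq = Counter(term for c in counts for term in c)
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_doc_freq)
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


texts = [text for text, _ in CORPUS]
labels = [label for _, label in CORPUS]
vocab, dtm = document_term_matrix(texts)

print(vocab)
for row, label in zip(dtm, labels):
    print(row, label)
```

Each row of `dtm`, paired with its label, is the kind of feature vector a supervised classifier would then be trained on in the later pipeline steps.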

Single final evaluation:

Individual project: when a student cannot attend the lessons and requests a single final evaluation, it will consist of developing the individual project described above.

Extraordinary call: guidelines and withdrawal

Individual project: when a student cannot attend the lessons and requests a single final evaluation, it will consist of developing the individual project described above.

Syllabus

1. Main classification scenarios. Formalisms and applications in each scenario: supervised classification, clustering, "weakly supervised classification" ('positive unlabeled learning', 'learning from label proportions', 'partial labels', etc.)

2. General techniques and filters for data preprocessing. Software: WEKA

3. Main techniques for feature selection. Software: WEKA

4. Validation of classification models. Use of statistical tests to compare classifiers. Software: WEKA, R, web resources

5. The 'tm' (text-mining) package of the R software. Using 'text-mining' operators to build a 'document-term' matrix for subsequent analysis with machine learning techniques. Notebook-tutorial

6. 'The machine learning approach': clustering of terms and classification of documents. Use of R's "caret" package. Notebook-tutorial

7. First steps in "deep learning" for document classification. Use of R's 'h2o' package. Notebook-tutorial

Bibliography

Basic bibliography

* M. Kuhn, K. Johnson (2013). Applied Predictive Modeling. Springer.

* ParallelDots, online text analysis APIs for several tasks: sentiment analysis, tag prediction, keyword generation, entity extraction, text similarity comparison, emotion analysis, intent analysis, abusive text prediction, etc. https://www.paralleldots.com/text-analysis-apis

* sentiment140: an interesting project for automatic sentiment categorization of tweets: http://help.sentiment140.com/

* Stanford TreeBank project. "Recursive deep models for semantic compositionality over a semantic treebank". https://nlp.stanford.edu/sentiment/

* RDataMining website: Text mining with R: Twitter data analysis: http://www.rdatamining.com/docs/text-mining-with-r

* Awesome sentiment analysis: A curated list of Sentiment Analysis methods, implementations and misc. https://github.com/xiamx/awesome-sentiment-analysis

* "5 things you need to know about sentiment analysis and classification": https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html

* Bing Liu's website on "Opinion mining, sentiment analysis and opinion spam detection: the machine learning approach". https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

* 18 NLP key terms, explained for ML practitioners and NLP novices: https://www.kdnuggets.com/2017/02/natural-language-processing-key-terms-explained.html

Erasmus Mundus Master in Language and Communication Technologies (LCT)
