Subject

Machine Learning (II)

General details of the subject

Mode: Face-to-face degree course
Language: English

Description and contextualization of the subject

The course focuses on a set of techniques inspired by artificial intelligence and statistics. Over the last decade, these fields have grown remarkably, particularly in connection with the analysis of large amounts of data by means of techniques and algorithms grounded in mathematics, statistics and heuristic optimization. Machine learning techniques are now widely applied in areas such as bioinformatics, finance and text processing.

Students will study the main data mining techniques and strengthen their skills with popular software tools that implement them, all demonstrated on real text-processing applications.

Teaching staff

Name	Institution	Category	Doctor	Teaching profile	Area	E-mail
INZA CANO, IÑAKI	University of the Basque Country	Full Professor	Doctor	Bilingual	Computer Science and Artificial Intelligence	inaki.inza@ehu.eus

Competencies

Name	Weight
Ability to handle knowledge-based strategies and tools for human language processing.	30.0 %
Ability to handle and adapt the most relevant symbolic and corpus-based (machine learning) methods for research in language technologies.	70.0 %

Study types

Type	Face-to-face hours	Non face-to-face hours	Total hours
Lecture-based	10	15	25
Applied computer-based groups	20	30	50

Learning outcomes of the subject

Know the main machine learning scenarios.

Identify the type of technique to apply in each classification scenario.

Know the basic, standard steps of a data analysis pipeline.

Use R libraries to create a corpus and its associated "document-term matrix", and then apply machine learning techniques to it.
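
As a rough illustration of the document-term matrix named in the last outcome, the sketch below builds a tiny one by hand. The course itself works with R libraries; plain Python is used here only to keep the example self-contained. The three example documents, the simple tokenizer and the function name `document_term_matrix` are illustrative assumptions, not course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    # The vocabulary is the sorted union of all terms seen in the corpus.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical three-document corpus, used only for illustration.
docs = [
    "Text mining with R",
    "Mining text: clustering words",
    "Classifying documents with R",
]
vocabulary, dtm = document_term_matrix(docs)
```

Each row of `dtm` is one document and each column one term of `vocabulary`, which is the shape a classifier or clustering algorithm would then take as input.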

Syllabus

1- General terms in the "data science" world: the term "data science", the relation between AI and data science, the term "big data", the Kaggle repository, kdnuggets.com, data science for a better world...

2- Principal classification scenarios: supervised classification, unsupervised classification (clustering), weakly supervised classification (alternative scenarios). For each learning scenario: structure of the data matrix, type of annotation, real world applications.

3- Semi-supervised classification: usefulness in NLP tasks. Software, RSSL package in R.

4- One-class classification and outlier detection: usefulness in NLP tasks. Software, R packages.

5- Using statistical tests to compare the accuracy of different classifiers. Software: R, online statistical tests on the web.

6- Feature selection techniques. Techniques for selecting a "competitive" subset of original features.

7- General techniques and filters for data preprocessing. Preprocessing filters for any kind of data: missing data imputation, one-hot encoding, discretization, imbalanced class distributions...

8- "A short introduction to the tm (text mining) package in R: text processing". How to build a proper corpus with text mining operators and transform it into a document-term matrix for further machine learning analysis, starting from raw text such as files, HTML pages, Twitter... A tutorial using R software.

9- "The machine learning approach: clustering words and classifying documents with R". A tutorial using R software, caret package.

10- "First steps in deep learning for NLP with R’s h2o package (+word2vec)". A tutorial using R software. Optional work.
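
Topic 5 above compares the accuracy of classifiers with statistical tests. The syllabus does not say which tests are covered, so the sketch below picks McNemar's exact test as one common choice for two classifiers evaluated on the same test cases. It is written in plain Python rather than R to stay self-contained; the function name `mcnemar_exact` and the toy labels are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only classifier A got
    right and c counts cases only classifier B got right.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis, the discordant cases split as Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
pred_a = [1, 0, 1, 0, 1, 0, 1, 1]
pred_b = [0, 1, 0, 1, 1, 0, 1, 1]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

Only the cases where exactly one classifier is correct enter the test; cases where both are right or both are wrong carry no information about which one is better.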

Bibliography

Basic bibliography

* M. Kuhn, K. Johnson (2013). Applied Predictive Modeling. Springer.

* ParallelDots, online text analysis APIs for several tasks: sentiment analysis, tag prediction, keyword generation, entity extraction, text similarity comparison, emotion analysis, intent analysis, abusive text prediction, etc. https://www.paralleldots.com/text-analysis-apis

* sentiment140: an interesting project for automatic sentiment categorization of tweets: http://help.sentiment140.com/

* Stanford Sentiment Treebank project. "Recursive deep models for semantic compositionality over a sentiment treebank". https://nlp.stanford.edu/sentiment/

* RDataMining website: Text mining with R: Twitter data analysis: http://www.rdatamining.com/docs/text-mining-with-r

* Awesome sentiment analysis: A curated list of Sentiment Analysis methods, implementations and misc. https://github.com/xiamx/awesome-sentiment-analysis

* "5 things you need to know about sentiment analysis and classification": https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html

* Bing Liu's website on "Opinion mining, sentiment analysis and opinion spam detection: the machine learning approach". https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

* 18 NLP key terms, explained for ML practitioners and NLP novices: https://www.kdnuggets.com/2017/02/natural-language-processing-key-terms-explained.html

Master in Language Analysis and Processing
