
Computing infrastructures for big data processing

Centre: Faculty of Informatics
Degree: Grado en Inteligencia Artificial (Bachelor's Degree in Artificial Intelligence)
Academic year: 2024/25
Year of study: 3
No. of credits: 6
Languages: Spanish, Basque
Code: 28273

Teaching

Distribution of hours by type of teaching
Study type | Hours of face-to-face teaching | Hours of non-classroom-based work by the student
Lecture-based | 40 | 60
Applied laboratory-based groups | 20 | 30

Teaching guide

Description and Contextualization of the Subject

Infrastructures for Massive Data Processing (IMDP)

Among data analysis tools and platforms, we can find those dedicated to “big data”, that is, massive data processing. The term “massive” is key, as it indicates that these are not volumes of data that can be processed on conventional platforms, such as a desktop computer or a server.

Infrastructures for Massive Data Processing are complex in structure, as they commonly consist of multiple interconnected services. These services provide processing capacity, storage capacity and fault tolerance, among other features. Not only do we have to understand how their structure is organized, but it is also necessary to be able to determine the most suitable platform on which to deploy them, which can vary widely: physical hardware (computer clusters), virtual machines, containers, etc., both in one's own data centre and in the cloud.



Subject context

IMDP students will have taken the following mandatory subjects. First year: “Introduction to Computer Networks and Operating Systems” (basic aspects of operating systems, storage and networks). Second year: “Parallel and Distributed Systems” (high-performance computing, parallelism, distributed systems); “Databases” (structured storage); “Software Engineering” (development and implementation of software projects). In addition, alongside this subject, students will be studying “Big Data Application Development”.

The previous subjects provide the necessary foundation for IMDP, which is in turn complemented by another subject devoted to the practical use of these infrastructures.

Skills/Learning outcomes of the subject

1. Knowledge of the needs of massive data processing systems

2. Knowledge of massive data processing platforms

3. Knowledge of the implementation and deployment choices of said platforms

4. Skills in the use of different service deployment platforms

5. Skills in the use of cloud and edge systems

Theoretical and practical content

Note: the order and structure of the topics may change



1. Platforms for the deployment of services:

a. Physical infrastructures

b. Virtual infrastructures

c. Cloud deployments

2. Analysis of the needs of massive data processing applications. Streaming vs. batch.

3. Services for the storage and analysis of big data

a. Description of big data processing environments

b. Deployment of these environments in datacenters and in the cloud

4. Edge and fog computing: cooperative processing between centralized and peripheral resources

Methodology

The approach of the subject is mainly practical. The description of service platforms and environments for their deployment will be accompanied by the theoretical foundations necessary to understand aspects such as storage alternatives, connectivity alternatives, security, fault tolerance, performance, etc. Under this premise, classes are organized as follows:

  • 4 credits of lectures devoted to the description of platforms and deployment environments, along with the necessary theoretical aspects. These classes will be supported by PowerPoint presentations and hands-on demonstrations.

  • 2 laboratory credits, with a computer, to put into practice the knowledge explained in the lectures.

In addition, students will carry out individual practical work as a complement to what they have learned in the lecture and laboratory sessions.

Assessment systems

  • Continuous Assessment System
  • Final Assessment System
  • Tools and qualification percentages:
    • Written test to be taken (%): 25
    • Realization of Practical Work (exercises, cases or problems) (%): 75

Ordinary Call: Orientations and Disclaimer

By default, students will be assessed under the continuous assessment system, although there is little difference between the continuous and final systems: in both cases it is necessary to submit several practical assignments and take a final exam.

Throughout the course, between 2 and 4 individual assignments will have to be handed in, with a cumulative weight of 75% of the subject grade. The exam carries the remaining 25% of the grade and will be taken on the date indicated by the centre.

To pass the subject, it is necessary to obtain a weighted average (using the weights above) of at least 5 points out of 10 and, in addition, to have obtained at least 35% of the maximum grade in each of the two parts.
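As an illustration, the pass condition can be written down as a short check. This is a minimal sketch with hypothetical marks, not part of the official assessment rules:

    # Minimal sketch of the pass rule described above (marks on a 0-10 scale).
    EXAM_WEIGHT, WORK_WEIGHT = 0.25, 0.75
    MIN_AVERAGE = 5.0    # minimum weighted average required to pass
    MIN_PER_PART = 3.5   # 35% of the maximum 10 points in each part

    def passes(exam_mark, work_mark):
        average = EXAM_WEIGHT * exam_mark + WORK_WEIGHT * work_mark
        return (average >= MIN_AVERAGE
                and exam_mark >= MIN_PER_PART
                and work_mark >= MIN_PER_PART)

    print(passes(6.0, 5.0))  # True: average 5.25 and both parts above the minimum
    print(passes(3.0, 6.0))  # False: average 5.25, but the exam is below 35% of its maximum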

As stated above, under continuous assessment the assignments must be completed and submitted throughout the course. Under final (global) assessment, they must be delivered one week before the exam date.

As indicated by the UPV/EHU regulations, students who wish to waive continuous assessment (thus switching to final assessment) must submit a written document to the lecturer stating this intention, for which they have a period of 9 weeks from the beginning of the course.

If a student decides to withdraw from the ordinary call, they must (1) switch to final (global) assessment and, (2) after that change, neither submit any practical work nor take the exam.

Extraordinary Call: Orientations and Disclaimer

Assessment will be carried out in the same way as in the ordinary call: all assignments must be submitted one week before the exam date, and the exam will be taken on the date indicated by the centre.

If any part was passed in the ordinary call, whether a practical assignment or the exam (meaning at least 50% of the maximum grade for that part was obtained), it will not be necessary to resubmit or retake it in the extraordinary call.

A student will be considered to have withdrawn from the extraordinary call if they neither hand in any work nor take the exam.

Compulsory materials

There are none, but it is highly recommended that students have their own computer with adequate specifications to carry out the practical work on it, including enough memory and processing capacity to run virtualization environments.

Bibliography

Basic bibliography

None

In-depth bibliography

Sourav Mazumder, Robin Singh Bhadoria, Ganesh Chandra Deka (editors). "Distributed Computing in Big Data Analytics: Concepts, Technologies and Applications".

Bahaaldine Azarmi. "Scalable Big Data Architecture: A Practitioner's Guide to Choosing Relevant Big Data Architecture".

Anupam Chattopadhyay, Chip Hong Chang, Hao Yu (editors). "Emerging Technology and Architecture for Big-data Analytics".

Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira. "Hadoop Application Architectures: Designing Real-World Big Data Applications", 1st edition.

Journals

Journal of Big Data (Springer)
Big Data Research (Elsevier)
IEEE Transactions on Knowledge and Data Engineering

Web addresses

Given how quickly they change, relevant web addresses will be published each year on eGela.

Groups

16 Lecture-based (Spanish - Afternoon)

Calendar
Weeks 16-30: 14:00-15:30 (1), 15:30-17:00 (2)

Teaching staff

16 Applied laboratory-based groups-1 (Spanish - Afternoon)

Calendar
Weeks 16-30: 17:00-18:30 (1)

Teaching staff

31 Lecture-based (Basque - Morning)

Calendar
Weeks 16-30: 09:00-10:30 (1), 10:30-12:00 (2)

Teaching staff

31 Applied laboratory-based groups-1 (Basque - Morning)

Calendar
Weeks 16-30: 12:00-13:30 (1)

Teaching staff