2023-10-04 11:13:31 +01:00

4 October 2023 - #CP

## Ex. 2

#### a) Vectorization limitations

Access pattern of each matrix in the innermost loop:

A -> consecutive elements in a row -> consecutive access in the vector

C -> same element

B -> consecutive elements in a column -> strided (non-consecutive) access in the vector

Not vectorizable: the column-wise accesses to B are not contiguous in memory.
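
The access pattern listed above can be sketched in C. This is a minimal sketch, assuming the original loop order is i, j, k and that the matrices are stored row-major as flat `double` arrays; the function name `mmult_ijk` and its parameters are hypothetical and not taken from `mmult.c`.

```c
#include <stddef.h>

/* Hypothetical sketch: C += A * B for n x n row-major matrices,
   using the assumed original loop order i, j, k. */
void mmult_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                /* Innermost loop runs over k:
                   A[i*n + k] -> consecutive elements in a row (stride 1)
                   B[k*n + j] -> consecutive elements in a column (stride n)
                   C[i*n + j] -> same element on every iteration */
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```

Calling `mmult_ijk(n, A, B, C)` with `C` zero-initialized leaves the product of `A` and `B` in `C`.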

#### b) Enable vectorization

Result of changing the loop order to i, k, j:

A -> same element

C -> consecutive elements in a row -> consecutive access in the vector

B -> consecutive elements in a row -> consecutive access in the vector

i k j
0 0 1

This will be vectorizable.
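
The reordered loop nest can be sketched in C. This is a minimal sketch, assuming row-major `double` matrices stored as flat arrays; the function name `mmult_ikj`, its parameters, and the use of `restrict` (which assumes the three matrices do not overlap) are hypothetical and not taken from `mmult.c`.

```c
#include <stddef.h>

/* Hypothetical sketch: C += A * B for n x n row-major matrices,
   with the loop order changed to i, k, j.
   restrict is an assumption: the matrices must not overlap. */
void mmult_ikj(size_t n, const double *restrict A,
               const double *restrict B, double *restrict C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            /* A[i*n + k] -> same element for the whole inner loop */
            const double a = A[i*n + k];
            for (size_t j = 0; j < n; j++)
                /* Innermost loop runs over j:
                   C[i*n + j] -> consecutive elements in a row (stride 1)
                   B[k*n + j] -> consecutive elements in a row (stride 1) */
                C[i*n + j] += a * B[k*n + j];
        }
}
```

With every access in the inner loop either invariant or stride 1, a compiler invoked with flags such as `-O2 -ftree-vectorize` has the opportunity to vectorize it; whether it does depends on the compiler and target.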

128 bits per vector register

8 B per element -> 64 bits

128 / 64 = 2 elements per vector

Without vectorization:

![[Pasted image 20231004115725.png]]

With vectorization:

![[Pasted image 20231004115135.png]]

Estimated #I: (n^3 / 2) * 8
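
As a worked check of the estimate above for $N = 512$: this assumes the expression counts instructions, with $n^3/2$ vectorized inner-loop iterations (2 elements per vector) and about 8 instructions per iteration. The meaning of the factor 8 is an assumption, since the notes do not state where it comes from.

$$
\frac{n^3}{2} \cdot 8 = \frac{512^3}{2} \cdot 8 = 67\,108\,864 \cdot 8 = 536\,870\,912 \approx 5.4 \times 10^{8}
$$

This is the same order of magnitude as the 578275097 instructions measured for `vect()` in section c).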

#### c) Measure and analyze results

| N   | Version  | Time (s)    | CPI  | \#I        |
| --- | -------- | ----------- | ---- | ---------- |
| 512 | base_v() | 0.492484818 | 0.91 | 1113554887 |
| 512 | vect()   | 0.081604350 | 2.88 | 578275097  |
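
From the two measured rows, the ratios between the base and vectorized runs can be worked out as follows. This is a sketch that assumes the Time column is in seconds and that both rows come from comparable runs.

$$
\frac{T_{\text{base}}}{T_{\text{vect}}} = \frac{0.492484818}{0.081604350} \approx 6.0,
\qquad
\frac{\#I_{\text{base}}}{\#I_{\text{vect}}} = \frac{1113554887}{578275097} \approx 1.93
$$

The instruction ratio is close to 2, which is consistent with 2 elements per 128-bit vector.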

> [!note]- Commands run
> `module load gcc/9.3.0`
> `gcc -O2 -ftree-vectorize -msse4 mmult.c`
> `srun --partition=cpar perf stat -e cycles,instructions ./a.out`

#### d) Vectorization fine-tuning

Gains of 4x.

## Ex. 3

#### a)

2 FP operations