---
dg-publish: true
---
25 October 2023 - #CP

---

# Worksheet 6

https://learn.microsoft.com/en-us/cpp/parallel/openmp/reference/openmp-clauses?view=msvc-170

## Ex 1

Original version, output:

```
T1:i50 w=11
T1:i51 w=12
T1:i52 w=13
T1:i53 w=14
T1:i54 w=15
T1:i55 w=16
T0:i0 w=10
T0:i1 w=18
T0:i2 w=19
T0:i3 w=20
T0:i4 w=21
T0:i5 w=22
T1:i56 w=17
T0:i6 w=23
T1:i57 w=24
T0:i7 w=25
T1:i58 w=26
T0:i8 w=27
T1:i59 w=28
T1:i60 w=30
T1:i61 w=31
T1:i62 w=32
T0:i9 w=29
T1:i63 w=33
T0:i10 w=34
T1:i64 w=35
T0:i11 w=36
T1:i65 w=37
T0:i12 w=38
T0:i13 w=40
T0:i14 w=41
T0:i15 w=42
T1:i66 w=39
T0:i16 w=43
T1:i67 w=44
T1:i68 w=45
T0:i17 w=46
T1:i69 w=47
T1:i70 w=49
T1:i71 w=50
T1:i72 w=51
T0:i18 w=48
T1:i73 w=52
T0:i19 w=53
T1:i74 w=54
T0:i20 w=55
T1:i75 w=56
T0:i21 w=57
T1:i76 w=58
T0:i22 w=59
T1:i77 w=60
T0:i23 w=61
T1:i78 w=62
T0:i24 w=63
T1:i79 w=64
T0:i25 w=65
T1:i80 w=66
T0:i26 w=67
T1:i81 w=68
T0:i27 w=69
T0:i28 w=71
T0:i29 w=72
T0:i30 w=73
T1:i82 w=70
T1:i83 w=75
T1:i84 w=76
T1:i85 w=77
T0:i31 w=74
T1:i86 w=78
T0:i32 w=79
T1:i87 w=80
T0:i33 w=81
T1:i88 w=82
T0:i34 w=83
T1:i89 w=84
T0:i35 w=85
T1:i90 w=86
T0:i36 w=87
T1:i91 w=88
T0:i37 w=89
T1:i92 w=90
T0:i38 w=91
T1:i93 w=92
T0:i39 w=93
T1:i94 w=94
T0:i40 w=95
T0:i41 w=96
T0:i42 w=97
T0:i43 w=98
T0:i44 w=99
T0:i45 w=100
T0:i46 w=101
T0:i47 w=102
T0:i48 w=103
T0:i49 w=104
T1:i95 w=105
T1:i96 w=106
T1:i97 w=107
T1:i98 w=108
T1:i99 w=109
w=110
```
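
As the loop runs, threads 0 and 1 print in an unpredictable interleaved order (in this run T0 executed i = 0..49 and T1 executed i = 50..99). Since `w` is shared, both threads increment the same variable, and at the end w=110. Note that `w++` on a shared variable is a data race: two threads can read the same value and one increment can be lost, so 110 is the expected result but not a guaranteed one.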

## 1.1 Version with private(w)

```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T0:i3 w=3
T0:i4 w=4
T1:i56 w=6
T1:i57 w=7
T1:i58 w=8
T1:i59 w=9
T1:i60 w=10
T0:i5 w=5
T0:i6 w=6
T0:i7 w=7
T0:i8 w=8
T1:i61 w=11
T0:i9 w=9
T1:i62 w=12
T0:i10 w=10
T1:i63 w=13
T0:i11 w=11
T1:i64 w=14
T0:i12 w=12
T1:i65 w=15
T0:i13 w=13
T1:i66 w=16
T0:i14 w=14
T1:i67 w=17
T0:i15 w=15
T1:i68 w=18
T0:i16 w=16
T1:i69 w=19
T0:i17 w=17
T1:i70 w=20
T0:i18 w=18
T1:i71 w=21
T0:i19 w=19
T1:i72 w=22
T0:i20 w=20
T1:i73 w=23
T1:i74 w=24
T1:i75 w=25
T1:i76 w=26
T1:i77 w=27
T0:i21 w=21
T1:i78 w=28
T0:i22 w=22
T1:i79 w=29
T0:i23 w=23
T1:i80 w=30
T0:i24 w=24
T1:i81 w=31
T0:i25 w=25
T0:i26 w=26
T0:i27 w=27
T0:i28 w=28
T1:i82 w=32
T0:i29 w=29
T1:i83 w=33
T1:i84 w=34
T0:i30 w=30
T1:i85 w=35
T0:i31 w=31
T1:i86 w=36
T0:i32 w=32
T1:i87 w=37
T0:i33 w=33
T1:i88 w=38
T1:i89 w=39
T1:i90 w=40
T1:i91 w=41
T0:i34 w=34
T1:i92 w=42
T0:i35 w=35
T1:i93 w=43
T0:i36 w=36
T1:i94 w=44
T0:i37 w=37
T1:i95 w=45
T0:i38 w=38
T1:i96 w=46
T0:i39 w=39
T1:i97 w=47
T0:i40 w=40
T1:i98 w=48
T0:i41 w=41
T1:i99 w=49
T0:i42 w=42
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=10
```

Again the print order is unpredictable, but now each thread works on its own copy of `w` (typically on that thread's stack). A `private` copy is not initialized from the original `w`: its initial value is undefined, and it merely happened to be 0 in this run. The copies are discarded at the end of the region, so the original `w` keeps the value it had before it (w=10).

## 1.2 Version with firstprivate(w)

```
T1:i50 w=10
T1:i51 w=11
T1:i52 w=12
T1:i53 w=13
T1:i54 w=14
T1:i55 w=15
T0:i0 w=10
T0:i1 w=11
T0:i2 w=12
T0:i3 w=13
T1:i56 w=16
T0:i4 w=14
T1:i57 w=17
T0:i5 w=15
T1:i58 w=18
T0:i6 w=16
T1:i59 w=19
T0:i7 w=17
T1:i60 w=20
T0:i8 w=18
T1:i61 w=21
T0:i9 w=19
T1:i62 w=22
T0:i10 w=20
T1:i63 w=23
T0:i11 w=21
T1:i64 w=24
T0:i12 w=22
T1:i65 w=25
T0:i13 w=23
T1:i66 w=26
T0:i14 w=24
T1:i67 w=27
T0:i15 w=25
T1:i68 w=28
T0:i16 w=26
T1:i69 w=29
T0:i17 w=27
T1:i70 w=30
T0:i18 w=28
T1:i71 w=31
T0:i19 w=29
T1:i72 w=32
T0:i20 w=30
T1:i73 w=33
T0:i21 w=31
T1:i74 w=34
T0:i22 w=32
T1:i75 w=35
T0:i23 w=33
T1:i76 w=36
T0:i24 w=34
T1:i77 w=37
T0:i25 w=35
T1:i78 w=38
T0:i26 w=36
T1:i79 w=39
T0:i27 w=37
T1:i80 w=40
T0:i28 w=38
T1:i81 w=41
T0:i29 w=39
T1:i82 w=42
T0:i30 w=40
T1:i83 w=43
T0:i31 w=41
T1:i84 w=44
T0:i32 w=42
T1:i85 w=45
T0:i33 w=43
T1:i86 w=46
T0:i34 w=44
T1:i87 w=47
T0:i35 w=45
T1:i88 w=48
T0:i36 w=46
T1:i89 w=49
T0:i37 w=47
T1:i90 w=50
T0:i38 w=48
T1:i91 w=51
T0:i39 w=49
T1:i92 w=52
T0:i40 w=50
T1:i93 w=53
T0:i41 w=51
T1:i94 w=54
T0:i42 w=52
T1:i95 w=55
T0:i43 w=53
T1:i96 w=56
T0:i44 w=54
T1:i97 w=57
T0:i45 w=55
T1:i98 w=58
T0:i46 w=56
T1:i99 w=59
T0:i47 w=57
T0:i48 w=58
T0:i49 w=59
w=10
```

Again the print order is unpredictable, but each thread's copy of `w` is now initialized with the value of the original `w` at the start of the region (10). From then on each thread increments only its own copy (both T0 and T1 go from 10 to 59). The copies are discarded at the end, so the original `w` is still 10: `firstprivate` copies the value in, but never back out.

## 1.3 Version with lastprivate(w)

```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T1:i56 w=6
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T0:i3 w=3
T1:i57 w=7
T0:i4 w=4
T1:i58 w=8
T1:i59 w=9
T1:i60 w=10
T1:i61 w=11
T0:i5 w=5
T1:i62 w=12
T0:i6 w=6
T1:i63 w=13
T0:i7 w=7
T1:i64 w=14
T0:i8 w=8
T1:i65 w=15
T0:i9 w=9
T1:i66 w=16
T0:i10 w=10
T1:i67 w=17
T0:i11 w=11
T1:i68 w=18
T0:i12 w=12
T1:i69 w=19
T0:i13 w=13
T1:i70 w=20
T0:i14 w=14
T1:i71 w=21
T0:i15 w=15
T1:i72 w=22
T0:i16 w=16
T1:i73 w=23
T0:i17 w=17
T1:i74 w=24
T0:i18 w=18
T1:i75 w=25
T0:i19 w=19
T1:i76 w=26
T0:i20 w=20
T1:i77 w=27
T0:i21 w=21
T1:i78 w=28
T0:i22 w=22
T1:i79 w=29
T0:i23 w=23
T1:i80 w=30
T0:i24 w=24
T0:i25 w=25
T0:i26 w=26
T0:i27 w=27
T1:i81 w=31
T0:i28 w=28
T1:i82 w=32
T0:i29 w=29
T1:i83 w=33
T1:i84 w=34
T1:i85 w=35
T1:i86 w=36
T0:i30 w=30
T1:i87 w=37
T0:i31 w=31
T1:i88 w=38
T0:i32 w=32
T1:i89 w=39
T0:i33 w=33
T1:i90 w=40
T0:i34 w=34
T1:i91 w=41
T0:i35 w=35
T1:i92 w=42
T0:i36 w=36
T1:i93 w=43
T0:i37 w=37
T1:i94 w=44
T0:i38 w=38
T1:i95 w=45
T0:i39 w=39
T1:i96 w=46
T0:i40 w=40
T1:i97 w=47
T0:i41 w=41
T1:i98 w=48
T0:i42 w=42
T1:i99 w=49
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=50
```
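
Again the print order is unpredictable. Like `private`, `lastprivate` gives each thread an uninitialized copy of `w` (here both copies happened to start at 0, regardless of the original `w`). Each thread increments its own copy, and when the loop ends the original `w` receives the value of the copy belonging to the thread that executed the sequentially last iteration (i = 99), not the thread that happens to finish last. T1 ran i = 50..99, so its copy ended at 50, hence w=50. (With `#pragma omp sections`, the value comes from the lexically last `section`.)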

## 1.4 Version with reduction(+:w)

```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T1:i56 w=6
T1:i57 w=7
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T1:i58 w=8
T0:i3 w=3
T1:i59 w=9
T0:i4 w=4
T1:i60 w=10
T1:i61 w=11
T1:i62 w=12
T1:i63 w=13
T0:i5 w=5
T1:i64 w=14
T0:i6 w=6
T1:i65 w=15
T0:i7 w=7
T1:i66 w=16
T0:i8 w=8
T0:i9 w=9
T0:i10 w=10
T0:i11 w=11
T1:i67 w=17
T0:i12 w=12
T1:i68 w=18
T0:i13 w=13
T1:i69 w=19
T0:i14 w=14
T1:i70 w=20
T1:i71 w=21
T1:i72 w=22
T1:i73 w=23
T0:i15 w=15
T1:i74 w=24
T0:i16 w=16
T1:i75 w=25
T0:i17 w=17
T1:i76 w=26
T0:i18 w=18
T0:i19 w=19
T0:i20 w=20
T0:i21 w=21
T1:i77 w=27
T0:i22 w=22
T1:i78 w=28
T0:i23 w=23
T1:i79 w=29
T0:i24 w=24
T1:i80 w=30
T1:i81 w=31
T1:i82 w=32
T1:i83 w=33
T1:i84 w=34
T0:i25 w=25
T1:i85 w=35
T1:i86 w=36
T1:i87 w=37
T1:i88 w=38
T0:i26 w=26
T1:i89 w=39
T0:i27 w=27
T1:i90 w=40
T0:i28 w=28
T0:i29 w=29
T0:i30 w=30
T0:i31 w=31
T1:i91 w=41
T0:i32 w=32
T1:i92 w=42
T0:i33 w=33
T1:i93 w=43
T0:i34 w=34
T1:i94 w=44
T1:i95 w=45
T1:i96 w=46
T1:i97 w=47
T0:i35 w=35
T1:i98 w=48
T0:i36 w=36
T1:i99 w=49
T0:i37 w=37
T0:i38 w=38
T0:i39 w=39
T0:i40 w=40
T0:i41 w=41
T0:i42 w=42
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=110
```

Again the print order is unpredictable. With `reduction(+:w)` each thread gets a private copy of `w` initialized to the identity of `+`, which is 0, and increments it independently (both copies reach 50). When the loop ends the partial results are added to the original value: w = 10 + 50 + 50 = 110. This is not luck: the result is always the initial value plus the 100 increments. It is the original version that only matched by luck, since its unsynchronized `w++` could have lost updates.

___

## Ex 2

Original version:

```
Dot is 1.2020569031095949
```
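
### a+b)

Run \#1 (`--cpus-per-task=2`):

```
Dot is 1.2020569029595982
```

Run \#n (`--cpus-per-task=2`):

```
Dot is 0.0000000001499965
```

With the same number of CPUs per task, the result changes from run to run. `dot` is shared, and `dot += a[i]*b[i];` is a read-modify-write that is not atomic, so threads overwrite each other's updates (a race condition) and the outcome depends on timing. Note that Run \#1 + Run \#n ≈ 1.2020569031095949, the value printed by the original version, which is consistent with each run keeping the partial sum of one of the two threads and losing the other's.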

Run \#2 (`--cpus-per-task=4`):

```
Dot is 0.0000000005999720
```

Run \#3 (`--cpus-per-task=6`):

```
Dot is 0.0000000013500135
```

Run \#4 (`--cpus-per-task=8`):

```
Dot is 0.0000000000719980
```

The result also varies with the number of CPUs per task (and thus threads): the iterations of the loop, and therefore the executions of `dot += a[i]*b[i];`, are split differently among the threads, so each configuration loses a different set of updates to `dot`. None of these values can be trusted; the update of `dot` has to be synchronized or turned into a reduction.

### c)

We can change the code to:

And the results will always be:
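
### d)

We can use a _reduction_ (`reduction(+:dot)`) instead, which is more efficient than `critical`: each thread accumulates into its own private copy of `dot` and the copies are combined once at the end, whereas `critical` makes the threads take turns on every single update.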