---
dg-publish: true
---
25 October 2023 - #CP

---
# Worksheet 6
https://learn.microsoft.com/en-us/cpp/parallel/openmp/reference/openmp-clauses?view=msvc-170

## Ex 1
Original version, result:
```
T1:i50 w=11
T1:i51 w=12
T1:i52 w=13
T1:i53 w=14
T1:i54 w=15
T1:i55 w=16
T0:i0 w=10
T0:i1 w=18
T0:i2 w=19
T0:i3 w=20
T0:i4 w=21
T0:i5 w=22
T1:i56 w=17
T0:i6 w=23
T1:i57 w=24
T0:i7 w=25
T1:i58 w=26
T0:i8 w=27
T1:i59 w=28
T1:i60 w=30
T1:i61 w=31
T1:i62 w=32
T0:i9 w=29
T1:i63 w=33
T0:i10 w=34
T1:i64 w=35
T0:i11 w=36
T1:i65 w=37
T0:i12 w=38
T0:i13 w=40
T0:i14 w=41
T0:i15 w=42
T1:i66 w=39
T0:i16 w=43
T1:i67 w=44
T1:i68 w=45
T0:i17 w=46
T1:i69 w=47
T1:i70 w=49
T1:i71 w=50
T1:i72 w=51
T0:i18 w=48
T1:i73 w=52
T0:i19 w=53
T1:i74 w=54
T0:i20 w=55
T1:i75 w=56
T0:i21 w=57
T1:i76 w=58
T0:i22 w=59
T1:i77 w=60
T0:i23 w=61
T1:i78 w=62
T0:i24 w=63
T1:i79 w=64
T0:i25 w=65
T1:i80 w=66
T0:i26 w=67
T1:i81 w=68
T0:i27 w=69
T0:i28 w=71
T0:i29 w=72
T0:i30 w=73
T1:i82 w=70
T1:i83 w=75
T1:i84 w=76
T1:i85 w=77
T0:i31 w=74
T1:i86 w=78
T0:i32 w=79
T1:i87 w=80
T0:i33 w=81
T1:i88 w=82
T0:i34 w=83
T1:i89 w=84
T0:i35 w=85
T1:i90 w=86
T0:i36 w=87
T1:i91 w=88
T0:i37 w=89
T1:i92 w=90
T0:i38 w=91
T1:i93 w=92
T0:i39 w=93
T1:i94 w=94
T0:i40 w=95
T0:i41 w=96
T0:i42 w=97
T0:i43 w=98
T0:i44 w=99
T0:i45 w=100
T0:i46 w=101
T0:i47 w=102
T0:i48 w=103
T0:i49 w=104
T1:i95 w=105
T1:i96 w=106
T1:i97 w=107
T1:i98 w=108
T1:i99 w=109
w=110
```
As the loop runs, the prints from threads 0 and 1 interleave in no fixed order (T0 handles i = 0..49 and T1 handles i = 50..99). `w` is shared, so every iteration increments the same variable regardless of which thread executes it, as if it were a global. At the end w=110, i.e. the initial 10 plus 100 increments. Because `w++` on a shared variable is not synchronized, this is a data race: an increment can be lost, so 110 is what this run produced rather than a guaranteed result.

## 1.1 Version with private(w)
```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T0:i3 w=3
T0:i4 w=4
T1:i56 w=6
T1:i57 w=7
T1:i58 w=8
T1:i59 w=9
T1:i60 w=10
T0:i5 w=5
T0:i6 w=6
T0:i7 w=7
T0:i8 w=8
T1:i61 w=11
T0:i9 w=9
T1:i62 w=12
T0:i10 w=10
T1:i63 w=13
T0:i11 w=11
T1:i64 w=14
T0:i12 w=12
T1:i65 w=15
T0:i13 w=13
T1:i66 w=16
T0:i14 w=14
T1:i67 w=17
T0:i15 w=15
T1:i68 w=18
T0:i16 w=16
T1:i69 w=19
T0:i17 w=17
T1:i70 w=20
T0:i18 w=18
T1:i71 w=21
T0:i19 w=19
T1:i72 w=22
T0:i20 w=20
T1:i73 w=23
T1:i74 w=24
T1:i75 w=25
T1:i76 w=26
T1:i77 w=27
T0:i21 w=21
T1:i78 w=28
T0:i22 w=22
T1:i79 w=29
T0:i23 w=23
T1:i80 w=30
T0:i24 w=24
T1:i81 w=31
T0:i25 w=25
T0:i26 w=26
T0:i27 w=27
T0:i28 w=28
T1:i82 w=32
T0:i29 w=29
T1:i83 w=33
T1:i84 w=34
T0:i30 w=30
T1:i85 w=35
T0:i31 w=31
T1:i86 w=36
T0:i32 w=32
T1:i87 w=37
T0:i33 w=33
T1:i88 w=38
T1:i89 w=39
T1:i90 w=40
T1:i91 w=41
T0:i34 w=34
T1:i92 w=42
T0:i35 w=35
T1:i93 w=43
T0:i36 w=36
T1:i94 w=44
T0:i37 w=37
T1:i95 w=45
T0:i38 w=38
T1:i96 w=46
T0:i39 w=39
T1:i97 w=47
T0:i40 w=40
T1:i98 w=48
T0:i41 w=41
T1:i99 w=49
T0:i42 w=42
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=10
```
The prints interleave as before, but each thread now works on its own copy of `w`, which lives on that thread's stack. A private copy is not initialized from the original: it printed 0 here, but that starting value is not guaranteed. Once the parallel region ends the private copies are discarded, so the original `w` still holds the value it had before the loop (w=10).

## 1.2 Version with firstprivate(w)
```
T1:i50 w=10
T1:i51 w=11
T1:i52 w=12
T1:i53 w=13
T1:i54 w=14
T1:i55 w=15
T0:i0 w=10
T0:i1 w=11
T0:i2 w=12
T0:i3 w=13
T1:i56 w=16
T0:i4 w=14
T1:i57 w=17
T0:i5 w=15
T1:i58 w=18
T0:i6 w=16
T1:i59 w=19
T0:i7 w=17
T1:i60 w=20
T0:i8 w=18
T1:i61 w=21
T0:i9 w=19
T1:i62 w=22
T0:i10 w=20
T1:i63 w=23
T0:i11 w=21
T1:i64 w=24
T0:i12 w=22
T1:i65 w=25
T0:i13 w=23
T1:i66 w=26
T0:i14 w=24
T1:i67 w=27
T0:i15 w=25
T1:i68 w=28
T0:i16 w=26
T1:i69 w=29
T0:i17 w=27
T1:i70 w=30
T0:i18 w=28
T1:i71 w=31
T0:i19 w=29
T1:i72 w=32
T0:i20 w=30
T1:i73 w=33
T0:i21 w=31
T1:i74 w=34
T0:i22 w=32
T1:i75 w=35
T0:i23 w=33
T1:i76 w=36
T0:i24 w=34
T1:i77 w=37
T0:i25 w=35
T1:i78 w=38
T0:i26 w=36
T1:i79 w=39
T0:i27 w=37
T1:i80 w=40
T0:i28 w=38
T1:i81 w=41
T0:i29 w=39
T1:i82 w=42
T0:i30 w=40
T1:i83 w=43
T0:i31 w=41
T1:i84 w=44
T0:i32 w=42
T1:i85 w=45
T0:i33 w=43
T1:i86 w=46
T0:i34 w=44
T1:i87 w=47
T0:i35 w=45
T1:i88 w=48
T0:i36 w=46
T1:i89 w=49
T0:i37 w=47
T1:i90 w=50
T0:i38 w=48
T1:i91 w=51
T0:i39 w=49
T1:i92 w=52
T0:i40 w=50
T1:i93 w=53
T0:i41 w=51
T1:i94 w=54
T0:i42 w=52
T1:i95 w=55
T0:i43 w=53
T1:i96 w=56
T0:i44 w=54
T1:i97 w=57
T0:i45 w=55
T1:i98 w=58
T0:i46 w=56
T1:i99 w=59
T0:i47 w=57
T0:i48 w=58
T0:i49 w=59
w=10
```
The prints interleave as before, but each thread's copy of `w` is initialized with the value the original `w` had before the region (10). From that point on each thread works only on its own copy, which lives on its stack. After the region ends the original `w` is unchanged (w=10), because the private copies are never written back.

## 1.3 Version with lastprivate(w)
```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T1:i56 w=6
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T0:i3 w=3
T1:i57 w=7
T0:i4 w=4
T1:i58 w=8
T1:i59 w=9
T1:i60 w=10
T1:i61 w=11
T0:i5 w=5
T1:i62 w=12
T0:i6 w=6
T1:i63 w=13
T0:i7 w=7
T1:i64 w=14
T0:i8 w=8
T1:i65 w=15
T0:i9 w=9
T1:i66 w=16
T0:i10 w=10
T1:i67 w=17
T0:i11 w=11
T1:i68 w=18
T0:i12 w=12
T1:i69 w=19
T0:i13 w=13
T1:i70 w=20
T0:i14 w=14
T1:i71 w=21
T0:i15 w=15
T1:i72 w=22
T0:i16 w=16
T1:i73 w=23
T0:i17 w=17
T1:i74 w=24
T0:i18 w=18
T1:i75 w=25
T0:i19 w=19
T1:i76 w=26
T0:i20 w=20
T1:i77 w=27
T0:i21 w=21
T1:i78 w=28
T0:i22 w=22
T1:i79 w=29
T0:i23 w=23
T1:i80 w=30
T0:i24 w=24
T0:i25 w=25
T0:i26 w=26
T0:i27 w=27
T1:i81 w=31
T0:i28 w=28
T1:i82 w=32
T0:i29 w=29
T1:i83 w=33
T1:i84 w=34
T1:i85 w=35
T1:i86 w=36
T0:i30 w=30
T1:i87 w=37
T0:i31 w=31
T1:i88 w=38
T0:i32 w=32
T1:i89 w=39
T0:i33 w=33
T1:i90 w=40
T0:i34 w=34
T1:i91 w=41
T0:i35 w=35
T1:i92 w=42
T0:i36 w=36
T1:i93 w=43
T0:i37 w=37
T1:i94 w=44
T0:i38 w=38
T1:i95 w=45
T0:i39 w=39
T1:i96 w=46
T0:i40 w=40
T1:i97 w=47
T0:i41 w=41
T1:i98 w=48
T0:i42 w=42
T1:i99 w=49
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=50
```
The prints interleave as before, but each thread's copy of `w` starts uninitialized, regardless of the value of the original `w` (it printed 0 here, which is not guaranteed). Each thread increments its own copy, and after the loop the original `w` receives the copy belonging to the thread that executed the sequentially last iteration (i=99), not the thread that happened to finish last in time. In a `sections` construct it is the lexically last `#pragma omp section`. Here T1 printed w=49 at i=99, and the value after the loop is w=50, one increment later.

## 1.4 Version with reduction(+:w)
```
T1:i50 w=0
T1:i51 w=1
T1:i52 w=2
T1:i53 w=3
T1:i54 w=4
T1:i55 w=5
T1:i56 w=6
T1:i57 w=7
T0:i0 w=0
T0:i1 w=1
T0:i2 w=2
T1:i58 w=8
T0:i3 w=3
T1:i59 w=9
T0:i4 w=4
T1:i60 w=10
T1:i61 w=11
T1:i62 w=12
T1:i63 w=13
T0:i5 w=5
T1:i64 w=14
T0:i6 w=6
T1:i65 w=15
T0:i7 w=7
T1:i66 w=16
T0:i8 w=8
T0:i9 w=9
T0:i10 w=10
T0:i11 w=11
T1:i67 w=17
T0:i12 w=12
T1:i68 w=18
T0:i13 w=13
T1:i69 w=19
T0:i14 w=14
T1:i70 w=20
T1:i71 w=21
T1:i72 w=22
T1:i73 w=23
T0:i15 w=15
T1:i74 w=24
T0:i16 w=16
T1:i75 w=25
T0:i17 w=17
T1:i76 w=26
T0:i18 w=18
T0:i19 w=19
T0:i20 w=20
T0:i21 w=21
T1:i77 w=27
T0:i22 w=22
T1:i78 w=28
T0:i23 w=23
T1:i79 w=29
T0:i24 w=24
T1:i80 w=30
T1:i81 w=31
T1:i82 w=32
T1:i83 w=33
T1:i84 w=34
T0:i25 w=25
T1:i85 w=35
T1:i86 w=36
T1:i87 w=37
T1:i88 w=38
T0:i26 w=26
T1:i89 w=39
T0:i27 w=27
T1:i90 w=40
T0:i28 w=28
T0:i29 w=29
T0:i30 w=30
T0:i31 w=31
T1:i91 w=41
T0:i32 w=32
T1:i92 w=42
T0:i33 w=33
T1:i93 w=43
T0:i34 w=34
T1:i94 w=44
T1:i95 w=45
T1:i96 w=46
T1:i97 w=47
T0:i35 w=35
T1:i98 w=48
T0:i36 w=36
T1:i99 w=49
T0:i37 w=37
T0:i38 w=38
T0:i39 w=39
T0:i40 w=40
T0:i41 w=41
T0:i42 w=42
T0:i43 w=43
T0:i44 w=44
T0:i45 w=45
T0:i46 w=46
T0:i47 w=47
T0:i48 w=48
T0:i49 w=49
w=110
```
The prints interleave as before, and each thread's private copy of `w` starts at 0, the identity value for `+`. The private copies are incremented independently, and at the end their final values are added to the original `w`: 10 + 50 + 50 = 110. This matches the original version, but here it is not luck: 100 increments are always added to 10, however the iterations are split between threads. It is the unsynchronized original version whose 110 is not guaranteed.

___
## Ex. 2
Original version:
```
Dot is 1.2020569031095949
```
### a+b)
Run \#1 (--cpus-per-task=2):
```
Dot is 1.2020569029595982
```
Run \#n (--cpus-per-task=2):
```
Dot is 0.0000000001499965
```
With the same number of CPUs per task (cores), the result still varies from run to run, because the threads update the shared `dot` without synchronization and the outcome depends on how their updates interleave. The two values above add up to the serial result (1.2020569029595982 + 0.0000000001499965 = 1.2020569031095947, equal to the original up to rounding), which suggests that each thread accumulated its own partial sum and whichever thread stored last overwrote the other.

Run \#2 (--cpus-per-task=4):
```
Dot is 0.0000000005999720
```
Run \#3 (--cpus-per-task=6):
```
Dot is 0.0000000013500135
```
Run \#4 (--cpus-per-task=8):
```
Dot is 0.0000000000719980
```
The result also varies with the number of CPUs per task (cores): with more threads, the iterations of `dot += a[i]*b[i];` are split into more and smaller chunks, so each thread accumulates a different range of `i`. Since the threads still update the shared `dot` without synchronization, updates are lost, and the value that survives depends on which thread writes last.
### c)
We can change the code to:

And the results will always be:

### d)
We can use _reduction_ to do that (and it's more efficient than 'critical', because each thread accumulates into a private copy and the copies are combined only once at the end, instead of locking on every iteration).