1 2010-10-24 00:05:20 <Keefe> ArtForz: how packed are your ALUs? looking at the ISA for my kernel, it looks like mine are 91% packed
2 2010-10-24 00:05:36 <Diablo-D3> thats not too unoptimal
3 2010-10-24 00:05:42 <Diablo-D3> the real fucking problem is register rape
4 2010-10-24 00:05:54 <Keefe> i'm not using vectors
5 2010-10-24 00:06:09 <Keefe> dunno how to determine my register usage
6 2010-10-24 00:06:34 <ArtForz> I think I'm ~93%
7 2010-10-24 00:06:39 <Diablo-D3> well, the problem with mining, from what I can tell, is you run out of registers
8 2010-10-24 00:06:45 <Keefe> the .isa file says "GprPoolSize = 0" near the end, but obviously that's not correct
9 2010-10-24 00:07:12 <Keefe> or it doesn't mean what i thought
10 2010-10-24 00:07:15 <ArtForz> mainly seems to be the T unit having nothing useful to do
11 2010-10-24 00:07:40 <Diablo-D3> texture unit?
12 2010-10-24 00:07:51 <ArtForz> nope, the 5th ALU
13 2010-10-24 00:07:53 <Keefe> doesn't the compiler leave the T unit to last when trying to fill ALU's?
14 2010-10-24 00:08:05 <ArtForz> it pretty much has to
15 2010-10-24 00:08:15 <Diablo-D3> is it a special use ALU?
16 2010-10-24 00:08:18 <ArtForz> yep
17 2010-10-24 00:08:26 <ArtForz> Transcendental unit
18 2010-10-24 00:08:27 <Diablo-D3> let me guess, very limited functionality?
19 2010-10-24 00:08:41 <ArtForz> well, not really
20 2010-10-24 00:08:50 <ArtForz> it doesn't have floating point add/mul
21 2010-10-24 00:08:59 <Diablo-D3> http://www.beyond3d.com/content/reviews/53/8
22 2010-10-24 00:09:32 <ArtForz> still has basic int ops, but mainly is for sin/cos/...
23 2010-10-24 00:09:50 <Keefe> it can do a lot of the same int ops the others can do
24 2010-10-24 00:09:55 <ArtForz> yep
25 2010-10-24 00:10:01 <Diablo-D3> Moving to the artist formerly known as the RysUnit, currently known as the transcendental ALU after it threatened to sue, it remains a rather special chap, being higher precision than the other units (FP40 versus FP32). It can handle transcendentals, just like before, each being single cycle at least according to our measurements. It can also do a single 32-bit INT MUL per cycle, by virtue of it's more accommodating man
26 2010-10-24 00:10:41 <ArtForz> so we have 4 simple + 1 complex 32-bit ALUs
27 2010-10-24 00:10:41 <Diablo-D3> also, this entire article is shit
28 2010-10-24 00:10:46 <Diablo-D3> it was written by someone who doesnt get this shit
29 2010-10-24 00:11:03 <ArtForz> small tiny problem, the external data paths are... 128 bit
30 2010-10-24 00:11:33 <Diablo-D3> I wanna see an updated opencl opt guide from AMD for 6xxx
31 2010-10-24 00:11:47 <Diablo-D3> because the 69xx look potentially interesting
32 2010-10-24 00:12:10 <Keefe> do the TEX sections of the isa code run in parallel with the ALU sections?
33 2010-10-24 00:12:22 <ArtForz> well, kinda but not really
34 2010-10-24 00:12:38 <Diablo-D3> I wish I could read m0's kernel
35 2010-10-24 00:12:43 <Diablo-D3> but its insanely packed
36 2010-10-24 00:12:51 <ArtForz> remember there's several "threads" executing in a SMT-like fashion
37 2010-10-24 00:13:31 <Diablo-D3> SMT doesnt even entirely define it
38 2010-10-24 00:13:48 <Diablo-D3> its a multi-stage pipeline that can be stuffed due to the insane design of the ALUs
39 2010-10-24 00:13:54 <ArtForz> well, it's kinda SMTs ugly cousin :P
40 2010-10-24 00:14:15 <Diablo-D3> its 5 round 4 item SIMD
41 2010-10-24 00:14:24 <Diablo-D3> and it checks in on it every 20 items
42 2010-10-24 00:15:00 <ArtForz> and then we have independent ALUs and load/store units :P
43 2010-10-24 00:15:15 <Diablo-D3> yes
44 2010-10-24 00:15:25 <Diablo-D3> the load/store units are batshit
45 2010-10-24 00:15:29 <Diablo-D3> but dont get me wrong
46 2010-10-24 00:15:33 <Diablo-D3> its the only way to design this
47 2010-10-24 00:15:39 <ArtForz> = a modern GPU is a pretty complicated beast
48 2010-10-24 00:15:50 <Diablo-D3> nvidia has all this fucking design overhead because they keep making the pipeline longer
49 2010-10-24 00:16:08 <Diablo-D3> instead of allowing multi-issue pipeline shit like AMD did
50 2010-10-24 00:16:12 <ArtForz> well, it feels kinda similar to old parallel supercomputers
51 2010-10-24 00:16:33 <Diablo-D3> ArtForz: well
52 2010-10-24 00:16:41 <Diablo-D3> nvidia, from what I can tell, it feels like normal SIMD
53 2010-10-24 00:16:46 <Diablo-D3> and not a very optimum one at that
54 2010-10-24 00:17:17 <Diablo-D3> amd is VLIW on top of SIMD
55 2010-10-24 00:17:23 <Diablo-D3> heavy on the VLIW side
56 2010-10-24 00:17:26 <ArtForz> while ATI went with crazy paradigm mix
57 2010-10-24 00:17:56 <Diablo-D3> well, VLIW with huge ass register banks that any ALU can access works great
58 2010-10-24 00:18:03 <ArtForz> yep
59 2010-10-24 00:18:19 <Diablo-D3> well optimized code just babysits ALU in/out
60 2010-10-24 00:19:05 <ArtForz> yep
61 2010-10-24 00:19:21 <Diablo-D3> it just means you have these insanely huge ALUs
62 2010-10-24 00:19:46 <Diablo-D3> but, on the flip side, you can send stuff into the pipeline while stuff is still waiting to come out
63 2010-10-24 00:20:03 <Diablo-D3> because it has coupled stages instead of one giant monolithic pipe
64 2010-10-24 00:20:33 <Diablo-D3> and I bet it can exit stages early for various reasons
65 2010-10-24 00:20:52 <ArtForz> what feels weird, the whole thing appears as 2-group register latency in ASM
66 2010-10-24 00:21:23 <ArtForz> = write to a reg in VLIW group 1, you can read it again in group 3
67 2010-10-24 00:22:02 <ArtForz> of course if you don't use indexed regs, you can just use the prev vector/scalar path and also use the output in group 2
68 2010-10-24 00:22:37 <Diablo-D3> the problem is writing the compiler
69 2010-10-24 00:22:44 <ArtForz> yep
70 2010-10-24 00:22:47 <Diablo-D3> a compiler that can actually count timing is difficult as fuck
71 2010-10-24 00:22:51 <Diablo-D3> ask the gcc guys
72 2010-10-24 00:22:58 <ArtForz> yep
73 2010-10-24 00:23:24 <Kiba> why you say yep all the time
74 2010-10-24 00:23:34 <Kiba> being a yesman for Diablo-D3
75 2010-10-24 00:23:40 <Diablo-D3> heh
76 2010-10-24 00:23:45 <ArtForz> well, because thats how it is
77 2010-10-24 00:23:57 <Diablo-D3> ArtForz: I really should look into doing opencl
78 2010-10-24 00:24:00 <Diablo-D3> it cant be THAT hard
79 2010-10-24 00:24:18 <Diablo-D3> I know glsl, I know how to code massively parallel code
80 2010-10-24 00:24:18 <Keefe> my kernel's isa has: 1103 ADD, 364 AND, 1076 BIT_ALIGN, 175 LSHR, 241 OR, 1074 XOR
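A rough illustration of why BIT_ALIGN shows up almost as often as XOR in that list: bitcoin hashing is double SHA-256, which is full of 32-bit rotates, and a rotate can be expressed as one bit-align of a value with itself. The sketch below models the op in plain Python. The `bitalign` semantics are an assumption modeled on the `amd_bitalign` built-in as I understand it, and the helper names are hypothetical, not taken from anyone's kernel.

```python
MASK32 = 0xFFFFFFFF

def bitalign(hi, lo, shift):
    """Assumed BIT_ALIGN model: pick 32 bits out of the 64-bit value hi:lo,
    starting `shift` bits above the bottom (shift is taken modulo 32)."""
    return (((hi << 32) | lo) >> (shift & 31)) & MASK32

def rotr(x, n):
    """32-bit rotate right, expressed as a single bit-align of x with itself."""
    return bitalign(x, x, n)

def big_sigma0(a):
    """SHA-256 Sigma0 function: three rotates and two XORs."""
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)

if __name__ == "__main__":
    print(hex(rotr(0x12345678, 8)))
```

Each SHA-256 round uses several such rotates, so a kernel that maps every rotate to one bit-align op ends up with roughly as many of them as XORs.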
81 2010-10-24 00:24:28 <ArtForz> if I find the time I wanna move away from OpenCL
82 2010-10-24 00:24:37 <Diablo-D3> what, into straight IL?
83 2010-10-24 00:24:42 <ArtForz> yep
84 2010-10-24 00:24:54 <ArtForz> CAL + IL kernel
85 2010-10-24 00:25:05 <Diablo-D3> yeah, thats too much like coding in assembly for me
86 2010-10-24 00:25:05 <Keefe> was going to ask if you already had
87 2010-10-24 00:25:40 <Keefe> only about 4K ops, not too crazy to attempt :)
88 2010-10-24 00:25:50 <ArtForz> yep
89 2010-10-24 00:26:07 <Diablo-D3> Keefe: well the big thing is
90 2010-10-24 00:26:12 <Diablo-D3> this code SHOULD run quickly
91 2010-10-24 00:26:18 <Diablo-D3> its not especially complex code
92 2010-10-24 00:26:22 <ArtForz> and 90% of that is just the same thing 122 times
93 2010-10-24 00:26:24 <Diablo-D3> its just vastly repetitive
94 2010-10-24 00:26:28 <Diablo-D3> yeah
95 2010-10-24 00:26:45 <Diablo-D3> I should bang out m0
96 2010-10-24 00:26:46 <Diablo-D3> ser
97 2010-10-24 00:26:50 <Diablo-D3> I should bang out m0's code on java
98 2010-10-24 00:27:00 <Diablo-D3> just to try out that AMD thing
99 2010-10-24 00:27:05 <Diablo-D3> java bytecode -> opencl
100 2010-10-24 00:27:28 <ArtForz> I kinda like what the CALPP guys did
101 2010-10-24 00:27:57 <ArtForz> C++ -> IL using mainly templating...
102 2010-10-24 00:27:59 <Diablo-D3> not particularly interested in c++
103 2010-10-24 00:29:00 <ArtForz> at least it appears that way
104 2010-10-24 00:29:39 <ArtForz> as in, it's using the C++ compiler to produce IL, then hopes the CAL IL -> ASM compiler is smart enough to optimize the result
105 2010-10-24 00:30:19 <Diablo-D3> well
106 2010-10-24 00:30:24 <Diablo-D3> Im interested in how AMD's shit works
107 2010-10-24 00:30:27 <Diablo-D3> because it doesnt work on nvidia
108 2010-10-24 00:30:34 <Diablo-D3> Im wondering if its outputting IL or ASM directly
109 2010-10-24 00:31:28 <Diablo-D3> ArtForz: btw, I think I need a list of test data
110 2010-10-24 00:36:01 <Keefe> ArtForz: how many alu ops in your kernel isa? my total is 4036 (including a few odd ones, not including tex ops)
111 2010-10-24 00:36:39 <Keefe> i'm thinking you must have squeezed it down to fewer ops, to get 6% more mhps with only 2% better alu packing
112 2010-10-24 00:38:40 <Diablo-D3> I wonder if I can beat the 75m art thinks I should get
113 2010-10-24 00:39:59 <Keefe> i guess i should try m0's code sometime for comparison
114 2010-10-24 00:40:16 <Diablo-D3> because I don't think anybody's code is optimal enough
115 2010-10-24 00:40:34 <Diablo-D3> ArtForz: is there a test list of shit?
116 2010-10-24 00:41:45 <Keefe> test for what?
117 2010-10-24 00:41:56 <Diablo-D3> for mine attempts
118 2010-10-24 00:42:11 <Diablo-D3> if you have x input, the output should be y
119 2010-10-24 00:43:10 <ArtForz> 3931
120 2010-10-24 00:43:50 <Keefe> just modify the bitcoin code such that it does some cpu hashing of the same data at the same time
121 2010-10-24 00:44:01 <Keefe> as the gpu code
122 2010-10-24 00:44:11 <Diablo-D3> Keefe: but I dont use bitcoin for this
123 2010-10-24 00:44:32 <Diablo-D3> nor do I wanna touch that code with a ten foot pole
124 2010-10-24 00:44:50 <Keefe> guess i don't know what you're talking about
125 2010-10-24 00:44:53 <ArtForz> in 848 VLIW clauses
126 2010-10-24 00:45:03 <Diablo-D3> Keefe: the miner isnt part of bitcoin
127 2010-10-24 00:45:07 <ArtForz> so ~ 92.7% packing
128 2010-10-24 00:45:08 <Diablo-D3> it uses the getwork patch
129 2010-10-24 00:45:15 <Diablo-D3> and art's does some other weird shit
130 2010-10-24 00:45:35 <Keefe> mine is custom also
131 2010-10-24 00:46:02 <Keefe> so it's not hard for me to code it to run cpu at the same time for a subset as a test
132 2010-10-24 00:47:12 <Keefe> i have 887 alu and 12 tex clauses
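The packing figures quoted above can be reproduced with a little arithmetic. This is only a sketch: it assumes each counted "clause" is one VLIW group with 5 ALU slots (4 simple + 1 T unit), which is what makes the quoted percentages come out. The function name is hypothetical.

```python
SLOTS_PER_GROUP = 5  # 4 simple ALUs + 1 transcendental (T) unit, per the discussion

def packing(alu_ops, vliw_groups, slots=SLOTS_PER_GROUP):
    """Fraction of available ALU slots that hold an instruction."""
    return alu_ops / (vliw_groups * slots)

if __name__ == "__main__":
    artforz = packing(3931, 848)   # ops and groups quoted by ArtForz
    keefe = packing(4036, 887)     # ops and ALU groups quoted by Keefe
    print(f"ArtForz: {artforz:.1%}")
    print(f"Keefe:   {keefe:.1%}")
    # What bounds ALU time is the number of groups issued, not the packing ratio.
    print(f"Keefe issues {887 / 848 - 1:.1%} more groups")
```

Under that assumption the difference in group count (887 vs 848) is larger than the difference in packing alone, because the op counts differ as well.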
133 2010-10-24 00:47:18 <Diablo-D3> Keefe: yes but
134 2010-10-24 00:47:24 <Diablo-D3> the getwork patch only gets work
135 2010-10-24 00:47:33 <Diablo-D3> theres no way to ask the client for the right answer
136 2010-10-24 00:47:49 <Keefe> i see
137 2010-10-24 00:48:31 <ArtForz> Diablo-D3: blkXXXX.dat has ~ 80k test vectors ;)
138 2010-10-24 00:49:00 <Diablo-D3> ArtForz: yeah, but how do I read it?
139 2010-10-24 00:50:33 <ArtForz> you only need the block header, bitcointools has the needed parts
140 2010-10-24 00:58:42 <Diablo-D3> bitcointools?
141 2010-10-24 00:58:51 <ArtForz> http://github.com/gavinandresen/bitcointools
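For the "x input, expected y output" check, every block header is a ready-made test vector: the 80-byte header is hashed with double SHA-256 and the result is the block hash. A minimal sketch using only the Python standard library follows. It does not use bitcointools or read blkXXXX.dat; the function name is hypothetical, and the field values are the publicly known genesis block header.

```python
import hashlib
import struct

def block_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Double SHA-256 of an 80-byte block header, returned in display byte order."""
    header = (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]    # hashes are stored byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

# Genesis block header fields, used here as a single known-answer test vector.
GENESIS = dict(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

if __name__ == "__main__":
    print(block_hash(**GENESIS))
```

A miner can be checked the same way: feed it a header with the nonce field cleared and confirm it finds a nonce whose hash matches the known block hash.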
142 2010-10-24 01:05:49 <Diablo-D3> ugh
143 2010-10-24 01:05:54 <Diablo-D3> ArtForz: how do I get the shit out?
144 2010-10-24 01:08:01 <Keefe> i think i remember a link to download it in one tar
145 2010-10-24 01:08:11 <Diablo-D3> Keefe: not use git you fool
146 2010-10-24 01:08:11 <theymos> Click the "downloads" button.
147 2010-10-24 01:08:29 <Diablo-D3> Im talking about which python script does what I want
148 2010-10-24 01:09:49 <Keefe> you'll probably want to modify one to output just what you want, either that or lots of post processing
149 2010-10-24 01:10:05 <Keefe> isn't there a readme?
150 2010-10-24 01:10:12 <Diablo-D3> the readme is retarded
151 2010-10-24 01:11:44 <Keefe> read dbdump.py and figure out what it can do. that's what i did
152 2010-10-24 01:12:02 <Diablo-D3> fucking python
153 2010-10-24 01:12:09 <Diablo-D3> why the fuck are people using python to begin with
154 2010-10-24 01:12:19 <Keefe> not the worst language :)
155 2010-10-24 01:12:26 <Diablo-D3> its pretty up there
156 2010-10-24 01:12:30 <nameless|> Diablo-D3: Because it's better than ruby
157 2010-10-24 01:12:39 <Diablo-D3> nameless|: thats not a good excuse.
158 2010-10-24 01:12:40 <Keefe> or perl
159 2010-10-24 01:12:44 <Diablo-D3> Keefe: fuck you.
160 2010-10-24 01:12:49 <Diablo-D3> perl > python every day of the week
161 2010-10-24 01:13:24 <Keefe> ugh, i realize it's really popular. but last time i tried to understand perl i really didn't like it
162 2010-10-24 01:13:36 <ArtForz> if you write code, yes, if you have to maintain someone else's code... not so much
163 2010-10-24 01:13:37 <Keefe> i'd rather use c
164 2010-10-24 01:14:02 <Keefe> my native lang is vb.net
165 2010-10-24 01:14:04 <ArtForz> the designers made it pretty damn hard to write unreadable python
166 2010-10-24 01:14:13 <Diablo-D3> Keefe: no wonder you didnt get perl, you're braindamaged
167 2010-10-24 01:14:24 <ArtForz> my fav lang is C
168 2010-10-24 01:14:30 <Diablo-D3> ArtForz: they made it also pretty damn hard to write useful python
169 2010-10-24 01:14:41 <ArtForz> yep
170 2010-10-24 01:14:54 <nameless|> Diablo-D3: It's better than PHP?
171 2010-10-24 01:14:59 <nameless|> It's better than basic?
172 2010-10-24 01:15:07 <Keefe> i'll admit vb.net has made me lazy
173 2010-10-24 01:15:09 <nameless|> It's better than brainfuck?
174 2010-10-24 01:15:13 <ArtForz> it's also better than brainfuck
175 2010-10-24 01:15:19 <Diablo-D3> brainfuck is a different classification of language
176 2010-10-24 01:15:27 <Diablo-D3> python is for complete utter noobs
177 2010-10-24 01:15:34 <Diablo-D3> why would I want to use code by noobs
178 2010-10-24 01:15:34 <nameless|> Diablo-D3: it is a language and you can code it in brainfuck
179 2010-10-24 01:15:42 <ArtForz> might be on par with whitespace
180 2010-10-24 01:15:50 <Diablo-D3> ffff whitespace
181 2010-10-24 01:15:59 <Diablo-D3> speaking of whitespace
182 2010-10-24 01:16:04 <Diablo-D3> fuck you python
183 2010-10-24 01:16:18 <Diablo-D3> and fuck all of you fuckers who think its okay to use tab outside of java.
184 2010-10-24 01:16:23 <ArtForz> thats one area where the python designers fucked up royally
185 2010-10-24 01:16:35 <Diablo-D3> two fucking spaces.
186 2010-10-24 01:16:37 <Diablo-D3> not a tab.
187 2010-10-24 01:16:41 <Diablo-D3> not one or three or more spaces.
188 2010-10-24 01:16:42 <Diablo-D3> two.
189 2010-10-24 01:16:44 <Diablo-D3> TWO.
190 2010-10-24 01:16:45 <ArtForz> either it's spaces or tabs, don't f*ing allow both in a language with syntactic whitespace