1 2014-06-09 00:00:03 <phantomcircuit> iirc it's CURRENT but im not sure
2 2014-06-09 00:00:16 <phantomcircuit> wait no it's not that
3 2014-06-09 00:00:20 <sipa> i don't think so
4 2014-06-09 00:00:32 <phantomcircuit> fun
5 2014-06-09 00:00:33 <sipa> but we can always introduce extra keys to indicate a particular version
6 2014-06-09 00:00:56 <phantomcircuit> sipa, i was thinking about adding things like sequence numbers and proper checksums
7 2014-06-09 00:01:07 <phantomcircuit> but that changes the journal format and probably the sorted tables format
8 2014-06-09 00:01:18 <sipa> i don't want to change leveldb
9 2014-06-09 00:01:23 <phantomcircuit> so there would need to be an indicator of whether to use them or not
10 2014-06-09 00:01:31 <phantomcircuit> sipa, no?
11 2014-06-09 00:01:35 <sipa> no
12 2014-06-09 00:01:38 <btc123> ls
13 2014-06-09 00:01:46 <sipa> i thought you were talking about what we're storing inside leveldb
14 2014-06-09 00:01:47 <phantomcircuit> hehe @ btc123
15 2014-06-09 00:01:53 <phantomcircuit> sipa, oh no
16 2014-06-09 00:01:59 <sipa> i'm sure leveldb itself has version markers
17 2014-06-09 00:02:08 <phantomcircuit> im talking about fixing the durability and consistency issues with leveldb
18 2014-06-09 00:02:25 <phantomcircuit> they've been largely solved by fixes to handling under os x
19 2014-06-09 00:02:28 <sipa> we don't need durability
20 2014-06-09 00:02:48 <sipa> we do need consistency and integrity though
21 2014-06-09 00:02:51 <phantomcircuit> sipa, no but we at least need to be able to detect when entries have gone missing
22 2014-06-09 00:02:52 <phantomcircuit> :P
23 2014-06-09 00:02:59 <sipa> no
24 2014-06-09 00:03:14 <sipa> when the last changes are undone, you're just returning to a previous validation state
25 2014-06-09 00:03:22 <sipa> and will redo whatever validation was done since then
26 2014-06-09 00:03:33 <phantomcircuit> right im talking about entries already in a sorted table being screwed up
27 2014-06-09 00:03:37 <gmaxwell> wumpus: thanks for tracking down the leveldb binary incompatibility w/ arm.
28 2014-06-09 00:03:40 <phantomcircuit> not the journal
29 2014-06-09 00:04:39 <phantomcircuit> 23:53:33-23:46:04 = (53-46)*60 + (33-4) = 449 seconds
30 2014-06-09 00:04:57 <phantomcircuit> 2014-06-09 00:04:54 UpdateTip: new best=000000000000034a7dedef4a161fa058a2d67a173a90155f3a2fe6fc132e0ebf height=200000 log2_work=68.741562 tx=7316696 date=2012-09-22 10:45:59 progress=0.095681
31 2014-06-09 00:05:05 <phantomcircuit> 2014-06-09 00:04:54 UpdateTip: new best=000000000000034a7dedef4a161fa058a2d67a173a90155f3a2fe6fc132e0ebf height=200000 log2_work=68.741562 tx=7316696 date=2012-09-22 10:45:59 progress=0.095681
32 2014-06-09 00:05:17 <phantomcircuit> er
33 2014-06-09 00:05:21 <phantomcircuit> 2014-06-08 23:58:45 UpdateTip: new best=000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f height=0 log2_work=32.000022 tx=1 date=2009-01-03 18:15:05 progress=0.000000
34 2014-06-09 00:06:05 <phantomcircuit> 351 seconds
35 2014-06-09 00:06:30 <phantomcircuit> sipa, 21% reduction in reindex runtime to block 200k
36 2014-06-09 00:06:32 <phantomcircuit> neat
37 2014-06-09 00:06:41 <sipa> that's way more than i would have expected
38 2014-06-09 00:06:58 <goosoodude> Ok, so. 1 thing to remember about the entire project I'm about to work on: I'm 14, and need to study this a lot. It's 98% thinking and brainstorming, and 2% coding. I will get it done, though!
39 2014-06-09 00:07:04 <phantomcircuit> sipa, sha256 is ~20% of the cpu time
40 2014-06-09 00:07:13 <phantomcircuit> it's a single threaded process
41 2014-06-09 00:07:17 <sipa> i know
42 2014-06-09 00:07:31 <phantomcircuit> so i'd have expected less also
43 2014-06-09 00:07:38 <gmaxwell> phantomcircuit: how much of the remaining sha256 is merkle tree computation?
44 2014-06-09 00:07:41 <phantomcircuit> since it didn't eliminate every call to sha256
45 2014-06-09 00:07:47 <sipa> gmaxwell: half
46 2014-06-09 00:08:02 <sipa> gmaxwell: as every txhash is computed once, and the merkle tree is computed once :)
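
The "half" estimate can be checked by counting hash calls. The following is a minimal sketch, assuming the standard Bitcoin merkle construction (double SHA-256 over concatenated pairs, with the last entry duplicated when a level has an odd count). The names `dsha256` and `merkle_root` are illustrative and are not taken from the Bitcoin Core source.

```python
import hashlib
from typing import List, Tuple


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash used for txids and merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> Tuple[bytes, int]:
    """Return (merkle root, number of inner-node hashes computed)."""
    if not txids:
        raise ValueError("a block has at least one transaction")
    level = list(txids)
    inner_hashes = 0
    while len(level) > 1:
        if len(level) % 2:
            # Odd count: pair the last entry with itself.
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        inner_hashes += len(level)
    return level[0], inner_hashes


# Stand-in txids; real ones would be the double SHA-256 of each serialized transaction.
txids = [dsha256(bytes([i])) for i in range(133)]
root, inner = merkle_root(txids)
print(len(txids), "tx hashes,", inner, "merkle node hashes")
```

The tree needs about as many hash calls as there are transactions, which matches the rough "half" figure in call count. It says nothing about bytes hashed, since a transaction is usually larger than a 64-byte node pair.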
47 2014-06-09 00:08:15 <phantomcircuit> gmaxwell, http://pastebin.com/raw.php?i=W0vqmQ8V
48 2014-06-09 00:08:21 <phantomcircuit> that's with the patch
49 2014-06-09 00:08:26 <Luke-Jr> goosoodude: btw, please be sure to disclose that in any code you submit. we'll probably need your parents to sign a waiver for legal reasons :/
50 2014-06-09 00:08:35 <phantomcircuit> it's now almost entirely recalculating block headers
51 2014-06-09 00:08:46 <sipa> which are computed 10 times apparently
52 2014-06-09 00:08:47 <gmaxwell> (the reason I ask is because the trees could get a 4-way SIMD implementation which might be a ~2x speedup or so)
53 2014-06-09 00:08:49 <phantomcircuit> and checking hashes when reading from disk
54 2014-06-09 00:09:02 <gmaxwell> not worth doing before getting the redundancy out there.
55 2014-06-09 00:09:04 <gmaxwell> er though.
56 2014-06-09 00:09:20 <phantomcircuit> gmaxwell, i dont think there is any redundancy in the merkle tree calculations now
57 2014-06-09 00:09:39 <phantomcircuit> so that would be a nice improvement
58 2014-06-09 00:09:46 <sipa> phantomcircuit: i count 369 seconds btw
59 2014-06-09 00:09:59 <phantomcircuit> sipa, is my timestamp math wrong?
60 2014-06-09 00:10:03 <phantomcircuit> that's entirely possible lol
61 2014-06-09 00:10:27 <sipa> yes
62 2014-06-09 00:10:30 <sipa> vs 449s
63 2014-06-09 00:11:01 <phantomcircuit> 00:04:54-23:58:45 = (64-58) * 60 + (54-45) = 369
64 2014-06-09 00:11:04 <phantomcircuit> ah yeah you're right
65 2014-06-09 00:11:17 <sipa> still neat!
66 2014-06-09 00:11:35 <phantomcircuit> ~17% reduction
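
The corrected timing arithmetic can be reproduced with a short script. This is a sketch only: the patched run uses the two UpdateTip log timestamps pasted above, while the baseline run uses the 23:46:04 and 23:53:33 times quoted in the chat, with the date assumed to be 2014-06-08.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two log timestamps; handles the midnight rollover."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Baseline run without the patch (date is an assumption, times are from the chat).
baseline = elapsed_seconds("2014-06-08 23:46:04", "2014-06-08 23:53:33")
# Patched run: genesis UpdateTip to the height=200000 UpdateTip.
patched = elapsed_seconds("2014-06-08 23:58:45", "2014-06-09 00:04:54")

reduction = (baseline - patched) / baseline
print(f"{baseline}s -> {patched}s, {reduction:.1%} reduction")
```

Subtracting full timestamps avoids the hand-calculation slip that produced the earlier 351-second figure.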
67 2014-06-09 00:14:53 <sipa> there are on average 133 transactions in a block
68 2014-06-09 00:16:02 <sipa> if we can save 8 out of 10 block hash computations, that means saving 8 hashes per block, while we saved 4*133 from transactions per block
69 2014-06-09 00:16:33 <sipa> which means we would get 0.25% reindex speed gain from that
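
One way to read this estimate is as a simple proportion: if saving 4*133 transaction hashes per block gave roughly a 17% speedup, then saving 8 block hashes per block should give about 8/(4*133) of that. The sketch below works under that assumption, and also assumes every hash costs about the same, which the chat does not state.

```python
AVG_TX_PER_BLOCK = 133        # average quoted in the chat
TX_HASHES_SAVED_PER_TX = 4    # redundant tx hash computations removed by the patch
BLOCK_HASHES_SAVED = 8        # "8 out of 10" block hash computations
MEASURED_GAIN = 0.17          # ~17% reindex time reduction from the patch

tx_hashes_saved = TX_HASHES_SAVED_PER_TX * AVG_TX_PER_BLOCK
estimated_gain = MEASURED_GAIN * BLOCK_HASHES_SAVED / tx_hashes_saved

print(f"{tx_hashes_saved} tx hashes saved per block")
print(f"estimated gain from block hash caching: {estimated_gain:.2%}")
```

An 80-byte block header is cheaper to hash than a typical transaction, so under these assumptions the figure is closer to an upper bound than a prediction.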
70 2014-06-09 00:16:42 <phantomcircuit> heh malloc is looking more and more expensive
71 2014-06-09 00:17:08 <sipa> we waste dynamic memory all the time
72 2014-06-09 00:17:22 <gmaxwell> the profiling often underreports the true cost of the heap allocations too.
73 2014-06-09 00:17:36 <goosoodude> Ite.
74 2014-06-09 00:18:15 <phantomcircuit> it seems pretty common for profiling tools to report things that just seem completely nonsensical
75 2014-06-09 00:18:32 <gmaxwell> sipa: in the last 1000 blocks the median number of transactions is 314. ... yea, so this perhaps suggests that using the SIMD sha256 for it might not yet be a worthwhile exercise.
76 2014-06-09 00:19:12 <sipa> gmaxwell: my 0.25% number was for avoiding duplicate block hash computations, not merkle tree
77 2014-06-09 00:19:28 <sipa> i'm sure with merkle tree hash speed doubling you can gain more
78 2014-06-09 00:21:49 <phantomcircuit> sipa, it definitely seems like it would be worth doing the same thing for CBlock & CBlockHeader (if possible)
79 2014-06-09 00:22:13 <sipa> for 0.25% gain, imho no
80 2014-06-09 00:22:36 <sipa> but we may find some trivial cases to fix, by just passing an extra hash around
81 2014-06-09 00:23:09 <phantomcircuit> sipa, im guessing it's more than 0.25% though
82 2014-06-09 00:23:40 <sipa> is my math wrong?
83 2014-06-09 00:24:43 <phantomcircuit> hmm actually this call graph is only for the first 50k blocks
84 2014-06-09 00:25:06 <phantomcircuit> i should let this run through a complete reindex before looking at it
85 2014-06-09 00:25:09 <phantomcircuit> impatience :P
86 2014-06-09 00:26:49 <phantomcircuit> sipa, http://i.imgur.com/vDhdzH5.png
87 2014-06-09 00:26:58 <phantomcircuit> got a nice laugh at vprintf
88 2014-06-09 00:27:52 <gmaxwell> sipa: right, ... apparently the 4-way SIMD sha256 has 3.4x more throughput than the openssl scalar sha256, assuming perfect loading. so actually the speedup sounds like it would be pretty good for even as few as 300 transactions.
89 2014-06-09 00:28:46 <phantomcircuit> this is going to take a long long time to reindex under valgrind...
90 2014-06-09 00:29:45 <shesek> how much fees would you estimate is needed for a tx of 15kb?
91 2014-06-09 00:30:11 <phantomcircuit> actually the reindex could be pipelined and the consistency checks run in parallel...
92 2014-06-09 00:43:17 <phantomcircuit> 2014-06-09 00:42:48 UpdateTip: new best=000000000001083432dadda634904778fb72b15ec6ac92ff5e00345b82120a6c height=111570 log2_work=60.5591 tx=307097 date=2011-03-03 15:38:25 progress=0.004016
93 2014-06-09 00:43:24 <phantomcircuit> progress bar is depressingly accurate
94 2014-06-09 00:44:48 <goosoodude> So first off, Luke-Jr? By blockchain obfuscation, do you mean transaction anonymity, or do you mean the obfuscation that Ethereum is taking on?
95 2014-06-09 00:45:27 <Luke-Jr> goosoodude: no, just a cheap XOR of the blockchain data on disk
96 2014-06-09 00:45:34 <goosoodude> ok
97 2014-06-09 00:45:40 <Luke-Jr> goosoodude: so braindead software doesn't mistake it as something else
98 2014-06-09 00:45:49 <goosoodude> Ah, ok. I understand it now.
99 2014-06-09 00:46:11 <goosoodude> So, mainly, keep norton and other awful anti-viruses from picking it up.
100 2014-06-09 00:46:12 <Luke-Jr> eg, right now someone can embed a virus signature in the blockchain to make antivirus delete your blockchain files
101 2014-06-09 00:46:15 <Luke-Jr> yeah
102 2014-06-09 00:46:21 <goosoodude> ok
103 2014-06-09 00:47:15 <btc123> if you just XOR it, then they'll put the XOR'd signature in and it will invert and be detected again ;p
104 2014-06-09 00:48:28 <btc123> but yes, probably need some cheap encryption/obfuscation
105 2014-06-09 00:48:33 <gmaxwell> btc123: the 'xor' would be either per host or per txid.
106 2014-06-09 00:49:09 <btc123> gmaxwell: oh good point. lol
107 2014-06-09 00:49:20 <sipa> per txid seems very hard for the blockchain file
108 2014-06-09 00:49:57 <gmaxwell> (oh I missed that the context was the blockchain file, yea, that would be per host or per block hash)
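
The per-host variant can be sketched as a repeating-key XOR with a key generated once per installation. This is an illustration under stated assumptions, not how any released client does it: the function names, the 8-byte key length, and the `offset` parameter are all hypothetical.

```python
import os


def new_host_key(length: int = 8) -> bytes:
    """Generate a random per-host key (hypothetical: created once and stored locally)."""
    return os.urandom(length)


def xor_obfuscate(data: bytes, key: bytes, offset: int = 0) -> bytes:
    """XOR data with a repeating key.

    offset is the position of data within the file, so that a chunk read
    from the middle of the file lines up with the same key bytes.
    XOR is its own inverse, so the same call also de-obfuscates.
    """
    if not key:
        raise ValueError("key must not be empty")
    n = len(key)
    return bytes(b ^ key[(offset + i) % n] for i, b in enumerate(data))


host_key = new_host_key()
raw_block = b"serialized block bytes, possibly containing a planted signature"
on_disk = xor_obfuscate(raw_block, host_key)

# Reading back applies the same operation with the same key.
assert xor_obfuscate(on_disk, host_key) == raw_block
```

Because the key differs per host and is unknown to whoever crafts the transaction, they cannot pre-XOR a signature so that it reappears on disk, which is the objection raised against a fixed XOR.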
109 2014-06-09 00:52:00 <goosoodude> Would it be appropriate to figure out WHY the antiviruses are detecting it, and move on from there? Or, I mean, should I just tackle it from what I know?
110 2014-06-09 00:55:08 <gmaxwell> We know why. Because people intentionally put virus signatures in txouts.
111 2014-06-09 00:55:48 <uiop> heh
112 2014-06-09 00:55:55 <goosoodude> ok.
113 2014-06-09 00:56:15 <btc123> hah
114 2014-06-09 00:56:41 <phantomcircuit> block 135k is only 1.2% progress
115 2014-06-09 00:56:43 <phantomcircuit> lol
116 2014-06-09 00:56:44 <btc123> goosoodude: has a virus scanner picked it up from you?
117 2014-06-09 00:56:48 <phantomcircuit> this is going to take hours
118 2014-06-09 00:57:11 <uiop> i wonder how big the smallest virus signatures (that are current, whatever) are
119 2014-06-09 00:57:19 <goosoodude> no. Need to download Norton. Obviously, starting with the king of false positives.
120 2014-06-09 00:57:53 <buZz> startkeylogger ? :)
121 2014-06-09 00:58:12 <btc123> anyway, gmaxwell /sipa have a good solution, just xor the block data with the hash. problem solved
122 2014-06-09 00:58:40 <shesek> Luke-Jr, it seems like eligius's pushtx is choking on large transactions
123 2014-06-09 00:58:50 <phantomcircuit> if the progress meter is correct it's going to take me about 2 full days to completely reindex under valgrind
124 2014-06-09 00:58:54 <phantomcircuit> sigh
125 2014-06-09 00:59:07 <phantomcircuit> guess i should move this to a server
126 2014-06-09 01:00:17 <goosoodude> Who's to say it won't pick up on the xor?
127 2014-06-09 01:00:45 <phantomcircuit> goosoodude, common sense and math
128 2014-06-09 01:01:29 <uiop> oh, a virus sig is just a grep or something..
129 2014-06-09 01:02:14 <phantomcircuit> sigh
130 2014-06-09 01:02:28 <btc123> goosoodude: sounds like you have some learning to do,
131 2014-06-09 01:02:29 <phantomcircuit> with xor the data would be completely random
132 2014-06-09 01:02:36 <phantomcircuit> the only signature there is entropy
133 2014-06-09 01:02:53 <phantomcircuit> in which case the av would delete random data files also
134 2014-06-09 01:03:14 <goosoodude> I do. I got into bitcoin because of a learning experience, and I'm going to continue that :P