Saturday 21 January 2017

KaZaA

Hi,
I was reading the paper that gives us insight into how KaZaA works, and here are a few points that got me thinking:

1) "KaZaA hashes every file to a hash signature, which becomes the ContentHash of the file"

I understand that the ContentHash uniquely identifies a file and its copies and also helps detect bogus files. But the paper does not say much about how it is constructed.

This is from another paper I found online, "Malware Prevalence in the KaZaA File-Sharing Network":
"The KaZaA content hash is 20 bytes in size: the first 16 bytes are the MD5 [15] of the first 300 Kbyte of the file. The last 4 bytes are the value of the custom made hash function of the length of the file"
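To make the quoted layout concrete, here is a minimal sketch of a 20-byte value built the same way: 16 bytes of MD5 over the first 300 Kbyte, followed by 4 bytes derived from the file length. Two things are assumptions on my part: that "300 Kbyte" means 300 * 1024 bytes, and the length function itself. The quote only calls it "custom made", so `placeholder_length_hash` is a hypothetical stand-in, not KaZaA's real function.

```python
import hashlib
import struct

PREFIX_BYTES = 300 * 1024  # assumption: "300 Kbyte" means 300 * 1024 bytes


def placeholder_length_hash(length: int) -> bytes:
    """Hypothetical stand-in for KaZaA's custom length hash (not the real one).

    It simply packs the file length modulo 2**32 into 4 little-endian bytes.
    """
    return struct.pack("<I", length & 0xFFFFFFFF)


def content_hash_sketch(data: bytes) -> bytes:
    """Return 20 bytes: MD5 of the first PREFIX_BYTES, then a 4-byte length tag."""
    prefix_md5 = hashlib.md5(data[:PREFIX_BYTES]).digest()  # 16 bytes
    return prefix_md5 + placeholder_length_hash(len(data))
```

If the layout really is as quoted, two files with the same length and the same first 300 Kbyte would get the same hash even when the rest of their content differs, since nothing past the prefix is read.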

2) "When a SN receives a query, it may forward the query to one or more of the SNs to which it is connected." followed by 
"Our measurement work has determined that SNs often change their SN-to-SN connections on a time scales of tens of minutes" 
-- Are they implying that an SN cache is maintained with the metadata from previously visited nodes? They state the contrary on Page 2 of the paper: "Our investigations have determined that SNs do not cache metadata when ONs disconnect from them."

3) DBB Files: Does the 'active monitor' reside in the SN and keep polling for changes to its copy of the DBB? Is this obvious, or is there something I'm missing?

4) "We have determined that, as part of the signalling traffic, KaZaA nodes frequently exchange with each other lists of supernodes"

The paper claims that this is to ensure locality-aware ON-SN and SN-SN connections. But I don't understand why KaZaA would need its ONs to have this list because, unlike Kazaa Lite, they are not skipping between SNs.

5) With regards to their own tests, they state:
"When one of the workstations was promoted to a SN, we manipulated the Windows Registries in the other two ONs so that each of the two registries listed only the promoted SN"
but later in the paper they give
"First, we have observed that initially at startup, an ON probes candidate SNs listed in its Supernode List Cache with UDP packets for possible connections"
as the reason for observing many ON-SN connections within a stipulated time.

If the first point is true, why would the second happen at all? I'm sorry if I am missing some information here.

6) Definition of Round Trip Time

7)  "It can be observed from the plot that 13% of the ON peers are responsible for over 80% of the meta-data uploaded. It is interesting to compare this data with results reported in [6], wherein on University of Washington campus, 8.6% of KaZaA peers were serving 80% of the requests."
I don't see the importance of measuring how much meta-data was uploaded. Doesn't this depend heavily on the peers themselves and how much shareable data they hold?
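For reference, a figure like "13% of the peers account for over 80% of the uploads" can be derived from per-peer totals roughly as sketched below. This is only my guess at the arithmetic, not the authors' method, and the function name and inputs are hypothetical.

```python
def fraction_of_peers_for_share(uploads, share=0.8):
    """Smallest fraction of peers whose uploads together reach `share` of the total.

    `uploads` is a list of per-peer upload amounts (hypothetical input).
    """
    total = sum(uploads)
    if total <= 0:
        raise ValueError("uploads must sum to a positive value")
    running = 0
    # Count the heaviest uploaders first until the target share is reached.
    for count, amount in enumerate(sorted(uploads, reverse=True), start=1):
        running += amount
        if running >= share * total:
            return count / len(uploads)
    return 1.0
```

A small result means a few peers dominate. When every peer uploads the same amount, the result is roughly equal to `share` itself, so the metric does say something about how skewed sharing is across peers.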

I can't seem to come to terms with a few of the conclusions in this paper:
They themselves state that they tested from only one location. Isn't that a rather small, local test bed?
They forced the test bed to stick to a particular SN; I am not sure whether their own KaZaA clients did this as well. If they did, how can they claim to use the 200-node list that the SN passes to the ON?
I also found results extrapolated to larger-scale networks in the paper, but I'm not sure extrapolation is the best way to proceed.



Your thoughts about these are welcome!
Best Regards,
Ayushi

1 comment:

  1. It is fine to present limited results as long as the paper does not over-claim their significance.

    If you have fine-grain comments about experiments or how things really work, then you can always contact the authors!
