May 4, 2012
Another sysadmin stunt.
I've got an old machine at home. It hadn't been turned on for quite some time, and since it's running Arch I was quite afraid that doing just a regular update would surely break something.
And since all my graphics cards are broken and I don't have any useful keyboard at home anymore... Well, desperate times call for desperate measures.
So, what do I have?
- 16 GB USB stick
- Ethernet cable
- Laptop
- SSH server
So, let's start. We've got a .tar image of the system ready; it just needs to be copied over the current installation while the system is running. In other words, the system must overwrite itself while it's running. Sounds impossible? On Windows it might be, but here? Read on ;-)
I copied the root image to a USB stick. Plugged the stick in, and ... forgot about it. Seriously.
Then I did rm -rf /home/*, bind-mounted root on /mnt, copied all of its contents to /home/, bind-mounted some filesystems over, and copied /dev over. Something like this:
::vbox ~# rm -rf /home/*
::vbox ~# mount --bind / /mnt
::vbox ~# cp -a /mnt/* /home/
::vbox ~# cp -a /dev /home/dev/
::vbox ~# mount --bind /proc /home/proc/
::vbox ~# mount --bind /sys /home/sys/
::vbox ~# mount -t devpts foo /home/dev/pts
Yes, I've created an alternative root. Now we just have to switch to it. Over the internet. Easy :-)
I ran htop and killed everything except ssh. Then a little magic comes into play. I edited /home/etc/ssh/sshd_config to listen on an alternative port. Why? You'll see.
And now the simple magic. Behold! The same trick Linux uses during boot!
::vbox ~# cd /home/
::vbox /home/# pivot_root . mnt/
What have we done? We've instructed the kernel to change the root for all processes. Not something like chroot, but actually change it and put the old root into mnt/.
Nice. Now we start up our ssh daemon so it listens on the other port. Simple:
::vbox /home/# /etc/rc.d/sshd start
Disconnect, reconnect on the new port and kill off the old daemon. Our old root is almost ready to be unmounted; we just have to instruct init (PID 1) to switch to the new root. Again, simple:
::vbox /home/# telinit u
Great, so the entire system now runs on another root. For instance, if the old root was on /dev/sda1, we've migrated the entire system to run on, let's say, /dev/sdb2 while it was running. Pretty neat, eh? :)
Now we just unmount all filesystems on the old root, unmount the root itself, reformat it and install our system image. Routine stuff. At least for me.
April 16, 2012
Getting a stack trace on crash without going into gdb.
Again, another note to myself, so here it goes.
When developing something, sooner or later you're going to get a SIGSEGV, a.k.a. a segmentation fault. If you're like me and have a terminal integrated into Kate so you can do bash-fu magic while coding... Well...
First, you have to make sure that the OS will dump core when the application crashes. Easy.
ulimit -c unlimited
When an application receives SIGSEGV and dies from it (the default), the shell reports exit status 139, i.e. 128 plus signal number 11. Nice.
A little bit of bash-fu results in this:
<build command(s)> && <execute file> ; if [ $? -eq 139 ]; then gdb -q -n -ex bt -batch <output file> core ; mv core core.bak ; fi;
Or in practice:
n00b ~/Temp >> CFLAGS="-ggdb" make && ./fucku ; if [ $? -eq 139 ]; then gdb -q -n -ex bt -batch fucku core ; mv core core.bak ; fi;
[ ... SNIP ... ]
Segmentation fault (core dumped)
[New LWP 4666]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `./fucku'.
Program terminated with signal 11, Segmentation fault.
[ ... SNIP ... ]
#3 0x08048b19 in InsertKey (kt=0x9ef300c, cktp=0xbf84dd50, chr=97 L'a', kl=5) at fucku.cpp:85
#4 0x08048c34 in encode (inp ... cde") at fucku.cpp:101
#5 0x08048cd6 in main () at fucku.cpp:116
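In case the magic number 139 looks arbitrary: shells such as bash report 128 plus the signal number for a child that was killed by a signal, and SIGSEGV is signal 11. A tiny Python check of that arithmetic (the helper name is made up for this note, it is not part of the one-liner above):

```python
import signal


def shell_status_for_signal(signum: int) -> int:
    """Exit status a POSIX shell such as bash reports for a child killed by signum."""
    return 128 + signum


# SIGSEGV is signal 11, so this is the value the one-liner tests $? against.
print(shell_status_for_signal(signal.SIGSEGV))  # 139
```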
I'm loving it!
April 6, 2012
Setting up IP over DNS (NSTX) tunneling with Iodine
Iodine is an excellent implementation of the NSTX concept. NSTX is more commonly known as IP over public DNS. Yes, you've read that correctly. NSTX tunnels IP packets over the public DNS infrastructure.
Why would anyone want to do that?
Public wifi hotspots saying: "You have to pay to use this service" or "You can only use it for 30 minutes". Bleh. They usually have all traffic firewalled, except DNS, which is left intact.
So by tunneling IP over DNS you can have unrestricted access to the internet even in these networks.
How does it work?
It's simple. You need a domain name and a spare IP address to use as a DNS server. Or mess around with BIND a bit.
The concept is extremely simple. For instance, you want to send an IP packet to the internet, but the only service available on your local network is a caching DNS server.
Your client encodes the IP packet as base32 and sends it as a DNS request. For instance, if the Iodine nameserver is responsible for iodine.homenet.org, then to transmit an IP packet we'll send a query for:
mzshgzttmrthez3smvtwizt ... m43eozshgzttmrthgz.iodine.homenet.org
Our "fake" DNS server will decode the request, reconstruct the IP packet, and send it off to the internet. In response we might get a TXT record with downstream traffic.
Or put in a TL;DR way:
- I want to send data to IP 1.2.3.4
- Encode packet as base32
- Send DNS request as: base32encodedpacket.something.domain.tld. to local DNS server
- The DNS server responsible for something.domain.tld. (our fake server) eventually gets the request
- Decodes base32, reconstructing the packet
- Sends it off to the internet.
- In response to the received DNS query, it sends a base64-encoded TXT record with encoded IP packets destined for the client.
- I receive the DNS response
- Decode TXT field, reconstructing packets
- ????
- INTERNET!
There, clear?
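To make the encoding step a bit more tangible, here is a rough Python sketch of the idea. It is only an illustration: the function names are made up, and the real iodine wire format is different (it adds its own headers and can switch codecs), so treat this as the concept rather than the protocol. The one hard rule it does respect is that a single DNS label may be at most 63 characters long.

```python
import base64

MAX_LABEL = 63  # DNS limit on the length of a single label


def encode_query_name(payload: bytes, zone: str) -> str:
    """Pack payload into base32 labels in front of the tunnel zone (illustrative only)."""
    encoded = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    return ".".join(labels + [zone])


def decode_query_name(name: str, zone: str) -> bytes:
    """Reverse of encode_query_name: strip the zone, rejoin labels, decode base32."""
    data = name[: -len(zone) - 1].replace(".", "").upper()
    data += "=" * (-len(data) % 8)  # restore the padding that was stripped
    return base64.b32decode(data)


packet = b"pretend this is a raw IP packet" * 2
name = encode_query_name(packet, "iodine.homenet.org")
print(name)
print(decode_query_name(name, "iodine.homenet.org") == packet)  # True
```

A real client also has to keep the whole name under the overall DNS name length limit, which is why packets get fragmented across several queries.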
Now, setting up the iodine server (fake DNS).
Requirements:
- Spare publicly reachable IP address (can be behind a NAT, but make sure UDP port 53 is reachable from the internet).
- Linux box
- root
- Installed iodine server (I hope you know how to compile or install packages on your system)
You don't need your own domain name. Just head off to freedns.afraid.org, register and pick something.
Have you picked your subdomain of choice? Let's say I have decided that it will be iodine.homenet.org (which it actually is). We delegate iodine(.homenet.org) as a subzone to our fake DNS server by creating two records: A and NS.
iodinens.ignorelist.com A <your DNS server IP address>
iodine.ignorelist.com NS iodinens.ignorelist.com
This is just an example; your NS record can point to any name that eventually resolves to an IP address. For instance, my NS record actually points to knuples.net (as seen from dnstracer):
n00b ~ >> dnstracer -s . -4 z123.iodine.homenet.org
Tracing to z123.iodine.homenet.org[a] via A.ROOT-SERVERS.NET, maximum of 3 retries
A.ROOT-SERVERS.NET [.] (198.41.0.4)
|\___ d0.org.afilias-nst.org [org] (199.19.57.1)
| |\___ ns2.afraid.org [homenet.org] (174.37.196.55)
| | |\___ knuples.net [iodine.homenet.org] (2001:0470:006c:00ea:0000:0000:0000:0002) Not queried
| | \___ knuples.net [iodine.homenet.org] (93.103.205.91) Got authoritative answer [received type is cname]
| |\___ ns4.afraid.org [homenet.org] (174.128.246.102)
| | |\___ knuples.net [iodine.homenet.org] (2001:0470:006c:00ea:0000:0000:0000:0002) Not queried
| | \___ knuples.net [iodine.homenet.org] (93.103.205.91) (cached)
Now comes the testing. Let's see if it works. On our Linux machine, we start up iodined as:
iodined -P <password> -f <some private IP address> <your subdomain>
Oh, the private IP address ranges are (pick an unused one):
- 10.0.0.0 - 10.255.255.255
- 172.16.0.0 - 172.31.255.255
- 192.168.0.0 - 192.168.255.255
Or, done my way:
# iodined -P redacted -f 172.16.13.37 iodine.homenet.org
You can check it at this address: http://code.kryo.se/iodine/check-it/
If it works, you're OK. Now for one more small thing.
Setting up routing & IP forwarding. It's actually quite simple, like setting up NAT on your Linux router (if you're crazy enough to have a Linux box as your router to the internet). I have written a small shell script for it (my machine is behind a NAT, so... Well, yes, I'm doing double NAT when going through Iodine, but oh well...):
#!/bin/bash
PASSWORD="yourpassword"
DOMAIN="your.domain.tld"
PRIVATEIP="172.16.13.37"
# Setting up iodine
echo -n "Checking external IP addr ... ";
rip=$(wget -O - -o /dev/null http://automation.whatismyip.com/n09230945.asp);
if [ "$rip" == "" ]; then exit 1;
else echo "$rip"; fi;
echo "Starting iodined...";
iodined -c -n $rip -P "$PASSWORD" $PRIVATEIP $DOMAIN
# Setting up correct routing
echo 1 >/proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s "$PRIVATEIP/27" -o $(ip route show | grep default | awk '{ print $5}') -j MASQUERADE
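A quick sanity check of what that "$PRIVATEIP/27" in the MASQUERADE rule actually covers, done with Python's ipaddress module. This is just my own arithmetic aid, not part of the setup; the addresses are the ones used in this post.

```python
import ipaddress

server_ip = ipaddress.ip_address("172.16.13.37")
# strict=False lets us pass a host address and get the enclosing network back
tunnel_net = ipaddress.ip_network("172.16.13.37/27", strict=False)
client_ip = ipaddress.ip_address("172.16.13.33")

print(tunnel_net)                # 172.16.13.32/27
print(tunnel_net.num_addresses)  # 32
print(client_ip in tunnel_net)   # True
print(server_ip.is_private)      # True
```

So the rule masquerades everything from 172.16.13.32 up to 172.16.13.63, which is the range tunnel clients such as 172.16.13.33 end up in.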
The server is now complete. Start it up, and on some test machine type (the -r switch is very important for testing the NSTX tunnel, since it skips raw UDP mode and forces the traffic through DNS!):
# sudo iodine -f -r -P <password> <yourdomain>
You should see something like this:
n00b ~ >> sudo iodine -f -r -P redacted iodine.homenet.org
Opened dns0
Opened UDP socket
Sending DNS queries for iodine.homenet.org to 84.255.210.79
Autodetecting DNS query type (use -T to override).
Using DNS type NULL queries
Version ok, both using protocol v 0x00000502. You are user #0
Setting IP of dns0 to 172.16.13.33
Setting MTU of dns0 to 1130
Server tunnel IP is 172.16.13.37
Skipping raw mode
Using EDNS0 extension
Switching upstream to codec Base128
Server switched upstream to codec Base128
No alternative downstream codec available, using default (Raw)
Switching to lazy mode for low-latency
Server switched to lazy mode
Autoprobing max downstream fragment size... (skip with -m fragsize)
768 ok.. 1152 ok.. ...1344 not ok.. ...1248 not ok.. ...1200 not ok.. 1176 ok.. 1188 ok.. will use 1188-2=1186
Setting downstream fragment size to max 1186...
Connection setup complete, transmitting data.
Try to ping your server via your tunnel.
n00b ~ >> ping 172.16.13.37
PING 172.16.13.37 (172.16.13.37) 56(84) bytes of data.
64 bytes from 172.16.13.37: icmp_req=1 ttl=64 time=30.4 ms
64 bytes from 172.16.13.37: icmp_req=2 ttl=64 time=30.0 ms
^C
--- 172.16.13.37 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 30.092/30.266/30.440/0.174 ms
Nice, eh? You can also try to ping the wide internet via your tunnel to check that NAT on your iodine server works fine (again, -I dns0 is important here).
n00b ~ >> sudo ping google.com -I dns0
PING google.com (209.85.148.113) from 172.16.13.33 dns0: 56(84) bytes of data.
64 bytes from fra07s07-in-f113.1e100.net (209.85.148.113): icmp_req=1 ttl=52 time=73.7 ms
64 bytes from fra07s07-in-f113.1e100.net (209.85.148.113): icmp_req=2 ttl=52 time=76.1 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 73.759/74.957/76.156/1.229 ms
If you don't get any reply, check the NAT setup on your DNS server.
Setting up the client so all traffic goes through Iodine.
If we want to route all traffic via DNS, we have to fiddle a bit with the routing table on the client. It's quite simple, actually.
You set up the default route through your DNS tunnel, and add a special route to your current DNS server via the old gateway. Again, a small script I wrote to achieve this:
#!/bin/bash
DOMAIN="your.domain.tld"
iodine -r $DOMAIN
DFGW=$(route | grep UG | grep default | awk '{ print $2}');
NSSRV=$(cat /etc/resolv.conf | grep 'nameserver' | awk '{ print $2}' | head -1);
echo Default gw is $DFGW, NS is: $NSSRV...
route del default
route add -host $NSSRV gw $DFGW
route add default gw 172.16.13.37
And that's pretty much it. Now when you are in a restricted network, just run your client setup script, and you're good to go. :-)
Conclusion:
It rocks. Speeds are not top-notch, but hey, I will not use this to download torrents!
Oh, if someone really wants to download those two config scripts:
March 11, 2012
Copying large files without trashing page cache.
Page cache is a portion of otherwise unused RAM that is used to cache data from disk to speed up I/O operations. That's why starting Firefox for the first time after you boot your computer takes a long time, and after that it is almost instant.
Copying large files can cause excessive trashing of this cache. It's nothing critical; however, it does reduce performance a bit.
Except if your machine is already doing a lot of I/O, or doing I/O on a slow device. Then the page cache does act as a temporary performance boost, keeping data in RAM until the OS gets the chance to write it to disk.
What if you don't want to trash the cache while copying, say, large files? Let's say because you have a virtual machine instance running that is doing a lot of I/O? Sure, Linux does offer O_DIRECT, which tells the kernel to try to avoid the page cache when doing I/O on a given file descriptor, but how do you use it from the shell?
mbuffer - measuring buffer
mbuffer is a simple program which can act as a buffer between two programs in a Unix pipeline. Linux itself does have a 64KB buffer between each pair of programs in a pipeline; however, sometimes this just isn't enough.
It can also read directly from a file, write directly to a file, and you can specify O_DIRECT. Perfect!
~$ mbuffer -i complete-1-2011.squash -s $((1024*1024)) -b 32 --direct -o /mnt/complete-1-2011.squash
Write 1MB blocks, use 32 blocks in memory as a buffer, and tell the OS not to do its own caching, thus leaving the page cache intact (and able to serve a write-heavy virtual machine with too little RAM, in my case).
Wonderful :-)
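If you ever need the same effect from a script instead of the shell, here is a rough Python sketch. Note that it is a different technique than the mbuffer command above: instead of O_DIRECT (which needs aligned buffers and is fiddly from Python) it copies normally and then asks the kernel to drop the cached pages with posix_fadvise and POSIX_FADV_DONTNEED. The function name and chunk size are my own choices, and the advice is only a hint to the kernel, so take it as a sketch rather than a guarantee.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB blocks, same block size as in the mbuffer example


def copy_dropping_cache(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst and hint the kernel to evict the copied pages as we go."""
    copied = 0
    can_advise = hasattr(os, "posix_fadvise")  # not available on every platform
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)
            copied += len(block)
            if can_advise:
                fout.flush()
                os.fsync(fout.fileno())  # dirty pages cannot be dropped, so write them out first
                # offset 0, length 0 means "the whole file so far"
                os.posix_fadvise(fin.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                os.posix_fadvise(fout.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return copied
```

The fsync after every block makes this slower than a plain copy; that is the price for not leaving a pile of dirty pages in RAM.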
February 2, 2012
Complex systems fail in a complex way.
... or in a hilarious way. Sometimes even both!
Today, a small JavaScript bug in my private Travian server managed to knock down the entire server for about 30 minutes. Funny thing to say, but I was actually being DDoSed by my own players!
The server received so many requests that it not only completely saturated the CPU but also ate the entire gigabyte of RAM and started to thrash extensively. For comparison, peak legit traffic never caused Apache to consume more than 200MB of memory.
Of course, warning signs had shown up every once in a while since the Travian server was first run. They manifested themselves as big CPU spikes in munin. But this time it was different. A fortification was being built for one player who was under attack. Many players were logged into a single account, watching the countdown until the army's arrival. When the counter reached zero, all hell broke loose.
When any of the counters in Travian reaches zero, JavaScript issues a page refresh. Due to a bug, the timer is not stopped but continues to count upwards, issuing a refresh every second.
Usually this was not a big problem, because pages were always served within a second, thus restarting all counters. If they weren't, it wasn't such a big problem either: one request per second was coming from one user and it was easy to handle (sooner or later a page would be served within a second, thus resetting the timer).
This time, the timer expired for five users at the same time, so five users were issuing page requests in a quick burst. Five requests per second in a short burst is not a problem for this machine; however, they are almost never served within a second (usually it takes around 1.2-1.5s). Since these refreshes were caused not by a human but by JavaScript, the script expected the page to be served within a second.
Of course the requests were not handled within a second, and another burst of requests was fired by JavaScript, adding to the ones already in flight. This increased latency even more, in turn causing the browsers of those five players to issue more and more requests on top of the existing ones.
Needless to say, it completely saturated the CPU, killing all hope of ANY page being served within a second. Sooner or later the timers of other players reached zero as well, starting the same behavior.
Since each new request added to the old ones and decreased the performance of the overall system, server load rose almost exponentially (in about a minute it shot up from 5 to 20), killing performance along the way. Within a few minutes, the server was almost unresponsive and was thrashing extensively. And new requests were still coming in.
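The pile-up is easy to reproduce with a toy model. The Python sketch below is not a measurement of my server; it assumes a server that handles requests strictly one after another, a fixed time per request, and stuck browsers that each fire one refresh per second forever. The real thing was worse, because every extra request also slowed all the others down and more players kept joining in.

```python
def simulate(clients: int, service_time: float, seconds: int) -> list:
    """Return the number of requests still waiting after each simulated second."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += clients  # every stuck browser fires one refresh per second
        backlog -= min(backlog, 1.0 / service_time)  # work the server finishes this second
        history.append(backlog)
    return history


# One browser and a fast page: the queue is drained every second.
print(simulate(clients=1, service_time=0.5, seconds=5))

# Five browsers and pages taking 1.3 s each: the queue only ever grows.
print([round(x, 1) for x in simulate(clients=5, service_time=1.3, seconds=5)])
```

Even this linear version never recovers on its own once requests arrive faster than they are served, which is exactly why the only way out was to kill the webserver.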
How was it solved? :-)
I killed the webserver, fixed the JavaScript and started it again.
So, long story short: due to a JavaScript bug I got DDoSed by my very own players. YAY! -.-"
January 30, 2012
Making a poorly written application scale horizontally.
So, some kids have decided to write a Travian 3.5 clone. I assume they're kids as their code quality is less than bad. I took it nonetheless and set up a small private Travian server.
Query performance is horrible, the architecture is badly designed & written, and they've done some stuff that no-one in their right mind would do for an MMOG. So, how to make it scale? Since we're running this on a gigahertz Sempron from 2004, it is damn important to be able to add another (just as poor) node to the system.
First, we must understand what the application does.
DATABASE MAYHEM.
Database indexes? Never heard of, of course (what were you expecting from kids). An average page hit sends about 100-300 queries to the database. I wrote a simple SQL profiler and used it to intercept all of the scripts' communication with the database, just to see where the bottlenecks are and where to define indexes.
This is something which you SHOULD NOT do (snippet from profiler):
TOTAL SCIRPT EXECUTION TIME: 0.28582906723022 s
-> DB: 0.16272 s (56.93%)
-> PHP: 0.12311 s (43.07%)
SQL QUERY PROFILING INFORMATION (127 QUERIES):
- 0.0000000000000, CONNECTED TO MYSQL: travian@localhost
- 0.0005202293395, SELECT uid FROM s1_deleting where timestamp < 1327886379
- 0.0016539096832, DELETE from s1_users where timestamp >= 1327886379 and act != ''
- 0.0012409687042, UPDATE s1_vdata set `wood` = `maxstore` WHERE `wood` > `maxstore`
- 0.0010890960693, UPDATE s1_vdata set `clay` = `maxstore` WHERE `clay` > `maxstore`
- 0.0010509490966, UPDATE s1_vdata set `iron` = `maxstore` WHERE `iron` > `maxstore`
- 0.0010240077972, UPDATE s1_vdata set `crop` = `maxcrop` WHERE `crop` > `maxcrop`
- 0.0010919570922, UPDATE s1_vdata set `wood` = 0 WHERE `wood` < 0
- 0.0010509490966, UPDATE s1_vdata set `clay` = 0 WHERE `clay` < 0
- 0.0010318756103, UPDATE s1_vdata set `iron` = 0 WHERE `iron` < 0
- 0.0010809898376, UPDATE s1_vdata set `maxstore` = 4000 WHERE `maxstore` <= 4000
- 0.0010581016540, UPDATE s1_vdata set `maxcrop` = 4000 WHERE `maxcrop` <= 4000
- 0.0009779930114, SELECT * FROM s1_vdata WHERE loyalty<>100
- 0.0006051063537, SELECT * FROM s1_odata WHERE loyalty<>100
- 0.0009360313415, SELECT * FROM s1_hero
- 0.0017030239105, SELECT * FROM s1_vdata where celebration < 1327886379 AND celebration != 0
... and list goes on and on ...
Oh, I must have been lucky, it's only 127 queries. Usually it's more.
So, what does it do with all those queries? Interestingly, the events are handled at the time a page hit happens, not by some independent daemon. This is bad. Bad. Just bad. This processing usually takes about 50-75% of the script's execution time.
Ripping that code out into an independent daemon (as it should be done) is nearly impossible due to the fact that the whole code is just one big spaghetti mess.
THREAD-SAFETY IS VOODOO, I GUESS.
So, now that we know that events are being handled at hit ... What happens when two hits happen at the same time?
It doesn't take a rocket scientist to figure out that you can end up with a fucked-up database with a little bit of bad luck. Or, if not that, it can be a source of nasty bugs that are nearly impossible to debug. Great, right?
So, now we need a way to ensure that only one script can execute at a time, thus protecting the data in the database (I haven't come to the scalability part yet, bear with me). Locks were designed for this.
So, what kind of lock do we need?
- Simple to implement
Do not forget that the clone itself is a big bunch of spaghetti code. So the solution must be as simple as possible, to avoid introducing new bugs. And because we would like to get stuff done without (too much) swearing.
- Robust
The lock must be released however the script terminates, be it normally or abnormally. If PHP crashes, the lock must be released. If the PHP script is terminated half way, the lock must be released. If a meteor hits the webserver, the lock must be released.
- Scalable
If we run the script on 10 machines, there must still be only one script executing at a time.
So, what options do we have for such a lock?
We could use table locking. However, it's a bit complex as you have to lock ALL the tables, thus extending the logic even to operations done from phpMyAdmin. Sucks. Another option is to lock only the stuff we need, when we need it, but my collection of curse words is not big enough to tackle this task.
Transactions? Good luck implementing them in spaghetti code.
Operating system locks? Simple, however not robust and hard to scale. For example, if the PHP script terminates abnormally, the lock may not be released. It is released when the Apache worker process ends; however, that is not necessarily the same moment the request ends. And how to implement multi-machine locking?
Custom locking daemon? The script connects to it and requests a lock. Scalable: can be. Timeout: could be done. Robust: yes (if the PHP script terminates, it closes the connection, thus releasing the lock). Simple to implement? No.
What else?
MYSQL TO THE RESCUE <3
Tries to obtain a lock with a name given by the string str, using a timeout of timeout seconds. Returns 1 if the lock was obtained successfully, 0 if the attempt timed out (for example, because another client has previously locked the name), or NULL if an error occurred (such as running out of memory or the thread was killed with mysqladmin kill). If you have a lock obtained with GET_LOCK(), it is released when you execute RELEASE_LOCK(), execute a new GET_LOCK(), or your connection terminates (either normally or abnormally). Locks obtained with GET_LOCK() do not interact with transactions. That is, committing a transaction does not release any such locks obtained during the transaction.
So, why not use the database instead of a custom locking daemon? According to the MySQL documentation it provides all the locking primitives we need. They are simple to use, scalable and robust.
When we open a connection to the database, we request a lock. If another request already holds the lock, we simply wait until that request finishes execution. All done by MySQL.
This is wonderful, as now we have solved the issue of thread-safety. By making the application single-threaded. F**k.
GOING MULTI-THREADED.
So, now that we have solved the problem of updating events with one lock, we can go and think about how to make this scale.
As said before, handling of events is done at page hit and takes about 50-75% of the script's execution time. The rest is just plain reading from the database: about 30 queries and relatively little processing time.
So, under the current model each script gets a turn, and each one processes events even though the script before it has already checked and processed them.
Isn't this a waste of time?
We could check when the last processing was done and, if the time difference is too small, simply not run the processing. Could be done...
I've used a slightly different approach. We first check if the lock is already held, and if it is, we skip the processing of events and go straight to displaying data for the user. Since we're only reading in that part, it doesn't matter if we maybe get a slightly skewed picture. MySQL itself will make sure that SELECT returns consistent data if an UPDATE hits the database at the same time. :-)
Written in pseudocode, it's like this:
if (not lock_held()) {
    acquire_lock();
    do_event_processing();
    release_lock();
}
do_stuff_for_user();
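For illustration, the same "skip if somebody else is already on it" idea as a runnable Python sketch. A threading.Lock stands in for the MySQL named lock here, and handle_request with its two callbacks is a made-up name, so this is the pattern and not the PHP code I actually changed. One difference is deliberate: a non-blocking acquire folds the "is it free?" check and the acquire into a single atomic step, so two requests cannot both see a free lock and then both try to grab it. With MySQL, calling GET_LOCK with a timeout of 0 should presumably give the same effect.

```python
import threading

event_lock = threading.Lock()  # stand-in for the MySQL named lock 'travian'


def handle_request(process_events, render_page):
    """Process pending events only if nobody else is doing it, then serve the page."""
    # Try to take the lock without waiting; returns False if it is already held.
    if event_lock.acquire(blocking=False):
        try:
            process_events()
        finally:
            event_lock.release()  # released even if event processing blows up
    return render_page()


log = []
print(handle_request(lambda: log.append("events"), lambda: "page"))  # page
print(log)  # ['events']
```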
Or how it's seen from the profiler:
SQL QUERY PROFILING INFORMATION (37 QUERIES):
- 0.0000000000000, CONNECTED TO MYSQL: travian@localhost
- 0.0000000000000, AUTOMATION STARTUP!
- 0.0004410743713, SELECT IS_FREE_LOCK('travian');
- 0.0000000000000, LOCK ALREADY HELD, SKIPPING!
... snip ...
SQL QUERY PROFILING INFORMATION (123 QUERIES):
- 0.0000000000000, CONNECTED TO MYSQL: travian@localhost
- 0.0000000000000, AUTOMATION STARTUP!
- 0.0005500316619, SELECT IS_FREE_LOCK('travian');
- 0.0002398490905, SELECT GET_LOCK('travian', 3);
- 0.0005450248718, SELECT uid FROM s1_deleting where timestamp < 1327889875
... snip ...
There, a scalable system, and I've added just a few lines to the application init code.
How well does it scale? Very well, actually. Interestingly, because of this the system is far more responsive under medium load than under no load at all.
WHAT KIND OF SORCERY IS THIS?
Overall it's more responsive under medium load than under no load. Why?
It all comes down to how big your chance of acquiring the lock is. Under medium load, there are many threads competing with each other for the lock, and thus your chance of getting it is far lower.
Under no load, there is no-one to compete with for the lock, thus the lock is automatically yours and with it all the burden of event processing.
This still means that there is always a request that draws the short straw and has to carry the burden of processing events for the entire server. But with a lot of requests competing with each other, the chances of the poor guy being you are lower. Thus, you get the feeling that the application is more responsive. At the expense of someone else.
The basic idea behind it is that if someone is already processing events, there is no need for you to wait for your turn to do the same, since any events that do happen are going to be processed already by whoever is holding the lock. In your turn you would only be processing empty tables and wasting CPU cycles. Thus it is more efficient to just go and serve the player the data he requested.
HOW ABOUT A FEW 100 EVENTS EVERY SECOND?
Does it scale that far? Yes and no. On servers of that size, a delay of, let's say, a second or two to process so many events probably wouldn't be too disturbing due to the sheer amount of active users and requests.
Just every once in a while a page would load a lot slower than usual. But with as many players as it takes to generate that many events, your chances of having bad luck and actually getting the lock are pretty slim.
This is not a perfect, magic solution that solves everything. If an application is poorly written, you can't work wonders. But you can do a lot to make the problem much less noticeable. :-)
Now excuse me while I configure a Squid proxy. I'm adding another node to the game.
And if someone wants to play (you get free gold!): http://travian.ignorelist.com/
January 11, 2012
MIC Interpreter.
MIC-1 is a microarchitecture defined by Andrew S. Tanenbaum in his book Structured Computer Organization. Here at uni we took a look at this architecture during some classes...
As a joke, I asked during class what we would get if we managed to write an emulator for it. Yes, an entire microarchitecture emulator. Luckily, it's extremely simple, so it isn't hard to write one; however, it does require a bit more understanding of the microarchitecture itself than, for example, solving some simple tasks. Anyhow, I got promised a 10 (or an A) for the practical part of this class.
Needless to say, I was more interested in the brain food this task provided than in the actual grade; however, well, a prize was nice (especially since this is a class in which very few get such good grades - usually between 0 and 3 people per year).
Grabbed a beer, wrote an interpreter, and later tackled the bigger problem: writing a compiler for it. Since my compiler writing skills are nonexistent at best, the code of the compiler pretty much reflects that. :-)
In any case, I'm putting this online ... I don't really have a good reason, probably just because I can.
Click me gently, please!
January 5, 2012
Searching firefox browsing history like a boss, or otherwise known as using SQL.
So, I needed a specific site I had found a while ago (parsing related). I could not, however, remember its URL, and the browsing history wasn't much help either, as I couldn't remember the site's name nor the date when I last opened it.
All I remembered was that it was somebody's personal site and that the file I was looking for had a .c extension. And finding that site among ~82,000 entries in my browsing history is not exactly something I would be excited about.
Sooo, is there any better way to search Firefox history data than the user interface provided by Firefox? Wooot, there is :)
Firefox stores most of its data in SQLite databases. In the case of history, that's pretty much this file: ~/.mozilla/firefox/blablablabla.default/places.sqlite
So, if we have the SQLite tools installed, this is all we have to do:
$ sqlite3 ~/.mozilla/firefox/blablabla.default/places.sqlite
And then we can execute almost any query we can imagine on that database, thus fucking around with history in any way we please. Like so:
n00b ~ >> sqlite3 .mozilla/firefox/x9fhowq6.default/places.sqlite
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> SELECT COUNT(*) FROM moz_places;
82810
sqlite> SELECT url FROM moz_places WHERE url LIKE '%~%.c';
http://XXXXXX/~XXXXX/XXXXXXXXXXX.c
sqlite>
Ahh, finally found that piece of code.
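The same search can be scripted, for example from Python with its built-in sqlite3 module. The sketch below runs against a tiny in-memory stand-in for places.sqlite (the table layout is cut down to just the url column and the URLs are invented), so it works without touching a real profile; find_urls is a name I made up. To use it for real you would connect to your own places.sqlite instead, probably on a copy, since a running Firefox tends to keep the file locked.

```python
import sqlite3


def find_urls(conn, pattern):
    """Return all history URLs matching an SQL LIKE pattern."""
    rows = conn.execute("SELECT url FROM moz_places WHERE url LIKE ?", (pattern,))
    return [row[0] for row in rows]


# Tiny in-memory stand-in for places.sqlite, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO moz_places (url) VALUES (?)", [
    ("http://example.org/~alice/parser.c",),
    ("http://example.org/~alice/index.html",),
    ("http://example.com/docs/tool.c",),
])

# Personal page (has a ~ in the path) and a .c file at the end.
print(find_urls(conn, "%~%.c"))  # ['http://example.org/~alice/parser.c']
```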
Oh, PS: Happy 2012.
December 14, 2011
Google doesn't simply walk into Mordor!
This one made me truly laugh! Google devs, do you have too much time again?

For those who don't know what's so funny: http://knowyourmeme.com/memes/one-does-not-simply-walk-into-mordor
December 1, 2011
Streaming ALSA audio over the network.
So, the speaker output on my desktop computer is pretty much fried. Kaput. Dead. There is nothing wrong with the sound card; just the contacts at the output jack are fried.
And my desktop is the only computer that holds my vast amount of music, and I'm not too thrilled by the idea of copying all that to the laptop or the mini (EEE PC). Not to mention the games. I want to play games as well.
So, what about transmitting audio data over the network? Bandwidth should not be a problem. Does Linux have the required infrastructure to do this EASILY? I don't want to install additional software, thank you. It turns out it does.
Many sound cards have a so-called loopback. This means that everything that is meant to go out to the speaker output also gets routed back into the recording input, as if it came from an external source. You just have to configure it properly.
On Linux this is a piece of cake. Just run the following commands; they will set the Capture Source to the mixer (i.e. configure loopback).
CONTROL="`amixer controls | grep 'Capture Source'`"
amixer cset "$CONTROL" "`amixer cget "$CONTROL" | grep "'Mix" | head -n 1 | sed -e 's/.*\#\([0-9]\).*/\1/'`"
Great, so the sound card is now configured. Now fire up a terminal on the destination computer (the computer you will stream sound to), and type in the following:
nc -nvvlup 1234 | aplay
We're using UDP as the transport protocol because it's more suitable for low latency than TCP (if we lose a few microseconds of audio, or a few microseconds of audio get mixed up, I doubt you'll hear it).
On the source computer punch this into a terminal, connect the speakers to the target computer, press enter and stand in awe of how easy it was.
arecord -t wav -f cd | nc -nvvu <target computer IP> 1234
This will transport audio over the network in CD quality. It requires around 170KiB/s of bandwidth. On local networks this shouldn't be a problem. Also play with arecord's buffer options depending on what you want (nearly 0ms latency or better quality).
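Where does that bandwidth figure come from? The "cd" format in arecord means 16-bit samples, 44100 Hz, stereo, so the raw stream size is simple multiplication. A few lines of Python to show the arithmetic (protocol overhead from UDP and IP headers is ignored here):

```python
SAMPLE_RATE_HZ = 44_100  # CD sample rate
CHANNELS = 2             # stereo
BYTES_PER_SAMPLE = 2     # 16-bit samples

bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
kib_per_second = bytes_per_second / 1024

print(bytes_per_second)          # 176400
print(round(kib_per_second, 1))  # 172.3
```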
It's probably not the cleanest solution, but it's easy to set up and it works.
Oh, I guess I don't have to mention it (I will anyway): do not feed the output back into itself, as you can burn your speakers.
November 24, 2011
Cheating on trackers?
This post is not here to describe the workings of trackers in technical detail, nor how to actually cheat on them. There is enough material on that on Google. It's just a post to explain in general how trackers work and how they relate to the client. This knowledge is the key (besides imagination) to creating quite awesome and bullet-proof cheating systems. :-)
It's not that there are no local BitTorrent trackers in Slovenia. All are semi-closed and some of them actually monitor your ratio (amount uploaded vs. amount downloaded). As with many sites in this country, the security of some is a real fail, but still, breaking into their servers and modifying the database is not exactly stealthy, is it? Not to mention it's a call for bigger trouble than just a banned account. :-)
Some may argue: "Well, there are countless programs on creating fake ratio on bittorent trackers - ratiomaster being one example". That is no fun. Besides, you actually have to keep them running to create fake results. Lame! And it doesn't even solve hit'n'run problem1.
What is the purpose of a BitTorrent tracker anyway?
In a modern sense - none. With DHT, PEX and similar technologies integrated into BitTorrent clients, trackers are pretty much a relic of history. But let's forget all that...
So, you download a torrent and open it in a BitTorrent client. What happens next? You know what P2P means, right (tip: if not, stop reading, this is not for you)? So where does the client get the list of peers sharing the same file? From the tracker, of course.
Each torrent has some form of ID attached. It's a hash of the torrent's info section, which describes the content (called the infohash), and it differentiates the torrent from other torrents. So, to bootstrap, the BT client says to the tracker: "Hey, I would like to download the torrent with infohash <something>. Do you have a list of peers that are also sharing this torrent?", and the tracker says back: "Sure, here you have it. Have fun". From the tracker's side, its job is done. It supplied you with a list of clients that also share this infohash. It's up to the client to connect to the swarm and get the file from the others.
But where does the tracker get this list from?
From other clients who asked for the same list, of course. So the next client that asks for the peer list will get your IP as well, just because you asked too.
The tracker is not connected to the BitTorrent network in any way. It doesn't have to be.
But how does a closed tracker then know who downloaded/uploaded how much, and how does it do per-user accounting?
It doesn't. Well, it does know, but it does not know whether that information is true or not.
Wait, what? You're saying they're guessing?
No. See, let's say that a semi-closed tracker has a torrent named ABC.torrent. User 1 and user 2 both download this torrent. Do the checksums match when comparing the .torrent file that user 1 downloaded with that of user 2?
No. Why?
For per-user accounting. The tracker modifies each torrent, adding some form of obfuscated user ID to the end of the tracker URL. This lets the tracker track each user's requests for the peer list by looking at this user ID and connecting it with the real username in its database.
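To see why the files differ while the torrent stays "the same", here is a rough sketch. It assumes the usual layout where the infohash is the SHA-1 of the bencoded info dictionary only; the tracker host, the passkey values and the URL layout are made up for illustration, and the encoder is a bare-minimum one:

```python
import hashlib


def bencode(value):
    """Minimal bencode encoder for ints, bytes/str, lists and dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys in sorted order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# The part describing the content; identical for every user.
info = {"name": "ABC", "length": 1234, "piece length": 262144, "pieces": b"\x00" * 20}


def make_torrent(passkey):
    # Hypothetical announce URL layout; real trackers differ.
    return bencode({"announce": f"http://tracker.example/{passkey}/announce", "info": info})


torrent_user1 = make_torrent("aaaa1111")
torrent_user2 = make_torrent("bbbb2222")

file_hash_1 = hashlib.sha1(torrent_user1).hexdigest()
file_hash_2 = hashlib.sha1(torrent_user2).hexdigest()
infohash = hashlib.sha1(bencode(info)).hexdigest()

print("file checksums match:", file_hash_1 == file_hash_2)
print("shared infohash:", infohash)
```

The checksums of the two files differ because the announce URL differs, yet both files carry the very same info section, so both users end up in the same swarm.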
So, for instance, if user 2 sends you a torrent that he downloaded from a semi-closed tracker, you can then upload/download torrents from that tracker under his identity. You just have to extract his user-identification hash (tip: if he's not your friend you can get him banned that way. Or just leech at someone else's expense. You just have to get one torrent he downloaded from that site. His own copy of the torrent, that is).
You still didn't answer. How does it do per-user accounting?
By looking at what the client tells it. See, the client doesn't just say: "Hey, I'm downloading a torrent with infohash blabla, give me the peer list". It says this: "Hey, I'm downloading a torrent with infohash blabla, I have so far downloaded 1234 bytes and uploaded 213 bytes. Give me the peer list, please?". And the client doesn't do this just once. It does this every once in a while (usually every 30 minutes).
This enables public torrent trackers to give out more efficient peer lists to new clients and to clients that have already completed the download (for instance: a new peer gets a list of seeds, while seeds get a list of peers).
Of course, on semi-closed trackers the client also provides its identity, so the tracker can identify the user who made the request.
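That conversation is really just an HTTP GET. A minimal sketch of how such an announce URL is put together, assuming the standard tracker query parameters (info_hash, peer_id, port, uploaded, downloaded, left, event); the tracker address, the PASSKEY placeholder and the peer ID below are invented for the example:

```python
from urllib.parse import urlencode


def build_announce_url(announce_url, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None):
    """Build the GET URL a client sends to the tracker."""
    params = {
        "info_hash": info_hash,    # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,        # 20 bytes identifying this client instance
        "port": port,              # port the client listens on
        "uploaded": uploaded,      # total bytes uploaded, as claimed by the client
        "downloaded": downloaded,  # total bytes downloaded, as claimed by the client
        "left": left,              # bytes still missing
    }
    if event is not None:
        params["event"] = event    # "started", "completed" or "stopped"
    return announce_url + "?" + urlencode(params)


url = build_announce_url(
    "http://tracker.example/PASSKEY/announce",  # hypothetical per-user URL
    info_hash=b"\x12" * 20,
    peer_id=b"-XX0001-123456789012",
    port=6881,
    uploaded=213,
    downloaded=1234,
    left=5000,
)
print(url)
```

Note that every number in there is simply whatever the client chooses to report; nothing in the request proves it.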
So... that's how ratiomaster works?
Yes. You feed it the torrent, it gets the tracker URL and infohash and consistently lies to the tracker about how much it downloaded / uploaded. And the tracker has no good way of knowing whether it is lying or not.
Ratiomaster is not a state-of-the-art program. With some knowledge of the tracker protocol it can be scratched together in 30 minutes or less while being horribly drunk (or high, depending on your preference).
No good way? So you're saying there is a way for the tracker to weed out liars?
Yes, there is. But it's similar to heuristics. Sometimes you miss, sometimes you get a false alarm.
For instance, from the timing of requests and the difference between the uploaded/downloaded values you can calculate the average speed at which the user is downloading/uploading the torrent. If that speed changes drastically, it can be a sign of ratiomaster.
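A rough sketch of such a speed heuristic, seen from the tracker's side. The Announce record, the function names and the factor of 10 are all assumptions made up for this example, and it only looks at sudden upward jumps in claimed upload speed:

```python
from dataclasses import dataclass


@dataclass
class Announce:
    timestamp: float  # seconds since some epoch
    uploaded: int     # total bytes the client claims to have uploaded


def upload_speeds(announces):
    """Average claimed upload speed (bytes/s) between consecutive announces."""
    speeds = []
    for prev, cur in zip(announces, announces[1:]):
        elapsed = cur.timestamp - prev.timestamp
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order announces
        speeds.append((cur.uploaded - prev.uploaded) / elapsed)
    return speeds


def looks_suspicious(announces, max_jump_factor=10.0):
    """Flag a client whose claimed speed jumps by more than max_jump_factor."""
    speeds = upload_speeds(announces)
    for before, after in zip(speeds, speeds[1:]):
        if before > 0 and after / before > max_jump_factor:
            return True
    return False


steady = [Announce(0, 0), Announce(1800, 1_800_000), Announce(3600, 3_600_000)]
jumpy = [Announce(0, 0), Announce(1800, 1_800_000), Announce(3600, 91_800_000)]
print(looks_suspicious(steady), looks_suspicious(jumpy))
```

A legitimate user whose torrent suddenly becomes popular would trip this too, which is exactly the false-alarm problem.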
They can also compare the values given by all other users to find out that someone is lying. The problem is, they can't figure out WHO is lying (since they don't know whom to trust and whom not - the only thing that is known is that the numbers mismatch). Of course this is not a problem if the torrent is not very active (i.e. almost 0 transfers happening) - then it's easy to find who lies. ;-)
The only sure way of finding the liars would be to know what is actually happening inside the swarm - in other words, to participate in it. But this would eat up the tracker's resources and bandwidth, make the code 1000x more complex [2] and put the tracker in a legally questionable position.
None of these solutions are, of course, effective against a well-motivated / bored programmer.
Just ... How does the tracker detect hit'n'run?
With imagination: it can't.
By the standard: easy.
You see, when you shut down your client, it makes one final request to the tracker (even if it made one a second ago). It essentially states: "Hey, I have been sharing the torrent with infohash 1234 and downloaded X and uploaded Y bytes. But I'm stopping now, thank you".
This notifies the tracker that the client's address is no longer valid and should not be handed out anymore.
Private trackers can compare the downloaded and uploaded values in this quit message, and if they're outside the specified limits it can be registered as a hit'n'run.
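On the tracker's side that check could be as small as the sketch below. It assumes the quit message arrives as an announce with event "stopped"; the function name and the 0.5 ratio limit are invented, since every private tracker picks its own rules:

```python
def is_hit_and_run(event, downloaded, uploaded, min_ratio=0.5):
    """Flag a 'stopped' announce whose upload/download ratio is below min_ratio."""
    if event != "stopped" or downloaded == 0:
        return False  # not a quit message, or nothing was downloaded
    return uploaded / downloaded < min_ratio


print(is_hit_and_run("stopped", downloaded=1000, uploaded=100))
print(is_hit_and_run("stopped", downloaded=1000, uploaded=800))
```

As with everything else here, the tracker only sees the numbers the client decided to send.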
So... What can I do with this knowledge?
Create no-torrent-does-affect-my-ratio-ever-and-never-detect-hit-n-run-while-actually-downloading software. Out-of-the-box thinking and searching for the tracker protocol specification are left as an exercise for the reader. :-)
You mean torrent client?
Oh god, no, that would be horrible waste of time.
What then?
It involves two words: Proxy & caching. And rewriting requests. That's four words!
Two sounded more dramatic, didn't it? :-)
Does it violate standards?
Obviously?
Share?
As I said before, figuring out the solution is left as an exercise for the reader. Use some imagination for Christ's sake. ;-)
1: Hit'n'run is when a user closes their client immediately after the download is finished.
2: A primitive tracker can be coded in less than 50 lines of PHP code. Been there, done that. Writing a tracker that will participate in the swarm is another story.
Filled under: None
November 22, 2011
I can't help myself.
0 Comments >>
I should have stopped screwing around and actually done something else.
It's a program we had to write at UNI. It's something like FTP, except that the protocol (we had to write the server too) is extremely ugly. Ugly. No, seriously, ugly.
But I couldn't help myself and included at least one stupid ... Well... Additional feature to annoy the unsuspecting user (the user being a professor during my defence of this assignment)...

Fortunately it didn't affect my grade. It was still 100%. Even though half of the program was written drunk and the other half with a huge hangover.
Oh, btw. Mono rocks. I don't have to run my VM to do assignments :-)
Filled under: None
November 10, 2011
Just another reason why i love linux.
2 Comments >>
Apart from the ability to fuck with it in any way I can imagine, it's the amazing sense of humor its developers retain even in situations where you should normally panic.
I actually have two installations. Both of them are Arch Linux. One is for regular use, the other is for fucking around, and for checking whether GNOME 3 has become more useful since its first release.
So, I was updating the real installation from my testing setup (I'm writing this from GNOME 3). No problem, just mount the required filesystems and do a chroot. But I forgot to mount /boot.
This is what i got:
(25/38) upgrading linux* [######] 100%
WARNING: /boot appears to be a seperate partition but is not mounted.
You probably just broke your system. Congratulations.
I still can't stop laughing. Brilliant.
Oh, and i'm slowly starting to like GNOME 3. It still has some stuff to do since 3.0, but progress in 3.2 is evident.
* linux = kernel.
Filled under: None
October 30, 2011
Setting up MCP with Minecraft Forge on Linux.
0 Comments >>
It's more of a note to myself, but here it goes:
In case you fail to successfully install Minecraft Forge onto MCP (Minecraft Coder Pack) under Linux...
1.) Install MCP, then cd into extracted folder.
2.) Get a Minecraft.jar with installed Modloader and Modloader MP, place it in jars/
3.) If your default version of python is not 2.7, do this: sed -i 's/python/python2.7/g' *.sh
4.) Run decompile.sh
5.) unzip ../minecraftforge.zip
6.) cp -a forge/src/* src/
7.) cd src;
8.) find -name '*.java' -exec dos2unix \{} \;
9.) patch -p2 <../forge/minecraft.patch
You're done. Step 8 is important, otherwise patch may fail horribly. And the entire recompilation as well.
Oh, forgot to note. I'm staying on 1.7.3. I'll maybe upgrade to 1.8.1 someday - not because of features, but because of the new lighting engine. But I am not going to play 1.9. At least not unless the list of upcoming features changes... Or they promise higher block IDs. Well, screw it, I'd rather backport interesting mods.
Filled under: None
September 5, 2011
When the cat is away...
0 Comments >>
... the mice shall play...
It's quite funny how this applies to Minecraft SMP (Survival MultiPlayer).
Well, griefing is quite a problem on Minecraft servers. For those unfamiliar: since Minecraft is a sandbox game, anyone can do anything. Including to other people's creations.
It doesn't have to be destruction. It can simply be entering their house without permission.
Of course, there are countless plugins for Bukkit to combat this, but I found them quite CPU/memory intensive. So I'm running a logging tool, HawkEye, which logs everything users do into a database. Everything. Its inspiration was another Bukkit plugin called BigBrother [1]. So now you get the idea of how monitored players can be.
Of course I had warned the players that they are being monitored. Not with big, red, screaming letters, but if they have read the rules, they are informed.
But it logs a SHITLOAD of stuff. And shoveling through it is impractical. So I took a five-minute break from my studies and scratched together a simple script that takes as input a center where players are mostly located and then finds all anomalies in their movement.
I'm ashamed to show the code in this state [2][3], however the results are quite interesting >:-)
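The script itself isn't shown, so the following is only a guess at the basic idea, not the original code: treat every logged position that lies far away from the given center as an anomaly. The tuple layout, the function name and the distance limit are all made up for the sketch, and real HawkEye data would come out of its database rather than a list:

```python
import math


def find_anomalies(positions, center, max_distance):
    """Return the (player, x, z) log entries farther than max_distance from center."""
    cx, cz = center
    return [
        (player, x, z)
        for player, x, z in positions
        if math.hypot(x - cx, z - cz) > max_distance
    ]


# Hypothetical movement log: (player, x, z) in block coordinates.
log = [("alice", 10, 5), ("bob", 400, -300), ("alice", -20, 30)]
print(find_anomalies(log, center=(0, 0), max_distance=100))
```

Filtering by distance like this is cheap, which matters when the log holds a shitload of rows.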
1: The one from 1984 probably.
2: It crashed the server once by pushing MySQL and itself too far.
3: unset(), free() and similar are your good friends!
Filled under: None
August 17, 2011
Did rm -rf / and we're still working!
0 Comments >>
Let me just start with one quote:
Multitasking: The ability to screw up several things at once.
And since Linux is a multitasking operating system powering everything from laptops to supercomputers down to mobile phones, it also has a secret (psst, I didn't tell you that!) ability to fuck up several things at the same time! It happened because of my input, but it still crashed pretty hard.
So, I have a nightly backup cycle with rdiff-backup. It creates incremental backups to save disk space. Every month, a new set of backups is created from scratch (the old ones are just compressed into a .squash filesystem and left lying on disk until I decide their fate [1]), just in case something somewhere fails.
So I was merging some old backups and I had them lying on another machine, simply because the operation demanded more space than I had available on the server. Some NFS + /mnt magic, no problem. The job is done, the files are copied back, now it's time to wipe the NFS mountpoint.
Since I was already doing several things at once, I cd'ed to /mnt (well, I thought I did) and typed:
nohup rm -rf * &
So, rm is remove, r is for recursive (enter subdirectories) and f is for force (don't ask, just wipe anything in your way). Since I managed to mistype cd /mnt, I was still stuck in the original directory. And this was root ( / ).
In other words, rm started to wipe EVERYTHING on the server itself. So, I had a few chats open, wrote a response to them and went to check back and run htop ... Wait, wut?
:: /# htop
htop: error while loading shared libraries: libm.so.6: cannot open shared object file: No such file or directory
Wait, wut? Well, i think I did some screwing up with this package (i'm trying to run minecraft server and this was one of dependencies), well, let's reinstall it...
:: /# pacman -S libm
-bash: /usr/bin/pacman: No such file or directory
Wait, wut? What is this? Check $PATH. It's OK... Is it there? Wait, what?
:: /# cd /usr/bin/
:: /usr/bin# ls -l pacman
-bash: /bin/ls: No such file or directory
And then all of a sudden:
:: /usr/bin#
Connection to teh.zupa.cow.sez.mooo.com closed.
n00b ~ >> ssh -l root jbox
Connection closed by 93.103.205.91
At that point everything became clear. The system was being wiped. I had to think quick, so I jumped off my chair, threw myself across the room, trying to hit the power switch on server's PSU and turned it off. Got some cuts from landing in the process :-)
OK, so we stopped rm. Now what?
Well, let's dismount the server, take the disk out (although I'm using a custom kernel, it's a generic build so it will work on any x86 machine) and put it into my PC to see the extent of the damage [2]. It gets hooked up, booted, and I get greeted with a nice hang and only GRUB in the top left corner [3]. Great, so at least 512 bytes of the entire system are still intact.
I popped in an Ubuntu flash drive, booted, now what? Well, since the root and boot partitions were probably wiped, there is no harm in doing fsck on them. Just to be on the safe side.
How about the data partition, where backups are held? Should we run fsck? Common sense tells us: yes. But what if the backup procedure on it was running at the same time the wipe happened? This means that fsck could also wipe our (probably) damaged local backup. I also have offsite backups, but they're for emergency cases and getting to them is not an easy task [4]. Mounting the damaged filesystem could actually give us good info as to what is damaged and what is not.
So I mounted it, despite mount.ext2's protests that I really should do fsck first. Checked the root partition as well. Half wiped. /usr almost completely gone, no signs of /lib or /bin or even /sbin. /boot was empty, which would explain GRUB hanging.
So, how about the backups? Looks like they're intact. At a glance. They were damaged, but not to an unrecoverable extent. Here's the stuff that got screwed up in the backup or was wiped by rm. Of course I couldn't have known about these missing files if I had run fsck on this partition before mounting.
- Kernel modules: Half of them missing and the ext2 driver complaining when I tried to access them. Not a big thing, I haven't updated the kernel for a while so I can get them from old backups.
- Locales for French, Romanian, Hungarian and some other languages. Won't be missing them.
- A few man pages. Not the end of the world, kinda annoying, will be fixed at the next update of the affected packages.
- Munin! Oh noes, my yearly server graphs are gone! At least that's what I thought. rm hadn't had time to reach them, so a restore from backups wasn't even necessary.
- The pacman (package manager) database got a bit screwed up. Lucky me, I'm running Arch as the OS on the server. Pacman uses a very simple filesystem layout for its package database, so I just copied replacements for the corrupted files from my laptop.
- A few unimportant logs I never bother to read.
- Rdiff-backup: some files are missing and some diffs got wiped. This means that the entire backup for this month is in an inconsistent state. Better to start from scratch.
So I just copied all this stuff from the backup, wiped this month's worth of backups (it's inconsistent anyway), rebooted and... Well, my PC booted up the same way as my webserver.
Heard the BIOS beeps. OK... Heard the GRUB beeps, and a few seconds later the hard-drive light started blinking. So far so good... And finally a long 5-second beep indicating that the server is booted up and running. Did some screwing around, works fine. So I just popped the disk back into the server, hooked the server back up to power and we're back in business.
Lessons learned:
- BACKUPS can save your ass! DO THEM.
- Offsite backups are even better. DO THEM BOTH, CLOUD FTW [5].
- When SSH starts to fail you, power down the server. SSH is (in my experience) one of the last things that will fail on a collapsing server. The only reason I didn't lose the backup partition was that I reacted quickly enough when SSH stopped working.
- And most important: Linux is good at multitasking. You are not. ;-)
And nothing of value was lost. Except for my uptime record. I still feel sorry for it, now i'm gonna have to wait for another half a year...
Downtime: 60 minutes... Not bad for doing some other stuff while restoring backups, eh ? ;-)
1: I haven't deleted any of them, just merged them to further save space. Yes, this means I still have entire system backups from, let's say, 23 April 2010.
2: The server is completely headless - yes, without any graphics card - so I have no way of knowing what's happening to it until we get past GRUB. I have a few spare cards lying around but I was in no mood for digging through my drawers.
3: For any troubleshooters: if you get ONLY that, it means that GRUB was unable to load stage1.5.
4: Encryption and stuffz. The last thing I need is a quest for private keys. I do have them, but they're not stored in a place that's easy to get to ;-)
5: Just don't forget to encrypt your backups before uploading them. No, ROT13 doesn't count as encryption. And don't store your private key in your Gmail.
Filled under: None
July 17, 2011
IPTables MAC filtering and Rickrolling.
3 Comments >>
So, some posts back I described how to set up an unprotected ad-hoc Wi-Fi network for the purpose of simple connection sharing. I used the same setup at UNI, until someone started to use my WiFi for downloading porn [1].
So, what should I do to stop him? Lock the WiFi? But that's no fun! Run a sniffer and send him a screenshot of his Facebook chat history and profile settings? Already done, and it's far more fun to watch when done in person. What else? Rickrolling! Yes! A bit chewed, abused and old, but it's the first thing that crossed my mind. [2]
Yes, let's do it! Since we've already set up our computer as the gateway to the internet, we can do pretty much anything with the traffic that passes through us. Drop, edit, etc... The same effect can be achieved even if we're not actually the gateway to the internets, via ... ARP spoofing, for example.
So we download a webserver (we're going to host a copy of the video ourselves, since we're going to block access to the outside network), let's say Apache.
OK, we download the video [3] from YouTube in FLV format, then we grab a copy of FlowPlayer or any other Flash-based player. If we're big fans of HTML5 we can just use the HTML video tag, but at the price of less browser support.
So the plan is the folowing:
If a computer from allowed MAC addresses (my netbook) wants to access the internets
    its traffic is passed through
else
    if it's TCP & port 80 (HTTP)
        rewrite destination address to our IP
    else if it's DNS traffic
        pass it through
    else
        drop it
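The same plan as a tiny decision function, just to make the order of the checks explicit. This is only an illustration of the logic (the real work is done by iptables further down); the function name and the returned strings are made up, and DNS is assumed to mean UDP port 53:

```python
# Whitelisted MAC addresses, lower-cased for comparison.
ALLOWED_MACS = {"00:25:a3:78:bc:3d"}


def decide(src_mac, protocol, dst_port):
    """Return 'pass', 'redirect' or 'drop' for a packet, following the plan above."""
    if src_mac.lower() in ALLOWED_MACS:
        return "pass"      # whitelisted machine: leave its traffic alone
    if protocol == "tcp" and dst_port == 80:
        return "redirect"  # HTTP: send it to our own webserver
    if protocol == "udp" and dst_port == 53:
        return "pass"      # DNS has to keep working
    return "drop"          # everything else goes nowhere


print(decide("00:25:A3:78:BC:3D", "tcp", 443))
print(decide("de:ad:be:ef:00:01", "tcp", 80))
print(decide("de:ad:be:ef:00:01", "tcp", 443))
```

The order matters: the whitelist check has to come first, otherwise the netbook gets rickrolled too.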
So a request for http://www.facebook.com/troll/spam/eggs/ will be routed to http://127.0.0.1/troll/spam/eggs/ transparently. A simple .htaccess file will help us fix problems with any 404s we might hit along the way.
ErrorDocument 404 /index.html
index.html is of course our mashup of Flash, HTML, FLV video and Rick Astley. It looks like this (the slashes before file paths are important: not flow... but /flow...):
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<script type="text/javascript" src="/flowplayer-3.2.6.min.js"></script>
<title></title>
</head><body style="background: #adb2d8">
<center style="font-family: Verdana">
<br />
<a href="/rickroll.flv" style="display:block;width:648px;height:430px;border:1px solid #00f" id="player"></a>
<script>
flowplayer("player", "/flowplayer-3.2.7.swf",{
plugins: { controls: null },
play: true
});
</script>
<div style="color: #005; padding-top: 20px; width: 648px;">
<b>The internets:</b><br />
<span style="font-size: 1.3em; font-weight: bold;">Serious business.</span>
</div>
</center>
</body></html>
Then comes the IPTables part. First, we're going to let known "nice" MAC addresses through. Then, we're going to redirect TCP:80 (HTTP) to our router itself. It can also be any other site on the internets, but it must respond well to a different Host: header than expected. This means that, no, shared hosting is out of the question.
We're going to block any access to the outside network. Anything that isn't TCP:80 or UDP:53 (DNS) ain't coming through. DNS must still be kept functional, otherwise no client machine will be able to set up any connection anywhere. We could also set up a local DNS server and drop outside DNS altogether... But how do we drop IPv4 traffic in the PREROUTING chain of the nat table? -j DROP doesn't work there. Redirecting to a nonexistent IP address does, and it has the same effect.
It looks like this (replace wlan0 and the MAC address with the interface of your internal network and your own MAC):
# Allow traffic from whitelisted MACs to pass through.
# One of those per each allowed MAC.
iptables -t nat -A PREROUTING -i wlan0 -m mac --mac-source 00:25:A3:78:BC:3D -j ACCEPT
# Redirecting TCP:80 to router loopback (127.0.0.1)
iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j REDIRECT
# Allowing DNS traffic to google public DNS ONLY.
iptables -t nat -A PREROUTING -i wlan0 -d 8.8.4.4 -p udp --dport 53 -j ACCEPT
# Redirect any other traffic to an invalid address, effectively dropping it.
# The only way an attacker could get unobstructed access to the internet would be
# to use IP over DNS (NSTX), or to take the MAC of any computer on the whitelist.
# Both choices bring their own set of new problems.
iptables -t nat -A PREROUTING -i wlan0 -j DNAT --to-dest 0.0.0.0
And we're done. Just cat /var/log/httpd/access.log every once in a while and laugh at people's confusion. The end result looks something like that. The WWW pack can be downloaded here.
Just to get one thing straight. Throughout this post I used the word "router" a lot. By router I mean any device that passes packets between networks. It's probably your laptop, if your laptop provides access to the internet.
Disclaimer: You may rickroll yourself during setup. The author did this to himself 2 times while writing this post.
1: Among other things.
2: And not yet executed on a grand scale. At least by me.
3: I seriously hope you didn't click on that link.
Filled under: None
July 15, 2011
CleverBot vs CleverBot showdown.
0 Comments >>
So, for all of you who haven't yet heard of CleverBot, it is an A.I. chat bot. It claims to be pretty clever (as it's learning from real people), so I was wondering how it would end up if two CleverBots were talking to each other.
However, the site did everything it could to make automated interaction with the bot as much of a pain in the ass as possible. But hey, I'm a geek, why should this stop me? So I fired up Firefox, Firebug and Kate and started to reverse-engineer their JavaScript-powered front-end.
The problem is that you can't just emulate a browser at the HTTP level and send everything as a POST, as you actually have to run some JavaScript doing some arithmetic magic and send the result back to the server. Nice.
So after trying to figure out what the fuck they expect from me, I gave up and started to fuzz the shit out of that code. I learned that I had wasted 2 hours trying to figure out what it does, since ... Well, its output was almost always the same. Hell, it could be emulated with 1 line of PHP code. Neato.
So, I got 50 lines of code for the communication class and... Well, about 10 lines for the part that actually makes two bots talk to each other. Great, now what? People have tried to troll CleverBot for years, pumping so much bullshit into it, so ... It would be awesome if CleverBot actually managed to troll itself.
So I created a thread [1] on /g/, published the code and almost fell off my chair as I saw how CleverBot can actually sometimes produce meaningful results, sometimes simply hilarious ones, or even naturally end up "singing" parts of a song when talking to itself. Oh, and it proved once and for all that a computer can have cybersex [2] with another computer (or itself). Note that this was exclusively bot-to-bot communication without any human interaction.
So, for all of you who've actually missed the fun, I've set up a site where two CleverBots talk to each other. It's over here. As someone dug it out and posted it on 4chan, it got almost completely flooded, so it might run a bit slow. Nevertheless, if you ever wondered how it would look if an AI talked to itself, here is your chance. Or you can just laugh and browse/search the logs.
http://n00bz.pwnz.org/cbot2cbot/index.php
Well, if you want to, you can also download the source class... It's no big deal, but it'll save you some hours of reversing the CleverBot JavaScript. Click me softly.
But for the end, here is a quote from Wikipedia:
Cleverbot differs from traditional chatterbots in that the user is not holding a conversation with a bot that directly responds to entered text. Instead, when the user enters text, the algorithm selects previously entered phrases from its database of prior conversations. It has been claimed that "talking to Cleverbot is a little like talking with the collective community of the internet".
http://en.wikipedia.org/wiki/Cleverbot
So based on the results it was returning on /g/ and in the logs of my site ... Well, I guess that pretty much sums the internet up. For example this. [2][3]
EDIT: Wow, 500 downloads in two days?
EDIT2: Search over the logs online.
EDIT3: It took 48 hours for some mods at netgexupdate to rip it off. Oh, internet. I'm neither surprised nor mad, I just find it funny.
1: Someone actually thought it was funny to archive it.
2: Seriously, what sick fuck taught it that?
3: I have to repeat, see 2.
Filled under: None
July 12, 2011
Automatically generating tag cloud from text using PHP.
4 Comments >>
This is a part of a post from the old blog. It was written as an introduction to "Rickrolling on open-wifi'n'stuffz", which should come soon.
I guess it's no big secret that I'm an inherently lazy person. And the content on this blog will have to get organised sooner or later. So, how to organize it with the least amount of effort? Categories? No! Plus, they provide no SEO benefit. I'll still implement them, as it is a tested method. But I'm still searching for something better.
Tags? Yes, tags! They are the future! Easy, you can add them to virtually anything you can think of, plus you can use them directly to help search engines crawl your site. Awesome, right?
It's just that they're more work than categories, as you actually have to think about what the appropriate tags should be. As my laziness duty calls, I started to wonder if there is any way the computer could do this for me. It doesn't have to guess right in 99.999% of cases; just throwing some basic keywords out of a given text would be nice. I can add or remove a few keywords later on...
So, a simple algorithm counting the number of occurrences of each word would do the trick. Apart from filtering out a few conjunctions, for example. Or adverbs... Basically, words that don't contribute much to the content. This can be easily solved using a dictionary.
Well, let's count some words... This snippet should do the trick...
function gentags($text, $minlen = 2, $threshold = 2, $maxwords = 25)
{
    // First, some cleanup: strip HTML, lowercase, drop punctuation, collapse whitespace.
    $text = strtolower(strip_tags($text));
    $text = preg_replace(array("/[^a-zA-Z0-9\s]/", "/\s\s+/"), array("", " "), str_replace("\n", " ", $text));
    // Include the "forbidden" words, one per line. Newlines are stripped and empty
    // lines skipped (an empty prefix would otherwise match every word).
    $forbidden = file("ignore.words", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    // Count zee words!
    $wres = array_count_values(str_word_count($text, 1));
    // And now weed out the rats.
    $words = array();
    foreach ($wres as $word => $occurance)
    {
        if (strlen($word) <= $minlen) continue;
        // Look if it's a forbidden word. Actually the ignore words should rather be
        // called ignored prefixes. But no matter.
        foreach ($forbidden as $fword)
        {
            if (strlen($fword) > strlen($word)) continue;
            if (substr($word, 0, strlen($fword)) == $fword) continue 2;
        }
        if ($occurance > $threshold) $words[$word] = $occurance;
    }
    arsort($words);
    // Returned array is sorted key (word) => value (occurrence count)
    // and limited to the first $maxwords words.
    return array_slice($words, 0, $maxwords, true);
}
OK, so let's see it in practice. For the previous post, the keywords returned are:
email (occured: 8 times)
facebook (occured: 6 times)
google (occured: 5 times)
circle (occured: 4 times)
update (occured: 4 times)
send (occured: 3 times)
make (occured: 3 times)
post (occured: 3 times)
allow (occured: 3 times)
status (occured: 3 times)
OK, it guessed pretty well. How about the one before that?
code (occured: 4 times)
blog (occured: 4 times)
some (occured: 3 times)
still (occured: 3 times)
new (occured: 3 times)
not (occured: 3 times)
codebase (occured: 3 times)
complete (occured: 3 times)
OK, should do the trick. And now that we have done most of the heavy lifting, we can also have a little bit of fun and draw a tag cloud. Why not? :-)
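function gen_tag_cloud($tags, $min_size = 10, $max_size = 30)
{
    // $tags must be sorted by occurrence, highest first (as returned by gentags()).
    // $min_size and $max_size are font sizes in px. The defaults are assumed values,
    // as the original snippet never defined them.
    if (empty($tags)) return "";
    // Get the values of the most and the least occurring word.
    $tmparr = $tags;
    $valmax = array_shift($tmparr);
    $valmin = count($tmparr) ? array_pop($tmparr) : $valmax;
    // Some mumbo-jumbo, just so that tags don't become too big or too small.
    $spread = $valmax - $valmin;
    if ($spread == 0) $spread = 1; // http://goo.gl/yfeVG
    $step = ($max_size - $min_size) / ($spread);
    // Shuffle the order of the words while keeping their counts.
    $keys = array_keys($tags);
    shuffle($keys);
    $tags = array_merge(array_flip($keys), $tags);
    $src = "";
    foreach ($tags as $word => $value)
    {
        $fsize = round($min_size + (($value - $valmin) * $step));
        $src .= '<span style="font-size: '.$fsize.'px;">'.$word.'</span> ';
    }
    return $src;
}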
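To make the font-size math less of a mumbo-jumbo, here is the same linear scaling worked through with a few counts from the first keyword list. It's a sketch in Python rather than PHP, and the 10px and 30px limits are assumed values, not something the original code specifies:

```python
def font_sizes(tags, min_size=10, max_size=30):
    """Map word -> count to word -> font size in px, scaled linearly."""
    counts = tags.values()
    lowest, highest = min(counts), max(counts)
    spread = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    step = (max_size - min_size) / spread
    return {word: round(min_size + (count - lowest) * step) for word, count in tags.items()}


print(font_sizes({"email": 8, "facebook": 6, "send": 3}))
```

With counts from 3 to 8 the spread is 5, so each extra occurrence is worth 4px: "send" stays at 10px, "facebook" gets 22px and "email" tops out at 30px.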
The snippets and a demo are available here. It works well only in English. Try some Slavic languages (for example Slovene or Croatian) and it'll fail miserably.
Edit: updated the code to use str_word_count as suggested by Bergi.
Filled under: None
July 11, 2011
Posting from Google+ (plus) directly to Facebook wall
0 Comments >>
So, let's say I simply couldn't resist all the buzz around Google+, so I got myself an invite and checked it out. I'm actually quite surprised (in a positive way) by how they pulled it off; if it takes off, I'm giving FB the boot. I really like the circle stuff (which most of the media buzz has been about anyway), as I have a few people on Facebook who really shouldn't view every post I make. So I usually just don't make such posts.
But I came across an interesting feature that allows me to make status updates from Google+ that are instantly visible on Facebook as well.
Facebook allows users to update their status by sending an e-mail to a user-specific e-mail address. Yup, and you've guessed it: Google+ allows you to put people in circles who aren't actually present on the site. So Google will just send them an e-mail.
So, here's how to set it up:
- Log in to Facebook, and then go here.
- Scroll down a bit, you should see "Upload via e-mail". There you'll find the e-mail address to which you can send, well, e-mails that cause an update on your wall. It's in the form dialogXXXXXX@m.facebook.com
- Create a new circle in Google+ and add only that e-mail address as a contact. Just paste the address in.
- Everything you now share with this circle is also posted on Facebook.
Although it's a nice feature, it does have some drawbacks. Currently a status update must be 50 characters or less. Anything more will be truncated by Facebook. Which kinda sucks, as you have even less space than on Twitter, but hey, it's better than nothing, isn't it? :-)
Oh, almost forgot. Anyone need an invite?
Filed under: None
July 9, 2011
Knuplez is back and this blog with him!
7 Comments >>
Knuplez iz back!
First thing: I'm no longer writing in Slovene. Let's try something different, shall we? I'd been spamming the internet in Slovene for three years and it ended rather horribly: an almost-working codebase, completely inconsistent content that looked like 4chan had paid a visit, etc. So I consider myself a bit more grown-up [1] than when I first started this blog as a 15-year-old kid. This of course means I won't publish a new post every 5 minutes, as I'm no good-content factory. :-)
Oh, did you notice it's a completely new design? Yes, it is! Minimalistic, as it should be. Well, at least I always wanted a minimalistic design that didn't look like complete crap. I'll be honest though, it's not entirely mine; I just took a basic setup and attached a shitload of CSS.
I can haz new blag engine too. nBlog2, and as said in my TO-DO, its codebase is reusable. I actually went as far as writing my own tiny URL routing "framework" (a bit similar to the way CodeIgniter handles things, but I don't want to carry the CI baggage with it [2]; the framework itself is only 50 lines of code) and built the blog on top of it. So I got an extensible codebase, plus a lot of code that can be reused in other projects. Surprisingly, the code is even smaller than the original nBlog's, yet offers almost the same features. Who knows, maybe even a CMS will come out of this.
The blog is not complete yet. No categories, for instance. I'll crunch that code together when I get some time; I still have some exams at UNI to complete. It should be just a few tens of lines, but still... And I've gotta buy myself a real domain. :)
Over the next few weeks I'll probably translate some old blog posts into English and reblog them here. With the correct timestamps, that is.
1: If nothing else, I've at least learned that Dire Straits is better than any other rock band I've heard so far. That counts for something, doesn't it?
2: Not that CI is bloated. It's just not minimalistic enough for my taste.
Filed under: None
June 16, 2011
Setting up ad-hoc wireless internet sharing on linux
0 Comments >>
So, here in my dorm room we enjoy a fast 100 Mbit internet connection (per student, not shared). Great, but you can't actually pump that much data through wireless, so they simply didn't build the infrastructure for it.
As I usually have a laptop and a netbook with me here at UNI and use both, I somehow have to connect both to the internet. I'm too lazy to carry a switch with me, so I just connected the netbook to the wired connection, created an unprotected ad-hoc wireless network and used my laptop's Wi-Fi to connect through it. Why ad-hoc? It's universal; pretty much all chipsets support it.
I won't dig into the technical differences between ad-hoc and infrastructure wireless networks here. For simple uses such as internet sharing, copying some files or a small-scale LAN party, it doesn't matter.
Setting up internet connection sharing over an ad-hoc network is actually pretty simple if you're not scared of the terminal. Although I believe it can also be done easily in Ubuntu, I was never much of a fan of Network Manager [1] and prefer to do it my own way. We need to install dnsmasq first. We can install it in the following ways (for other distributions, take a look at the manual):
Ubuntu:
sudo apt-get install dnsmasq
Arch Linux:
pacman -S dnsmasq
So, now we have dnsmasq. Great, let's set it up. Open up a text editor (as root) and edit the file /etc/dnsmasq.conf. You don't need a lot, just the following lines:
interface=wlan0
dhcp-range=10.13.37.50,10.13.37.150,255.255.255.0,12h
dhcp-option=6,8.8.4.4,8.8.8.8
This will set the DHCP/DNS server to hand out IP addresses in the range 10.13.37.50-150, with a /24 netmask (255.255.255.0) and a 12-hour lease. For DNS (DHCP option 6) we'll use Google Public DNS. Now comes setting up the wireless network. First we have to bring the interface down, configure it, and bring it back up. As root, of course.
# Bring the wireless interface down
ifconfig wlan0 down
# Set it as an ad-hoc with SSID of "FooCorp"
# on channel 11 (warning, channel number matters!)
iwconfig wlan0 mode ad-hoc
iwconfig wlan0 essid "FooCorp"
iwconfig wlan0 channel 11
# Bring it back up with /24 private network
ifconfig wlan0 up 10.13.37.1/24
Wait a few seconds for it to come up. Afterwards you can run the iwconfig command and see if it worked. The output should look something like this:
wlan0 IEEE 802.11bgn ESSID:"FooCorp"
Mode:Ad-Hoc Frequency:2.462 GHz Cell: 1E:97:D2:22:FA:92
Tx-Power=14 dBm
Retry long limit:7 RTS thr:off Fragment thr:off
Encryption key:off
Power Management:on
Pay attention to the ESSID and Cell values. The general rule of thumb is that ESSID must show the ESSID you supplied and the first value in Cell must not be 00. If that's not the case, the culprit might be NetworkManager. If you're using Ubuntu, you're using it. Try to disable it with a simple command executed as root:
stop network-manager
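If you'd rather not eyeball the iwconfig output every time, the rule of thumb above can be scripted. This is only a minimal sketch: check_adhoc is a made-up helper name (it is not part of wireless-tools), and it assumes the output format shown above.

```shell
#!/bin/sh
# Sketch: sanity-check iwconfig output for an ad-hoc network.
# check_adhoc is a hypothetical helper, not part of wireless-tools.
# Usage: iwconfig wlan0 | check_adhoc "FooCorp"
check_adhoc() {
    expected_essid=$1
    output=$(cat)   # read the iwconfig output from stdin

    # Pull out the ESSID between the quotes, e.g. ESSID:"FooCorp"
    essid=$(printf '%s\n' "$output" | sed -n 's/.*ESSID:"\([^"]*\)".*/\1/p' | head -n 1)
    # Pull out the first octet of the cell address, e.g. Cell: 1E:97:...
    cell_first=$(printf '%s\n' "$output" | sed -n 's/.*Cell: *\([0-9A-Fa-f][0-9A-Fa-f]\):.*/\1/p' | head -n 1)

    if [ "$essid" != "$expected_essid" ]; then
        echo "ESSID is '$essid', expected '$expected_essid'"
        return 1
    fi
    if [ -z "$cell_first" ] || [ "$cell_first" = "00" ]; then
        echo "Cell is not set (first octet: '${cell_first:-none}')"
        return 1
    fi
    echo "ad-hoc network '$essid' looks up"
}
```

The function returns non-zero when either check fails, so it can be used in a loop that waits for the interface to settle.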
Now we must start up dnsmasq and configure iptables to enable network forwarding (eth0 here is the wired interface). Easy stuff.
Ubuntu:
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
echo 1 >/proc/sys/net/ipv4/ip_forward
/etc/init.d/dnsmasq restart
Arch Linux:
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
echo 1 >/proc/sys/net/ipv4/ip_forward
/etc/rc.d/dnsmasq start
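The NAT and forwarding steps are the same on both distributions, so they can be wrapped in a small script. What follows is a sketch under a few assumptions: run and setup_router are hypothetical helper names, DRY_RUN is a made-up switch that prints the commands instead of running them, and sysctl -w net.ipv4.ip_forward=1 is used in place of the echo into /proc (it sets the same kernel setting).

```shell
#!/bin/sh
# Sketch: the router-side NAT and forwarding steps in one place.
# run() and setup_router() are hypothetical helpers; DRY_RUN is an
# assumed switch that prints the commands instead of executing them.
WAN=${WAN:-eth0}   # wired uplink, as in the text

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_router() {
    # NAT everything that leaves through the wired interface
    run iptables -t nat -A POSTROUTING -o "$WAN" -j MASQUERADE
    # Let the kernel forward packets between interfaces
    run sysctl -w net.ipv4.ip_forward=1
    # Starting dnsmasq differs per distribution, so it is left out here
}
```

With DRY_RUN set to 1, setup_router only prints the two commands, which is a cheap way to catch a typo before touching the firewall. Without it, the function has to be called as root.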
That covers the router part. Now we must connect to it. Windows should be able to connect using the standard setup (tested on 7, at least). Linux might work as well, depending on the mood of NetworkManager. If it doesn't, there is a way to connect the computer to the network manually.
First we must shut down NetworkManager, as already shown. Then we repeat the same steps we used to configure the ad-hoc network on our router (same commands, except that the interface is brought up without the router's address: ifconfig wlan0 up). We must verify that it actually connected, so we fire up iwconfig. If the Cell values match, we're in business. Finally we fire up a DHCP client and get ourselves a shiny new private IP address using the command:
Ubuntu:
dhclient wlan0
Arch Linux:
dhcpcd -K wlan0
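To confirm that the lease really came from our dnsmasq, the address the client got should fall inside the configured dhcp-range (10.13.37.50 to 10.13.37.150). A minimal sketch of that check, with in_dhcp_range being a hypothetical helper name and the range hard-coded to the values used in this post:

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address falls inside the dhcp-range
# configured for dnsmasq (10.13.37.50 to 10.13.37.150 in this post).
# in_dhcp_range is a hypothetical helper name.
in_dhcp_range() {
    addr=$1
    case $addr in
        10.13.37.*) last=${addr#10.13.37.} ;;   # keep only the last octet
        *) return 1 ;;
    esac
    # Reject anything that is not a plain number
    case $last in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$last" -ge 50 ] && [ "$last" -le 150 ]
}
```

Feed it the address that ifconfig wlan0 reports on the client, for example in_dhcp_range 10.13.37.77 && echo "lease looks fine".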
Note that these steps can also be used when we want to create a wireless network on the go just for sharing some files or playing games. With some colleagues from UNI we've used this method to play FlatOut 2 [2] during our daily commute to UNI and back.
1: The underlying system that handles network configuration in many distributions. It always leaves a bitter taste. Ironically, I'm using it because KDE has a sexy front-end for it.
2: Yes, it runs flawlessly under Wine.
Filed under: None