Gitorious, so far…
It seems that Git is getting more and more mindshare by the day, which is great because I’m loving working with it. It’s a little over three months since I made Gitorious.org public and I’ve been having lots of fun with it since then.
In particular I’m happy about the wide range of projects there. Ruby-based projects are in the majority, including semi-official mirrors of the Rails, RSpec and MacRuby projects, but there are also some Python things such as gdb-python, a bit of Lisp, some Erlang, C/C++ and two Linux kernel mirrors. Good times.
The last two are interesting because they are some of the biggest git repositories around, yet they only take up about 200 megs worth of disk space. Heck, take all the repositories on Gitorious combined, and the cache for the web frontend of gitorious.org is still bigger than those. Neither disk nor bandwidth charges are anywhere near hurting my wallet, and it’ll stay that way for quite a while. I don’t contribute as much as I should back to most of the open-source projects I use on a day-to-day basis at the $dayjob (or otherwise), so I consider Gitorious my way of giving back. Long term I have some ideas that would allow Gitorious to give back even more (things like this are awfully inspiring), without resorting to cheap tricks such as ads all over the place.
So far, most of my focus on the Gitorious codebase has been on stability and speed (it’s really quite snappy now, I think), along with a few new features such as merge requests and searching. But soon it’s time to add some of the bigger things on the list that’ll help with managing an open source project hosted on Gitorious.
But first I want to talk about “the competition”, namely GitHub. “Competition” is in quotes because I honestly don’t see it like that (git is distributed after all), but a lot of people seem to lay it out like that whenever the two are mentioned in the same sentence. It says a lot about the workflow git presents that we both had the same idea and ran with it, only to release our respective things publicly a week or so apart.
But I find it slightly peculiar that a lot of open-source projects (ruby/rails projects in particular) have jumped on it, despite it being closed-source. What’s the point of being myspace for hackers (not that that’s a particularly flattering comparison to begin with) if I can’t hack on it? But that’s me being seemingly more idealistic about this stuff than most people. Launchpad is closed-source and seems to be doing well despite being a total mess to use, and even the Apache Foundation offers their incubator projects an option to use JIRA and/or Confluence (“The Enterprise Wiki”, which cracks me up every time). Anyway, I’m not crying about this at night, just finding it interesting. What’s really important is that more people discover the advantages of a distributed SCM such as Git, even for internal (“dayjob”) projects, regardless of whether they host their code on a third-party server, on their own using Gitosis and gitweb, on a custom Gitorious install (I hear there are a few already) or just in a plain old git repository somewhere.
I don’t want Gitorious to end up like the mess that is Launchpad, but I do think there are a few good ideas floating around, when it comes to dealing with the practicalities of running or contributing to an open source project, that are worth looking into. In particular the notion of a distributed bug tracking system is too cool to pass up, even if distributed just means that I can track bugs across projects and different repositories. Imagine Jane having cloned Bob’s project publicly and fixing that damn bug #2353. All Bob has to do now to fix the bug is to pull Jane’s changes into the mainline repository. Boom, no need to mess around with patch files.
Having the ticket system truly distributed is of course something to strive for, but I think I’ll start with a slightly less lofty target for Gitorious and use tracer bullets from there to hit the sweetspot of a ticketing system that fits git, and humans, well.
Gitorious source pushed - and a freebie!
I’ve pushed the source to Gitorious to… Gitorious! Yurii has already made a clone of it and I think he’s hacking on some SVN mirroring he needed for one of his projects. Very cool.
I’ve also added a project called Tumbline, which is an 80% done tumblelogging application I wrote during the summer when I was really unhappy about Tumblr (where I host Application Error). They have since shaped up a bit and I’ll probably continue using them, but I’ve open sourced the application in case someone wants to use it for something, rather than have it collect dust in my ~/Projects directory.
Enjoy!
Gitorious - open source project hosting
Since writing this post I’ve slowly been implementing some of the ideas of my take on a way to do open source collaboration on the repository level, based on git in particular.
I love open source, from the things being created to the concept in itself. Project forges like SourceForge and Rubyforge are great ways to publish a project and handle the infrastructure around it, such as mailing lists, bugtrackers and tarball releases.
But they’re also filled with dead projects. Some of these projects have been forked, or are actively maintained elsewhere, but most of the time you aren’t so lucky. They’re also rather centralized, in the sense that the project owner or maintainer has to actively accept patches, or hand out commit bits, in order for the repository to keep up with developments. As a project maintainer this can be hard in the long run, particularly if you’ve for some reason lost interest in the project, or are just plain too busy with other things. I know this far too well from my own open source projects.
Distributed source control provides one possible way around this: because every clone (or checkout in svn-speak) of a repository is a full-blown repository, you can just publish your updated repository instead, and if people like your stuff better they can just pull from that instead of the “mainline” repository! Likewise, a project maintainer can just pull these changes into the mainline repository to keep the project going forward and easily accept contributions.
DSCM tools like git are great at this. Since every clone is a full repository, the tool has to be extremely good at merging any commits you make when pushing upstream, hence pulling in commits from other clones works just as well. This also means that forking is not really such a big issue, because any fork can easily be pulled back in upstream (thanks to the shared commit history). In fact, forking (in the essence of the word) is the only way to work with a DSCM. Of course, the social aspects of forking, such as disagreements about project direction, are an entirely different issue that has to be dealt with on the social level.
Gitorious is a free git hosting solution I’ve built that allows anyone to create a project and, in turn, allows anyone to create a clone of that project’s mainline repository for their own contributions. The project owner, or anyone with write access to the mainline repository, can then pull these changes into the mainline repository if they like what they see. Or they can provide feedback directly on commits if they’re unsure about the approach taken, or just want to communicate something.
I’m hoping it will be useful for git users, and I’m very interested in seeing it being used and hearing people’s ideas for improvements.
I’ve got many more things I want to do with Gitorious. An improved repository browser and better ways to communicate with contributors are some of the next things on the list, but what’s there today has everything to get you started.
A quick stroll through DTrace
DTrace has been getting a lot more press recently since Apple ported it to Leopard, and it’s also been getting a lot of mentions in the Ruby community since Apple included DTrace providers for Ruby. Yet surprisingly few seem to actually use DTrace much (yours truly included, really). So here’s a short intro to DTrace and D (not to be confused with the other D).
The essence of DTrace is probes. These are event handlers that fire whenever their particular event happens. You can then register interest in these probe events with a particular action, like printing the event, aggregating usage counts, or whatever other way you decide to use this information.
Since there are over 450 000 probes in Leopard, there’s a lot of information you can gather, and the trick is to start at a high level and drill down: “hmm, why are there 800 syscalls? hmm, what function caused this? what is it writing? what did it do right before it made the call to write()?” And so one question leads to the next with DTrace.
We can get a list of all the probes currently available on our system by running dtrace -l, or drill down with the -P flag:
$ sudo dtrace -l | wc -l
454839
$ sudo dtrace -l -P syscall | head -5
ID PROVIDER MODULE FUNCTION NAME
17590 syscall syscall entry
17591 syscall syscall return
17592 syscall exit entry
17593 syscall exit return
So let’s start with asking what syscalls are currently being made by all the applications currently running (unless otherwise told to, DTrace will listen forever so finish it with ctrl+c):
$ sudo dtrace -n 'syscall:::entry{trace(execname)}'
dtrace: description 'syscall:::entry' matched 427 probes
CPU ID FUNCTION:NAME
1 17698 ioctl:entry dtrace
1 17698 ioctl:entry dtrace
1 17682 sigaction:entry dtrace
1 17682 sigaction:entry dtrace
1 17682 sigaction:entry dtrace
1 18258 __semwait_signal:entry Little Snitch U
1 17686 sigprocmask:entry WindowServer
1 17696 sigaltstack:entry WindowServer
The probes are specified in a provider:module:function:name format, with an empty field being a wildcard. So asking for all syscall function entries means asking for syscall:::entry; we could get all write syscall entries by asking for syscall::write:entry, and the returns of the write() function by asking for syscall::write:return.
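To make the matching rule concrete, here’s a rough Ruby sketch of it. This is a toy model, not how DTrace works internally: the Probe struct, parse_description and matches? are names I made up, the probe list is invented, and it insists on all four fields being present, which is a simplification.

```ruby
# Toy model of DTrace probe descriptions (provider:module:function:name).
# An empty field acts as a wildcard; shell-style globs go through File.fnmatch.
# Probe, parse_description and matches? are hypothetical names for this sketch.
Probe = Struct.new(:provider, :mod, :function, :name)

def parse_description(description)
  fields = description.split(":", -1) # -1 keeps empty fields, including trailing ones
  raise ArgumentError, "expected 4 fields" unless fields.size == 4
  fields
end

def matches?(description, probe)
  parse_description(description).zip(probe.to_a).all? do |pattern, value|
    pattern.empty? || File.fnmatch(pattern, value)
  end
end

# Made-up probe list, loosely modelled on the syscall provider.
probes = [
  Probe.new("syscall", "", "write", "entry"),
  Probe.new("syscall", "", "write", "return"),
  Probe.new("syscall", "", "stat64", "entry"),
]

selected = probes.select { |probe| matches?("syscall::write:entry", probe) }
puts selected.map { |probe| probe.to_a.join(":") }
```

With this model, "syscall:::entry" selects both entry probes, and a glob like "syscall::stat*:entry" picks up stat64 as well.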
So the above output isn’t all that useful, since it’s too much information for us puny humans to parse effectively. Luckily DTrace provides a means of aggregating things with the @[key(s)] notation, where key(s) is an arbitrary comma-separated list of D expressions and the value is an aggregating function like count(), which simply counts the number of times something happens. So to aggregate the number of syscalls by application name we can use execname:
$ sudo dtrace -n 'syscall:::entry{ @[execname] = count() }'
dtrace: description 'syscall:::entry' matched 427 probes
^C
DirectoryServic 2
Finder 2
...
WindowServer 46
launchd 48
natd 81
SystemUIServer 113
Adium 131
ruby 356
pmTool 584
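If the @[...] = count() notation looks alien, it may help to think of it as a hash of counters keyed by a tuple. Here’s a rough Ruby analogue; the event list is invented for illustration and count_by is a hypothetical helper, not anything DTrace provides.

```ruby
# Rough Ruby analogue of the D aggregation  @[key(s)] = count()
# The events are made up; DTrace would be seeing real syscalls instead.
events = [
  { execname: "ruby",   probefunc: "select" },
  { execname: "ruby",   probefunc: "__semwait_signal" },
  { execname: "ruby",   probefunc: "__semwait_signal" },
  { execname: "Finder", probefunc: "kevent" },
  { execname: "pmTool", probefunc: "__sysctl" },
  { execname: "pmTool", probefunc: "__sysctl" },
]

# Count events grouped by the given keys; each hash key is a tuple (array).
def count_by(events, *keys)
  events.each_with_object(Hash.new(0)) do |event, counts|
    counts[event.values_at(*keys)] += 1
  end
end

by_app = count_by(events, :execname)

# Print in ascending order of count, similar to dtrace's summary on ctrl+c.
by_app.sort_by { |_key, count| count }.each do |key, count|
  puts format("%-20s %d", key.join(" "), count)
end
```

Passing more keys, as in count_by(events, :execname, :probefunc), gives the multi-key form of the same aggregation.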
We can even expand this to see what probe function is being called using the probefunc expression:
$ sudo dtrace -n 'syscall:::entry{ @[execname, probefunc] = count() }'
dtrace: description 'syscall:::entry' matched 427 probes
^C
Finder kevent 1
Safari gettimeofday 1
Terminal mmap 1
...
ruby select 10
dtrace ioctl 14
WindowServer sigaltstack 15
WindowServer sigprocmask 15
ruby __semwait_signal 141
pmTool __sysctl 291
Ruby seems to be waiting on a semaphore in a thread, so let’s take a look at its current stack trace. We can do this by specifying a predicate for our probe; think of it as a conditional. So, by only registering interest in a probe if the execname == "ruby" predicate is met, we print the stack:
$ sudo dtrace -n 'syscall:::entry/execname == "ruby"/{ ustack() }'
dtrace: description 'syscall:::entry' matched 427 probes
CPU ID FUNCTION:NAME
0 18258 __semwait_signal:entry
libSystem.B.dylib`__semwait_signal+0xa
libruby.1.dylib`rb_thread_group+0x29f
libSystem.B.dylib`_pthread_start+0x141
libSystem.B.dylib`thread_start+0x22
0 18258 __semwait_signal:entry
libSystem.B.dylib`__semwait_signal+0xa
libruby.1.dylib`rb_thread_group+0x29f
libSystem.B.dylib`_pthread_start+0x141
libSystem.B.dylib`thread_start+0x22
Yep, looks like an rb_thread all right. And that makes perfect sense since I had a mongrel running there in the background.
Let’s take a look at what Ruby providers are available (you need a running ruby process to see this):
$ sudo dtrace -l -P "ruby*"
ID PROVIDER MODULE FUNCTION NAME
19708 ruby48398 libruby.1.dylib rb_call0 function-entry
19709 ruby48398 libruby.1.dylib rb_call0 function-return
19710 ruby48398 libruby.1.dylib garbage_collect gc-begin
19711 ruby48398 libruby.1.dylib garbage_collect gc-end
19712 ruby48398 libruby.1.dylib rb_eval line
19713 ruby48398 libruby.1.dylib rb_obj_alloc object-create-done
19714 ruby48398 libruby.1.dylib rb_obj_alloc object-create-start
19715 ruby48398 libruby.1.dylib garbage_collect object-free
19716 ruby48398 libruby.1.dylib rb_longjmp raise
19717 ruby48398 libruby.1.dylib rb_eval rescue
19718 ruby48398 libruby.1.dylib ruby_dtrace_probe ruby-probe
We wildcard the ruby provider name since providers are per process (the 48398 part is the PID). That’s cool if you’re running more than one ruby process, since you can poke around figuring out why one is eating CPU and the other isn’t (Here’s an explanation of the Ruby probes). Let’s see which method calls are used the most in a typical Rails request:
$ sudo dtrace -n 'ruby*:::function-entry{ @[copyinstr(arg0), copyinstr(arg1)] = count() }'
dtrace: description 'ruby*:::function-entry' matched 1 probe
^C
...
Array pop 24
File::Stat size 24
Inflector inflections 24
Inflector inflections_without_route_reloading 24
...
Hash [] 557
Hash []= 623
String to_s 723
Hash key? 4379
Here we make the aggregation keys out of the class and the method name, which are passed as arg0 and arg1. args[] is an array of arguments for the probe and argN is a shortcut into that array. In this case the arguments are whatever the probe defines them to be (class and method name; arg2 and arg3 are source file and line number), but they could also be the arguments of a function call. copyinstr() simply means “make a string out of this pointer reference”.
Back to poking around. Hash lookups and String#to_s aren’t all that interesting for us right now, but I’m kinda curious about what it is stat()’ing 24 times per request. Let’s try and find out:
$ sudo dtrace -n 'ruby*:::function-entry/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) == "size"/{ ustack() }'
dtrace: description 'ruby*:::function-entry' matched 1 probe
CPU ID FUNCTION:NAME
0 19708 rb_call0:function-entry
libruby.1.dylib`rb_eval_string_wrap+0x43f9
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x34ba
libruby.1.dylib`rb_eval_string_wrap+0x1f53
libruby.1.dylib`rb_eval_string_wrap+0x2dbb
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x149d
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_thread_trap_eval+0x959
libruby.1.dylib`rb_yield+0x21
libruby.1.dylib`rb_ary_each+0x1e
libruby.1.dylib`rb_eval_string_wrap+0x455f
0 19708 rb_call0:function-entry
libruby.1.dylib`rb_eval_string_wrap+0x43f9
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x1f53
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x149d
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_eval_string_wrap+0x4d65
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
libruby.1.dylib`rb_thread_trap_eval+0x959
libruby.1.dylib`rb_yield+0x21
libruby.1.dylib`rb_ary_each+0x1e
libruby.1.dylib`rb_eval_string_wrap+0x455f
libruby.1.dylib`rb_eval_string_wrap+0x5173
libruby.1.dylib`rb_eval_string_wrap+0x23ee
24
By adding a predicate for our target class and method we get only what we’re interested in, and print the stack using ustack(). Unfortunately, this being Ruby, the stack isn’t all that useful to us, since it’s pretty much all rb_eval-inner-ruby-runtime-here-be-dragons stuff (I would love a ustack helper for Ruby, like there is for Python). I wonder which file it’s doing this in though?
$ sudo dtrace -n 'ruby*:::function-entry/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) =="size"/{ printf("%s in %s", copyinstr(arg0),copyinstr(arg2))}'
dtrace: description 'ruby*:::function-entry' matched 1 probe
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
# (output slightly truncated)
OK, so that tells us where this is happening, but not what it’s stat()’ing. Luckily File::Stat sounds like something that might be doing a syscall, and we have probes for that. Here’s a script that matches up the ruby function-entry with syscalls made at the same time:
#!/usr/sbin/dtrace -s
#pragma D option quiet

ruby*:::function-entry
/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) == "size"/
{
  self->interested = 1;
  self->rubymethod = copyinstr(arg1);
  self->rubyclass = copyinstr(arg0);
}

syscall::stat*:entry
/self->interested/
{
  printf("%s from %s#%s\n", copyinstr(arg0), self->rubyclass, self->rubymethod);
}
By setting the variable interested whenever we’re in the function-entry we care about, we can use that variable as a predicate for our syscall::stat*:entry probe (stat* is wildcarded because there are things like stat64() as well). Making it executable and running it, we see:
$ chmod +x who_be_stattin.d
$ sudo ./who_be_stattin.d
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
Aha! It must be the Rails mongrel handler that checks the size of the asset files, before it sends them down the wire (it’s from a local ./script/console). Not so interesting after all, but at least we learnt a bit along the way.
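The flag-then-predicate trick is easier to see outside of D. Here’s a toy Ruby replay of the same logic over an invented event stream: the events, the :tid field and the replay helper are all hypothetical, and the per-thread hash only stands in for the self-> variables, which DTrace keeps per thread.

```ruby
# Toy replay of the who_be_stattin.d logic over a made-up event stream.
# Each event carries a thread id, mirroring how self-> variables are per thread.
def replay(events)
  thread_state = Hash.new { |states, tid| states[tid] = {} }
  report = []
  events.each do |event|
    state = thread_state[event[:tid]]
    case event[:kind]
    when :function_entry
      # Same predicate as the ruby*:::function-entry clause.
      next unless event[:klass] == "File::Stat" && event[:meth] == "size"
      state[:interested] = true
      state[:rubyclass] = event[:klass]
      state[:rubymethod] = event[:meth]
    when :syscall_entry
      # Same predicate as syscall::stat*:entry /self->interested/.
      next unless event[:func].start_with?("stat") && state[:interested]
      report << "#{event[:path]} from #{state[:rubyclass]}##{state[:rubymethod]}"
    end
  end
  report
end

events = [
  { tid: 1, kind: :function_entry, klass: "Hash", meth: "[]" },
  { tid: 1, kind: :syscall_entry, func: "stat64", path: "/tmp/ignored" },
  { tid: 2, kind: :function_entry, klass: "File::Stat", meth: "size" },
  { tid: 2, kind: :syscall_entry, func: "stat64", path: "public/favicon.ico" },
  { tid: 2, kind: :function_entry, klass: "String", meth: "to_s" },
  { tid: 2, kind: :syscall_entry, func: "stat", path: "public/robots.txt" },
]

puts replay(events)
```

Note that the last stat is reported too: like the D script, the sketch never clears the flag, so once a thread has entered File::Stat#size every later stat call on that thread is attributed to it.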
Remember the ruby providers from up there? The ruby-probe one? That one is basically a plugin that lets you fire your very own probes in your app, using the (Apple shipped) DTracer class:
>> def with_my_probe &blk
>> DTracer.fire("my-probe-entry")
>> yield
>> DTracer.fire("my-probe-return")
>> end
=> nil
>> with_my_probe{ puts "moo" }
$ cat probe_my_probe.d
#!/usr/sbin/dtrace -s
ruby*:::ruby-probe
/copyinstr(arg0) == "my-probe-entry"/
{
self->interested = 1;
}
syscall:::
/self->interested/
{
/* default action is to just print it */
}
ruby*:::function-entry
/self->interested/
{
printf("%s in %s", copyinstr(arg1), copyinstr(arg0));
}
ruby*:::ruby-probe
/copyinstr(arg0) == "my-probe-return"/
{
self->interested = 0;
}
$ sudo ./probe_my_probe.d
dtrace: script './ruby_probe_test.d' matched 860 probes
CPU ID FUNCTION:NAME
1 454800 rb_call0:function-entry puts in Object
1 454800 rb_call0:function-entry write in IO
1 17598 write:entry
1 17599 write:return
1 454800 rb_call0:function-entry write in IO
1 17598 write:entry
1 17599 write:return
1 454800 rb_call0:function-entry fire in Module
# (While running 'with_my_probe{ puts "foo" }' in irb)
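One thing worth considering with a wrapper like with_my_probe: if the block raises, the return probe never fires, and the D script above would keep self->interested set. Here’s a sketch of a variant that uses ensure. The DTracer stub is purely hypothetical, there only so the example runs on a Ruby without Apple’s DTracer class, and I’m assuming the single-argument fire call used above.

```ruby
# Fallback stub so the sketch runs on a Ruby without Apple's DTracer class.
# The stub and its `fired` list are hypothetical, for illustration only.
unless defined?(DTracer)
  module DTracer
    def self.fired
      @fired ||= []
    end

    def self.fire(name)
      fired << name
    end
  end
end

def with_my_probe
  DTracer.fire("my-probe-entry")
  yield
ensure
  # Always fire the return probe, even if the block raises.
  DTracer.fire("my-probe-return")
end

with_my_probe { puts "moo" }

begin
  with_my_probe { raise "boom" }
rescue RuntimeError
  # The return probe has still fired at this point.
end
```

The exception still propagates to the caller; ensure only guarantees that the entry and return probes come in pairs.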
But more on that later. In the meantime do go off exploring your OS and applications with DTrace, you’d be surprised how quickly you can lose an hour or two just by asking “why is that doing this here?”...
CouchDb views in Ruby instead of Javascript
I’ve just pushed CouchObject 0.5 out to the rubyforge mirrors, here’s the History.txt file:
== 0.5.0 2007-09-15
* 2 major enhancements:
* Database.filter{|doc| } for filtering the on doc, in Ruby!
* couch_ruby_view_requestor, a JsServer client for CouchDb allowing you to query in Ruby
* 1 minor enhancement:
* Added Database#store(document), the parallel of Document#save(database)
Those two major enhancements are a result of my laziness as reported at the end of the last post, because now you can query your CouchDb views in Ruby instead of Javascript:
$ irb -rubygems
>> require 'couch_object'
=> true
>> db = CouchObject::Database.open "http://localhost:8888/foo"
=> #<CouchObject::Database:0x142d4d8 ...>
>> pp db.post("_temp_view", "proc{ |doc| return doc if doc[\"foo\"] =~ /qux/ }")
#<CouchObject::Response:0x13f8d50
@parsed_body=
{"rows"=>
[{"_rev"=>189832163,
"_id"=>"96193CD461168BD024B64EA367C1E0BF",
"value"=>
{"_id"=>"96193CD461168BD024B64EA367C1E0BF",
"_rev"=>189832163,
"foo"=>"qux"}}],
"offset"=>0,
"total_rows"=>1,
"view"=>"_temp_view:proc{ |doc| return doc if doc[\"foo\"] =~ /qux/ }"},
@response=#<Net::HTTPOK 200 OK readbody=true>>
Boom. The rows key there is our matching document with an attribute of foo that matches /qux/.
You just pass in anything that responds to a #call(the_couch_document) when you define your view request.
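To illustrate what “anything that responds to #call” can look like, here’s a small in-memory sketch with no CouchDb or CouchObject involved. run_view and FooMatcher are made-up names, and the documents are invented. I’m using lambda with an implicit return value rather than proc with return, since on current Rubies a return inside a plain proc called like this raises LocalJumpError.

```ruby
# In-memory stand-in for a view run; no CouchDb or CouchObject involved.
docs = [
  { "_id" => "a", "foo" => "qux" },
  { "_id" => "b", "foo" => "bar" },
  { "_id" => "c" },
]

# Hypothetical helper: keep whatever the view returns for each document.
def run_view(docs, view)
  docs.map { |doc| view.call(doc) }.compact
end

# 1. A lambda.
as_lambda = lambda { |doc| doc if doc["foo"].to_s =~ /qux/ }

# 2. Any object with a #call method.
class FooMatcher
  def initialize(pattern)
    @pattern = pattern
  end

  def call(doc)
    doc if doc["foo"].to_s =~ @pattern
  end
end

# 3. A string of Ruby source, evaluated into a callable. Only do this with
#    strings you trust; eval runs arbitrary code.
from_string = eval('lambda { |doc| doc if doc["foo"].to_s =~ /qux/ }')

[as_lambda, FooMatcher.new(/qux/), from_string].each do |view|
  puts run_view(docs, view).inspect
end
```

All three forms select the same single document here; which one is nicest to pass around is mostly a matter of taste.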
But passing around strings will make your eyes sore, so let’s just do this in pure Ruby:
>> pp db.filter{ |doc| return doc if doc["foo"] == "qux" }
[{"_rev"=>189832163,
"_id"=>"96193CD461168BD024B64EA367C1E0BF",
"value"=>
{"_id"=>"96193CD461168BD024B64EA367C1E0BF",
"_rev"=>189832163,
"foo"=>"qux"}}]
Thanks to a bit of RubyToRuby we can send along the block to CouchDb just fine.
But how is this all done on the CouchDb side of things? It’s actually a whole lot easier than it looks: all CouchDb does when it receives a view query like the above is pass it on to whatever is defined as the JsServer in $COUCH_INSTALL/couch.ini. This is normally SpiderMonkey, but with the CouchObject gem installed it can be Ruby!
# ...
# You need full, or relative to couch install dir, paths for now
JsServer=/opt/local/bin/couch_ruby_view_requestor
# ...
So have a go at it:
$ sudo gem install couchobject
Report issues at the tracker, or check out the Git source and have a play with it:
$ git clone git://repo.or.cz/couchobject.git
CouchObject released!
CouchObject 0.0.1 is out, fresh from the sofa. Sit down, relax and read the RDoc.
$ sudo gem install couchobject
Since the last time I’ve taken it in a slightly different direction, focusing more on getting the basics up and running. You see, I’ve realised that CouchDb isn’t really perfect as a general OODB store (though nothing is stopping you from storing an object’s attributes in CouchDb; the Persistable module still does that). I’ll be waiting for GemStone and Rubinius for an awesome OODB. Instead CouchObject focuses specifically on documents as it is right now:
>> CouchObject::Database.create!("http://localhost:8888", "roflcopters")
=> {"ok"=>true}
>> db = CouchObject::Database.open("http://localhost:8888/roflcopters")
=> #<CouchObject::Database:0x65b184...>
>> db.all_documents
=> []
Creating and saving a document
>> doc = CouchObject::Document.new
=> #<CouchObject::Document:0x62708c @id=nil, @attributes={}, @revision=nil>
>> doc.engine_noise = "roflroflrofl"
=> "roflroflrofl"
>> doc.url = "http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg"
=> "http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg"
>> pp doc.save(db)
#<CouchObject::Response:0x4cd934
@parsed_body=
{"_rev"=>-1022899809, "_id"=>"4D91304BE683851F0E18871ADA6749D8", "ok"=>true},
@response=#<Net::HTTPCreated 201 Created readbody=true>>
Get the same document by its id, and convert the response to a document (Just to illustrate it)
>> doc_we_created = db.get(doc.id).to_document
=> #<CouchObject::Document:0x14e8c38 @id="4D91304BE683851F0E18871ADA6749D8", @attributes={"url"=>"http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg", "engine_noise"=>"roflroflrofl"}, @revision=-1022899809>
>> doc_we_created.engine_noise
=> "roflroflrofl"
>> doc_we_created.engine_noise = "ROFLROFLROFL"
=> "ROFLROFLROFL"
>> doc_we_created.save(db)
>> db.all_documents
=> [{"_rev"=>1353035433, "_id"=>"4D91304BE683851F0E18871ADA6749D8"}]
Sending a raw request to the db
>> response = db.post("_temp_view", <<EOJS)
function(doc){
if (doc.engine_noise.match(/rofl/i)) {
return doc
}
}
EOJS
# Our temp view query returns a list of rows matched documents
>> pp response.to_document.rows.first
{"_rev"=>1353035433,
"_id"=>"4D91304BE683851F0E18871ADA6749D8",
"value"=>
{"url"=>"http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg",
"_rev"=>1353035433,
"_id"=>"4D91304BE683851F0E18871ADA6749D8",
"engine_noise"=>"ROFLROFLROFL"}}
As you can see the API still needs some ironing out by means of more real world usage. You’ll notice that there’s no nice way of doing view “queries”. I really really want to create a more familiar Ruby DSL for defining views and sending off temporary views (like the one above). In particular it boils down to one or more of these things:
- A Ruby to Javascript converter (Like this perhaps).
- Ambition is awesome, CouchDb is awesome, sounds like a perfect match to me. LINQ me up.
- Make CouchDb use ruby instead of Javascript for views.
db.select{|doc| doc.title =~ /foo/ }
There’s a Git repository too
$ git clone git://repo.or.cz/couchobject.git
Patches? Yes please, release your inner couch potato.
CouchDb and CouchObjects
I’ve been watching CouchDb for a while, but it wasn’t until recently, when it changed its transport format from XML to JSON, that I got really interested in doing something with it, something I apparently wasn’t alone in.
One of the things I’m doing with it is a library called CouchObject, and one of the things it does is allowing you to serialize arbitrary ruby objects to and from CouchDb JSON documents by including a module and defining a few methods on your class:
class Bike
include CouchObject::Persistable
def initialize(wheels)
@wheels = wheels
end
attr_accessor :wheels
def to_couch
{:wheels => @wheels}
end
def self.from_couch(attributes)
new(attributes["wheels"])
end
end
The #to_couch method describes the format in which we want an instance’s attributes serialized as a document in the CouchDb database:
{
"_id": "6FA2AFB684A93ECE77DEAAF52BB02565",
"_rev": 1745167971,
"attributes": {
"wheels": 4
},
"class": "Bike"
}
Our #to_couch return result is stored in the attributes key, and the class of the object is the class key, for querying purposes (_id and _rev are CouchDb document attributes).
The from_couch class method describes how we should set up the new Bike object that we load from the database; the attributes parameter is the attributes key from the CouchDb document. In this case we just instantiate a new Bike with a number of wheels:
>> bike_4wd = Bike.new(4)
=> #<Bike:0x6a0a68 @wheels=4>
>> bike_4wd.save("couchobject")
=> {"_rev"=>1745167971, "_id"=>"6FA2AFB623A93E0E77DEAAF59BB02565", "ok"=>true}
>> bike = Bike.get_by_id("couchobject", bike_4wd.id)
=> #<Bike:0x64846c @wheels=4>
As I started on this last night there’s still lots of little things to add, like better server and database semantics (in the above #save call, the argument is the database name and the host is hardcoded for now; not pretty).
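The round trip through to_couch and from_couch can be sketched without a database at all. This is only an illustration of the document shape shown above, not CouchObject’s implementation: dump_document and load_document are hypothetical helpers, and the Persistable module is left out so the example stands on its own.

```ruby
require "json"

# Same shape as the Bike class above, minus CouchObject::Persistable.
class Bike
  attr_accessor :wheels

  def initialize(wheels)
    @wheels = wheels
  end

  def to_couch
    { :wheels => @wheels }
  end

  def self.from_couch(attributes)
    new(attributes["wheels"])
  end
end

# Hypothetical helper: wrap #to_couch in the "class"/"attributes" envelope.
def dump_document(object)
  JSON.generate({ "class" => object.class.name, "attributes" => object.to_couch })
end

# Hypothetical helper: look up the class by name and hand it the attributes.
# Only sensible for documents you trust, since it resolves arbitrary constants.
def load_document(json)
  document = JSON.parse(json)
  Object.const_get(document["class"]).from_couch(document["attributes"])
end

json = dump_document(Bike.new(4))
puts json
puts load_document(json).wheels
```

Notice that the symbol key :wheels comes back from JSON as the string "wheels", which is why from_couch reads attributes["wheels"].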
Another thing I’ve been thinking about doing is a more formal way to describe “models”, something along the DataMapper pattern perhaps, but we’ll see if I actually need it once I get the Persistable module some more features.
Update: I’ve uploaded the Git repository here, I want to add a few things before I do a release.
Distributed SCM == Goodness
Ever since leaving Joyent in favour of Bengler back in February or so, I’ve pretty much switched all my development over to using a distributed SCM (a topic I’ve tumbled about a lot lately too).
While I’ve looked at distributed SCMs before, they never really stuck until I was forced to use them full-time. At first it was Darcs, since that’s what they used when I switched jobs. Coming from Subversion, discovering distributed source control is a three step process:
- Denial: I don’t get it, why!?
- Acknowledgment: Ok, so doing bigger features in different branches that are easily mergeable into the mainline is really nice
- Acceptance: it’s the only way for me to work
Later on I discovered Git and now most of my own local stuff is in Git repositories.
The real clincher when it comes to distributed SCMs and the open source world is pretty much summed up by Linus Torvalds in his Git talk, where he says something along the lines of:
if you want to implement something just clone the main repository and start hacking, and if people like your stuff better they can just pull from there instead
Now, put this into the context of GForge installations such as sourceforge and rubyforge, where there are a huge number of inactive projects (for various reasons). If I wanted to hack a bit on a dead project I should be able to just register my own repository/branch (depending on SCM terminology) with the project, and anyone interested in my new awesome updates could just pull from my repository instead. Or the project owner could just take a look at it and merge it back into trunk if he was happy with it. Or, more commonly, “Bob” and I could work on a big feature and just push and pull from each other without disturbing the mainline.
Most projects are made up of several smaller internal and/or external projects (frameworks, libraries etc), and most often it’s real world usage different from yours that reveals bugs in someone else’s code (or maybe even your own, in the case of “internal” projects). Wouldn’t it be nice to be able to report these, but still have them in the context of your own project somehow, so you knew when they were fixed (and other people knew they were already reported, just in a different project)? Add to that the multiple repositories in a project from above and you’d most definitely need a way to report, watch and fix a bug in several different places.
Launchpad seems to get a lot of this right, but it’s proprietary and fairly tightly coupled to Bazaar, which I for various reasons dislike. But I still want to use an issue tracker and source browser (and so on) that gets all of this distributed stuff, both for my open things and for internal corporate stuff, and for anyone else who may be needing it in those settings. Distributed means giving away some control (which you never had to begin with anyway, aka the forking non-issue), but it also means you lose that one-stop place to get an overview of what’s going on.
So that’s what I’m hacking on right now. It won’t be Collaboa that gets this functionality, mainly because I want to experiment a bit with this freely, but also because I don’t think my new requirements will fit well with Subversion at all. Which is mainly why Collaboa now has a new maintainer (can’t wait to see what he does with it).
I’m not really interested in cloning Launchpad as such, but I do think they get a lot of things right (along with a good amount of things I don’t like/need). So there are some similarities of concepts in my “thing”, but it’s also a lot more geared towards my needs and workflow, and towards things learnt from building Collaboa and using other issue trackers. And most importantly, towards being shown how development should work by using distributed SCMs.
Dynamic page caching with Nginx & SSI
Zed Shaw mentioned this on the Rails podcast, but you see, Nginx has this ability to do virtual SSI includes from another url. The documentation on it is right here, but half of it is in Russian to confuse the spies. It’s pretty straightforward stuff though, so let’s play around with it for a bit. Like Zed says on the podcast, this kinda stuff would really be useful in situations where you can page cache pretty much the entire page, except this little part that needs to be dynamic (like a “Welcome <%= current_user.name -%>”).
Server Side Includes were these things we all used in the nineties for sprinkling random dynamic(-ish) stuff over our homepages. Nginx has support for virtual includes that look something like this:
<!--# include virtual="/foo" -->
which will include whatever the url /foo returns straight into the document where the include is defined (you can also throw it into another SSI block if you like, as the docs say).
Our little Rails testapp for this does about 205 req/s without any caching and using render(:partial => "foo") for the “welcome” bit (I feel really bad mentioning Zed Shaw and stupid/naive statistics like the above in the same place, but the precise performance gains aren’t that important. Think big picture stuff for now).
So here’s a little helper for outputting the SSI in our template:
def ssi_include(options={})
  # Options hash so we can pass in an SSI block target or whatever (YAGNI, really).
  options.assert_valid_keys(:url)
  %Q{<!--# include virtual="#{url_for(options[:url])}" -->}
end
# and in our view we'd use it like this:
<%= ssi_include :url => {:action => "greet", :name => current_user.name} -%>
Not exactly rocket science. With that and page caching turned on, it’s slightly faster (about 250 req/s), but not that much. Chances are we can cache those fragments in memcache to gain just a bit more.
By now you’ve hopefully realized that the actual greeting doesn’t even need to come from Rails to begin with. With a shared session storage, and because Nginx forwards the cookies to the virtually included url, we can just as well hook up a small Merb or Rack (or whatever) app to fetch the session_id and/or objects needed from the database (or cache storage) and display the correct text, all without the luggage from Rails, which we really don’t need just to render some tiny text fragments. Doing that in our stupid little test scenario here gives us just over 1000 req/s. That’s a bit closer to the 5K req/s that nginx does for straight up html from disk (on my local machine) than the 205 req/s we started off with. Yet another thing to pull out of the olde bag of tricks when you really do need it. I’d be interested in hearing if others have experimented in practice with this kinda approach?
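To make the "small Rack app" idea a bit more concrete, here is a minimal sketch of such a greeter. It is not the app used for the numbers above. The `SESSIONS` hash stands in for whatever shared session storage you have, and the `_session_id` cookie name is an assumption, so swap in whatever your Rails app actually uses. It doesn't need the Rack gem to be exercised, since anything that responds to `call(env)` will do.

```ruby
# Hypothetical stand-in for the shared session storage (database, memcached, ...).
SESSIONS = {
  "abc123" => { :name => "Johan" }
}

# Assumed cookie name; use whatever your Rails app is configured with.
SESSION_COOKIE = "_session_id"

# Parse a raw Cookie header ("a=1; b=2") into a hash.
def parse_cookies(header)
  header.to_s.split(/;\s*/).inject({}) do |cookies, pair|
    key, value = pair.split("=", 2)
    cookies[key] = value if key && value
    cookies
  end
end

# The fragment endpoint: look up the session from the forwarded cookie and
# return only the tiny bit of text that the SSI include needs.
# (Escape user-supplied values before emitting them in real code.)
GREETER = lambda do |env|
  session_id = parse_cookies(env["HTTP_COOKIE"])[SESSION_COOKIE]
  session    = SESSIONS[session_id]
  text       = session ? "Welcome #{session[:name]}" : "Welcome, stranger"
  [200, { "Content-Type" => "text/html" }, [text]]
end

status, _headers, body = GREETER.call("HTTP_COOKIE" => "foo=bar; _session_id=abc123")
puts "#{status} #{body.join}"
```

Since the greeter only looks at the cookie, the cached page can include the same static url for everybody and still get a per-user fragment back.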
update: passing in the “name” querystring like the example code is about the worst example I could think of, since it’s static once it’s written to the cached template, but cookies and such are still go.
Application Error: The Tumblelog
Since The Exciter updates are far between, I’ve been running a tumblelog for the past two weeks called Application Error. It’s powered by the fabulous tumblr (which I think is going to get huge). Expect mostly ruby-related links and other random stuff there.
Ruby has a nice new Rack
As someone who’s used Rails and other Ruby web frameworks for quite some time, plus my own dabbling in that domain, I’ve seen how we all go and redo our own webserver interface, while those cheeky Python kids keep nagging about WSGI.
So, I’ve been playing with Rack recently, and it’s quite inspired by WSGI. At its core, all a Rack application has to do is respond to call, taking the environment hash as its only argument, and return a tuple looking like [status_code, headers, body_array], like this:
require "rack"
class Foo
  def call(env)
    [200, {"Content-Type"=>"text/plain"}, ["Hello world!"]]
  end
end
HOST_AND_PORT = {:Host => "127.0.0.1", :Port => 8080}
Rack::Handler::Mongrel.run(Foo.new, HOST_AND_PORT)
In fact, we could even replace that whole class with a lambda that takes the env and just returns the array:
app = lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello lambda world!"]] }
Rack::Handler::Mongrel.run(app, {:Host => "127.0.0.1", :Port => 8080})
And we can run our marvelous application under Mongrel. Now, a Rack application is basically anything that responds to #call. The nice thing about this is that we can chain Rack applications together, forming some middleware between the request coming in from the browser and our main app. So if we call Rack::ShowExceptions#call before calling Foo#call, like this: Rack::ShowExceptions.new(Foo.new), we get some nice views of our nasty little exceptions.
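The chaining is easy to see with a couple of hand-rolled middlewares. This is only a sketch: `Shout` and `Rescue` are made-up names, not part of Rack, and `Rescue` is a far cruder cousin of what Rack::ShowExceptions does. Each one takes the next app in its constructor and decides what to do before and after calling it.

```ruby
# The innermost app: just returns the Rack triplet.
class Foo
  def call(env)
    [200, { "Content-Type" => "text/plain" }, ["Hello world!"]]
  end
end

# Hypothetical middleware: upcases whatever body the wrapped app returns.
class Shout
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    shouted = []
    body.each { |chunk| shouted << chunk.upcase }
    [status, headers, shouted]
  end
end

# Hypothetical middleware: turns an exception from further down the chain
# into a plain-text 500 response.
class Rescue
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue StandardError => e
    [500, { "Content-Type" => "text/plain" }, ["#{e.class}: #{e.message}"]]
  end
end

# Outermost first: Rescue wraps Shout, which wraps Foo.
STACK = Rescue.new(Shout.new(Foo.new))
status, _headers, body = STACK.call({})
puts "#{status} #{body.join}"
```

Because every layer speaks the same `call(env)` protocol, the stack as a whole is itself a Rack application and can be handed to a handler just like Foo.new.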
Why is this good? Because as a framework author you’d be able to reuse middleware (Rack applications) from other applications, or as Chris puts it:
Compare “That upload handler you wrote for IOWA is really great, too bad I use Camping.” with “That upload handler you wrote for Rack works great for me too!”
Rack is still a bit rough around the edges, but the API is stupidly simple (“just #call it”), which makes it very easy to use.
Cabinet is a tiny little pseudo-framework I wrote while playing around with Rack last night. Knock yourself out. I think the slogan should be “10x less productive” or “typing boring stuff over convention”. Features Django inspired url dispatching, that’ll make you type lots of regexens for every single thing. “Ruby push-ups” or something like that.
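For a feel of what regex-based url dispatching on top of Rack can look like, here is a rough sketch. It is not Cabinet's actual code. The `Dispatcher` class and its `route` method are names invented for this example.

```ruby
# Hypothetical regex dispatcher: the first pattern that matches PATH_INFO wins,
# and its captures are handed to the handler block.
class Dispatcher
  def initialize
    @routes = []
  end

  # Register a pattern and the block that handles matching paths.
  def route(pattern, &handler)
    @routes << [pattern, handler]
    self
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    @routes.each do |pattern, handler|
      match = pattern.match(path)
      next unless match
      body = handler.call(*match.captures)
      return [200, { "Content-Type" => "text/plain" }, [body]]
    end
    [404, { "Content-Type" => "text/plain" }, ["Not found: #{path}"]]
  end
end

ROUTES = Dispatcher.new
ROUTES.route(%r{\A/\z}) { "index" }
ROUTES.route(%r{\A/posts/(\d+)\z}) { |id| "post #{id}" }

status, _headers, body = ROUTES.call("PATH_INFO" => "/posts/42")
puts "#{status} #{body.join}"
```

Typing out a regex for every single url is exactly the “push-ups” part, but the dispatcher itself is just another object that responds to #call.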
Now, don’t go write your own framework just yet, unless it’s merely for the sake of fooling around (like “Cabinet” was). Ruby already has a bunch: Rails, Nitro, Camping, Merb, Ramaze and the oldskoolers like IOWA and Cerise.
Oslo RUG: Distributed Ruby Slides
Last night I gave a presentation at our small (but awesome) Oslo Ruby group about distributed programming with Ruby.
Topics of the talk included DRb, Rinda::TupleSpace, Rinda::RingServer, tin-can telephones, Joyent’s Bingo!, set_trace_func and a brief thumper server-porn interlude.
Here are the slides (900kb). Beware though, they’re in an abomination of Norwegian, Swedish and English.
Flashing the Nokia 770
I recently bought a Nokia 770 “Internet Tablet” as they call it. No GSM/3G, just WLAN, Bluetooth and USB. And it runs a Debian linux offspring out of the box so it’s very hackable.
And that was kind of the problem for me; I hacked around too much and in a moment of clear stupidity I changed the setuid bit on `sudo` in a freak typing accident of a chmod gone wrong. Oops. So I locked myself out of a lot of fun times, including the package manager not working, and without any ssh server installed (yet) and no clear way of making it boot in single-user mode I decided to reflash the whole thing, returning it to its factory settings. Here’s a reader’s digest of my approach using OSX. It’s based on things found mostly here and here.
Flashing your device
I downloaded the flasher.macosx version of the flasher along with the it2006 OS image/Maemo 2.0 image.
First you may want to backup your settings, bookmarks and whatnots using the builtin backup software. Then we do a complete fresh install by flashing the device with the image:
$ ./flasher-2.0.macosx -F SU-18_2006SE_1.2006.26-8_PR_F5_MR0_ARM.bin -f -R
flasher v0.8.1 (Jun 22 2006)
SW version in image: SU-18_2006SE_1.2006.26-8_PR_MR0
Image '2nd', size 8704 bytes
Image 'secondary', size 87040 bytes
Image 'xloader', size 13824 bytes
Image 'initfs', size 1890304 bytes
Image 'kernel', size 1266560 bytes
Image 'rootfs', size 60030976 bytes
USB device found found at bus 003, device address 002-0421-0105-00-00
Found device SU-18, hardware revision 1802
[..lots of fun stuff..]
100% (58624 of 58624 kB, avg. 814 kB/s)
Finishing flashing... done
And then when the thing boots back up we have to go through those fun initial settings again (language, datetime etc).
Making a smaller initfs
OK, so that was easy, except now our initfs partition is completely full, so we can’t install packages such as “becomeroot” and other fun things. So we have to find a stripped down version of that image. Luckily people smarter than me have already figured that out. (There’s also a smaller image here but I couldn’t get it to work as expected).
The smaller initfs comes in the form of a binary xdelta diff, but I had some issues with the `xdelta` dependencies, so in the end I had to install fink, just for the sake of getting xdelta in a quick way. So:
First “unpack” the image from the device:
$ mkdir it2006-unpacked
$ cd it2006-unpacked
$ ../flasher-2.0.macosx -F ../SU-18_2006SE_1.2006.26-8_PR_F5_MR0_ARM.bin -u
[...]
Image 'initfs', size 1890304 bytes
[...]
Unpacking initfs image to file 'initfs.jffs2'...
[...]
Then we get the smaller initfs image and apply the xdelta
$ curl -O http://fanoush.webpark.cz/maemo/initfs.bootmenu.it2006.tgz
$ tar xzf initfs.bootmenu.it2006.tgz
# Apply the xdelta
$ /sw/bin/xdelta patch initfs.bootmenu.xdelta initfs.jffs2 initfs.bootmenu.jffs2
# flash it onto the 770:
$ ../flasher-2.0.macosx --initfs initfs.bootmenu.jffs2 -f -R
[...]
Sending initfs image (1537 kB)...
100% (1537 of 1537 kB, avg. 859 kB/s)
Flashing initfs... done.
And we’re laughing. But we’re going to laugh even more once we install Ruby 1.8.4 from here:
Bingo!
A few days ago we launched Bingodisk, which is a 100GB WebDAV powered disk in the sky, with a public folder. Useful for storing just about anything and serving it up (or not) to the public.
Building Bingo! is a lot of fun. The bingodisk you’ll get is sitting on a ‘Thumper’, which is just a lovely piece of monster storage hardware, and the actual frontend application is a Rails application that talks to the Thumpers via a distributed interface I wrote.
The really nice thing is that it uses a WebDAV interface, which means that it’s possible to mount it as a disk in almost every modern operating system (as usual, we had to jump through hoops to get it working properly in Windows, but it does). WebDAV also supports things such as resource locking, and easily moving and/or copying resources. Now, WebDAV is a set of HTTP extensions, which means that it’s easy to talk to from an application. Let’s see how we could do that from a Ruby script using Net::HTTP. Unfortunately Net::HTTP doesn’t support Digest authentication out of the box (Basic auth won’t work with Bingo), but we can fairly easily add that, based on a snippet I found by Eric Hodel.
First you’ll need to get this file. It adds a digest_auth method to Net::HTTP.
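If you’re curious what a digest auth helper has to compute, here is a minimal sketch of the core calculation from RFC 2617 (the qop=auth variant). This is not the contents of the file above. The `digest_response` name and its argument list are made up for illustration, and the values fed to it are the worked example from the RFC itself.

```ruby
require "digest/md5"

# Compute the "response" value of an HTTP Digest Authorization header
# (RFC 2617, qop=auth). Hypothetical helper, for illustration only.
def digest_response(user, password, realm, nonce, http_method, uri, nc, cnonce)
  # HA1 hashes who you are, HA2 hashes what you are asking for.
  ha1 = Digest::MD5.hexdigest([user, realm, password].join(":"))
  ha2 = Digest::MD5.hexdigest([http_method, uri].join(":"))
  # The final response ties both to the server nonce and the client nonce.
  Digest::MD5.hexdigest([ha1, nonce, nc, cnonce, "auth", ha2].join(":"))
end

# Values from the worked example in RFC 2617, section 3.5.
puts digest_response("Mufasa", "Circle Of Life", "testrealm@host.com",
                     "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                     "GET", "/dir/index.html", "00000001", "0a4f113b")
# prints 6629fae49393a05397450978507c4ef1
```

The realm and nonce come from the WWW-Authenticate header of the server’s 401 response, which is why a client needs one unauthenticated round trip before it can sign the real request.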
Finding properties
To get a list of the resources available we’ll use the PROPFIND HTTP method, which will return a (rather large) chunk of XML, containing locking info, resource name and meta such as size and mtime. Here’s a script that lists the files at a given path:
# list.rb
require 'net/http'
require 'net_digest_auth'
require 'rexml/document'
include REXML
abort("Usage: #{$0} <username> <password>") unless ARGV.size == 2
# Request body asking the server for every property of each resource.
ALLPROPS = <<EOS
<?xml version="1.0" encoding="utf-8" ?>
<D:propfind xmlns:D="DAV:">
<D:allprop/>
</D:propfind>
EOS
url = URI.parse("http://johan.bingodisk.com/bingo/")
Net::HTTP.start(url.host) do |http|
  # The HEAD response carries what digest_auth needs to sign the real request.
  res = http.head(url.request_uri)
  # Depth 1: the collection itself plus its immediate children.
  req = Net::HTTP::Propfind.new('/bingo/tmp/', {'Depth' => '1'})
  req.digest_auth(ARGV[0], ARGV[1], res)
  response = http.request(req, ALLPROPS)
  puts "#{response.code} #{response.message}"
  puts
  # Each D:response element describes one resource; D:href holds its path.
  Document.new(response.body).elements.each("//D:response") do |r|
    puts r.elements["D:href"].text
  end
end
And the output:
$ ruby list.rb johan@bingodisk.com secret
207 Multi-Status
/bingo/tmp/
/bingo/tmp/mch.jpg
/bingo/tmp/TextMateBook-beta.pdf
So what this does is first request HEAD to get the things needed for the digest auth, then create a new Net::HTTP::Propfind request instance and use the digest_auth method to set the user and password from the arguments given.
Then it fires off the request with a snippet of XML (the ALLPROPS constant) telling the DAV server we want to get all the props.
We’ll get back the “207 Multi-Status” HTTP status code and the XML describing the properties of the resources, on which the script does an XPath query using REXML to get the filenames (the D:href element).
PUTting resources on the disk
Now let’s upload something. As expected we’ll want to use the PUT method. Here’s a script that takes the username, password and a path for a file to upload into /bingo/public/code/:
# upload.rb
require 'net/http'
require 'net_digest_auth'
abort("Usage: #{$0} <username> <password> <path/to/file/to/upload>") unless ARGV.size == 3
if File.exist?(ARGV[2])
  url = URI.parse("http://johan.bingodisk.com/bingo/")
  Net::HTTP.start(url.host) do |http|
    # The HEAD response carries what digest_auth needs to sign the real request.
    res = http.head(url.request_uri)
    req = Net::HTTP::Put.new("/bingo/public/code/#{File.basename(ARGV[2])}")
    req.digest_auth(ARGV[0], ARGV[1], res)
    # Read in binary mode so non-text files survive the trip intact.
    response = http.request(req, File.open(ARGV[2], "rb") { |f| f.read })
    puts response.code + " " + response.message
  end
else
  puts "No such file #{ARGV[2].inspect}"
end
By running the script we’ll upload the net_digest_auth.rb file:
$ ruby upload.rb johan@bingodisk.com secret net_digest_auth.rb
201 Created
Nice and easy.
WebDAV might feel a bit more “bulky” than a straight up RESTful interface, but it’s really not that bad and the fact that it’s so well-supported in existing client programs is just friggin’ sweet.
On Redmond developer happiness
This little thing has been popping up in my feed reader again and again during the past week.
The post has over 400 comments, many of them from Microsoft employees. Let’s just say about 150 of those are from actual MSFT employees, and let’s say, for the sake of argument, that 100 of those are actually working on code inside Redmond.
I don’t care how small a percentage 100 employees is out of your total number, but when you have 100+ employees not happy about either their job, the way their company is run or their boss/manager; then you have a problem that will turn into an infectious disease, if it hasn’t already.
Just think about how many LOC an employee potentially pushes out over a year, or how many small or big features (end-user facing or not) they implement over that year. Now, I don’t know about you, but I produce a heck of a lot better code when I’m happy and passionate about my job than when I’m not. So in my eyes the comments on that post say they’ve got 100+ coders turning out bad code, all because of their management.
I am glad neither my professional nor my personal computing experience depends on Microsoft in any way whatsoever.

