Nando's blog

What Oracle should do to MySQL

What is probably going to happen now that Sun and MySQL belong to Oracle is:

1) They are going to get a world-class boring buzzword-compliant website.

2) All this.

Java’s opinion of Python

I read some guy talking about Java, “the greatest language in the world”, just like an ugly American might say “the greatest country in the world”. He has motivated me to become an ugly Pythonista.

A Javer and a Cee-Sharper meet a Pythonista and consider Python for a brief moment.

Oracle Certified Java Programmer: ― I guess writing code in Python is quicker, but it might create maintenance nightmares later. Python is optimized for productivity and Java for maintainability.

Microsoft Certified Professional: ― Actually, the absence of great IDEs for Python, such as Visual Studio, might slow down the production of code.

Oracle Certified Java Programmer: ― “Rigid” languages such as Java make you write more code, but that code stays legible till the end of the project lifecycle.

Python apologist guy: ― You are wrong. I chose Python exactly because it is the most readable language available...

Microsoft Certified Professional: ― Nah, creating properties is very hard without an IDE. Another problem in Python is the lack of Generics. I am proud of writing in a language that has LINQ. Python is also missing a nice Reflection library such as the ones in C# and Java! That shows how much more powerful these languages are.

Oracle Certified Java Programmer: ― Yeah, I like Java because I get to program in XML. Hey, in Python you never know the type of a variable or parameter. Java is more explicit, therefore Java code is more readable.

Python apologist guy realizes the hopelessness of it: ― You are right, of course, but you can always use Hungarian Notation: s_name = ‘Nando’; i_age = 33;

Microsoft Certified Professional: ― I don’t like Python or Java. I prefer the C family of languages, invented by Microsoft, that includes C, C++ and C#.

Python apologist guy: ― Wait, in Python you can continue to type a semicolon at the end of every line.

Oracle Certified Java Programmer: ― Hmm, it must have copied Groovy. But that makes Python better than I figured...

Python apologist guy tries to ignore him: ― You can have strongly typed variables too. And braces. See this example:

# MyClass.py
# Proud author: nosklo (Clovis Fabricio)
class MyClass(object):#{
    def __init__(self, s_name, i_age):#{
        assert isinstance(s_name, str);
        assert isinstance(i_age, int);
        if (i_age > 20):#{
            print s_name;
        #}
    #}
#}

Microsoft Certified Professional: ― Hmm, the eye sees the #{ combination easily. OK, this code is readable. But I like .net because it is extremely multi-language.

Python apologist guy: ― Don’t worry, you can even have goto if you wish. Here is a VB-like alternative:

# MyClass.py. Proud author: nosklo (Clovis Fabricio)
class MyClass(object):
    def __init__(self, s_name, i_age):
        assert isinstance(s_name, str)
        assert isinstance(i_age, int)
        if (i_age > 20):
            print s_name
        #end if
    #end function
#end class

Microsoft Certified Professional: ― You can’t convince me because Python is an interpreted language, and I only like compiled languages, like C#. Furthermore, I like when enterprise libraries are ready so I don’t have to write them myself. ― Goes away.

OCJP waits until MCP is far enough away. ― Never mind him, he never even understood the importance of Checked Exceptions. Hey, I see there are 2 classes in this file. This must be a maintenance nightmare! The only true way is one class per file.

Python apologist guy: ― Yeah, namespaces are one honking great idea ― let’s do more of those!

Oracle Certified Java Programmer: ― As much as I might like your language, I could never give up world-class enterprise-ready buzzword compliance and a BIG company backing us up.

Python apologist guy: ― Good for you!

Ruby developer: ― Excuse me, I couldn’t help but hear your conversation. I just wanted to say that if you ever need a dynamic language, consider Ruby ― it is more powerful.

Python apologist guy turns his back on OCJP and attacks Ruby developer: ― Die, heretic scum!

Schemaless Python databases

3 new persistence options for Python

As I struggle to create an object-oriented database for Python on top of Tokyo Cabinet – Pykesto – I have found two other nice persistence mechanisms:

Kineta is a dynamic, schemaless, relational database written in Python on top of Berkeley DB. It is almost done; I have tested it a bit and was impressed. It is full of different ideas. Sometimes I felt it was too different from what I am used to, but if you really think about these ideas and try to be fair, you realize they are all good. Really, take a look at Kineta.

The last one – Copycat – is a Prevayler for Python. For those unfamiliar with the Prevayler concept, this is an extremely fast, transparent, ACID persistence mechanism for small databases that fit in memory. In such a case you can be free of all the overhead of traditional databases and just use your objects in the most natural way. Behind the scenes, the framework writes a log of every operation you perform. Every night or so a checkpoint is created too, which allows the log to be reset. When the system is turned off and on again, the checkpoint is loaded and the log is replayed, so your objects come back to the state they were in before. This might take a lot of RAM, but RAM can be much cheaper than developing with traditional databases.

Copycat is in an alpha state, but is already functional.
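To make the checkpoint-plus-log idea concrete, here is a minimal sketch of it. This is not Copycat's API: the class TinyPrevalence and its methods are names I made up for illustration, and the only "command" it supports is setting a key in a dict.

```python
import os
import pickle
import tempfile


class TinyPrevalence(object):
    """Hypothetical example: a dict kept in RAM, persisted as checkpoint + log."""

    def __init__(self, directory):
        self.checkpoint_path = os.path.join(directory, 'checkpoint.pickle')
        self.log_path = os.path.join(directory, 'commands.log')
        self.data = {}
        self._recover()

    def _recover(self):
        # 1. Load the last checkpoint, if there is one.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, 'rb') as f:
                self.data = pickle.load(f)
        # 2. Replay every command logged since that checkpoint.
        if os.path.exists(self.log_path):
            with open(self.log_path, 'rb') as f:
                while True:
                    try:
                        key, value = pickle.load(f)
                    except EOFError:
                        break
                    self.data[key] = value

    def set(self, key, value):
        # Append the command to the log first, then apply it in memory.
        with open(self.log_path, 'ab') as f:
            pickle.dump((key, value), f)
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write a full snapshot, then reset the log.
        tmp_path = self.checkpoint_path + '.tmp'
        with open(tmp_path, 'wb') as f:
            pickle.dump(self.data, f)
        os.replace(tmp_path, self.checkpoint_path)
        open(self.log_path, 'wb').close()


directory = tempfile.mkdtemp()
db = TinyPrevalence(directory)
db.set('name', 'Nando')
db.checkpoint()          # 'name' now lives in the checkpoint
db.set('age', 33)        # 'age' lives only in the log

# Simulate turning the system off and on again.
restarted = TinyPrevalence(directory)
```

After the restart, restarted.data holds both keys: one came from the checkpoint and the other from replaying the log. A real prevalence layer logs arbitrary commands against arbitrary objects, which is where the hard work is.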

EDIT: One more! buzhug is a pure-Python database engine.

And there exists a B-plus Tree implemented in Python.

Corel

Slowly start to gradually approach a situation in which you rarely need to go against an absolute conviction to NEVER use software whose name begins with Corel.

― But what is the alternative?

Inkscape is much cheaper. Otherwise, there is always Adobe Illustrator.

Installing Tokyo Cabinet and its Python driver

Tokyo Cabinet is the fastest database available, and it comes in several flavors. Here is how to install it in Ubuntu and use it with Python.

After realizing that CouchDB is not appropriate when you know you will need ad hoc queries (which doesn’t mean CouchDB isn’t very cool), I am trying out Tokyo Cabinet.

Here is a post about it, using Ruby.

Tokyo Cabinet offers several types of database, but the version that comes in Ubuntu 8.10 is missing the fixed-length and the table database, so I had to compile it:

sudo apt-get install checkinstall build-essential libbz2-dev
# Dependencies installed; now compile:
./configure --prefix=/usr
make clean
make
# checkinstall -R    # would create an RPM package
sudo checkinstall -D # creates and installs a Debian package
# Now you have a package that you can install AND uninstall, instead of
# sudo make install

Now the library, programs and headers are available, so we can install the Python driver. The best known is the old pytc, which also does not offer every kind of database. tc is the Python driver we need. Here is the author’s blog.

I download the latest code from github, uncompress it, enter its directory, then:

./setup.py install

That will succeed in Ubuntu 8.10 as long as the header files are found.

Unfortunately this driver is only nearly complete. As of 2008-04-03, the table database lacks a close() method, and the query API is being finished (you cannot execute queries yet).

Anyway, what software are we going to write using Tokyo Cabinet? How about an ambitious object-oriented database for Python apps? Something to replace ZODB? I am talking about pykesto and I wish you would help me write it.

However, it will take time to get there, and we might have to start by creating a higher-level interface to Tokyo Cabinet, because right now instantiating, for instance, the table database involves passing ugly flags like this:

import tc
t = tc.TDB('test.tdb', tc.TDBOWRITER | tc.TDBOCREAT)
# The above means open the file 'test.tdb' for writing and
# create it if it does not exist.
t.put('row id', {'col id':'value'})
t.get('row id')
# Out: {'col id': 'value'}
t.close()

Another reason to create a higher-level API is that the query API for the table database also involves lots of flags. I am sure this can easily be made Pythonic. I will put the code at pykesto as soon as I have it.
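As a first idea of what "Pythonic" could mean here, the flags could hide behind a one-letter mode string, like the 'r', 'w' and 'c' modes of the standard library's dbm modules. The sketch below is only that, a sketch. The function open_flags is hypothetical, FakeFlags is a stand-in so the code runs without the driver, and its numeric values are placeholders. I am also assuming the driver exposes a TDBOREADER constant next to the two used above.

```python
class FakeFlags(object):
    """Stand-in for the tc module; the values are placeholders, not tc's."""
    TDBOREADER = 1   # assumed name, not shown in the example above
    TDBOWRITER = 2
    TDBOCREAT = 4


def open_flags(mode, flags=FakeFlags):
    """Translate a dbm-style mode string into a combined flag value.

    'r': read only
    'w': read and write, the file must already exist
    'c': read and write, create the file if it does not exist
    """
    if mode == 'r':
        return flags.TDBOREADER
    if mode == 'w':
        return flags.TDBOWRITER
    if mode == 'c':
        return flags.TDBOWRITER | flags.TDBOCREAT
    raise ValueError('unknown mode: %r' % (mode,))


create_flags = open_flags('c')
```

With the real driver, passing the module itself as the second argument, as in tc.TDB('test.tdb', open_flags('c', tc)), should then be equivalent to the call shown earlier, provided the constants have those names.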

Running Kurso de Esperanto on Ubuntu 8.10

Today I decided to learn Esperanto. The best way to start seems to be Kurso, a free (gratis) program. So...

...I downloaded Kurso3.0.deb and installed it. You might have 3 issues running this program in Ubuntu:

  1. No way to start the program

Installing the program does not add a menu entry for it. So click the Gnome Panel, pick “Add to panel...”, then “Custom Application Launcher”, then type:

Name: Kurso
Command: kurso

You may also click the button at the left to choose a nice icon. That’s it, now you can easily launch the program.

  2. /usr/share/kurso/tradukoj

The second problem that might happen is a message about lacking write access to the directory /usr/share/kurso/tradukoj. Another symptom is that clicking the Settings button does not display the configuration screen.

A solution would be to type the following in a console:

cd /usr/share/kurso/
ls -l # prints out the contents of the kurso directory
sudo chmod a+w tradukoj # adds write permission for everyone, keeps the other bits
ls -l

This sets write permissions on that directory. Now you can restart the program, the message is gone and you can open the Settings screen.

  3. Sound problems

You might get no sound for playback or recording. The solution is:

a) Install mpg321:
sudo apt-get install mpg321

b) Open the Settings screen, switch to the Sound/Internet tab, and leave it like this:

MP3 player:    mpg321
WAV recorder:  aplay
WAV player:    arecord --duration=3 --rate=44100
Browser:       firefox
e-mail client: thunderbird

If you are really paying attention, you might notice that it doesn’t make sense to pick “aplay” to record and “arecord” to play. This is not my mistake; in this version (3.0) of Kurso, they messed up the 2 labels.

There you go, fully functional Kurso.

Web dev in Python: must read

If you are a Python web developer, here is some stuff you absolutely cannot miss.

Easily start the screensaver in Ubuntu Linux

How to start the Gnome screen saver immediately

The command to start the Gnome screensaver is:

gnome-screensaver-command -a

I tried to make this happen using keyboard shortcuts first, but couldn’t find a way. So I went with the next best thing: a launcher in the Gnome panel.

Right click the Gnome panel and click “Add to panel...”. Pick “Custom Application Launcher”. Another window appears. Fill it like this:

Type: Application
Name: Screensaver
Command: gnome-screensaver-command -a
Icon: <pick any icon...>

Hit OK and now you can start the screensaver by clicking the new icon.

Tuples, lists, dicts and objects: how fast?

Comparison of tuple, list, dict and object instantiation performance in Python.

It is well-known that in Python tuples are faster than lists, and dicts are faster than objects. I wondered how much so. After some tests, the conclusions are:

1. Use a tuple when you need a tuple and a list when you need a list. A list can have items appended or removed, while a tuple is immutable. If you won’t be adding or removing items, use a tuple and your code will run a little faster.

On the other hand, if you make a tuple with the intention of being fast, and then some code has to create another tuple out of your tuple, when it could have added or removed an item from a list instead, then you are actually being slower.

2. Dicts can be twice as fast as objects (but they do much less). Object-oriented programming is based on the idea that uniting data and behaviour in a single place leads to better factored, more reusable code. So go on and make classes and objects; it is the only civilized thing to do! However, note that dicts are considerably faster than objects to instantiate.

Needless to say, premature optimization is the root of all evil, so keep these facts in mind only when you are optimizing some code, not when you are writing something new.
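The point about "adding" to a tuple is easy to see in a few lines. This is just an illustration; the variable names are arbitrary.

```python
point = (1, 2)
longer_point = point + (3,)      # builds a brand-new tuple, copying the old items
tuple_was_copied = longer_point is not point

items = [1, 2]
alias = items
items.append(3)                  # changes the existing list in place
list_was_mutated = alias is items and alias == [1, 2, 3]
```

So every "append" to a tuple pays for a full copy, while a list just grows.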

Here is a little module, and the tests are in its docstring if you want to reproduce them:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''speed_test.py

Comparison of tuple, list, dict and object instantiation performance.

Usage: (results are from Python 2.5.2 in Ubuntu)


python -m timeit -s 'from speed_test import *' 'a = {}'
10000000 loops, best of 3: 0.0792 usec per loop

python -m timeit -s 'from speed_test import *' 'a = []'
10000000 loops, best of 3: 0.081 usec per loop

python -m timeit -s 'from speed_test import *' 'a = object()'
1000000 loops, best of 3: 0.231 usec per loop

This object() instantiation is almost useless, done only to get a rough idea.

Conclusion:
For empty containers, lists and dicts are equivalent.


python -m timeit -s 'from speed_test import *' 'a = (x, x)'
10000000 loops, best of 3: 0.184 usec per loop

python -m timeit -s 'from speed_test import *' 'a = [x, x]'
1000000 loops, best of 3: 0.271 usec per loop

python -m timeit -s 'from speed_test import *' 'a = {x:x, y:x}'
1000000 loops, best of 3: 0.474 usec per loop

python -m timeit -s 'from speed_test import *' 'a = TestObj(x, x)'
1000000 loops, best of 3: 1.17 usec per loop

Conclusion:
For storing 2 values, tuples are about 50% faster than lists.
Lists are almost twice as fast as dicts, but not fair: dicts also store keys.
Dicts are about 2.5 times as fast as objects.


python -m timeit -s 'from speed_test import *' 'a = (x, x, x, x)'
1000000 loops, best of 3: 0.268 usec per loop

python -m timeit -s 'from speed_test import *' 'a = [x, x, x, x]'
1000000 loops, best of 3: 0.357 usec per loop

python -m timeit -s 'from speed_test import *' 'a = {x:x, y:x, z:x, v:x}'
1000000 loops, best of 3: 0.79 usec per loop

python -m timeit -s 'from speed_test import *' 'a = TestFour(x, x, x, x)'
1000000 loops, best of 3: 1.81 usec per loop

Conclusion:
When storing 4 values, tuples are 33% faster than lists.
And dicts are more than twice as fast as objects.

The difference diminishes as you store more and more values.
'''

__author__ = 'Nando Florestan'

x = 42 # we use variables because constants would be much faster.
y = 43 # you don't use so many constants in the real world.
z = 44
v = 45


class TestObj(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


class TestFour(object):
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
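If you would rather run the comparison from inside Python than from the shell, the timeit module can be called directly. The following is a rough sketch of the two-value test, not the procedure used for the numbers above. It defines its own copy of TestObj in the setup string so that it does not depend on speed_test.py. Note that names created in a setup string are local to the timing function, so absolute figures may differ from the command-line runs.

```python
import timeit

# Setup code runs once per timing run; it defines the names the statements use.
setup = '''
x = 42
y = 43
class TestObj(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
'''

statements = {
    'tuple': 'a = (x, x)',
    'list': 'a = [x, x]',
    'dict': 'a = {x: x, y: x}',
    'object': 'a = TestObj(x, x)',
}

loops = 100000
results = {}
for name, stmt in statements.items():
    # repeat() returns one total time per run; keep the best (smallest) one.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=loops))
    results[name] = best / loops * 1e6   # microseconds per loop

for name in sorted(results, key=results.get):
    print('%-6s %.4f usec per loop' % (name, results[name]))
```

The numbers will vary with the machine and the Python version, so treat the ordering as the interesting part.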

I dream of a db4o for Python

I wish Python had an object database that didn’t eat up all the RAM.

Such a project could also be defined as “Durus with queries that don’t have to first instantiate objects”.

It would need a custom file format, without using Python’s pickles, to be able to query without instantiating. And I wouldn’t mind losing some dynamic-language-like flexibility if I could have this.

I have been using the SQLAlchemy ORM because it uses very little memory, it is very powerful and very fast. It does everything you need and more. However...

For some projects, I feel the difficulties in defining SQLAlchemy models (and dealing with the relational database behind them). I want to use Python types, not relational database types. I would switch to an object database, if only it would query and use little RAM.

(In Python, making a tuple is faster than making a list, which is faster than making a dict, which is MUCH faster than instantiating an object.)

If I were to start such a db4o-like project (maybe 2010?) I would have to wrap my head around pointers to objects and their activation. So at first I might use SQLAlchemy as a backend. A table of properties... belongs to a table of objects... which know their Python type. This would probably enable one to make those queries without first instantiating.
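Here is a rough sketch of that two-table layout, to show why it allows querying without instantiating anything. To keep it self-contained it uses the standard library's sqlite3 module instead of SQLAlchemy, and the table names, the helper functions and the 'Person' type are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE objects (
        id INTEGER PRIMARY KEY,
        python_type TEXT NOT NULL
    );
    CREATE TABLE properties (
        object_id INTEGER NOT NULL REFERENCES objects(id),
        name TEXT NOT NULL,
        value,
        PRIMARY KEY (object_id, name)
    );
''')


def store(conn, python_type, **attrs):
    """Save one object: a row in objects plus one row per attribute."""
    cursor = conn.execute(
        'INSERT INTO objects (python_type) VALUES (?)', (python_type,))
    object_id = cursor.lastrowid
    conn.executemany(
        'INSERT INTO properties (object_id, name, value) VALUES (?, ?, ?)',
        [(object_id, name, value) for name, value in attrs.items()])
    return object_id


def find_ids(conn, python_type, name, minimum):
    """Return ids of objects of a type whose attribute is above minimum.

    Only ids come back; no Python object is built to answer the query.
    """
    rows = conn.execute(
        'SELECT o.id FROM objects o '
        'JOIN properties p ON p.object_id = o.id '
        'WHERE o.python_type = ? AND p.name = ? AND p.value > ? '
        'ORDER BY o.id',
        (python_type, name, minimum))
    return [row[0] for row in rows]


nando = store(conn, 'Person', name='Nando', age=33)
kid = store(conn, 'Person', name='Kid', age=12)
adults = find_ids(conn, 'Person', 'age', 20)
```

Here adults contains only the id stored in nando. Activation, meaning turning an id back into a live object of the right type, is the part this sketch leaves out, and references between objects are where it gets tricky.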

About this scheme, ronny tells me to take a look at RDF – neat for dealing with not clearly defined object graphs.

(This is not my area of expertise...)

If you are a really, really good Python developer, go ahead and do it! But heads up, you’re gonna face DAMN tricky stuff about reference cycles and efficient collections.

This project is for the future. Python 3.0 on it!