Unihan Sqlite Database

April 27th, 2008 by benny

Once upon a time I wrote an extension for Firefox that let you highlight Chinese characters, right-click, and “Pinyinize” them. It would then convert the characters into Pinyin, the phonetic representation of those Chinese characters.

Now, the only way I could do that was to fake a request to pin1yin1.com and parse the HTML page that came back. That’s a silly way to do things. There should be a service where I could make a query and get the results back in a known schema, whether that’s JSON, XML, or something else.

I haven’t been able to find one (if you know of one, let me know :) ) so I thought I’d see if I could make my own. I haven’t gotten that far, but I did find out which database they were using: pin1yin1.com uses the Unihan database. The problem is that it’s a flat text file where each line looks like:

<Code point> <Key> <Value>

For example:

U+340C kDefinition a tribe of savages in South China
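To make the format concrete, here is a minimal Python 3 sketch that splits one such line into its parts and turns the code point back into the actual character. The function name parse_unihan_line is just for illustration. It splits on any whitespace (which covers both the spaces shown above and tabs), stops after the second split so spaces inside the value survive, and assumes every line actually has a value.

```python
def parse_unihan_line(line):
    """Split one Unihan.txt line into (code point, character, key, value)."""
    # Split at most twice so that spaces inside the value are preserved.
    codepoint, key, value = line.strip().split(None, 2)
    # "U+340C" -> hex digits "340C" -> the character itself.
    char = chr(int(codepoint[2:], 16))
    return codepoint, char, key, value


print(parse_unihan_line("U+340C kDefinition a tribe of savages in South China"))
```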

It’s pretty much unusable as-is for most purposes, so I decided to write a quick Python script (thanks Hila!) to pivot it into a SQLite table, with one row per character and one column per key:

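#! /usr/bin/env python3
#
#  A script to convert/pivot the Unihan.txt file into a sqlite
#  database: one row per character, one column per Unihan key.
#
#	Author:	Benny Wong <bwong.net>
#	Date:	2008.04.27

import sqlite3

# charmap maps a code point string such as "U+340C" to a dict of
# {key: value} pairs for that character.
charmap = {}
# Every key seen in the file becomes a column in the table.
keys = set(['Character'])

with open('Unihan.txt', encoding='utf-8') as f:
	for line in f:
		# Skip comment lines and blank lines.
		if line.startswith('#') or not line.strip():
			continue

		# Split at most twice: code point, key, and the rest of the
		# line as the value (which may itself contain spaces).
		tokens = line.split(None, 2)
		codepoint, field = tokens[0], tokens[1]
		value = tokens[2].strip() if len(tokens) > 2 else ''

		if codepoint not in charmap:
			# "U+340C" -> 0x340C -> the actual character.
			charmap[codepoint] = {'Character': chr(int(codepoint[2:], 16))}

		charmap[codepoint][field] = value
		keys.add(field)

# Quote the column names so they are always valid SQL identifiers.
keystring = ", ".join('"%s" TEXT' % key for key in sorted(keys))

conn = sqlite3.connect('Unihan.sqlite')
cursor = conn.cursor()

cursor.execute('DROP TABLE IF EXISTS Unihan')
cursor.execute('CREATE TABLE Unihan ("key" TEXT, ' + keystring + ')')

for codepoint, values in charmap.items():
	columns = ", ".join('"%s"' % name for name in values)
	placeholders = ", ".join('?' for _ in values)
	# Pass the values as parameters instead of pasting them into the
	# SQL string, so quotes inside a definition can't break the INSERT.
	sql = 'INSERT INTO Unihan ("key", %s) VALUES (?, %s)' % (columns, placeholders)
	cursor.execute(sql, [codepoint] + list(values.values()))

# Index the key column and every Unihan field for fast lookups.
cursor.execute('CREATE INDEX idx_key ON Unihan("key")')
for key in keys:
	cursor.execute('CREATE INDEX "idx_%s" ON Unihan("%s")' % (key, key))

conn.commit()
conn.close()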

I haven’t worked with Python much, so if this code is crappy, let me know how to fix it :) I’ll see when I have time to actually build the service (if anyone’s interested!), but yeah, this is the basis I’m going to be using.
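For the eventual service, the lookup side could be as simple as selecting one row and serializing it. Below is a minimal sketch, assuming the table layout the script above creates: a key column holding strings like “U+340C” plus one TEXT column per Unihan field. The function name lookup_json is hypothetical, and the demo builds a tiny in-memory table holding only the sample row from this post instead of opening Unihan.sqlite. A “Pinyinize” feature would read the kMandarin field (the Unihan field for Mandarin readings) from the same result.

```python
import json
import sqlite3


def lookup_json(conn, char):
    """Return the Unihan row for one character as a JSON string."""
    # Rows are keyed by code point strings such as "U+340C".
    codepoint = "U+%04X" % ord(char)
    cursor = conn.execute('SELECT * FROM Unihan WHERE "key" = ?', (codepoint,))
    row = cursor.fetchone()
    if row is None:
        return json.dumps(None)
    # Pair column names with values, dropping fields this character lacks.
    names = [d[0] for d in cursor.description]
    return json.dumps(
        {n: v for n, v in zip(names, row) if v is not None},
        ensure_ascii=False,
    )


# Demo: a throwaway in-memory table with only the sample row from this post.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Unihan ("key" TEXT, Character TEXT, kDefinition TEXT, kMandarin TEXT)')
conn.execute(
    "INSERT INTO Unihan VALUES (?, ?, ?, ?)",
    ("U+340C", "\u340c", "a tribe of savages in South China", None),
)
print(lookup_json(conn, "\u340c"))
```

Dropping NULL columns keeps the JSON small, since most characters only have a handful of the roughly ninety Unihan fields filled in.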

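You can easily port this database from SQLite to MySQL, PostgreSQL, etc. by running the sqlite3 shell’s “.dump” command (the output is written in SQLite’s SQL dialect, so expect to make a few small edits before importing it elsewhere).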
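If you’d rather stay in Python, the standard library’s sqlite3 module has Connection.iterdump(), which yields the same kind of SQL text as the shell’s .dump command. A minimal sketch; the function name dump_database and the output file name Unihan.sql are hypothetical choices:

```python
import sqlite3


def dump_database(db_path, out_path):
    """Write the whole database out as SQL statements, like `.dump`."""
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # iterdump() yields one SQL statement per string.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()


# Usage: dump_database("Unihan.sqlite", "Unihan.sql")
```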

Enjoy!

The Irish and Keytars

April 11th, 2007 by benny

Before this year, I had never been to a concert. I’d gone to a few free ones in parks and whatnot, but never to an actual full concert. But things change, and two weeks ago I went to two in that week alone: Snow Patrol at MSG and Justin Timberlake at the Continental Airlines Arena in Jersey. Here’s a brief recap of the two:

Snow Patrol

Considering it was a really mellow kind of concert, it was one hell of a show. The Silversun Pickups and OK Go opened… I missed Silversun but saw OK Go, and they were pretty good. If nothing else, the lead singer was a riot.

The actual concert was awesome. Even though it was a more subdued show, the place was giving off this energy. The whole theater at MSG was on its feet the entire time, just listening to the powerful chords coming from the speakers. The lead singer was also pretty funny, and even brought two girls up to sing the female part in “Set the Fire to the Third Bar”.

Something I totally did not know: the band is Irish. Did you know that? You probably didn’t. Well, now you do. But yeah, I was walking into the row and turned to my sister and said, “Yo, there are Irish people at the end of our row!” She quietly turned to me with this face that kind of screamed “Uhm…you claim to be a fan?” From that moment on, almost everyone I saw walking by had some sort of green or clover on their shirt.

Justin Timberlake

So I’m not sure what to say about this concert. Not because it wasn’t good (it was amazing), not because I don’t have stories (I have a few), but because it was just beyond words. For those who don’t know me, I would go as far as to say that I am probably one of the biggest Justin Timberlake fans who are both male AND straight. True, there aren’t that many, but still, I’d like to claim that spot.

But anyway, the performance was just unbelievable. His singing was on point, even while dancing. And his dancing…ridiculous. Sometimes I wonder, is that a black man in a white Tennessee boy outfit? Probably, but we’ll never know.

Another startling discovery: THE MAN PLAYS THE KEYTAR! He already plays the piano and guitar and beatboxes, but the keytar? Really, Justin? Who does that? Just amazing.

All in all, a couple of good concerts in one week, what else could you ask for?