After Reading Steve Jobs

January 10th, 2012

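It took me about two weeks to finish Walter Isaacson's Steve Jobs. The word that stayed with me most: product. Everything is in service of the product, and Apple's success is built on one generation of successful products after another.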

This Tech Bubble Is Different

December 27th, 2011

“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.”

My favorite quote of the year is from the following Bloomberg Businessweek article:
This Tech Bubble Is Different

Algorithm Analysis

December 10th, 2011

In my experience, college-level algorithm classes usually teach only two analysis techniques: asymptotic analysis and amortized analysis. There are additional analysis frameworks and tools that can reveal or explain other interesting aspects of certain algorithms and data structures. Here are three interesting examples.

Competitive analysis
I think the best introduction to this line of analysis is still Sleator and Tarjan’s 1985 paper, “Amortized efficiency of list update and paging rules.” It reveals why a simple list update rule can be competitive with an optimal algorithm that knows in advance the data it will operate on. The analysis is simple yet surprising. The paper alone is worth reading for pleasure if you have not come across it before.
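To make the idea concrete, here is a minimal sketch of the move-to-front rule on a linear list. It assumes the simple cost model in which accessing the item at position i costs i. The function names and the request sequence are made up for illustration. The comparison is against the best static ordering (items sorted by request frequency), which is a weaker benchmark than the offline optimum analyzed in the paper.

```python
from collections import Counter

def move_to_front_cost(requests, initial):
    """Total access cost when every accessed item is moved to the front."""
    items = list(initial)
    total = 0
    for key in requests:
        position = items.index(key)          # 0-based position of the item
        total += position + 1                # cost = number of items inspected
        items.insert(0, items.pop(position)) # move the accessed item to the front
    return total

def static_cost(requests, order):
    """Total access cost when the list order never changes."""
    position = {key: i + 1 for i, key in enumerate(order)}
    return sum(position[key] for key in requests)

def best_static_order(requests, initial):
    """Order items by decreasing request frequency (requires knowing all requests)."""
    counts = Counter(requests)
    return sorted(initial, key=lambda key: -counts[key])

# Hypothetical request sequence with strong locality.
requests = list("ddddccccaaaa")
initial = list("abcd")
print("move-to-front:", move_to_front_cost(requests, initial))
print("best static:  ", static_cost(requests, best_static_order(requests, initial)))
```

On this sequence the online rule, which never sees the future, ends up cheaper than the static order that was chosen with full knowledge of the request counts, because it adapts to locality.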

External memory
The RDBMS has been the most popular kind of data management system for the past three decades. To my knowledge, every RDBMS includes some kind of B-tree implementation. External memory analysis shows why algorithms and data structures like the B-tree work so well in a world of layered storage systems. In other words, if your data does not fit into memory, you may want to use this model to analyze and design the algorithms that need to access data outside memory.
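A rough sketch of the counting argument: in the external memory model the cost of an operation is the number of block transfers, not the number of comparisons. The code below assumes that each tree node visited costs one block read (no caching), and the fanout of 256 and the key count are illustrative assumptions, not measurements from any real system.

```python
def tree_levels(n_keys, fanout):
    """Smallest number of levels h such that fanout**h >= n_keys."""
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels

n_keys = 10**9  # hypothetical table size

# A pointer-based balanced binary tree: one node per block read in the worst case.
print("binary tree block reads:", tree_levels(n_keys, 2))

# A B-tree node fills a whole block, so one block read narrows the search 256-fold.
print("B-tree block reads:     ", tree_levels(n_keys, 256))
```

Both structures do a logarithmic number of comparisons, so plain big O analysis barely distinguishes them; counting block transfers is what exposes the gap.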

Smoothed analysis
The simplex method is a popular algorithm for solving linear programming problems. It has an exponential worst-case running time according to asymptotic analysis. In practice, however, it works very well. Smoothed analysis offers a theoretical explanation of this phenomenon.
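The simplex method is too large for a short example, so the sketch below shows the same idea on a stand-in: quicksort that always picks the first element as the pivot. This is only an analogy, not the simplex analysis itself. Already-sorted input is a worst case for this quicksort, and the experiment adds Gaussian noise of increasing size to that worst case to see how the comparison count changes. The function names, the input size, and the noise levels are all assumptions chosen for illustration.

```python
import random

def quicksort_comparisons(values):
    """Count comparisons made by quicksort with the first element as pivot."""
    comparisons = 0
    stack = [list(values)]            # explicit stack avoids deep recursion
    while stack:
        part = stack.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)      # every other element is compared with the pivot
        stack.append([v for v in rest if v < pivot])
        stack.append([v for v in rest if v >= pivot])
    return comparisons

def perturbed(values, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]

n = 200
worst_case = list(range(n))           # sorted input: n * (n - 1) / 2 comparisons
rng = random.Random(0)
for sigma in (0.0, 1.0, 10.0, 100.0):
    trials = [quicksort_comparisons(perturbed(worst_case, sigma, rng)) for _ in range(20)]
    print(f"sigma={sigma:6.1f}  average comparisons={sum(trials) / len(trials):.0f}")
```

Smoothed analysis asks how an algorithm behaves on the worst input after a small random perturbation, which sits between worst-case and average-case analysis.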

These three examples are interesting to me. After I graduated from college and began working in industry, I found at times that the code used in practice was quite different from what I had thought about and written in college. When data structures and algorithms have to be chosen for a problem, algorithms with a worse big O are sometimes used, and simple, straightforward data structures are favored over sophisticated ones that may look better on paper. However, when I look deeper and analyze the choice in the exact context of the problem, it often begins to make sense. The three examples above always remind me that when considering an algorithm or data structure for a real problem, there are usually more things to take into account than simple big O notation. I think this is also why we still need to run simulations and experiments when we choose or design algorithms and data structures, at least for many real-world problems.

Software Economics

November 5th, 2011

Strategy Letter V by Joel Spolsky

The best piece of economic analysis of the software business. I do feel, though, that the power of community and individuals’ willingness to contribute is a little underestimated in this analysis. For example, I think there is something in human nature that makes people take part in community work voluntarily. We have many real-world examples of this. Joel’s own company Stack Exchange is “Free, Community-Powered Q&A”, to quote their own title.

Why Not Port Microsoft SQL Server to Linux

July 28th, 2011

I just read a blog post from a former SQL Server architect explaining the original decision-making process around porting Microsoft SQL Server to Linux. I have worked at Microsoft on SQL Server for about a year now. The post answered some questions I have had in mind for some time, and I could not agree more with what he wrote. I can share some of my own thinking here.

Before I joined the database group at Microsoft, I was a heavy Linux user. I played PC games on Windows 98/XP in high school and college, and my first programming course used Visual C++ 6.0 on Windows XP; that is all. The rest of my projects were mostly done on Linux. My first encounter with Windows 7 and Microsoft’s server product line came on my first day at Microsoft. How I adapted to the Windows environment is another story. (It was much less difficult than I expected, and I have really begun to appreciate the amazing WinDbg.)

When I began looking at the SQL Server source code, an obvious question came to mind: how much work would it take to port the code to Linux? After all, I was more familiar with the GNU/Linux API than with the Windows API at the time. To my surprise, the core database engine does not actually have that much OS-specific code. Given the Wine project, I suspect the port itself might take only one or two months of work. Then another question came to mind: how much work would it take to support the resulting product? My conclusion is that it is almost impossible without changing how we currently organize the engineering effort around the product. For example, it would take a lot more effort to develop test infrastructure that matches the existing test tools on Windows. The only solution I can think of is to have an entire team dedicated to the new product. The Linux product would possibly also need a new business model. And with so many open source alternatives on Linux, I am a little pessimistic that such a product could be very successful in the market.

A Close Call

November 21st, 2010

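The car I drive is fully manual: the locks, the windows, the gearshift, everything. Locking a door does not require the key, though; you just push the lock down from the inside and shut the door. Normally when I park, I turn off the engine, pull out the key, shut the door, and walk away. Tonight (Saturday) I came home and, as if possessed, got out and shut the door, and only when I saw the tail lights still on did I realize that the engine was running and the key was still inside. Worse, everything was linked into one dead knot: there is a spare car key at home, but my house key is on the same key ring as the car key; the apartment management office can open the door, but it had already closed for Saturday; and because the apartment door can only be locked with the key, there is no emergency lockout service either.

Late autumn in the Seattle suburbs, 2 degrees Celsius. It was warm in the car and warm at home, but with just a pane of glass in between, there was nothing I could do. I was even ready to smash the window. I had all but given up and was about to spend the night at a nearby motel.

But I refused to give up and pulled every door handle once more. When I tried the last one (the rear door on the passenger side), it turned out to be unlocked. So I climbed in, pulled the key, and went back to my warm home. The moment the door opened, though, I also realized that for the past few months, wherever I had parked, that rear door had been unlocked. Suddenly I had a lot of feelings: about my luck, my carelessness, public safety in Seattle, and so on. Of course, if I believed in God this would be simple: God probably opened that door.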

A Visit to the 737 Production Line

September 27th, 2010

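Thanks to a classmate, I got to visit the Boeing factory that builds the 737 today.

This factory is the line where final assembly of the 737 is done. Many of the parts are made by other companies or by other Boeing divisions. During the tour I saw a row of big wooden crates marked Shanghai Aircraft Industry Company (上海飞机工业公司), but I do not know what parts were inside. The engines are the most expensive component, about a third of the cost of the airplane, and they are not made by Boeing. The engines currently used on the 737 are made by CFMI, a joint venture between GE (US) and SNECMA (France).

The assembly process itself is not that complicated. The engineers' offices are right inside the plant, so when a mechanic runs into a problem during assembly, an engineer can walk down to the line at any time to take a look. The engineers' office environment is fairly ordinary.

My impression is that the core and the hardest part of building airplanes is design, and that engine design and manufacturing are the most technically demanding part of all.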

Graduation Summary

May 19th, 2010

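Although the commencement ceremony has not happened yet and I have not formally received my degree, as of today I can more or less say that the dust has settled. In my last semester I took only one course, and got my worst grade in it.

I know I could have done better, just as with the college entrance exam back then. On the other hand, just as in my last year of high school, I simply did not put in the effort. Maybe my interest lies elsewhere, maybe my mindset has changed, or maybe I really am not capable enough. Either way, I have lost my enthusiasm for pure mathematics. I cannot say I gained nothing: although the grade was disappointing, I feel that I could now pick up any mathematics book and read through it if I wanted to. But at the same time, proving simple abstract results seems less and less meaningful to me. I think I will probably give up studying pure mathematics. (Pure mathematics, as I understand it, is mathematics for its own sake.)

Although the last semester was far from ideal, on the whole these two years in the US have been very rewarding. I learned what I wanted to learn (NLP, machine learning) and discovered new interests (databases, theory). I did not carry out my original plan of continuing on to a PhD, but long before graduation I had already found a job that is close to ideal and matches my interests. Facing the future, I feel a little less lost and a little more confident. Taking one solid step at a time, I believe I can find my own road to success.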

HFT Seminar

May 7th, 2010

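This morning I attended a seminar on HFT. How popular is HFT in the US? I heard today that more than 60% of trading volume comes from HFT. About half of that is from investment banks and market makers, and half from independent firms. Time was short, and because trade secrets are involved, many specifics were left vague. Here are a few of my impressions; much of this may be inaccurate.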

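  1. The architecture of the trading system itself is simple.
    • The input is trading data. Some firms also use news data, but that is much more complicated.
    • The first layer of processing converts the data into their own format.
    • After that, all model computation happens in one system. The system is a single whole that includes the trading model, risk control, automated order placement, and so on. There are no multiple layers. He said many times that a multi-layer architecture makes no sense for them.
    • In HFT, the models are relatively simple. The time complexity may be only linear.
    • The system's own database is a home-grown in-memory database. (This seems to be common: Bloomberg said their news processing also runs on a home-grown in-memory database.)
    • He later demoed the database. The binary is about 500k and the code about 180k lines. It is not SQL; it uses their own data processing language, which is supposedly much faster. (I have my doubts about that.)
  2. The arbitrage approach relies mainly on the speed of computers and on the scale of trading.
  3. Positions are closed out every day. The figure he gave was, I think, no more than $30 million a day.
  4. They have to take network latency into account. For example, the fiber link from Chicago is 10 ms slower than the one from New York, and Europe may be 10 times that.
  5. Once their trading volume is large enough, the exchanges or market makers actually pay them. (My understanding is that this is because they provide liquidity.)
  6. The interfaces and data provided by third parties are very unclean. (Including .NET, Java, C, etc.)
  7. Their system goes from receiving data to computing a result faster than the latency of the data and the trades. The time he gave was, I think, 20 nanoseconds.
  8. But third-party trading systems often lack processing capacity, so their system has to take the other side's capacity into account as well.
  9. The speaker was smarter than I expected. Someone asked whether they use GPUs. He roughly explained why the GPU computing model does not fit, and the analysis was spot on. The database mentioned earlier was also written by a single person.
  10. They also write their own Java compiler, along with many other in-house technologies and tools, including the database above.
  11. When hiring, they mainly look at whether the candidate understands how a program actually runs on a computer: the effect of every line of code on memory, cache, and CPU.
  12. Testing, fault tolerance, and so on are also hard problems.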
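As a sanity check on the latency figure in point 4, here is a back-of-the-envelope sketch of propagation delay in optical fiber. The numbers are my own assumptions and not from the talk: a straight-line Chicago to New York distance of roughly 1,150 km (real fiber routes are longer) and a typical fiber refractive index of about 1.47.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47           # assumed typical value for optical fiber

def one_way_delay_ms(distance_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Propagation delay over a fiber of the given length, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return distance_km / speed_in_fiber * 1000.0

chicago_new_york_km = 1150.0            # assumed approximate straight-line distance
one_way = one_way_delay_ms(chicago_new_york_km)
print(f"one way:    {one_way:.1f} ms")
print(f"round trip: {2 * one_way:.1f} ms")
```

Under these assumptions the round trip comes out on the order of 10 ms from propagation alone, before any switching or processing delay, which is in the same range as the number he quoted.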

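While listening, I got the feeling that the exchanges' trading systems are very complicated and messy, like a street market, with no guarantee of safety. For systems that handle this much money, it struck me as rather cavalier. After the talk I came back, and not long afterward the Dow Jones plunged 1,000 points because of some erroneous trade.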

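A few more personal notes: I just looked at the speaker's LinkedIn profile. He previously worked at Nvidia as a software engineer, which may explain why his understanding of GPU computing was so accurate. Another company he worked for, Kx Systems, is also interesting. Its main product is an in-memory database, kdb, and its processing language is interesting too: the k programming language. I have not used it, but from the descriptions, that database sounds very much like the one he demoed.

Update: 60 Minutes recently ran a report on HFT. Their title was speed traders. Winning through speed was my impression too: rather than high-frequency trading, it is better described as automated fast trading.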

http://www.cbsnews.com/video/watch/?id=7368460n

Detachment and Tranquility

March 27th, 2010

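Admonition to My Son

Zhuge Liang

The conduct of a gentleman is to cultivate himself through stillness and to nurture his virtue through frugality. Without detachment, one cannot make one's aspirations clear; without tranquility, one cannot reach far. Learning requires stillness, and talent requires learning. Without learning, talent cannot be broadened; without aspiration, learning cannot be completed. Indulgence and idleness keep one from sharpening the spirit; rashness and impatience keep one from tempering one's character. The years race by with time, and the will fades with the days, until one withers and falls, of little use to the world, grieving in a shabby hut. By then, what can be done?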

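I have been in the US for almost two years. This short essay suddenly came to mind, and I feel that, apart from my studies, the biggest gain of this period has probably been thinking through, in relative isolation, many questions I had never thought clearly about before. That state was hard to come by in the other places I have lived. Detachment, tranquility. I cannot say I have fully achieved them, but I am closer than at any time before. Whether I go on to further study or to work, I hope to keep this state.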

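According to my parents, my name was chosen by my maternal grandfather. I never understood what it meant, and never asked him in person. Thinking of this essay now, it suddenly makes sense. The character 「川」 means flowing water. The idea was probably the hope that I would keep moving forward like a river that flows without end (川流不息), and not cling to the past and stall out of complacency, fear, or anything else. In other words, not to become what Zhuge Liang describes: “The years race by with time, and the will fades with the days, until one withers and falls, of little use to the world.” The quotations below also use the character 「川」 to make the point that learning must never stand still.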

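Rivers emulate the sea and so reach the sea; hence those who seek the Way must never think of stopping.
Weeds are not seedlings yet resemble them; hence those who pursue truth must not lack real discernment.
– Fireside Night Talks (围炉夜话)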

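The hundred rivers emulate the sea and reach the sea; the hills emulate the mountains but never reach the mountains. That is why one should detest drawing a limit for oneself.
– Fayan, “Learning and Practice” (法言·学行)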