2013 Work in Retrospect

December 25th, 2013

Depending on one’s perspective of Microsoft’s attitude towards Java and open source, it may be considered a rather odd case that I mostly worked on Java and open source at Microsoft in the past year. More specifically, I worked on the Apache Hadoop project. I fixed some test and product issues, participated in some research/incubation projects, and worked on certain aspects of Microsoft’s Hadoop service offering: Azure HDInsight. I am not sure what I can or cannot say about my work without violating company policy. But all my work on Apache Hadoop is public and can be viewed here.

Beyond the work around Hadoop, I did not make much progress on some other projects that I am interested in. I did not get the chance to study deep learning, which seems to be an interesting new trend in machine learning and has also gained some traction in industry. It will definitely be on my todo list for the coming weeks.


November 22nd, 2013

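I recently watched the movie Gravity. I wonder if you have ever had this kind of experience: when you meet misfortune or setbacks in life, you want to leave the place of sorrow and go somewhere new. In the film, the scientist Ryan Stone, who has lost her daughter, is on her first space mission, repairing the Hubble Space Telescope. Leaving Earth and being in space may be exactly this kind of release and escape for her. So when the other astronaut, Matt Kowalski, asks her what she likes most about space, she answers: the silence. But after this silence, crisis arrives without warning.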


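Matt is a rational and optimistic man. He plants the seeds of hope and optimism in Ryan’s heart, and at the critical moment he decisively sacrifices himself. Throughout the second half of the film, I kept wondering what Matt was thinking after that. Was he calmly accepting death, carefully experiencing his life slipping away? Perhaps he would say, as Feynman did, “I’d hate to die twice. It’s so boring.”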

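In the end, the film follows a Hollywood-style ending: the heroine makes it back to Earth. Returning from space to Earth may be a rebirth for her. When Sandra Bullock reached the ground, slowly stood up, and walked forward, I was moved as well.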




Surprisingly, the film also broke the box-office records of both George Clooney and Sandra Bullock, and both of them are around fifty.

Whidbey Island Century

July 3rd, 2013

It was my first time on the island. The weather was great: sunshine, clear sky, and not as hot as Seattle, with scenic views all along the route. I rode strong for most of the way, averaging over 16 mi/h. Towards the end, I felt a little tired and was dropped by the group. Having forgotten my GPS that day, I took a shortcut to finish the ride. The total distance was about the same, though the elevation gain was about 1,000 ft short. Still, both the distance (105.9 mi) and the elevation gain (6,538 ft) broke my previous records. A really memorable century ride this year!


Reading Notes: The Emperor of All Maladies

August 19th, 2012

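I recently finished reading The Emperor of All Maladies: A Biography of Cancer. The author, Siddhartha Mukherjee (former Rhodes Scholar; Stanford, Oxford, Harvard Medical School), is himself a physician and researcher who treats and studies cancer at Columbia University Medical Center and its medical school. In relatively accessible and vivid prose, he traces the history of humanity’s understanding of cancer and the development of its treatments, interweaving his own clinical experience and his interactions with patients. The book is thoroughly documented: for example, it has 467 notes, and the bibliography at the end runs to nearly 5 pages. But beyond documents and numbers, the author went to great lengths to interview the doctors, researchers, and patients who personally took part in the fight against cancer. For example, after searching through many channels, he finally found, via the Internet, Ella, a survivor who had received the VAMP regimen for leukemia in 1964. Patients who received VAMP had only a 5% chance of surviving beyond one year. This interview not only made me, as a reader, realize that early cancer treatment, despite its low success rate, was not meaningless, but also offered a long-term cancer survivor’s perspective on the disease and its treatment.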

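As a Chinese reader, I note that from the last century to the present the main battleground against cancer has undoubtedly been the United States, but Chinese doctors have also made some meaningful contributions. The book describes two cases. Dr. Min Chu Li (李敏求), while working at the NCI, was the first to completely cure an uncommon cancer (choriocarcinoma) with chemotherapy. This was also the first time a cancer had been completely cured with chemotherapy. Dr. Zhen Yi Wang (王振义) of Ruijin Hospital and the French physician Laurent Degos were among the first to discover and use, in clinical practice, a drug that targets the cells of one type of leukemia (APL): all-trans retinoic acid. It was the first drug clinically proven to be effective while acting specifically on cancer cells.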


After Reading Steve Jobs

January 10th, 2012

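It took me almost two weeks to finish Walter Isaacson’s Steve Jobs. The word that struck me most: product. Everything serves the product, and Apple’s success is built on one generation after another of successful products.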

This Tech Bubble Is Different

December 27th, 2011

“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.”

My favorite quote of the year is from the following Bloomberg Businessweek article:
This Tech Bubble Is Different

Algorithm Analysis

December 10th, 2011

In my experience, only two algorithm analysis techniques are usually taught in college-level algorithm classes: asymptotic analysis and amortized analysis. There are additional analysis frameworks and tools that can reveal or explain other interesting aspects of certain algorithms and data structures. Here are three interesting examples.

Competitive analysis
I think the best introduction to this line of analysis is still Sleator and Tarjan’s 1985 paper, “Amortized efficiency of list update and paging rules.” It shows why a simple list update rule, move-to-front, is competitive with an optimal algorithm that knows the entire request sequence in advance: its total cost is within a constant factor (two) of the optimum. The analysis is simple yet surprising. The paper alone is worth reading for pleasure if you have not read it before.
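To make this concrete, here is a minimal Python sketch of the move-to-front (MTF) rule under the cost model as I understand it from the paper: accessing the item at position i costs i, and moving the accessed item to the front is free. The function names and the fixed-order baseline are my own illustration, not code from the paper.

```python
from typing import Iterable, Sequence


def mtf_cost(initial: Sequence[str], requests: Iterable[str]) -> int:
    """Total cost of serving `requests` with move-to-front.

    Accessing the item at 1-based position i costs i; moving the
    accessed item to the front afterwards is treated as free.
    """
    lst = list(initial)
    total = 0
    for x in requests:
        i = lst.index(x)            # 0-based position of the requested item
        total += i + 1              # pay for scanning down to it
        lst.insert(0, lst.pop(i))   # move it to the front of the list
    return total


def static_cost(order: Sequence[str], requests: Iterable[str]) -> int:
    """Total cost when the list keeps a fixed order and never rearranges."""
    pos = {x: i + 1 for i, x in enumerate(order)}
    return sum(pos[x] for x in requests)


if __name__ == "__main__":
    for reqs in ["cccbbbaaa", "abcabcabc"]:
        m, s = mtf_cost("abc", reqs), static_cost("abc", reqs)
        print(reqs, m, s)
        # A fixed order is just one possible algorithm, so MTF must stay within 2x of it.
        assert m <= 2 * s
```

On the run-heavy sequence MTF pays 15 against 18 for the fixed order, because it adapts to locality. On the cyclic sequence it pays 24 against 18: worse, but still within the factor of two.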

External memory
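Relational database management systems (RDBMSs) have been the most popular data management systems for the past three decades. To my knowledge, every RDBMS includes some kind of B-tree implementation. The external memory model shows why algorithms and data structures like the B-tree work so well on layered storage systems: it counts block transfers between memory and disk instead of individual operations, and a B-tree whose nodes each fill a block of B keys answers a search in O(log_B N) block transfers rather than the O(log_2 N) of a binary search tree. In other words, if your data does not fit into memory, you may want to use this model to analyze and design algorithms that need to access data outside memory.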
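A back-of-the-envelope sketch of why this matters, assuming that each tree node occupies exactly one disk block, the tree is full, and nothing is cached. Under those assumptions a lookup reads one block per level, so the tree height is the I/O cost. The fanout of 1024 is a hypothetical value, roughly what a page holding about a thousand small keys would give.

```python
def min_height(n_keys: int, fanout: int) -> int:
    """Minimum height of a search tree that can hold n_keys keys.

    A full tree of height h in which every node has `fanout` children
    (and fanout - 1 keys) stores fanout**h - 1 keys in total.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    h = 0
    while fanout ** h - 1 < n_keys:
        h += 1
    return h


if __name__ == "__main__":
    n = 10 ** 9
    # Binary search tree: one node, and so one block read, per level.
    print("binary tree levels:", min_height(n, 2))
    # B-tree with a hypothetical fanout of 1024 per block.
    print("B-tree levels:", min_height(n, 1024))
```

For a billion keys this gives 30 levels for the binary tree and 3 for the B-tree, which is the log_2 N versus log_B N gap described above. In practice the upper levels of a B-tree often stay in memory, so the number of actual disk reads can be even smaller.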

Smoothed analysis
The simplex method is a popular algorithm for solving linear programming problems. Under worst-case analysis it takes exponential time on some inputs (the Klee–Minty cube is the classic example for Dantzig’s pivot rule), yet in practice it works very well. Smoothed analysis, introduced by Spielman and Teng, offered a theoretical explanation of this phenomenon.
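The three measures can be put side by side. The following is a simplified way to write them, assuming inputs normalized to norm at most 1 and Gaussian perturbations with standard deviation σ. Spielman and Teng’s exact normalization differs in detail. Here T(x) is the running time on input x, X_n is the set of inputs of size n, and D_n is some input distribution.

```latex
\begin{aligned}
T_{\text{worst}}(n) &= \max_{x \in X_n} T(x) \\
T_{\text{avg}}(n) &= \mathbb{E}_{x \sim \mathcal{D}_n}\,[\,T(x)\,] \\
T_{\text{smooth}}(n, \sigma) &= \max_{\bar{x} \in X_n,\ \|\bar{x}\| \le 1}\ \mathbb{E}_{g \sim \mathcal{N}(0,\, \sigma^2 I)}\,[\,T(\bar{x} + g)\,]
\end{aligned}
```

As σ approaches 0, the smoothed measure approaches the worst case. As σ grows, the random noise dominates and the measure behaves more like an average case. Spielman and Teng showed that, for the simplex method with the shadow-vertex pivot rule, the smoothed running time is polynomial in the input size and 1/σ. In other words, the exponential worst-case inputs are fragile and disappear under small random perturbations.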

The three examples above are interesting to me. After I graduated from college and began working in industry, there were times I found that the code used in practice was quite different from what I had thought and written in college. When data structures and algorithms had to be chosen for a problem, algorithms with worse big-O bounds were sometimes used, and simple, straightforward data structures were favored over sophisticated ones that might be better on paper. However, when I looked deeper and analyzed the exact context of the problem, many of those choices began to make sense. The three examples above always remind me that when considering an algorithm or data structure for a real problem, there are usually more things to take into account than simple big-O notation. I think this is also the reason we still need simulations and experiments when we choose or design algorithms and data structures, at least for many real-world problems.

Software Economics

November 5th, 2011

Strategy Letter V by Joel Spolsky

The best piece of economic analysis of the software business I have read. Still, I feel the power of community and individuals’ willingness to contribute is somewhat underestimated in this analysis. For example, I think there is something in human nature that makes people participate in community work voluntarily, and we have many real-world examples of this. Joel’s own company, Stack Exchange, is “Free, Community-Powered Q&A,” to quote its own title.

Why Not Port Microsoft SQL Server to Linux

July 28th, 2011

I just read a blog post by a former SQL Server architect explaining the team’s original decision-making process around porting Microsoft SQL Server to Linux. I have worked on SQL Server at Microsoft for about a year now. The post answered some questions I have had in mind for some time, and I couldn’t agree more with it. I can share some of my own thinking here.

Before I joined the database group at Microsoft, I was a heavy Linux user. I played PC games on Windows 98/XP in high school and college; my first programming course used Visual C++ 6.0 on Windows XP; and that is all. The rest of my projects were mostly done on Linux. My first encounter with Windows 7 and Microsoft’s server products happened on my first day at Microsoft. How I adapted to the Windows environment is another story. (It was much less difficult than I had expected, and I have really come to appreciate the amazing WinDbg.)

When I began looking at the SQL Server source code, an obvious question came to my mind: how much work would it take to port the code to Linux? After all, I was more familiar with the GNU/Linux API than with the Windows API at the time. To my surprise, the core database engine actually does not have much OS-specific code. Given the Wine project, I suspect the port might take only one or two months of work. Then another question came to mind: how much work would it take to support the alternative product? My conclusion is that it is almost impossible without changing how we currently organize the engineering effort around the product. For example, it would take a lot more effort to develop test infrastructure that matches the existing test tools on Windows. The only solution I can think of is to have an entire team dedicated to this new product. The Linux product might also need a new business model. And with so many open source alternatives on Linux, I am a little pessimistic that such a product could be very successful in the market.


November 21st, 2010