UPDATE: Love from Twitter! Tweep Marina Martin showed me this Greasemonkey script that simulates the Labs labeling functions: just hit “l” and the “Apply Label” function pops up. Type the first letter of your label and it starts to drill down. Amazing, and useful. Thanks, Marina!
I learned tonight that Google Apps users aren’t able to take advantage of the slew of Google Labs features that are made available to Gmail users. Google has recently released new labeling features that I would like to try out, but without the Labs link in Google Apps I’m out of luck.
Welcome to the New Era of Cloud Computing
Google, Fremont Campus 4/30/2008
6:30 PM to 9:00 PM
New technologies for large-scale data storage and processing are allowing companies to manage ever-increasing data set sizes. Scalable “cloud computing” technologies offer low-overhead ways to host your products, while ensuring that your computing base can adapt to changing needs. Open source tools such as Hadoop provide low-cost but high-powered platforms on which to develop your systems. This evening presentation, aimed at technology decision-makers of local high-tech corporations, will explain what you need to know to engineer reliable, scalable distributed systems to manage your data. The presentation will address the following topics:
What is changing about data availability today?
What is cloud computing, and in what form is it available to your company?
What systems has Google developed to manage large-scale data, and what makes them unique?
What open source systems can provide these benefits to you?
Speaker: Aaron Kimball, Sr. Consultant, Spinnaker Labs, Inc.
Aaron Kimball is a leading authority on Hadoop-based system deployment. He provides advice, system development, and training to corporations and academic institutions worldwide. In 2007 he developed and taught a new undergraduate course in distributed computing with Hadoop at the University of Washington; this curriculum forms the basis for new courses being presented at top-tier universities across America and around the globe.
See the TechCrunch post here. Essentially, this is a bit of a letdown after the big (unsubstantiated) buildup over the weekend, because of the Python-only support. Yes, Python is just the first of what may be many languages, but I suppose they could have foreseen the reaction from the Ruby, PHP, and other (C# *cough*) crowds who would be very interested in building apps in Google’s cloud.
So, if you’re Djangoing, you’re probably dancing…otherwise, hurry up and wait, unless you have the free time to learn a new platform.
We could have used this at Seattle Startup Weekend, where we developed Skillbit (RIP). That was a Django app. For me? If I have to do any heavy lifting using methods out of Programming Collective Intelligence, I may give this a go. Later.
First, a 1-hour video talk given last summer by Jeff Dean about Google’s overall distributed architecture, including Google File System, MapReduce, and BigTable:
Next, a website I found called highscalability.com, which covers many of these topics in blog format. There’s an interesting summary of Google’s architecture, with links, here. Ironically, this site seems to be down or overloaded a lot.
Next, a whitepaper on BigTable. Lots of details for the inquiring mind, but still approachable for a software person who is not an expert in distributed systems, or in BigTable in particular. This was linked from the TechCrunch article.
Finally, there’s this separate 1-hour video, also with Jeff Dean, that was given in 2005 at the UW.
I haven’t actually watched this one yet, having opted to watch the 2007 one linked earlier.
Have fun! P.S. I’d appreciate pointers to other good introductory material on BigTable.
Finally, Bigtable supports the execution of client-supplied scripts in the address spaces of the servers. The scripts are written in a language developed at Google for processing data called Sawzall. At the moment, our Sawzall-based API does not allow client scripts to write back into Bigtable, but it does allow various forms of data transformation, filtering based on arbitrary expressions, and summarization via a variety of operators.
Hmmm… this is interesting. Drop some data into BigTable and tie it to a Sawzall script you’ve created, but how do you get the results back if Sawzall can’t write _into_ BigTable? I’ll have to figure that one out.
For a computationally intensive product like the one I’m developing, this is very attractive. And I don’t have to switch platforms as I would to get cloud processing done on Amazon’s EC2. I want to find out more about Sawzall.
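On the question of getting results back: the separate Sawzall paper, “Interpreting the Data: Parallel Analysis with Sawzall,” describes scripts that see one record at a time, keep no state between records, and send their output through `emit` into aggregator tables that the system collects. Assuming Bigtable’s Sawzall API follows that same pattern (the Bigtable paper doesn’t say), results would come back as aggregator output rather than as writes into BigTable. Here is a rough Python imitation of that model; `SumTable`, `count_ok_hosts`, `run_over`, and the record fields are all hypothetical, not Google’s API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


class SumTable:
    """Hypothetical stand-in for a Sawzall-style 'sum' aggregator table."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def emit(self, key: str, amount: int = 1) -> None:
        # The script only emits; the summing happens here, outside the script.
        self.totals[key] += amount


def count_ok_hosts(record: Dict[str, object], out: SumTable) -> None:
    """Per-record 'script': sees one record and keeps no state between calls."""
    # Filtering based on an arbitrary expression.
    if record.get("status") != 200:
        return
    # Data transformation: "http://host/path" -> "host".
    host = str(record["url"]).split("/")[2]
    # Summarization: hand the value to the aggregator.
    out.emit(host)


def run_over(records: Iterable[Dict[str, object]],
             script: Callable[[Dict[str, object], SumTable], None]) -> Dict[str, int]:
    """Drive the script over every record, then read results from the aggregator."""
    table = SumTable()
    for record in records:
        script(record, table)
    return dict(table.totals)


if __name__ == "__main__":
    rows = [
        {"url": "http://a.com/x", "status": 200},
        {"url": "http://a.com/y", "status": 404},
        {"url": "http://b.com/", "status": 200},
    ]
    print(run_over(rows, count_ok_hosts))  # {'a.com': 1, 'b.com': 1}
```

The design point is that the per-record function never touches the data store; only the driver reads the aggregator. That is one way to square “scripts can’t write back into Bigtable” with still getting filtered, summarized results out.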
I say “killer” only because if Google gets in, it’s going to be good. BigTable, the internal Google database system that supports fast reads and writes on petabytes of data (yes, peta-), is going to be released as a consumer offering, much like Amazon’s SimpleDB. See the TechCrunch writeup here.
Good news for web startups? Certainly. Good news for Amazon? Probably, if only because a new industry like cloud computing will support lots of competitors, and Google’s entry further validates the concept (as if it needed validating to begin with).
If Google can make it as easy to use as their other consumer offerings, like Maps, then we’re all in for a treat.
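For a feel of what BigTable actually stores: the Bigtable whitepaper describes a table as a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key, and timestamp, where each value is an uninterpreted array of bytes. Below is a toy, single-process Python sketch of just that data model, with no distribution, tablets, or persistence. `ToyBigtable` and its methods are hypothetical names, not the real BigTable API or the rumored public offering; the reversed-URL row keys and `family:qualifier` column names borrow the paper’s Webtable example.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ToyBigtable:
    """Hypothetical in-memory model of the Bigtable data model:
    (row key, column key, timestamp) -> uninterpreted bytes."""

    def __init__(self) -> None:
        # row key -> column key ("family:qualifier") -> [(timestamp, value), ...]
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions newest-first so reads see the latest value by default.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row: str, column: str,
            as_of: Optional[int] = None) -> Optional[bytes]:
        """Newest value at or before `as_of` (or the newest overall); None if absent."""
        for timestamp, value in self._rows.get(row, {}).get(column, []):
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def scan(self, start_row: str,
             end_row: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Rows in [start_row, end_row), in sorted row-key order, latest values only."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                latest = {col: versions[0][1]
                          for col, versions in self._rows[row].items()}
                yield row, latest


if __name__ == "__main__":
    table = ToyBigtable()
    # Row keys are reversed host names, as in the paper's Webtable example.
    table.put("com.cnn.www", "contents:", 3, b"<html>v3")
    table.put("com.cnn.www", "contents:", 5, b"<html>v5")
    table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    print(table.get("com.cnn.www", "contents:"))            # b'<html>v5'
    print(table.get("com.cnn.www", "contents:", as_of=4))  # b'<html>v3'
    # '/' sorts right after '.', so this range covers every row starting "com.cnn.".
    for row, cells in table.scan("com.cnn.", "com.cnn/"):
        print(row, cells)
```

Keying rows by reversed host name keeps pages from the same domain next to each other in sorted order, which is why a single range scan picks them all up together.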
I’d heard from Tony Hung at Deep Jive Interests that there had been another PageRank update, and out of curiosity I checked the PageRank of my blog, http://xidey.wordpress.com. I was very, VERY surprised to see a PageRank of 5; whenever I’d checked before, I saw a big fat goose egg.
5 is very respectable. Deep Jive Interests, which I’ve been reading since I started blogging, got dropped to a 3. Andrew Chen, another very respectable blogger, is at 4. Brier Dudley is a 3. John Cook is a 4.
Is it because I don’t have advertisements? Dunno, but I’m excited about the spike in traffic I’m already seeing. I’m going to stick to my knitting, whether my PageRank goes up or down, because I’m ultimately blogging for myself (and you), not Google.
A patent application lodged by Google in July 2007 but recently made public seeks to patent a method whereby robots (computers) can read and understand text in images and video [...] I may be stating the blatantly obvious when I say that if Google has found a way to index text in static images and video, this is a great leap forward in the progression of search technology.
Well, I don’t know if it’s “obvious,” but Microsoft’s OneNote 2007 has had this technology in place for a while.
Here’s me using Windows+S to copy the TechCrunch page into OneNote:
Then I right-click and select “Make Text in Image Searchable”, specifying English as my language of choice:
Then I can search for “Duncan Riley” and get the results highlighted within the body of the image itself:
It really is amazing technology, and has been out for a while now. Give it a try!