thoughts in plain text

my thoughts about software engineering, startups, science and people

Data-driven Approach to CFP - Mining PyData Conferences

A data-driven approach is good in most cases. Most of us have seen it used for business decisions, or for something that matters much more than a minor daily decision. But what if your toolkit is so fast and powerful that you can easily use it even for daily tasks?

In this article, I’m going to explain one such use case and introduce one of the tools I use for some of my tasks.

So, starting with the problem: I was interested in visiting a conference, in my case one of the local PyData conferences. If you visit a conference, you most likely focus on the content, so I wanted an analysis of the conference’s content. I was also interested in seeing how the focus of the conference had changed over time and, finally, in finding out whether it would be the kind of conference where I could share my knowledge and experience of using Python for data-related tasks.

The original IPython notebook is hosted on GitHub and can also be viewed on nbviewer.

Scientific Octopress: Writing LaTeX Formulas and R Code

I like the blogging platform Octopress: the product is very flexible and can even be hosted on GitHub. For my old posts, I’ve added various custom changes to support a variety of content types and to have a nice design. For programming posts that’s basically enough: you can easily use GitHub gists to host code snippets from posts, and there are lots of other interesting features. But, honestly, that only covers the programming part of me.

On the other hand, I do like to read good research, and every good piece of research has solid math behind it, which makes the paper even more elegant. So my task for today is to be able to share some ideas together with possible math proofs of them, or even some of the papers they are based on.

The Story of Your Data by R

Data, data, data! Nowadays companies have lots of different sorts of data. Some of them just store data and look forward to getting some insights, some already have interesting aggregation and reporting frameworks, and there are even some that use data to make data-driven decisions. But today I’m going to write a blog post not about real business cases and data-driven decisions, but about interesting data visualization.

Among the different sorts of data, we usually have time series datasets, for instance user signups, transactions and so on. The idea of my Friday evening hacking was to write an R script that could visualize such time series. I also wanted to play around with maps in R, so I decided to create a visualization of user signups by user location, grouped by the month of signup.
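The grouping step behind such a visualization can be sketched in a few lines. The post itself uses an R script; the snippet below is only a rough Python illustration, and the sample records and the function name `signups_by_month_and_location` are made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: (signup date, country code).
# Real data would come from your own user table.
signups = [
    (date(2013, 1, 5), "DE"),
    (date(2013, 1, 20), "DE"),
    (date(2013, 1, 22), "US"),
    (date(2013, 2, 3), "US"),
    (date(2013, 2, 14), "FR"),
]


def signups_by_month_and_location(records):
    """Count signups per (year-month, location) pair."""
    counts = Counter()
    for signup_date, location in records:
        month = signup_date.strftime("%Y-%m")
        counts[(month, location)] += 1
    return counts


monthly_counts = signups_by_month_and_location(signups)

# One row per month and location, ready to be drawn on a map per month.
for (month, location), n in sorted(monthly_counts.items()):
    print(month, location, n)
```

Each (month, location) count then becomes one point on the map for that month.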

Twitter Word Cloud With R

Sometimes, during ordinary working days, you need more creativity or just to look at something from a different angle. Since I work with data all the time, I also had some data-related ideas. I was interested in getting a picture of Twitter mentions. The ideal case was to build a word cloud of the most popular words from tweets with a certain hashtag.

I do like the R language and use it for most of the questions I have about data. Another point is that the R community is quite big and there are lots of packages for almost every task. So the main idea was to write a simple R script to analyse a Twitter feed and then build a word cloud of the most popular words.
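The core of a word cloud is a word-frequency count. The post does this in R; the snippet below is just a minimal Python sketch of that counting step. It does not call the Twitter API: the tweets, the tiny stop-word list and the function name `top_words` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real script would use a much fuller one.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "for", "in", "with", "rt"}


def top_words(tweets, n=3):
    """Return the n most frequent plain words across the given tweets."""
    counts = Counter()
    for text in tweets:
        # Drop URLs first so their fragments are not counted as words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        for token in re.findall(r"[#@]?[a-z']+", text):
            # Skip hashtags, mentions and stop words.
            if token[0] in "#@" or token in STOP_WORDS:
                continue
            counts[token] += 1
    return counts.most_common(n)


# Made-up tweets standing in for a real hashtag feed.
tweets = [
    "Loving the new #rstats release, plotting is fun",
    "RT @someone: plotting maps with #rstats is fun",
    "Plotting word clouds for fun http://example.com/x",
]

print(top_words(tweets, 2))
```

The resulting (word, count) pairs are what a word cloud library would take as input, with the count driving the font size.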

Building and Scaling a Company. LinkedIn Story.

I have always been interested in the technology stacks of large-scale startups. It’s quite interesting to follow the evolution of a company over fairly short periods. On the one hand, it’s interesting to follow the changes of technologies, but much more interesting are the reasons that led to such decisions. So today I’m going to share the story of LinkedIn told by Jay Kreps, Principal Staff Engineer at LinkedIn, at the InfoQ conference.

Steal Like an Artist: 10 Things Nobody Told You About Being Creative

Initially I thought about writing only technical posts here, but the blog’s title also contains the words “my thoughts about .. people”. So, I’m gonna add some blog posts about interesting books.

About the format: I don’t like writing reviews or similar things. Currently I use my Kindle to read most of the books I have. I also like to highlight quotes while reading a book on the Kindle. I’m not sure that standalone quotes can give a full picture of a book, but anyway I’m gonna just share my quotes as the key points of the book. And if you’re interested, you can try to connect some of them and decide whether to read it or skip it.

So, the first book to open this series of posts is Steal Like an Artist: 10 Things Nobody Told You About Being Creative.

Celery Messaging at Scale at Instagram

An interesting talk by an infrastructure engineer from Instagram. It describes the idea of feed generation at scale from different points of view, starting with a simple and very expensive O(∞) approach, then solutions based on Gearman and Python, and finally one based on Celery and RabbitMQ. It considers different brokers, aiming for a reasonably fast response time on the one hand and good replication and even chunking on the other. A good overview of configuring Celery for large scale with different routing, queues and concurrency levels.

Actionable Customer Development

I recently found a good presentation about actionable customer development for startups. Yes, it sounds like a bunch of buzzwords, but the presentation covers the first stage of every idea: validation. The most interesting part of the workshop is the recommendations on conducting proper user interviews to validate an idea.

Usually this is the very first big mistake founders make, because we believe in our idea so much. We love it so much that we are ready to ask the wrong questions just to find some confirmation of our ideas. But it’s much better to ask the right questions and find the real user problems, or even pains, than to work on a problem nobody has.

Fail fast! Fail cheap!

Scaling Pinterest

An interesting video from the Surge 2012 conference by the guys from Pinterest. Yash Nelapati and Marty Weiner tell the story of how the startup’s infrastructure changed from the early days to the point when Pinterest became one of the biggest websites. It’s interesting to follow the changes of technologies, especially within such a small group of engineers.