Mobile Apps - 4 Mandatory Tools for Every Mobile Developer

1) Crashlytics

Crashlytics, acquired by Twitter in early 2013, is the best tool for crash reporting. Until Crashlytics came around, the biggest challenge for mobile developers was their inability to detect and fix crashes experienced by real users in the wild.

There were some tools that could send a basic crash report, but because the code is obfuscated and its symbols are stripped when an app is built for the Google or Apple stores, there was no way to tell which class really caused the crash. Moreover, on mobile some bugs happen only on certain devices, depending on the vendor, Android version, app version and more. Developers had no way to know all this before Crashlytics came along.

Enter Crashlytics: with an easy integration on Android and iPhone, we are now able to monitor all the crashes (which are unfortunately always there) in production. The crashes are sorted by priority, and for each crash we can see:

1) Distribution by device.

2) Distribution by Android version.

3) And most importantly, the full stack trace with the real class names and line numbers. Crashlytics achieves this very smartly by providing dedicated plugins for Xcode and Eclipse, which take care of uploading the symbols file automatically and seamlessly.

image

2) Google Analytics

Having analytics on your mobile apps is mandatory, and if that’s the case, why not use the most reliable tech company on earth? Oh, and it’s also free.

Google Analytics (GA) for mobile has easy integrations for both Android and iPhone. By adding just one line of code to your app (together with the SDK, of course), you’ll immediately be able to track:

1) Active Users

2) Sessions

3) New vs Returning Visitors

4) Session Duration

5) Users’ device details, including device model and operating system

6) And many more

However, there’s much more than just those parameters. With GA you can also define specific events that happen in your app, such as “user read article” or “user successfully connected to Facebook”. By defining the key events, or KPIs, of your app, you’ll be able to track funnels and answer questions like: how many users performed those specific events? Which countries are they from? How did a change you made in the app affect a specific key event? You can even split your users into two groups and A/B test to see which group converts best.
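
The exact call differs per platform SDK, but to make the event model (category/action/label) concrete, here is a hedged Ruby sketch that reports the same kind of event server-side through GA’s Measurement Protocol. The tracking ID, client ID and event names are placeholders:

require 'net/http'
require 'uri'

# Illustrative only: report a custom "user read article" event to GA.
def track_event(client_id, category, action, label)
  params = {
    'v'   => '1',             # Measurement Protocol version
    'tid' => 'UA-XXXXXX-Y',   # your GA tracking ID (placeholder)
    'cid' => client_id,       # anonymous client id
    't'   => 'event',         # hit type
    'ec'  => category,        # event category
    'ea'  => action,          # event action
    'el'  => label            # event label
  }
  Net::HTTP.post_form(URI('https://www.google-analytics.com/collect'), params)
end

track_event('35009a79-1a05-49d7-b876-2b884d0f825b', 'article', 'read', 'post_617803')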

image

3) Appirater

The Google and Apple stores give a lot of weight to an app’s rating. Most app developers don’t ask their users to rate them in the store, and that’s a big mistake. Having a rating over 4.0 can dramatically improve your app’s visibility in both stores. But how should you ask your users to do it?

1) You don’t want to disturb a user who just installed your app with this sort of message. Maybe prompt only in the 3rd or 4th session?

2) Maybe it’s better to just wait X days before prompting?

3) We want to remember the users who already got the “rating prompt” so we won’t bother them again anytime soon.

4) Maybe the best time to prompt for a rating is not when the app has just opened, but later on, when the user completes some significant action?

As you can see, it may sound simple to just show a dialog with some text and a link to the store, but like anything else, doing it properly requires some thought and some real functionality, along the lines of the sketch below.
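
To make this concrete, here is a minimal sketch of the kind of bookkeeping such a prompt requires. It is written in Ruby purely for illustration - the class name, thresholds and storage are made up, and a real app would implement this with the native SDKs:

# Illustrative sketch of rating-prompt bookkeeping. The thresholds and the
# key-value store are placeholders; on a device this state would live in
# NSUserDefaults / SharedPreferences.
class RatingPrompter
  MIN_SESSIONS = 4    # don't bother brand-new users
  MIN_DAYS     = 3    # wait a few days after install
  SNOOZE_DAYS  = 30   # don't re-prompt users we already asked

  def initialize(store = {})
    @store = store
  end

  def record_session
    @store[:sessions] = (@store[:sessions] || 0) + 1
    @store[:installed_at] ||= Time.now
  end

  def should_prompt?(significant_event: false)
    return false unless @store[:installed_at]
    return false if (@store[:sessions] || 0) < MIN_SESSIONS
    return false if Time.now - @store[:installed_at] < MIN_DAYS * 86_400
    return false if @store[:last_prompt_at] &&
                    Time.now - @store[:last_prompt_at] < SNOOZE_DAYS * 86_400
    significant_event # prefer prompting right after a meaningful action
  end

  def prompted!
    @store[:last_prompt_at] = Time.now
  end
end

Appirater packages roughly this kind of bookkeeping natively for both platforms, plus the localisation and the actual store links.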

Luckily, Appirater is an open source project that exists for both iPhone and Android and has all the functionality I just described built in. Plus, it has broad multi-language support in case your app targets more than just English speakers.

We use Appirater to ask all our users to help us by rating us in the store, and we currently have a rating of 4.6 in both stores with tens of thousands of user ratings.

image

4) App Annie

App Annie is a free service that allows you to track downloads from both the Google and Apple stores. You might ask: why use it if all the data already exists in the Google and Apple developer consoles?

First, App Annie saves all your data forever (unlike Apple in the past). It has a much better user interface, which lets you analyze stats easily. Moreover, you can compare your download numbers to your category average and, most importantly, you can get the full picture of your app’s stats in terms of its rating in each country, and you’ll be able to see where and when your app is being featured.

App Annie is also a great tool for analyzing any app in the Google or Apple stores. You can look at your competitors and better understand their ranking, their rating and where they are featured.

image

* Written by Oded Regev

Getting The Best Out of Logstash for NginX

Note: If you’re not familiar with Logstash, please watch the introduction video to get the hang of inputs, filters and outputs and how they operate within Logstash. This tutorial assumes basic knowledge of Logstash.

Logstash was a real game changer for our monitoring capabilities at FTBpro.com. We’re using it to aggregate and analyse the logs of every system in our infrastructure that produces log files. The setup may take some time, but it is certainly worth the effort.

Among all the logs we aggregate using Logstash are our Nginx logs, which I am sure a lot of you would like to aggregate as well but just don’t know exactly how. I wanted to share our configuration and exactly how we set things up to make it work. It has been running for a few good months now with incredible results.

This is a general sketch of our system: [Logstash architecture diagram]

The Server

The server has 4 main components which make it work:

  1. Redis: Receives and stores events from the web servers
  2. Logstash Indexer: Takes events from Redis and pushes them into ElasticSearch
  3. ElasticSearch: Stores the events pushed by the indexer and makes them searchable
  4. Kibana Web: A very cool web application which nicely presents the data stored in ElasticSearch

You will have to install Java, Redis and ElasticSearch to have it ready for action. Afterwards, just follow this 10-minute walkthrough to get the Logstash agents up. Note that you will have two Logstash agents: one to move events from Redis to ElasticSearch (the Indexer), and another Logstash web process which serves the Kibana web interface.

After you have everything installed, the server.conf file is really simple:

input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
 
    # We use the 'json' codec here because we expect to read
    # json events from redis.
    codec => json
  }
}
output { 
  stdout { codec => rubydebug}
  elasticsearch {
    index => "logstash-%{+YYYY.MM.dd}-%{type}"
    protocol => http
 }
}

As the input we have Redis, which runs on localhost; we read its “logstash” key, which holds a list of events. As outputs we have ElasticSearch, plus rubydebug so we can watch the agent’s log files if things go wrong.

The Clients

The clients (web servers) run only a Logstash agent, which reads the Nginx access log and sends it directly to Redis on the Logstash server. The Logstash agent here is the same one we installed on the server earlier, so go back to that 10-minute tutorial and install the Logstash agent on your web server. Note that Logstash Web (Kibana) is not needed on the web servers, so don’t run it here.

After getting the agent installed, we need to configure Nginx logging to record all the data we care about. We found that the default Nginx log format does not supply all the data we’d like to have, so we altered it a bit. This is the log format definition on our Nginx servers:

    log_format ftbpro '$http_host '
                    '$remote_addr [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time '
                    '$upstream_response_time';
    

The differences from the default log are:

  • We added the HTTP host to the log. One Nginx server may serve multiple hosts, and we’d like to know which host the request was for.
  • We omitted the $remote_user variable from the default log - we saw no use for it.
  • Added $request_time to have the response time of static files served by Nginx.
  • Added $upstream_response_time to have the response time of our application server (Unicorn).

Now that we have all the needed data in the log file, all that’s needed is to ship it to Redis on the Logstash server. This is how the agent.conf file looks on our web servers:

input {
        file {
                path => **YOUR NGINX ACCESS LOG FILE PATH**
                type => "web_nginx_access"
        }
}
 
filter {
        grok {
                type => "web_nginx_access"
                match => [
                  "message", "%{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}",
                  "message", "%{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float}"
                ]
        }
        date {
                type => "web_nginx_access"
                match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
        }
        geoip {
                type => "web_nginx_access"
                source => "clientip"
        }
}
 
 
 
output {
  stdout { codec => rubydebug }
  redis { host => **REDIS_SERVER_IP_HERE** data_type => "list" key => "logstash" }
}

A few points worth mentioning about it:

  1. The input, as expected, is the NginX access log file.
  2. Grok

    The Grok filter is one of the most useful filters included in Logstash. It allows you to analyse each log line and fetch fields like client_ip, request and response code from it, by matching the log line against regular expressions.
    We’re using it here to analyse the NginX log line and to extract all the fields we’d like to get out of it to ElasticSearch.
    Logstash ships with a bunch of regular expressions set up for you, so you can easily start matching log lines without writing regular expressions yourself. You can find the list of predefined patterns here.

    For example, the first part of the string %{IPORHOST:http_host} means we’re matching the pre-defined Grok pattern IPORHOST and want to have it as http_host field in ElasticSearch. So simple.

    Getting the hang of Grok is really a matter of practice. There is also a great web tool called Grok Debugger which allows you to try Grok Patterns on inputs you define.

    The reason there are two patterns (the two entries in the match array) is that Grok will try to match the first and then the second. If Nginx serves a static file, the line won’t have $upstream_response_time. So we need one Grok pattern to “catch” requests that went to the upstream (this is the first pattern) and one Grok pattern for static requests served directly by Nginx (this is the second one). A worked example of what gets extracted appears after this list.

  3. Inside the Grok pattern we can explicitly mention data types on fields we would like to analyse as numbers. For example
    %{NUMBER:request_time:float}
    means that the request_time field will be indexed as a float in ElasticSearch, so we can run numeric functions on it (mean, max, etc.) and show it on a graph.
    This is extremely important because otherwise request_time would have been indexed as a string rather than a number and we couldn’t properly display it on a graph. NUMBER is, again, a pre-defined Grok pattern.

  4. The date filter allows us to set the date on the event sent to ElasticSearch. By default Logstash uses the time the agent read the log line as the timestamp. This is not good for us, because Nginx writes the log line after the request has been processed, so the time the log line was written is not the time at which the request arrived at our server.

  5. The geoip filter adds geographical fields to the event. This will later allow us to ask questions like “From which country do I get the most requests?”

  6. The output is pretty self-explanatory: send the events to Redis on the Logstash server, and to STDOUT for debugging purposes.
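
For illustration, here is roughly what the first grok pattern above would extract from a single access-log line. The log line is invented, and the result is shown as a Ruby hash purely for readability:

# Hypothetical log line produced by the 'ftbpro' log_format above:
#   www.ftbpro.com 84.94.1.1 [20/May/2014:12:00:01 +0000] "GET /feed/arsenal HTTP/1.1" 200 5120 "-" "Mozilla/5.0" 0.012 0.145
# The first grok pattern would yield, roughly, these event fields:
extracted_fields = {
  "http_host"     => "www.ftbpro.com",
  "clientip"      => "84.94.1.1",
  "timestamp"     => "20/May/2014:12:00:01 +0000",
  "verb"          => "GET",
  "request"       => "/feed/arsenal",
  "httpversion"   => "1.1",
  "response"      => "200",            # plain NUMBER, so indexed as a string
  "bytes"         => "5120",
  "referrer"      => "\"-\"",          # QS keeps the surrounding quotes
  "agent"         => "\"Mozilla/5.0\"",
  "request_time"  => 0.012,            # float, thanks to %{NUMBER:request_time:float}
  "upstream_time" => 0.145             # float as well
}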

Results

[Kibana dashboard screenshot]

  1. The top left graph shows us HTTP responses over time. Each response type - 200 OK, redirects, not-found and errors - is colored differently so we can distinguish between them. Dragging the mouse from left to right inside the chart will drill down on the selected time frame in all of the graphs on the dashboard.

  2. The top right graph shows us only HTTP error (5xx) responses over time. If we see a bump on this graph it usually means something went wrong. Again, you can drag your mouse to drill down on the time frame you wish to focus on.

  3. The two bottom left graphs show top URLs by hits and top user agents. Clicking the magnifying glass to the right of each entry will filter all the graphs on the dashboard by that URL or user agent.

  4. The bottom right pie chart shows us the distribution of response codes. Clicking on each section on the pie will filter all graphs on the dashboard to show only requests with the chosen response code.

[Kibana dashboard screenshot: response times and geo map]

  1. The left chart simply shows us mean response time.

  2. The right map shows us geo data of requests. The more requests, the darker the color of the country on the map. Clicking on a country will filter all the graphs on the dashboard to show only requests from that country. So simple, yet so powerful.

Summary

So what do we get out of it? Endless possibilities to analyse and drill down into our web server activity logs. We are able to instantly answer so many questions we couldn’t before. Questions like:

1. What are the top URLs in terms of hits? 404 responses?
2. What agents get the most errors?
3. From what countries are we getting most traffic?
4. What are the slowest responses and to which URLs?

The limit is your imagination and getting immediate results is a matter of a click.

So yes, go get Logstash now. It is a great project, really well maintained and, most importantly, it will elegantly provide you with answers you had no way to get before.

* Written by Erez Rabih

"A year ago we started a big change in FTBpro. We completely changed the visual design, moved to a single page architecture and started exploring new ways to minimise load on our servers - both when serving our actual website, and mobile API responses.We’ll focus on how scaling considerations are now an integral part of our architecture, which enabled us to serve 20x more traffic than we did 1 year ago, with the same setup and with no additional costs."

The presentation we gave at Google Campus TLV on the 20th of May 2014 based on http://tech.ftbpro.com/post/78969626647/growing-x20-without-spending-an-extra-penny-on-hosting

Single Page Applications Done Right

Single Page What?!

Not too long ago, FTBpro.com moved its website from classic web architecture (CWA) to a single-page application (SPA). For those who are less familiar with SPA, the major differences between the two are as follows:

  1. First client request:
    • CWA: The server returns a full HTML page
    • SPA: The server returns a set of templates and data which is then rendered to the full HTML page, by Javascript, on the client side.
  2. Subsequent requests
    • CWA: The server returns a full HTML page
    • SPA: The server returns only the data needed to display the request. Since all templates are already on the client side from the first request, no HTML/CSS/JS should be delivered.

To put it in simpler words: while in CWA the client gets a full HTML page on each request, in SPA the client gets everything needed to render the HTML by itself on the first page load, so that subsequent requests can be carried out in a very fluid and rich manner. It makes your website more of an application and less of a site.

Pros & Cons

SPA relies heavily on Javascript to perform actions that were server-side only a few years ago. It means that a large part of the “heavy lifting” involved in creating web pages is now the responsibility of the client, for better and for worse. Let’s go over the advantages and disadvantages of SPA over CWA.

Advantages:

  • User experience is much richer and more fluent.
  • Lower server load because HTML rendering is done on the client, and subsequent requests only require the specific data needed to render that page.
  • Definitive separation between client and server - server acts more like an API.

Disadvantages:

  • SEO: Search engines (Google etc…) do not execute Javascript when they index your site. Since in SPA pages are rendered on the client side by Javascript you need to find a way to supply search engines with a full HTML page.
  • Caching: Since pages are rendered on the client side, you can’t cache full HTML pages on the server end. This also impacts CDN services as what they cache is only templates/data and not full HTML pages.
  • Performance: As said, SPA is heavily client-side dependent. It means that your site might load fast on some clients while being painfully slow on others. This becomes a large factor as old mobile devices might have a rough time rendering the pages and introduce large load times for your site.

No Compromises

We knew that we had to build our website as an SPA in order to stay relevant and provide our users with the best possible experience. We also knew we would have to overcome those disadvantages if we wanted to keep our SEO rank high and our users happy. There are several solutions to these problems, like PreRender, but they are focused mostly on SEO and less on the other two issues. We wanted a solution that would enable us to:

  1. Be indexed by Google bots and other search engine crawlers.
  2. Cache full pages HTML on the CDN and on our server cache.
  3. Improve user experience on old mobile devices, which were suffering from the heavy Javascript execution a normal SPA requires.

We figured out that what we really needed was a hybrid between CWA and SPA: we want to return fully rendered HTML pages on the first request, while subsequent requests behave just like a normal SPA. If we could achieve that, we would enjoy all the advantages of SPA with all the disadvantages eliminated, because:

  1. Google bots and other crawlers will easily index our pages since the server returns fully rendered HTML pages.
  2. There is no problem caching full HTML pages on the server, and the CDN will also cache full HTML pages.
  3. Since we render HTML pages on the server no additional work is needed on the client side (at least for the first request) for the page to be shown. This decreases the dependency on client hardware which mainly impacts old mobile devices.

Since there was no product or service fulfilling all three, we set out on a journey to create one ourselves. The idea was to use PhantomJS, which is a headless browser capable of running Javascript, to render our SPA pages. The journey ended with 3 separate open source projects: phantom_renderer, phantom_manager and phantom_server. Each has its own important role in the process of producing a full HTML page for our clients.

How Does It Work?

[FTBpro PhantomJS architecture diagram]

  1. The client (user) requests a URL from the web server - let’s assume it is /posts/3
  2. The web server, via phantom_renderer, asks phantom_server to render /posts/3
  3. phantom_server requests the SPA version of /posts/3 from the web server
  4. The web server returns the SPA version to phantom_server
  5. phantom_server renders the SPA into a full HTML page
  6. phantom_server returns the full HTML page to the web server
  7. The web server returns the full HTML page to the client

Quite a simple flow, which ends with a full HTML page being sent to the client. Let’s talk a little bit about each of the components and their exact roles in the system.

Phantom Renderer

Phantom Renderer is a Ruby gem which integrates with Rails controllers and communicates with phantom_server in order to produce a full HTML page. It is responsible for two important things:

  1. Caching: when full HTML pages are returned by phantom_server, they are cached in the server cache so they won’t have to be produced again.
  2. Managing phantom_server responses: in case of a bad (5xx) response from phantom_server, a regular SPA page is returned to the client. The same applies if phantom_server does not respond within a given (configurable) timeframe. This way, even if something bad happens to phantom_server, we fall back to a regular SPA, as the sketch below illustrates.
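
To make the behaviour above concrete, here is a rough sketch of the flow from a Rails controller’s point of view. The method names, cache key and internal host are hypothetical - this is not phantom_renderer’s actual API, just the caching-plus-fallback logic it implements:

require 'net/http'

class PostsController < ApplicationController
  PHANTOM_TIMEOUT = 3 # seconds; the real gem makes this configurable

  def show
    cache_key = "phantom/posts/#{params[:id]}"
    html = Rails.cache.read(cache_key)
    unless html
      html = render_via_phantom("/posts/#{params[:id]}")
      Rails.cache.write(cache_key, html) if html # cache only successful renders
    end
    if html
      render text: html   # full HTML produced by phantom_server
    else
      render :spa_shell   # fall back to the regular SPA page
    end
  end

  private

  # Ask phantom_server to render the SPA version of the page.
  # Any timeout or 5xx response returns nil, so the caller falls back to SPA.
  def render_via_phantom(path)
    uri  = URI("http://phantom-server.internal#{path}") # placeholder host
    http = Net::HTTP.new(uri.host, uri.port)
    http.read_timeout = PHANTOM_TIMEOUT
    response = http.get(uri.request_uri)
    response.code.to_i < 500 ? response.body : nil
  rescue StandardError
    nil
  end
end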

Phantom Manager

In the beginning we had a lot of problems with PhantomJS processes over time: some bloated in memory, some crashed and some just stopped responding. We had to come up with a solution to make those processes more stable, or at least monitor them somehow. Phantom Manager does just that: it manages a predefined number of PhantomJS processes behind an Nginx server. For example, when one PhantomJS process stops responding, Phantom Manager will remove it from the Nginx configuration and re-add an entry only after it has raised another process instead. This way we keep a constant pool of PhantomJS processes which are always ready to render a page.

Phantom Server

Phantom Server is a collection of technologies making the whole rendering process happen. This is how it works:

  1. An NginX server listening on port 80
  2. phantom_manager manages PhantomJS processes on ports 80xx. It syncs every action it does on PhantomJS processes with the NginX Config.
  3. Each PhantomJS process runs a customized version of rndr.me
  4. Requests are delegated from NginX directly to PhantomJS processes on ports 80xx

Conclusion

The system has been up and running for about half a year now. It is all open source and ready to deploy. If you have an SPA site and experience the same issues as we did, give it a try. We will be more than happy to help with any setup issues you encounter.

Written By Erez Rabih - Web Developer @FTBpro (erez@ftbpro.com)

Growing X20 without spending an extra penny on hosting

If your website is a social network then this post is probably not for you. If you have a blog, a news site or an e-commerce site, it might be!

This shot from New Relic compares the load on the server (cyan line) with the pageviews (yellow line) before and after a push notification is sent.

image

We achieved these great results by a fanatical use of a CDN and by integrating it really deeply into our servers. We constantly recited that disconnecting the correlation between pageviews and CPU would allow us to scale without really scaling hardware.

A year ago we started a big change in FTBpro’s website. We completely changed the design, moved to a single page architecture and started exploring new ways to minimise load on our servers. Later we applied the lessons we learned and the methodologies we developed to our mobile API. On the outside, the result is the FTBpro.com site and mobile app as you know them today. We had two goals in mind - make the user experience faster and lower the load on the servers. This post is about the latter.

Disconnect the correlation between Pageview and CPU

We need to make sure CPU power is not wasted building the same data twice, thrice or more. Why run the same SQL twice when you know nothing has changed?

If a page is requested twice within a reasonable timeframe, don’t rebuild it. This can be achieved via full page caching at one of three levels:

  1. Application Level (e.g. Rails, Java, PHP)
  2. Middleware (e.g. Varnish)
  3. by CDN

The first and second approaches are slightly easier to manage as they are contained within your own servers, but they don’t eliminate the correlation between pageviews and CPU - they only take it to a minimum. If the first page took 200ms to render, the cached version could be returned in 1ms or even less. With a CDN the cached version doesn’t even hit the main server, so the correlation can truly be called “disconnected”. The downside of using a CDN is usually the hassle of choosing one, setting it up and getting a good contract - there are tons of CDNs out there and many small parameters that distinguish between them.
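
For reference, in a Rails 3.2 app the application level and the header-driven levels look roughly like this. This is a sketch, not our actual controller; Varnish and the CDN are driven purely by the Cache-Control headers you emit:

class PostsController < ApplicationController
  # Level 1 - application-level full page caching (Rails 3.2-era API):
  # the rendered HTML is stored and served without hitting the action again.
  caches_page :show

  def show
    @post = Post.find(params[:id])
    # Levels 2 and 3 - let Varnish or the CDN cache the page based on the
    # HTTP headers we send; here the page is publicly cacheable for 15 minutes.
    expires_in 15.minutes, public: true
  end
end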

The naive approach to full page caching is setting an expiration time on a page (e.g. 15 minutes), so once every X minutes it expires and the CDN takes the freshest version from the server. That’s okay and very easy to manage, but it has two disadvantages:

  1. Data updates don’t appear straight away to the user.
  2. There will still be a correlation between pageviews and CPU, even if a low one.

In order to overcome the second disadvantage we can just set the expiration time to never ;-) but then our pages will surely become stale at some point, as editorial updates won’t be reflected. This can be solved by using a CDN provider that has purge or preload APIs: in the server layer, attach an expiration event to the classes that are in charge of updating the data behind these pages (e.g. Post#after_save in a Rails app). Most CDN providers have these APIs, but two important criteria differ from one provider to another:

  1. Speed: some CDNs purge at 200ms, some at 1min, some at 45min
  2. Purge criteria: some CDNs allow purge by exact URL, some allow REGEXPs, some force you to “tag” each URL in the HTTP response headers and purge by those tags (much more work but can give the best results in a few cases)

So what did we do?

  1. Configured the CDN to keep all our pages forever - never expire.
  2. Modified the URL structure of the APIs (mobile & web) to follow a purgeable pattern. For example, our CDN couldn’t purge based on query string parameters, so we had to move to a RESTful URL structure: /feed?team=arsenal had to change to /feed/arsenal.
  3. Added “expirators” to our different models (see the sketch after this list). Whenever a post is saved we expire its URL and the URLs of the feeds that should contain it. E.g. updating a post about a game between Arsenal and Barca will expire the URL of the post itself, Arsenal’s feed, Barca’s feed, the Premier League feed and the La Liga feed (both on mobile and web).
  4. Before sending a push notification about a post, it is automatically preloaded to the CDN. At these times we can get up to 100k requests a minute to the website, and none of them reaches our servers.
  5. Added an application-level full-page caching layer with Memcached, after realising that a CDN is made up of many independent edge servers, each of which will hit our application server if it doesn’t have the cached version - creating real load on it.
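
Here is a hedged sketch of what such an expirator might look like. CdnClient, the URL patterns and the associations are placeholders - the actual purge call depends on your CDN provider’s API:

class Post < ActiveRecord::Base
  after_save :expire_cached_urls

  private

  # Purge the post's own URL plus every feed that should contain it, on both
  # the web and mobile API hosts, and drop the Memcached full-page entries.
  def expire_cached_urls
    paths  = ["/posts/#{id}"]
    paths += teams.map   { |team|   "/feed/#{team.slug}" }   # e.g. /feed/arsenal
    paths += leagues.map { |league| "/feed/#{league.slug}" } # e.g. /feed/premier-league
    paths.product(%w[www.ftbpro.com api.ftbpro.com]).each do |path, host|
      CdnClient.purge("http://#{host}#{path}")        # placeholder purge API
      Rails.cache.delete("page/#{host}#{path}")       # application-level layer
    end
  end
end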

What happened?

  1. The user experience became much better, because all requests are served from light and fast CDN servers that are geographically near the user.
  2. We use the exact same server resources for the 100M pageviews we have today as we used for the 5M pageviews we had 7 months ago.
  3. We chose a CDN that fits our needs. We pay them a small fraction of what we paid our former CDN, with a 20x increase in load.

That’s a good opportunity to praise Edgecast, the CDN which we use. They have exceeded our expectations in every parameter:

  1. Amazing quality of service. They respond fast to emails, they are available on the phone and just stay there and give service for as long as it takes.
  2. Technology. Their user interface is a bit sluggish but it allows us to really go crazy and set different configuration rules based on our wild url structure. And they purge fast - a few seconds to 1-2 minutes per purge. 
  3. Great price. That wasn’t the main criteria in choosing CDN but it happened to be very affordable nonetheless.

by Dor Kalev, CTO @ FTBpro

image

Ruby 2.1 - Our Experience

We’ve recently moved FTBpro’s Ruby on Rails servers to the newest Ruby version on earth - Ruby 2.1. It has been running on our production servers for the past two weeks. Our stack includes MySQL, MongoDB, Rails 3.2, ElasticSearch, Memcached and Redis. We wanted to share our experience of making this change.

Incompatibilities

1. The first thing you encounter when you move to Ruby 2.1 is the non-working net/http module. As explained here, the Ruby 2.x Net::HTTP library asks for gzipped content by default but does not decode it by default, which breaks JSON parsing of some HTTP responses. This breaks the koala gem, the right_aws gem and many other gems which rely on JSON-over-HTTP communication to operate. The solution is a small patch to the net/http library. We have put it in our config/initializers/a_net_http.rb so that Rails loads it upon boot.
The patch:

require 'net/http'

# Force Net::HTTP to decode gzipped response bodies and keep the
# Content-Length header consistent with the decoded body.
module HTTPResponseDecodeContentOverride
  def initialize(h, c, m)
    super(h, c, m)
    # Always decode gzip/deflate bodies, regardless of how the request was built
    @decode_content = true
  end

  def body
    res = super
    # After decoding, the original Content-Length no longer matches the body,
    # so update it to the decoded body's size
    if self['content-length'] && res && res.respond_to?(:bytesize)
      self['content-length'] = res.bytesize
    end
    res
  end
end

module Net
  class HTTPResponse
    prepend HTTPResponseDecodeContentOverride
  end
end
        

UPDATE: Looks like this is a specific bug with right_http_connection which monkey-patches Ruby’s net/http and breaks it. You can read more about it in this thread.

2. All the MongoDB users in the room, pay attention: the current stable version (0.12.0) of the mongomapper gem does not support Ruby 2.x. We upgraded our gem version to 0.13.0beta2, and in combination with the net/http patch from bullet 1 it works like a charm.

3. If you are a fan of the debugger gem you’ll have to say farewell. It does not support Ruby 2.x in any manner and causes nasty segmentation faults with long outputs. The good news is that there is a very good replacement: the byebug gem. Its interface is almost identical to that of the debugger gem, so you’ll feel right at home, and it works well with Ruby 2.x.

4. If you’re using the imagesize gem to determine the height/width of images, you’ll have to find a replacement. We already had RMagick in our gemset, which can retrieve image dimensions, so we just used that.

5. We had a weird bug with the BigDecimal library in ruby 2.1. Here is the output of the exact same code under ruby1.9.3 and ruby 2.1:

#ruby 1.9.3
require 'bigdecimal' ; require 'bigdecimal/util'; (0.5.to_d / 0.99.to_d).to_f # => 0.505050505 
 
#ruby 2.1.0
require 'bigdecimal' ; require 'bigdecimal/util'; (0.5.to_d / 0.99.to_d).to_f # => 0.0

We don’t know how to explain this, but we’re lucky to have a test suite for this module, because we’d never have discovered it until it got to production.

UPDATE: Wasn’t aware of it but apparently BigDecimal division is a known bug in Ruby 2.1. You should check out this list for more info.

This concludes the changes we had to make to our code so it runs well under Ruby 2.1. Not much, but is it worth the hassle?

The Effects

We observed three, very prominent, improvements in Ruby 2.1 over 1.9.3:

1. Load times are significantly lower. And by “significantly” I mean about a quarter of the time. The larger your environment is, the larger the difference in load time. We were nothing less than amazed by this:

* Deployment time dropped from 14 minutes to approximately 5 minutes. This is due to the many rake tasks we run while deploying. We make about 15 deployments to our QA servers daily. That’s 135 minutes, a little more than two hours saved per day for developers waiting for their version to arrive on the QA server.

* Build time on Jenkins CI was reduced from 14 minutes to about 6 minutes. This has shortened the time from opening a pull request to a successful/failed build notice and made the feedback loop a little more bearable.

* Every run of a binary that requires the Rails environment to be loaded now takes a quarter of the time. These are the measurements I made on our environment:

Ruby 1.9.3: bundle exec rails runner 'puts "a"' 41.06s user 2.23s system 98% cpu 43.916 total

Ruby 2.1.0: bundle exec rails runner 'puts "a"' 11.07s user 2.04s system 94% cpu 13.823 total

It saves a lot of waiting time for our developers when running rails server / console and various rake tasks.

2. Garbage collection times dropped from 100ms to almost 0ms. This is our New Relic graph for garbage collection; the vertical line marks the deploy which moved us to Ruby 2.1: [Graph: GC times, Ruby 1.9.3 vs 2.1]

3. We had a severe problem with deployments during high-traffic hours - we’d just go down from time to time while Unicorn workers were restarting. Ruby 2.1 amazingly mitigates this problem, since environment load time is now a quarter of what it was and Ruby 2.1’s GC is copy-on-write friendly, which lets Unicorn handle forking much better. You should definitely read this article, which explains how Ruby 2.x affects Unicorn’s forking mechanics.

Weird Stuff

The only thing we can’t explain yet since moving to Ruby 2.1 is some strange Unicorn master process behaviour: when we restart it, it always starts off with a different memory footprint. As a result, every Unicorn restart causes the mean response time to shift by about 100ms. Here are two graphs where the vertical line represents a Unicorn restart: [Graphs: Unicorn restart - bump up / bump down]

This is the only thing we feel uncomfortable about with the move to Ruby 2.1.

You Should Also

Ruby 2.1 has some killer advantages over Ruby 1.9.3. It will make your daily operations a lot faster than they are today and can even help you overcome or mitigate other infrastructure problems you’re having. The changes we had to make to move to Ruby 2.1 are really minor compared to the benefits we got from it. There is no reason to stay behind - take the step forward to Ruby 2.1.

by Erez Rabih, Head of infrastructure @ FTBpro

From Illustrator to a web font. Creating custom scalable web-icons.

State of affairs: We want to switch from cut-up .PNG sprites to an icon font. Trying to achieve that with minimum effort and, if possible, no additional software, we searched for a web app to help us.

 

Research:

There is a ton of software and free or demo apps that can help you get your way and add some nice pre-designed vector icons to your project (like Font Awesome). However, being a perfectionist, using pre-designed icons did not satisfy me. What I wanted was a way to translate the exact vector icons I developed in Illustrator and Photoshop into a web font, and that seemed impossible without expensive font-editing software.

image

 

Solution:

Then we found the IcoMoon app. This baby can take custom .SVG files, combine them with pre-designed icons from several open-source libraries and export them into a custom web font. The session can then be stored and downloaded as a .JSON file, making it possible to edit in the future. All free, no strings attached. This is perfect!

(Note, the font-export button is at the bottom.)

image

 

This is how we do it now:

Now I open a sheet in Illustrator, export every icon to a separate .SVG file and import them into IcoMoon.

image

Front-end developers call every icon by its code like so:

<a class="prev ficon icon-arrow-left"></a>

And voila, now we have scalable, custom icons all over the website, fitting mobile, tablet and desktop. This saves us a lot of work in development, and cutting PNGs is now history.

Check them all out here: www.ftbpro.com

— Mark Levinson

Designer at ftbpro.com

Push Notifications Explained

There are two types of users at FTBpro: writers & readers.

Our readers want quality content about their favorite team & league.

When we have content that might interest our readers we don’t want them to miss it.

The writers, on the other hand, would like their content to be read by as large an audience as possible.

Fulfilling both these needs is the essence of FTBpro, and one effective way we found to accomplish that is mobile push notifications.

Recently we started a project to rebuild our infrastructure for sending push notifications (PN); the rest of this post is about this new system.

image

First things first: what exactly do we need?

We have 21 apps on the App Store and 21 more on Google Play; whenever we send a PN it should reach all of them.

Every mobile user is a fan of one team from one of the several leagues we support.

In addition each user can choose the language in which to consume the content on the app.

Now, it’s not our intention to spam our mobile users with PNs they may not like. To prevent that, users should only get PNs with content about their favorite team, written in a language they can read.

We need the ability to send both immediate PNs, mainly for breaking news, and scheduled PNs.

Another key requirement is the ability to customize the message and schedule time of a PN on a per-team and per-league basis, so if we have a post about a loss of Real Madrid to Barca, we’d like to send a different message to Real fans than to Barca’s.

How do push notifications work anyway?

Generally speaking each mobile platform has its own way to send PN.

For iOS devices it’s the Apple Push Notification service (APNs); for Android devices it’s Google Cloud Messaging (GCM). When you want to send a PN you have to talk to these services, and they in turn send the PN to the mobile devices.

Both APNs and GCM have their own protocols for sending PNs, and interfacing directly with these services can be quite tedious. So we use Urban Airship.

Urban Airship (UA) is a service that provides us with a convenient way to manage PNs for both iOS and Android.

Each of our apps, across both platforms, maps to one logical app on UA; we have 21 of those. Using UA, the task of sending a PN is reduced to making an HTTP POST request to their API.

One very useful feature of UA is tags. Tags are just labels that can be associated with any device, and the cool thing about them is that you can tell UA to send a PN to all devices associated with one or more tags.

How do we use it? When a user opens one of our apps, he is registered with UA with two tags representing his favorite team and league, as well as his language. For example, a Chelsea fan reading in English will be registered with the ‘team_4_en’ and ‘league_1_en’ tags. Having the tags set up this way allows us to tell UA to send a PN only to fans of Barca in Spanish, for instance.

UA’s API provides us with two endpoints for sending PNs: ‘/push’ and ‘/schedules’.

Making a POST request to ‘/push’ results in an immediately sent PN; we use this endpoint for PNs that need to go out ‘now’.

POSTing to ‘/schedules’ schedules a PN to be sent at a later time; this feature saved us from implementing a scheduling solution of our own.
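
For illustration, a single immediate push through UA’s v3 API looks roughly like the Ruby sketch below. The credentials, tag and alert text are placeholders, and the exact payload fields may differ between UA API versions:

require 'net/http'
require 'json'

# Illustrative only - send an immediate PN to all Barca fans in Spanish.
# UA_APP_KEY / UA_MASTER_SECRET are per-app credentials (placeholders here).
payload = {
  audience:     { tag: 'team_3_es' },                         # tag set on registration
  notification: { alert: 'El Clasico: resumen del partido' },
  device_types: %w[ios android]
}

uri = URI('https://go.urbanairship.com/api/push/')
request = Net::HTTP::Post.new(uri)
request.basic_auth ENV['UA_APP_KEY'], ENV['UA_MASTER_SECRET']
request['Content-Type'] = 'application/json'
request['Accept']       = 'application/vnd.urbanairship+json; version=3;'
request.body = payload.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code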

Ain’t nobody got time for that

Sending a PN to all our apps involves lots of HTTP requests. Making these requests takes time our web servers don’t have, so we make them in the background using Sidekiq.

Sidekiq is a background job processing framework for Ruby. It uses threads for its workers, which gives it an advantage over other frameworks such as Resque, which uses one process per worker.

When sending a PN, the web process enqueues one Sidekiq job for each of the 21 apps; then, on a dedicated server, 21 Sidekiq worker threads process the jobs. Each job makes the appropriate requests to the UA API for one app and then updates the status back for that app.

The effect of this setup is that when we send a PN it arrives at all our apps at (almost) the same time.
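
A simplified sketch of that fan-out is below. The worker, model and client names are made up; the real code also handles per-team targets and scheduled sends:

require 'sidekiq'

# Web process: fan out one job per app.
APP_NAMES = %w[arsenal chelsea aston_villa] # ...21 apps in total

def send_push_notification(post_id)
  APP_NAMES.each { |app| PushNotificationWorker.perform_async(app, post_id) }
end

# Dedicated Sidekiq server: 21 worker threads pick these jobs up concurrently.
class PushNotificationWorker
  include Sidekiq::Worker
  sidekiq_options queue: :push_notifications, retry: 3

  def perform(app, post_id)
    notification = PushNotification.find_for(app, post_id) # hypothetical lookup
    UrbanAirshipClient.new(app).push(notification)         # POST to /api/push/
    notification.update_status('sent')                     # write the status back
  end
end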

image

Persistence

Shai Kerer has already written that we strive to use the right tool for the job whenever possible.

We store all the PN-related data in one MongoDB collection. Each document contains the canonical data for a PN to one app, with embedded documents that hold team- or league-specific data. An example of such a document:

{
  "app": "aston_villa",
  "post_id": 617803,
  "targets":[
    {
      "message": "Transfer Talk: Tottenham Set to Battle for FC Porto Midfielder",
      "locale": "en",
      "scheduled_time": null,
      "team_id": 17,
      "status": "sent"
    },
    {
      "message": "Transfer Talk: Aston Villa Set to Battle for FC Porto Midfielder",
      "locale": "en",
      "scheduled_time": null,
      "team_id": 2,
      "status": "sent"
    }
  ]
}

Using MongoDB allowed us to store the data the way we perceive it and not be penalized by expensive joins.
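
For example, marking one embedded target as sent is a single positional update. This sketch uses the modern mongo Ruby driver and a hypothetical collection name:

require 'mongo'

client = Mongo::Client.new('mongodb://localhost:27017/ftbpro')

# Mark the team_id 17 target of this PN document as sent, using MongoDB's
# positional '$' operator on the embedded 'targets' array.
client[:push_notifications].update_one(
  { 'app' => 'aston_villa', 'post_id' => 617803, 'targets.team_id' => 17 },
  { '$set' => { 'targets.$.status' => 'sent' } }
)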

Conclusion

For the few months the new system has been live, it has been working smoothly.

Using Urban Airship saved us both time and effort. It allowed us to focus on our specific needs instead of implementing GCM & APNs protocols for sending PN.

We have also introduced MongoDB to our ecosystem, and it will be utilized in future projects.

So overall, you could say it was a good project :)

Gashaw Mola, Web Developer @ FTBpro.com

Count von Count - A real-time counting database!

FTBpro is all about user generated content. Our articles are written by Football fans around the world. Their incentive for writing over and over again is the exposure they know they will receive. They are motivated by the number of reads, comments, likes, tweets or shares their articles will receive. For this reason, these, and many other counters, are very prominent across our site and mobile apps.

image

Along with that, we started working on a new gamification project. The requirement here is that for each action a user makes on our site or mobile app (e.g., reading an article, writing an article that gets featured, sharing on a social network), he gets a score.

This compels us to count many different actions for each user on the site and calculate the score - live.

image

If you have read one of our previous posts, you probably know that we have been dealing with the counting issue for quite a long time.

In the early days of our startup, we used to store the numbers in a MySQL database. This meant the number of reads of an article was stored in the articles table. As our scale grew and the load on the database increased, we moved to another solution: extracting the counting to a dedicated Nginx server.

In this solution, every time an article was read we issued a request to our counting server with the relevant parameters. The Nginx server logged all the requests to its access.log file, and we had a script running every minute that aggregated the numbers from the recent requests. After the aggregation, the script updated our main app server with the numbers, and they were saved to the same MySQL database.

This was no longer good enough.

Let’s examine it from the end. This counting system stored the numbers in a MySQL database. A relational database may be good enough for the simple case of counting reads of an article, but how can we store a leaderboard? Or all the countries the readers of each article come from? Of course there are some tricks that can help you do it, but it’s much easier to store this kind of data in a NoSQL manner.

We would also like the information to be available live. We don’t want to count (boy, this word appears a lot in this article) on background processes to transform our data into the relevant format.

To sum up, we need a live counting system based on some kind of NoSQL database. It also has to be scalable (more than hundreds of requests per second) and reliable.

That’s why we developed Count von Count

It is based on OpenResty, an Nginx-based bundle that ships with some useful 3rd-party modules. OpenResty turns an Nginx web server into a powerful web app server using scripts written in the Lua programming language. It still has the advantage of non-blocking I/O, but it also has the ability to talk to remote backends such as MySQL, Memcached and Redis. We are using Redis as our database for this project, leveraging the following features:

  • The EVAL command evaluates a Lua script in the context of the Redis server. Lua? Again with Lua? Yep, this magical language is supported both by Nginx and by Redis. (It is also the language for writing addons for World of Warcraft.) It allows us to write all the counter logic in a Lua script, which is preloaded into Redis and evaluated from OpenResty’s Redis module - see the sketch after this list.
  • Sorted Set datatype is great for leaderboard data modeling. We extensively use it for storing any kind of leaders data, such as top writers and most-read articles. We have different keys for daily, weekly and monthly leaderboards, and each read action makes an update in all of them.
  • Bitmap datatype helps us count real time metrics in a space efficient way. We use it to count the number of daily active users on our mobile applications. Here you can read more about using it.
  • TTLs help us clean the database of irrelevant objects.
  • Pipelined requests speed up the whole thing.
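
A rough sketch of the idea, using the redis gem from Ruby: in production the Lua script is preloaded (SCRIPT LOAD) and invoked from OpenResty’s Redis module, and the key names below are made up:

require 'redis'

redis = Redis.new

# Count one 'post_share' action: bump the per-user and per-post hashes and a
# daily leaderboard sorted set, all inside a single server-side Lua script.
COUNT_SCRIPT = <<-LUA
  redis.call('HINCRBY', KEYS[1], 'number_of_shares', 1)  -- user_<id>
  redis.call('HINCRBY', KEYS[2], 'number_of_shares', 1)  -- post_<id>
  redis.call('ZINCRBY', KEYS[3], 1, ARGV[1])             -- daily shares leaderboard
  return redis.status_reply('OK')
LUA

redis.eval(
  COUNT_SCRIPT,
  keys: ['user_700', 'post_900', 'leaderboard:shares:2014-05-20'],
  argv: ['post_900']
)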

Putting it all together

image

  1. On a separate server from the app server, we have an OpenResty service up and running, waiting for counting requests. We make requests to this server both from our client side and from the app server, each time we want to +1 or -1 a counter. Based on Nginx’s empty_gif module, we return an empty pixel for each request. Each request holds the action we want to count and extra relevant params. For example, when a user shares a post, the following request is made: http://<counting_server>/post_share?user=700&post=900&author=15&team=arsenal. Since the server returns a gif, the request can be invoked using an <img src=…> HTML element.
  2. When Nginx receives the request, it triggers a very minimal Lua script using the Lua module. The script just parses the request arguments and evaluates a Lua script that was preloaded into Redis. The request is also logged to Nginx’s access.log. All the Redis updates are made inside the Redis script to save connection overhead.
  3. The Lua Redis script is a bit more complex and is responsible for updating all the relevant keys for the given action. For instance, if we take the previously mentioned post_share action, we need to update the number_of_shares field in the following hashes: user_700, post_900 and team_arsenal.
  4. For cases of unexpected failures or downtime, we developed a log player that “plays” the access.log files and updates the relevant Redis data models.

Using the data

Count von Count offers an API for retrieving the data. Since we need to show live counters across our site, we wrote a Javascript module that collects all the counters on a page, queries our counting server’s API for the numbers and updates all the counters on the page. In this way, we always show live numbers, as you can see on our user page and post page.

Open Source Project

We’ve put a lot of effort into making this project open source. What is counted, and how, is configured in a JSON file, making it extremely easy to embed this project wherever you need it. Not a single line of code is needed! Check it out at https://github.com/FTBpro/count-von-count

We have been using count-von-count in production for several months and we are really satisfied with it. It receives millions of requests per day and thousands of requests per minute at peak times. We use it wherever counting is needed, e.g. the Player of the Month widget, the top writers leaderboard and writers’ profile pages.

I wish to give big kudos to the maintainer of the OpenResty project, Yichun Zhang. Yichun is also the administrator of the OpenResty Google Group, where you can get lots of information about this powerful project.

To learn more about this project, you can watch this short video from DevconTLV conference.

Happy counting,

Posted by

Ron Schwartz, Software Developer @ FTBpro.com
