25 October 2013

Blizzard's Hadoop platform

Talk given by Brian Griffith and Amanda Gerdes at the OC Hadoop user group meeting in October 2013.

Blizzard uses the same Hadoop platform for Diablo 3, Starcraft 2, WoW, and Hearthstone. This platform went live in March 2013. Before this platform, game developers would log game events in log files, and use custom scripts to ETL these log files into relational databases for analysis. Problem: cumbersome, hard to maintain, low performance.

Solution: game developers, on their own, decide what to track in their game, and send that data to the platform. Instead of a log file, the game developers send protobuf objects. The 20 nodes in the Hadoop cluster receive and deserialize around a billion objects per day. The message's headers determine where to store each protobuf object within Hadoop. Blizzard also uses Hadoop as an operational data store. The cluster runs map-reduce jobs to filter and aggregate the stored protobuf objects. Currently, the 20 nodes store 60TB, but now that Blizzard realizes what they can do with Hadoop, they plan a 100-node cluster storing 1PB. Current bottleneck: CPU for deserialization. They rarely hit the disk for data so no IO waiting bottleneck.

For the messaging, they use a federation of machines running RabbitMQ. 50 producers worldwide (China, Europe, US, etc.) and 8 consumers (most likely in the Blizzard headquarters in California). When an Internet cable got cut with China, some messages were queued for 40 hours.

The analysts were using Greenplum for processing game data in parallel and ETL. Now that Hadoop is in place, they could start using Pig for ETL. But sales and customer data still come from relational databases, and the Greenplum is great at ETL. So there is no reason to force analysts into Hadoop. Solution: ETL jobs pull data from Hadoop and store it into Greenplum for warehousing. Greenplum is still in charge of its own ETL jobs.

Some data (such as a character's level or class in WoW) stay interesting across patches, but others (such as transmogrification usage) are only interesting when the feature launches. So the platform allows to build KPIs daily (e.g. activity aggregated per character level) and dive deep on demand.

Use case: WoW economy. Every gold transaction is sent to Hadoop. Reduce step aggregates by NPC id, item id, or player id. Makes it possible to:

  • Check if the gold sinks follow designers' expectations.
  • Detect networks of gold farmers.
  • Keep an eye on auction house prices.
  • More ...

15 October 2013

Handling references in ACM format in Word

This was tested on Word 2010 on Windows.

Step 1: Download and install the styles (instructions on their website). We still have a problem: the ACM - Name Sequence* formatting does not work (it writes [BO] instead of the ref numbers).

Step 2: Download the extender. Save your doc with the ref style ACM - Name Sequence*. Close Word. Run the tool. Don't forget to re-open the doc and select the style again.

08 October 2013

Importing Chinese characters from CSV file to SQL server

Problem:

Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data conversion for column "Column 0" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.". (SQL Server Import and Export Wizard)

Solution: