23 Nov 2012

Streaming Twitter Streaming API Data into Treasure Data with Node.js


This is a quick post about how impressed I was that piping the Twitter Streaming API into Treasure Data turned out to be really easy.

Update

Treasure Data kindly featured this post on their blog!

How To Analyze Twitter Data From Node.js Applications in 15 Minutes | Treasure Data Blog

Steps

This assumes a CentOS environment with node already installed.

  • Register an app with Twitter and issue the keys needed to call the API
  • Sign up for a Treasure Data account as well
  • Set up the td command
$ curl -L https://get.rvm.io | bash -s stable --ruby
$ source ~/.rvm/scripts/rvm
$ rvm install 1.9.2 # takes time
$ rvm use 1.9.2 --default
$ gem install td
$ td
usage: td [options] COMMAND [args]
  • Authorize your account
$ td account -f
A prompt appears; enter your email address and password.
  • Install td-agent
    • Add the Treasure Data yum repository. Save the following as `/etc/yum.repos.d/td.repo`
[treasuredata]
name=TreasureData
baseurl=http://packages.treasure-data.com/redhat/$basearch
gpgcheck=0
    • Install
$ sudo yum update
$ sudo yum install -y td-agent
  • Set the API key in td-agent.conf
    • Check the API key
$ td apikey:show
    • Edit /etc/td-agent/td-agent.conf

      Set apikey to the key shown above

      Add the following:
      type forward
      port 24224
  • sudo /etc/init.d/td-agent start

That completes the td side of the setup. Next comes preparing to write the script. I use ntwitter as the Twitter API client and fluent-logger-node to talk to fluentd.

Prepare a package.json,

{
  "name": "sample-app",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "ntwitter": "~0.5.0",
    "fluent-logger": "~0.1.0"
  }
}

and run npm install.

The code looks like this. The substance is only about three lines, so it is very short. (Replace the Twitter keys with the ones you obtained yourself.)

/*jslint indent: 4*/
/*jslint node: true */
'use strict';

var Twitter = require('ntwitter'),
    logger = require('fluent-logger');
logger.configure('td.test_db', {host: 'localhost', port: 24224});

var twit = new Twitter({
    consumer_key: 'XXX',
    consumer_secret: 'XXX',
    access_token_key: 'XXX',
    access_token_secret: 'XXX'
});

twit.stream('statuses/filter', {'track': 'javascript'}, function (stream) {
    stream.on('data', function (data) {
        logger.emit('javascript', data);
    });
});
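A note on where the records end up: as far as I understand it, fluent-logger joins the prefix given to configure() and the label given to emit() with a dot, and the Treasure Data output reads the resulting tag as td.<database>.<table>. The sketch below only illustrates that naming rule. buildTag and parseTdTag are hypothetical helpers I made up for this post, not functions from either library.

```javascript
'use strict';

// Hypothetical helper (not part of fluent-logger): rebuilds the tag that
// results from the configured prefix plus the label passed to emit().
function buildTag(prefix, label) {
    return label ? prefix + '.' + label : prefix;
}

// Hypothetical helper: splits a td.<database>.<table> tag, assuming that
// is how the Treasure Data output interprets it.
function parseTdTag(tag) {
    var parts = tag.split('.');
    if (parts.length !== 3 || parts[0] !== 'td') {
        throw new Error('expected td.<database>.<table>, got: ' + tag);
    }
    return {database: parts[1], table: parts[2]};
}

// Same prefix and label as in the script above.
var tag = buildTag('td.test_db', 'javascript');
console.log(tag, parseTdTag(tag));
```

Under that assumption, the prefix 'td.test_db' and the label 'javascript' mean the tweets go to the javascript table in the test_db database.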

That script alone is enough to push tweets matching the keyword 'javascript' into Treasure Data. Issuing a Hive query like the following retrieves the results.

% td query -w -d test_db 'select get_json_object(v["user"], "$.screen_name"), v["text"], v["retweet_count"] as retweet_count from javascript order by retweet_count desc limit 20'
Job 1143630 is queued.
Use 'td job:show 1143630' to show the status.
queued...
  started at 2012-11-22T13:26:33Z
  Hive history file=/tmp/1179/hive_job_log__1270572010.txt
  Total MapReduce jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks determined at compile time: 1
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
  In order to set a constant number of reducers:
    set mapred.reduce.tasks=
  Starting Job = job_201209262127_102700, Tracking URL = http://ip-10-8-189-47.ec2.internal:50030/jobdetails.jsp?jobid=job_201209262127_102700
  Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=10.8.189.47:8021 -kill job_201209262127_102700
  2012-11-22 13:26:45,811 Stage-1 map = 0%,  reduce = 0%
  2012-11-22 13:26:52,919 Stage-1 map = 100%,  reduce = 0%
  finished at 2012-11-22T13:27:05Z
  2012-11-22 13:27:03,029 Stage-1 map = 100%,  reduce = 100%
  Ended Job = job_201209262127_102700
  OK
  MapReduce time taken: 24.44 seconds
  Time taken: 24.563 seconds
Status     : success
Result     :
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| _c0           | _c1                                                                                                                                          | retweet_count |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| brianjewing   | RT @vmg: "The men who stare at callbacks": 4 engineers,  free to choose any language to solve problems with, end up picking Javascript.  ... | 9             |
| noda3_1st     | RT @ww24: JS ガールと流れてきたのを見て、 JavaScript ガールかと思った… 女子小学生とかどうでもいいから! JavaScript ガールはよ!!           | 9             |
| erudot        | RT @futomi: 拙著『HTMLとJavaScriptではじめるWindowsストアアプリ開発入門』 が発売されました。ご興味がある方はぜひ。 http://t.co/RnVh66ja      | 9             |
| ModusJesus    | RT @arnog: Who says JavaScript is not for large scale, sophisticated apps: http://t.co/U8QXp66Q                                              | 9             |
| cesarob       | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 9             |
| Public05      | RT @losneurona: HTML, CSS, JavaScript, PHP, MySQL? Buscamos alumnos en práctica para desarrollo web; Proyección. Envía DM Envia DM @Busc ... | 9             |
| soulbit       | RT @ariyahidayat: new blog post, on the detection of "Polluting and Unused JavaScript Variables" http://t.co/EqCGcA72                        | 9             |
| anandrajaram  | RT @arnog: Who says JavaScript is not for large scale, sophisticated apps: http://t.co/U8QXp66Q                                              | 9             |
| ajarn_donald  | RT @AvocetCreative: Don’t rely just on cool JavaScript to navigate or view content, if a user can not see it, they can not use your webs ... | 8             |
| Bonifacio2    | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 8             |
| yourwebmaker  | RT @addyosmani: A Few New Things Coming To JavaScript ♡ http://t.co/jgh5WCdW #esnext                                                         | 8             |
| iPrashanta    | RT @addyosmani: A Few New Things Coming To JavaScript ♡ http://t.co/jgh5WCdW #esnext                                                         | 8             |
| azat_co       | RT @sfhtml5: On 5 Dec, @ariyahidayat will walk through JavaScript code analysis at the wonderful @StackMob HQ. Sign up now! http://t.co/ ... | 8             |
| MCKLMT        | RT @deltakosh: Adding a parallax background to your #windows8 #javascript app!\nhttp://t.co/7lPWUP6b                                         | 8             |
| BrendanEich   | RT @ariyahidayat: new blog post, on the detection of "Polluting and Unused JavaScript Variables" http://t.co/EqCGcA72                        | 8             |
| yurimalheiros | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 8             |
| hinatami      | RT @futomi: 拙著『HTMLとJavaScriptではじめるWindowsストアアプリ開発入門』 が発売されました。ご興味がある方はぜひ。 http://t.co/RnVh66ja      | 8             |
| Bardty        | RT @vmg: "The men who stare at callbacks": 4 engineers,  free to choose any language to solve problems with, end up picking Javascript.  ... | 8             |
| SoyVengador   | RT @losneurona: HTML, CSS, JavaScript, PHP, MySQL? Buscamos alumnos en práctica para desarrollo web; Proyección. Envía DM Envia DM @Busc ... | 8             |
| isenthil      | RT @MicrosoftPress: Final version: http://t.co/NkWzHD8Z RT @spoofyroot: Free Ebook on coding Win8 Apps from Microsoft Press http://t.co/ ... | 8             |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+---------------+
% td tables
+----------+------------+------+-------+--------+
| Database | Table      | Type | Count | Schema |
+----------+------------+------+-------+--------+
| test_db  | javascript | log  | 4284  |        |
| test_db  | test       | log  | 4     |        |
+----------+------------+------+-------+--------+
2 rows in set

As the result shows, you can pull out the most retweeted tweets from the collection period.
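To make clear what the query computes, here is a rough local equivalent in plain JavaScript: take the screen name, text, and retweet count of each tweet, sort by retweet count in descending order, and keep the first N. topRetweeted is a hypothetical helper and the sample records are made up; they are only shaped like tweets. The sketch compares retweet_count as a number. Whether the Hive query does the same depends on how the values in v are typed, which I have not verified.

```javascript
'use strict';

// Hypothetical helper: a local stand-in for
//   select user.screen_name, text, retweet_count
//   ... order by retweet_count desc limit N
function topRetweeted(tweets, limit) {
    return tweets
        .slice() // copy so the caller's array is not reordered
        .sort(function (a, b) {
            return b.retweet_count - a.retweet_count;
        })
        .slice(0, limit)
        .map(function (t) {
            return {
                screen_name: t.user.screen_name,
                text: t.text,
                retweet_count: t.retweet_count
            };
        });
}

// Made-up sample records, not real tweets.
var sample = [
    {user: {screen_name: 'alice'}, text: 'first', retweet_count: 2},
    {user: {screen_name: 'bob'}, text: 'second', retweet_count: 1},
    {user: {screen_name: 'carol'}, text: 'third', retweet_count: 5}
];

console.log(topRetweeted(sample, 2));
```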

Summary

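I showed how to read the Twitter Streaming API from Node and stream it into Treasure Data. I already knew Treasure Data was very easy to work with, but actually trying it, I was surprised that this much can be done with just a few lines of code.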