Streaming Twitter Streaming API data into Treasure Data with Node.js
Treasure Data - Live Customer Data Platform
This post is about how impressed I was by how easy it is to stream data from Twitter's Streaming API into Treasure Data.
Update
This article was featured on the Treasure Data blog!
How To Analyze Twitter Data From Node.js Applications in 15 Minutes | Treasure Data Blog
Steps
This assumes a CentOS environment with node already installed.
- Register an app with Twitter and issue the keys needed to call the API
- Sign up for a Treasure Data account as well
- Set up the td command
    $ curl -L https://get.rvm.io | bash -s stable --ruby
    $ source ~/.rvm/scripts/rvm
    $ rvm install 1.9.2 # takes time
    $ rvm use 1.9.2 --default
    $ gem install td
    $ td
    usage: td [options] COMMAND [args]
- Authorize the account
    $ td account -f
  A prompt appears; enter your email address and password.
- Install td-agent
- Add the Treasure Data yum repository. Save the following as `/etc/yum.repos.d/td.repo`
    [treasuredata]
    name=TreasureData
    baseurl=http://packages.treasure-data.com/redhat/$basearch
    gpgcheck=0
- Install the package
    $ sudo yum update
    $ sudo yum install -y td-agent
- Set the API key in td-agent.conf
- Check the API key
$ td apikey:show
- Edit /etc/td-agent/td-agent.conf
  Set apikey to the key shown by the command above
- Start td-agent: sudo /etc/init.d/td-agent start
That completes the setup on the td side. What remains is getting ready to write the script. We use ntwitter as the Twitter API client and fluent-logger-node to talk to fluentd.
Prepare a package.json like this,
    {
      "name": "sample-app",
      "version": "0.0.1",
      "private": true,
      "dependencies": {
        "ntwitter": "~0.5.0",
        "fluent-logger": "~0.1.0"
      }
    }
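and run npm install.
The code looks like this. The part that does the work is only about three lines, so it is very short. (Replace the Twitter keys with the ones you obtained yourself.)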
    /*jslint indent: 4*/
    /*jslint node: true */
    'use strict';

    var Twitter = require('ntwitter'),
        logger = require('fluent-logger');

    // Send records to the local td-agent; 'td.test_db' is the tag prefix.
    logger.configure('td.test_db', {host: 'localhost', port: 24224});

    // Replace each 'XXX' with your own Twitter credentials.
    var twit = new Twitter({
        consumer_key: 'XXX',
        consumer_secret: 'XXX',
        access_token_key: 'XXX',
        access_token_secret: 'XXX'
    });

    // Follow tweets that match the keyword 'javascript'.
    twit.stream('statuses/filter', {'track': 'javascript'}, function (stream) {
        stream.on('data', function (data) {
            // Emit each tweet under the label 'javascript'.
            logger.emit('javascript', data);
        });
    });
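To see where these records end up, it helps to spell out how the tag is assembled. fluent-logger joins the prefix passed to `configure` and the label passed to `emit` with a dot, and the Treasure Data output in td-agent reads a tag of the form `td.<database>.<table>`. The following is only a sketch of that mapping; `resolveDestination` is a hypothetical helper written for this illustration and is not part of either library.

```javascript
'use strict';

// Hypothetical helper (not part of fluent-logger or td-agent):
// shows how a tag prefix and an emit label combine into a destination.
function resolveDestination(prefix, label) {
    var tag = prefix + '.' + label;          // e.g. 'td.test_db.javascript'
    var parts = tag.split('.');
    if (parts.length !== 3 || parts[0] !== 'td') {
        throw new Error('expected a tag of the form td.<database>.<table>: ' + tag);
    }
    return { tag: tag, database: parts[1], table: parts[2] };
}

var dest = resolveDestination('td.test_db', 'javascript');
console.log(dest.tag + ' -> database ' + dest.database + ', table ' + dest.table);
```

With the values used in the script, the records go to the `javascript` table in the `test_db` database.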
That is all it takes to push tweets matching the keyword 'javascript' into Treasure Data. You can then fetch results by issuing a Hive query like this one.
    % td query -w -d test_db 'select get_json_object(v["user"], "$.screen_name"), v["text"], v["retweet_count"] as retweet_count from javascript order by retweet_count desc limit 20'
    Job 1143630 is queued.
    Use 'td job:show 1143630' to show the status.
    queued...
    started at 2012-11-22T13:26:33Z
    Hive history file=/tmp/1179/hive_job_log__1270572010.txt
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=<number>
    Starting Job = job_201209262127_102700, Tracking URL = http://ip-10-8-189-47.ec2.internal:50030/jobdetails.jsp?jobid=job_201209262127_102700
    Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=10.8.189.47:8021 -kill job_201209262127_102700
    2012-11-22 13:26:45,811 Stage-1 map = 0%, reduce = 0%
    2012-11-22 13:26:52,919 Stage-1 map = 100%, reduce = 0%
    finished at 2012-11-22T13:27:05Z
    2012-11-22 13:27:03,029 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201209262127_102700
    OK
    MapReduce time taken: 24.44 seconds
    Time taken: 24.563 seconds
    Status : success
    Result :
    +---------------+--------------------------------------------------+---------------+
    | _c0 | _c1 | retweet_count |
    +---------------+--------------------------------------------------+---------------+
    | brianjewing | RT @vmg: "The men who stare at callbacks": 4 engineers, free to choose any language to solve problems with, end up picking Javascript. ... | 9 |
    | noda3_1st | RT @ww24: JS ガールと流れてきたのを見て、 JavaScript ガールかと思った… 女子小学生とかどうでもいいから! JavaScript ガールはよ!! | 9 |
    | erudot | RT @futomi: 拙著『HTMLとJavaScriptではじめるWindowsストアアプリ開発入門』 が発売されました。ご興味がある方はぜひ。 http://t.co/RnVh66ja | 9 |
    | ModusJesus | RT @arnog: Who says JavaScript is not for large scale, sophisticated apps: http://t.co/U8QXp66Q | 9 |
    | cesarob | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 9 |
    | Public05 | RT @losneurona: HTML, CSS, JavaScript, PHP, MySQL? Buscamos alumnos en práctica para desarrollo web; Proyección. Envía DM Envia DM @Busc ... | 9 |
    | soulbit | RT @ariyahidayat: new blog post, on the detection of "Polluting and Unused JavaScript Variables" http://t.co/EqCGcA72 | 9 |
    | anandrajaram | RT @arnog: Who says JavaScript is not for large scale, sophisticated apps: http://t.co/U8QXp66Q | 9 |
    | ajarn_donald | RT @AvocetCreative: Don’t rely just on cool JavaScript to navigate or view content, if a user can not see it, they can not use your webs ... | 8 |
    | Bonifacio2 | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 8 |
    | yourwebmaker | RT @addyosmani: A Few New Things Coming To JavaScript ♡ http://t.co/jgh5WCdW #esnext | 8 |
    | iPrashanta | RT @addyosmani: A Few New Things Coming To JavaScript ♡ http://t.co/jgh5WCdW #esnext | 8 |
    | azat_co | RT @sfhtml5: On 5 Dec, @ariyahidayat will walk through JavaScript code analysis at the wonderful @StackMob HQ. Sign up now! http://t.co/ ... | 8 |
    | MCKLMT | RT @deltakosh: Adding a parallax background to your #windows8 #javascript app!\nhttp://t.co/7lPWUP6b | 8 |
    | BrendanEich | RT @ariyahidayat: new blog post, on the detection of "Polluting and Unused JavaScript Variables" http://t.co/EqCGcA72 | 8 |
    | yurimalheiros | RT @janneharkonen: I don't get the Stockholm syndrome around javascript. Yes, you can build amazing things with it. No, that doesn't mak ... | 8 |
    | hinatami | RT @futomi: 拙著『HTMLとJavaScriptではじめるWindowsストアアプリ開発入門』 が発売されました。ご興味がある方はぜひ。 http://t.co/RnVh66ja | 8 |
    | Bardty | RT @vmg: "The men who stare at callbacks": 4 engineers, free to choose any language to solve problems with, end up picking Javascript. ... | 8 |
    | SoyVengador | RT @losneurona: HTML, CSS, JavaScript, PHP, MySQL? Buscamos alumnos en práctica para desarrollo web; Proyección. Envía DM Envia DM @Busc ... | 8 |
    | isenthil | RT @MicrosoftPress: Final version: http://t.co/NkWzHD8Z RT @spoofyroot: Free Ebook on coding Win8 Apps from Microsoft Press http://t.co/ ... | 8 |
    +---------------+--------------------------------------------------+---------------+

    % td tables
    +----------+------------+------+-------+--------+
    | Database | Table      | Type | Count | Schema |
    +----------+------------+------+-------+--------+
    | test_db  | javascript | log  | 4284  |        |
    | test_db  | test       | log  | 4     |        |
    +----------+------------+------+-------+--------+
    2 rows in set
As shown, this pulls out the most retweeted tweets from the collection period.
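For readers less familiar with Hive, the query's select, order by, and limit can be mirrored on an in-memory array in Node. This is only an illustrative sketch: `topRetweeted` is a hypothetical name, and the sample records are made up to match the three fields the query reads.

```javascript
'use strict';

// Illustrative only: mirrors the query's SELECT / ORDER BY ... DESC / LIMIT
// on an in-memory array. `topRetweeted` is a hypothetical helper.
function topRetweeted(tweets, limit) {
    return tweets
        .map(function (t) {
            return {
                screen_name: t.user.screen_name,
                text: t.text,
                // Convert explicitly so the ordering is numeric.
                retweet_count: Number(t.retweet_count) || 0
            };
        })
        .sort(function (a, b) { return b.retweet_count - a.retweet_count; })
        .slice(0, limit);
}

// Made-up sample records shaped like the fields used in the query.
var sample = [
    { user: { screen_name: 'alice' }, text: 'first', retweet_count: 2 },
    { user: { screen_name: 'bob' }, text: 'second', retweet_count: '10' },
    { user: { screen_name: 'carol' }, text: 'third', retweet_count: 9 }
];

console.log(topRetweeted(sample, 2));
```

The sketch converts `retweet_count` with `Number` so that 10 ranks above 9. Whether the Hive query compares the values as numbers or as strings depends on the column type, which this post does not state, so an explicit cast may be worth checking if counts above 9 seem to be missing.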
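Summary
I showed how to read the Twitter Streaming API from Node and stream the data into Treasure Data. I knew Treasure Data was very easy to work with, but after actually trying it I was surprised that this much is possible with just a few lines of code.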