MongoCollection::aggregate

(PECL mongo >=1.3.0)

MongoCollection::aggregate - Perform an aggregation using the aggregation framework

Description

public MongoCollection::aggregate ( array $pipeline [, array $options ] ) : array

public MongoCollection::aggregate ( array $op [, array $op [, array $... ]] ) : array

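The MongoDB aggregation framework provides a means of calculating aggregated values without having to use MapReduce. MapReduce is powerful, but it is often heavier than necessary for simple aggregation tasks such as totaling or averaging field values.

This method accepts either a variable number of pipeline operators, or a single array of operators constituting the pipeline.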
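
The two calling conventions can be sketched as follows. This is a minimal illustration rather than driver code: the helper name runBothForms() is hypothetical, and the two stages are borrowed from the first example on this page. Both calls are expected to run the same pipeline.

```php
<?php
// Two stages reused from the first example on this page.
$project = array('$project' => array('author' => 1, 'tags' => 1));
$unwind  = array('$unwind' => '$tags');

// Input for the array form: every stage in one array, in execution order.
$pipeline = array($project, $unwind);

/**
 * Hypothetical helper (not part of the driver): runs the same two stages
 * through both calling conventions and returns both responses.
 */
function runBothForms(MongoCollection $c, array $first, array $second)
{
    // Form 1: a single array of pipeline operators.
    $asArray = $c->aggregate(array($first, $second));

    // Form 2: each pipeline operator passed as its own argument.
    $asArguments = $c->aggregate($first, $second);

    return array($asArray, $asArguments);
}
```

Given a connected MongoCollection in $c, runBothForms($c, $project, $unwind) would issue both calls. The array form is the one that also accepts the options argument.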

Parameters

pipeline

An array of pipeline operators.

options

Options for the aggregation command. Valid options include:

"allowDiskUse"

Allows aggregation stages to write to temporary files.
"cursor"

Options controlling the creation of the cursor object. This option causes the command to return a result document suitable for constructing a MongoCommandCursor. If you need this option, consider using MongoCollection::aggregateCursor() instead.
"explain"

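Returns information on the processing of the pipeline.
"maxTimeMS"

Specifies a cumulative time limit, in milliseconds, for processing the operation on the server (idle time is not included). If the operation does not complete on the server within this limit, a MongoExecutionTimeoutException is thrown.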
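
As a rough sketch of how the options above might be combined, the following builds an options array and catches the timeout exception. The helper names buildAggregateOptions() and aggregateWithTimeLimit() are hypothetical and not part of the driver; the sketch also assumes a server version that supports these options.

```php
<?php
/**
 * Hypothetical helper (not part of the driver): builds an options array
 * from the "allowDiskUse" and "maxTimeMS" keys documented above.
 */
function buildAggregateOptions($maxTimeMs, $allowDiskUse = false)
{
    return array(
        'allowDiskUse' => (bool) $allowDiskUse,
        'maxTimeMS'    => (int) $maxTimeMs,
    );
}

/**
 * Hypothetical helper: runs a pipeline with a server-side time limit and
 * returns null when the limit is exceeded.
 */
function aggregateWithTimeLimit(MongoCollection $c, array $pipeline, $maxTimeMs)
{
    try {
        // Options are only accepted together with the array form of the pipeline.
        return $c->aggregate($pipeline, buildAggregateOptions($maxTimeMs, true));
    } catch (MongoExecutionTimeoutException $e) {
        // The server did not finish within $maxTimeMs milliseconds.
        return null;
    }
}
```

Returning null on timeout is only one possible design; rethrowing or logging the exception may suit an application better.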

op: First pipeline operator.
op: Second pipeline operator.
...: Additional pipeline operators.

Return Values

The result of the aggregation as an array. The ok key is set to 1 on success and 0 on failure.

Errors/Exceptions

When an error occurs, an array with the following keys is returned:

errmsg - the reason for the failure.
code - the error code of the failure.
ok - set to 0.
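
Because failures are reported in the returned array, callers may want to check the ok key before using the result. The following is a minimal sketch under that assumption; aggregateResultOrFail() is a hypothetical helper name, and turning the error into a RuntimeException is an illustrative choice, not driver behavior.

```php
<?php
/**
 * Hypothetical helper (not part of the driver): returns the "result"
 * documents of an aggregate() response, or throws when "ok" is 0.
 */
function aggregateResultOrFail(array $response)
{
    if (empty($response['ok'])) {
        $message = isset($response['errmsg']) ? $response['errmsg'] : 'unknown error';
        $code    = isset($response['code']) ? (int) $response['code'] : 0;
        throw new RuntimeException($message, $code);
    }

    // With the "explain" option the response carries "stages" instead of
    // "result" (see the explain example on this page), so fall back to
    // an empty array.
    return isset($response['result']) ? $response['result'] : array();
}
```

It could be used as $docs = aggregateResultOrFail($c->aggregate($pipeline)); where $c and $pipeline are defined as in the examples on this page.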

Changelog

Version	Description
1.5.0	Added the optional `options` argument.

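Examples

Example #1 MongoCollection::aggregate() example

The following example pivots the data to build a set of author names grouped by the tags applied to an article. Call the aggregation framework by issuing the following command: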


<?php
$m = new MongoClient("localhost");
$c = $m->selectDB("examples")->selectCollection("article");
$data = array (
    'title' => 'this is my title',
    'author' => 'bob',
    'posted' => new MongoDate,
    'pageViews' => 5,
    'tags' => array ( 'fun', 'good', 'fun' ),
    'comments' => array (
      array (
        'author' => 'joe',
        'text' => 'this is cool',
      ),
      array (
        'author' => 'sam',
        'text' => 'this is bad',
      ),
    ),
    'other' =>array (
      'foo' => 5,
    ),
);
$d = $c->insert($data, array("w" => 1));

$ops = array(
    array(
        '$project' => array(
            "author" => 1,
            "tags"   => 1,
        )
    ),
    array('$unwind' => '$tags'),
    array(
        '$group' => array(
            "_id" => array("tags" => '$tags'),
            "authors" => array('$addToSet' => '$author'),
        ),
    ),
);
$results = $c->aggregate($ops);
var_dump($results);
?>

The above example will output:

array(2) {
  ["result"]=>
  array(2) {
    [0]=>
    array(2) {
      ["_id"]=>
      array(1) {
        ["tags"]=>
        string(4) "good"
      }
      ["authors"]=>
      array(1) {
        [0]=>
        string(3) "bob"
      }
    }
    [1]=>
    array(2) {
      ["_id"]=>
      array(1) {
        ["tags"]=>
        string(3) "fun"
      }
      ["authors"]=>
      array(1) {
        [0]=>
        string(3) "bob"
      }
    }
  }
  ["ok"]=>
  float(1)
}

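The following examples use the zipcode data set. Use mongoimport to load this data set into your mongod instance.

Example #2 MongoCollection::aggregate() example

To return all states with a population of at least 10 million, use the following aggregation operation: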


<?php
$m = new MongoClient("localhost");
$c = $m->selectDB("test")->selectCollection("zips");

$pipeline = array(
    array(
        '$group' => array(
            '_id' => '$state',
            'totalPop' => array('$sum' => '$pop')
        )
    ),
    array(
        '$match' => array(
            'totalPop' => array('$gte' => 10 * 1000 * 1000)
        )
    ),
);
$out = $c->aggregate($pipeline);
var_dump($out);
?>

The above example will output something similar to:

array(2) {
  ["result"]=>
  array(7) {
    [0]=>
    array(2) {
      ["_id"]=>
      string(2) "TX"
      ["totalPop"]=>
      int(16986510)
    }
    [1]=>
    array(2) {
      ["_id"]=>
      string(2) "PA"
      ["totalPop"]=>
      int(11881643)
    }
    [2]=>
    array(2) {
      ["_id"]=>
      string(2) "NY"
      ["totalPop"]=>
      int(17990455)
    }
    [3]=>
    array(2) {
      ["_id"]=>
      string(2) "IL"
      ["totalPop"]=>
      int(11430602)
    }
    [4]=>
    array(2) {
      ["_id"]=>
      string(2) "CA"
      ["totalPop"]=>
      int(29760021)
    }
    [5]=>
    array(2) {
      ["_id"]=>
      string(2) "OH"
      ["totalPop"]=>
      int(10847115)
    }
    [6]=>
    array(2) {
      ["_id"]=>
      string(2) "FL"
      ["totalPop"]=>
      int(12937926)
    }
  }
  ["ok"]=>
  float(1)
}
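
The documents under the result key can be reshaped with ordinary PHP array functions. As a small sketch, assuming each _id is a plain string as in the output above, the hypothetical helper populationByState() builds a state-to-population map sorted by population:

```php
<?php
/**
 * Hypothetical helper (not part of the driver): maps each "_id" to its
 * "totalPop", assuming "_id" is a plain string as in the output above.
 */
function populationByState(array $response)
{
    $byState = array();
    foreach ($response['result'] as $doc) {
        $byState[$doc['_id']] = $doc['totalPop'];
    }

    // Largest population first; arsort() preserves the state keys.
    arsort($byState);

    return $byState;
}
```

Sorting could also be done on the server by appending a $sort stage to the pipeline.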

Example #3 MongoCollection::aggregate() example

To return the average population of the cities in each state, use the following aggregation operation:


<?php
$m = new MongoClient;
$c = $m->selectDB("test")->selectCollection("zips");

$out = $c->aggregate(
    array(
        '$group' => array(
            '_id' => array('state' => '$state', 'city' => '$city' ),
            'pop' => array('$sum' => '$pop' )
        )
    ),
    array(
        '$group' => array(
            '_id' => '$_id.state',
            'avgCityPop' => array('$avg' => '$pop')
        )
    )
);

var_dump($out);
?>

The above example will output something similar to:

array(2) {
  ["result"]=>
  array(51) {
    [0]=>
    array(2) {
      ["_id"]=>
      string(2) "DC"
      ["avgCityPop"]=>
      float(303450)
    }
    [1]=>
    array(2) {
      ["_id"]=>
      string(2) "DE"
      ["avgCityPop"]=>
      float(14481.913043478)
    }
...
    [49]=>
    array(2) {
      ["_id"]=>
      string(2) "WI"
      ["avgCityPop"]=>
      float(7323.0074850299)
    }
    [50]=>
    array(2) {
      ["_id"]=>
      string(2) "WV"
      ["avgCityPop"]=>
      float(2759.1953846154)
    }
  }
  ["ok"]=>
  float(1)
}

Example #4 MongoCollection::aggregate() with command options

To return information on how the pipeline will be processed, use the explain option:


<?php
$m = new MongoClient;
$c = $m->selectDB("test")->selectCollection("zips");

$pipeline = array(
    array(
        '$group' => array(
            '_id' => '$state',
            'totalPop' => array('$sum' => '$pop'),
        ),
    ),
    array(
        '$match' => array(
            'totalPop' => array('$gte' => 10 * 1000 * 1000)
        )
    ),
    array(
        '$sort' => array("totalPop" => -1),
    ),
);

$options = array("explain" => true);
$out = $c->aggregate($pipeline, $options);
var_dump($out);
?>

The above example will output something similar to:

array(2) {
  ["stages"]=>
  array(4) {
    [0]=>
    array(1) {
      ["$cursor"]=>
      array(3) {
        ["query"]=>
        array(0) {
        }
        ["fields"]=>
        array(3) {
          ["pop"]=>
          int(1)
          ["state"]=>
          int(1)
          ["_id"]=>
          int(0)
        }
        ["plan"]=>
        array(4) {
          ["cursor"]=>
          string(11) "BasicCursor"
          ["isMultiKey"]=>
          bool(false)
          ["scanAndOrder"]=>
          bool(false)
          ["allPlans"]=>
          array(1) {
            [0]=>
            array(3) {
              ["cursor"]=>
              string(11) "BasicCursor"
              ["isMultiKey"]=>
              bool(false)
              ["scanAndOrder"]=>
              bool(false)
            }
          }
        }
      }
    }
    [1]=>
    array(1) {
      ["$group"]=>
      array(2) {
        ["_id"]=>
        string(6) "$state"
        ["totalPop"]=>
        array(1) {
          ["$sum"]=>
          string(4) "$pop"
        }
      }
    }
    [2]=>
    array(1) {
      ["$match"]=>
      array(1) {
        ["totalPop"]=>
        array(1) {
          ["$gte"]=>
          int(10000000)
        }
      }
    }
    [3]=>
    array(1) {
      ["$sort"]=>
      array(1) {
        ["sortKey"]=>
        array(1) {
          ["totalPop"]=>
          int(-1)
        }
      }
    }
  }
  ["ok"]=>
  float(1)
}

See Also

MongoCollection::aggregateCursor() - Execute an aggregation pipeline command and retrieve results through a cursor
The MongoDB aggregation framework