Basic

NoSQL database
compass app: the official GUI app, used here to connect to MongoDB Atlas
mongod: the MongoDB database server application
mongo: the shell application for interacting with the database

Uploading data to MongoDB Atlas

mongoimport --type csv --headerline --db mflix --collection movies_initial --host "cluster0-shard-00-00-1cvum.mongodb.net:27017,cluster0-shard-00-01-1cvum.mongodb.net:27017,cluster0-shard-00-02-1cvum.mongodb.net:27017" --authenticationDatabase admin --ssl --username hansonzhao007 --password XXXXXX --file movies_initial.csv
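
For comparison, the same kind of upload can be sketched in Python. This is only a minimal illustration, not a replacement for `mongoimport`: the helper names `load_csv_documents` and `upload_documents` and the in-memory sample CSV are assumptions made for the example. Every value stays a string here, so any type conversion would be an extra step.

```python
import csv
import io


def load_csv_documents(csv_file):
    """Read a CSV stream whose first line holds the field names
    (the role --headerline plays above) and return a list of dicts."""
    return [dict(row) for row in csv.DictReader(csv_file)]


def upload_documents(collection, documents):
    """Insert the documents through the collection's insert_many method.

    `collection` is expected to behave like a pymongo Collection.
    The guard skips empty input, which insert_many rejects.
    """
    if documents:
        collection.insert_many(documents)
    return len(documents)


# Hypothetical in-memory CSV standing in for movies_initial.csv.
sample_csv = io.StringIO(
    "title,year,language\n"
    "Alien,1979,English\n"
    "Amelie,2001,French\n"
)
documents = load_csv_documents(sample_csv)
print(documents)
```

With a live connection, the call would look like `upload_documents(client.mflix.movies_initial, documents)`, where `client` is a `MongoClient` pointed at the cluster.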

Aggregation Framework

pipeline

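A pipeline reads data from a collection and feeds it through a sequence of stages, each of which performs a different operation. The input and output of every stage are documents.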

stages

Each stage defines its own operation on the documents, for example reshaping or accumulation.

stage-function
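In the three-stage pipeline pictured here, stage1 can filter the input so that not every document has to be processed, stage2 can apply custom operations to the data, and stage3 can filter the results once more.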
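
The filter, transform, filter flow can be imitated in plain Python to show how documents move from stage to stage. This is a conceptual sketch only, not how MongoDB runs a pipeline; the stage functions, the `title_length` field, and the sample documents are all made up for illustration.

```python
def stage1_filter(docs):
    """Drop documents with an empty language so later stages do less work."""
    return (d for d in docs if d.get("language"))


def stage2_reshape(docs):
    """Custom operation: keep two fields and add a computed one."""
    return (
        {
            "title": d["title"],
            "language": d["language"],
            "title_length": len(d["title"]),
        }
        for d in docs
    )


def stage3_filter(docs):
    """Filter again, this time on the field computed in stage2."""
    return (d for d in docs if d["title_length"] > 5)


def run_pipeline(docs, stages):
    """Feed the output documents of each stage into the next one."""
    for stage in stages:
        docs = stage(docs)
    return list(docs)


# Hypothetical documents, not taken from any real collection.
sample_docs = [
    {"title": "Alien", "language": "English", "year": 1979},
    {"title": "Amelie", "language": "French", "year": 2001},
    {"title": "Untitled", "language": "", "year": 1950},
]
result = run_pipeline(sample_docs, [stage1_filter, stage2_reshape, stage3_filter])
print(result)
```

Filtering first keeps the reshaping step from touching documents that would be discarded anyway, which is the point of placing a filter in stage1.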

write a pipeline

The figure below illustrates a MongoDB pipeline written in Python:
python-pipeline

import pymongo
from pymongo import MongoClient
import pprint
from IPython.display import clear_output

client = MongoClient("mongodb+srv://hansonzhao007:yourpassword@cluster0-1cvum.mongodb.net")

pipeline = [
    # group stage
    {
        '$group':{
            '_id':{"language":"$language"},
            'count':{'$sum': 1}
        }
    },
    # sort stage
    {
        '$sort':{'count': -1} # descending order; 1 is ascending order
    }
]

# # a simplified version (here _id is the language value itself, not a sub-document):
# pipeline = [
#   {
#     '$sortByCount':"$language"
#   }
# ]

clear_output()
pprint.pprint(list(client.mflix.movies_initial.aggregate(pipeline)))

[{'_id': {'language': 'English'}, 'count': 25325},
 {'_id': {'language': 'French'}, 'count': 1784},
 {'_id': {'language': 'Italian'}, 'count': 1480},
 {'_id': {'language': 'Japanese'}, 'count': 1290},
 {'_id': {'language': ''}, 'count': 1115},
 {'_id': {'language': 'Spanish'}, 'count': 875},
 {'_id': {'language': 'Russian'}, 'count': 777},
 ...
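
To make explicit what the `$group` and `$sort` stages compute, here is a rough pure-Python equivalent. The sample documents are invented for the example and are not taken from `movies_initial`; the function name `group_and_sort_by_language` is likewise hypothetical.

```python
from collections import Counter


def group_and_sort_by_language(docs):
    """Count documents per language, then sort by count, largest first."""
    # Equivalent of the $group stage: one bucket per distinct language.
    counts = Counter(d["language"] for d in docs)
    grouped = [
        {"_id": {"language": language}, "count": count}
        for language, count in counts.items()
    ]
    # Equivalent of the $sort stage with {'count': -1}.
    return sorted(grouped, key=lambda g: g["count"], reverse=True)


# Hypothetical input documents.
sample_docs = [
    {"title": "A", "language": "English"},
    {"title": "B", "language": "French"},
    {"title": "C", "language": "English"},
    {"title": "D", "language": ""},
    {"title": "E", "language": "English"},
    {"title": "F", "language": "French"},
]
summary = group_and_sort_by_language(sample_docs)
print(summary)
```

The result has the same shape as the aggregation output: one document per language, with the most frequent language first.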

indexes

Speeding up queries

Use an index to speed up lookups in MongoDB.


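A collection is like a book, a document is like the content of a chapter, and an index plays the role of the table of contents. Without a table of contents, finding what we want means reading the book from beginning to end, which takes O(N). With a table of contents sorted alphabetically, we can binary search it for the chapter we want, which takes O(log N), and then jump straight to that chapter.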
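
The difference between scanning the whole book and searching a sorted table of contents can be illustrated by counting comparisons. The sketch below mirrors the book analogy only, not MongoDB's actual index implementation; the chapter keys and function names are made up for the example.

```python
def linear_search(keys, target):
    """Scan from the start; return (position, comparisons made)."""
    steps = 0
    for position, key in enumerate(keys):
        steps += 1
        if key == target:
            return position, steps
    return -1, steps


def binary_search(sorted_keys, target):
    """Halve the search range each step; return (position, comparisons made)."""
    low, high, steps = 0, len(sorted_keys) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps


# 1024 hypothetical chapter keys; zero padding keeps them in sorted order.
chapter_keys = [f"chapter-{i:04d}" for i in range(1024)]
target = chapter_keys[-1]

linear_result = linear_search(chapter_keys, target)
binary_result = binary_search(chapter_keys, target)
print("linear scan comparisons:", linear_result[1])
print("binary search comparisons:", binary_result[1])
```

For the last of 1024 keys the linear scan needs 1024 comparisons, while the binary search needs roughly log2(1024) = 10.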

To support lookups on different fields, a single MongoDB collection can have multiple indexes.

multi-indexes
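
Creating several indexes on one collection might look like the sketch below. It assumes a pymongo-style collection object whose `create_index` method takes a list of `(field, direction)` pairs and returns the index name; the helper name `create_field_indexes` and the `year` field are assumptions made for illustration.

```python
def create_field_indexes(collection, fields):
    """Create one ascending single-field index per field name.

    `collection` is expected to behave like a pymongo Collection.
    Returns the index names reported by create_index.
    """
    ascending = 1  # 1 means ascending order, -1 descending
    return [collection.create_index([(field, ascending)]) for field in fields]


# Usage with a live connection (not executed here):
# index_names = create_field_indexes(client.mflix.movies_initial,
#                                    ["language", "year"])
```

Each index speeds up lookups on its own field, so which fields to index depends on the queries the collection has to serve.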