The Pymongo Playbook
nosql pymongo


A guide to the basic setup of a MongoDB Atlas cluster, and to using PyMongo to connect to it from Python and perform CRUD operations.

NoSQL

NoSQL, also referred to as “not only SQL” or “non-SQL”, is an approach to database design that enables data to be stored and queried outside the traditional tabular structures of relational databases, typically in a schema-less JSON format.

Instead of the typical tabular structure of a relational database, NoSQL databases house data within a single data structure, such as a JSON document.

NoSQL is also a type of distributed database: information is copied and stored on multiple servers, remote or local, which ensures availability and reliability. If part of the data goes offline, the rest of the database can continue to run.

Document Based DB → e.g. MongoDB

Key-Value Based DB → e.g. Amazon DynamoDB

Graph Based DB → e.g. Neo4j

Column Based DB → e.g. Cassandra, HBase
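To make the document model concrete, a single record in a document database is just a schema-less JSON document. A small illustrative sketch (the field names here are made up, not from any particular dataset):

```python
import json

# A schema-less document: fields can vary from record to record,
# unlike the fixed columns of a relational table.
iris_doc = {
    "species": "setosa",
    "sepalLength": 5.1,
    "petalLength": 1.4,
    "tags": ["flower", "iris"],  # nested arrays are allowed too
}

print(json.dumps(iris_doc, indent=2))
```

Because there is no fixed schema, another document in the same collection could add or omit fields freely.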

MongoDB

MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database, MongoDB uses JSON-like documents with optional schemas and supports CRUD operations.

C — Create DB | Collection | Document

R — Read DB | Collection | Document

U — Update DB | Collection | Document

D — Delete DB | Collection | Document

MongoDB Atlas

  • Create a cluster
  • Choose the Python driver and copy the connection string (server URI)

MongoDB with Python

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.

Requirements.txt:

pymongo
pymongo[srv]    # the [srv] extra installs dnspython, needed for mongodb+srv:// URIs

Create a new client and connect to the server

from pymongo import MongoClient

uri = "mongodb+srv://<username>:<password>@cluster0.k3s4vuf.mongodb.net/?retryWrites=true&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri)
print(client)

## Output ##
MongoClient(host=['ac-edosefl-shard-00-02.k3s4vuf.mongodb.net:27017', 'ac-edosefl-shard-00-01.k3s4vuf.mongodb.net:27017', 'ac-edosefl-shard-00-00.k3s4vuf.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-t2x2vx-shard-0', tls=True)        

Send a ping to confirm a successful connection

status = client.admin.command('ping')
print("Response: {}".format(status))

## Output ##
Response: {'ok': 1}

Create a new Database

db = client.kaggle_iris
print("Database name is {0}".format(db.name))

## Output ##
Database name is kaggle_iris

Print list of Database names

print("List of databases in MongoDB Cluster: {0}".format(client.list_database_names()))

## Output ##
List of databases in MongoDB Cluster: ['forum', 'kaggle_iris', 'mongotest1', 'mongotest2', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_guides', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']        

Create a new Collection & Print list of Collections

coll = db.iris
print("Collection name is {0}".format(coll.name))
print("List of collections in Database {0}: {1}".format(db.name, db.list_collection_names()))

## Output ##
Collection name is iris
List of collections in Database kaggle_iris: ['iris']        

Datasets | Documents in JSON format

doc = {"sepalLength": 5.5, "sepalWidth": 2.4, "petalLength": 3.7, "petalWidth": 1.0, "species": "versicolor"}
    
doc_list = [
{"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.7, "sepalWidth": 3.2, "petalLength": 1.3, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.6, "sepalWidth": 3.1, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 5.0, "sepalWidth": 3.6, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 5.4, "sepalWidth": 3.9, "petalLength": 1.7, "petalWidth": 0.4, "species": "setosa"},
{"sepalLength": 5.8, "sepalWidth": 2.6, "petalLength": 4.0, "petalWidth": 1.2, "species": "versicolor"},
{"sepalLength": 5.0, "sepalWidth": 3.4, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.4, "sepalWidth": 2.9, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
    {"sepalLength": 5.9, "sepalWidth": 3.0, "petalLength": 5.1, "petalWidth": 1.8, "species": "virginica"}]        

CRUD Operations

i.e. Create, Read, Update, and Delete

Insert one document

coll.insert_one(doc)

## Output ##
<pymongo.results.InsertOneResult at 0x7f8240a9e520>        

Insert more than one document

coll.insert_many(doc_list)

## Output ##
<pymongo.results.InsertManyResult at 0x7f8230bc7fa0>        

Find all documents in the collection

for x in coll.find():
    print(x)

## Output ##
{'_id': ObjectId('648f33cf03184eea7a245faa'), 'sepalLength': 5.5, 'sepalWidth': 2.4, 'petalLength': 3.7, 'petalWidth': 1.0, 'species': 'versicolor'}
{'_id': ObjectId('648f33cf03184eea7a245fab'), 'sepalLength': 5.1, 'sepalWidth': 3.5, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fac'), 'sepalLength': 4.9, 'sepalWidth': 3.0, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fad'), 'sepalLength': 4.7, 'sepalWidth': 3.2, 'petalLength': 1.3, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fae'), 'sepalLength': 4.6, 'sepalWidth': 3.1, 'petalLength': 1.5, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245faf'), 'sepalLength': 5.0, 'sepalWidth': 3.6, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb0'), 'sepalLength': 5.4, 'sepalWidth': 3.9, 'petalLength': 1.7, 'petalWidth': 0.4, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb1'), 'sepalLength': 5.8, 'sepalWidth': 2.6, 'petalLength': 4.0, 'petalWidth': 1.2, 'species': 'versicolor'}
{'_id': ObjectId('648f33cf03184eea7a245fb2'), 'sepalLength': 5.0, 'sepalWidth': 3.4, 'petalLength': 1.5, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb3'), 'sepalLength': 4.4, 'sepalWidth': 2.9, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 1.8, 'species': 'virginica'}        

Find only the first document in the collection

print(coll.find_one())

## Output ##
{'_id': ObjectId('648f33cf03184eea7a245faa'), 'sepalLength': 5.5, 'sepalWidth': 2.4, 'petalLength': 3.7, 'petalWidth': 1.0, 'species': 'versicolor'}        

Update document

query = {"sepalLength": 5.9}
newvalues = {"$set": {"petalWidth": 11.9}}
print("Document before update: {0}".format(coll.find_one(query)))
coll.update_one(query, newvalues)
print("Document after update: {0}".format(coll.find_one(query)))


## Output ##
Document before update: {'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 1.8, 'species': 'virginica'}
Document after update: {'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 11.9, 'species': 'virginica'}

Delete document

x = coll.delete_one({"sepalLength": 5.4})
print("{0} document(s) deleted using delete_one()".format(x.deleted_count))

## Output ##
1 document(s) deleted using delete_one()

Delete all documents

x = coll.delete_many({})  # An empty filter matches every document in the collection
print("{0} document(s) deleted using delete_many()".format(x.deleted_count))

## Output ##
10 document(s) deleted using delete_many()

Aggregate Pipeline

An aggregation pipeline consists of one or more stages that process documents:

  • Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values.
  • The documents that are output from a stage are passed to the next stage.
  • An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.

e.g. $group $sum $min $max $filter $zip

from datetime import datetime

# Create new collection
mycol2 = db['Orders']

# Dates are stored as datetime objects so MongoDB saves them as BSON
# dates; a quoted "ISODate(...)" string would be stored as plain text.
multipleRecords = [
    { "_id": 0, "name": "Pepperoni", "size": "small", "price": 19,
      "quantity": 10, "date": datetime(2021, 3, 13, 8, 14, 30) },
    { "_id": 1, "name": "Pepperoni", "size": "medium", "price": 20,
      "quantity": 20, "date": datetime(2021, 3, 13, 9, 13, 24) },
    { "_id": 2, "name": "Pepperoni", "size": "large", "price": 21,
      "quantity": 30, "date": datetime(2021, 3, 17, 9, 22, 12) },
    { "_id": 3, "name": "Cheese", "size": "small", "price": 12,
      "quantity": 15, "date": datetime(2021, 3, 13, 11, 21, 39, 736000) },
    { "_id": 4, "name": "Cheese", "size": "medium", "price": 13,
      "quantity": 50, "date": datetime(2022, 1, 12, 21, 23, 13, 331000) },
    { "_id": 5, "name": "Cheese", "size": "large", "price": 14,
      "quantity": 10, "date": datetime(2022, 1, 12, 5, 8, 13) },
    { "_id": 6, "name": "Vegan", "size": "small", "price": 17,
      "quantity": 10, "date": datetime(2021, 1, 13, 5, 8, 13) },
    { "_id": 7, "name": "Vegan", "size": "medium", "price": 18,
      "quantity": 10, "date": datetime(2021, 1, 13, 5, 10, 13) }
]
status = mycol2.insert_many(multipleRecords)

print(status.inserted_ids)        

Output:

[0, 1, 2, 3, 4, 5, 6, 7]        
# Calculate Total Order Quantity

aggregate = mycol2.aggregate( [

    # Stage 1: Filter pizza order documents by pizza size and pass the output to the next stage
    {
        "$match": { "size": "medium" }
    },

    # Stage 2: Group remaining documents by pizza name and calculate total quantity
    {
        "$group": { "_id": "$name", "totalQuantity": { "$sum": "$quantity" } }
    }

] )

for doc in aggregate:
    print(doc)        

Output:

{'_id': 'Pepperoni', 'totalQuantity': 20} 
{'_id': 'Vegan', 'totalQuantity': 10} 
{'_id': 'Cheese', 'totalQuantity': 50}        
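For intuition, the two stages above can be mimicked with a plain-Python filter and group-by over similar order data. This is only a sketch of the semantics; the real pipeline executes server-side inside MongoDB:

```python
from collections import defaultdict

orders = [
    {"name": "Pepperoni", "size": "medium", "quantity": 20},
    {"name": "Cheese", "size": "medium", "quantity": 50},
    {"name": "Vegan", "size": "medium", "quantity": 10},
    {"name": "Pepperoni", "size": "small", "quantity": 10},
]

# Stage 1: $match -- keep only medium pizzas
medium = [o for o in orders if o["size"] == "medium"]

# Stage 2: $group with $sum -- total quantity per pizza name
totals = defaultdict(int)
for o in medium:
    totals[o["name"]] += o["quantity"]

print(dict(totals))  # {'Pepperoni': 20, 'Cheese': 50, 'Vegan': 10}
```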

Use Case

  • Import data from an external source kept in CSV format

# Importing CSV

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/abhijitpaul0212/DataSets/main/AirPassengers.csv")        

  • Convert the data into a list of JSON-format records

import json
mycsvfile = list(json.loads(df.T.to_json()).values())        
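The transpose-and-parse trick above works; an equivalent and arguably more direct route is pandas' own `to_dict('records')`. A small sketch on an illustrative two-row frame (the column names here are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"date": ["1949-01-01", "1949-02-01"], "value": [112, 118]})

# One dict per row, ready to pass to insert_many()
records = df.to_dict("records")
print(records)
# [{'date': '1949-01-01', 'value': 112}, {'date': '1949-02-01', 'value': 118}]
```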

  • Push the converted data into MongoDB in one go

# Create new collection
mycol1 = db['AirPassengers']

# Insert all records
mycol1.insert_many(mycsvfile)        

  • Verify records are inserted

for doc in mycol1.find({}).limit(5):
    print(doc)

## Output ##
{'_id': ObjectId('64eaf250f4c2730326ff5b9c'), 'date': '1949-01-01', 'value': 112}
{'_id': ObjectId('64eaf250f4c2730326ff5b9d'), 'date': '1949-02-01', 'value': 118}
{'_id': ObjectId('64eaf250f4c2730326ff5b9e'), 'date': '1949-03-01', 'value': 132}
{'_id': ObjectId('64eaf250f4c2730326ff5b9f'), 'date': '1949-04-01', 'value': 129}
{'_id': ObjectId('64eaf250f4c2730326ff5ba0'), 'date': '1949-05-01', 'value': 121}
        

#nosql #mongodb #pymongo #knowledgesharing #opensource #communitybuilding #continuouslearning

If you liked reading this, you can follow this newsletter; I share a post here every Friday.

Thank you so much for reading!

If you want to read my learnings on any specific topic, let me know in the comment box. I will tag you too if I write on that.

xx

Abhijit Paul

