The Pymongo Playbook
Abhijit Paul
SDET-II @ Trading Technologies | Convolutional Neural Networks, Core ML
A guide to the basic setup for connecting to a MongoDB Atlas cluster, and to using PyMongo to perform CRUD operations against MongoDB from Python.
NoSQL
NoSQL, also referred to as "not only SQL" or "non-SQL", is an approach to database design that enables the storage and querying of data outside the traditional structures of relational databases, typically in a schema-less, JSON-like format.
Instead of the rows and columns of a relational database, NoSQL databases house data in a single data structure, such as a JSON document.
NoSQL databases are also typically distributed: data is replicated across multiple servers, remote or local, which improves availability and reliability. If part of the data goes offline, the rest of the database can continue to run.
Document Based DB → e.g. MongoDB
Key-Value Based DB → e.g. Amazon DynamoDB
Graph Based DB → e.g. Neo4j
Column Based DB → e.g. Cassandra, HBase
MongoDB
MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database, MongoDB stores JSON-like documents with optional schemas and supports full CRUD operations.
C — Create DB | Collection | Document
R — Read DB | Collection | Document
U — Update DB | Collection | Document
D — Delete DB | Collection | Document
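These four verbs map directly onto PyMongo collection methods. A quick orientation table (a summary, not an exhaustive list of the API):

```python
# CRUD verbs and the PyMongo Collection methods that implement them.
# Each verb also has batch variants; this is a summary for orientation.
crud_methods = {
    "Create": ["insert_one", "insert_many"],
    "Read": ["find_one", "find"],
    "Update": ["update_one", "update_many", "replace_one"],
    "Delete": ["delete_one", "delete_many"],
}

for verb, methods in crud_methods.items():
    print(verb, "->", ", ".join(methods))
```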
MongoDB Atlas
MongoDB Atlas is MongoDB's fully managed cloud database service; the examples below connect to a free-tier Atlas cluster.
MongoDB with Python
PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.
requirements.txt:
pymongo
pymongo[srv]
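These can also be installed directly with pip (the `[srv]` extra pulls in dnspython, which `mongodb+srv://` connection strings require):

```shell
pip install "pymongo[srv]"
```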
Create a new client and connect to the server
from pymongo import MongoClient
uri = "mongodb+srv://<username>:<password>@cluster0.k3s4vuf.mongodb.net/?retryWrites=true&w=majority"
# Create a new client and connect to the server
client = MongoClient(uri)
print(client)
## Output ##
MongoClient(host=['ac-edosefl-shard-00-02.k3s4vuf.mongodb.net:27017', 'ac-edosefl-shard-00-01.k3s4vuf.mongodb.net:27017', 'ac-edosefl-shard-00-00.k3s4vuf.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-t2x2vx-shard-0', tls=True)
Send a ping to confirm a successful connection
status = client.admin.command('ping')
print("Response: {}".format(status))
## Output ##
Response: {'ok': 1}
Create a new Database
Note that MongoDB creates databases (and collections) lazily: they only materialize on the server once the first document is written.
db = client.kaggle_iris
print("Database name is {0}".format(db.name))
## Output ##
Database name is kaggle_iris
Print list of Database names
print("List of databases in MongoDB Cluster: {0}".format(client.list_database_names()))
## Output ##
List of databases in MongoDB Cluster: ['forum', 'kaggle_iris', 'mongotest1', 'mongotest2', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_guides', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']
Create a new Collection & Print list of Collections
coll = db.iris
print("Collection name is {0}".format(coll.name))
print("List of collections in Database {0}: {1}".format(db.name, db.list_collection_names()))
## Output ##
Collection name is iris
List of collections in Database kaggle_iris: ['iris']
Datasets | Documents in JSON format
doc = {"sepalLength": 5.5, "sepalWidth": 2.4, "petalLength": 3.7, "petalWidth": 1.0, "species": "versicolor"}
doc_list = [
{"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.7, "sepalWidth": 3.2, "petalLength": 1.3, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.6, "sepalWidth": 3.1, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 5.0, "sepalWidth": 3.6, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 5.4, "sepalWidth": 3.9, "petalLength": 1.7, "petalWidth": 0.4, "species": "setosa"},
{"sepalLength": 5.8, "sepalWidth": 2.6, "petalLength": 4.0, "petalWidth": 1.2, "species": "versicolor"},
{"sepalLength": 5.0, "sepalWidth": 3.4, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 4.4, "sepalWidth": 2.9, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
{"sepalLength": 5.9, "sepalWidth": 3.0, "petalLength": 5.1, "petalWidth": 1.8, "species": "virginica"}]
CRUD Operations
i.e. create, read, update, and delete
Insert one document
coll.insert_one(doc)
## Output ##
<pymongo.results.InsertOneResult at 0x7f8240a9e520>
Insert more than one document
coll.insert_many(doc_list)
## Output ##
<pymongo.results.InsertManyResult at 0x7f8230bc7fa0>
Find all documents in collection
for x in coll.find():
print(x)
## Output ##
{'_id': ObjectId('648f33cf03184eea7a245faa'), 'sepalLength': 5.5, 'sepalWidth': 2.4, 'petalLength': 3.7, 'petalWidth': 1.0, 'species': 'versicolor'}
{'_id': ObjectId('648f33cf03184eea7a245fab'), 'sepalLength': 5.1, 'sepalWidth': 3.5, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fac'), 'sepalLength': 4.9, 'sepalWidth': 3.0, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fad'), 'sepalLength': 4.7, 'sepalWidth': 3.2, 'petalLength': 1.3, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fae'), 'sepalLength': 4.6, 'sepalWidth': 3.1, 'petalLength': 1.5, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245faf'), 'sepalLength': 5.0, 'sepalWidth': 3.6, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb0'), 'sepalLength': 5.4, 'sepalWidth': 3.9, 'petalLength': 1.7, 'petalWidth': 0.4, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb1'), 'sepalLength': 5.8, 'sepalWidth': 2.6, 'petalLength': 4.0, 'petalWidth': 1.2, 'species': 'versicolor'}
{'_id': ObjectId('648f33cf03184eea7a245fb2'), 'sepalLength': 5.0, 'sepalWidth': 3.4, 'petalLength': 1.5, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb3'), 'sepalLength': 4.4, 'sepalWidth': 2.9, 'petalLength': 1.4, 'petalWidth': 0.2, 'species': 'setosa'}
{'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 1.8, 'species': 'virginica'}
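find() also accepts a filter document: operators like $gt and $in are just keys in a plain dict, so coll.find({"petalLength": {"$gt": 3.0}}) would return only the longer-petaled rows above. A minimal pure-Python sketch of those matching semantics (an illustration only, not PyMongo's implementation):

```python
# Minimal sketch of how a Mongo-style filter matches one document.
# Supports plain equality plus the $gt and $in operators; real
# MongoDB supports many more (illustration only, not PyMongo internals).
def matches(doc, query):
    for field, cond in query.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 3.0}
            for op, val in cond.items():
                if op == "$gt" and not doc.get(field, float("-inf")) > val:
                    return False
                if op == "$in" and doc.get(field) not in val:
                    return False
        elif doc.get(field) != cond:  # plain equality
            return False
    return True

docs = [
    {"petalLength": 3.7, "species": "versicolor"},
    {"petalLength": 1.4, "species": "setosa"},
    {"petalLength": 5.1, "species": "virginica"},
]
long_petals = [d for d in docs if matches(d, {"petalLength": {"$gt": 3.0}})]
print([d["species"] for d in long_petals])  # -> ['versicolor', 'virginica']
```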
Find only the first document in the collection
print(coll.find_one())
## Output ##
{'_id': ObjectId('648f33cf03184eea7a245faa'), 'sepalLength': 5.5, 'sepalWidth': 2.4, 'petalLength': 3.7, 'petalWidth': 1.0, 'species': 'versicolor'}
Update document
query = {"sepalLength": 5.9}
newvalues = {"$set": {"petalWidth": 11.9}}
print("Document before update: {0}".format(coll.find_one(query)))
coll.update_one(query, newvalues)
print("Document after update: {0}".format(coll.find_one(query)))
## Output ##
Document before update: {'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 1.8, 'species': 'virginica'}
Document after update: {'_id': ObjectId('648f33cf03184eea7a245fb4'), 'sepalLength': 5.9, 'sepalWidth': 3.0, 'petalLength': 5.1, 'petalWidth': 11.9, 'species': 'virginica'}
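$set only touches the listed fields; everything else in the matched document is left as it was. A sketch of that behavior in plain Python (an illustration of the semantics, not what the server runs):

```python
# Sketch of what "$set" does server-side: only the listed fields
# change; every other field in the document is left untouched.
def apply_set(doc, update):
    out = dict(doc)
    out.update(update.get("$set", {}))
    return out

before = {"sepalLength": 5.9, "petalWidth": 1.8, "species": "virginica"}
after = apply_set(before, {"$set": {"petalWidth": 11.9}})
print(after)  # -> {'sepalLength': 5.9, 'petalWidth': 11.9, 'species': 'virginica'}
```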
Delete document
x = coll.delete_one({"sepalLength": 5.4})
print("{0} document(s) deleted using delete_one()".format(x.deleted_count))
## Output ##
1 document(s) deleted using delete_one()
Delete all documents
x = coll.delete_many({})  # an empty filter matches every document in the collection
print("{0} document(s) deleted using delete_many()".format(x.deleted_count))
## Output ##
10 document(s) deleted using delete_many()
Aggregate Pipeline
An aggregation pipeline consists of one or more stages, each of which processes the documents it receives and passes the result to the next stage:
e.g. stages such as $match, $group, $sort and $project, together with accumulators such as $sum, $min and $max
# Create new collection
mycol2 = db['Orders']
from datetime import datetime

# ISODate(...) is mongosh shell syntax; from Python, pass datetime
# objects so dates are stored as BSON dates rather than plain strings.
multipleRecords = [
    { "_id": 0, "name": "Pepperoni", "size": "small", "price": 19,
      "quantity": 10, "date": datetime(2021, 3, 13, 8, 14, 30) },
    { "_id": 1, "name": "Pepperoni", "size": "medium", "price": 20,
      "quantity": 20, "date": datetime(2021, 3, 13, 9, 13, 24) },
    { "_id": 2, "name": "Pepperoni", "size": "large", "price": 21,
      "quantity": 30, "date": datetime(2021, 3, 17, 9, 22, 12) },
    { "_id": 3, "name": "Cheese", "size": "small", "price": 12,
      "quantity": 15, "date": datetime(2021, 3, 13, 11, 21, 39) },
    { "_id": 4, "name": "Cheese", "size": "medium", "price": 13,
      "quantity": 50, "date": datetime(2022, 1, 12, 21, 23, 13) },
    { "_id": 5, "name": "Cheese", "size": "large", "price": 14,
      "quantity": 10, "date": datetime(2022, 1, 12, 5, 8, 13) },
    { "_id": 6, "name": "Vegan", "size": "small", "price": 17,
      "quantity": 10, "date": datetime(2021, 1, 13, 5, 8, 13) },
    { "_id": 7, "name": "Vegan", "size": "medium", "price": 18,
      "quantity": 10, "date": datetime(2021, 1, 13, 5, 10, 13) }
]
status = mycol2.insert_many(multipleRecords)
print(status.inserted_ids)
Output:
[0, 1, 2, 3, 4, 5, 6, 7]
Since each record supplies its own _id, inserted_ids echoes those values rather than server-generated ObjectIds.
# Calculate Total Order Quantity
aggregate = mycol2.aggregate( [
# Stage 1: Filter pizza order documents by pizza size and pass the output to the next stage
{
"$match": { "size": "medium" }
},
# Stage 2: Group remaining documents by pizza name and calculate total quantity
{
"$group": { "_id": "$name", "totalQuantity": { "$sum": "$quantity" } }
}
] )
for doc in aggregate:
print(doc)
Output:
{'_id': 'Pepperoni', 'totalQuantity': 20}
{'_id': 'Vegan', 'totalQuantity': 10}
{'_id': 'Cheese', 'totalQuantity': 50}
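The $match → $group flow above can be sanity-checked in plain Python over the same medium-size records (a sketch of the semantics, not how the server computes it):

```python
# Re-derive the pipeline result in plain Python as a sanity check:
# the $match stage keeps medium pizzas, $group sums quantity per name.
orders = [
    {"name": "Pepperoni", "size": "medium", "quantity": 20},
    {"name": "Cheese", "size": "medium", "quantity": 50},
    {"name": "Vegan", "size": "medium", "quantity": 10},
    {"name": "Pepperoni", "size": "small", "quantity": 10},
]
medium = [o for o in orders if o["size"] == "medium"]      # $match
totals = {}
for o in medium:                                           # $group + $sum
    totals[o["name"]] = totals.get(o["name"], 0) + o["quantity"]
print(totals)  # -> {'Pepperoni': 20, 'Cheese': 50, 'Vegan': 10}
```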
Use Case
# Importing CSV
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/abhijitpaul0212/DataSets/main/AirPassengers.csv")
import json
mycsvfile = list(json.loads(df.T.to_json()).values())
# Create new collection
mycol1 = db['AirPassengers']
# Insert all records
mycol1.insert_many(mycsvfile)
for doc in mycol1.find({}).limit(5):
    print(doc)
## Output ##
{'_id': ObjectId('64eaf250f4c2730326ff5b9c'), 'date': '1949-01-01', 'value': 112}
{'_id': ObjectId('64eaf250f4c2730326ff5b9d'), 'date': '1949-02-01', 'value': 118}
{'_id': ObjectId('64eaf250f4c2730326ff5b9e'), 'date': '1949-03-01', 'value': 132}
{'_id': ObjectId('64eaf250f4c2730326ff5b9f'), 'date': '1949-04-01', 'value': 129}
{'_id': ObjectId('64eaf250f4c2730326ff5ba0'), 'date': '1949-05-01', 'value': 121}
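If pandas is not available, the same insert_many-ready list of dicts can be built with the standard library alone. A sketch, with the column names assumed to match the CSV used above:

```python
# Build an insert_many-ready list of dicts from CSV text using only
# the standard library (no pandas). Column names are assumed to match
# the AirPassengers CSV used above.
import csv
import io

csv_text = "date,value\n1949-01-01,112\n1949-02-01,118\n"
records = [
    {"date": row["date"], "value": int(row["value"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(records)
# -> [{'date': '1949-01-01', 'value': 112}, {'date': '1949-02-01', 'value': 118}]
```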
If you liked reading this, you can follow this newsletter; I share a post every Friday.
Thank you so much for reading!
If you want to read my learnings on any specific topic, let me know in the comment box. I will tag you if I write about it.
xx