Part 1: Trading System with PySpark
Reginaldo Melo
Data Engineer | Azure | Python | SQL | Databricks | Spark | pyspark | Data Analyst | Power BI | Cognos
Hi folks,
And then you see MGLU3 soaring like a rocket, up fifty percent in just two weeks, and you think:
Oh my, I missed that! Seriously? #SQN
Can you see?
Disclaimer: I'm not a securities analyst. The intention here is educational: manipulating data with pyspark.
So, let's take a look into the B3 database for answers.
Wow! It's not very easy, folks! You have to be patient here.
On the B3 site, under "Posições em Aberto de Empréstimo de Ativos" (Open Securities Lending Positions), click "Baixar Arquivo" (Download File) to get LendingOpenPositionFile_20220705_1.csv
I'll take the files from 20220705 through 20220721.
Can you see that? A csv file separated by semicolons, with a comma as the decimal separator?
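Before touching Spark, here's what that format means in plain Python. The sample line below is hypothetical, just mimicking the B3 layout:

```python
# Hypothetical line in the B3 layout: semicolon-separated, comma as decimal separator
raw_line = "2022-07-05;MGLU3;1000;2,57"

# Split on the semicolon to get the individual fields
date, ticker, qty, price = raw_line.split(";")

# Swap the decimal comma for a dot before casting to float
price_value = float(price.replace(",", "."))
print(price_value)  # 2.57
```

This comma-to-dot swap is exactly what the regexp_replace step does later, column-wide.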
Talk is cheap, let's code now!
In https://colab.research.google.com/
1) Install pyspark like this: !pip install pyspark
2) Import SparkSession: from pyspark.sql import SparkSession
3) Create the spark variable
spark = SparkSession.builder.appName("load csv").master("local[*]").getOrCreate()
4) Create a folder named lend manually and upload the csv files
5) Let's code in three steps: 1) load all the files at once; 2) transform the data; 3) show the results
Wow! This shows us something! On July 5th we had R$ 493,754,632.26 in open lending positions. On July 21st we already had R$ 641,923,213.96. That's a 30% increase; is a move this sudden sustainable?
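A quick arithmetic check on that 30% claim, using the two totals above:

```python
start = 493_754_632.26  # total open lending balance on July 5th
end = 641_923_213.96    # total open lending balance on July 21st

# Percent change between the two dates
pct_change = (end - start) / start * 100
print(round(pct_change, 1))  # 30.0
```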
In the market there are traders who bet on the fall while others bet on the rise. If the stock really goes up, it will force all the shorts and puts to close their trades.
Will we have a short squeeze soon?
In the next part we'll learn to write this dataframe to a single parquet file, among other things.
So, anybody may ask: where is the borrower rate? It's in another dataframe.
Next, in Part 2, we explore this in more detail. See you soon, folks. Enjoy the code.
!pip install pyspark
!pyspark --version
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("load csv").master("local[*]").getOrCreate()
# 1) read all the Lending csv files in folder lend
import pyspark.sql.utils
try:
    df1 = spark.read.csv("lend/*.csv", sep=";", header=True)
except pyspark.sql.utils.AnalysisException:
    print("Verify the lend folder, it could be empty")
# 2) transformation: drop, rename, format
import pyspark.sql.functions as F
df1 = df1.drop("ISIN", "Asst", "PricFctr")
df1 = (df1.withColumnRenamed("RptDt", "Date")
          .withColumnRenamed("TckrSymb", "Stock")
          .withColumnRenamed("BalQty", "Qty")
          .withColumnRenamed("TradAvrgPric", "AvgPrice")
          .withColumnRenamed("BalVal", "Volume"))
# swap the decimal comma for a dot so the values can be cast later
df1 = (df1.withColumn("AvgPrice", F.regexp_replace("AvgPrice", ",", "."))
          .withColumn("Volume", F.regexp_replace("Volume", ",", ".")))
df2 = df1.orderBy('Stock', F.desc('Date'))
# 3 show the results
df2.filter(df2.Stock == "MGLU3").show(30, False)