Extracting and Transforming Tick Data from B3

Eurico Paes

Published: July 17, 2024

Introduction

The Brazilian stock exchange, known as B3 (Brasil, Bolsa, Balcão), is one of the largest and most sophisticated exchanges in the world. It provides a rich source of financial data that can be leveraged for various analytical purposes. Among the types of data available, tick data is particularly valuable for high-frequency trading, algorithmic strategies, and quantitative analysis. This article will guide you through the process of extracting and transforming tick data from B3, ensuring you can effectively utilize this information for your financial models.

Understanding Tick Data

Tick data records every transaction that occurs in the market, including details such as the price, volume, and timestamp of each trade. This granular level of data is essential for:

Analyzing market microstructure
Developing and backtesting trading algorithms
Conducting detailed market research
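
To make the idea concrete, here is a minimal sketch of what a handful of tick records might look like in pandas. The values and the column names (timestamp, price, volume) are invented for illustration and are not B3's schema; the volume-weighted average price is just one example of a quantity that needs trade-level data.

```python
import pandas as pd

# Hypothetical trades: one row per executed transaction (illustrative values only)
ticks = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-07-16 10:00:00.105",
                "2024-07-16 10:00:00.312",
                "2024-07-16 10:00:01.020",
            ]
        ),
        "price": [10.50, 10.51, 10.49],
        "volume": [100, 300, 200],
    }
)

# Volume-weighted average price (VWAP) across the sample trades
vwap = (ticks["price"] * ticks["volume"]).sum() / ticks["volume"].sum()
print(ticks)
print(f"VWAP: {vwap:.4f}")
```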

Steps to Extract Tick Data from B3

# Import necessary libraries
import os
import wget
import zipfile
import pandas as pd

# Trading date to download. The value and its 'YYYY-MM-DD' format are illustrative assumptions; check them against B3's current URL scheme
date = '2024-07-16'

# Define the URL to download the tick data from B3 for that date
url = 'https://arquivos.b3.com.br/apinegocios/tickercsv/' + date

# Define the file name for the downloaded data, which is a ZIP file
file_name = date + '_B3_TickData.zip'

# Download the ZIP file from the URL and save it with the defined file name
wget.download(url, file_name)

# Get the directory where the ZIP file is saved (abspath avoids an empty string for a bare file name)
zip_dir = os.path.dirname(os.path.abspath(file_name))

# Create a ZipFile object to work with the ZIP file
with zipfile.ZipFile(file_name, 'r') as zip_ref:
    # Extract all the contents of the ZIP file into the directory
    zip_ref.extractall(zip_dir)

# Print a confirmation message after extraction
print('\n')
print("1/8 - Extracted all contents of", file_name)

# Get the current working directory to locate the extracted files
folder_ref = os.getcwd()

# List all files in the current working directory
files = os.listdir(folder_ref)

# Filter the list to include only the text files with '_NEGOCIOSAVISTA.txt' in their name
files_txt = [i for i in files if i.endswith('_NEGOCIOSAVISTA.txt')]

# Read the first text file in the filtered list into a pandas DataFrame
df = pd.read_csv(files_txt[0], sep=";")

Explanation

Import Libraries: The script begins by importing the necessary libraries, such as os for directory operations, wget for downloading files, zipfile for handling ZIP files, and pandas for data manipulation.

Define URL and File Name: The url variable is constructed using a base URL and the date variable, which represents the date for which tick data is being downloaded. The file_name variable is the name under which the downloaded ZIP file will be saved.
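
As a small sketch of this step, the URL and file name can be built from a date object rather than a hand-typed string. The helper name build_tick_request is hypothetical, and the YYYY-MM-DD date format is an assumption that should be checked against B3's current URL scheme.

```python
from datetime import date

# Base URL taken from the script above
BASE_URL = "https://arquivos.b3.com.br/apinegocios/tickercsv/"


def build_tick_request(day: date) -> tuple[str, str]:
    """Return (url, file_name) for one trading day (hypothetical helper)."""
    # Assumption: the endpoint expects the date formatted as YYYY-MM-DD
    day_str = day.isoformat()
    return BASE_URL + day_str, f"{day_str}_B3_TickData.zip"


url, file_name = build_tick_request(date(2024, 7, 16))
print(url)
print(file_name)
```

Using a date object means an invalid date fails immediately with a ValueError instead of producing a malformed URL.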

Download ZIP File: The wget.download() function is used to download the ZIP file from the specified URL and save it locally with the defined file name.
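
If you would rather not depend on the third-party wget package, the same step can be sketched with Python's standard library. This is an alternative to the script's approach, not what it uses; the helper name download_file and the 60-second timeout are assumptions.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, timeout: float = 60.0) -> Path:
    """Stream the resource at `url` to `dest` and return the destination path."""
    dest_path = Path(dest)
    # urlopen raises urllib.error.URLError (or HTTPError) on connection or HTTP failures
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        # Copy in chunks so a large archive is never held in memory all at once
        shutil.copyfileobj(response, out)
    return dest_path


# Example call (requires network access), using the url and file_name defined earlier:
# download_file(url, file_name)
```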

Extract ZIP File:

The directory of the ZIP file is obtained using os.path.dirname().
A ZipFile object is created to open and manipulate the ZIP file.
The extractall() method is used to extract all contents of the ZIP file into the same directory where the ZIP file is located.
A confirmation message is printed to indicate successful extraction.
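
One thing to watch: the script extracts into the ZIP file's directory but later searches the current working directory, so the two must coincide. A minimal sketch that makes the target directory explicit is shown below; the helper name extract_zip and the demo file contents are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path: str, target_dir: str) -> list[str]:
    """Extract every member of `zip_path` into `target_dir` and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target)
        return zip_ref.namelist()


# Demo on a throwaway archive so the example runs without a real B3 download
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("sample_NEGOCIOSAVISTA.txt", "a;b\n1;2\n")
    names = extract_zip(str(demo_zip), str(Path(tmp) / "extracted"))
    print(names)
```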

Locate Extracted Files:

The current working directory is obtained using os.getcwd().
A list of all files in the current directory is created using os.listdir().
This list is filtered to include only the text files ending with _NEGOCIOSAVISTA.txt.
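
The lookup can also be sketched with pathlib, with two small safeguards: sorting the matches so the "first" file is deterministic, and raising a clear error when nothing matches instead of failing on an empty list. The helper name find_tick_files is hypothetical.

```python
import tempfile
from pathlib import Path


def find_tick_files(folder: str, suffix: str = "_NEGOCIOSAVISTA.txt") -> list[Path]:
    """Return the files in `folder` whose names end with `suffix`, sorted by name."""
    matches = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.name.endswith(suffix)
    )
    if not matches:
        raise FileNotFoundError(f"No file ending with {suffix!r} found in {folder}")
    return matches


# Demo with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b_NEGOCIOSAVISTA.txt", "a_NEGOCIOSAVISTA.txt", "notes.txt"]:
        (Path(tmp) / name).write_text("")
    found_names = [p.name for p in find_tick_files(tmp)]
print(found_names)
```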

Read Data into DataFrame: The first file in the filtered list is read into a pandas DataFrame using pd.read_csv(), with the semicolon (;) as the separator.
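
As a variation on this step, pd.read_csv() can parse comma decimals while reading, via its decimal parameter. The sketch below uses a made-up two-row sample (the ticker ABCD3 and all values are invented) with column names taken from the script.

```python
import io

import pandas as pd

# Made-up sample in the layout the script assumes: semicolon-separated, comma decimals
sample = io.StringIO(
    "CodigoInstrumento;PrecoNegocio;QuantidadeNegociada\n"
    "ABCD3;10,50;100\n"
    "ABCD3;10,51;300\n"
)

# decimal="," lets pandas parse prices such as 10,50 as floats during the read
df_sample = pd.read_csv(sample, sep=";", decimal=",")
print(df_sample.dtypes)
print(df_sample)
```

If you read the file this way, PrecoNegocio is already numeric, so a later string-based comma replacement on that column is unnecessary and would raise an error.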

Together, these steps download, extract, and load B3 tick data for the chosen date into a pandas DataFrame, leaving you free to focus on building and testing your financial models.

Steps to Transform Tick Data from B3

# Import necessary libraries
import pandas as pd

# Update 'PrecoNegocio' column to replace commas with dots and convert to float
df['PrecoNegocio'] = df.PrecoNegocio.str.replace(",", ".").astype('float')
print('2/8 - PrecoNegocio Updated')

# Fill missing values in 'CodigoParticipanteComprador' and 'CodigoParticipanteVendedor' with 0
# Convert the columns to integer type and then to string type
df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']] = df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']].fillna(0)
df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']] = df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']].astype('int').astype('str')
print('3/8 - Codigos Participantes Updated')

# Update 'HoraFechamento' to ensure it is a string and pad with leading zeros to make it 9 characters long
df['HoraFechamento'] = df['HoraFechamento'].astype(str).str.zfill(9)
# Reformat 'HoraFechamento' to the format HH:MM:SS.sss
df['HoraFechamento'] = df['HoraFechamento'].apply(lambda x: f"{x[:2]}:{x[2:4]}:{x[4:6]}.{x[6:9]}")
# Convert the formatted strings to datetime.time objects
df['HoraFechamento'] = pd.to_datetime(df['HoraFechamento'], format='%H:%M:%S.%f').dt.time
print('4/8 - HoraFechamento Updated')

# Create a new index by concatenating 'CodigoInstrumento', 'CodigoIdentificadorNegocio', 'DataReferencia', and 'HoraFechamento'
str1 = df.CodigoInstrumento
str2 = df.CodigoIdentificadorNegocio.astype(str)
str3 = df.DataReferencia.astype(str)
str4 = df.HoraFechamento.astype(str)
newindex = str1 + '_' + str2 + '_' + str3 + '_' + str4
df['Index'] = newindex
# Set the new 'Index' column as the index of the DataFrame
df = df.set_index('Index')
print('5/8 - New_Index Created')

# Remove the specified columns from the DataFrame
df.drop(columns=['AcaoAtualizacao', 'TipoSessaoPregao', 'DataNegocio'], inplace=True)
print('6/8 - Columns Remove Updated')

# Rename the columns using a dictionary to map old names to new names
dicionario = {
    'DataReferencia': 'Dia',
    'CodigoInstrumento': 'Instrumento',
    'PrecoNegocio': 'Preco',
    'QuantidadeNegociada': 'Quantidade',
    'HoraFechamento': 'Hora',
    'CodigoIdentificadorNegocio': 'Cod_Negocio',
    'CodigoParticipanteComprador': 'Comprador',
    'CodigoParticipanteVendedor': 'Vendedor'
}
df.rename(dicionario, axis=1, inplace=True)
print('7/8 - Columns Rename Updated')

# Reorder the columns in the specified new order
new_order = ['Cod_Negocio', 'Instrumento', 'Dia', 'Hora', 'Preco', 'Quantidade', 'Comprador', 'Vendedor']
df = df[new_order]
print('8/8 - Columns New Order Updated')

# Print completion message
print('Data Extraction and Transformation - Done')

Explanation

Updating ‘PrecoNegocio’ Column:

Replaces commas with dots in the PrecoNegocio column to conform to the float format.
Converts the column to float type.

Updating Participant Codes:

Fills missing values in CodigoParticipanteComprador and CodigoParticipanteVendedor columns with 0.
Converts these columns to integer type and then to string type.
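
Filling with 0 makes a missing participant indistinguishable from a code that is literally 0. The sketch below compares the script's approach with one possible alternative, pandas' nullable Int64 dtype, on made-up values; which behavior is preferable depends on how the codes are used downstream.

```python
import numpy as np
import pandas as pd

# Made-up participant codes; NaN marks a missing value
codes = pd.DataFrame(
    {
        "CodigoParticipanteComprador": [3.0, np.nan, 120.0],
        "CodigoParticipanteVendedor": [8.0, 45.0, np.nan],
    }
)

# Approach from the script: missing codes become the string "0"
filled = codes.fillna(0).astype("int").astype("str")

# Alternative: the nullable integer dtype keeps missing codes as <NA>
nullable = codes.astype("Int64")

print(filled)
print(nullable)
```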

Updating ‘HoraFechamento’ Column:

Ensures the HoraFechamento column is a string and pads it with leading zeros to make it 9 characters long.
Reformats the column to the format HH:MM:SS.sss and converts it to datetime.time type.
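
A worked example on two made-up values shows why the padding matters: a time before 10:00 arrives with only eight digits once the leading zero is lost. This sketch assumes the raw values are integers in HHMMSSmmm form, as the script's slicing implies, and uses vectorized string slicing in place of the per-row lambda.

```python
import datetime

import pandas as pd

# Made-up raw values as they might be read from the file (integers, HHMMSSmmm)
raw_times = pd.Series([93000123, 170512007])

# Pad to 9 digits so that times before 10:00 regain their leading zero
padded = raw_times.astype(str).str.zfill(9)

# Insert the separators with vectorized string slicing
formatted = (
    padded.str[:2] + ":" + padded.str[2:4] + ":" + padded.str[4:6] + "." + padded.str[6:9]
)

# Parse the formatted strings and keep only the time-of-day part
times = pd.to_datetime(formatted, format="%H:%M:%S.%f").dt.time
print(formatted.tolist())
print(times.tolist())
```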

Creating a New Index:

Concatenates CodigoInstrumento, CodigoIdentificadorNegocio, DataReferencia, and HoraFechamento columns to create a new index.
Sets this new index as the index of the DataFrame.
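
A composite key like this is only useful if it identifies each row uniquely, which is worth checking rather than assuming. The sketch below builds the same underscore-joined key on made-up rows (tickers and values are invented) and tests it with Index.is_unique.

```python
import pandas as pd

# Made-up trades using the column names from the script
trades = pd.DataFrame(
    {
        "CodigoInstrumento": ["ABCD3", "ABCD3", "WXYZ4"],
        "CodigoIdentificadorNegocio": [10, 20, 10],
        "DataReferencia": ["2024-07-16"] * 3,
        "HoraFechamento": ["10:00:00.105", "10:00:00.312", "10:00:00.105"],
    }
)

key_columns = [
    "CodigoInstrumento",
    "CodigoIdentificadorNegocio",
    "DataReferencia",
    "HoraFechamento",
]

# Join the key columns with underscores, row by row
trades["Index"] = trades[key_columns].astype(str).agg("_".join, axis=1)
trades = trades.set_index("Index")

# True only when no two rows share the same composite key
print(trades.index.is_unique)
print(trades.index.tolist())
```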

Removing Unnecessary Columns:

Drops the AcaoAtualizacao, TipoSessaoPregao, and DataNegocio columns from the DataFrame.

Renaming Columns:

Renames the columns using a dictionary that maps old column names to new, more meaningful names.

Reordering Columns:

Reorders the columns in the specified order to ensure a logical and consistent structure.
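
The drop, rename, and reorder steps can also be written as one chained expression that leaves the original DataFrame untouched. This is a stylistic alternative to the inplace=True calls in the script, shown on a made-up single-row frame with a subset of the columns.

```python
import pandas as pd

# Made-up single-row frame with a subset of the columns named in the script
raw_frame = pd.DataFrame(
    {
        "DataReferencia": ["2024-07-16"],
        "CodigoInstrumento": ["ABCD3"],
        "AcaoAtualizacao": [0],
        "PrecoNegocio": [10.5],
        "QuantidadeNegociada": [100],
    }
)

rename_map = {
    "DataReferencia": "Dia",
    "CodigoInstrumento": "Instrumento",
    "PrecoNegocio": "Preco",
    "QuantidadeNegociada": "Quantidade",
}
new_order = ["Instrumento", "Dia", "Preco", "Quantidade"]

# Each call returns a new DataFrame, so the steps chain without inplace=True
tidy = (
    raw_frame.drop(columns=["AcaoAtualizacao"])
    .rename(columns=rename_map)
    .loc[:, new_order]
)
print(tidy.columns.tolist())
```

Selecting with an explicit column list also acts as a check: a misspelled or missing column raises a KeyError instead of silently passing through.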

Completion Message:

Prints a completion message to indicate that the data extraction and transformation process is done.

This transformation process ensures the tick data from B3 is clean, well-structured, and ready for analysis or further processing.
