Extracting and Transforming Tick Data from B3
Introduction
The Brazilian stock exchange, known as B3 (Brasil, Bolsa, Balcão), is one of the largest and most sophisticated exchanges in the world. It provides a rich source of financial data that can be leveraged for various analytical purposes. Among the types of data available, tick data is particularly valuable for high-frequency trading, algorithmic strategies, and quantitative analysis. This article will guide you through the process of extracting and transforming tick data from B3, ensuring you can effectively utilize this information for your financial models.
Understanding Tick Data
Tick data records every transaction that occurs in the market, including details such as the price, volume, and timestamp of each trade. This granular level of data is essential for high-frequency trading, algorithmic strategies, and quantitative analysis.
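To make the idea concrete, here is a minimal sketch of a single tick as a Python record. The field names are hypothetical and simply mirror the renamed columns this article produces later (Instrumento, Cod_Negocio, Preco, Quantidade, Hora); the tickers, prices, and times are invented, not real B3 data. Because every trade is kept, any coarser summary, such as total traded quantity per instrument, can be computed from the ticks, while individual trades can never be recovered from a summary.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Tick:
    """One trade; fields mirror the renamed columns used later in this article."""
    instrumento: str   # CodigoInstrumento -> Instrumento
    cod_negocio: int   # CodigoIdentificadorNegocio -> Cod_Negocio
    preco: float       # PrecoNegocio -> Preco
    quantidade: int    # QuantidadeNegociada -> Quantidade
    hora: time         # HoraFechamento -> Hora


def total_quantity(ticks: list[Tick], instrumento: str) -> int:
    """Aggregate tick-level records into one number for a single instrument."""
    return sum(t.quantidade for t in ticks if t.instrumento == instrumento)


# Invented example trades, not real market data
ticks = [
    Tick("PETR4", 10, 37.52, 100, time(10, 0, 0, 123000)),
    Tick("PETR4", 20, 37.53, 200, time(10, 0, 0, 456000)),
    Tick("VALE3", 30, 61.10, 300, time(10, 0, 1)),
]
print(total_quantity(ticks, "PETR4"))  # 300
```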
Steps to Extract Tick Data from B3
# Import necessary libraries
import os
import wget  # third-party package: pip install wget
import zipfile
import pandas as pd
# Desired trading date; the example value and the YYYY-MM-DD format are assumptions
date = '2024-01-15'
# Define the URL to download the tick data from B3 for that date
url = 'https://arquivos.b3.com.br/apinegocios/tickercsv/' + date
# Define the file name for the downloaded data, which is a ZIP file
file_name = date + '_B3_TickData.zip'
# Download the ZIP file from the URL and save it with the defined file name
wget.download(url, file_name)
# Get the directory where the ZIP file is saved (the current working directory, since file_name has no folder)
zip_dir = os.path.dirname(os.path.abspath(file_name))
# Create a ZipFile object to work with the ZIP file
with zipfile.ZipFile(file_name, 'r') as zip_ref:
    # Extract all the contents of the ZIP file into the directory
    zip_ref.extractall(zip_dir)
# Start a new line after the download progress output, then print a confirmation message
print()
print("1/8 - Extracted all contents of", file_name)
# Get the current working directory to locate the extracted files
folder_ref = os.getcwd()
# List all files in the current working directory
files = os.listdir(folder_ref)
# Keep only the text files whose names end with '_NEGOCIOSAVISTA.txt'
files_txt = [i for i in files if i.endswith('_NEGOCIOSAVISTA.txt')]
if not files_txt:
    raise FileNotFoundError("No '*_NEGOCIOSAVISTA.txt' file found in " + folder_ref)
# Read the first matching file, which is semicolon-separated, into a pandas DataFrame
df = pd.read_csv(files_txt[0], sep=";")
Explanation
Import Libraries: The script begins by importing the necessary libraries, such as os for directory operations, wget for downloading files, zipfile for handling ZIP files, and pandas for data manipulation.
Define URL and File Name: The url variable is constructed using a base URL and the date variable, which represents the date for which tick data is being downloaded. The file_name variable is the name under which the downloaded ZIP file will be saved.
Download ZIP File: The wget.download() function is used to download the ZIP file from the specified URL and save it locally with the defined file name.
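Extract ZIP File: The ZIP file is opened with zipfile.ZipFile in read mode inside a with block, which closes it automatically, and extractall() unpacks every file it contains into zip_dir, the directory where the ZIP was saved. A confirmation message is then printed.
Locate Extracted Files: os.getcwd() and os.listdir() list the files in the current working directory, and a list comprehension keeps only those whose names end in _NEGOCIOSAVISTA.txt, the trade file from the archive. If files from earlier downloads are still in the folder, files_txt will contain several of them, and files_txt[0] is not guaranteed to be the date you just downloaded.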
Read Data into DataFrame: The first file in the filtered list is read into a pandas DataFrame using pd.read_csv(), with the semicolon (;) as the separator.
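If you prefer to let pandas handle the Brazilian number format at read time, pd.read_csv accepts a decimal argument, and a dtype mapping can keep HoraFechamento as text so its leading zeros survive. The sketch below runs on a tiny in-memory sample: the column names are the ones this article's code refers to, but the column order and all values are invented for illustration. Note that if you read the real file this way, PrecoNegocio is already a float, so the .str.replace step in the transformation code must be skipped, because the .str accessor fails on a float column.

```python
import io

import pandas as pd

# Invented two-row sample in B3's layout: ';' separator, ',' as decimal mark.
# Column names match those used in this article; order and values are illustrative.
raw = (
    "DataReferencia;CodigoInstrumento;AcaoAtualizacao;PrecoNegocio;"
    "QuantidadeNegociada;HoraFechamento;CodigoIdentificadorNegocio;"
    "TipoSessaoPregao;DataNegocio;CodigoParticipanteComprador;"
    "CodigoParticipanteVendedor\n"
    "2024-01-15;PETR4;0;37,52;100;100000123;10;1;2024-01-15;8;3\n"
    "2024-01-15;PETR4;0;37,53;200;095959987;20;1;2024-01-15;;\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    decimal=",",                    # "37,52" is parsed as the float 37.52
    dtype={"HoraFechamento": str},  # keep "095959987" instead of 95959987
)

print(df.dtypes["PrecoNegocio"])      # float64
print(df["HoraFechamento"].tolist())  # ['100000123', '095959987']
```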
The extraction script downloads, extracts, and reads one day of B3 tick data. At this point df holds one row per trade, with the columns exactly as B3 publishes them, including prices written with decimal commas, so the next step is to clean and restructure it.
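The extraction script reads from the current working directory, so files left over from earlier runs can end up in files_txt. One way to avoid that, sketched below under the assumption that you are free to restructure the script, is to extract each day into its own folder and search only there. download_zip and extract_tick_file are hypothetical helper names, and download_zip uses the standard library's urllib.request.urlretrieve in place of the third-party wget package.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def download_zip(url: str, dest: Path) -> Path:
    """Save the resource at url to dest, creating parent folders as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, dest)
    return dest


def extract_tick_file(zip_path: Path, out_dir: Path) -> Path:
    """Extract zip_path into out_dir and return the *_NEGOCIOSAVISTA.txt file it held."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    # Search only this folder (and subfolders), so older downloads cannot interfere
    matches = sorted(out_dir.rglob("*_NEGOCIOSAVISTA.txt"))
    if not matches:
        raise FileNotFoundError(f"no *_NEGOCIOSAVISTA.txt file in {out_dir}")
    return matches[0]


# Example usage (not executed here; needs pandas as pd, and the date format is an assumption):
# date = "2024-01-15"
# work = Path("b3_ticks") / date
# zip_path = download_zip(
#     "https://arquivos.b3.com.br/apinegocios/tickercsv/" + date,
#     work / f"{date}_B3_TickData.zip",
# )
# df = pd.read_csv(extract_tick_file(zip_path, work), sep=";")
```

Keeping one folder per date also makes re-running a single day idempotent: the same folder is simply overwritten.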
Steps to Transform Tick Data from B3
# Import necessary libraries
import pandas as pd
# Update 'PrecoNegocio' column to replace commas with dots and convert to float
df['PrecoNegocio'] = df.PrecoNegocio.str.replace(",", ".").astype('float')
print('2/8 - PrecoNegocio Updated')
# Fill missing values in 'CodigoParticipanteComprador' and 'CodigoParticipanteVendedor' with 0
# Convert the columns to integer type and then to string type
df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']] = df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']].fillna(0)
df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']] = df[['CodigoParticipanteComprador', 'CodigoParticipanteVendedor']].astype('int').astype('str')
print('3/8 - Codigos Participantes Updated')
# Convert 'HoraFechamento' to a string and pad with leading zeros to 9 characters (HHMMSSmmm)
df['HoraFechamento'] = df['HoraFechamento'].astype(str).str.zfill(9)
# Reformat 'HoraFechamento' to the format HH:MM:SS.sss
df['HoraFechamento'] = df['HoraFechamento'].apply(lambda x: f"{x[:2]}:{x[2:4]}:{x[4:6]}.{x[6:9]}")
# Parse the formatted strings and keep only the time of day (datetime.time)
df['HoraFechamento'] = pd.to_datetime(df['HoraFechamento'], format='%H:%M:%S.%f').dt.time
print('4/8 - HoraFechamento Updated')
print('4/8 - HoraFechamento Updated')
# Create a new index by concatenating 'CodigoInstrumento', 'CodigoIdentificadorNegocio', 'DataReferencia', and 'HoraFechamento'
str1 = df.CodigoInstrumento
str2 = df.CodigoIdentificadorNegocio.astype(str)
str3 = df.DataReferencia.astype(str)
str4 = df.HoraFechamento.astype(str)
newindex = str1 + '_' + str2 + '_' + str3 + '_' + str4
df['Index'] = newindex
# Set the new 'Index' column as the index of the DataFrame
df = df.set_index('Index')
print('5/8 - New_Index Created')
# Remove the specified columns from the DataFrame
df.drop(columns=['AcaoAtualizacao', 'TipoSessaoPregao', 'DataNegocio'], inplace=True)
print('6/8 - Columns Removed')
# Rename the columns using a dictionary to map old names to new names
dicionario = {
    'DataReferencia': 'Dia',
    'CodigoInstrumento': 'Instrumento',
    'PrecoNegocio': 'Preco',
    'QuantidadeNegociada': 'Quantidade',
    'HoraFechamento': 'Hora',
    'CodigoIdentificadorNegocio': 'Cod_Negocio',
    'CodigoParticipanteComprador': 'Comprador',
    'CodigoParticipanteVendedor': 'Vendedor'
}
df.rename(dicionario, axis=1, inplace=True)
print('7/8 - Columns Renamed')
# Reorder the columns in the specified new order
new_order = ['Cod_Negocio', 'Instrumento', 'Dia', 'Hora', 'Preco', 'Quantidade', 'Comprador', 'Vendedor']
df = df[new_order]
print('8/8 - Columns Reordered')
# Print completion message
print('Data Extraction and Transformation - Done')
Explanation
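Updating ‘PrecoNegocio’ Column: B3 writes prices with a comma as the decimal separator, so the commas are replaced with dots and the column is cast to float.
Updating Participant Codes: Missing buyer and seller participant codes are filled with 0, and both columns are converted to integers and then to strings, so they behave as labels rather than numbers. Converting to int first avoids strings such as "8.0", which would otherwise appear because the missing values make the columns float.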
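Updating ‘HoraFechamento’ Column: The trade time arrives in HHMMSSmmm form and, when read as a number, loses its leading zero for times before 10:00. The code zero-pads it back to nine characters, inserts separators to get HH:MM:SS.sss, and converts the result into datetime.time objects with pd.to_datetime(...).dt.time.
Creating a New Index: The instrument code, trade ID, reference date, and time are joined with underscores into a single string key per trade, which becomes the DataFrame index.
Removing Unnecessary Columns: AcaoAtualizacao, TipoSessaoPregao, and DataNegocio are dropped.
Renaming Columns: The dicionario mapping replaces the long B3 column names with shorter ones such as Preco, Quantidade, and Hora.
Reordering Columns: The columns are rearranged into the order given by new_order, starting with Cod_Negocio and Instrumento.
Completion Message: A final message confirms that extraction and transformation are finished; the numbered messages (1/8 to 8/8) printed along the way let you follow the progress.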
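The HoraFechamento step formats each value with a Python lambda and keeps only a datetime.time, which carries no date. A vectorized alternative, sketched here as an assumption rather than the author's method, treats the HHMMSSmmm value as an integer, converts it to a Timedelta with plain arithmetic, and adds it to DataReferencia to obtain a full timestamp that works with pandas time-series tools. The helper name hora_to_timedelta and the sample values are hypothetical.

```python
import pandas as pd


def hora_to_timedelta(hora: pd.Series) -> pd.Series:
    """Convert HHMMSSmmm values (e.g. 100000123 or "095959987") to Timedelta."""
    raw = hora.astype("int64")            # leading zeros are irrelevant once numeric
    hours = raw // 10_000_000
    minutes = raw // 100_000 % 100
    seconds = raw // 1_000 % 100
    millis = raw % 1_000
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1_000 + millis
    return pd.to_timedelta(total_ms, unit="ms")


# Invented sample rows in the raw B3 layout (before renaming)
df = pd.DataFrame(
    {
        "DataReferencia": ["2024-01-15", "2024-01-15"],
        "HoraFechamento": [100000123, 95959987],
    }
)
# Date + time of day gives a full datetime64 timestamp per trade
df["Timestamp"] = pd.to_datetime(df["DataReferencia"]) + hora_to_timedelta(df["HoraFechamento"])
print(df["Timestamp"].tolist())
```

Keeping a full timestamp alongside, or instead of, the datetime.time column lets you sort, filter by time range, or resample ticks across days without rebuilding dates later.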
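After these eight steps, each row is a single trade keyed by a composite index, prices are floats, participant codes are strings, times are datetime.time objects, and the columns have short names in a fixed order, so the tick data from B3 is ready for analysis or further processing.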
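Whether a given day's file actually comes out clean is cheap to verify. The hypothetical check_ticks helper below tests a few properties the transformed DataFrame should have: the composite index is unique, the columns follow new_order, and Preco and Quantidade are present and positive. These rules are illustrative assumptions, not B3 guarantees, and the sample rows are invented; adjust both to your own data-quality needs.

```python
from datetime import time

import pandas as pd

# Column order produced by the transformation step (new_order)
EXPECTED_COLUMNS = ["Cod_Negocio", "Instrumento", "Dia", "Hora",
                    "Preco", "Quantidade", "Comprador", "Vendedor"]


def check_ticks(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a transformed tick DataFrame."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if not df.index.is_unique:
        problems.append("index contains duplicate keys")
    for col in ("Preco", "Quantidade"):
        if col not in df.columns:
            continue
        if df[col].isna().any():
            problems.append(f"{col} has missing values")
        if (df[col] <= 0).any():
            problems.append(f"{col} has non-positive values")
    return problems


# Invented two-trade sample shaped like the transformed DataFrame
sample = pd.DataFrame(
    {
        "Cod_Negocio": [10, 20],
        "Instrumento": ["PETR4", "PETR4"],
        "Dia": ["2024-01-15", "2024-01-15"],
        "Hora": [time(10, 0, 0, 123000), time(10, 0, 0, 456000)],
        "Preco": [37.52, 37.53],
        "Quantidade": [100, 200],
        "Comprador": ["8", "0"],
        "Vendedor": ["3", "0"],
    },
    index=["PETR4_10_2024-01-15_10:00:00.123000",
           "PETR4_20_2024-01-15_10:00:00.456000"],
)
print(check_ticks(sample))  # []
```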