Experimenting with GraphQL, Pandas, and HVAC data
I've been intrigued by the potential of the metadata strategies outlined in ASHRAE 223P, which intersects with the Brick and Haystack efforts. One notable challenge lies in the often-messy nature of data extracted from building automation systems (BAS). A BAS, which typically oversees the HVAC operations within a building, tends to employ diverse naming conventions: when contractors set up a building's controls, the names given to each HVAC system, down to individual devices and components, can vary significantly from one building to another.
This variability poses hurdles for data analysis and integration efforts, particularly in the realm of IoT or building commissioning in general. The process can be painstakingly slow, often necessitating human intervention to manually map each data point and device to a coherent naming convention. This is precisely the issue that initiatives like Brick and Haystack aim to address through the establishment of a standardized data model.
I recently came across GraphQL, a query language that, in my initial exploration, appears similar to SQL and is used for querying datasets. Originating at Facebook in 2012, GraphQL enables precise querying of data through an API. For contrast, when querying the OpenWeatherMap API (a traditional REST API) for weather forecast data with a script, the response can be extensive, containing a plethora of potentially irrelevant data. This highlights one of GraphQL's strengths: its ability to craft specific queries that retrieve only the desired information, avoiding unnecessary data overload.
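As a toy illustration (a hypothetical GraphQL schema, not the actual OpenWeatherMap API), a query names exactly the fields it wants and the server returns nothing more:

# Hypothetical GraphQL query against an imaginary weather schema:
# only temperature and humidity come back, not the full forecast payload
query_weather = '{ forecast(city: "Minneapolis") { temperature humidity } }'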
For my experiment with GraphQL, I've amassed approximately six months' worth of data for an air handling unit (AHU), stored in CSV format with timestamps recorded at five-minute intervals. As is common in data analysis, I'm using the Python Pandas library, and I'm also using Strawberry for GraphQL, mapping unique point names to something Brick-formatted-ish. In the dictionary below, the brick_mappings keys are the unique building point names and the values are the Brick-formatted names they map to.
import pandas as pd

# Load the data
df = pd.read_csv("data/ahu_data.csv", parse_dates=["Date"])

# Define Brick Schema mappings: raw BAS point name -> Brick class name
brick_mappings = {
    "SF Spd%": "Motor_Speed_Sensor",
    "Cooling Valve": "Chilled_Water_Valve",
    "Heating Valve": "Heating_Valve",
    "RA Damper": "Mixed_Damper",
    "TotZnFlow": "Supply_Air_Flow_Sensor",
    "Duct Static Press": "Supply_Air_Static_Pressure_Sensor",
    "SA StcPresSp": "Supply_Air_Static_Pressure_Setpoint",
    "SA Temp": "Supply_Air_Temperature_Sensor",
    "SAT Setpoint": "Supply_Air_Temperature_Setpoint",
    "MA Temp": "Mixed_Air_Temperature_Sensor",
    "Outdoor Air Temp": "Outside_Air_Temperature_Sensor",
    "RA Temp": "Return_Air_Temperature_Sensor",
}
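The resolvers shown later call a helper, get_column_from_brick, to translate a Brick class name back into the raw CSV column name. That helper isn't shown in this writeup; a minimal sketch, assuming a one-to-one mapping, could simply invert the dictionary:

# Minimal sketch (assumption): invert brick_mappings so a Brick class name
# resolves back to the raw BAS point name used as the CSV column header
brick_to_column = {brick: column for column, brick in brick_mappings.items()}

def get_column_from_brick(brick_name: str):
    # Returns the raw column name, or None if the Brick name isn't mapped
    return brick_to_column.get(brick_name)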
In the realm of building commissioning, rapid data analysis proves invaluable. Utilizing a dataset spanning six months for a building located in the upper Midwest, I observe significant outdoor temperature fluctuations. For instance, August records indicate elevated outdoor temperatures, while my primary focus lies on a recent cold snap experienced in January. This insight is swiftly gleaned through a chart generated within a Dash app.
Getting creative with GraphQL and the power of Pandas, a query over the CSV dataset can look something like the code below (apologies for the formatting), which the Dash app uses to retrieve the data it displays in the chart.
# Daily average -- lives inside the Strawberry Query class;
# get_column_from_brick and NAN_PLACEHOLDER are defined elsewhere in the app
@strawberry.field(name="averageByDay")
def averageByDay(
    self, sensorName: str, fanRunning: bool = False
) -> List[DailyAverage]:
    print(" averageByDay Query fanRunning: ", fanRunning)
    column = get_column_from_brick(sensorName)
    motor_speed_column = get_column_from_brick("Motor_Speed_Sensor")
    if column and column in df.columns:
        # Create a filtered DataFrame based on fanRunning status
        filtered_df = df[df[motor_speed_column] > 5.0] if fanRunning else df.copy()
        # Remove rows with NaN values in the specified column
        # and motor speed column if fanRunning is True
        filtered_df = filtered_df.dropna(
            subset=[column] + ([motor_speed_column] if fanRunning else [])
        )
        # Resample data daily and calculate the mean
        daily_data = (
            filtered_df.set_index("Date").resample("D")[column].mean().reset_index()
        )
        daily_data["Date"] = daily_data["Date"].dt.strftime("%Y-%m-%d")
        return [
            DailyAverage(
                day=row["Date"],
                average=(
                    row[column] if not pd.isna(row[column]) else NAN_PLACEHOLDER
                ),
            )
            for index, row in daily_data.iterrows()
        ]
    else:
        return []
This function, averageByDay, computes the daily average of a specified sensor's data. It takes parameters such as the sensor name and an optional boolean flag indicating whether the AHU fan is running. The function filters the DataFrame based on the fan's operational status and removes any NaN values. Subsequently, it resamples the data on a daily basis and calculates the mean. Finally, it returns a list of DailyAverage objects containing the date and corresponding average sensor reading for each day.
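For example, the Dash app can request daily supply air temperature averages restricted to times the fan was running with a query like this (field and argument names taken from the resolver above):

query_sat_daily = (
    '{ averageByDay(sensorName: "Supply_Air_Temperature_Sensor", fanRunning: true) { day average } }'
)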
The Dash app interface allows users to select specific data points in Brick format for plotting. Users also have the option to filter data based on the fan's operational status. Additionally, they can allocate different data points to either the left or right axis, facilitating the plotting of data with varying units. For instance, users may choose to display temperature sensor data on the left axis and fan static pressure on the right, preventing conflicting units from resulting in nonsensical charts.
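For anyone curious how the dual-axis plotting can work under the hood, here is a minimal Plotly sketch (my own illustration, not the app's exact code) that puts a temperature point on the left axis and duct static pressure on the right, using the raw column names from brick_mappings:

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Sketch: two traces with different units on separate y-axes
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Scatter(x=df["Date"], y=df["SA Temp"], name="Supply_Air_Temperature_Sensor"),
    secondary_y=False,  # left axis: temperature
)
fig.add_trace(
    go.Scatter(x=df["Date"], y=df["Duct Static Press"], name="Supply_Air_Static_Pressure_Sensor"),
    secondary_y=True,  # right axis: static pressure
)
fig.update_yaxes(title_text="Temperature", secondary_y=False)
fig.update_yaxes(title_text="Duct Static Pressure", secondary_y=True)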
Examining the AHU duct static pressure via a GraphQL query for hourly averaged data reveals a noticeable change in AHU operations for an unidentified reason. Specifically, during the cold snap, it's evident that the AHU duct static pressure reset strategy was disabled. Instances where the duct static pressure drops to zero denote periods when the AHU is off while the building is unoccupied, meaning the fan isn't generating static pressure in the duct system. Meanwhile, the sharp lines indicate that the trim and respond (T&R) programming is responding effectively.
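The hourly chart comes from a query shaped like the daily one. Assuming the hourly resolver is exposed as averageByHour (the exact field name isn't shown in this writeup), it might look like:

query_static_hourly = (
    '{ averageByHour(sensorName: "Supply_Air_Static_Pressure_Sensor") { hour average } }'
)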
Dash offers a plethora of features, including integration with the Plotly JavaScript library, enabling rapid chart creation and even letting users download plots for reports with a single click. Leveraging these features and adding a few more lines to the plots, I can pinpoint the cold snap within the data and then use Dash's zoom capabilities to focus on the solid week of cold January weather.
It's striking to note the red lines (plot below) representing the mixed air temperature of the AHU, which has the potential to trigger a low-temperature freeze protection alarm. Given this concern, it's understandable why building operators might opt to keep the equipment running continuously during the cold snap, prioritizing the prevention of callbacks or coil damage due to freezing. Additionally, certain data points show that during AHU startup there are brief instances where temperatures plummet dangerously low, raising further cause for attention.
Moreover, the behavior observed in this variable volume AHU, where the discharge air temperature appears to be heating the building to over 100 degrees Fahrenheit, is peculiar. Ordinarily, in heating mode, reheat coils are responsible for heating the building, while the AHU only tempers the air to around 65 degrees Fahrenheit. This anomaly suggests more issues worth investigating to ensure proper functionality and efficiency.
The strength of GraphQL in my use case, especially in conjunction with Pandas, lies in fetching data from a source and seamlessly handing it to Pandas for customized processing. The GraphQL query below follows a distinct format, designed to retrieve only the specific data pertinent to the use case, leaving out any extraneous information.
query_outside_air_temp_weekly = (
    '{ averageByWeek(sensorName: "Outside_Air_Temperature_Sensor") { week average } }'
)
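To close the loop, here is a minimal sketch of how a Strawberry schema can execute one of these queries in-process and hand the result straight to Pandas (assuming a Query class holding the resolvers above):

import pandas as pd
import strawberry

# Assumes Query is the strawberry.type class containing the resolvers
schema = strawberry.Schema(query=Query)
result = schema.execute_sync(query_outside_air_temp_weekly)

# result.data["averageByWeek"] is a list of {"week": ..., "average": ...} dicts
weekly_df = pd.DataFrame(result.data["averageByWeek"])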
Similar to the averageByDay code demonstrated above, there are resolvers for the MonthlyAverage, WeeklyAverage, HourlyAverage, and RawData types (defined below), and GraphQL makes these queries filterable, enabling selective data retrieval based on the operational status of the fan.
Moreover, one additional GraphQL query, dailyMotorRunTime, calculates the motor run time per day. This functionality proves invaluable in projects where energy conservation is paramount: by accurately assessing whether equipment has been overridden to run continuously, insights into energy consumption, particularly for fan motors, can be gained, facilitating informed decisions aimed at optimizing energy usage and achieving long-term savings.
@strawberry.type
class MonthlyAverage:
    month: str
    average: float

@strawberry.type
class WeeklyAverage:
    week: str
    average: float

@strawberry.type
class DailyAverage:
    day: str
    average: float

@strawberry.type
class HourlyAverage:
    hour: str
    average: float

@strawberry.type
class RawData:
    timestamp: str
    value: float

@strawberry.type
class DailyMotorRunTime:
    day: str
    run_time_hours: float
A Dash plot of motor run time in hours per day looks like the one below, which is some additional useful ammo for a report on why the energy bills may be increasing.
And the GraphQL resolver looks like the code below, which calculates the daily motor run time based on the fan VFD speed (in percent) exceeding a small threshold (0.1 in the code). It first retrieves the column corresponding to motor speed from the dataset. Then it creates a copy of the dataset and sets the date column as the index. The code determines when the motor is on by checking whether the motor speed is greater than the threshold.
Next, it computes the time difference between consecutive rows, assuming consistent time intervals. Using this information, it calculates the hours of motor runtime by multiplying the time difference by the motor's on/off status. It then aggregates the runtime data on a daily basis and sums the hours of motor operation. Finally, it formats the results into a list of DailyMotorRunTime objects containing the date and total runtime hours for each day.
@strawberry.field(name="dailyMotorRunTime")
def daily_motor_run_time(self, sensorName: str) -> List[DailyMotorRunTime]:
    # NOTE: sensorName is accepted for API consistency, but the motor
    # speed column is looked up directly
    motor_speed_column = get_column_from_brick("Motor_Speed_Sensor")
    df_copy = df.copy().set_index("Date", inplace=False)
    if motor_speed_column in df_copy.columns:
        # Determine when the motor is on (speed > 0.1)
        motor_on = df_copy[motor_speed_column] > 0.1
        # Calculate the time difference between consecutive rows,
        # assuming consistent time intervals
        delta = df_copy.index.to_series().diff().fillna(pd.Timedelta(seconds=0))
        # Calculate hours of motor runtime
        df_copy["running_hours"] = (
            delta.apply(lambda x: x.total_seconds() / 3600) * motor_on
        )
        # Group by day and sum the running hours
        daily_motor_runtime = df_copy["running_hours"].resample("D").sum()
        # Reset the index to turn the dates back into a column
        daily_motor_runtime = daily_motor_runtime.reset_index()
        # Rename columns for clarity
        daily_motor_runtime.columns = ["day", "total_runtime_hours"]
        # Convert to list of DailyMotorRunTime
        return [
            DailyMotorRunTime(
                day=row["day"].strftime("%Y-%m-%d"),
                run_time_hours=row["total_runtime_hours"],
            )
            for index, row in daily_motor_runtime.iterrows()
        ]
    else:
        return []
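A matching query for this resolver could look like the one below; note that Strawberry's default camel-casing exposes the run_time_hours field as runTimeHours:

query_motor_runtime = (
    '{ dailyMotorRunTime(sensorName: "Motor_Speed_Sensor") { day runTimeHours } }'
)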
There's considerable potential in leveraging GraphQL for data analysis and commissioning within HVAC and smart building initiatives. However, a key challenge I'm grappling with is whether my use of the Brick schema for metadata is appropriate. The industry tends to see each smart building IoT vendor adopting its own interpretation of metadata, leading to discrepancies in how it's employed.
While I'm currently applying my own interpretation (still learning...), a standardized approach to smart building metadata across the industry holds promise. Imagine a future where each building's metadata is seamlessly integrated into platforms like the BAS, eliminating the need for manual intervention and the potential for errors. It's therefore prudent to stay vigilant regarding the ASHRAE 223P efforts, both to ensure proper implementation of the standard and to spot improper implementations within the industry. Thanks for reading! This sure beats RCx data analysis in Excel!! Here's a Link to the code used in the writeup.