Short Tutorial on How to Setup a Smart DAQ to upload LIVE Experimental Data to a DOI Data Repository

The other day I found a new way of doing scientific research, despite research data repositories having existed for quite some time now. And it was not by chance: after one year of upgrading my hardware electronics design skills, I now have more than 20 PCB design projects available on GitHub, ready to help me do real LIVE & publishable science.

(read the May 2023 edit at the bottom of this article)

_________________________________________________________________________

This article is being used to write a MethodsX paper suitable for publication in a peer-reviewed journal. Updates to this article will continue as I write the paper in parallel. Anyone who finds this article interesting can find an early draft revision of the paper at Elsevier's SSRN preprint server. Here's the link.

_________________________________________________________________________

The novelty, to me, is the possibility of programming a piece of firmware into any of my custom-designed, prototyped, and built smart data acquisition devices, named #LDAD (Live Data Acquisition Device), so that all experimental data collected on any of my experimental setups is automatically uploaded to a scientific data repository, for instance Harvard University's #Dataverse.

Moreover, adding to this huge advantage in designing an experiment, it is possible to upload #LIVE data to such data repositories. This means I can finally collaborate and cooperate scientifically with anyone working on the same subject, in any country on the planet. I'm no longer bound by limitations of proximity, language, or secrecy. The advantages do not end here: this way of collecting experimental data gives a researcher freedom and time by minimizing the need for schedule synchronization among collaborating researchers, in particular useless remote meetings over Microsoft #Teams or #Zoom, or other, now redundant, socializing tasks.

List of data repositories with an API

One mandatory requirement is for the reader to be knowledgeable about how to work in open science environments, in particular how to use Open Data, Open Source, and Open Hardware electronics. There's a short introductory video on YouTube about "Open Data & Data Sharing Platforms" in science by Shady Attia, a professor at the Université de Liège, which I recommend watching before reading further. Another requirement is for the reader to be experienced in correcting errors, for instance by level of severity, on his/her LIVE public scientific research: an open, AGILE way of doing science, with constant updates and feedback made publicly and in real time. These are mandatory requirements for anyone looking to do science in a live, real-time manner and to take advantage of all that live open environments have to offer, in regards to safety & health in science, intellectual property, and remote collaboration and cooperation among scientific researchers.

For this short tutorial, I'll be using Harvard University's #Dataverse #API with a simple hack. Anyone can find the API documentation here. The piece of the API of interest is this one, the replacement of a dataset file on a repository:

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24

curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@myDataset.csv' -F 'jsonData={json}' $SERVER_URL/api/files/$ID/replace
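
The jsonData form field can also carry metadata for the replacement file. A minimal example, assuming the optional fields shown in the Dataverse native API guide (description, categories, and forceReplace, which allows a replacement whose content type differs from the original file):

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
     -F 'file=@myDataset.csv' \
     -F 'jsonData={"description":"LIVE sensor data","categories":["Data"],"forceReplace":true}' \
     "$SERVER_URL/api/files/$ID/replace"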
        

My 12/24-bit LDAD DAQ devices use Tensilica LX7 dual-core MCUs from Espressif and are compatible with the Arduino IDE. Below is the piece of code that sends locally collected sensor data, as a file, to a Dataverse repository using their API:


#include <WiFi.h>
#include <SPI.h>
#include <SD.h>

// WiFi network (change with your settings!)
#define WLAN_SSID "mywirelessnetwork"
#define WLAN_PASS "mypassword"

WiFiClient client;

// Dataverse settings (change with your settings!)
const String API_TOKEN   = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
const String SERVER_HOST = "demo.dataverse.org"; // host name only, used for DNS and the Host header
const String ID          = "24";                 // database ID of the file to replace

const int port = 80; // server port; note that production Dataverse servers require
                     // HTTPS (WiFiClientSecure on port 443)
IPAddress serverIP;  // resolved server IP address

void setup() {
  Serial.begin(115200);
  Serial.println(F("\nInitializing..."));

  // The next two lines seem to be needed to connect to WiFi after wake-up
  WiFi.disconnect();
  WiFi.mode(WIFI_OFF);
  yield();

  // Connect to WiFi network
  WiFi.mode(WIFI_STA);
  WiFi.begin(WLAN_SSID, WLAN_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  // Resolve the server IP address
  getServerIP();
}

void sendDatasetFile() {
  const char *datasetFile = "/myDataset.csv";
  String jsonMetaData = "{}";

  String lineEnd    = "\r\n";
  String twoHyphens = "--";
  String boundary   = "*****" + String(millis()) + "*****";

  // Initialise SD card (adjust the chip-select pin to your board)
  if (!SD.begin(4)) {
    Serial.println("initialization failed!");
    return;
  }
  Serial.println("initialization of SD done.");

  File myFile = SD.open(datasetFile);
  if (!myFile) {
    Serial.println("File not found");
    return;
  }

  // Build the multipart body around the file so Content-Length can be computed
  String startRequest =
      twoHyphens + boundary + lineEnd +
      "Content-Disposition: form-data; name=\"jsonData\"" + lineEnd +
      lineEnd +
      jsonMetaData + lineEnd +
      twoHyphens + boundary + lineEnd +
      "Content-Disposition: form-data; name=\"file\"; filename=\"myDataset.csv\"" + lineEnd +
      "Content-Type: text/csv" + lineEnd +
      lineEnd;
  String endRequest = lineEnd + twoHyphens + boundary + twoHyphens + lineEnd;

  uint32_t fileLen = myFile.size();
  uint32_t len = fileLen + startRequest.length() + endRequest.length();

  // Set up TCP connection with the web server
  if (client.connect(serverIP, port)) {
    Serial.println("Start uploading...");

    // Request line and headers
    client.println("POST /api/files/" + ID + "/replace HTTP/1.1");
    client.println("Host: " + SERVER_HOST);
    client.println("X-Dataverse-key: " + API_TOKEN);
    client.println("Content-Type: multipart/form-data; boundary=" + boundary);
    client.print("Content-Length: ");
    client.println(len);
    client.println();

    // Multipart body: jsonData part first, then the file part
    client.print(startRequest);

    // Stream the file in 32-byte chunks
    byte clientBuf[32];
    int clientCount = 0;
    while (myFile.available()) {
      clientBuf[clientCount++] = myFile.read();
      if (clientCount == 32) {
        client.write(clientBuf, 32);
        clientCount = 0;
      }
    }
    if (clientCount > 0) {
      client.write(clientBuf, clientCount);
    }

    // Closing boundary
    client.print(endRequest);
  } else {
    Serial.println("Web server connection failed");
  }

  myFile.close();
  client.stop();
  Serial.println("done...");
}

void getServerIP() {
  while (serverIP == IPAddress(0, 0, 0, 0)) {
    if (!WiFi.hostByName(SERVER_HOST.c_str(), serverIP)) {
      Serial.println(F("Couldn't resolve!"));
    }
    delay(500);
  }
}

void loop() {
  delay(10000);
}

The piece of code above only handles sending collected data, stored in a local file, to a Dataverse repository. To store collected data locally on the smart DAQ device, the data first needs to be saved on the local SPIFFS file system. The reader can find a good tutorial on Tutorials Point, with code examples on how to use the SPIFFS library. Below is the part of the code that focuses on saving the actual sensor data.


#include "FS.h"
#include "SPIFFS.h"

void listDir(fs::FS &fs, const char *dirname, uint8_t levels) {
  Serial.printf("Listing directory: %s\r\n", dirname);

  File root = fs.open(dirname);
  if (!root) {
    Serial.println("- failed to open directory");
    return;
  }
  if (!root.isDirectory()) {
    Serial.println("- not a directory");
    return;
  }

  File file = root.openNextFile();
  while (file) {
    if (file.isDirectory()) {
      Serial.print("  DIR : ");
      Serial.println(file.name());
      if (levels) {
        listDir(fs, file.name(), levels - 1);
      }
    } else {
      Serial.print("  FILE: ");
      Serial.print(file.name());
      Serial.print("\tSIZE: ");
      Serial.println(file.size());
    }
    file = root.openNextFile();
  }
}

void readFile(fs::FS &fs, const char *path) {
  Serial.printf("Reading file: %s\r\n", path);

  File file = fs.open(path);
  if (!file || file.isDirectory()) {
    Serial.println("- failed to open file for reading");
    return;
  }

  Serial.println("- read from file:");
  while (file.available()) {
    Serial.write(file.read());
  }
  file.close();
}

void writeFile(fs::FS &fs, const char *path, const char *message) {
  Serial.printf("Writing file: %s\r\n", path);

  File file = fs.open(path, FILE_WRITE);
  if (!file) {
    Serial.println("- failed to open file for writing");
    return;
  }
  if (file.print(message)) {
    Serial.println("- file written");
  } else {
    Serial.println("- write failed");
  }
  file.close();
}

void appendFile(fs::FS &fs, const char *path, const char *message) {
  Serial.printf("Appending to file: %s\r\n", path);

  File file = fs.open(path, FILE_APPEND);
  if (!file) {
    Serial.println("- failed to open file for appending");
    return;
  }
  if (file.print(message)) {
    Serial.println("- message appended");
  } else {
    Serial.println("- append failed");
  }
  file.close();
}

void renameFile(fs::FS &fs, const char *path1, const char *path2) {
  Serial.printf("Renaming file %s to %s\r\n", path1, path2);
  if (fs.rename(path1, path2)) {
    Serial.println("- file renamed");
  } else {
    Serial.println("- rename failed");
  }
}

void deleteFile(fs::FS &fs, const char *path) {
  Serial.printf("Deleting file: %s\r\n", path);
  if (fs.remove(path)) {
    Serial.println("- file deleted");
  } else {
    Serial.println("- delete failed");
  }
}
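
To tie the helpers together: below is a minimal sketch of how a sampling routine might log one reading per period into a local CSV file, using the writeFile() and appendFile() helpers above. The readSensor() stub and its ADC pin are hypothetical placeholders; replace them with your own sensor driver.

#include "FS.h"
#include "SPIFFS.h"

const char *LOG_PATH = "/myDataset.csv";

// Hypothetical sensor read: a bare ADC conversion on pin 34 (placeholder only)
float readSensor() {
  return analogRead(34) * (3.3f / 4095.0f);
}

// Call SPIFFS.begin(true) once in setup() before logging
void logSample() {
  // Write the CSV header once, when the file does not exist yet
  if (!SPIFFS.exists(LOG_PATH)) {
    writeFile(SPIFFS, LOG_PATH, "timestamp_ms,value\n");
  }
  // Append one "timestamp,value" row per sample
  String row = String(millis()) + "," + String(readSensor(), 4) + "\n";
  appendFile(SPIFFS, LOG_PATH, row.c_str());
}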

As I stated at the beginning of this short tutorial, this is kind of a #hack of Dataverse's API, and the hack consists of recurrently uploading a replacement dataset file to the repository. To this date, the API lacks the functionality to upload raw sensor data and have the Dataverse system append it to an existing file hosted in a repository; the code above does that appending locally on the smart DAQ, saving collected sensor data to a local file. This means the smart DAQ needs enough storage space for all experimental data collected during the experimental campaign. Another important consideration is how often the local file stored on the smart DAQ is uploaded to a Dataverse repository. Too often, within a short period of time, and access to the Dataverse server may be temporarily lost. Too seldom, and other researchers accessing the data remotely are kept waiting, unable to work on the experimental data in a LIVE manner. Ask the host of the data repository for the exact number of file uploads allowed.
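
One way to respect such a limit in firmware is a simple upload interval, replacing the empty loop() from the first listing. A minimal sketch, assuming the sendDatasetFile() and logSample() functions from above; the 5-minute interval is only a placeholder, use whatever figure your repository host confirms:

// Sample often, upload seldom: one dataset-file replacement per interval
const unsigned long UPLOAD_INTERVAL_MS = 5UL * 60UL * 1000UL; // placeholder value
unsigned long lastUpload = 0;

void loop() {
  logSample(); // frequent local sampling into the CSV file
  if (millis() - lastUpload >= UPLOAD_INTERVAL_MS) {
    sendDatasetFile(); // recurrent file replacement on the Dataverse repository
    lastUpload = millis();
  }
  delay(1000); // 1 s sample period
}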

One final remark to add: the best API for this kind of scientific research needs to be capable of LIVE data storage, accepting LIVE raw sensor data. For instance, accept:

  • an inline string with sensor data, to be appended to an existing CSV file on a repository (see the hypothetical sketch after this list);
  • sensor data plus information on where to add it in an Excel document file, for instance in a specific column/row;
  • sensor data plus information on the table and table fields where it should be added in an SQLite database file.
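
To make the first bullet concrete: none of these calls exist in the Dataverse API today. Purely as an illustration, a hypothetical append endpoint could look like the sketch below; the /append path and the row field are invented here and are not part of any real API:

# HYPOTHETICAL sketch - not a real Dataverse endpoint
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
     -F 'row=1679140500,23.42' \
     "$SERVER_URL/api/files/$ID/append"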

In 2022, it makes more sense to store experimental data in individual files rather than on a common centralized database server. The advantages of individual files, Excel or SQLite, over a common database stem from the centralized nature of the latter. In a centralized form, it is more difficult and more complex to share experimental data contents, and additional software and setup are required for anyone wanting to access the data offline. Another advantage over a centralized DB server is that individual files can be stored on a censorship-resistant web3 network, for instance the InterPlanetary File System (IPFS). Accessing data is as easy as... well, opening MS Excel. For big datasets, above Excel's maximum number of allowed data rows, SQLite database files are well-suited and fully compatible with Python or any other programming language, with more than enough open-source libraries available on GitHub.

____________________________

One final note: the firmware code used to do LIVE experimental data upload to a Dataverse is now publicly available on GitHub. Here's the link.


This is the very first draft version of a future document. As part of my research on online live writing, no proofreading was made to this text, only basic automated corrections. Over time the reader can expect additional changes to the text without prior notification: bug and error corrections, and also content additions. So make sure to check back later on this article.


May 2023 EDIT:

This document has grown and now has its own open repository where everything is happening, titled "Validation of Experimental Data Origins: A Swarm of DAQ Devices able to Deliver Unique Experimental Data using Blockchain-like Fingerprint ID to a Data Repository". The smart data acquisition device is now ready to use at any scientific research laboratory as an early beta test data acquisition device, and it looks and feels awesome.
