From an idea to Production
Providence Data Explorer

My colleague Rhiannon Williams and I have spent the last year building a product from the ground up. There had been a few attempts at this product before, but we were the first team to finish and release the vision into production. The data explorer screen above is a good example of a complex system that looks simple: it is made up of many parts, both backend and frontend, and brings together everything this application entails.

Data Structure

I believe that the first place any project should start is with the data structures and database. We had a few problems to solve here: a hierarchical Organisation structure, with permissions and access for different users at different levels; clients that could themselves be service providers, each with their own clients to manage; different user levels for access; and analyst screens that require the user to have security knowledge.

[Image: a typical solution for an organisation hierarchy - the power of SQL]

When you are building data structures that require referential integrity, a SQL database is the best choice in most circumstances. Referential integrity is very important because it allows you to set up the database so that constraints prevent the insertion of data that should never be allowed, while indexes let you increase the system's performance.

We chose PostgreSQL for its ease of use, feature set and cost. The organisation hierarchy was one of the more complex relationships in the database, and it sits at the heart of everything, with most tables linking back to Organisation in some way. We designed the client-facing product and the database for the search backend in two weeks, with a few changes once we started implementation.
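To make the self-referential Organisation idea concrete, here is a minimal sketch, with hypothetical field names (not our actual schema), of turning the flat parent-child rows such a table returns into a nested tree:

```typescript
// Hypothetical shape of a row from a self-referential Organisation table:
// each row points at its parent via parentId (null for a root).
interface OrgRow {
  id: number;
  name: string;
  parentId: number | null;
}

interface OrgNode extends OrgRow {
  children: OrgNode[];
}

// Build a nested tree from flat rows in O(n) using a lookup map.
function buildOrgTree(rows: OrgRow[]): OrgNode[] {
  const byId = new Map<number, OrgNode>();
  for (const row of rows) {
    byId.set(row.id, { ...row, children: [] });
  }
  const roots: OrgNode[] = [];
  for (const node of byId.values()) {
    if (node.parentId === null) {
      roots.push(node);
    } else {
      byId.get(node.parentId)?.children.push(node);
    }
  }
  return roots;
}

// Example: a service provider with two client organisations beneath it.
const tree = buildOrgTree([
  { id: 1, name: "Provider", parentId: null },
  { id: 2, name: "Client A", parentId: 1 },
  { id: 3, name: "Client B", parentId: 1 },
]);
```

A foreign key from parentId back to id is what gives the table its referential integrity: the database refuses a row pointing at a parent that does not exist.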

[Image: Organisational menu]

Backend - APIs / NestJS

The old backend, which never made it into production, was written in Java/Spring Boot with 11 microservices and RabbitMQ, used Cosmos/Mongo DB, and cost $5000+ a month to run. Debugging it was a nightmare.

As Java isn't my forte, we chose to rewrite in Node/TypeScript using NestJS (https://docs.nestjs.com/). NestJS has a lot of documentation and add-ons, and provides an easy framework to build an API on, particularly for those who have used AngularJS before. We chose Fastify and the Prisma ORM instead of the defaults (Express/TypeORM). We found Prisma a much easier and more straightforward choice than TypeORM - Prisma easily generates the database scripts for building and maintaining database updates.

It was surprisingly simple to connect to our auth provider, which in my experience can often be quite fiddly to get working. The ability to write custom guards and interceptors let us integrate the auth process at a granular level - we could check whether a user had access to a particular role in an organisation before running any other queries. Writing API calls was fast, the project is laid out in a way that is easy to follow, and there is built-in Swagger documentation support. These technologies allowed the first backend to come together very quickly, fleshing out the user/organisation management and the associated reference data needed to build out most of the screens.
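The granular check such a guard performs might look something like this framework-free sketch (the role names and membership model are assumptions for illustration, not our real code):

```typescript
// Hypothetical role model: a user holds roles scoped to organisations.
type Role = "admin" | "analyst" | "viewer";

interface OrgMembership {
  orgId: number;
  role: Role;
}

interface User {
  id: number;
  memberships: OrgMembership[];
}

// Roles ranked so that a higher role implies the lower ones.
const rank: Record<Role, number> = { viewer: 0, analyst: 1, admin: 2 };

// The check a guard would run before any other queries:
// does this user hold at least `required` in this organisation?
function hasOrgRole(user: User, orgId: number, required: Role): boolean {
  return user.memberships.some(
    (m) => m.orgId === orgId && rank[m.role] >= rank[required],
  );
}

const user: User = {
  id: 42,
  memberships: [{ orgId: 1, role: "analyst" }],
};
```

In NestJS this logic would live in a CanActivate guard reading the user and route parameters off the request, so every endpoint gets it for free.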

[Image: Grid for viewing data]

Message Queues (BullMQ)

The second backend, the core of our service, goes out onto the web and scans for keywords and threats - this proved a much harder problem to solve. As we would be handling large amounts of data and searching over almost every social/web service, including the dark web, it required a very different approach. On top of that, we also have queries performed by our clients, which need to run on an ad-hoc basis.

This required a queue to manage the jobs. Another reason I chose NestJS was its out-of-the-box support for microservices and for BullMQ, which uses Redis to back the queue. The database holds the information about jobs; then we have a Search Queue, which handles all our internal search, an Investigation Queue, which handles searching the data returned from our external sources, plus an instant search which goes out at any time and crawls the internet for particular results.
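As a rough illustration of the plug-in job design (this is a simplified in-memory stand-in, not the real BullMQ API), each queue maps job names to handlers that can be registered or swapped independently:

```typescript
// Simplified in-memory stand-in for a BullMQ-style queue:
// jobs carry a name and a payload, and handlers are registered per name.
type Handler = (payload: string) => string;

class SimpleQueue {
  private handlers = new Map<string, Handler>();
  private jobs: { name: string; payload: string }[] = [];

  // Plug a search type in (or out) without touching other handlers.
  register(name: string, handler: Handler): void {
    this.handlers.set(name, handler);
  }

  add(name: string, payload: string): void {
    this.jobs.push({ name, payload });
  }

  // Drain the queue, dispatching each job to its registered handler.
  drain(): string[] {
    const results = this.jobs.map(({ name, payload }) => {
      const handler = this.handlers.get(name);
      if (!handler) throw new Error(`no handler for job "${name}"`);
      return handler(payload);
    });
    this.jobs = [];
    return results;
  }
}

const queue = new SimpleQueue();
queue.register("search", (p) => `searched: ${p}`);
queue.register("investigate", (p) => `investigated: ${p}`);
queue.add("search", "keyword-a");
queue.add("investigate", "result-1");
const results = queue.drain();
```

With real BullMQ the handlers run in worker processes against Redis, but the isolation between job types is the same idea.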

So far we have run hundreds of jobs through these backends without any crashes or major issues. Some jobs have needed fixing from time to time, due to the changing nature of data collection on the internet, but every part of the queue system is written so that we can plug any search type or data source in and out easily, without breaking other parts of the application. We have also built it so that we can break it up into micro or macro services later on if we start to hit performance bottlenecks.

We have built a categorisation, location and keyword system which allows us to pull down an article from the internet, find all the keywords, match it to our categorisation system and work out locations on the fly. We can process tens of thousands of results in seconds.
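A stripped-down sketch of the keyword-to-category matching step (the category data here is invented for illustration):

```typescript
// Invented category map: each category owns a set of keywords.
const categories: Record<string, string[]> = {
  phishing: ["credential", "login page"],
  malware: ["ransomware", "trojan"],
};

// Scan an article for known keywords and return the matching
// categories together with the keywords that triggered them.
function categorise(article: string): Record<string, string[]> {
  const text = article.toLowerCase();
  const matches: Record<string, string[]> = {};
  for (const [category, keywords] of Object.entries(categories)) {
    const hits = keywords.filter((k) => text.includes(k));
    if (hits.length > 0) matches[category] = hits;
  }
  return matches;
}

const result = categorise("New ransomware spotted behind a fake login page");
```

At scale the real system does smarter matching than substring search, but the category-keyword mapping is the shape of the problem.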

Front End

Now that we have stored and categorised all our data, and have user roles and permissions sorted, we need a way to view all this juicy information.

UI: The front end is built in React/TypeScript using Material UI. Pages such as the menu use React Router and are set up like templates to build out the frame of the website. Some of the more difficult components we built were the editor (using Draft.js), the screen splitter for showing two items side by side while searching, and the menu for the hierarchical org structure.

[Image: Editor]

API Calls: We are using React Query (https://www.npmjs.com/package/react-query), wrapped in a pattern I have built before that lets me build and connect an API in a short amount of time using hooks, and which comes with many features like caching out of the box.

Forms: React Hook Form (https://react-hook-form.com/) for any forms that need to be filled out, as it provides validation and other features.

[Image: Analyst screen]

State Management: React context is used for the initial user setup, but HookState (https://hookstate.js.org/) is used for any other internal global state. Redux could have been used for this, but in my experience it is over-engineered for anything other than really large-scale solutions. HookState requires just three lines of code to make a global state, and is so easy to use that you barely know you are using it.
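For a feel of why so little code is needed, here is a dependency-free global store in the same spirit (a generic sketch, not HookState's actual API):

```typescript
// Minimal observable global state: get, set, subscribe.
function createGlobalState<T>(initial: T) {
  let value = initial;
  const listeners = new Set<(v: T) => void>();
  return {
    get: () => value,
    set(next: T) {
      value = next;
      listeners.forEach((l) => l(next));
    },
    // Returns an unsubscribe function.
    subscribe(listener: (v: T) => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Declaring a piece of global state is a one-liner.
const sidebarOpen = createGlobalState(false);
const seen: boolean[] = [];
const unsubscribe = sidebarOpen.subscribe((v) => seen.push(v));
sidebarOpen.set(true);
unsubscribe();
sidebarOpen.set(false); // no longer observed
```

HookState wires the subscribe step into a React hook so components re-render automatically when the state changes.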

Environment / CI/CD (Single-Click Deployment): If you have ever worked on React projects, you will know about the .env file. I have found this to be the worst way to handle environment variables, and it is very hard to make it work with any deployment solution. We are using Azure Pipelines, which supports JSON variables very easily, so I wrote a custom solution using env.json files. We do not need to store a single variable, database connection string, or auth detail in the project or on git. I use env.local.json for development, and we commit an env.json file containing the variable names but no values - the pipeline fills in the blanks on deployment. This allows a single-click deployment for dev, test and production, each with different environment variables.
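The fill-in-the-blanks step can be sketched like this (the file names come from the article; the merge logic and variable names are assumptions for illustration):

```typescript
// env.json ships with the variable names but empty values;
// the pipeline supplies the real values at deployment time.
type EnvTemplate = Record<string, string>;

function fillEnv(
  template: EnvTemplate,
  pipelineVars: Record<string, string>,
): EnvTemplate {
  const filled: EnvTemplate = {};
  for (const key of Object.keys(template)) {
    // Keep any non-empty default; otherwise take the pipeline value.
    filled[key] = template[key] !== "" ? template[key] : pipelineVars[key] ?? "";
  }
  return filled;
}

// What gets committed: names only, no secrets.
const template: EnvTemplate = { API_URL: "", DB_CONNECTION: "" };

// What the pipeline injects for one environment (invented values).
const prodVars = {
  API_URL: "https://api.example.com",
  DB_CONNECTION: "postgres://example",
};

const env = fillEnv(template, prodVars);
```

Each pipeline stage (dev, test, production) carries its own variable set, so the same commit deploys everywhere with one click.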

Summary

Initially, we built the first version of the front end connected to the older backend, in order to get something client-facing ASAP. We managed to produce a fully functional UI from scratch and put it in front of a client in just four months. We then replaced the old backend with the new one in five months, and all of it has been stable in production since.

Breakdown of our tech stack:

  • PostgreSQL
  • NestJS, BullMQ, Redis, TypeScript
  • React, React Query, HookState, React Hook Form (and many other libraries)
  • Deployed on Azure Web Apps using Azure Pipelines

