Full Stack Walkthrough
Original post on the subject stack here.
Summary
A high-level development walkthrough of a toy example of a modern full web application stack. It is intended for those who want an idea of the basic core knowledge and skills a "Full Stack Engineer" would have, without being bogged down in the extensive knowledge required for a real world project. Some of the additional knowledge a real world project would require is mentioned along the way.
The complete source code of the example project https://randalmoore.me is available at https://github.com/RandyMoore/mySiteDjango
Requirements
Before starting out on any project it is important to clearly define what the goal is. In this case the goal is a personal website serving as a space to learn, experiment with, and showcase web technologies. The site serves as an example for those wishing to learn the technology, so the project should be as simple and understandable as possible. The site also serves as a professional portfolio, so it should be visually appealing to a non-technical audience.
Architecture
Requirements drive architecture. The user roles consist of the content creator and the content viewer. The content is static and not sensitive, which simplifies several aspects of the architecture: we don't need to be concerned with growing storage demands or heavy security measures since there is nothing sensitive to protect. Another requirement is deployment flexibility: the site should be compatible with the major cloud service providers for low cost, high performance hosting. Someone wishing to learn the technology should also be able to easily deploy the site locally so they can experiment with it. A two tier architecture (web server and database layers) encapsulated in deployable containers fits the bill for this use case.
Languages
Choice of programming languages impacts the set of existing software you can choose to build your project from. Language and existing software choices (framework, libraries) are somewhat of a chicken-and-egg problem; start with whichever you feel strongest about. Web applications are broadly split into two parts: the front end and the back end. The front end part of the project is run by the viewer's web browser. The back end is run by the computing resources provided by the cloud service provider.
There is less of a language choice for the front end, which is driven by the need for compatibility across a variety of web browsers. For a web page HTML is required for the structured layout of content; CSS and Javascript are optional, for separating out style and adding behavior respectively. There are many other front end languages you might encounter (e.g. CoffeeScript, Sass) but these generally compile down to some older version of Javascript or CSS before being served to the viewer's browser.
For the back end, anything goes. The client's web browser will contact the server using HTTP. The web browser has no knowledge of what is happening on the other side of this HTTP interface and so you are free to choose whatever language you wish. But you do need to comply with the HTTP specification, which is rather large. Implementing the behavior required by the HTTP specification using only primitive language features would be an impractical task. Enter frameworks.
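To get a sense of why, here is a minimal sketch (not part of the project) of a back end built from nothing but Python's socket primitives. It speaks just enough HTTP to return one hard-coded page and ignores nearly everything else in the specification (methods, headers, status codes, encodings, persistent connections, and so on):

    # A toy HTTP server using only primitive language features.
    # It returns one hard-coded page and ignores almost all of the HTTP spec.
    import socket

    def serve_forever(host="127.0.0.1", port=8000):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            server.bind((host, port))
            server.listen()
            while True:
                connection, _ = server.accept()
                with connection:
                    connection.recv(4096)  # read the request, then ignore it
                    body = b"<html><body>Hello, world</body></html>"
                    headers = (
                        b"HTTP/1.1 200 OK\r\n"
                        b"Content-Type: text/html\r\n"
                        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n"
                    )
                    connection.sendall(headers + body)

    if __name__ == "__main__":
        serve_forever()  # then visit http://127.0.0.1:8000 in a browser

Handling real requests this way (routing, forms, cookies, content negotiation, error codes, security) quickly becomes impractical, which is exactly the gap frameworks fill.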
Frameworks
HTTP is well defined in a specification, and this has allowed others to write reusable software to take care of the HTTP details. Frameworks are language specific; they provide a bare-bones system and allow you to add your own code for custom behavior. You may already be familiar with the concept of a library. A framework is similar in that it is reusable code, but it differs in that it drives control instead of responding to commands as a library would. In this case the framework code will be the first to handle the incoming HTTP request from the viewer's web browser, and it will then call your code to form a reply. Django was chosen as the web framework for this project. One way to visualize framework vs library: a library is code your program calls, while a framework calls your program.
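As a concrete illustration of that inversion of control in Django (a hypothetical view, not taken from this project's source; exact import paths vary between Django versions), the framework receives and parses the request, then looks up and calls a function you wrote:

    # views.py - our code; the framework decides when it runs.
    from django.http import HttpResponse

    def hello(request):
        # Django has already parsed the raw HTTP request into `request`;
        # we only describe the reply, and the framework serializes it back to HTTP.
        return HttpResponse("Hello from my code, called by the framework")

    # urls.py - tells the framework which of our functions handles which URL path.
    from django.urls import path

    urlpatterns = [
        path("hello/", hello),
    ]

Contrast this with a library such as Python's json module, which does nothing until your code explicitly calls it.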
Libraries
The next step in the project is deciding what content should be added. A blog is a standard way for professional software developers to market their brand; the blog may be used to write about each project as it is added to the site. A naive approach is to write a single HTML file for each blog post. But writing HTML code can be cumbersome and having each post as a stand-alone HTML file would be a hassle in the future if you want to change the style of the entire site; each file would have to change.
Writing HTML by hand can be a good fit for simple data content that isn't meant to be pretty. Creating something that has appealing style requires design skill. Fortunately libraries exist that encapsulate design. A quick search yields a Bootstrap clean-blog design. This takes care of the design aspect of the blog.
People have also written reusable software for the content aspect of things like blogs. This kind of software is known as a Content Management System (CMS). Generally a CMS is used as a means to separate content creation from the technical details of a site. Content specialists (e.g. journalists) may add content without becoming mired in technical details. CMS systems usually include WYSIWYG (What You See Is What You Get) editors, one way to avoid writing raw HTML. CMS systems are generally dependent on the web framework since there is a lot of glue code between the CMS and the framework that has to handle the various facets of the content (HTML generation, image resources, URL paths managed by the CMS, ...). Some searching yielded Wagtail as a CMS for use with Django. Perhaps a bit overkill for this site (having a team size of 1), but I was curious about CMS technology and wanted to experiment with a full fledged example of it.
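To give a flavor of that glue, a Wagtail blog post type might look roughly like the sketch below (a hypothetical model, not copied from the project; module paths differ between Wagtail versions). The CMS stores instances of this model in the database, exposes the fields in its WYSIWYG admin, and maps each instance to a URL:

    # models.py - a hedged sketch of a Wagtail page type (Wagtail 2.x style imports).
    from django.db import models
    from wagtail.core.models import Page
    from wagtail.core.fields import RichTextField
    from wagtail.admin.edit_handlers import FieldPanel

    class BlogPost(Page):
        date = models.DateField("Post date")
        body = RichTextField(blank=True)  # edited in the WYSIWYG editor, rendered as HTML

        # Which fields appear in the CMS editing interface.
        content_panels = Page.content_panels + [
            FieldPanel("date"),
            FieldPanel("body"),
        ]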
Development
For a team size of one, enough has been decided at this point to begin working with code. Generally a development environment is built based on the choice of language, since each language has its own paradigm of what a development environment is. For the Python back end we use pip to fetch and install project requirements (existing reusable software components). A common concern with development environments is isolation between multiple projects on the same machine. We use VirtualEnv to keep the libraries and frameworks of each project from interfering with each other and with the global Python environment on our development machine.
Thought should be given to project structure. The chosen framework will often dictate this since it needs to know where to find your extensions, but your project will often have files that exist outside of the framework. For example, in this project the declarations of which Python frameworks and libraries to install are kept in a file named requirements.txt that can be fed to pip. This and other configuration files that affect the deployed code environment (the Docker image discussed later) are kept in a folder named config at the highest level of the project. The Django framework structure begins as a sibling at the same level.
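For illustration, a requirements.txt for a stack like this might contain entries along these lines (illustrative only, not the project's actual file, which would pin exact versions):

    # requirements.txt (illustrative, not the project's actual file)
    Django     # the web framework
    wagtail    # the CMS, built on top of Django
    psycopg2   # database driver, assuming the database container runs PostgreSQL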
A modern trend with languages and frameworks is to provide a REPL (Read Evaluate Print Loop). Languages (e.g. Python) often have a shell that evaluates commands this way, providing an interactive environment that encourages experimentation and exploration. Django and many other frameworks mimic a REPL with hot reloading. In Django, this is provided by the built-in development server command 'python manage.py runserver'. Any time you make a change to your code the changes are quickly and automatically picked up by the server and reflected the next time an HTTP request is processed. This feedback loop allows you to incrementally add and see small changes to the site, speeding development.
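For example, the plain Python shell already behaves this way, and 'python manage.py shell' opens the same kind of session with the project's code loaded (the model query shown is hypothetical, reusing the BlogPost sketch from earlier):

    >>> 2 + 2          # type an expression, see the result immediately
    4
    >>> # In 'python manage.py shell' the project's models are importable too, e.g.:
    >>> # from blog.models import BlogPost
    >>> # BlogPost.objects.live()    # would return the published blog posts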
To edit code many people find an Integrated Development Environment (IDE) helpful. The IDE adds smarts and tools to your basic text editor: it is aware of the code in your project and can automatically recognize relationships between pieces of your code as well as library or framework code. The IDE allows you to navigate through execution paths and quickly understand how the pieces work together. I'm a fan of JetBrains' IDEA family of products; their free Community Edition (CE) of PyCharm is great for Python centric projects. The 'python manage.py runserver' command may be run in a debugging session within the IDE, making it trivial to debug and explore the inner workings of the server code.
PyCharm CE doesn't support front end languages well, so I've been using (and learning) the Atom editor for front end work. As mentioned before, front end programming languages generally get compiled down to older versions of HTML, CSS and Javascript to maximize browser compatibility. The step of converting your modern code into these older language versions and packaging it for the browser is called "bundling". For the front end development environment this project uses Node.js as the engine to execute the Javascript tooling. Similar to pip for Python, Javascript has npm (Node Package Manager). Yarn may be used interchangeably with npm if npm gives you grief.
In this project the Javascript dependencies are recorded at the top level in a package.json file. Grunt is a Javascript task runner that glues together the steps of the bundling process; the tasks are declared in the project level Gruntfile.js. The default task, invoked with the 'grunt' command, may be left running while editing code and serves a REPL-like role: when any front end related file changes it re-runs the bundling process, and the changes will be reflected by the development server after a browser page refresh. For debugging, the developer tools built into modern browsers are used.
Project Management
After writing some code and getting something basic to work, you will worry that making further changes will break what you've already accomplished. Enter version control systems. Such systems allow you to track changes, moving between versions of your code at will. Version control becomes more of a concern as team size increases, but even with a team size of one it is worth using. For example, you can use branching to work on a major new feature while retaining the flexibility to quickly tweak the version of code that is currently deployed and being viewed by the public. This project uses git. GitHub (a company providing git-based hosting) is used as a publicly accessible repository since it is well suited to open source projects.
Project management consists of far more than version control and becomes increasingly important as team size grows. At a high level a team decides on a development methodology and refines a specific software development process to meet the project's needs. The choice of methodology and process can influence what technologies to use and so is usually decided early in the project's life cycle. The art of harnessing the work of multiple individual developers is its own subject and best learned from experience working with a team.
Deployment
Once you have something you'd like to show to the public the next step is to serve it to the world. Technically you could serve from your development machine but keeping your development machine running 100% of the time for the site would likely be a nuisance. You could also serve from a different machine that you own but this comes with a great deal of complexity and will probably expose your personal network to a high level of security risk. Additionally your home network is likely on the fringe of the world wide web which makes it slow to respond.
Fortunately companies offering cloud services exist which allow you to host arbitrary software and take care of the messy details of hosting. The idea is that the hosting company figures out how to manage all of the hosting details for many users and then provides hosting as a service, adding value with economy of scale. A popular example is Amazon Web Services (AWS), but many vendors exist. Even with the hosting details taken care of there is still the issue of how to install your software on the cloud.
Accessing running software on the cloud isn't as convenient or seamless as your local development environment, and changing anything about the environment almost always guarantees bugs. Wouldn't it be awesome if you could draw a box around the code running on your development machine and simply drop it into the cloud? Then moving to the cloud environment would introduce minimal changes (seen from the perspective of your code) and result in fewer bugs. Abstraction comes to the rescue: why not abstract the machine, or more precisely the operating system? The technology that does this has existed for some time and is known as virtualization. Virtual machines came about early on but are cumbersome due to the large size that comes from replicating an entire OS. A much lighter weight alternative is Docker, which this project uses.
The Docker paradigm is to have a single app with all its dependencies running in a single container and to tie multiple containers together to form a system. This is an example of the Single Responsibility Principle, which gives rise to many desirable design traits, including ease of reuse. In this project there are two Docker containers: one for the web server and another that hosts the database. The database container is reused from a publicly available repository; the project only needs to declare it as a dependency and populate it with data, and it works out of the box.
At the top level of the project is the Dockerfile that is used to build the image for the web server. Running an image with Docker (the program installed on a machine) creates a container. The docker-compose.yml ties together the local web server and database images to form the complete stack; running the command 'docker-compose up' will create both web server and database containers from the images and make them visible to each other in their own network.
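One concrete place this container networking shows up in code: the Django settings can point at the database container by its docker-compose service name rather than a hard-coded address. The sketch below is an assumption about how such settings might look (the service name "db", the variable names, and the use of PostgreSQL are illustrative, not taken from the project):

    # settings.py (sketch) - locating the database container from the web container.
    import os

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",   # assumes a PostgreSQL container
            "NAME": os.environ.get("DB_NAME", "mysite"),
            "USER": os.environ.get("DB_USER", "postgres"),
            "PASSWORD": os.environ.get("DB_PASSWORD", ""),
            "HOST": os.environ.get("DB_HOST", "db"),      # the service name from docker-compose.yml
            "PORT": os.environ.get("DB_PORT", "5432"),
        }
    }

Because the values come from environment variables, the same image can run unchanged both locally and in the cloud; only the environment supplied by each docker-compose file differs.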
Once the local Docker images are working they are pushed to the cloud, in this case an AWS repository. AWS requires the repository identifier to be reflected in the image name, hence the separate docker-compose-AWS.yml file (there are also some settings to make the best use of the micro, but free, computing resources). At a high level deploying to AWS involves creating a spot to host the containers (an EC2 instance), authenticating the local Docker client with AWS, and running 'docker-compose up' against docker-compose-AWS.yml. The Docker service running on the EC2 instance fetches the images from the AWS repository and brings them up to form the complete system.
Testing
A minimal amount of testing was required for this project. Testing needs increase with how mission critical a project is and how many people are working on it, among other considerations. Even though this was a toy project with one developer it does have some unit tests, serving primarily as regression tests. React and Flux were new to me, so unit tests were created once I had things working so that I could refactor the code and quickly detect when a change resulted in a change of output. Of note in these tests is the use of Jest, which allows creation of unit tests without manually writing assertion code; its built-in support for snapshots (automatically verifying rendered output) is perfect for regression testing.
The development process for this project was most like software prototyping. Emphasis was placed on getting up and running as quickly as possible, with quick turnaround enabling experimentation and progress along the learning curve. The REPL-like nature of the environment provided the bulk of the testing for this project. For systems that allow interaction, especially when users are allowed to change data on the server, more testing would be necessary.
Not in this Example
Being a toy project there are many parts not present here that would be encountered in a real world project. Minimal attention is given to security since the content is static and not sensitive, although HTTP encryption via SSL/TLS is becoming an expected feature. Once you have any user interaction, authentication becomes a requirement. Scalability isn't an issue here, but if a web site provides a non-trivial service to many users a distributed system architecture would likely be necessary.
For development workflow, a team would have a system for deciding, recording, organizing, and tracking implementation progress against requirements. Automated testing with a deployment pipeline would exist. Error handling for both end users and developers (e.g. throwing meaningful exceptions) would exist. Logging is typical, as is emitting, collecting, and analyzing performance metrics from the system for both engineering and business purposes. Each of these areas requires deep knowledge and experience to implement well; there is often a specialized role created for each. Participating in existing projects (especially open source) is the best way to learn about these subjects.