Space Mission Launch Analysis (SMLA) API
For this project, a team of three engineers (myself included) built an API using a Flask web server, deployed with Kubernetes, that provides users with information about space launches from the 1950s to today.
MOTIVATION
Within the past two decades, the growth of the private space industry in the United States has exceeded expectations. From its role in communications, GPS, and satellite TV to its importance to national defense, it is clear that such an important pillar of the United States economy should be better understood by engineering students. Compounding this, there are a number of space agencies apart from NASA that students are not familiar with. Thus, there is a need for tools to better understand and communicate the activity and economy of the entities involved in the space industry, both public and private.
This project aims to give students methods of better understanding the space industry. To do so, we have created a codebase with several functions capable of returning key pieces of information about space mission launches from as far back as the 1950s. By exploring this software, students can get a better grasp of modern space launch activity, historical trends, and the current volume of the space economy.
HIGH LEVEL SUMMARY
This API, named the Space Mission Launch Analysis (SMLA) API, is a tool capable of organizing and visualizing information on space launches dating as far back as the space race between the US and the Soviet Union. The dataset covers launches from the entire globe and includes launches as recent as August 2020. The software makes use of three key technologies, each described in more detail below: Flask, Docker, and Kubernetes. Using these tools, the API can be deployed to a Kubernetes cluster, from a docker-compose file, or locally on a user’s computer.
Once deployed, the user can interact with the software to explore specific pieces of information on space mission launches, as well as access two methods of visualizing the dataset we have chosen.
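As a rough sketch, the three deployment modes might look like the following (these commands assume the repository’s Kubernetes manifests and docker-compose file sit in the current directory, and that ‘flask_api.py’ starts the development server when run directly):

kubectl apply -f .          # deploy the Kubernetes manifest files in the current directory
docker-compose up           # start the containers described in the docker-compose file
python3 flask_api.py        # run the Flask server locally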
Bar graph depicting the total spending on space launches per country in the dataset
One such method of visualization is presented above. The API sums the total spending on launches per country, where spending information has been provided, and graphs the result as a bar graph. Notably, the USA appears to spend far more than every other country, but this may be because the dataset lacks cost information for those countries, or because those countries do not make the information publicly available in the first place.
DATA DESCRIPTION
The dataset contains information on space mission rocket launches from around the world and is sourced from Kaggle.com. The records go as far back as the 1950s and continue to the present day. To use this dataset, the user must download ‘mission_launches.csv’ from Kaggle.com and drop it into the repository’s directory; this file contains all of the information the API accesses. Due to issues with hosting this data ourselves, and the way the Kaggle.com website works, the user must download the file themselves in order to operate the repository.
The information contained includes organizations, launch locations, launch dates, and so on, which allows us to superimpose rocket launches on a map of the world and to identify who is responsible for which launches. Additionally, the approximate price of a launch is sometimes included in an entry; the API uses this information to return the total cost incurred by each organization. Note, however, that many entries are missing this information, as it is often deemed confidential.
KEY TECHNOLOGIES
Flask: The Flask library is imported and primarily utilized by ‘flask_api.py’ within the repository. This Python file is responsible for the ‘front-end’ functionality; in other words, end-users interface with the functions in this file directly. Through use of the Flask library, we host the code on a Flask web server, where the user is able to access ‘endpoints’ via the ‘curl’ command (e.g., ‘curl localhost:5000/jobs’). Each endpoint represents a different function; in the previous example the user accessed the ‘jobs’ endpoint, so the list of all previous jobs is returned to the user’s terminal.
Docker: Docker allows the source code (i.e., ‘flask_api.py’, ‘worker.py’, and ‘jobs.py’) to be containerized. Essentially, Docker packages these files, and their dependencies, into an object called an ‘image’: a blueprint of everything necessary to run each Python file. These images are then uploaded to Docker Hub, a website much like GitHub that hosts images. This lets other machines pull ready-to-run copies of the software, which greatly speeds up development and setup.
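As a sketch of this workflow (the image name ‘username/smla-api’ and its tag are placeholders, not the project’s actual Docker Hub repository):

docker build -t username/smla-api:1.0 .     # package the code and its dependencies into an image
docker push username/smla-api:1.0           # upload the image to Docker Hub
docker pull username/smla-api:1.0           # retrieve the image on another machine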
Kubernetes: Kubernetes takes advantage of the aforementioned technologies to quickly create running instances of these Docker containers. These instances live in ‘pods’, isolated units that use the Docker image as a blueprint to run the code on individual systems. By using several Kubernetes configuration files, we can scale this software deployment up or down, restart services automatically, abstract away parts of the software, and maintain modularity.
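For example (the manifest and deployment names below are illustrative only, not necessarily those used in the repository):

kubectl apply -f worker-deployment.yml                # create or update the worker pods from a manifest
kubectl scale deployment smla-worker --replicas=3     # scale the analysis workers up or down
kubectl get pods                                      # check that every pod is running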
By making use of these key technologies, the software deployment becomes robust, modular, and scalable. Pictured above is a drawn representation of the Kubernetes deployment that hosts several important pieces of the codebase: the front-end Flask API, the Redis databases, the analysis/worker pods, and the job queue. Aside from the object titled ‘user interaction’, which represents the pods hosting the ‘flask_api.py’ code, the majority of these systems are abstracted away from the user.
Additional notes:
Database 0 contains the dataset described in the section above.
Database 1 stores the job queue.
Database 2 stores the processed analysis.
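A quick way to sanity-check these databases, assuming the Redis service can be exposed to the local machine (the service name ‘redis-service’ below is a placeholder):

kubectl port-forward service/redis-service 6379:6379     # forward the Redis port to localhost (run in a separate terminal)
redis-cli -n 0 DBSIZE                                     # count the keys holding the launch dataset
redis-cli -n 2 KEYS '*'                                   # list the stored analysis results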
ENDPOINT LIST
Here is a list of the endpoint routes contained in the ‘flask_api.py’ file. These are the commands accessed by the user on the front-end of this codebase. Note that although only three base endpoints are listed (‘/data’, ‘/jobs’, and ‘/help’), there are multiple ways to use each endpoint; example ‘curl’ invocations are collected after the list.
‘/data’: This endpoint is related to the mission launch data stored in memory. It directly accesses the information on the space launches contained within the Redis databases, and thus affects the data accessed by every file in the codebase. This endpoint can be used with the ‘GET’ command to return the information currently stored in the database, which is useful if the user needs to view what is currently in memory or to check whether anything is stored at all. The user can also use the ‘DELETE’ command on this endpoint to clear the stored data. If the user does not find any data in the database, they can use the ‘POST’ command to load the database from the .csv file placed in the repository.
‘/jobs’: This endpoint is related to the queuing of ‘jobs’, i.e., tasks that analyze or visualize the data (e.g., creating the map, returning a list of all organizations). Using the ‘GET’ command on this endpoint returns a list of Python dictionary objects containing information on all previous jobs the user has created.
‘/jobs/<string:JOB_ID>’: If the user adds another ‘/’ and types the ID of a previous job after the slash, then the dictionary object for that specific job is returned. Accessed via the ‘GET’ command.
‘/jobs/<string:ROUTE>’: If the user instead adds a string containing the name of an analysis function, then a job is created and its result is stored in Redis. This is the main way the user will interact with the application.
‘/jobs/clear’: If the user uses the ‘DELETE’ command on this endpoint, the current list of jobs will be cleared.
‘/help’: This endpoint is accessed via the ‘GET’ command by curling the ‘/help’ route on the Flask server’s address. A text message containing usage information for all of the endpoints is then returned to the user.
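For reference, the routes above might be invoked as follows. The <JOB_ID> and <ROUTE> values are placeholders taken from the route definitions, and the HTTP method for creating a job is left to curl’s default here, since ‘flask_api.py’ defines which methods each route accepts:

curl localhost:5000/data                      # GET: return the launch data currently stored in Redis
curl -X POST localhost:5000/data              # POST: load mission_launches.csv into Redis
curl -X DELETE localhost:5000/data            # DELETE: clear the stored launch data
curl localhost:5000/jobs                      # GET: list all previously created jobs
curl localhost:5000/jobs/<JOB_ID>             # GET: return the dictionary for one specific job
curl localhost:5000/jobs/<ROUTE>              # create a job for the named analysis function
curl -X DELETE localhost:5000/jobs/clear      # DELETE: clear the list of jobs
curl localhost:5000/help                      # GET: return usage information for every endpoint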
USAGE - POST A JOB TO THE DATABASE:
1. First, make sure the Kubernetes cluster is set up and Docker is running.
2. Make sure to post the data to the database before querying; otherwise an error message will be returned instead of actual results.
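For example, assuming the Flask server is reachable at localhost on port 5000:
curl -X POST localhost:5000/data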
3. Next, add a job to the queue using this query:
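curl localhost:5000/jobs/<ROUTE>
(Replace <ROUTE> with the name of the desired analysis function; see the ‘/help’ endpoint for usage details.)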
4. Finally, retrieve the list of jobs (which can also be done through a browser):
localhost:5000/jobs