
TODO:

Links to final source code (GitHub repo): Mashira Farid

Use cases/requirements of the API (focus on what has been achieved): Zifan Wei

The system design and implementation

Team organization and conclusion/appraisal of your work

  • Responsibilities of each member: Lin Thit Myat Hsu (if anyone else can help me figure out who did what, that'd be perfect)
  • How did the project go: Mashira Farid
    • Major achievements in the project
    • Issues/problems encountered
    • What kind of skills you wish you had before the workshop (this way we can try including them in other courses)
    • Would you do it any differently now?
      • I.e. tools, different technology, time management, etc.

Project Summary

Major achievements

Choropleth map

One of the main features of our final website is the interactive choropleth map of every county in each US state. It displays different types of data, such as the COVID risk level, the proportion of the population vaccinated, and the number of staffed and licensed beds, including ICU beds. The user can select which data they wish to view via a dropdown menu, which updates the map and the colour scale legend accordingly. Users can zoom in and out, hover over a county to see its name and the value of the data being shown, and click on a county to get more information. With this map, policymakers can coordinate with other counties and plan resource allocation more efficiently.

Charts and county info

Another feature of our final website is the ability to view charts of the number of COVID cases in a given county over time. When the user clicks on a county on the choropleth map, information about the county, such as the COVID community level, infection and vaccination rates, and the number of hospital beds available, is shown on the right side, along with a line chart of the number of COVID cases in that county. With this information easily available, policymakers will find it easier to make planning decisions.

Preprocessing data

Many of the features of our website, such as the choropleth map and charts, require taking data from multiple sources, combining them, and mapping them to the correct counties. Some values, such as the COVID community risk level, also need to be calculated from data drawn from several sources. To preprocess all of this data, one of our team members created a Jupyter notebook which, when run, merges data taken from CovidActNow and CovidCareMap, filters out only the data required, calculates the risk level for each county, then maps all of this data to the corresponding counties via a unique FIPS code. The result is output as a JSON file, which can then be easily used by the different components of our website.
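For context, a minimal sketch of this merge-and-derive step is shown below. It assumes pandas, placeholder file names, and illustrative column names and risk-level thresholds; the actual notebook differs in its details.

```python
import json
import pandas as pd

# Load the two source datasets (file names are placeholders).
act_now = pd.read_csv("covidactnow_counties.csv", dtype={"fips": str})
care_map = pd.read_csv("covidcaremap_beds.csv", dtype={"fips": str})

# Keep only the columns the website needs (column names are assumptions).
act_now = act_now[["fips", "county", "state", "cases_per_100k", "vaccination_ratio"]]
care_map = care_map[["fips", "staffed_beds", "licensed_beds", "icu_beds"]]

# Merge on the unique FIPS code so every metric maps to the right county.
merged = act_now.merge(care_map, on="fips", how="left")

# Derive a community risk level from the case rate (thresholds are illustrative only).
def risk_level(cases_per_100k):
    if cases_per_100k < 10:
        return "low"
    if cases_per_100k < 50:
        return "medium"
    return "high"

merged["risk_level"] = merged["cases_per_100k"].apply(risk_level)

# Write one JSON object keyed by FIPS, which the frontend components load directly.
with open("county_data.json", "w") as f:
    json.dump(merged.set_index("fips").to_dict(orient="index"), f)
```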

Problems encountered

Backend

One issue we faced was that it took us a long time to understand what exactly was required from our API. Initially, we believed that all we needed to do was get reports of diseases from the data source provided (Global Incident Map). As such, the majority of our work on the API focused solely on scraping and processing data from that source. However, after discussion with our mentor, we realised that we were actually required to get links to the articles about diseases from our data source, then analyse each article to generate a report for each disease mentioned in it. Each report would contain details such as the time period of the cases, symptoms, and the location of cases. This was much more work than we expected, and we only had about a week to fix our API to match the requirements.

Due to this lack of time, as well as our lack of experience with scraping and processing data to match the requirements, the best we were able to do before the deadline was to get the URL, headline, date of publication, and main text of each article. For the reports section of each article, we were only able to generate a single report object containing a list of all the diseases and locations mentioned in the article, along with one time period covering all the cases. This did not fully satisfy us: for example, the time period would sometimes be longer than the one actually described in the article, because only the earliest and latest dates mentioned in the article were extracted.
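To make that limitation concrete, the time period was effectively derived by taking the earliest and latest dates found anywhere in the article text, roughly as in this simplified sketch (the regex, the use of dateutil, and the function name are illustrative assumptions, not our exact code):

```python
import re
from dateutil import parser as date_parser

# Very rough pattern for dates like "March 3, 2020"; the real extraction was more involved.
DATE_PATTERN = re.compile(
    r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"
    r" \d{1,2}, \d{4}"
)

def extract_time_period(article_text):
    """Return (start, end) as the earliest and latest dates mentioned in the text.

    Because every date in the article is considered, the resulting period can be
    wider than the period the article is actually describing.
    """
    dates = []
    for match in DATE_PATTERN.findall(article_text):
        try:
            dates.append(date_parser.parse(match))
        except (ValueError, OverflowError):
            continue
    if not dates:
        return None, None
    return min(dates), max(dates)

# Example: returns the full March-April range, regardless of which dates matter most.
start, end = extract_time_period("Cases rose between March 3, 2020 and April 10, 2020.")
```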

Another issue we faced was the scraping itself. No one in our team had experience with scraping data from websites, so the members assigned to the scraper had to spend a significant amount of time learning how to do it. After they had become comfortable scraping the data source, our team found that we would also need to scrape and analyse every single article. This was complicated: each article site had a different structure, so, compared to scraping a single website like our data source, it was almost impossible to scrape and process every article reliably. In the end, we had to resort to a few external libraries to scrape and process the articles, and even then there were instances where this wasn't fully successful.
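As an example of the kind of external library we fell back on, a generic article extractor such as newspaper3k can pull the headline, publication date, and main text out of an arbitrarily structured article page. The sketch below is illustrative only; our scraper's exact libraries and error handling differed.

```python
from newspaper import Article  # newspaper3k

def scrape_article(url):
    """Fetch one article page and pull out the fields our reports need.

    The library guesses the main content block, which is what makes it work across
    differently structured sites, but the guess is not always right; that matches
    the handful of articles our scraper could not process cleanly.
    """
    article = Article(url)
    article.download()   # may raise if the site blocks or times out; handle upstream
    article.parse()
    return {
        "url": url,
        "headline": article.title,
        "date_of_publication": article.publish_date.isoformat() if article.publish_date else None,
        "main_text": article.text,
    }
```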

We also faced many difficulties trying to deploy our API. We had decided to deploy it on Heroku, as this was quick and simple and one member had experience with Heroku. However, we quickly found that it wasn't as simple as we had imagined. Because of the various tools and libraries we were using, our API exceeded the size limit of the free tier. One of our team members had to spend a few days trying to reduce its size, which was no easy task as every library we had installed was essential to the API.

Even after we finally managed to deploy our API, we still faced issues, such as the API timing out because responses took too long. The API would scrape article URLs from the data source based on the provided parameters, then scrape and process each article URL before returning the results as JSON. Our data source is a very old website that takes a long time to load, so most of the time was spent waiting for its pages, which caused the API to exceed Heroku's 30-second request timeout. We did manage to fix this in the end, but the fix meant that only part of the expected data was returned. For example, if a user requested articles mentioning the words 'outbreak' and 'hantavirus', the deployed API would only return articles mentioning 'outbreak' in order to stay within the request timeout limit.
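The workaround amounted to putting a time budget on the scraping loop and skipping whatever work did not fit. A minimal sketch of the idea follows; the function and parameter names are hypothetical, not our actual code.

```python
import time

HEROKU_TIMEOUT = 30   # Heroku closes the connection after 30 seconds
SAFETY_MARGIN = 5     # stop early so there is still time to build and return the response

def scrape_within_budget(keywords, scrape_keyword):
    """Scrape the data source keyword by keyword until the time budget runs out.

    `scrape_keyword` stands in for the slow step of loading the data source pages
    for one keyword and collecting the matching article URLs. Once the budget is
    spent, later keywords are skipped, which is why a query for 'outbreak' and
    'hantavirus' could come back with only the 'outbreak' articles.
    """
    start = time.monotonic()
    results = []
    for keyword in keywords:
        if time.monotonic() - start > HEROKU_TIMEOUT - SAFETY_MARGIN:
            break  # out of time: return whatever has been collected so far
        results.extend(scrape_keyword(keyword))
    return results
```

Returning partial results this way keeps the response inside Heroku's limit, at the cost of silently dropping the later keywords.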

Frontend

The main issue we faced with our frontend was selecting a target user. We had a very vague idea of what our website should do, but we weren't sure who our target user should be, so we focused mainly on the features of the website, putting quantity over quality. The features our initial frontend aimed to have included viewing case reports around the world as a table, bookmarking diseases and locations to receive alerts, and viewing various charts and predictions. However, because of the number of features we planned and their complexity, in the end we were unable to properly implement most of them. After our first demo, we received a lot of feedback pointing out that our target user wasn't well defined, that our frontend did not match the needs of that user, and that we had many features but none of them were well implemented. We were encouraged to define our target user and focus on providing one or two key features very well, so for the final demo we ended up rebuilding the frontend almost completely from scratch and turning it into a single-page app instead of a multi-page website. As a result, a lot of the time and energy we had spent building the various components and pages of the original frontend felt wasted.

Because of this overhaul of the frontend, we also had to spend more time learning new libraries and tools, which was hard given the limited time we had. We kept running into issues and errors with the libraries we used, and some of them, such as react-simple-maps for drawing the choropleth map and d3 for the colour scale legend, were hard to debug due to the lack of documentation and resources.

Another issue we faced concerned preprocessing the data we collected from various sources. A lot of the time our choropleth map and charts did not match the ones shown by our sources, even though they used the same data. It took a long time to debug and fix the Jupyter notebook that did the data preprocessing so that its output matched the original sources.
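Much of that debugging came down to spot-checking the notebook's output against the raw sources county by county, along the lines of the sketch below (the file paths, column names, and JSON shape follow the preprocessing sketch above and are assumptions, not our actual checks):

```python
import json
import pandas as pd

# Load the notebook output and one raw source (paths and column names are placeholders).
with open("county_data.json") as f:
    processed = json.load(f)
source = pd.read_csv("covidactnow_counties.csv", dtype={"fips": str})

# Spot-check a few well-known counties (Los Angeles, New York, Cook) against the source.
for fips in ["06037", "36061", "17031"]:
    ours = processed.get(fips, {}).get("vaccination_ratio")
    row = source.loc[source["fips"] == fips, "vaccination_ratio"]
    theirs = float(row.iloc[0]) if not row.empty else None
    if ours is None or theirs is None or abs(ours - theirs) > 1e-6:
        print(f"Mismatch for FIPS {fips}: notebook={ours}, source={theirs}")
```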

Skills we wish we had beforehand

  • How to scrape data

What we would do differently

  • Stricter deadlines
  • Add more features
  • Get the API properly working
  • More defined roles for each member, instead of randomly assigning tasks
  • Spend more time narrowing down the target user, instead of focusing on the features to offer

Key Benefits

Our first key benefit was the total overhaul of the design compared to our old week 7-8 design. While that design had some benefits, with help from our tutor we managed to refine our ideas and move from an application that did many things decently to a much better application that did one thing very well, all within only two weeks.

Another benefit is the single-page design of the website. Compared to other designs we have seen, a single-page design like ours avoids loading multiple pages, making the user experience much faster and smoother.

The map's ability to highlight particular information also lets users gather and view information more quickly and efficiently than on comparable websites, and since everything is accessible on one page, switching between the different map views is quick and easy. The fact that each area on the map can be interacted with to display more information, together with the charts, makes our website a potent source of information at any given time.

Responsibilities of Each Member

  • Mashira was our main coder for the project, doing most of the backend work, which included the API connections and parsing the data from the API into something the frontend could use (usually by converting it from JSON).

  • Lin was one of the designers of the application, setting up and creating the front-end prototype of the website, as well as cleaning up features and advising on which were worthwhile and which were possibly unneeded. He also did the clean-up and QA of the program and the report.

  • Avijit(?) worked on the API the most and created a new dataset and algorithm that allowed us to display all the information on the map.
