Commit 2d168b46 authored by Michael T Bailey's avatar Michael T Bailey

Merge branch 'tales_of_imt' into 'master'

Tales of IMT Episode 1

See merge request !64
parents d0d7dbde afba62ec
......@@ -46,7 +46,7 @@ DEPENDENCIES
json
RUBY VERSION
ruby 2.4.1p111
ruby 2.5.0p0
BUNDLED WITH
1.15.3
1.16.1
......@@ -6,6 +6,9 @@ main:
- title: "Events"
url: "/events/"
- title: "News"
url: "/news/"
- title: "Calendar"
url: "/calendar/"
......
......@@ -8,6 +8,8 @@ layout: default
</h1>
{% if page.category == 'events' %}
<h4>{{ post.event_date | date: "%A %B %-d, %Y" }}</h4>
{% elsif post.author %}
<h4>by {{ post.author }} | {{ post.date | date: "%A %B %-d, %Y" }}</h4>
{% else %}
<h4>{{ post.date | date: "%A %B %-d, %Y" }}</h4>
{% endif %}
......@@ -17,23 +19,31 @@ layout: default
{% if post.hero_image %}
<div class="col-md-6">
<p>
<img class="img-fluid" src="{{ post.hero_image | absolute_url }}" alt={{ post.hero_alt }}>
<img class="img-fluid" src="{{ post.hero_image | prepend: site.url }}" alt={{ post.hero_alt }}>
</p>
</div>
<div class="col-md-6">
<p>
{{ post.excerpt }}
{% if post.content contains "<!-- more -->" %}
{{ post.content | split:"<!-- more -->" | first }}
{% else %}
{{ post.excerpt }}
{% endif %}
</p>
<p>
<a href="{{ post.url | absolute_url }}">Read More
<a href="{{ post.url | prepend: site.url }}">Read More
<i class="fas fa-arrow-circle-right"></i>
</a>
</p>
</div>
{% else %}
{{ post.excerpt }}
{% if post.content contains "<!-- more -->" %}
{{ post.content | split:"<!-- more -->" | first }}
{% else %}
{{ post.excerpt }}
{% endif %}
<p>
<a href="{{ post.url | absolute_url }}">Read More
<a href="{{ post.url | prepend: site.url }}">Read More
<i class="fas fa-arrow-circle-right"></i>
</a>
</p>
......@@ -44,5 +54,5 @@ layout: default
<br />
{% if page.archive %}
<p><a href="{{ page.archive | aosulute_url }}"><i class="fas fa-arrow-circle-left"></i> Archive</a></p>
<p><a href="{{ page.archive | prepend: site.url }}"><i class="fas fa-arrow-circle-left"></i> Archive</a></p>
{% endif %}
\ No newline at end of file
......@@ -20,6 +20,9 @@
<div class="container">
<div class="d-flex justify-content-center flex-wrap">
<h1 style="color: white">{{ page.title }}</h1>
{% if page.author %}
<h3 style="color: white">by {{ page.author }}</h1>
{% endif %}
</div>
</div>
</div>
......
---
title: "Tales of IMT #1: Addressing Aquia and How SRCT Content Gets Served"
layout: post
categories: news
slug: imt-ep-1
date: 2018-04-29T00:00:00-0400
hero_image: "/assets/img/servers.jpg"
hero_alt: "Future Aquia Servers in a car"
author: "Michael Bailey"
---
This is going to be one part of what will presumably be part of a multi-piece transparency initiative in which I, Michael Bailey (Lead Systems Admin on the Infrastructure Management Team at SRCT), show the campus community (and SRCT) how information is served at SRCT. I'll cover how our sites are served and a few IT tricks we use including Host matching, reverse proxy, and auditing. **These pieces are going to tow a line between what is right and what we do**. I will try to disclaim what we do and what the right move would be to do, because they aren't always the same as this is effectively a volunteer position with limited resources.
<!--more-->
# Addressing Aquia
I've been hearing a lot in terms of people wondering what the situation is with Aquia. Presently we are working on having relatively semantic changes shuffled between us and Professor Bell according to ITS's requests and concerns. We are working now on getting our servers to a new intermediate location until we can move in. After all of this is done, we may need to hold for background checks. They have not started to my knowledge and really only apply to me.
# How Requests are Made to Us
Essentially, whenever a browser is provided a URL like go.gmu.edu, it asks a series of servers where it actually is on the internet via it's [IP Address](https://en.wikipedia.org/wiki/IP_address). This is a configuration known as [DNS](https://en.wikipedia.org/wiki/Domain_Name_System) and we have ours configured at [Hurricane Electric](http://www.he.net/). Our *.gmu.edu domains were handled with an agreement with [GMU Networking](https://itservices.gmu.edu/services/view-service.cfm?customel_dataPageID_4609=6141). As a reminder or if you weren't aware, pretty much everyone on the Infrastructure Management Team is in the workforce, so there's some vested trust in how we handle things.
![imt](/assets/img/imt/dns.png "dhaynes making dns requests")
# What Happens After Requests Are Made To Us
So now we have a web request we need to fulfill sent to 107.170.176.214. All we have to go on as far as fulfilling their request is the browser requesting from our IP right? **This is a key misconception.** We get a wide variety of information, including, on average, what is below. Not only can we use this information to route your request to the right site internally, we can use this to handle other things, such as if we are able to serve you an old copy without issue ([caching](https://en.wikipedia.org/wiki/Cache_(computing))).
![imt](/assets/img/imt/go_request.png "HTTP Payload ")
As an example of caching, services like Whats Open can cache data in certain cases. For instance, if you requested what was open at 3:00pm, while the interface may change as far as what is closed when you ask again at 3:05pm, the actual hours aren't changed so we are free and clear to serve you "stale" hours without requiring the server to go back and fetch the hours again, which may take processing power.
More importantly, routing the request properly. A key element our web server, nginx, is able to manage is the "Host" header. It can read this and determine what actual page you're trying to visit. It's a simple configuration in nginx called [server_name](http://nginx.org/en/docs/http/server_names.html). It defines what configurations should be taken into effect in what cases.
# Redirecting Request Internally
Despite being rich applications in most cases rather than just static HTML, we host multiple applications of multiple types on our servers.
Take for instance "albert", where following web configurations/routes exist.
```
albert bookshare gitbot icinga-web.conf
pgp pmwiki roomlist schedules weather
api default go mediawiki
piwik punycode sched-joke srctweb
whats-open-web
```
As a result, we really squeeze resources out. We have 2GB of memory on albert, leaverage swap, and 30GB of disk on Albert. Gitlab is on a much bigger dedicated box, so that's a different story.
But what does this actual delivery look like? Take for instance our Go example.
Here's ashortened version of what's in production, ignoring certain overrided paths, comments or anything "extra" we don't need to care about.
```
server {
listen 443 ssl;
include ssl.conf;
server_name go.gmu.edu;
ssl_certificate /etc/nginx/ssl/go/go-chain.crt;
ssl_certificate_key /etc/nginx/ssl/go/go.key;
location / {
proxy_pass http://localhost:9992/;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
proxy_connect_timeout 10s;
real_ip_header X-Forwarded-For;
real_ip_recursive on;
error_page 502 /502.html;
error_page 504 /502.html;
}
```
This is effectively what configures go. So what does this actually do? Let's break them down.
`listen 443 ssl` enables HTTPS on HTTPS's port (443), that pretty HTTPS lock you see in your browser. This is supported by `ssl_certificate` and `ssl_certificate_key`. There's another configuration to reroute HTTP (plaintext) requests to HTTPS (secure) for the user experience.
`server_name` is effectively what we discussed in the past where we check the Host header. [For more, see Nginx's documentation on it](http://nginx.org/en/docs/http/request_processing.html).
`proxy_pass` is the key part. It tells the browser any requests under the path (`/` being virtually every request) to port 9992 on the server. This lets us host multiple sites and services, whether they're Django apps or other apps, often in Docker containers, at the same time on a series of innocuous ports and keep them out of sight and out of mind.
Other settings ensure even in the internal service sitting on port 9992 we are still getting key info like the Host. Django cares about this, because if you visit `terroristbombs.veryunsafe.com` or the raw IP of the server (in our case wouldn't work) and it's pointed at Go, Go may not want to serve you it.
All of this proxying is why if you visit the [raw IP of any of our servers](http://107.170.176.214/) you either get nothing useful or a failure of some variety.
# Once It Reaches The Right Port
We now need to check out what service is on port 9992 to actually manage anything granular about go. We run `lsof -i -n -P|grep 9992` as root to list useful information about ports to commands and search for 9992 specifically.
We get
```
systemd 1 root 107u IPv4 30604 0t0 TCP 127.0.0.1:9992 (LISTEN)
gunicorn 1265 go 7u IPv4 30604 0t0 TCP 127.0.0.1:9992 (LISTEN)
gunicorn 23126 go 7u IPv4 30604 0t0 TCP 127.0.0.1:9992 (LISTEN)
```
First and foremost, `systemd` is a fundamental building block system for Linux but probably indicates it has a `systemctl` service set for the application. In Go's case, this is set up.
```
root@albert:/etc/nginx/sites-enabled# systemctl status go
● go.service - Roomlist Gunicorn daemon
Loaded: loaded (/etc/systemd/system/go.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-04-11 16:32:18 EDT; 2 weeks 4 days ago
Main PID: 1265 (gunicorn)
Tasks: 2
Memory: 45.8M
CPU: 9min 15.871s
CGroup: /system.slice/go.service
├─ 1265 /srv/go/venv/bin/python2 /srv/go/venv/bin/gunicorn --pid /run/go/pid --log-level=debug --timeout=20 -b 127.0.0.1:9992
└─23126 /srv/go/venv/bin/python2 /srv/go/venv/bin/gunicorn --pid /run/go/pid --log-level=debug --timeout=20 -b 127.0.0.1:9992
Apr 30 09:13:41 albert.srct.gmu.edu gunicorn[1265]: [2018-04-30 09:13:41 +0000] [23126] [DEBUG] GET /
Apr 30 09:18:41 albert.srct.gmu.edu gunicorn[1265]: [2018-04-30 09:18:41 +0000] [23126] [DEBUG] GET /
Apr 30 09:23:41 albert.srct.gmu.edu gunicorn[1265]: [2018-04-30 09:23:41 +0000] [23126] [DEBUG] GET /
Apr 30 09:28:41 albert.srct.gmu.edu gunicorn[1265]: [2018-04-30 09:28:41 +0000] [23126] [DEBUG] GET /
```
We won't go in depth as to how the service is configured, but I should disclose in production it technically says `9991` in the command lines under "CGroup" but in almost all services it'd say `9992` in the command lines and it's functionally the same.
**It's worth noting another (probably better) way to do all of this is through UNIX Sockets** or in Django's case uwsgi. For more on Sockets, see the Tweeted comic below by Julia Evans aka @b0rk.
[![imt](/assets/img/imt/sockets.png "Sockets comic")](https://twitter.com/b0rk/status/810261842291462145)
So now we have a request that goes from the browser, to our server, to an internal port, and back out to the interwebz! Could be better, could be worse, but given limited man-hours and infrastructure, this is how most SRCT services look.
<style>
img[alt="imt"] {
width: 25%;
}
@media only screen and (max-width: 600px) {
img[alt="imt"] {
width: 100%;
}
}
</style>
\ No newline at end of file
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment