Playing with the HTTP send/response cycle in Ruby, without Faraday ("HTTP Yeah You Know Me" project)

April 2017

Reading time: 29 mins

Article Table of Contents

Finding GET requests/responses in the wild
Setting up my own requests in Ruby, seeing headers in Postman
Congrats!
Iteration 3: supporting params

As part of the HTTP Server project.

First, I’m working through Practicing Ruby’s “Implementing an HTTP File Server” for general practice and understanding.

I’m going to use Postman to capture traffic and try to replicate some of the things the guides reference.

Lastly, I just found Jeff Casamir’s walkthrough of almost everything I’ve discovered in the last week. Save yourself the time, and go read it here

Finding GET requests/responses in the wild #

First, I need to configure postman to show request/response headers, like what the guides indicate:

# HTTP Request
GET /file.txt HTTP/1.1
User-Agent: ExampleBrowser/1.0
Host: example.com
Accept: */*

# HTTP Response
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 13
Connection: close

hello world

Turns out Postman cannot do this caputuring by default.

So, follow these instructions

Lets first look at some get requests in the wild:

Using Postman, I made a get request for turing.io. I get back lots of HTML, etc. But I want to see the headers. The “Headers” tab in the Postman request isn’t really helpful.

What does all this mean? Doesn't seem to correspond at all to the guides I'm working through

So, we’ll look at “network traffic” per the following:

right click inside postman > “inspect” > Network tab > “clear” the screen, if anything is in it, > switch back to postman, re-send the GET request > switch back to “inspect window”.

Looks like this:

this looks much better

OK, cool. This stuff looks promising: a hah!

Everything I highlighted corresponds to items mentioned in the instructions. This is approaching “interesting”, and I feel like there might be an insight around the corner.

Setting up my own requests in Ruby, seeing headers in Postman #

So, I can find the headers for any GET request I might make in the wild, so now I’m going to try to do the same setup with my own little ruby server, per this guide.

I’m following the author’s instructions to:

To begin, let’s build the simplest thing that could possibly work: a web server that always responds “Hello World” with HTTP 200 to any request

I’m writing everything in the http_yeah_you_know_me directory. I’ll delete it before pushing a final project version to Github.

require 'socket'

server = TCPServer.new('localhost', 2345)

Turns out TCPServer is a Ruby class, just like an enumerable or array or hash or Fixnum or NilClass, etc. Cool.

It’s got some methods that I won’t worry about right now.

Moving on:

loop do
  socket = server.accept
  request = socket.gets

  STDERR.puts request

  response = "Hello World!\n"

  socket.print "HTTP/1.1 200 OK\r\n" +
               "Content-Type: text/plain\r\n" + 
               "Content-Length: #{response.bytesize}\r\n" + 
               "Connection: close\r\n"

The loop seems interesting. I tested the simple_server.rb file before writing this, and it seemed that the socket closed as soon as a single request was made. (I.E. I’d start the “server”, and as soon as I interacted with it in any way, it would do something and close. So I wonder if this loop will keep the server open.

Please read through the commented version of what I’m writing above. It’s a good explenation, but I’ve noticed something interesting. We’re printing a HTTP Response. See the HTTP/1.1 200OK in the first line? Does that look familiar?

looks like we're writing some of this header

  socket.print "\r\n"
  socket.print response
  socket.close
end

That first line is, according to the comments, the required new-line to tell the server (or something) that the header is over and the body content is coming. The response references a previously defined variable of response = "Hello World!\n", so we’re printing hello world as the content.

Lets try to fire this thing up. Save file, run as a program from the command line, and visit http://localhost:2345/anything.

Looks like it worked:

printed to the browser

But it’s not very exciting. Lets get back into Postman and try to dig deeper:

Now we’re getting more. I can see every parameter passed through to the server, and it is staying open! (Put a flag in that. The “simple server” we made for HTTP_yeah_you_know_me can be kept open with a loop. Cool.

postman gif

And, as the author mentioned in the tutorial, this seems to be working:

  # Log the request to the console for debugging
  STDERR.puts request

STDERR printed to screen. I bet this is useful for later

And, as recommended, open up another window and run curl --verbose -XGET http://localhost:2345/anything_you_want_to_put_here

I like to understand shell commands when I run them, so if you look at the MAN pages for curl, you can find what each of those flags means.

curl --verbose means “curl talks a lot”, and more importantly, lines beginning with > is header data sent by curl, and lines beginning with < is header data received by curl. Lines beginning with * is additional info added by CURL.

I suspect this will be important.

-XGET is a flag that, according to the MAN page, “specifies custom request method”. the -X lives by itself, and whatever follows is what it makes as the request method, so -XGET, -XPOST, and -XDELETE are all probably valid uses of the flag.

Here’s the output for curl --verbose -XGET htpp://localhost:23454/whatever_blah_blah_blah

[experiments (setup)]$ curl --verbose -XGET http://localhost:2345/whatever_blah_blah_blah
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 2345 (#0)
> GET /whatever_blah_blah_blah HTTP/1.1
> Host: localhost:2345
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 13
< Connection: close
<
Hello World!
* Curl_http_done: called premature == 0
* Closing connection 0

Cool. We’ve got data sent by curl (prepended with >), data received (prepended with <), what is actually passed through (the one line without any characters at the beginning) and of course, additional data returned by CURL. (begins with *).

Phew. Learning some stuff, eh? I still don’t know what most of this means, but it’s building out a mental model. So, keep experimenting. You’re typing all these commands yourself, right? no copy-pasta from here. That’ll keep you from learning as much.

Congrats! #

This is the end of the first section of the post, building a simple server. Now, moving along, gonna build a more complex one.

This is where I look over my notes, take a walk, and get some coffee. This is a lot of work, so let it sink in and consolidate.

Serving files over HTTP #

I’m now working through the second half of this tutorial.

He says:

For each incoming request, we’ll parse the Request-URI header and translate it into a path to a file within the server’s public folder.

What’s the request URI? How’s it different from a URL? I’m not sure, but I can infer some things. Mostly, looking at some of the results of poking around in different pages in Postman, and looking at the results, I think we can figure this out. (Googling around for “URI vs URL didn’t help a lot)

We know that the first line of a HTTP Request looks like this: GET / HTTP/1.1. This is three things:

GET is the method.
/ is the “path”
HTTP/1.1 is the protocal (or possibly called the “scheme”).

So, if I go to the root of a website (like www.turing.io) the header will be: GET / HTTP/1.1.

If I go to a specific page (or resource) I’ll go to www.turing.io/our-team. What do you think the GET request would look like then? Here’s my guess:

GET /our-team HTTP/1.1

Lets fire it up in POSTMAN and see:

looks promising

Cool. So the path is just website.com/<the_path>. Or, to tie it into ruby-speak, when we require another file, we define its from the root directory. Similar idea with the URI.

Onward.

First bit of code in the second file. (I’d start a new file from scratch, and type every line…)

require 'socket'
require 'uri'

WEB_ROOT = './public'

CONTENT_TYPE_MAPPING = {
  'html' => 'text/html',
  'txt' => 'text/plain',
  'png' => 'image/png',
  'jpg' => 'image/jpeg'
}

DEFAULT_CONTENT_TYPE = 'application/octet-stream'

Makes sense, except I have no idea what an octet-stream is. Don’t really care, though. This just is mapping input values to something that our HTTP server can understand, I believe.

def content_type(path)
  ext = File.extname(path).split(".").last
  CONTENT_TYPE_MAPPING.fetch(ext, DEFAULT_CONTENT_TYPE)
end

We’ve seen File.extname before, when doing FileIO stuff. So this is taking the file extension, splitting it from the file name, and trying to match it to the hash we just wrote. If it matches, it’ll pull and do something with the matching value. I.E. jpg => image/jpeg.

Onward, a bunch of other stuff that is explained in the commented out portions of the tutorial, and I barely understand it, so skipping them.

We see this again:

(You typed it out by hand, right?)

loop do

  socket = server.accept
  request_line = socket.gets

  STDERR.puts request_line

  path = requested_file(request_line)

  if File.exist?(path) && !File.directory?(path)
    File.open(path, "rb") do |file|
      socket.print "HTTP/1.1 200 OK\r\n" +
                  "Content-Type: #{content_type(file)}\r\n" +
                  "Content-Length: #{file.size}\r\n" +
                  "Connection: close\r\n"

      socket.print "\r\n"

      IO.copy_stream(file, socket)
    end

  else
    message = "File not found\n"

    socket.print "HTTP/1.1 404 Not Found\r\n" +
                "Content-Type: text/plain\r\n" +
                "Content-Length: #{message.size}\r\n" +
                "Connection: close\r\n"

    socket.print "\r\n"

    socket.print message
  end

I was curious about all the \r\n characters. It’s obviously something to do with a line break.

On stack overflow, I found this:

The CRLF \r\n, should be used according to the php documentation. Also, to conform to the RFC 2822 spec lines must be delimited by the carriage return character, CR \r immediately followed by the line feed, LF \n.

Seems like it’s a Carriage Return Line Feed symbol, native to PHP, that inserts a really “hard” line break. That makes since, since earlier in the guide, the author mentioned that HTTP request/responses are very sensitive to line breaks and white spaces. So, I’m flagging this as “important” for later use.

And, keep following the instructions. I got it to serve up an /index.html page by default, and downloaded a few other files that I “stored” in a public directory in the same place as the server-generation script.

So, I feel like I’ve got my head a little more wrapped around server stuff in Ruby. Writing it out helps me a bit, maybe will be helpful to others, too.

Play around with building your own HTTP Response - LIVE! #

I’m keeping notes for myself and my own understanding - I’m not good at teaching that which I barely can wrap my head around, so… I apologize for the upcoming disconnectedness or holes in reasoning.

I’ve really, realy wanted to get a tool up and running that lets me talk, as a server, to something that’s expecting a given response. This is so I can get one level of abstraction lower to understand these HTTP requests/responses.

I just figured out how to do it.

We’ll use two tools, both already existing in your terminal:

nc (Stands for “Net Cat”)
CURL (stands for C-URL? “see URL”?)

I strongly recommend downloading and installing tldr, as it will let you more quickly explore terminal commands. We’ll be using “flags” for both the above utilities. (a flag makes a program do something other than it’s default. Like ls vs ls -l)

You’ll need two terminal windows side by side, like this:

side by side

On the left is netcat, on the right is CURL.

I started the nc command on the left with nc -l 9090. If you’re reading this, please, please install the tldr tool and run (in your terminal) tldr nc.

Now, look up what -l does.

On the right, I’ve got CURL, running as curl -v localhost:9090. Look it up in TLDR, tell me what -v does. (Hint, i’ve outlines it above.)

Here’s using them in tandem: gif of terminal utilities

You can write, line by line, your own valid header. You’ll notice it ends whenever you add a new-line character, and when you DO try to send actual content, if you get the “content-length” figure wrong, it’ll cut you off.

example:

gif of building a header and getting cut off

I want to talk to my ruby program now #

I feel lik I can use Netcat to impersonate a client (or server? still not sure) and talk to my ruby program (which is operating as a server…) So… lets do that.

Here’s the ruby I’m running:

require 'pry'
require 'socket'

class HTTP
  attr_reader :server

  def initialize
    @server = TCPServer.new(9090)
  end

  puts "server's up, capiTAN"

  def receive_request_send_response
    while client = server.accept
      whole_response = []
      while line = client.gets and !line.chomp.empty?
        whole_response << line
      end
      whole_response
    end

    output = "<html><head></head><body>holy what the fuck</body></html>"
    
  end
end

p server = HTTP.new
p server.server
p server.receive_request_send_response

When I run the ruby program, it just hangs. Which means it’s stuck in while client = server.accept. When I run curl -v localhost:9090 it too just sits and waits. Everyone is waiting for a response, but not even a header is sent, else curl would have shown

> header data
> sent by 
> curl

So, CURL doesn’t have anything to send to. I wonder why.

nc has an option. Checking out tldr nc the first option says:

Connect to a certain port (you can then write to this port): nc ip_address port

Good enough for me.

nc localhost 9090 doesn’t do anything in either the ruby program or NC.

Lets try submitting a well-formatted HTTP request, and see what happens

I tried typing in exactly what CURL had sent last time:

> GET / HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.52.1
> Accept: */*

and got nothing. but that’s OK, cuz my program (right now) isn’t showing me the captured HTTP request. Lets modify it to do that.

…

still trying to get this to work. Rolling back to a “known good” state w/my ruby program that captures server data.

Oh, worth noting:

if you see something that has this:

GET / HTTP/1.1

What is it? A request or a response?

How about this:

HTTP/1.1 200 OK

The first one is a request, the latter a response. I’m starting to not confuse them.

Boom, did it. First, I rolled back to the “server” we wrote as part of the tutorial I linked above.

require 'socket'

tcp_server = TCPServer.new(9292)
client = tcp_server.accept

puts "ready for a request"

request_lines = []

while line = client.gets and !line.chomp.empty?
  request_lines << line.chomp
end

puts "I (the server) just got this request:"
puts request_lines.inspect


puts "I (the server) am sending response."
response = "<pre>" + request_lines.join("\n") + "</pre>"
output = response
headers = ["http/1.1 200 ok",
          "date: #{Time.now.strftime('%a, %e %b %Y %H:%M:%S %z')}",
          "server: ruby",
          "content-type: text/html; charset=iso-8859-1",
          "content-length: #{output.length}\r\n\r\n"].join("\r\n")
client.puts headers
client.puts output

puts ["Wrote this response:", headers, output].join("\n")
client.close
puts "\nResponse complete, exiting."

I started that, and then in a different tab ran nc localhost 9292 (please note I’m mixing up some port numbers here, so don’t just copy/paste and expect any of this to work.)

it's working!

So, now the server will capture correctly-formatted HTTP requests and print them. Now lets build our own HTTP request in Netcat

Aww, yeah. Got it working. Check it out:

built my own fake HTTP request

I’m slowly (very slowly) building a mental map of what’s going on behind the scenes, but this feels like a good realization.

Back to the class project and iteration 0 #

So we now know exactly what this is doing:

while line = client.gets and !line.chomp.empty?
  request_lines << line.chomp
end

It’s taking in header data from the client making the HTTP request, and ONLY header data. As soon as it gets a blank line (which is the universal sign for “end of header data”) it stops adding new lines to the header data.

I’m restarting at Iteration 0, but this time putting everything in a class and methods. I’m not testing yet, that’ll come later.

I’ve also not touched Faraday. ¯_(ツ)_/¯

OK, got Iteration working, inside of a class:

require 'pry'
require 'socket'

class HTTP
  attr_reader :server
  attr_accessor :client_response, :counter

  def initialize
    @server = TCPServer.new(9090)
    @client_response = []
    @counter = 0
  end

  # note to myself in terminal when this gets hit.
  puts "server's up, capiTAN"

  def send_message
    while client = server.accept
      while line = client.gets and !line.chomp.empty?
        client_response << line = line.chomp
      end
        # assign a big block of text to a string with %{text}. New lines get \n auto-added
        output = %{<html><head></head><body>
          #{client_response.join("\n")}
          this server has been (re)started #{counter} times
          </body></html>\n}

        headers = [
          "HTTP/1.1 200 OK",
          "date = #{Time.now.strftime("%Y-%B-%d, %l:%M %P")}",
          "server: ruby",
          "content-type: text/html; charset=UTF-8",
          "content-length: #{output.length}",
          "\r\n"
        ]

      client.puts headers
      client.puts output
      @counter += 1
    end
  end
end

my_server = HTTP.new
my_server.send_message

I can hit it in Postman or with CURL, and everything is expected. It’s ugly, but damnit, it works.

Iteration 1 #

I’m thinking I just need to add the HTML <pre></pre> tags to pretty-format the returned text.

require 'pry'
require 'socket'

class HTTP
  attr_reader :server
  attr_accessor :client_response, :counter

  def initialize
    @server = TCPServer.new(9090)
    @client_response = []
    @counter = 0
  end

  # note to myself in terminal when this gets hit.
  puts "server's up, capiTAN"

  def send_message
    while client = server.accept
      while line = client.gets and !line.chomp.empty?
        client_response << line = line.chomp
      end
        # assign a big block of text to a string with %{text}. New lines get \n auto-added
        output = %{<html><head></head><body>
          <pre>#{client_response.join("\n")}</pre>
          this server has been (re)started #{counter} times
          </body></html>\n}

        headers = [
          "HTTP/1.1 200 OK",
          "date = #{Time.now.strftime("%Y-%B-%d, %l:%M %P")}",
          "server: ruby",
          "content-type: text/html; charset=UTF-8",
          "content-length: #{output.length}",
          "\r\n"
        ]

      client.puts headers
      client.puts output
      @counter += 1
    end
  end
end

my_server = HTTP.new
my_server.send_message

When I do this, my @client_response array gets longer… and longer… and longer. because everytime I reload the server I re-add new diagnostic info to it. So, lemme fix that.

Now, I’m having trouble getting the server to recognize the end of the header and the content.

The problem is with my new-line characters in my header response, I believe.

Editing my headers to:

  headers = ["HTTP/1.1 200 OK\r\n" +
          "date = #{Time.now.strftime("%Y-%B-%d, %l:%M %P")}\r\n" +
          "server: ruby\r\n" +
          "content-type: text/html; charset=UTF-8" +
          "content-length: #{output.length}" +
          "\r\n\r\n"
        ]

and we’re almost back in action. Here’s the CURL output:

[http_yeah_you_know_me (setup)]$ curl -v localhost:9090
* Rebuilt URL to: localhost:9090/
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 9090 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET / HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< date = 2017-February-14,  5:22 pm
< server: ruby
< content-type: text/html; charset=UTF-8content-length: 215
* no chunk, no close, no size. Assume close to signal end
<

          <html><head></head><body>
          <pre>GET / HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.52.1
Accept: */*</pre>
          this server has been (re)started 0 times
          </body></html>

It thinks the content ended before the last line. (see how content-length: 215 is directly following content-type? lets add an \r\n to the end of that line in my Header output.

updated headers to:

  headers = ["HTTP/1.1 200 OK\r\n" +
          "date = #{Time.now.strftime("%Y-%B-%d, %l:%M %P")}\r\n" +
          "server: ruby\r\n" +
          "content-type: text/html; charset=UTF-8\r\n" +
          "content-length: #{output.length}\r\n" +
          "\r\n"
        ]

and it works! Iteration 1 done.

Iteration 2: supporting paths #

well eff, my existing code is so horrible path support is a major PITA.

Part of the problem is I cannot easily calculate my output length, so building my response, I cannot tell the server (easily) how many characters to expect.

Time for some refactoring, I guess.

I’m going to try to clean up my http response block. Here’s what I’ve got:

helloworld_path = ["<html><head></head><body>" +
        "hello world, this has been reloaded #{counter} times." +
        "</body></html>"]

      @counter += 1

      default_path = ["<html><head></head><body>\r\n" +
        "<pre>#{client_response.join("\n")}</pre>\r\n" +
        "this server has been (re)started #{counter} times\r\n" +
        "</body></html>\n"
      ]

      headers = ["HTTP/1.1 200 OK\r\n" +
        "date = #{Time.now.strftime("%Y-%B-%d, %l:%M %P")}\r\n" +
        "server: ruby\r\n" +
        "content-type: text/html; charset=UTF-8\r\n" +
        "content-length: #{helloworld_path.join.length}\r\n" +
        "connection: close" +
        "\r\n\r\n"
      ]
      client.puts headers

notice I’ve hard-coded “helloworld_path”. That’s because trying to add path support revealed this little problem to me.

It’s a PITA, but… a few hours of work after ^^, I sorted it out. There’s a few pieces in play:

Pulling data from the GET request (like path)
building the FRAME if the server response. Hardcode some pieces, use variables for the stuff that changes
Build variables that hold the variable parts of the response. (seems obvious, right? That’s why I wanna smash through my desk with my head.)

I’ll add comments to the following code. This mostly handles iteration 2:

def send_message

    # client_response holds all the received data, like `GET / HTTP/1.1`. The / is of particular interest.
    client_response = []
    while client = server.accept
      while line = client.gets and !line.chomp.empty?
        client_response << line = line.chomp
      end
      # ^^ end of building the client response
      
      # pulling out the path, be it /, or /hello, or /datetime, etc.
      path = client_response[0].split[1]

      # building variables to insert into my HTTP response block later
      
      time = Time.now.strftime("%Y %B %d, %H:%M %z")
      
      # possible responses based on path
      hello_path = ["hello world, this has been reloaded #{counter} times."]
      default_path = ["<pre>#{Time.now}\nThis is my default path</pre>\r\n"]
      datetime_path = ["<pre>The time is #{time}</pre>"]
      shutdown_path = ["Total requests: #{counter}\nExiting..."]

      if path == "/"
        response = default_path
      elsif path == "/hello"
        response = hello_path
      elsif path == "/datetime"
        response = datetime_path
      elsif path == "/shutdown"
        response = shutdown_path
      end

      # building the server response. It'll always get wrapped the the following HTML. the `join` method fixes the fact that the parameters are currently wrapped in brackets. example: hello_path = ["some string here"])
      output = "<html><head></head><body>#{response.join}</body></html>"

      # this is the fixed part of the server response. Every response will be the same, except the date changes to be Time.now, and output.length MUST be accurate, or the server will either not print the full request, or it'll hang after all the data is received, because it's expecting more.
      headers = ["http/1.1 200 ok",
                "date: #{time}",
                "server: ruby",
                "content-type: text/html; charset=iso-8859-1",
                "content-length: #{output.length + 1}\r\n\r\n"].join("\r\n")

      # incrementing my counter
      @counter += 1
      
      # client.puts is the _sending_ actual response to the server. Headers is distinct from the output. 
      client.puts headers
      client.puts output
       
      # this stuff is sorta ugly random shit to reset the values so when I reload the page, it'll accept different paths. I am 100% sure this is not best practice.
      client.close

      # I couldn't get my shutdown method to sit neatly in one if statement. ¯\_(ツ)_/¯ 
      client.exit if path == "/shutdown"
      # resetting values for next connection
      response = nil
      path = nil
      client_response = []
    end


  end
end

my_server = HTTP.new
my_server.send_message

Iteration 3: supporting params #

Given a sample paramter like

http://host:port/path?param=value&param2=value2

how can we pull the parameters? That will be the first step to doing something with them. I don’t care how I store them for now.

I think I can pull everything following the ? using a REGEX match or split or something.

lets play around in pry with the string.

string = "http://host:port/path?param=value&param2=value"

# eff, so confused. 

# ok, I used this:

path = client_response[0].split[1]
word_search = path.split("=")[1]

here’s the output from pry:

path                 ... "/word_search?word=house"
response             ... nil
shutdown_path        ... nil
time                 ... nil
word_search          ... "house"

Josh Thompson about blog