Exclude files from Dockerfile

How to exclude files from being added to a Docker image

TL;DR: use a .dockerignore file, Docker’s equivalent of git’s .gitignore


When building Docker images, keeping the image small is usually a goal. When the Dockerfile lives in a git repository, it is easy to unintentionally add every file in the directory, including the .git history, to the image.

It is common to have something like “ADD . /app” in a Dockerfile, which copies the whole build context into the image. There are two ways to avoid this:

  • Explicitly add only the files you need in the Dockerfile
  • Use a .dockerignore file
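
For the first option, here is a minimal sketch of a Dockerfile that copies only named files instead of the whole directory. The base image python:3-slim, the /app layout and the assumption that web.py is the entry point are illustrative guesses based on the directory listing in this post, not the actual Dockerfile used there. It is written out with a shell here-document so it can be inspected before use.

```shell
#!/usr/bin/env bash
# Hypothetical example: write a Dockerfile that copies only the files the app needs.
dockerfile="${TMPDIR:-/tmp}/Dockerfile.example"

cat > "$dockerfile" <<'EOF'
# Assumed base image; pick whatever your application needs
FROM python:3-slim
WORKDIR /app
# Copy two named files rather than the whole build context ("ADD . /app")
COPY requirements.txt web.py /app/
RUN pip install --no-cache-dir -r requirements.txt
# Assumes web.py is the entry point
CMD ["python", "web.py"]
EOF

echo "wrote $dockerfile"
```

Copy the result into the project directory as Dockerfile to build from it. With explicit COPY lines, new files in the repository stay out of the image until they are listed.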

A typical .dockerignore file in a git repo directory might look like this:

daniel@hidmo:/tmp/myapp$ ls -al
total 56
drwxr-xr-x  4 daniel daniel 4096 Dec 17 17:51 .
drwxrwxrwt 20 root   root   4096 Dec 17 17:51 ..
-rwxr-xr-x  1 daniel daniel  338 Dec 17 17:41 build-image.sh
drwxr-xr-x  2 daniel daniel 4096 Dec 17 17:41 .cache
-rw-r--r--  1 daniel daniel  245 Dec 17 17:41 Dockerfile
-rw-r--r--  1 daniel daniel  102 Dec 17 17:51 .dockerignore
drwxr-xr-x  8 daniel daniel 4096 Dec 17 17:52 .git
-rw-r--r--  1 daniel daniel    6 Dec 17 17:51 .gitignore
-rw-r--r--  1 daniel daniel  133 Dec 17 17:41 README
-rw-r--r--  1 daniel daniel  181 Dec 17 17:41 requirements.txt
-rw-r--r--  1 daniel daniel 7871 Dec 17 17:41 web.py
-rw-r--r--  1 daniel daniel 7871 Dec 17 17:50 web.pyc
daniel@hidmo:/tmp/myapp$ cat .dockerignore 
# Exclude files from being added to docker image
.git
.gitignore
.cache
*.pyc
Dockerfile
README
readme
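
Patterns in .dockerignore are globs, so a bare .pyc matches only a file literally named .pyc, while *.pyc matches web.pyc. The helper below is a rough sketch for checking top-level names against a pattern file using bash's own glob matching. is_ignored is a hypothetical name, and this only approximates Docker's matcher, which also handles nested paths, ** and ! exceptions.

```shell
#!/usr/bin/env bash
# is_ignored NAME IGNORE_FILE
# Returns 0 if NAME matches any pattern in IGNORE_FILE (approximation only).
is_ignored() {
    local name=$1 ignore_file=$2 pattern
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        # Skip blank lines and comment lines
        case $pattern in
            ''|'#'*) continue ;;
        esac
        # An unquoted variable in a case pattern is matched as a glob
        case $name in
            $pattern) return 0 ;;
        esac
    done < "$ignore_file"
    return 1
}
```

For example, is_ignored web.pyc .dockerignore && echo excluded prints "excluded" only when some pattern in the file matches web.pyc.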

References

https://docs.docker.com/engine/reference/builder/#dockerignore-file

Linux – query a specific name server with nslookup or dig

By default, nslookup on Linux uses the name servers configured in /etc/resolv.conf. To query a specific DNS server instead, add its IP address or name at the end of the nslookup command.

Below is an example that queries the Cloudflare name server 1.1.1.1:

daniel@linux:/$ nslookup -type=MX gmail.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
gmail.com	mail exchanger = 40 alt4.gmail-smtp-in.l.google.com.
gmail.com	mail exchanger = 5 gmail-smtp-in.l.google.com.
gmail.com	mail exchanger = 10 alt1.gmail-smtp-in.l.google.com.
gmail.com	mail exchanger = 20 alt2.gmail-smtp-in.l.google.com.
gmail.com	mail exchanger = 30 alt3.gmail-smtp-in.l.google.com.

For DNS-related debugging, though, dig (in the “dnsutils” package on Debian and Ubuntu) is more feature rich. When troubleshooting, “dig +trace” is handy for spotting failure points. Here is a useful link on how to use dig to troubleshoot DNS issues:
https://linuxfreelancer.com/troubleshooting-dns-dig-tracing

daniel@linux:/$ dig @1.1.1.1 gmail.com mx +short
20 alt2.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.
5 gmail-smtp-in.l.google.com.
10 alt1.gmail-smtp-in.l.google.com.
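
The nslookup and dig outputs above list the same records in a different order. Sorting by preference makes answers from different servers easier to compare. The following is a small sketch of that idea; mx_from_resolvers and the DIG_CMD override are hypothetical names introduced here, and 8.8.8.8 in the usage line is just an assumed second resolver.

```shell
#!/usr/bin/env bash
# mx_from_resolvers DOMAIN RESOLVER...
# Prints the MX answer from each resolver, sorted by preference, for comparison.
# DIG_CMD is a hypothetical override so the lookup command can be swapped out.
mx_from_resolvers() {
    local domain=$1 ns
    shift
    for ns in "$@"; do
        printf '== %s ==\n' "$ns"
        "${DIG_CMD:-dig}" "@$ns" "$domain" mx +short | sort -n
    done
}
```

Usage: mx_from_resolvers gmail.com 1.1.1.1 8.8.8.8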

References –

https://linux.die.net/man/1/nslookup

https://linux.die.net/man/1/dig

https://www.techradar.com/news/best-dns-server

Linux – Cheat sheet

Use curl to get help on Linux commands, programming languages and more from cheat.sh, a cheat sheet service that covers a wide range of tools and languages.

If you are looking for a Linux and programming cheat sheet, please check
https://github.com/chubin/cheat.sh

It provides nicely colored help pages with plenty of examples, right in the CLI. Here are some sample runs I did.

Curl cheat sheet


daniel@hidmo:/tmp$ curl cheat.sh/curl
# Download a single file
curl http://path.to.the/file

# Download a file and specify a new filename
curl http://example.com/file.zip -o new_file.zip

# Download multiple files
curl -O URLOfFirstFile -O URLOfSecondFile

# Download all sequentially numbered files (1-24)
curl http://example.com/pic[1-24].jpg

# Download a file and follow redirects
curl -L http://example.com/file

# Download a file and pass HTTP Authentication
curl -u username:password URL 

# Download a file with a Proxy
curl -x proxysever.server.com:PORT http://addressiwantto.access

# Download a file from FTP
curl -u username:password -O ftp://example.com/pub/file.zip

# Get an FTP directory listing
curl ftp://username:password@example.com

# Resume a previously failed download
curl -C - -o partial_file.zip http://example.com/file.zip

# Fetch only the HTTP headers from a response
curl -I http://example.com

# Fetch your external IP and network info as JSON
curl http://ifconfig.me/all/json

# Limit the rate of a download
curl --limit-rate 1000B -O http://path.to.the/file

# POST to a form
curl -F "name=user" -F "password=test" http://example.com

# POST JSON Data
curl -H "Content-Type: application/json" -X POST -d '{"user":"bob","pass":"123"}' http://example.com

# POST data from the standard in / share data on sprunge.us
curl -F 'sprunge=<-' sprunge.us

Python lists cheat sheet

daniel@hidmo:/tmp$ curl cheat.sh/python/list
#  python - Why does += behave unexpectedly on lists?
#  
#  The general answer is that += tries to call the __iadd__ special
#  method, and if that isn't available it tries to use __add__ instead.
#  So the issue is with the difference between these special methods.
#  
#  The __iadd__ special method is for an in-place addition, that is it
#  mutates the object that it acts on. The __add__ special method returns
#  a new object and is also used for the standard + operator.
#  
#  So when the += operator is used on an object which has an __iadd__
#  defined the object is modified in place. Otherwise it will instead try
#  to use the plain __add__ and return a new object.
#  
#  That is why for mutable types like lists += changes the object's
#  value, whereas for immutable types like tuples, strings and integers a
#  new object is returned instead (a += b becomes equivalent to a = a +
#  b).
#  
#  For types that support both __iadd__ and __add__ you therefore have to
#  be careful which one you use. a += b will call __iadd__ and mutate a,
#  whereas a = a + b will create a new object and assign it to a. They
#  are not the same operation!

>>> a1 = a2 = [1, 2]
>>> b1 = b2 = [1, 2]
>>> a1 += [3]          # Uses __iadd__, modifies a1 in-place
>>> b1 = b1 + [3]      # Uses __add__, creates new list, assigns it to b1
>>> a2
[1, 2, 3]              # a1 and a2 are still the same list
>>> b2
[1, 2]                 # whereas only b1 was changed

#  For immutable types (where you don't have an __iadd__) a += b and a =
#  a + b are equivalent. This is what lets you use += on immutable types,
#  which might seem a strange design decision until you consider that
#  otherwise you couldn't use += on immutable types like numbers!
#  
#  [Scott Griffiths] [so/q/2347265] [cc by-sa 3.0]

Golang concurrency cheat sheet

daniel@hidmo:/tmp$ curl cheat.sh/go/concurrency
/*
 * go - When should I use concurrency in golang?
 * 
 * Not an expert in Go (yet) but I'd say:
 * 
 * Whenever it is easiest to do so.
 * 
 * The beauty of the concurrency model in Go is that it is not
 * fundamentally a multi-core architecture with checks and balances where
 * things usually break - it is a multi-threaded paradigm that not only
 * fits well into a multi-core architecture, it also fits well into a
 * distributed system architecture.
 * 
 * You do not have to make special arrangements for multiple goroutines
 * to work together harmoniously - they just do!
 * 
 * Here's an example of a naturally concurrent algorithm - I want to
 * merge multiple channels into one. Once all of the input channels are
 * exhausted I want to close the output channel.
 * 
 * It is just simpler to use concurrency - in fact it doesn't even look
 * like concurrency - it looks almost procedural.
 */

/*
  Multiplex a number of channels into one.
*/
func Mux(channels []chan big.Int) chan big.Int {
    // Count down as each channel closes. When hits zero - close ch.
    var wg sync.WaitGroup
    wg.Add(len(channels))
    // The channel to output to.
    ch := make(chan big.Int, len(channels))

    // Make one go per channel.
    for _, c := range channels {
        go func(c <-chan big.Int) {
            // Pump it.
            for x := range c {
                ch <- x
            }
            // It closed.
            wg.Done()
        }(c)
    }
    // Close the channel when the pumping is finished.
    go func() {
        // Wait for everyone to be done.
        wg.Wait()
        // Close.
        close(ch)
    }()
    return ch
}

/*
 * The only concession I have to make to concurrency here is to use a
 * sync.WaitGroup as a counter for concurrent counting.
 * 
 * Note that this is not purely my own work - I had a great deal of help
 * with this here (https://stackoverflow.com/q/19192377/823393).
 * 
 * [OldCurmudgeon] [so/q/19747950] [cc by-sa 3.0]
 */

Please check
https://github.com/chubin/cheat.sh for more information on installation and using its comprehensive features.
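
The sample runs above all follow the same URL pattern, so a tiny wrapper can save some typing. This is only a sketch: cheat_url and cheat are hypothetical function names, and it is not a replacement for the client described in the project README.

```shell
#!/usr/bin/env bash
# cheat_url TOPIC [SUBTOPIC...]: build the cheat.sh URL for a query,
# e.g. "python list" becomes https://cheat.sh/python/list
cheat_url() {
    local IFS=/
    printf 'https://cheat.sh/%s\n' "$*"
}

# cheat TOPIC [SUBTOPIC...]: fetch the sheet and page it with colors intact
cheat() {
    curl -s "$(cheat_url "$@")" | less -R
}
```

With these in ~/.bashrc, cheat curl and cheat go concurrency fetch the same pages as the curl commands shown earlier.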

curl – use variables to show response times and other parameters


curl is a tool for transferring data to and from a server. Although it supports various protocols, it is most commonly used with HTTP/S. It is sort of a browser for CLI folks and a go-to tool when writing scripts that interact with servers.


Besides transferring data, curl can also report request and response details. This is done with the -w (--write-out) option and its variables; the complete list of variables is in the curl man page linked in the references.

Example – use “time_total” to show the total time, in seconds, that the full operation lasted.

$ curl -s -o /dev/null -w '%{time_total}\n' https://www.gcplinux.com
1.149143
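
A single measurement can vary from run to run, so it can help to take a few samples and average them. The sketch below assumes nothing beyond curl and awk; average and sample_total_time are hypothetical helper names, and the default of 5 requests is an arbitrary choice.

```shell
#!/usr/bin/env bash
# average: read one number per line on stdin, print the mean with 6 decimals
average() {
    awk '{ sum += $1 } END { if (NR > 0) printf "%.6f\n", sum / NR }'
}

# sample_total_time URL [COUNT]: print time_total for COUNT requests (default 5)
sample_total_time() {
    local url=$1 count=${2:-5} i
    for ((i = 0; i < count; i++)); do
        curl -s -o /dev/null -w '%{time_total}\n' "$url"
    done
}
```

Usage: sample_total_time https://www.gcplinux.com 5 | average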

For better formatting, it is best to put the variables in a file and have curl reference that file. Here I have added several HTTP request and response variables I am interested in, such as num_connects, size_download, size_header, time_namelookup, time_pretransfer etc.


daniel@hidmo:/tmp$ cat ccurl.txt 
      url_effective:  %{url_effective}\n
       content_type:  %{content_type}\n
          http_code:  %{http_code}\n
       http_version:  %{http_version}\n
       num_connects:  %{num_connects}\n
      num_redirects:  %{num_redirects}\n
          remote_ip:  %{remote_ip}\n
      size_download:  %{size_download}\n
        size_header:  %{size_header}\n
    time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
    time_appconnect:  %{time_appconnect}\n
   time_pretransfer:  %{time_pretransfer}\n
      time_redirect:  %{time_redirect}\n
 time_starttransfer:  %{time_starttransfer}\n
                    ----------\n
         time_total:  %{time_total}\n


daniel@hidmo:/tmp$ curl -H 'Cache-Control: no-cache' -L -w "@ccurl.txt" -o /dev/null -s https://www.gcplinux.com
      url_effective:  https://gcplinux.com/
       content_type:  text/html; charset=UTF-8
          http_code:  200
       http_version:  1.1
       num_connects:  2
      num_redirects:  1
          remote_ip:  162.247.79.246
      size_download:  71273
        size_header:  537
    time_namelookup:  0.008585
       time_connect:  0.082511
    time_appconnect:  0.264110
   time_pretransfer:  0.264293
      time_redirect:  1.287257
 time_starttransfer:  3.077526
                    ----------
         time_total:  3.177939

Of the time-related variables, the ones listed below are those you will most likely use:

  • time_appconnect: The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0)
  • time_connect: The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.
  • time_namelookup: The time, in seconds, it took from the start until the name resolving was completed.
  • time_pretransfer: The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.
  • time_redirect: The time, in seconds, it took for all redirection steps including name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3)
  • time_starttransfer: The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
  • time_total: The total time, in seconds, that the full operation lasted.
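
Because each of these timers is measured from the start of the operation, the duration of an individual phase is the difference between two consecutive timers. The sketch below does that subtraction with awk. phase_times and its output labels are hypothetical names, and the result is only meaningful for a single HTTPS transfer without redirects (for plain HTTP, time_appconnect is 0 and the TLS line will be negative).

```shell
#!/usr/bin/env bash
# phase_times NAMELOOKUP CONNECT APPCONNECT PRETRANSFER STARTTRANSFER TOTAL
# Turns curl's cumulative timers into per-phase durations.
phase_times() {
    awk -v dns="$1" -v tcp="$2" -v tls="$3" -v pre="$4" -v ttfb="$5" -v total="$6" 'BEGIN {
        printf "dns_lookup:    %.6f\n", dns          # start -> name resolved
        printf "tcp_connect:   %.6f\n", tcp - dns    # name resolved -> TCP connected
        printf "tls_handshake: %.6f\n", tls - tcp    # TCP connected -> TLS done
        printf "server_wait:   %.6f\n", ttfb - pre   # request ready -> first byte
        printf "transfer:      %.6f\n", total - ttfb # first byte -> finished
    }'
}
```

The six arguments are the values of time_namelookup, time_connect, time_appconnect, time_pretransfer, time_starttransfer and time_total, in that order, as printed by curl -w.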

References –


https://curl.haxx.se/docs/manpage.html

https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl

Linux how to zip a folder

How to zip or compress a folder or directory in Linux

On Linux and similar operating systems, the zip utility is used to package and compress (archive) files.

Let us get straight to it. We have a folder to compress with the zip tool:


daniel@hidmo:/tmp/tutorial$ tree .
.
└── zip-tutorial
    ├── chapter-1
    │   └── content
    ├── chapter-2
    │   └── readme
    └── zip.txt

daniel@hidmo:/tmp/tutorial$ zip -r tutorial.zip zip-tutorial/
  adding: zip-tutorial/ (stored 0%)
  adding: zip-tutorial/zip.txt (deflated 55%)
  adding: zip-tutorial/chapter-2/ (stored 0%)
  adding: zip-tutorial/chapter-2/readme (deflated 55%)
  adding: zip-tutorial/chapter-1/ (stored 0%)
  adding: zip-tutorial/chapter-1/content (deflated 57%)

Basically, we use “zip -r DESTINATION-FILE.ZIP FOLDER-TO-COMPRESS” to compress a directory. The .zip extension can be skipped, as in “zip -r DESTINATION-FILE DIRECTORY-TO-COMPRESS”, because zip appends it automatically.


daniel@hidmo:/tmp/tutorial$ zip -r tutorial zip-tutorial/
updating: zip-tutorial/ (stored 0%)
  adding: zip-tutorial/zip.txt (deflated 55%)
  adding: zip-tutorial/chapter-2/ (stored 0%)
  adding: zip-tutorial/chapter-2/readme (deflated 55%)
  adding: zip-tutorial/chapter-1/ (stored 0%)
  adding: zip-tutorial/chapter-1/content (deflated 57%)


To view the contents of the compressed folder without uncompressing it –

daniel@hidmo:/tmp/tutorial$ unzip -l tutorial.zip 
Archive:  tutorial.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2019-10-07 21:45   zip-tutorial/
     1202  2019-10-07 21:45   zip-tutorial/zip.txt
        0  2019-10-07 21:45   zip-tutorial/chapter-2/
     1202  2019-10-07 21:45   zip-tutorial/chapter-2/readme
        0  2019-10-07 21:44   zip-tutorial/chapter-1/
      722  2019-10-07 21:44   zip-tutorial/chapter-1/content
---------                     -------
     3126                     6 files
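
The two steps above, zipping a directory and then listing the archive, can be wrapped in one helper. This is a sketch only: zip_dir is a hypothetical name, and the default archive name DIRECTORY.zip is an assumption made here for convenience.

```shell
#!/usr/bin/env bash
# zip_dir DIRECTORY [ARCHIVE]: zip DIRECTORY recursively, then list the archive.
# ARCHIVE defaults to DIRECTORY.zip. Hypothetical helper name.
zip_dir() {
    local src=${1%/}
    local dest=${2:-$src.zip}
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # -r recurses into subdirectories, -q suppresses the per-file "adding:" lines
    zip -r -q "$dest" "$src" && unzip -l "$dest"
}
```

Usage: zip_dir zip-tutorial/ tutorial.zip. If the archive already exists, zip updates it in place, as the “updating:” line in the second run above shows.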

References –

https://linux.die.net/man/1/zip

https://superuser.com/questions/216617/view-list-of-files-in-zip-archive-on-linux

Error when running tree command

The tree command is a popular utility that lists the contents of a directory in a tree format and lets users limit the display depth of the directory tree. After installing the tree package on Ubuntu and running the tree command, I got the error below:

$ tree .
sed: read error on .: Is a directory

The error does not look like it comes from the freshly installed tree package. After some digging, I figured out that “tree” in this case was an alias. I use the Bash-it framework for its collection of bash commands and scripts, and Bash-it has its own set of aliases, including one for tree:

$ type tree
tree is aliased to `find . -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g''

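With the alias in place, “tree .” expands to the find/sed pipeline with “.” appended, so sed tries to read the directory “.” as an input file, hence the error. To run the actual tree command, I had to prefix it with “command” or “\” as below: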

$ command tree .
.
├── chapter-one
└── readme

1 directory, 1 file

$ \tree .
.
├── chapter-one
└── readme

1 directory, 1 file
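
When a command behaves unexpectedly like this, it helps to see both what bash will actually run for a name and which binary sits behind it on PATH. The following is a minimal sketch using the bash builtin type; explain_cmd is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# explain_cmd NAME: show what bash runs for NAME and which binary, if any,
# sits behind it on PATH. Hypothetical helper name.
explain_cmd() {
    local name=$1 kind path
    # type -t prints alias, function, builtin, keyword or file
    kind=$(type -t "$name" || true)
    # type -P searches PATH only, ignoring aliases and functions
    path=$(type -P "$name" || true)
    printf '%s: type=%s, binary on PATH: %s\n' "$name" "${kind:-not found}" "${path:-none}"
}
```

In an interactive shell with the Bash-it alias loaded, explain_cmd tree should report the alias together with the path of the real tree binary, which is the one that “command tree” and “\tree” run.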

References –

http://manpages.ubuntu.com/manpages/trusty/man1/tree.1.html

https://github.com/Bash-it/bash-it