Saturday, April 16, 2016

Gaia Embedded - How it works.


Hi everyone, I know you've been having these strange urges. You have these new feelings and you're not sure what to do about them. Everyone goes through this at one point. It's part of growing up. Meet me at camera three.

Ok, so we're here to talk about Gaia Embedded of course. Gaia Embedded is the OS that runs on the SMB Check Point firewalls. It's a combo of a u-boot image, busybox, lua, sqlite3 databases and then all the normal stuff you would expect on a firewall: your fw commands, environment variables and whatnot.

Another major difference from regular Gaia is that Gaia Embedded doesn't currently run on any x86/x64 CPU. As of right now it only runs on an ARM or MIPS CPU (that I know of). That means you can't just take an executable from, say, R77.20 Gaia and expect it to work on R77.20 for the 1100.

First let's talk boot up. Gaia Embedded boots from an image loaded by u-boot. This brings up the kernel and the root file system, which is a rootfs (shocking!), and then all the normal file systems.

Let's take a look! This is a small portion of /logs/boot_log, which gives a little hint of what is happening. Oh, btw, this is an 1100 running R75.20.71.

Creating 11 MTD partitions on "nand_mtd":
0x00000000-0x000a0000 : "u-boot"
i2c driver was not initialized yet.
0x000a0000-0x00100000 : "bootldr-env"
0x00100000-0x00900000 : "kernel-1"
0x00900000-0x07a00000 : "rootfs-1"
0x07a00000-0x08200000 : "kernel-2"
0x08200000-0x0f300000 : "rootfs-2"
0x0f300000-0x16c00000 : "default_sw"
0x16c00000-0x18400000 : "logs"
0x18400000-0x18500000 : "preset_cfg"
0x18500000-0x18600000 : "adsl"
0x18600000-0x20000000 : "storage"


So it looks like one partition has something related to the boot loader environment, then we have kernel and rootfs (-1 and -2), default_sw, logs, preset_cfg (maybe factory default boots from here?), adsl (who still uses that?) and storage.

Let's compare to what we have running.
[Expert@FW]# df -h
Filesystem                Size      Used Available Use% Mounted on
tmpfs                    20.0M    620.0k     19.4M   3% /tmp
tmpfs                    40.0M      7.4M     32.6M  18% /fwtmp
/dev/mtdblock7           24.0M      8.5M     15.5M  35% /logs
/dev/mtdblock10         122.0M     27.8M     94.2M  23% /storage
/dev/mtdblock5          113.0M     79.4M     33.6M  70% /pfrm2.0
tmpfs                    40.0M      1.1M     38.9M   3% /tmp/log/local
[Expert@FW]#


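Looks like we found logs and storage: /dev/mtdblock7 and /dev/mtdblock10 line up with partitions 7 and 10 (counting from 0), and the sizes match. By the same counting, /dev/mtdblock5 is rootfs-2 rather than default_sw: 0x0f300000 - 0x08200000 is 0x07100000 bytes, or 113 MB, which is exactly the size df shows for /pfrm2.0 (this is basically where most of the appliance lives).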
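
To take the guesswork out of matching partitions to mounts, the ranges in the boot log can be turned into sizes and compared against df. Here is a minimal sketch of that. The helper name mtd_sizes is made up for illustration, and it assumes the partition lines start with the 0xSTART-0xEND range exactly as in the excerpt above.

```shell
# mtd_sizes: hypothetical helper, not something shipped on the appliance.
# Reads boot log text on stdin and prints "mtdN <size>M <name>" per partition.
mtd_sizes() {
    i=0
    # Keep only lines shaped like: 0xSTART-0xEND : "name"
    grep -E '^0x[0-9a-fA-F]+-0x[0-9a-fA-F]+ : "' | while IFS= read -r line; do
        range=${line%% *}      # 0xSTART-0xEND
        start=${range%-*}      # 0xSTART
        end=${range#*-}        # 0xEND
        name=${line#*\"}       # strip everything up to the first quote
        name=${name%\"*}       # strip the closing quote
        # Size in whole MiB (integer division, so tiny partitions show as 0M)
        echo "mtd$i $(( ($end - $start) / 1048576 ))M $name"
        i=$((i + 1))
    done
}
```

Run it as mtd_sizes < /logs/boot_log. For the listing above, index 7 comes out as 24M logs, index 10 as 122M storage and index 5 as 113M rootfs-2, which are the same sizes df reports for mtdblock7, mtdblock10 and mtdblock5.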

So what else do we see? /tmp, /fwtmp and /tmp/log/local are tmpfs file systems, meaning RAM-based file systems. Technically they are virtual-memory-based file systems, but these boxes don't have swap, so everything is for sure in RAM.

Now I want to point out there is no "/" in that listing. I'm pretty sure this is because / is a rootfs which is loaded by the kernel. It's kind of like a tmpfs, only it gets a list of files inserted into it beforehand.
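
One way to check that hunch is /proc/mounts, which lists every mount the kernel knows about, including ones df leaves out. A small sketch, where fstype_of is a made-up helper name and the input is assumed to be in the standard /proc/mounts layout (device, mount point, type, options):

```shell
# fstype_of: hypothetical helper. $1 = mount point, stdin = /proc/mounts data.
# Prints the file system type of every entry mounted at that point.
fstype_of() {
    awk -v mp="$1" '$2 == mp { print $3 }'
}
```

Usage would be fstype_of / < /proc/mounts. If more than one line comes back, something has been mounted over the original root.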

Ok, let's view /

[Expert@FW]# cd / ; ls -l
drwxr-xr-x    2 105      80              0 Apr 12 10:30 bin
lrwxrwxrwx    1 105      80              6 Dec 31  1969 data -> /flash
lrwxrwxrwx    1 root     root            8 Apr 13 05:20 dbg -> /tmp/dbg
drwxr-xr-x    5 5031     80              0 Apr 13 05:21 dev
drwxr-xr-x    7 105      80              0 Apr 15 07:07 etc
lrwxrwxrwx    1 root     root           16 Apr 13 05:20 flash -> /pfrm2.0/config1
drwxrwxrwt   12 root     root          860 Apr 16 11:20 fwtmp
lrwxrwxrwx    1 105      80             10 Dec 31  1969 init -> /sbin/init
drwxr-xr-x    3 105      80              0 Apr 13 05:21 lib
lrwxrwxrwx    1 105      80             11 Dec 31  1969 linuxrc -> bin/busybox
drwxr-xr-x    8 root     root            0 Apr 12 21:27 logs
drwxr-xr-x    8 105      80              0 Apr 13 05:20 mnt
lrwxrwxrwx    1 root     root           10 Apr 13 05:20 opt -> /fwtmp/opt
drwxr-xr-x   12 root     root            0 Dec 31  1969 pfrm2.0
dr-xr-xr-x   70 root     root            0 Dec 31  1969 proc
drwxr-xr-x    2 105      80              0 Apr 13 05:20 sbin
drwxr-xr-x    8 root     root            0 Apr 16 11:20 storage
drwxr-xr-x   10 root     root            0 Dec 31  1969 sys
drwxrwxrwt    4 root     root          320 Apr 16 11:20 tmp
drwxr-xr-x    2 root     root            0 Apr 13 05:21 usb
drwxr-xr-x    8 105      80              0 Apr 13 05:21 usr
drwxrwxrwx    8 105      80              0 Apr 16 10:46 var
drwxr-xr-x    2 root     root            0 Apr 13 05:21 web
[Expert@FW]#


Now.. something to notice. flash is a symbolic link (not a 3rd party app) to /pfrm2.0/config1. We'll come back to that.

Storage - this seems to be where online updates go? Not 100% sure. /logs is pretty much what it looks like. Logs..

Ok so let's talk black magic now...

Busybox. Busybox is a single application that will act differently based on how it's called. You might be thinking, what do you mean by "how it's called"? Let me show you with a shell script.

[Expert@FW]# echo "echo My Argument is \$1" > /tmp/script.sh
[Expert@FW]# cat /tmp/script.sh
echo My Argument is $1
[Expert@FW]# bash /tmp/script.sh hello!
My Argument is hello!
[Expert@FW]#


In this script I'm saying show me the first argument to this script and print it after the "is".

Guess what? There is also a $0, which is the name of the command (script in this case).

[Expert@FW]# echo "echo My Argument is \$0" > /tmp/script.sh
[Expert@FW]# bash /tmp/script.sh hello!
My Argument is /tmp/script.sh
[Expert@FW]#


See how that changed? Using this logic, a script with the exact same contents could act differently if there was a change in how it was called.

Ok so I changed the script and now we have:

[Expert@FW]# cat /tmp/script.sh
if [ "$0" = "hello" ] ; then
    echo "My Argument is $1!"
else
    echo "i don't know how $0 acts!"
fi
[Expert@FW]# bash script.sh
i don't know how script.sh acts!
[Expert@FW]#


Now let's make a copy of the script and call it hello.

[Expert@FW]# cp script.sh hello
[Expert@FW]# bash hello howdy
My Argument is howdy!
[Expert@FW]#


Ok so we've proven we can change how something reacts based purely on its file name!
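
The same trick scales to more than one name. Here is a rough sketch of name-based dispatch with a case statement. The function name multicall and the applets hello and bye are invented for the example, and basename is used so the check still works when the script is called with a full path.

```shell
# multicall: hypothetical sketch of "act differently based on the name".
# $1 = the name we were invoked as, remaining args = the arguments.
multicall() {
    applet=$(basename "$1")   # /tmp/hello and ./hello both become "hello"
    shift
    case "$applet" in
        hello) echo "Hello, ${1:-world}!" ;;
        bye)   echo "Goodbye, ${1:-world}!" ;;
        *)     echo "unknown applet: $applet" >&2
               return 1 ;;
    esac
}
# In a standalone script the last line would be:  multicall "$0" "$@"
```

Copy or link that script to hello and bye and each name picks its own branch. Any other name lands in the default branch and returns an error.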

In comes Busybox! Busybox is a Swiss Army knife. It's a single binary that has a lot of programs built into it. This is done for massive disk space savings.

[Expert@FW]# ls -l /bin/busybox
-rwxr-xr-x    1 105      80         745216 Dec 31  1969 /bin/busybox
[Expert@FW]#


745k. So what's in there?

[Expert@FW]# /bin/busybox
BusyBox v1.8.1 (2015-04-26 16:47:09 IDT) multi-call binary
Copyright (C) 1998-2006 Erik Andersen, Rob Landley, and others.
Licensed under GPLv2. See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: [function] [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as!

Currently defined functions:
        [, [[, addgroup, adduser, adjtimex, ar, arp, arping, ash,
        awk, basename, bunzip2, bzcat, bzip2, cal, cat, catv,
        chattr, chgrp, chmod, chown, chpasswd, chpst, chroot,
        chrt, chvt, cksum, clear, cmp, comm, cp, cpio, crond,
        crontab, cryptpw, cut, date, dc, dd, deallocvt, delgroup,
        deluser, df, dhcprelay, diff, dirname, dmesg, dnsd, dos2unix,
        du, dumpkmap, dumpleases, echo, ed, egrep, eject, env,
        envdir, envuidgid, ether-wake, expand, expr, fakeidentd,
        false, fbset, fdflush, fdformat, fdisk, fgrep, find, fold,
        free, freeramdisk, fsck, fsck.minix, ftpget, ftpput, fuser,
        getopt, getty, grep, gunzip, gzip, halt, hdparm, head,
        hexdump, hostid, hostname, httpd, hwclock, id, ifconfig,
        ifdown, ifup, inetd, init, insmod, install, ip, ipaddr,
        ipcalc, ipcrm, ipcs, iplink, iproute, iprule, iptunnel,
        kbd_mode, kill, killall, killall5, klogd, last, length,
        less, linux32, linux64, linuxrc, ln, loadfont, loadkmap,
        logger, login, logname, logread, losetup, ls, lsattr,
        lsmod, lzmacat, makedevs, md5sum, mdev, mesg, mkdir, mkfifo,
        mkfs.minix, mknod, mkswap, mktemp, modprobe, more, mount,
        mountpoint, mt, mv, nameif, netstat, nice, nmeter, nohup,
        nslookup, od, openvt, passwd, patch, pidof, ping, ping6,
        pipe_progress, pivot_root, poweroff, printenv, printf,
        pscan, pwd, raidautorun, rdate, readahead, readlink, readprofile,
        realpath, reboot, renice, reset, resize, rm, rmdir, rmmod,
        route, rpm, rpm2cpio, run-parts, runlevel, runsv, runsvdir,
        rx, sed, seq, setarch, setconsole, setkeycodes, setlogcons,
        setsid, setuidgid, sh, sha1sum, slattach, sleep, softlimit,
        sort, split, start-stop-daemon, stat, strings, stty, su,
        sulogin, sum, sv, svlogd, swapoff, swapon, switch_root,
        sync, sysctl, syslogd, tail, tar, taskset, tcpsvd, tee,
        telnet, telnetd, test, tftp, time, top, touch, tr, traceroute,
        true, tty, ttysize, udhcpc, udhcpd, udpsvd, umount, uname,
        uncompress, unexpand, uniq, unix2dos, unlzma, unzip, uptime,
        usleep, uudecode, uuencode, vconfig, vi, vlock, watch,
        watchdog, wc, wget, which, who, whoami, xargs, yes, zcat,
        zcip

[Expert@FW]#



That is a lot of programs! So how does busybox know how to act? Symbolic links! Here is one from /bin:

[Expert@FW]# ls -l mv
lrwxrwxrwx    1 105      80              7 Dec 31  1969 mv -> busybox
[Expert@FW]#


So as you can see mv is a symbolic link to busybox.
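
To see every command that is really busybox in disguise, you can walk a directory and check where each symlink points. A minimal sketch, with busybox_applets as a made-up helper name. It only catches links whose target is busybox or ends in /busybox, which is an assumption based on the mv link above.

```shell
# busybox_applets: hypothetical helper. $1 = directory to scan.
# Prints the name of every symlink in that directory that points at busybox.
busybox_applets() {
    for f in "$1"/*; do
        [ -L "$f" ] || continue            # skip anything that is not a symlink
        case "$(readlink "$f")" in
            busybox|*/busybox) basename "$f" ;;
        esac
    done
}
```

Try busybox_applets /bin and busybox_applets /sbin, then compare the result with the function list busybox prints about itself.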

You can feel free to poke around in there and see what else you can learn. Let's move on.

I started talking about the boot up process. Normally Unix uses /etc/init.d/ stuff for booting. There are files in there, but most of the heavy lifting is done in

/pfrm2.0/etc/cpInit


This is where firewall kernel modules are loaded and all kinds of things happen.

If you need to run a script at boot up, you'll need to create the following.

/pfrm2.0/etc/userScript
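
Here is a sketch of what creating that file could look like. It assumes userScript is an ordinary executable shell script that gets run during boot, which is not spelled out above, so treat it as a starting point for a lab box. The helper name make_user_script and the log file /logs/userScript.log are both invented for the example.

```shell
# make_user_script: hypothetical helper. $1 = where to write the script
# (on the appliance that would be /pfrm2.0/etc/userScript).
make_user_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Example boot-time action (an assumption): note when the box came up.
date >> /logs/userScript.log
EOF
    chmod +x "$1"    # assumed to be required so the boot process can run it
}
```

Calling make_user_script /pfrm2.0/etc/userScript writes the file in place. Passing a path under /tmp first is a safe way to look at the result before touching the real location.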


Ok, so what else can we talk about? Where do your configuration changes go that are made from clish or the webui?

Right here! ( /flash )

[Expert@FW]# ls -l
drwxr-xr-x    2 root     root            0 Dec 27 11:42 ace
-rw-r--r--    1 root     root           35 Dec 27 11:45 expert_pass_
drwxr-xr-x   10 root     root            0 Dec 27 11:42 fw1
-r--r--r--    1 root     root          373 Apr 12 10:28 passwd
-r-xr-xr-x    1 105      80            950 Sep  2  2015 restore_future_settings_hook.sh
-rw-------    1 root     root          255 Apr 12 10:28 shadow
drwxr-xr-x    4 root     root            0 Dec 27 11:43 sofaware
-rw-r--r--    1 root     root       760832 Apr 16 10:58 system.db
drwxr-xr-x    2 root     root            0 Dec 27 11:42 tmp
-rw-r--r--    1 root     root         1122 Dec 28 05:45 top_last_day_report.json
-rw-r--r--    1 root     root         1123 Dec 26 18:01 top_last_hour_report.json
[Expert@FW]#


Notice a few things. shadow and passwd? These files are copied over to /etc on boot up or when changes are made via clish/webui.

The interesting one is system.db. This is a sqlite3 database. Want to read it? SURE!

echo .dump | sqlite3 system.db > /logs/system-db.txt


Now you can view all the table schemes.. schema?.. whatever.. database output!
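
The dump is plain SQL text, so the usual text tools work on it. A small sketch for pulling out just the table names, assuming the dump contains the usual CREATE TABLE lines. list_tables is a hypothetical helper name.

```shell
# list_tables: hypothetical helper. stdin = output of ".dump".
# Prints one table name per line.
list_tables() {
    awk '/^CREATE TABLE / {
        name = $3
        if (name == "IF") name = $6   # CREATE TABLE IF NOT EXISTS name
        sub(/\(.*/, "", name)         # drop a "(" glued onto the name
        gsub(/"/, "", name)           # drop quoting around the name
        print name
    }'
}
```

Usage would be list_tables < /logs/system-db.txt, which gives a quick map of the database before reading the full dump.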

Something else interesting. Gaia Embedded on all platforms has a built-in switch, a fully managed switch!
I'm not going to dive into that right now, but you can split the ports and do basically anything a normal layer 3 switch would do. Cool stuff.

Now.. something odd I've noticed. If for some reason you're doing dynamic routing on Gaia Embedded, keep this in mind when troubleshooting routing issues. If for some reason routed crashes, it won't be restarted (well, it depends on which routed process crashes, but let's just say all of them crash). BUT!! If you log in via the CLI and issue a show route, it will pause, then restart routed under the covers, THEN show you the output.

This can be VERY confusing, as it will look like all of a sudden an issue fixed itself before you've had a chance to look at it. Want to see this in action? Set up a lab, get OSPF running, kill routed, then do a show route from clish.
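
To avoid being fooled by this, grab the PIDs from expert mode before and after you touch clish and compare them. A rough sketch, with check_restart as an invented helper name. It assumes the process shows up under the name routed, so adjust if ps shows something different on your box.

```shell
# check_restart: hypothetical helper.
# $1 = PID list captured before, $2 = PID list captured after.
check_restart() {
    if [ -z "$1" ] && [ -n "$2" ]; then
        echo "was down, now running (restarted)"
    elif [ "$1" != "$2" ]; then
        echo "PID changed (restarted)"
    else
        echo "unchanged"
    fi
}
```

In practice that would be before=$(pidof routed), then the show route in clish, then after=$(pidof routed) and check_restart "$before" "$after". pidof is one of the busybox functions listed earlier.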

Speaking of crashes!

When a process crashes on Gaia Embedded the kernel will use this sysctl to figure out how to generate the core file.

[Expert@FW]# sysctl kernel.core_pattern
kernel.core_pattern = |/pfrm2.0/bin/core_dump.sh
[Expert@FW]#


This means the core file will be piped into the shell script /pfrm2.0/bin/core_dump.sh.

Let's look at that shell script.

[Expert@FW]# cat /pfrm2.0/bin/core_dump.sh
#!/bin/sh

cat > /logs/core
[Expert@FW]#


So... it pipes the core into cat and writes a file called /logs/core. I'm not following why they didn't just set kernel.core_pattern = /logs/core, but sometimes it's best to not ask questions. :)

Ok, two things! Because the script always writes to the same file, you will only ever have the core from a single process crash, and only the latest one. That being said, how do you know what process crashed? We only have a file called /logs/core.

We use the magic file command!

Let's tell sleep to go away most violently and check out the core file. I'm going to tell sleep to sleep for 1000 seconds, then kill it with -6 (SIGABRT, which is why the shell reports "Aborted" below). %1 is the first job running in the background.

[Expert@FW]# sleep 1000 &
[1] 18008
[Expert@FW]# kill -6 %1
[Expert@FW]#
[1]+  Aborted                 (core dumped) sleep 1000
[Expert@FW]#

[Expert@FW]# ls -l /logs/core
-rw-r--r--    1 root     root       274432 Apr 16 11:56 /logs/core
[Expert@FW]# file /logs/core
/logs/core: ELF 32-bit LSB core file ARM, version 1 (SYSV), SVR4-style, from 'sleep'
[Expert@FW]#
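
If you want just the process name, say for a script that checks for new cores, it can be cut out of the file output. A minimal sketch, with core_owner as a made-up helper name. It assumes the "from '...'" part appears in the output the way it does above.

```shell
# core_owner: hypothetical helper. stdin = output of the file command.
# Prints whatever sits between the quotes after "from".
core_owner() {
    sed -n "s/.*from '\([^']*\)'.*/\1/p"
}
```

Usage would be file /logs/core | core_owner, which prints sleep for the core above.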


If you were troubleshooting something at this point I would say, create a cpinfo (cpinfo -o /logs/`hostname`.cpinfo.gz -z) and then download that core file also. If you aren't faint of heart I would also say do a backtrace on said core file as well. You'll need gdb to do that. You can request it from Check Point or use mine from the tools page. More on doing a backtrace later.

I'll update this with anything else I can think of, but for now that's all folks!
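
Since the stock handler keeps overwriting /logs/core, a lab box could use a handler that names each core after the process and PID instead. This is only a sketch and not how the appliance ships. It assumes the kernel on the box passes % specifiers such as %e (executable name) and %p (PID) to a piped helper, which mainline Linux does from 2.6.24 on. The helper name save_core is invented for the example.

```shell
# save_core: hypothetical replacement handler, lab use only.
# $1 = directory, $2 = executable name (%e), $3 = PID (%p).
# The kernel feeds the core image on stdin.
save_core() {
    cat > "$1/core.$2.$3"
}
# A wrapper script would call:  save_core /logs "$1" "$2"
# with a matching pattern along the lines of:
#   kernel.core_pattern = |/path/to/wrapper %e %p
```

Keep in mind /logs is only 24M on this box, so a few large cores would fill it quickly.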

