
Sunday, August 29, 2010

wget: downloader extraordinaire

I always found browsers too bothersome or intrusive when downloading files. Downloading files via a browser always requires too much clicking and configuring. I'd rather spend my time on the task at hand than on configuring the crap out of some unintuitive graphical user interface.

I always have a shell handy when I'm browsing, so I always download files using wget.

I use an alias in $HOME/.bashrc to predefine the arguments I usually need for downloading, like this:


alias wget="$(which wget) -m -U 'Mozilla/5.0 (compatible; Konqueror/3.2; Linux)' \
-e robots=off --wait 1 -c -nd -nH"


I put a which in there because I'm too lazy to type /usr/bin/wget; its purpose is to bypass any other aliases or scripts that might get called under the same invocation name, namely wget.

The arguments are crafted to make wget ignore robots.txt and pretend to be a browser. This is generally frowned upon; people call it bad netiquette. As I said, I'd rather get the job done...

If you decide to download recursively, you'll need the following options:

  • Download recursively with -r --level n, where n is the maximum recursion depth;
  • To reject files of a certain type, use -R <.ext>, where .ext is a file extension, for instance .mp3;
  • To accept only files of a certain type, use -A <.ext>.
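
The recursive options above can be combined in one call. Here is a minimal sketch, assuming a placeholder URL and a helper name (recursive_fetch_cmd) that I made up for illustration. It only prints the command, so you can review it before running it:

```shell
#!/bin/bash
# Print (not run) a recursive wget command that keeps only one file type.
# recursive_fetch_cmd is a hypothetical helper; the URL below is a placeholder.
recursive_fetch_cmd() {
  local url=$1 depth=$2 ext=$3
  echo wget -r --level "$depth" -A "$ext" -nd --wait 1 "$url"
}

recursive_fetch_cmd "http://example.com/music/" 2 mp3
```

Drop the echo inside the function once the printed command looks right.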

Have fun!

Tuesday, August 24, 2010

Slackware 13.1 woes

Slackware, why hast thou forsaken me?

At home:

The upgrade from 13.0 to 13.1 killed my /dev/sdX where / is mounted; apparently 13.1 has no love for reiserfs anymore. (On the other hand, my hard drive might just be showing symptoms of its age.)

I hope it's nothing serious. Gotta fix it fast though.

At work:

Xorg on 13.1 does not like me changing keyboard settings through hal, after I did the following:

cp /usr/share/hal/fdi/policy/10osvendor/10-keymap.fdi /etc/hal/fdi/policy/
nano -w /etc/hal/fdi/policy/10-keymap.fdi

Before:
<?xml version="1.0" encoding="ISO-8859-1"?>
<deviceinfo version="0.2">
  <device>
    <match key="info.capabilities" contains="input.keymap">
      <append key="info.callouts.add" type="strlist">hal-setup-keymap</append>
    </match>
    <match key="info.capabilities" contains="input.keys">
      <!-- Restore Ctrl-Alt-Bksp Xserver Zapping -->
      <!--<merge key="input.xkb.options" type="string">terminate:ctrl_alt_bksp</merge>-->

      <!-- Edit (as needed) these four lines in the copied fdi file -->
      <merge key="input.xkb.rules" type="string">base</merge>
      <merge key="input.xkb.model" type="string">evdev</merge>
      <merge key="input.xkb.layout" type="string">us</merge>
      <merge key="input.xkb.variant" type="string"/>
    </match>
  </device>
</deviceinfo>


After:
<?xml version="1.0" encoding="ISO-8859-1"?>
<deviceinfo version="0.2">
  <device>
    <match key="info.capabilities" contains="input.keymap">
      <append key="info.callouts.add" type="strlist">hal-setup-keymap</append>
    </match>
    <match key="info.capabilities" contains="input.keys">
      <!-- Restore Ctrl-Alt-Bksp Xserver Zapping -->
      <!--<merge key="input.xkb.options" type="string">terminate:ctrl_alt_bksp</merge>-->

      <!-- Edit (as needed) these four lines in the copied fdi file -->
      <merge key="input.xkb.rules" type="string">base</merge>
      <merge key="input.xkb.model" type="string">evdev</merge>
      <merge key="input.xkb.layout" type="string">us</merge>
      <merge key="input.xkb.variant" type="string">dvorak</merge>
    </match>
  </device>
</deviceinfo>


I do like the zero-configuration philosophy behind it, though; that's why I refuse to set it through the traditional /etc/X11/xorg.conf.

Hald should really be deprecated in favour of udev. The XML overhead is stupid if you can adapt things to your needs through the command line, like this:

In $HOME/.bashrc add:

alias aoeu="setxkbmap -model evdev -layout dvorak -variant intl -option grp:win_switch"

Or if you have an EZ-Reach (tm) Typematrix 2030 keyboard like me, you can add this to your /etc/udev/rules.d/99-custom.rules:

SYSFS{idVendor}=="1234", SYSFS{idProduct}=="5678", MODE="660", SYMLINK+="input/typematrix%k", GROUP="plugdev", RUN+="setxkbmap -model evdev -layout dvorak -variant intl -option grp:win_switch ; loadkeys dvorak"

Run the lsusb command to get the idVendor and idProduct values; my lsusb output looks something like this:

...
Bus 004 Device 002: ID 047d:1020 Kensington Expert Mouse Trackball
...
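
The two values sit in the ID field as vendor:product. A small sketch of pulling them out of one lsusb line, where usb_ids is a helper name I invented for illustration:

```shell
#!/bin/bash
# Split the "ID vendor:product" field of a single lsusb output line.
# usb_ids is a hypothetical helper name, not part of usbutils.
usb_ids() {
  local line=$1 id
  # Keep only what follows "ID ", then cut at the first space.
  id=${line#*ID }
  id=${id%% *}
  echo "idVendor=${id%%:*} idProduct=${id##*:}"
}

usb_ids "Bus 004 Device 002: ID 047d:1020 Kensington Expert Mouse Trackball"
```

For the sample line this prints idVendor=047d idProduct=1020, which are the values to substitute for the placeholders in the udev rule.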

If you own a Kensington Expert Mouse trackball like me, then you'll probably want to borrow my fdi definition, given in the following block of XML:

<?xml version='1.0' encoding='UTF-8'?>
<deviceinfo version='0.2'>
  <device>
    <match key='info.capabilities' contains='input.mouse'>
      <merge key='input.x11_driver' type='string'>mouse</merge>
      <merge key="input.x11_options.Device" type="string">/dev/ttyS0</merge>
      <merge key="input.x11_options.Protocol" type="string">ThinkingMouse</merge>
      <merge key="input.x11_options.Emulate3Buttons" type="string">false</merge>
      <merge key="input.x11_options.CorePointer" type="string">On</merge>
      <merge key="input.x11_options.Buttons" type="string">4</merge>
      <merge key="input.x11_options.ButtonMapping" type="string">1 2 3 4</merge>
    </match>
  </device>
</deviceinfo>

End rant.

Once you go slack, you never go back.

Friday, August 20, 2010

Simple script to alert you when websites are down

Suppose your clients and boss would like to know when a site you maintain goes down.

You can write your own script to achieve this goal, where you implement a light-weight http client in python-twisted... but why bother?

There is a thing called reinventing the wheel.


So what do we need?
  1. curl;
  2. logrotate;
  3. mutt;
  4. A list of urls;
  5. A log file to record visits by curl;
  6. A script called site-checker, which you put in /etc/cron.hourly/ and make executable with chmod 770 /etc/cron.hourly/site-checker:


    #!/bin/bash
    
    #set -e -x
    
    FILELOC=/home/user/bin/websites.list
    LOGFILE=/home/user/bin/log/site-checker.log
    WEBSITES=$(xargs < "$FILELOC")
    UDATE=$(date -u)
    EMAILADDR=user@nowhere.com
    for w in $WEBSITES
    do
      echo "==Fetched $w on $UDATE==" >> "$LOGFILE"
      # -o /dev/null discards the page body; only curl's exit status matters here
      curl -sf -o /dev/null "$w" || echo "Warning: $w is down on $UDATE" | mutt -s "[website down] $UDATE $w" "$EMAILADDR"
    done
    

websites.list would contain something like this:

www.microsoft.com
www.google.com
www.sap.com
www.oracle.com

For mutt to send e-mails you either need to set up an MTA (mail transfer agent) or add a .muttrc file to your $HOME, containing something like this:

set smtp_url="smtp://user@smtp.woohoo.com"
set smtp_pass="tuff2guess"

You also have to use logrotate to prevent your site-checker.log from growing too big.

Type this in your bash shell:

echo -e "/home/user/bin/log/*.log {\n\
  daily\n\
  missingok\n\
  size=100k\n\
  rotate 5\n\
}" > /etc/logrotate.d/user-custom


As with every single script on this blog, you must adapt the script to your own needs.

Happy hacking!


Update 21-dec-2010:
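#!/bin/bash
FILELOC=/home/user/bin/websites.list
PEOPLELOC=/home/user/bin/people.list
LOGFILE=/home/user/bin/site-checker.log
WEBSITES=$(xargs < "$FILELOC")
UDATE=$(date -u)

# Reads the message from stdin and mails it to everyone in people.list.
# Note: this function shadows the system's mail command inside this script.
mail () {
  read -r message
  domainname=$1
  for addr in $(xargs < "$PEOPLELOC")
  do
    echo "$message" | mutt -s "[website down] $UDATE $domainname" "$addr" 2> /dev/null
  done
}

touch "$LOGFILE"
for w in $WEBSITES
do
  echo "==Fetched $w on $UDATE==" >> "$LOGFILE"
  # -o /dev/null discards the page body; only curl's exit status matters here
  curl -sf -o /dev/null "$w" || echo "Warning: $w is down on $UDATE" | mail "$w"
done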

Linux: conflict resolution, usb sound card vs. webcam

Here's what you do to resolve a conflict in alsa, caused by some device hijacking your default alsa device to play sound. This could happen to you right after you upgrade your kernel.

  1. Run cat /proc/asound/cards to locate the offending device;
  2. Prevent the modules that belong to the offending device from loading, by blacklisting them in /etc/modprobe.d/blacklist;
  3. Find the device you actually want, again with cat /proc/asound/cards.
    You'll find something like this:


    0 [XYZ123         ]: HDA-boo - HDA ...
                          HDA ... at 0xdbef8000 irq 22
    
  4. Attack /etc/asound.conf by adding these lines:

    pcm.!default {
            type hw
            card XYZ123
    }
    
    ctl.!default {
            type hw
            card XYZ123
    }
    

The last step sets your default ALSA audio device.
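
The card name that goes into /etc/asound.conf is the ID between the square brackets. A rough sketch of listing index and ID for every card, assuming the file layout shown above (card_ids is a made-up helper name):

```shell
#!/bin/bash
# List "index id" for each card, reading /proc/asound/cards style input on stdin.
# card_ids is a hypothetical helper; it assumes lines like " 0 [XYZ123   ]: ...".
card_ids() {
  sed -n 's/^ *\([0-9][0-9]*\) \[\([^] ]*\) *]:.*/\1 \2/p'
}

# Only read the real file when it exists (it will not on non-Linux systems).
if [ -r /proc/asound/cards ]; then
  card_ids < /proc/asound/cards
fi
```

The second column is what you would put after card in the pcm.!default and ctl.!default blocks.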

Hack on!

Friday, August 13, 2010

E-mail relay: emailrelay

Imagine you work in a company with dozens of employees, e-mailing each other, e-mailing clients etc.
  1. You'd like to sample correspondence from time to time; you achieve this by storing the messages and attachments in a database (nosql or sql-based);
  2. ... but you don't want the hassle of setting up a full mta (mail transfer agent) with full-blown policies, like postfix and the like;
  3. ... you also want outgoing e-mails scanned for spam and viruses, to avoid getting blacklisted by other ISPs (in case there's a malware infection somewhere in the local network);
  4. ... you'd like to get jiggy with new open protocols like xmpp and transfer internal e-mails over xmpp as well, with attachments too;
  5. ... recognize and reroute internal e-mails by bypassing the ISP's smtp server and sending them to the Exchange server or what have you;
  6. ... you want to mount the database which stores the e-mails and apply some clustering algorithm on it to gather data while e-mails are in transit... in other words online learning;
  7. ... you also have VOIP over SIP, which you would like to be stored with the e-mails.

Relaying e-mail


How would you do all that?

Personally, as a programmer, my initial reaction was: I'll roll it myself (using erlang)! I don't have the time to actually do it of course. The most I can do with the given time is to write some bash files to glue existing projects together.

I googled a bit and found a nice project called emailrelay. This wee project does what it advertises. Even better, it's licensed under gpl 3.0.

Why reinvent the wheel when all the components are already here? Spamassassin, clamav, cron, ejabberd, etc.

So what does it do exactly? It can relay e-mails directly, or do it in two steps: collect and relay. Why is this relevant? It's relevant because emailrelay allows you to use third-party software to manipulate the e-mails in transit.

For instance, during the collect phase we defer the decision whether to relay a message to spamassassin and clamav. During the relay phase we compare the recipient address to the local addresses, and so bypass the ISP MTA by sending internal mail directly to the recipient (by some protocol, e.g. smtp, xmpp, what have you).

Another example, imagine you want to send yourself a TODO item via emailrelay and you want this item synchronized with your existing TODO lists. All you need to do is grab some api/library for your favourite scripting language and voila, you're almost done. You still need to write the glue code though.

You can pick your favourite programming language (e.g. bash, java, python, ruby...) and let emailrelay pass a parameter to your application or script (is there a difference?) and compose whatever behaviour or policy you want during the collect or relay phase.
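
As an illustration of such a policy, here is a sketch of the recipient check mentioned above for the relay phase. It is not emailrelay's own API: the domain list and the route_for helper are assumptions of mine, and wiring the result into an actual relay step is left out.

```shell
#!/bin/bash
# Hypothetical list of domains that count as internal.
LOCAL_DOMAINS="example.com intranet.example.com"

# Print "local" or "external" for one recipient address.
# route_for is a made-up helper name for this sketch.
route_for() {
  local domain=${1##*@} d
  for d in $LOCAL_DOMAINS; do
    if [ "$domain" = "$d" ]; then
      echo local
      return 0
    fi
  done
  echo external
}

route_for "alice@example.com"
route_for "bob@elsewhere.org"
```

A script like this could decide whether a message goes to the internal server or out through the ISP's smtp server.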


Some benefits of this approach in contrast to a traditional full-blown MTA:
  • You can test each piece of software at any level;
  • You can adapt to any network topology depending on where the bottle-necks are and how your software pieces should operate optimally (i.e. serial or parallel on one or multiple computers, multiple processes, single processes, vmguest or real machine...);
  • You can mix and match pieces of technology to get what you want. See this example of spamassassin.

The drawbacks of this approach:
  • You have to write your own (standard) policies and (standard) behaviour;
  • No monitoring of crashing components (supervisor a la erlang, anyone?) between the cooperating processes.

Non-monolithic composable software. Ain't it grand?

We use couchdb to store the e-mails, attachments etc. and replicate them to other nodes.

Here's some code to get you started.

Daemon initialization code:

#!/bin/bash
set -e -x

ROOT=/home/administrator/email
/usr/local/etc/init.d/couchdb restart
if [ -e "$ROOT/scripts/pid-emailrelay-server.pid" ]; then
  # a stale pid file must not abort the script under set -e
  kill "$(cat "$ROOT/scripts/pid-emailrelay-server.pid")" || true
fi
if [ -e "$ROOT/scripts/pid-emailrelay-client.pid" ]; then
  kill "$(cat "$ROOT/scripts/pid-emailrelay-client.pid")" || true
fi
#server
$(which emailrelay) --as-server --port 2022 --spool-dir $ROOT/spool \
  --filter $ROOT/scripts/emailrelay-server-filter.sh --filter-timeout 60 \
  --pid-file $ROOT/scripts/pid-emailrelay-server.pid --verbose
#client
$(which emailrelay) --as-server --no-smtp --forward --forward-to smtp.somewhere.org:smtp \
  --spool-dir $ROOT/spool --poll 60 \
  --pid-file $ROOT/scripts/pid-emailrelay-client.pid --verbose

## -d = --log --close-stderr
##    = -l    -e
## -q = --log --no-syslog --no-daemon --dont-serve --forward --forward-to <host:port>
##    = -l    -n          -t          -x           -f        -o           <host:port>

Filter script:

#!/bin/sh

set -e -x

ROOT=/home/administrator/email
# couchdb python client script; its first parameter (email) is the couchdb db
$(which python) "$ROOT/scripts/couch.py" email "$1"
clamdscan --no-summary --move="$ROOT/infected" - < "$1"
$(which spamassassin) "$1" > "$1.tmp"
mv "$1.tmp" "$1"

exit 0

File destroyer / obliterator

I stole the idea here and wrote a practical script to embellish on the theme.

#!/bin/sh

#set -e -x
set -e

if [ -f "$1" ]; then
  echo "Are you sure? Hit ctrl+c to abort."
  for i in $(seq 5); do
    echo "$i" && sleep 1
  done
  # mac osx / bsd: BYTES=$(stat -f%z "$1")
  BYTES=$(/usr/bin/du -b "$1" | sed 's/\([0-9]\+\).*/\1/')
  # iflag=fullblock (GNU dd) keeps reading until the whole block is filled,
  # so large files are not cut short by a partial read from /dev/urandom
  /bin/dd if=/dev/urandom of="$1" bs="$BYTES" count=1 conv=notrunc iflag=fullblock
  echo "$1 has been written over with random bits."
  /bin/rm -f "$1"
  echo "$1 has been removed from the filesystem."
fi

Warning! Achtung baby! Do not use this on sensitive files you may need in the future.

This got me thinking. Imagine you work for a secretive branch of some government and you have sensitive info on your drive that you wouldn't want to be caught dead with, like passwords etc. You could build a kill switch for the portable computer(s) holding the sensitive files. The kill switch is attached to an artificial pacemaker, which senses whether you're still alive by checking your heartbeat, and the portable computer receives keep-alive signals from the pacemaker. When the signals stop, the kill switch is engaged: after 5 minutes without a keep-alive signal, the portable computer is erased completely.

The trick is to not let the signal get intercepted. Hmmmm... I should write spy-novels with a technical flavour.
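
The keep-alive timeout from the thought experiment could be sketched like this. It is purely illustrative: the function name is made up, the timestamps are simulated, and nothing gets erased.

```shell
#!/bin/bash
# Seconds of silence tolerated before the switch trips (5 minutes).
TIMEOUT=300

# heartbeat_expired LAST NOW: succeeds when the last keep-alive is too old.
# Both arguments are Unix timestamps in seconds; the name is hypothetical.
heartbeat_expired() {
  local last=$1 now=$2
  [ $(( now - last )) -ge "$TIMEOUT" ]
}

# Simulate a last heartbeat that arrived 301 seconds ago.
now=$(date +%s)
if heartbeat_expired "$(( now - 301 ))" "$now"; then
  echo "no heartbeat for $TIMEOUT seconds: kill switch would engage here"
fi
```

A real version would have to authenticate the signal as well, which is exactly the interception problem.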