
Rename files to reflect modified time while preserving extension

I’m undergoing a big project of scanning old family scrapbooks. I have two scanners involved, each with its own naming scheme, dumping files of various types (pdf, jpg) into the same folder. I need an accurate chronological view of when things were scanned (nothing was scanned at the exact same moment). At this point the only way to get that information is to sort the files by modified time, but nothing other than a file manager looks at files that way. I needed a way to rename all the files to reflect their modified time.

I found a couple different ways to do this, but settled on a bash for loop utilizing the stat, sed, and mv commands.

Challenge 1

Capture the extension of the files.

Using sed:

ls <filename> | sed 's/.*\(\..*\)$/\1/'

Challenge 2

Obtain the modified time of your file in an acceptable format. I do this using the stat command.

stat -c %y <filename>

%y is the best option here – it leaves no room for ambiguity. You could also choose %Y so the filenames aren’t so long and don’t contain colons (some systems struggle with those). The downside to %Y is that you only get to-the-second precision; in my case a few files had the same epoch timestamp, which caused problems. More on how to format this can be found here and here.
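For illustration, here is roughly what the two formats produce (the filename and output values below are made up; your timestamps will differ):

stat -c %y scan001.jpg    # e.g. 2015-06-14 09:26:53.589104713 -0600
stat -c %Y scan001.jpg    # e.g. 1434295613 (seconds since the epoch)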

Stringing it all together

I wrapped it all up in a bash for loop with appropriate variables. This was my final command, which I ran inside the directory I wanted to modify:

for file in *; do name=$(stat -c %y "$file"); ext=$(echo "$file" | sed 's/.*\(\..*\)$/\1/'); mv -n "$file" "$name$ext"; done

The for loop goes through each file one at a time and assigns it to the $file variable. I then create the name variable, which uses the stat command to obtain a precise date modified timestamp of the file. The ext variable is derived from the filename but only keeps the extension (using sed). The last step uses mv -n (no clobber mode – don’t overwrite anything) to rename the original file to its date modified timestamp. The result: a directory where each file is named precisely when it was modified – a true chronology of what was scanned, irrespective of file extension or which scanner created the file. Success.
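One caveat worth noting: if a file has no extension at all, the sed expression leaves the name untouched, so $ext ends up holding the entire filename. A variant using bash parameter expansion (a sketch, not the command I originally ran) sidesteps that:

for file in *; do
  name=$(stat -c %y "$file")
  ext=""
  # only keep an extension if the filename actually contains a dot
  [[ "$file" == *.* ]] && ext=".${file##*.}"
  mv -n "$file" "$name$ext"
done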

Append users to PowerBroker Open RequireMembershipOf

The title isn’t very descriptive. I recently came across a need to script adding users & groups to the “RequireMembershipOf” directive of PowerBroker Open. PowerBroker is a handy tool that really facilitates joining a Linux machine to a Windows domain. It has a lot of configurable options but the one I was interested in was RequireMembershipOf – which as you might expect requires that the person signing into the Linux machine be a member of that list.

The problem with RequireMembershipOf is, as far as I can tell, it has no append function. It has an add function which frustratingly erases everything that was there before and includes only what you added onto the list. I needed a way to append a member to the already existing RequireMembershipOf list. My solution involves the usage of bash, sed, and a lot of regex. It boils down to two lines of code:

#take the output of --show RequireMembershipOf, remove the words multistring & local policy, replace spaces with carets (the pbis representation of a space), and put the result into a variable (which automatically puts everything onto a single line)

add=$(/opt/pbis/bin/config --show RequireMembershipOf | sed 's/\(multistring\)\|\(local policy\)//g' | sed 's/ /^/g')

#run the RequireMembershipOf command with the previous output plus the user or group to be added

sudo /opt/pbis/bin/config RequireMembershipOf "$add" "<USER_OR_GROUP_TO_ADD>"

That did the trick.
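If you find yourself doing this regularly, the two lines can be wrapped into a small script (a sketch only; the script name is a placeholder, it assumes the same /opt/pbis/bin/config path used above, and it takes the user or group to append as its first argument):

#!/bin/bash
# append a user or group to the existing RequireMembershipOf list
# usage: sudo ./append-membership.sh "DOMAIN\\group name"
new_member="$1"

# current list, minus the multistring / local policy labels, spaces encoded as carets
current=$(/opt/pbis/bin/config --show RequireMembershipOf | sed 's/\(multistring\)\|\(local policy\)//g' | sed 's/ /^/g')

/opt/pbis/bin/config RequireMembershipOf "$current" "$new_member"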

Rename LVM group in CentOS7

I recently made the discovery that all my VMs have the same volume group name – the default given when CentOS is installed. My OCD got the best of me and I set out to change these names to reflect the hostname. The problem is that if you rename the volume group containing the root partition, the system will not boot.

The solution is to run a series of commands to get things updated. Thanks to the CentOS forums for the information. In my case I had already made the mistake of renaming the group and ended up with an unbootable system. This is what you have to do to get it working again:

Boot into a Linux rescue shell (from the installer DVD, for example) and activate the volume groups (if not activated by default)

vgchange -ay

Mount the root and boot volumes (replace VG_NAME with the name of your volume group and BOOT_PARTITION with the device name of your boot partition, typically sda1)

mount /dev/VG_NAME/root /mnt
mount /dev/BOOT_PARTITION /mnt/boot

Mount necessary system devices into our chroot:

mount --bind /proc /mnt/proc
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys

Chroot into our broken system:

chroot /mnt/ /bin/bash

Modify fstab and grub files to reflect new volume group name:

sed -i /etc/fstab -e 's/OLD_VG_NAME/NEW_VG_NAME/g'
sed -i /etc/default/grub -e 's/OLD_VG_NAME/NEW_VG_NAME/g'
sed -i /boot/grub2/grub.cfg -e 's/OLD_VG_NAME/NEW_VG_NAME/g'

Run dracut to modify boot images:

dracut -f

Remove your recovery boot CD and reboot into your newly fixed VM

exit
reboot

If you want to avoid having to boot into a recovery environment, do the following steps on the machine whose VG you want to rename:

Rename the volume group:

vgrename OLD_VG_NAME NEW_VG_NAME

Modify necessary boot files to reflect the new name:

sed -i /etc/fstab -e 's/OLD_VG_NAME/NEW_VG_NAME/g'
sed -i /etc/default/grub -e 's/OLD_VG_NAME/NEW_VG_NAME/g'
sed -i /boot/grub2/grub.cfg -e 's/OLD_VG_NAME/NEW_VG_NAME/g'

Rebuild boot images:

dracut -f
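Before rebooting, it can be worth confirming that no reference to the old name survived in the files you edited (a quick sanity check, not part of the original forum instructions):

grep OLD_VG_NAME /etc/fstab /etc/default/grub /boot/grub2/grub.cfg

If that returns nothing, the rename is complete and the machine should come back up cleanly.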

Find top 10 requests returning 404 errors

I had a website where I was curious which top 10 URLs were returning 404s, along with how many hits those URLs got. This was after a huge site redesign, so I was curious which old links were still being accessed.

Getting a report on this can be accomplished with nothing more than the Linux command line and the log file you’re interested in. It involves combining grep, sed, awk, sort, uniq, and head commands. I enjoyed how well these tools work together so I thought I’d share. Thanks to this site for giving me the inspiration to do this.

This is the command I used to get the information I wanted:

grep '404' _log_file_ | sed 's/, /,/g' | awk '{print $7}' | sort | uniq -c | sort -n -r | head -10

Here is a rundown of each command and why it was used:

  • grep '404' _log_file_ (replace _log_file_ with the filename of your apache, tomcat, or varnish access log). grep reads a file and returns all lines containing what you want; in this case I’m looking for the number 404 (page not found HTTP error).
  • sed 's/, /,/g' Sed will edit a stream of text in any way that you specify. The command I gave it (s/, /,/g) tells sed to look for instances of a comma followed by a space and replace them with just a comma (eliminating the space after any comma it sees). This was necessary in my case because sometimes the source IP address field has multiple IP addresses, which messed up the results. This may be optional if your server isn’t sitting behind any type of reverse proxy.
  • awk '{print $7}' Awk has a lot of functions similar to sed – it allows you to do all sorts of things to text. In this case we’re telling awk to only display the 7th column of information (the requested URL is the 7th column in apache and varnish logs).
  • sort This command (absent of arguments) sorts our results alphabetically, which is necessary for the next command to work properly.
  • uniq -c This command collapses any duplicates in the results. The -c argument adds a number indicating how many times that unique string was found.
  • sort -n -r Sorts the results again, this time by the count uniq just added. The -n argument sorts numerically so that 2 follows 1 instead of 10; -r reverses the order so the highest number is at the top of the results instead of the default, which is to put the lowest number first.
  • head -10 Outputs the top 10 results. This command is optional if you want to see all the results instead of the top 10. A similar command is tail, if you want to see the last results instead.

This was my output – exactly what I was looking for. Perfect.

2186 http://<sitename>/source/quicken/index.ini
2171 http://<sitename>/img/_sig.png
1947 http://<sitename>/img/email/email1.aspx
1133 http://<sitename>/source/quicken/index.ini
830 http://<sitename>/img/_sig1.png
709 https://<sitename>/img/email/email1.aspx
370 http://<sitename>/apple-touch-icon.png
204 http://<sitename>/apple-touch-icon-precomposed.png
193 http://<sitename>/About-/Plan.aspx
191 http://<sitename>/Contact-Us.aspx
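One caveat: grep '404' matches 404 anywhere on the line, so a URL or byte count containing 404 would also be counted. If that is a concern, awk can filter on the status code field itself; in the common Apache combined log format the status code is field 9, though your log format may differ:

sed 's/, /,/g' _log_file_ | awk '$9 == 404 {print $7}' | sort | uniq -c | sort -n -r | head -10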

Change the hostname on a Splunk Indexer

Recently I set out to change the hostname on a Splunk indexer. It should be pretty easy, right? Beware. It can be pretty nasty! Below is my experience.

I started with the basics.

  • hostname command
    hostname <newhostname>
  • Modify /etc/sysconfig/network to make it persistent (CentOS specific)
    sed -i 's/<old hostname>/<new hostname>/g' /etc/sysconfig/network
  • Inform Splunk of the hostname change
    sed -i 's/<old hostname>/<new hostname>/g' $SPLUNK_HOME/etc/system/local/server.conf
  • Restart Splunk

Sadly, that wasn’t the end of it. I noticed right away Splunk complained of a few things:

TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 300 seconds.
WARN TcpOutputFd - Connect to 10.0.0.10:9997 failed. Connection refused

Running

netstat -an | grep LISTEN

revealed that the server was not even listening on 9997 like it should be. I found this answer indicating it could be an issue with DNS tripping up on that server. I edited $SPLUNK_HOME/etc/system/local/inputs.conf with the following:

[splunktcp://9997]
connection_host = none

but I also noticed that a short time after running it, the server was no longer listening on 9997. Attempting to telnet from the forwarder to the indexer in question revealed the same behavior – it worked at first, then quit working. Meanwhile no events were getting stored on that indexer.

I was pulling my hair out trying to figure out what was happening. Finally I discovered this gem on Splunk Answers:

Are you using the deployment server in your environment? Is it possible your forwarders’ outputs.conf got deployed to your indexer?

On the indexer:
./splunk cmd btool outputs list --debug

Sure enough! After running

./splunk cmd btool outputs list --debug

I discovered this little gem of a stanza:

/opt/splunk/etc/apps/APP_Forwarders/default/outputs.conf [tcpout]

That shouldn’t have been there! Digging into my deployment server I discovered that I had a server class with a blacklist, that is, it included all deployment clients except some that I had listed. The blacklist had the old hostname, which meant that when I changed the indexer’s hostname it no longer matched the blacklist and was therefore deployed a forwarder’s configuration, causing a forwarding loop. My indexer was forwarding everything it received from the forwarder right back to the forwarder, causing Splunk to shut down port 9997 on the offending indexer completely.

After getting all that sorted out I noticed Splunk was only returning search results from the indexers whose hostnames I had not changed. Everything looked good in the distributed search arena – status was OK on all indexers; yet I still was not getting any results from the indexer whose name I had changed, even though it was receiving events! This was turning into a problem. It was creating a blind spot.

Connections were great, search status great, deployment status good… I didn’t know what else to do. I finally thought to restart Splunk on the search head that had been talking to the server whose name I changed. Success! Something in the search head must have made it blind to the indexer once its name had changed. Simply restarting Splunk on the search head fixed it.
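For reference, the search peer cleanup can also be done from the search head’s CLI instead of the web UI. This is only a sketch; the hostnames, port, and credentials are placeholders, and the exact argument format varies between Splunk versions, so check splunk help for the syntax on your version:

$SPLUNK_HOME/bin/splunk remove search-server https://<old-indexer>:8089 -auth admin:changeme
$SPLUNK_HOME/bin/splunk add search-server https://<new-indexer>:8089 -auth admin:changeme -remoteUsername admin -remotePassword changeme
$SPLUNK_HOME/bin/splunk restart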

In short, if you’re crazy enough to change the name of one of your indexers in a distributed Splunk environment, make sure you do the following:

  • Change hostname on the OS
  • Change serverName in Splunk config files
    • Add connection_host = none in inputs.conf (optional?)
  • Clean up your deployment server
    • Delete old hostname from clients phoning home
    • MAKE SURE the new hostname won’t be sucked up into an unwanted server class
  • Clean up your search head
    • Delete old hostname search peer
    • Add new hostname search peer
    • Restart search head
  • Profit

Upgrade Linux Mint 16 to 17.1

I realized recently that my desktop system is quite out of date. It has worked so well for so long that I didn’t notice for a while that it had reached end of support. I was running Linux Mint 16 – Petra.

Thanks to this site the upgrade was fairly painless – a few repository updates, an upgrade, and a reboot. Simple! The steps I took are below.

Update all repositories

Use sed in conjunction with find to quickly and easily update all your repository files from saucy to trusty, and from petra to rebecca, making a backup of files modified.

sudo find /etc/apt/sources.list.d/ -type f -exec sed --in-place=.bak 's/saucy/trusty/' {} \;
sudo find /etc/apt/sources.list.d/ -type f -exec sed --in-place=.bak 's/petra/rebecca/' {} \;
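Before kicking off the update you can confirm nothing was missed (a quick check; on Mint the official repositories live under /etc/apt/sources.list.d/, but this also covers the main sources.list just in case):

grep -r 'saucy\|petra' /etc/apt/sources.list /etc/apt/sources.list.d/

If this prints anything, those files still reference the old release names.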

Update your system

This took a while. It had to download 1.5GB of data and install it.

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get upgrade

Cleanup

After running the upgrade I got a notice that many packages were installed but no longer required. To remove unnecessary packages after the upgrade:

sudo apt-get autoremove

Install new language settings:

sudo apt-get install mintlocale

Install gvfs-backends (for Thunar)

sudo apt-get install gvfs-backends

Reboot

Flawless! It worked on the first try. Awesome.

Manipulate dd output with sed, tr, and awk

While improving a backup script I came across a need to modify the information output by the dd command. FreeNAS, the system the script runs on, does not have a version of dd that has the human readable option. My quest: make the output from dd human readable.

I’ve only partially fulfilled this quest at this point, but the basic functionality is there. The first step is to redirect all output of the dd command (both stdout and stderr) to a variable (this particular syntax is for bash)

DD_OUTPUT=$(dd if=/dev/zero of=test.img bs=1000000 count=1000 2>&1)

The 2>&1 redirects all output to the variable instead of the console.

The next step is to use sed to remove unwanted information (the records in / records out lines)

sed -r '/.*records /d'

Next, remove any parentheses. This is necessary because the parentheses in the dd output mess up the math that’s going to be done later.

tr -d '()'

-d ‘()’ simply tells tr to remove any instance of the characters between the single quotes.

And now, for awk. Awk allows you to manipulate certain pieces of a text file. Each piece of a line, separated by whitespace, is known as a field (each line is a record). Awk lets us edit some parts of the text while keeping others intact. My expertise with awk is that of a n00b so I’m sure there is a more efficient way of doing this; nevertheless here is my solution:

awk '{print ($1/1024)/1024 " MB" " " $3 " " $4 " " $5/60 " minutes"  " (" ($7/1024)/1024 " MB/sec)" }'

Since dd outputs its information in bytes I’m having it divide by 1024 twice so the resulting number is in megabytes. I also have it divide the seconds by 60 to return the number of minutes dd took. Additionally I’m re-inserting the parentheses removed by the tr command now that the math is correctly done.

The last step is to pipe all of this together:

DD_OUTPUT=$(dd if=/dev/zero of=test.img bs=1000000 count=10000 2>&1)
DD_HUMANIZED=$(echo "$DD_OUTPUT" | sed -r '/.*records /d' | tr -d '()' | awk '{print ($1/1024)/1024 " MB" " " $3 " " $4 " " $5/60 " minutes"  " (" ($7/1024)/1024 " MB/sec)" }')

After running the above here are the results:

Original output:

echo "$DD_OUTPUT"
10000+0 records in
10000+0 records out
10000000000 bytes transferred in 103.369538 secs (96740299 bytes/sec)

Humanized output:

echo "$DD_HUMANIZED"
9536.74 MB transferred in 1.72283 minutes (92.2587 MB/sec)

The printf function of awk would allow us to round the calculations but I couldn’t quite get the syntax to work and have abandoned the effort for now.
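For what it’s worth, a printf-based version might look something like the line below. This is only a sketch I haven’t tested against FreeNAS’s awk, so treat it as a starting point rather than a drop-in replacement:

awk '{printf "%.2f MB %s %s %.2f minutes (%.2f MB/sec)\n", $1/1024/1024, $3, $4, $5/60, $7/1024/1024}'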

Of course all this is not necessary if you have the correct dd version. The GNU version of dd defaults to human readability; certain BSD versions have the option of passing the msgfmt=human argument per here.


Update: I discovered that the awk method above will only print the one line it finds and ignore all other lines, which is less than ideal for scripting. I updated the awk syntax to do a search and replace (sub) instead so that it will print all other lines as well:

awk '{sub(/.*bytes /, $1/1024/1024" MB "); sub(/in .* secs/, "in "$5/60" mins "); sub(/mins .*/, "mins (" $7/1024/1024" MB/sec)"); print}'

My new all in one line is:

DD_HUMANIZED=$(echo "$DD_OUTPUT" | sed -r '/.*records /d' | tr -d '()' | awk '{sub(/.*bytes /, $1/1024/1024" MB "); sub(/in .* secs/, "in "$5/60" mins "); sub(/mins .*/, "mins (" $7/1024/1024" MB/sec)"); print}')

Verify backup integrity with rsync, sed, cat, and tee

Recently it became apparent that a large data transfer I did might have had some errors. I wanted to find an easy way to compare the source and destination to make sure that they were identical. My solution: rsync, sed, cat and tee

I have used rsync quite a bit but did not know about the --checksum flag until recently. When you run rsync with --checksum, it takes much longer, but it effectively does something similar to what a ZFS scrub does – it runs a checksum of every source file and compares it with the checksum of each destination file. If there is a mismatch, rsync will overwrite the destination file with the source file to correct it.

In my situation I performed a large data migration from my old mdadm-based RAID array to my brand new ZFS array. During the transfer the disks were acting very strange, and at one point one of the disks even popped out of the array. The culprit turned out to be a faulty SATA controller. I had bought a cheap 4 port SATA controller from Amazon for my new ZFS array. Do not do this! Spend the extra cash on a better controller. The cheap ones, this one at least, only cause headaches. I removed it and used the on-board SATA ports on my motherboard, and the issues went away.

All of those shenanigans made me wonder if there was corrupt data on my new ZFS array. A ZFS scrub repaired 15.5G of data! While I’m sure that fixed a lot of the issues, I realized there probably was still some corruption. This is how I verified it:

rsync -Pahn --checksum /path/to/source /path/to/destination | tee migration.txt

-P shows progress, -a means archive, -h is for human readable measurements, and -n means dry run (don’t actually copy anything)

Tee is a cool utility that allows you to redirect output of a command both to a file and to standard output. This is useful if you want to see the verification take place in real time but also want to analyze it later.

After the comparison (which took a while!) I wanted to see the discrepancies. The -P flag lists each directory rsync checks as well as each file it found a mismatch in. You can use sed in conjunction with cat to weed out the unwanted lines (directory listings) so that only the files with discrepancies are left.

cat migration.txt | sed '/\/$/d' | tee migration-truncated.txt

The sed regex simply looks for any line ending in a / (a directory listing) and removes it. What is left is the list of files in question. You can combine the entire thing into one line like so:

rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt
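As a variation, adding rsync’s -i (--itemize-changes) flag to the dry run prefixes each line with a short code showing why the file was flagged (checksum mismatch, size difference, timestamp, and so on), which can make the report easier to interpret later:

rsync -Pahni --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt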

In my case I wanted to review the discrepancies rsync found and decide whether I actually wanted to fix each one. If you are 100% sure the source is good and the destination should simply match it, you can have rsync repair the destination outright (deleting anything in the destination that is not in the source) by running

rsync -Pah --checksum --delete /path/to/source /path/to/destination