Use regex to select only certain files via s3cmd

Recently I wanted to download only a small subset of thousands of files from an S3 “folder.”

The s3cmd docs were lacking with regard to how to select only a subset.

This is how I downloaded only log files from 11PM:
s3cmd get --recursive --rinclude ".*2014-10-28-23.*" --rexclude ".*" s3://my-bucket-of-logs/folder/2014-10-28/

Note that I used both --rinclude and --rexclude. If you use only --rinclude (or --include), s3cmd does not restrict the download to the files you specified; it only ensures that those files are not excluded. So you have to exclude everything first (--rexclude ".*"), then include only what you want (--rinclude ".*2014-10-28-23.*").
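The exclude-then-include logic can be tried out locally without touching S3. The snippet below is only a rough model, not s3cmd's actual code: select_keys is a hypothetical helper, the key names are made up, and it assumes a key is skipped only when it matches the exclude pattern and does not match the include pattern. s3cmd itself uses Python regular expressions, so grep -E is just an approximation that happens to agree for simple patterns like these.

```shell
# Hypothetical helper modelling the filter: drop a key only when it matches
# the exclude regex AND fails to match the include regex.
select_keys() {
  exclude="$1"
  include="$2"
  while IFS= read -r key; do
    if printf '%s\n' "$key" | grep -Eq -- "$exclude" &&
       ! printf '%s\n' "$key" | grep -Eq -- "$include"; then
      continue  # excluded and not rescued by the include pattern
    fi
    printf '%s\n' "$key"
  done
}

# Made-up key names, for illustration only.
sample_keys='folder/2014-10-28/2014-10-28-22-59.log
folder/2014-10-28/2014-10-28-23-05.log
folder/2014-10-28/2014-10-28-23-40.log'

# Exclude everything, then include only the 11PM files.
selected=$(printf '%s\n' "$sample_keys" | select_keys '.*' '.*2014-10-28-23.*')
printf '%s\n' "$selected"

# Include on its own restricts nothing: '^$' stands in for "no exclude"
# because it matches no non-empty key, so every key passes through.
unfiltered=$(printf '%s\n' "$sample_keys" | select_keys '^$' '.*2014-10-28-23.*')
```

Under those assumptions, the first call keeps only the keys containing 2014-10-28-23, while the second keeps every key, which mirrors what I saw when using --rinclude alone.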

3 Replies to “Use regex to select only certain files via s3cmd”

  1. Hi Garren,

    I am trying to do a similar task, but it doesn't seem to work for me.

    $ s3cmd ls –recursive –rexclude ‘.*’ –rinclude ‘.*logs_.*_20150505.*’ s3://data/ | wc –l

    any thoughts or corrections?

    1. Hi Narayan,

      What message(s), if any, do you get when running the command? The command requires two hyphens before each option (so --recursive instead of -recursive), but that could just be a formatting issue with WordPress or something.

      What/how many records are you expecting to be returned and what (if any) records are found?

      So I just ran my same command from above and then modified it to be like yours (using ls).

      What I found fascinated me…

      Firstly, here’s the command I used in similar vein to yours (with one key exception):
      s3cmd get --recursive --rexclude ".*" --rinclude ".*2014-10-28-23.*specific_log_type.*" s3://my-bucket-of-logs/folder/2014-10-28/

      The key part here is that this command worked because I was using a get and not an ls. Running that same command as ls resulted in getting the entire directory listing, not excluding anything nor restricting to just my rinclude.

      Takeaway:
      s3cmd with ls does not seem to actually respect --rinclude or --rexclude. These commands were tested on s3cmd 1.5.2. To further support my concern that s3cmd accepts options on commands that do not actually support them, I passed --force (valid for get) to an ls, even though --force makes no sense for ls.

  2. Hi Garren,

    Thanks for the reply. Yup, I used two hyphens, not one.

    Seems like those options (--rinclude and --rexclude) only work with "s3cmd get" and not "s3cmd ls". Thank you.
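
Since ls appeared to ignore these options, one possible workaround is to filter the listing on the client side with grep. The sketch below is an assumption-laden illustration rather than something tested in the thread: list_keys is a hypothetical stand-in for the real listing command, and its output lines are invented sample data so the example runs without S3 access.

```shell
# Hypothetical stand-in for `s3cmd ls --recursive s3://data/`.
# The listing lines are invented sample data.
list_keys() {
  cat <<'EOF'
2015-05-05 01:10      1024   s3://data/app/logs_web_20150505.gz
2015-05-05 01:12      2048   s3://data/app/logs_api_20150505.gz
2015-05-06 01:10      4096   s3://data/app/logs_web_20150506.gz
2015-05-05 02:00       512   s3://data/app/metrics_20150505.gz
EOF
}

# Filter the listing locally instead of relying on --rinclude/--rexclude.
pattern='logs_.*_20150505'
matches=$(list_keys | grep -E -- "$pattern")
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

printf '%s\n' "$matches"
echo "matching objects: $match_count"
```

Against a real bucket, the body of list_keys would be replaced by s3cmd ls --recursive s3://data/. Note that grep sees the whole listing line (date, size, and key), so the pattern should be specific enough not to match the date column by accident.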
