The ugrep file pattern searcher

a more powerful, ultra fast, user-friendly, compatible grep (that is also completely free!)

ugrep release 5.1

[Screenshot: ugrep with TUI]

Search with a TUI (shown) or from the command line with grep-compatible options. You can also run Google-like Boolean searches (shown) and fuzzy search your files. Search (nested) zip/7z/tar/pax/cpio archives, tarballs and compressed files, search and hexdump binary files, search PDF, doc and docx documents, and much more.

 

Commands

Search for patterns in files with the ug and ugrep commands, where

ug
for user-friendly interactive use, with an optional .ugrep configuration file with your preferences located in the working directory or in your home directory;
the ug+ command also searches pdfs, documents, e-books, image metadata
ug --save-config OPTIONS
saves a new .ugrep file in the working directory using the current .ugrep configuration and by copying the relevant OPTIONS (if any) to the new .ugrep file
ugrep
same as ug, but does not use a .ugrep configuration file: ugrep works best in shell scripts;
the ugrep+ command also searches pdfs, documents, e-books, image metadata

Examples:

ug PATTERN
recursively search for files matching PATTERN
ug PATTERN FILE
search lines in FILE matching PATTERN
ug PATTERN DIR
search files in DIR matching PATTERN, excluding sub-directories (just as ls DIR lists only the contents of DIR)
ug -r PATTERN DIR
recursively search files in DIR matching PATTERN, excluding symlinks
ug -rS PATTERN DIR
recursively search files in DIR matching PATTERN, including symlinks to files (option -S), but not to directories
ug -R PATTERN DIR
recursively search files in DIR matching PATTERN, including symlinks to files and directories
ug -3 PATTERN DIR
recursively search files in DIR matching PATTERN as -r, but up to 3 levels, i.e. DIR/, DIR/one/, and DIR/one/two/
ug --save-config --ignore-files --ignore-binary --no-tree
save a .ugrep configuration file that lets ug obey .gitignore rules in recursive searches, ignore binary files (-I) in searches and turn off directory tree output
ug -%% -j -w -Q
recursively Google search files (option -%%) with smart ignore case (option -j) regex patterns matching words (option -w) in the interactive query TUI (option -Q)
  1. the regex syntax is the standard POSIX ERE, same as egrep, but supporting Unicode
  2. patterns match Unicode and may include newline breaks \n and \R to match multiple lines as a single match; some examples: the pattern "foo.*\n.*\n.*baz" matches a line with foo, a second line and a third line with baz, the pattern "foo(.*\n)*?.*bar" lazily matches one or more lines from foo to bar
  3. quote "PATTERN" or 'PATTERN' to prevent globbing of the pattern by the shell that may expand *, ? and [a-z] into pathnames
  4. Windows Command Prompt does not parse ' to quote patterns; you must use " instead
  5. Windows PowerShell does not parse "" (empty pattern); you must specify --match instead
  6. an empty pattern "" matches every line, same as option --match
  7. multiple FILE and DIR pathname arguments may be specified as search targets; if none are provided, the working directory is recursively searched
  8. standard input is searched when it is not a terminal, for example when it is a pipe or a redirect
  9. to replace grep: alias grep='ug -G'; alias egrep='ug -E'; alias fgrep='ug -F' or copy/symlink ugrep to grep, egrep and fgrep; it emulates according to these names
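
The multi-line patterns in note 2 can be tried outside ugrep. The sketch below uses Python's re module as a stand-in, which is an assumption: it is not ugrep's regex engine, but for these two patterns its . also stops at a newline. The sample text is made up.

```python
import re

# Made-up sample input of four lines
text = "foo 1\nline 2\nbaz 3\nbar 4\n"

# foo on one line, any second line, baz on the third line
three_lines = re.search(r"foo.*\n.*\n.*baz", text)

# lazily extend the match line by line from foo until a line contains bar
foo_to_bar = re.search(r"foo(.*\n)*?.*bar", text)

print(repr(three_lines.group()))
print(repr(foo_to_bar.group()))
```

Each result is a single match that spans several lines, which is what the \n in the pattern makes possible.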
 

Options

Ugrep is compatible with GNU grep and supports GNU grep command-line options, but it also offers new options and features. In general, command-line options can be mixed and specified in any order. A long option --OPTION can be disabled by writing it as --no-OPTION. All short options have long alternatives. This page shows short options for the most part. Specify --stats to output a final summary search report of options, patterns, and search statistics.

List matching files

-l
list matching files
-l -m5,
list files that have at least 5 matching lines (-m5, with the comma is the same as --min-count=5)
-l --max-files=3
list only the first 3 matching files
-L
list non-matching files, same as -lv i.e. option -v inverts matching
-c
count matching lines in files
-cv
count non-matching lines in files; option -v inverts matching
-cu
count all pattern matches by ungrouping multiple matches from lines (option -u)
-cm1,
count matching lines in files, but skip files with zero matches (-m1, with comma is --min-count=1)
  1. if you never want -c to output zero match counts, then add min-count=1 to your ~/.ugrep file (outputting zero match counts is a GNU grep behavior)
  2. to disable directory tree-based listings, specify --no-tree or permanently add no-tree to your ~/.ugrep file
  3. listings are sorted by name; to sort by date/time or by size, specify --sort=changed or --sort=size
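
The difference between the counting options is easy to lose in a table. The following is a minimal Python model of the ideas behind -c, -cv, -cu and -cm1, and not ugrep's code; the function names and the sample data are invented for illustration.

```python
import re

def count_lines(pattern, text, invert=False):
    """Count lines that match, or lines that do not match when invert is set."""
    rx = re.compile(pattern)
    return sum(1 for line in text.splitlines() if bool(rx.search(line)) != invert)

def count_matches(pattern, text):
    """Count every separate match, several per line if need be."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(line)) for line in text.splitlines())

def drop_below(counts, min_count=1):
    """Keep only the files whose count reaches min_count."""
    return {name: n for name, n in counts.items() if n >= min_count}

sample = "one two one\nthree\none\n"
print(count_lines("one", sample))               # the -c idea
print(count_lines("one", sample, invert=True))  # the -cv idea
print(count_matches("one", sample))             # the -cu idea
print(drop_below({"a.txt": 2, "b.txt": 0}))     # the -cm1, idea
```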
 

Displaying matches, match info, match context

-H
always output the filename; normally, a filename is not output when searching a single specified file
-n
output the line number of a match
-k
output the column number of a match
-b
output the byte offset of a match
-u
ungroup multiple matches from lines to count and output each match separately
-C3
output matching lines with 3 lines as context before (option -B3) and after (option -A3)
-y
output matching lines with the rest of the file as context (--any-line or --passthru)
-o
output only the matching part
-o -C20
output only the matching part, with context from the same line before (option -B20) and after (option -A20) the match to fit 40 characters
--width
truncate lines to the terminal window width; --width=40 truncates to 40 characters
 

Pattern matching modes

-F
search for matching strings, not regex patterns, like GNU fgrep or grep -F
-G
BRE pattern syntax, like GNU grep or grep -G
-P
Perl regex pattern search with PCRE, see also ug --help regex
-Z
fuzzy search with the default ERE pattern syntax
-U
non-Unicode ASCII/binary search like GNU grep; patterns such as \xa3 match a byte, not the U+00a3 multi-byte code point
-Y
empty-matching patterns such as x*y*z* match all lines like GNU grep, instead of returning useful matches
-i
ignore case in matching patterns
-j
smart ignore case, enables -i when patterns are specified in lower case
-w
patterns must match as words and not be part of words
-x
patterns must match whole lines from start to end
-v
invert pattern matching; output lines that do not match
-e PATTERN
explicitly specify PATTERN; -e is used to specify multiple patterns and when specifying a pattern after the FILE argument
-N PATTERN
do not match PATTERN when combined with -e; for example -e "[0-9]+" -N "0+" matches nonzero numbers
-f FILE
read (additional) patterns from FILE
-f cpp/names
if cpp/names is not a local file, then read built-in C++ name-matching patterns (installed in /.../share/ugrep/patterns/cpp/names)
 

The interactive TUI

-Q
start TUI to specify search patterns and options interactively
-Q -e PATTERN
start TUI and search for PATTERN
  1. additional options, files and directories can be specified on the command line to start the TUI search
  2. ALT-key toggles the option letter corresponding to the key press, for example ALT-L lists matching files (option -l) and SHIFT-ALT-C shows context (option -C3)
  3. ALT-key in the macOS Terminal is OPTION-key when "Use Option as Meta key" is enabled in Terminal Preferences/Profiles/Keyboard
  4. navigate to directories and files with Tab, then SHIFT-Tab to go back and restore previous options and patterns (if changed)
  5. use the cursor keys, PgUp, PgDn and the scroll wheel to scroll the search results
  6. CTRL-S jumps to the next directory or file in the results, CTRL-W jumps back
  7. CTRL-T toggles the split screen file viewer; option --split starts the TUI with the split screen
  8. CTRL-Y displays a file in a pager
  9. CTRL-Z displays help and the active search options
  10. ENTER enters output selection mode to select lines to output when exiting the TUI (selections are kept until TUI exits or until a new search is performed)
 

Googling files

-% "foo bar"
search files for lines matching both regex patterns foo and bar anywhere on the same line
-%% "foo bar"
find files matching both patterns foo and bar anywhere in the same file (-%% is the same as --bool --files)
-% "foo -bar"
search files for lines matching pattern foo that do not match bar anywhere on the same line
-%% "foo -bar"
find files matching pattern foo that do not match bar anywhere in the same file (-%% is the same as --bool --files)
-% "foo bar|baz"
search files for lines matching both patterns foo and bar|baz anywhere on the same line
-% "foo -(bar|baz)"
search files for lines matching pattern foo that do not match bar|baz anywhere on the same line
-% "foo AND NOT (bar OR baz)"
same as above, this time using AND-OR-NOT operators
-% "foo -bar -baz"
same as above, in normalized form (ugrep's internal CNF)
-% 'foo "-bar baz"'
search files for lines matching both patterns foo and -bar baz, where "-bar baz" is quoted to match literally "as is"
-F -% "*foo* bar?"
search files for lines matching both fixed (option -F) strings *foo* and bar? anywhere on the same line
  1. option -% (--bool) can be combined with any pattern matching modes -F, -G, -P, -Z and other options
  2. operators AND, OR and NOT may also be used when properly spaced
  3. white space is a logical AND (lowest precedence)
  4. a | is a logical OR (taking higher precedence than AND)
  5. white space followed by a - is a logical NOT (taking highest precedence)
  6. quote strings in a pattern with " to match literally "as is"
  7. group patterns with ( ) parentheses
  8. the default search mode is --lines to match lines; option --files switches the search mode to find files
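
To make the precedence rules and the --lines versus --files distinction concrete, here is a small Python model. It is a sketch under assumptions: queries are written out by hand in conjunctive normal form (the form the table calls ugrep's internal CNF) instead of being parsed, Python's re stands in for the pattern matcher, and all names are hypothetical.

```python
import re

# A query in CNF: a list of AND-ed clauses, each clause a list of OR-ed
# (pattern, negated) terms.
FOO_BAR_OR_BAZ = [[("foo", False)], [("bar", False), ("baz", False)]]  # foo bar|baz
FOO_NOT_BAR_BAZ = [[("foo", False)], [("bar", True)], [("baz", True)]]  # foo -bar -baz

def satisfies(cnf, text):
    """True if every clause has at least one term that holds for text."""
    return all(
        any(bool(re.search(pattern, text)) != negated for pattern, negated in clause)
        for clause in cnf
    )

def matching_lines(cnf, text):
    """Model of --lines: the query must hold on a single line."""
    return [line for line in text.splitlines() if satisfies(cnf, line)]

def file_matches(cnf, text):
    """Model of --files: the query must hold for the file as a whole."""
    return satisfies(cnf, text)

sample = "foo bar\nfoo qux\nbaz\n"
print(matching_lines(FOO_BAR_OR_BAZ, sample))
print(matching_lines(FOO_NOT_BAR_BAZ, sample))
print(file_matches(FOO_NOT_BAR_BAZ, sample))
```

The second query selects the line foo qux, yet the same query rejects the file as a whole because bar occurs elsewhere in it.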
 

Fuzzy search

-Z
approximately match patterns up to one extra, missing or replaced character in the input
-Z2
approximately match patterns up to two extra, missing or replaced characters in the input
-Z+2
approximately match patterns up to two extra characters in the input
-Z-2
approximately match patterns up to two missing characters in the input
-Z~2
approximately match patterns up to two replaced characters in the input
-Z+-2
approximately match patterns up to two extra or missing characters in the input
-Z+-~2
same as -Z2: approximately match patterns up to two extra, missing or replaced characters in the input
-c -Z
count approximate matches in files
-c -Zbest2
count -Z2 approximate matches in files, but only keep the best matches, i.e. if a file has at least one exact match, then only exact matches are counted
-c -Zbest2 --sort=best
count the best approximate matches in files and sort by best matching files for each (sub)directory searched
  1. the first character or characters that make up a pattern always match; to approximately match the first character(s), replace it with a . or .?
  2. no whitespace may be given between -Z and its argument
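
The three kinds of edits that -Z distinguishes can be modelled with a small edit-distance routine. This is only a sketch: it compares a literal pattern with a whole string, whereas ugrep matches regex patterns inside lines, and it ignores the rule in note 1 about the first character. The function names are invented.

```python
INF = float("inf")

def edit_cost(pattern, text, extra=True, missing=True, replaced=True):
    """Fewest allowed edits that turn pattern into text, or INF if impossible."""
    m, n = len(pattern), len(text)
    # dist[i][j] is the cheapest way to turn pattern[:i] into text[:j]
    dist = [[INF] * (n + 1) for _ in range(m + 1)]
    dist[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            cur = dist[i][j]
            if cur == INF:
                continue
            if i < m and j < n:
                if pattern[i] == text[j]:
                    dist[i + 1][j + 1] = min(dist[i + 1][j + 1], cur)
                elif replaced:  # a replaced character in the input
                    dist[i + 1][j + 1] = min(dist[i + 1][j + 1], cur + 1)
            if extra and j < n:  # an extra character in the input
                dist[i][j + 1] = min(dist[i][j + 1], cur + 1)
            if missing and i < m:  # a character missing from the input
                dist[i + 1][j] = min(dist[i + 1][j], cur + 1)
    return dist[m][n]

def fuzzy_equal(pattern, text, max_edits=1, **allowed):
    return edit_cost(pattern, text, **allowed) <= max_edits

print(fuzzy_equal("foobar", "fobar"))  # one missing character, like -Z
print(fuzzy_equal("foobar", "fobar", max_edits=2, missing=False, replaced=False))  # like -Z+2
```

With only extra characters allowed, as in -Z+2, the input fobar is rejected because it lacks a character instead of having one too many.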
 

Archives and compressed files

-z
also search zip/7z/tar/pax/cpio archives, tarballs and gz/Z/bz/bz2/lzma/xz/lz4/zstd/brotli compressed files
-z --zmax=2
also search archives, tarballs and compressed files stored within archives (max 2 levels)
-z -I --zmax=2
same as above, but ignore binary files and also those in (nested) archives and compressed files
-z -tc,cpp
search C and C++ source code files and also those in archives, see also ug -tlist for a list of file types
-z -g"*.txt,*.md"
search files matching the globs *.txt and *.md and also those in archives, see also ug --help globs
-z -g"^bak/"
exclude all bak directories from the search and skip those in archives, see also ug --help globs
 

Binary files and devices

-I
ignore binary files and exclude them from searches
-W
hexdump the binary matches, keeping text matches as is
-X
hexdump all matches
-UX
hexdump with 8-bit binary patterns instead of Unicode character-based patterns (option -U)
--hexdump=4a
hexdump in 4 columns and output a * for hex lines that are identical to the previous line (a)
--hexdump=4ch
hexdump in 4 columns, no character column (c), no hex spacing (h)
--hexdump=4aC3
hexdump in 4 columns with 3 hex lines of context before and after (C3 or B3A3)
-Dread
also read special devices to search them; danger: can get stuck on a non-responsive device!
 

Exclusions and inclusions

-@
(--all) search all files except hidden: cancel previous restrictions; restrictions specified after this option are still applied, e.g. -@I searches all non-binary files
-.
(--hidden) include hidden files in searches; normally, hidden files are excluded from searching
-I
ignore binary files and exclude them from searches
-p
never follow symlinks, even when specified on the command line
-r
search recursively without following symlinks
-rS
search recursively following symlinks to files, but not to directories (option -S)
-R
search recursively following symlinks to files and directories
-tc,cpp
only search C and C++ source code files, see also ug -tlist for a list of file types
-Ohpp,cpp
shorthand for -g"*.hpp,*.cpp" with filename extension globs to search .hpp and .cpp files
-g"*.hpp,*.cpp"
only search .hpp and .cpp files with the specified glob patterns, see also ug --help globs
-g"src/"
only recursively search src directories with the specified glob pattern ending in a / for directories, see also ug --help globs
-g"^*.txt,^bak/"
do not search .txt files and bak directories with the specified negated glob patterns, see also ug --help globs
--iglob="^*.txt,^bak/"
same as above, but with case-insensitive glob matching (option --glob-ignore-case applies to all globs)
-K10,99
only search files from line 10 up to and including line 99
-m1
output only the first matching line (same as --max-count=1)
-m2,9
only search files with at least two matching lines and output up to and including 9 matching lines
-m2,9 -u
only search files with at least two matches and output up to and including 9 matches
-3
recursively search up to three directory levels deep, i.e. one/, one/two/, and one/two/three/
-2-3
only recursively search sub-directories at two to three levels deep, i.e. one/two/, and one/two/three/
--max-files=3
only return matches for the first three matching files (in the current --sort order)
--ignore-files
obey .gitignore rules in recursive searches
--exclude-fs=PATH
do not search the file system associated with PATH (a mounted directory or mount point)
--exclude-fs
only descend into the file systems associated with the specified file and directory search targets, exclude all others
--include-fs=.
only search the file system associated with . (. is the PATH), i.e. ignores all mounted and special devices
--exclude-from=FILE
do not search the files and directories specified as globs in FILE, see also ug --help globs
--include-from=FILE
only search the files and directories specified as globs in FILE, see also ug --help globs
--filter="COMMANDS"
filter files first before searching them by executing a utility on a file based on its type, see also ug --help filter
  1. to let ug ignore binary files by default, add ignore-binary to your ~/.ugrep file
  2. to let ug obey .gitignore rules in recursive searches by default, add ignore-files to your ~/.ugrep file
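
The line-range option -K and the MIN,MAX form of -m interact in a way that a short model may clarify. The Python sketch below is an assumption-laden illustration, not ugrep's implementation: it assumes the line range is applied before matches are counted, and the function name and keyword arguments are hypothetical.

```python
import re

def search(text, pattern, k=(1, None), m=(1, None)):
    """Return (line number, line) pairs, modelling -KFIRST,LAST and -mMIN,MAX."""
    first, last = k
    min_count, max_count = m
    hits = [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if number >= first
        and (last is None or number <= last)
        and re.search(pattern, line)
    ]
    if len(hits) < min_count:
        return []  # too few matching lines: the file is skipped
    return hits[:max_count]  # a max_count of None keeps every hit

sample = "a1\nb\na2\na3\na4\n"
print(search(sample, "a", m=(2, 3)))  # like -m2,3
print(search(sample, "b", m=(2, 9)))  # like -m2,9
print(search(sample, "a", k=(2, 4)))  # like -K2,4
```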
 

Formatted output

--csv
CSV output format
--json
JSON output format
--xml
XML output format
--format="FORMAT"
custom output formatting, see also ug --help format
  1. formatting can be combined with other options, such as -n to include line numbers
 

Pretty things, more or less

--pretty
enable -n, -T, --color, --tree, --heading, --break and --sort when output is sent to a terminal
--tree
list files in a directory tree for options -l and -c
--heading
output the file name as a heading of a matching file
--break
output an empty line between matching files
-T
tabulate line and column numbers to add spacing
--color
colorize the output when displayed on a terminal (default)
--colors=COLORS
specify a color palette COLORS, see also ug --help colors
--hyperlink=+
embed hyperlinks in the output when sent to a terminal, with linked line/column numbers when =+ is specified
--pager
output to a pager, default is more or less
--pager=COMMAND
output to COMMAND as a pager
--tag
output matches as ___match___ instead of colorizing them, where --tag=TAG,TAG outputs TAGmatchTAG
--replace="FORMAT"
replace matches in the output by FORMAT, see also ug --help format
--separator=SEP
specify SEP to separate line and column numbers from the match
--group-separator=SEP
specify SEP to separate context for options -ABC
  1. ug enables --pretty and --sort by default
  2. ugrep only enables --color by default
 

Getting help

--help WHAT
display help on WHAT you are looking for
--help count
display help on option -c (--count) and -m [MIN,][MAX] (--min-count=MIN, --max-count=MAX)
--help regex
display help with details on regex patterns
--help globs
display help with details on glob patterns, e.g. for option -g
 

Regex

. any character except \n
a the character a
ab the string ab
a|b a or b
a* zero or more a's
a+ one or more a's
a? zero or one a
a{3} 3 a's
a{3,} 3 or more a's
a{3,7} 3 to 7 a's
a*? zero or more a's lazily
a+? one or more a's lazily
a?? zero or one a lazily
a{3}? 3 a's lazily
a{3,}? 3 or more a's lazily
a{3,7}? 3 to 7 a's lazily
\. escapes . to match .
\Q...\E the literal string ...
\f form feed
\n newline
\r carriage return
\R any Unicode line break
\t tab
\v vertical tab
\X any character and \n
\cZ control character ^Z
\0 NUL
\0ddd octal character code ddd
\xhh hex character code hh
\x{hhhh} Unicode code point U+hhhh
\u{hhhh} Unicode code point U+hhhh
[abc-e] one character a,b,c,d,e
[^abc-e] one char not a,b,c,d,e,\n
[[:alnum:]] a-z,A-Z,0-9
[[:alpha:]] a-z,A-Z
[[:ascii:]] ASCII char \x00-\x7f
[[:blank:]] space or tab
[[:cntrl:]] control characters
[[:digit:]] 0-9
[[:graph:]] visible characters
[[:lower:]] a-z
[[:print:]] visible chars and space
[[:punct:]] punctuation characters
[[:space:]] space,\t,\v,\f,\r
[[:upper:]] A-Z
[[:word:]] a-z,A-Z,0-9,_
[[:xdigit:]] 0-9,a-f,A-F
\p{Class} one character in Class
\P{Class} one char not in Class
\d a digit
\D a non-digit
\h a space or tab
\H not a space or tab
\s a whitespace except \n
\S a non-whitespace
\w a word character
\W a non-word character
^ begin of line anchor
$ end of line anchor
\A begin of file anchor
\Z end of file anchor
\b word boundary
\B non-word boundary
\< start of word boundary
\> end of word boundary
(?=...) lookahead (-P)
(?!...) negative lookahead (-P)
(?<=...) lookbehind (-P)
(?<!...) negative lookbehind (-P)
(...) capturing group (-P)
(...) non-capturing group (without -P)
(?:...) non-capturing group
(?<X>...) capturing, named X (-P)
\1 matches group 1 (-P)
\g{10} matches group 10 (-P)
\g{X} matches group name X (-P)
(?#...) comments ... are ignored
  1. (-P): this pattern requires option -P for PCRE (Perl regular expressions)
  2. ERE (Extended Regular Expression) syntax is the default regex pattern syntax of ugrep (as shown)
  3. BRE (Basic Regular Expression) syntax with option -G replaces | with \|, + with \+, ? with \?, ( ) with \( \), and { } with \{ \}
  4. (negated) character classes such as \s and [^abc-e] do not match a newline \n
  5. explicitly specify a \n or a \R in a pattern such as "go[\s\n]up" to match multiple lines as a single match
  6. specify option -P to match Unicode instead of ASCII word boundaries \b, \B, \< and \> (Unicode word boundary matching will likely become the default in a future update)
 

Globs

Ugrep supports gitignore-style globbing for all glob-related options -g, --iglob=, --exclude=, --include=, --include-dir=, --exclude-dir=, --include-from=, --exclude-from=, and --ignore-files, where

* matches anything except /
? matches any one character except /
[abc-e] matches one character a,b,c,d,e
[^abc-e] matches one character not a,b,c,d,e,/
[!abc-e] matches one character not a,b,c,d,e,/
/ when used at the start of a glob, matches the working directory
**/ matches zero or more directories on a path
/** when at the end of a glob, matches all paths after the /
\? matches a ? or any other character specified after the backslash
  1. to prevent the shell from expanding globs, you must quote globs like "*.cpp" in command-line options such as -g"*.cpp"
  2. a glob pattern starting with a ^ or a ! inverts matching: instead of matching a filename or directory name, the directory or file is ignored and excluded from the search
  3. when a glob pattern contains a /, the full pathname is matched, otherwise, the basename of a file or directory is matched in recursive searches
  4. when a glob pattern starts with a /, the glob matches files and directories from the working directory path, not recursively
  5. when a glob pattern ends with a /, the glob matches directories, not files
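
The glob table can be read as a recipe for a regex. The Python sketch below follows that reading. It is a simplified model, not ugrep's matcher: the path is assumed to be relative to the working directory, globs with a leading / and globs with an inner / are both compared with the full relative pathname, and neither negation (note 2) nor the directory-only rule (note 5) is modelled. The function names are invented.

```python
import re

def glob_to_regex(glob):
    """Translate the glob syntax from the table above into a Python regex."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif glob.startswith("/**", i) and i + 3 == len(glob):
            out.append("/.*")  # every path below this point
            i += 3
        elif ch == "*":
            out.append("[^/]*")  # anything except /
            i += 1
        elif ch == "?":
            out.append("[^/]")  # one character except /
            i += 1
        elif ch == "[" and glob.find("]", i + 2) != -1:
            end = glob.find("]", i + 2)
            body = glob[i + 1:end]
            negated = body[0] in "^!"
            if negated:
                body = body[1:]
            body = body.replace("\\", "\\\\")
            out.append("[^/" + body + "]" if negated else "[" + body + "]")
            i = end + 1
        elif ch == "\\" and i + 1 < len(glob):
            out.append(re.escape(glob[i + 1]))  # \? matches a literal ?
            i += 2
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

def glob_matches(glob, path):
    """Match the basename unless the glob contains a /, as in note 3."""
    if glob.startswith("/"):
        glob, target = glob[1:], path
    elif "/" in glob:
        target = path
    else:
        target = path.rsplit("/", 1)[-1]
    return re.fullmatch(glob_to_regex(glob), target) is not None

print(glob_matches("*.cpp", "src/lib/a.cpp"))
print(glob_matches("src/*.cpp", "src/lib/a.cpp"))
print(glob_matches("src/**/a.cpp", "src/lib/a.cpp"))
```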
 

File types

The -t or --file-type= argument is a comma-separated list of file types. A file type is associated with one or more filename extensions, internally using option -O to match filename extensions. For capitalized file types, the search is expanded to include files with matching file signature magic bytes, internally using option -M. When a type is preceded by a ! or a ^, files of the specified type are excluded.

actionscript=-O as,mxml
ada=-O ada,adb,ads
asm=-O asm,s,S
asp=-O asp
aspx=-O master,ascx,asmx,aspx,svc
autoconf=-O ac,in
automake=-O am,in
awk=-O awk
Awk=-O awk
    -M '#!\h*/.*\Wg?awk(\W.*)?\n'
basic=-O bas,BAS,cls,frm,ctl,vb,resx
batch=-O bat,BAT,cmd,CMD
bison=-O y,yy,ymm,ypp,yxx
c=-O c,h,H,hdl,xs
c++=-O cpp,CPP,cc,cxx,CXX,h,hh,H,hpp,hxx,Hxx,HXX
clojure=-O clj
cpp=-O cpp,CPP,cc,cxx,CXX,h,hh,H,hpp,hxx,Hxx,HXX
csharp=-O cs
css=-O css
csv=-O csv
dart=-O dart
Dart=-O dart
    -M '#!\h*/.*\Wdart(\W.*)?\n'
delphi=-O pas,int,dfm,nfm,dof,dpk,dproj,groupproj,bdsgroup,bdsproj
elisp=-O el
elixir=-O ex,exs
erlang=-O erl,hrl
fortran=-O for,ftn,fpp,f,F,f77,F77,f90,F90,f95,F95,f03,F03
gif=-O gif
Gif=-O gif
    -M 'GIF87a|GIF89a'
go=-O go
groovy=-O groovy,gtmpl,gpp,grunit,gradle
gsp=-O gsp
haskell=-O hs,lhs
html=-O htm,html,xhtml
jade=-O jade
java=-O java,properties
jpeg=-O jpg,jpeg
Jpeg=-O jpg,jpeg
    -M '\xff\xd8\xff[\xdb\xe0\xe1\xee]'
js=-O js
json=-O json
jsp=-O jsp,jspx,jthm,jhtml
julia=-O jl
kotlin=-O kt,kts
less=-O less
lex=-O l,ll,lmm,lpp,lxx
lisp=-O lisp,lsp
lua=-O lua
m4=-O m4
make=-O mk,mak
    -g makefile,Makefile,Makefile.Debug,Makefile.Release
markdown=-O md
matlab=-O m
node=-O js
Node=-O js
    -M '#!\h*/.*\Wnode(\W.*)?\n'
objc=-O m,h
objc++=-O mm,h
ocaml=-O ml,mli,mll,mly
parrot=-O pir,pasm,pmc,ops,pod,pg,tg
pascal=-O pas,pp
pdf=-O pdf
Pdf=-O pdf
    -M '\x25\x50\x44\x46\x2d'
perl=-O pl,PL,pm,pod,t,psgi
Perl=-O pl,PL,pm,pod,t,psgi
    -M '#!\h*/.*\Wperl(\W.*)?\n'
php=-O php,php3,php4,phtml
Php=-O php,php3,php4,phtml
    -M '#!\h*/.*\Wphp(\W.*)?\n'
png=-O png
Png=-O png
    -M '\x89PNG\x0d\x0a\x1a\x0a'
prolog=-O pl,pro
python=-O py
Python=-O py
    -M '#!\h*/.*\Wpython[23]?(\W.*)?\n'
r=-O R
rpm=-O rpm
Rpm=-O rpm
    -M '\xed\xab\xee\xdb'
rst=-O rst
rtf=-O rtf
Rtf=-O rtf
    -M '\{\rtf1'
ruby=-O rb,rhtml,rjs,rxml,erb,rake,spec
    -g Rakefile
Ruby=-O rb,rhtml,rjs,rxml,erb,rake,spec
    -g Rakefile
    -M '#!\h*/.*\Wruby(\W.*)?\n'
rust=-O rs
scala=-O scala
scheme=-O scm,ss
shell=-O sh,bash,dash,csh,tcsh,ksh,zsh,fish
Shell=-O sh,bash,dash,csh,tcsh,ksh,zsh,fish
    -M '#!\h*/.*\W(ba|da|t?c|k|z|fi)?sh(\W.*)?\n'
smalltalk=-O st
sql=-O sql,ctl
svg=-O svg
swift=-O swift
tcl=-O tcl,itcl,itk
tex=-O tex,cls,sty,bib
text=-O text,txt,TXT,md,rst
tiff=-O tif,tiff
Tiff=-O tif,tiff
    -M '\x49\x49\x2a\x00|\x4d\x4d\x00\x2a'
tt=-O tt,tt2,ttml
typescript=-O ts,tsx
verilog=-O v,vh,sv
vhdl=-O vhd,vhdl
vim=-O vim
xml=-O xml,xsd,xsl,xslt,wsdl,rss,svg,ent,plist
Xml=-O xml,xsd,xsl,xslt,wsdl,rss,svg,ent,plist
    -M '<\?xml '
yacc=-O y
yaml=-O yaml,yml
zig=-O zig,zon
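
The split between lower-case types (extensions only) and capitalized types (extensions or magic bytes) can be modelled in a few lines. The sketch below is a Python illustration, not ugrep's code. It copies three entries from the list, spells \h as [ \t] because Python's re module has no \h escape, and assumes that a signature is tested against the start of the file. The function name is invented.

```python
import re

# Entries copied from the list above, with \h written as [ \t].
EXTENSIONS = {
    "python": ["py"],
    "png": ["png"],
    "shell": ["sh", "bash", "dash", "csh", "tcsh", "ksh", "zsh", "fish"],
}
MAGIC = {
    "Python": rb"#![ \t]*/.*\Wpython[23]?(\W.*)?\n",
    "Png": rb"\x89PNG\x0d\x0a\x1a\x0a",
    "Shell": rb"#![ \t]*/.*\W(ba|da|t?c|k|z|fi)?sh(\W.*)?\n",
}

def has_type(file_type, name, head=b""):
    """True if name has a listed extension or, for a capitalized type,
    if the first bytes of the file (head) match the signature."""
    _, dot, ext = name.rpartition(".")
    if dot and ext in EXTENSIONS[file_type.lower()]:
        return True
    magic = MAGIC.get(file_type)  # only capitalized types carry a signature
    return magic is not None and re.match(magic, head) is not None

print(has_type("python", "run.py"))
print(has_type("python", "run", b"#!/usr/bin/env python3\n"))
print(has_type("Python", "run", b"#!/usr/bin/env python3\n"))
```

A script without an extension is therefore found by -tPython but not by -tpython.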
 

Filters

A filter utility is associated with one or more filename extensions using the syntax --filter="ext1,ext2,ext3:command". Options to the specified command may be included. The special option % is expanded into the pathname of the file to search. Filters are applied first when the filename extension matches one of the specified filters, then the output of the filter is searched. Some examples:

--filter="pdf:pdftotext % -"
search PDF files, like ug+
--filter="doc:antiword %"
search documents, like ug+
--filter="odt,docx,epub,rtf:pandoc --wrap=preserve -t plain % -o -"
search documents and e-books, like ug+
--filter="gif,jpg,jpeg,mpg,mpeg,png,tiff:exiftool %"
search image metadata, like ug+
--filter="odt,doc,docx,rtf,xls,xlsx,ppt,pptx:soffice --headless --cat %"
search documents, spreadsheets and presentations (this is slow)
--filter="pem:openssl x509 -text,cer,crt,der:openssl x509 -text -inform der"
search certificates
--filter="jis:iconv -f SHIFT-JIS -t UTF-8"
search .jis files encoded in Shift-JIS format converted to UTF-8
  1. a filter utility should be a command or a script that produces standard output (to search)
  2. instead of a filename extension alone, it is also possible to specify a file's "magic bytes" regex pattern with --filter-magic-label="LABEL:MAGIC" to associate the MAGIC regex pattern when found at the start of a file with a LABEL to be used as a filename extension in a --filter="LABEL:command"
  3. UTF-8, UTF-16 and UTF-32 input is automatically searched and does not require a filter
  4. the Shift-JIS conversion in the example is a special case, option --encoding= supports the arguments binary, ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, LATIN1, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, MAC, MACROMAN, EBCDIC, CP437, CP850, CP858, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258, KOI8-R, KOI8-U, KOI8-RU
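
How a filter specification turns into a command to run can be sketched as follows. This Python model is illustrative only: the table of filters repeats three of the examples above, the helper names are invented, and a command without % (as in the iconv example) is assumed to read the file from standard input.

```python
import shlex

def filter_argv(command, pathname):
    """Split a filter command and expand each lone % into the pathname."""
    return [pathname if arg == "%" else arg for arg in shlex.split(command)]

# extension -> command, taken from the examples above
FILTERS = {
    "pdf": "pdftotext % -",
    "doc": "antiword %",
    "jis": "iconv -f SHIFT-JIS -t UTF-8",
}

def filter_for(pathname):
    """Return the argument list of the filter for pathname, or None."""
    _, dot, ext = pathname.rpartition(".")
    command = FILTERS.get(ext) if dot else None
    return filter_argv(command, pathname) if command else None

print(filter_for("report.pdf"))
print(filter_for("notes.txt"))
```

The search then runs on the standard output of the command instead of on the file itself.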
 

Colors

The --colors= argument is a colon-separated list of parameters, such as --colors=sl=hy:mt=hyB, where

sl=selected lines
cx=context lines
rv swaps the sl= and cx= capabilities when -v is specified
mt=matching text in any matching line
ms=matching text in a selected line, the substring mt= by default
mc=matching text in a context line, the substring mt= by default
fn=file names
ln=line numbers
cn=column numbers
bn=byte offsets
se=separators
hl hyperlink file names, same as --hyperlink
qp=TUI prompt
qe=TUI errors
qr=TUI regex
qm=TUI regex meta characters
ql=TUI regex lists and literals
qb=TUI regex braces

Multiple SGR codes may be specified for a single parameter when separated by a semicolon, for example --colors="mt=1;31" specifies bright red. For quick and easy color specification, the corresponding single-letter color names may be used in place of numeric SGR codes and semicolons are not required to separate color names, for example --colors=mt=hr specifies bright red. Color letters and numeric codes may be mixed. The following SGR codes have corresponding letter designations:

code  letter  meaning                    code  letter  meaning
0     n       normal font and color      2     f       faint (not widely supported)
1     h       highlighted bold font      21    H       highlighted bold off
4     u       underline                  24    U       underline off
7     i       invert video               27    I       invert off
30    k       black text                 90    +k      bright gray text
31    r       red text                   91    +r      bright red text
32    g       green text                 92    +g      bright green text
33    y       yellow text                93    +y      bright yellow text
34    b       blue text                  94    +b      bright blue text
35    m       magenta text               95    +m      bright magenta text
36    c       cyan text                  96    +c      bright cyan text
37    w       white text                 97    +w      bright white text
40    K       black background           100   +K      bright gray background
41    R       dark red background        101   +R      bright red background
42    G       dark green background      102   +G      bright green background
43    Y       dark yellow background     103   +Y      bright yellow background
44    B       dark blue background       104   +B      bright blue background
45    M       dark magenta background    105   +M      bright magenta background
46    C       dark cyan background       106   +C      bright cyan background
47    W       dark white background      107   +W      bright white background

The default color scheme is cx=33:mt=1;31:fn=1;35:ln=1;32:cn=1;32:bn=1;32:se=36:qp=1;32:qe=1;37;41:qm=1;32:ql=36:qb=1;35
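
The letter shorthand maps mechanically onto numeric SGR codes. The Python sketch below builds the mapping from the letters and codes listed in this section and expands a --colors argument. It is an illustrative model with invented function names, not ugrep's parser.

```python
# Letter to SGR code, following the codes listed in this section.
LETTERS = {"n": 0, "f": 2, "h": 1, "H": 21, "u": 4, "U": 24, "i": 7, "I": 27}
for offset, name in enumerate("krgybmcw"):
    LETTERS[name] = 30 + offset                # text color
    LETTERS["+" + name] = 90 + offset          # bright text color
    LETTERS[name.upper()] = 40 + offset        # background color
    LETTERS["+" + name.upper()] = 100 + offset # bright background color

def sgr(value):
    """Expand a mix of color letters and numeric codes into SGR codes."""
    codes = []
    i = 0
    while i < len(value):
        if value[i].isdigit():
            j = i
            while j < len(value) and value[j].isdigit():
                j += 1
            codes.append(value[i:j])
            i = j
        elif value[i] == ";":
            i += 1
        else:
            key = value[i:i + 2] if value[i] == "+" else value[i]
            codes.append(str(LETTERS[key]))
            i += len(key)
    return ";".join(codes)

def parse_colors(arg):
    """Split a colon-separated --colors argument into parameter -> SGR codes."""
    return {
        name: sgr(value)
        for name, _, value in (item.partition("=") for item in arg.split(":"))
    }

print(parse_colors("sl=hy:mt=hyB"))
print(sgr("hr"))
```

The letters hr expand to 1;31, the same bright red as the numeric form mentioned above.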

Custom output formatting

Formatted output and match replacement puts you in full control of the output. In fact, CSV (--csv), JSON (--json) and XML (--xml) are all produced this way.

--format-begin="FORMAT" format string for beginning the search
--format-open="FORMAT" format string for opening a file when a match was found
--format="FORMAT" format string for each match in a file
--format-close="FORMAT" format string for closing a file when a match was found
--format-end="FORMAT" format string for ending the search
--replace="FORMAT" replace matches in the output with the specified formatted string

where FORMAT may contain any text and the following format fields:

%F if option -H is used: the file pathname and separator
%[TEXT]F if option -H is used: TEXT, the file pathname and separator
%f the file pathname
%a the file basename without directory path
%p the directory path to the file
%z the pathname in a (compressed) archive, without { and }
%H if option -H is used: the quoted pathname and separator, \" and \\ replace " and \
%[TEXT]H if option -H is used: TEXT, the quoted pathname and separator, \" and \\ replace " and \
%h the quoted file pathname, \" and \\ replace " and \
%N if option -n is used: the line number and separator
%[TEXT]N if option -n is used: TEXT, the line number and separator
%n the line number of the match
%K if option -k is used: the column number and separator
%[TEXT]K if option -k is used: TEXT, the column number and separator
%k the column number of the match
%B if option -b is used: the byte offset and separator
%[TEXT]B if option -b is used: TEXT, the byte offset and separator
%b the byte offset of the match
%T if option -T is used: a tab character
%[TEXT]T if option -T is used: TEXT and a tab character
%t a tab character
%[SEP]$ set field separator to SEP for the rest of the format fields
%[TEXT]< if the first match: TEXT
%[TEXT]> if not the first match: TEXT
%, if not the first match: a comma, same as %[,]>
%: if not the first match: a colon, same as %[:]>
%; if not the first match: a semicolon, same as %[;]>
%| if not the first match: a vertical bar, same as %[|]>
%S if not the first match: separator, see also %[SEP]$
%[TEXT]S if not the first match: TEXT and separator, see also %[SEP]$
%s the separator, see also %[TEXT]S and %[SEP]$
%~ a newline character
%+ if option --heading is used: %F and a newline character, suppress all %F and %H afterward
%m the number of matches, sequential (or number of matching files with --format-end)
%M the number of matching lines (or number of matching files with --format-end)
%O the matching line is output as is (a raw string of bytes)
%o the match is output as is (a raw string of bytes)
%Q the matching line as a quoted string, \" and \\ replace " and \
%q the match as a quoted string, \" and \\ replace " and \
%C the matching line formatted as a quoted C/C++ string
%c the match formatted as a quoted C/C++ string
%J the matching line formatted as a quoted JSON string
%j the match formatted as a quoted JSON string
%V the matching line formatted as a quoted CSV string
%v the match formatted as a quoted CSV string
%X the matching line formatted as XML character data
%x the match formatted as XML character data
%w the width of the match, counting (wide) characters
%d the size of the match, counting bytes
%e the ending byte offset of the match
%Z the edit distance cost of an approximate match with option -Z
%u select unique lines only unless option -u is used
%1 %2 ... %9 the first regex group capture of the match, and so on up to group %9, requires option -P
%[NUM]# the regex group capture NUM; requires option -P
%[NUM]b the byte offset of the group capture NUM; requires option -P
%[NUM]e the ending byte offset of the group capture NUM; requires option -P
%[NUM]d the byte length of the group capture NUM; requires option -P
%[NUM1|NUM2|...]# the first group capture NUM in the list that matched; requires option -P
%[NUM1|NUM2|...]b the byte offset of the first group capture NUM in the list that matched; requires option -P
%[NUM1|NUM2|...]e the ending byte offset of the first group capture NUM in the list that matched; requires option -P
%[NUM1|NUM2|...]d the byte length of the first group capture NUM in the list that matched; requires option -P
%[NAME]# the NAMEd group capture; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME]b the byte offset of the NAMEd group capture; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME]e the ending byte offset of the NAMEd group capture; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME]d the byte length of the NAMEd group capture; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME1|NAME2|...]# the first NAMEd group capture in the list that matched; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME1|NAME2|...]b the byte offset of the first NAMEd group capture in the list that matched; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME1|NAME2|...]e the ending byte offset of the first NAMEd group capture in the list that matched; requires option -P and capturing pattern (?<NAME>PATTERN)
%[NAME1|NAME2|...]d the byte length of the first NAMEd group capture in the list that matched; requires option -P and capturing pattern (?<NAME>PATTERN)
%G all group capture indices/names of the match
%[TEXT1|TEXT2|...]G all TEXT indexed by group capture indices that matched; requires option -P
%g the group capture index of the match or 1
%[TEXT1|TEXT2|...]g the TEXT indexed by the first group capture index that matched; requires option -P
%% the percentage sign
  1. output for --csv is internally produced with
    --format-open='%+'
    --format='%[,]$%H%N%K%B%V%~%u'
    
  2. output for --json is internally produced with
    --format-begin='['
    --format-open='%,%~  {%~    %[,%~    ]$%["file": ]H"matches": ['
    --format='%,%~      { %[, ]$%["line": ]N%["column": ]K%["offset": ]B"match": %J }%u'
    --format-close='%~    ]%~  }'
    --format-end='%~]%~'
    
  3. output for --xml is internally produced with
    --format-begin='<grep>%~'
    --format-open='  <file%[]$%[ name=]H>%~'
    --format='    <match%[\"]$%[ line=\"]N%[ column=\"]K%[ offset=\"]B>%X</match>%~%u'
    --format-close='  </file>%~'
    --format-end='</grep>%~'
    
  4. to output replaced matches in a file while keeping the rest of the file unchanged, use option --replace="FORMAT" and -y (--any-line or --passthru)
  5. to replace matches with corresponding text substitutions, you can use -P "(PATTERN1)|(PATTERN2)|...|(PATTERNn)" --replace="%[TEXT1|TEXT2|...|TEXTn]g"; for example, -P -iw "(one)|(two)|(three)" --replace="%[ūnum|duo|tria]g"
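
As a hedged illustration of the capture fields and of notes 4 and 5, the sketch below runs ugrep on a throwaway file. The file name sample.txt and its contents are made up for this example, and the script skips the ugrep commands when ugrep is not installed.

```shell
#!/bin/sh
# Hypothetical sample data, created in a temporary directory.
tmp=$(mktemp -d)
printf 'alpha=1\nbeta=22\none two four\n' > "$tmp/sample.txt"

if command -v ugrep >/dev/null 2>&1; then
  # Print capture groups 1 and 2 of each match (captures require -P),
  # ending each output record with a newline (%~).
  pairs=$(ugrep -P --format='%1 -> %2%~' '(\w+)=(\d+)' "$tmp/sample.txt")

  # Replace each match by the TEXT selected by the index of the group
  # that matched (%[...]g); -y passes non-matching lines through.
  latin=$(ugrep -P -iw -y '(one)|(two)|(three)' \
          --replace='%[unum|duo|tria]g' "$tmp/sample.txt")

  printf '%s\n' "$pairs" "$latin"
else
  echo "ugrep not found, skipping" >&2
fi
rm -rf "$tmp"
```

The first command prints only the formatted matches, while the second prints the whole file with the matched words substituted, because -y keeps the lines that do not match.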
 

Indexing

The new ugrep-indexer tool indexes a directory tree to accelerate cold searching. File searching is generally slow when a file system on a drive is not cached in memory, i.e. when most files are "cold". Indexing accelerates recursive searching by performing a quick check on precomputed indexes to only search those files that may match.

Index-based search with ugrep is safe and never skips new or updated files that may now match. If files or directories are added or changed after indexing, ugrep detects this by comparing file and directory time stamps to the indexing time stamp, and searches those additions and changes as usual. When many files have been added or changed, you may want to re-index to bring the indexes up to date. Re-indexing is incremental, so it does not take as much time as the initial indexing process.
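
The time stamp comparison can be pictured with standard tools. The sketch below is conceptual only and is not how ugrep implements it: the stamp file index-stamp and the file names are hypothetical, and find -newer stands in for the check against the indexing time stamp.

```shell
#!/bin/sh
# Conceptual sketch only: mimic "search what changed since indexing"
# with file time stamps. The stamp file and file names are hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/tree"
stamp="$tmp/index-stamp"

echo 'old content' > "$tmp/tree/old.txt"
touch -t 202001010000 "$tmp/tree/old.txt"   # modified before indexing
touch -t 202101010000 "$stamp"              # pretend indexing happened here
echo 'new content' > "$tmp/tree/new.txt"    # added after indexing (now)

# Files modified after the stamp are not covered by the index,
# so they must be searched regardless of what the index says.
changed=$(find "$tmp/tree" -type f -newer "$stamp")
printf 'must search: %s\n' "$changed"
rm -rf "$tmp"
```

Only new.txt is newer than the stamp, so it is the only file that would have to be searched without consulting the index.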

Please note that indexing is effective for large file systems on slower storage media or when searching many zip and tarball archives. Indexing won't speed up regular file searching on fast NVMe SSDs, for example.

ugrep-indexer -Iz -v
recursively (re-)index the working directory tree, ignore binary files (option -I), index archives and compressed files (option -z), showing verbose output (option -v)
ugrep-indexer -Iz -v PATH
same as above, but (re-)index the specified directory tree PATH
ugrep-indexer -f -0 -Iz -v PATH
force full re-indexing with lowest index match accuracy to minimize index files (option -0 for zero, default is -5 for five)
ugrep-indexer -c PATH
check the directory tree PATH indexes, the default is to check the working directory tree
ugrep-indexer -d PATH
delete the hidden index files from the directory tree PATH, the default is to delete index files from the working directory tree
ug --index -Iz OPTIONS PATTERN
perform an index-based recursive search, ignore binary files (option -I), also search archives and compressed files (option -z)
ug --index -r -Iz OPTIONS PATTERN PATH
same as above, but perform an index-based recursive search on the specified directory tree PATH
ug --index OPTIONS PATTERN FILE
search FILE without using an index (only recursive searching is accelerated)
  1. ugrep-indexer option -v reports progress; to create a log, redirect ugrep-indexer -v output to a log file
  2. ugrep-indexer option -S follows symlinks to files; indexing never follows symlinks to directories
  3. ugrep-indexer option --ignore-files obeys .gitignore rules
  4. ugrep-indexer options -z --zmax=2 indexes nested archives and tarballs (two levels)
  5. ug option --index works with all other search options, except for options -P, -Z, -v and --filter
  6. ug option --stats reports index-based search details, including false positives; false positives are reduced with higher indexing accuracy and/or by using more specific search patterns
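
Put together, a typical round trip might look like the sketch below. It only uses the options listed above; the directory layout and file contents are hypothetical, and the commands are skipped when ugrep or ugrep-indexer is not installed. It calls ugrep rather than ug so that no .ugrep configuration file influences the result.

```shell
#!/bin/sh
# Hypothetical directory tree with one matching file.
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
echo 'the needle is here' > "$tmp/docs/sub/a.txt"
echo 'nothing to see'     > "$tmp/docs/b.txt"

if command -v ugrep >/dev/null 2>&1 &&
   command -v ugrep-indexer >/dev/null 2>&1; then
  ugrep-indexer -I "$tmp/docs" >/dev/null    # build the hidden index files
  ugrep-indexer -c "$tmp/docs"               # check the indexes

  # Index-based recursive search listing matching files (-l);
  # --stats adds a report with the index-based search details.
  hits=$(ugrep --index -I -l --stats 'needle' "$tmp/docs")
  printf '%s\n' "$hits"

  ugrep-indexer -d "$tmp/docs" >/dev/null    # delete the index files again
else
  echo "ugrep or ugrep-indexer not found, skipping" >&2
fi
rm -rf "$tmp"
```

On a tree this small the index brings no speed benefit; the point is only to show the order of the steps: index, optionally check, search with --index, and delete the index files when they are no longer wanted.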
 

Bugs

If you find a bug or an issue, please report it at https://github.com/Genivia/ugrep/issues

License

Ugrep is open source BSD-3 licensed:

Permissions
✔️ commercial use
✔️ modification
✔️ distribution
✔️ private use
Limitations
❌ liability
❌ warranty
Conditions
ⓘ include license
ⓘ copyright notice

Ugrep is written by Robert A. van Engelen, Copyright (c) 2024 Robert A. van Engelen, Genivia Inc.

The ugrep author received the 🏆 Google Peer Bonus Award 2022 for developing ugrep

Ugrep project repo: https://github.com/Genivia/ugrep ⭐️ thank you for starring the project!

Ugrep uses the RE/flex regex library: https://github.com/Genivia/RE-flex

Ugrep -P uses the PCRE2 library: https://www.pcre.org


See also: GNU grep, BSD grep, git grep, PCRE grep, agrep, ack, ag, rg, sift

Last updated: Tue Apr 2, 2024