Add "How To Use Your Email Client For Physical Mail" post
parent
faf474273e
commit
1adf37a91d
@ -1,2 +1,3 @@
|
||||
posts/use-your-mail-client-for-physical-mail/index.html
|
||||
posts/simple-static-blog/index.html
|
||||
posts/introduction/index.html
|
||||
|
@ -0,0 +1,128 @@
|
||||
<!DOCTYPE HTML>
|
||||
<html>
|
||||
<head>
|
||||
<title>Use Your Email Client For Physical Mail</title>
|
||||
<meta charset="UTF-8">
|
||||
</head>
|
||||
|
||||
<style type="text/css">
|
||||
html {
|
||||
font-family: Helvetica, Arial, sans-serif;
|
||||
color: #5b4636;
|
||||
background-color: #f4ecd8;
|
||||
}
|
||||
|
||||
body {
|
||||
padding: 1em;
|
||||
margin: auto;
|
||||
}
|
||||
|
||||
@media only all and (pointer: coarse), (pointer: none) {
|
||||
body {
|
||||
font-size: 5.5vmin;
|
||||
}
|
||||
}
|
||||
|
||||
@media only all and (pointer: fine) {
|
||||
body {
|
||||
font-size: calc(16px + 0.6vmin);
|
||||
min-width: 500px;
|
||||
max-width: 50em;
|
||||
}
|
||||
}
|
||||
|
||||
aside {
|
||||
width: 30%;
|
||||
min-width: 10em;
|
||||
background-color: rgba(0,0,0, 0.1);
|
||||
float: right;
|
||||
padding: 1em;
|
||||
margin: 1em;
|
||||
}
|
||||
</style>
|
||||
|
||||
<body>
|
||||
<a href="../../blog.html">Home</a>
|
||||
<article>
|
||||
<h1>How To Use Your Email Client For Physical Mail</h1>
|
||||
<p>
|
||||
Whether it's to re-read a conversation, find a plane ticket I ordered or check
|
||||
when a meeting was planned, I often find myself looking up old emails. It's
|
||||
usually easy to do so because email clients are designed for the task: many of
|
||||
them support full-text search and some even complement that with advanced
|
||||
tagging and categorization systems. To be honest, I have become completely
|
||||
dependent on those features for my day-to-day operation. Having full-text
|
||||
search and some sort of categorization for mail can be a huge time
|
||||
saver. Wouldn't it be nice if we had all of that functionality to deal with
|
||||
physical mail as well? I thought it would, so I set out to find a way to
|
||||
achieve just that. Turns out it's pretty simple!
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The main objective here is to transform our physical mail into an email
|
||||
that can be received, indexed and read by our email client of choice. Now,
|
||||
one way to do that would be to type the contents of our mail into an email
|
||||
by hand, but <i>ain't nobody got time for that!</i> The (more appealing)
|
||||
alternative is to use a document scanner. I have a single-purpose scanner
|
||||
unit from Canon that I hook up to my laptop for just this purpose.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
It isn't as simple as just emailing a scanned document to ourselves
|
||||
though: email clients are smart, but they can't understand a word of text
|
||||
in our PDF or JPEG of a physical document. They need content to be in
|
||||
plain text form in order to provide us with some of their best features
|
||||
like full-text search. We'll have to somehow transform our scanned
|
||||
documents into plain text that we can include in our email. To do this, we
|
||||
can use tesseract. Tesseract is an optical character recognition (OCR)
|
||||
engine, meaning that it can recognize text in images and extract it for
|
||||
us. Installing it should be easy on Debian-derived distros like
|
||||
Ubuntu. My laptop is running Debian unstable so I just ran <code>apt
|
||||
install tesseract-ocr</code> and started using it. Using it is as easy as
|
||||
opening up a terminal and typing <code>tesseract FILE.jpg
|
||||
OUTPUT</code>. That command will save all the text that tesseract is able
|
||||
to recognize in the image FILE.jpg to a file called OUTPUT.txt.
|
||||
</p>
|
||||
|
||||
<aside>
|
||||
<i>
|
||||
Side note: I am Dutch, so most of my physical mail is in Dutch. To
|
||||
make tesseract better understand my mail I installed the
|
||||
tesseract-ocr-nld package using <code>apt install
|
||||
tesseract-ocr-nld</code>. You can check what other language packs are
|
||||
available by using <code>apt search tesseract-ocr</code>.
|
||||
</i>
|
||||
</aside>
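<p>
To make that concrete, here is a minimal sketch of a shell helper that wraps
the commands above. The function name <code>ocr_image</code> and the
<code>eng</code> default are assumptions made up for this example. Passing
<code>stdout</code> as the output name makes tesseract print the text instead
of writing OUTPUT.txt, and <code>-l</code> selects an installed language pack.
</p>

```shell
#!/bin/sh
# ocr_image IMAGE [LANG]
# Hypothetical helper: prints the text tesseract recognizes in IMAGE.
# LANG is a tesseract language code (for example nld for Dutch) and
# falls back to eng, which is an assumption of this sketch.
ocr_image() {
    tesseract "$1" stdout -l "${2:-eng}"
}
```

<p>
With the Dutch language pack installed, <code>ocr_image letter.jpg nld</code>
would print the recognized text of letter.jpg to the terminal.
</p>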
|
||||
|
||||
<p>
|
||||
All we have to do from there is copy-paste the contents of that file into
|
||||
an email and send it to ourselves! Depending on the formatting of the
|
||||
input document, the output may not always be pleasant to read. We can
|
||||
account for this by including the original document as an attachment to
|
||||
the email. That way we get the best of both worlds: we can use the search
|
||||
functionality of our email client to find the document, and then read it
|
||||
in its original form by opening the attachment.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This is all easy enough, but I'm lazy. I didn't feel like opening up my
|
||||
email client and doing manual copy-pasting, so I decided to automate the
|
||||
process a little further. I have Postfix set up on my system to relay to my
|
||||
mail server, so I can simply use the <code>mail</code> command to send emails without a
|
||||
GUI mail client. I combined that with tesseract in a little bash
|
||||
script. The script iterates through all of its arguments and interprets
|
||||
them as filenames of scanned documents. It calls tesseract to extract text
|
||||
from them, concatenates the results, attaches the files to an email and
|
||||
sends it to my personal email address. Now all I have to do is run the
|
||||
script with filenames of some documents and my job is done. If anyone is
|
||||
interested in an actual program that does the same thing and doesn't
|
||||
require you to set up Postfix, let me know! I might consider authoring one
|
||||
if it's useful to more people than just myself. The script I'm currently
|
||||
using can be found <a href="scan-to-mailpile.bash.html">here (pretty)</a>
|
||||
and <a href="scan-to-mailpile.bash">here (raw)</a>, but I don't recommend
|
||||
using it if you don't fully understand its contents; it's not a polished
|
||||
user experience 🤓.
|
||||
</p>
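<p>
For readers who only want the gist, the text-extraction step can be sketched
roughly as follows. This is a simplified illustration, not the script linked
above: the function names <code>extract_text</code> and
<code>default_subject</code> are made up for this example, and it assumes that
<code>pdftotext</code> (from poppler-utils) writes to standard output when the
output file is given as <code>-</code>.
</p>

```shell
#!/bin/sh
# extract_text FILE...
# Hypothetical helper: prints the text of every file, in argument order.
# PDFs that already contain text go through pdftotext; everything else
# is treated as a scanned image and handed to tesseract.
extract_text() {
    for file in "$@"; do
        case "$file" in
            *.pdf|*.PDF) pdftotext "$file" - || return ;;
            *)           tesseract "$file" stdout || return ;;
        esac
    done
}

# default_subject FILE
# Hypothetical helper: suggests an email subject by stripping the
# directory and the extension from the file name.
default_subject() {
    name=$(basename -- "$1")
    printf '%s\n' "${name%.*}"
}
```

<p>
Something like <code>extract_text letter-1.jpg letter-2.jpg</code> would then
produce the body of the email, with the original files going along as
attachments.
</p>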
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
@ -0,0 +1 @@
|
||||
Mon 17 Feb 2020 11:55:42 AM CET
|
@ -0,0 +1,57 @@
|
||||
#!/bin/bash
|
||||
|
||||
if ! [[ $# -ge 1 ]]; then
|
||||
echo 'Usage: scan-to-mailpile ...FILES' >&2
|
||||
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if ! type_output="$(type readlink mktemp pdftotext tesseract mail mimetype basename cat 2>&1)"; then
|
||||
printf 'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n' \
|
||||
"$type_output" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
tmpdir=$(mktemp -d) || exit $?
|
||||
|
||||
printf -v trap 'rm -vr %q' "$tmpdir"
|
||||
trap "$trap" EXIT
|
||||
|
||||
printf 'Changing directory: '
|
||||
pushd "$tmpdir" || exit $?
|
||||
|
||||
declare -a attachment_args=()
|
||||
|
||||
{
|
||||
for file in "$@"; do
|
||||
file="$(readlink -f "$file")" || exit $?
|
||||
|
||||
# Note: pdftotext will not work for scanned documents, so those should just be
|
||||
# saved as image files before feeding them to this script.
|
||||
##
|
||||
# It will however work fine for other types of PDFs.
|
||||
if [[ "$file" == *.pdf ]]; then
|
||||
pdftotext "$file" /dev/fd/1 || exit $?
|
||||
else
|
||||
tesseract "$file" stdout || exit $?
|
||||
fi
|
||||
|
||||
mime="$(mimetype -b "$file")" || exit $?
|
||||
|
||||
attachment_args+=(--content-type="$mime" --attach="$file")
|
||||
done
|
||||
} > ./outfile.txt
|
||||
|
||||
cat ./outfile.txt
|
||||
|
||||
file1="$(basename "$1")"
|
||||
|
||||
read -i "${file1%.*}" -rep 'What should the subject of the email be? ' subject
|
||||
|
||||
mail --subject="$subject" \
|
||||
"${attachment_args[@]}" \
|
||||
--content-type="text/plain" \
|
||||
--content-filename="content.txt" \
|
||||
user@example.com < ./outfile.txt
|
||||
|
||||
popd
|
@ -0,0 +1,129 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||
<!-- Created by htmlize-1.56 in css mode. -->
|
||||
<html>
|
||||
<head>
|
||||
<title>scan-to-mailpile.bash</title>
|
||||
<style type="text/css">
|
||||
<!--
|
||||
body {
|
||||
color: #f6f3e8;
|
||||
background-color: #242424;
|
||||
}
|
||||
.builtin {
|
||||
/* font-lock-builtin-face */
|
||||
color: #e5786d;
|
||||
}
|
||||
.comment {
|
||||
/* font-lock-comment-face */
|
||||
color: #99968b;
|
||||
}
|
||||
.comment-delimiter {
|
||||
/* font-lock-comment-delimiter-face */
|
||||
color: #99968b;
|
||||
}
|
||||
.flyspell-duplicate {
|
||||
/* flyspell-duplicate */
|
||||
text-decoration: underline;
|
||||
}
|
||||
.flyspell-incorrect {
|
||||
/* flyspell-incorrect */
|
||||
text-decoration: underline;
|
||||
}
|
||||
.keyword {
|
||||
/* font-lock-keyword-face */
|
||||
color: #8ac6f2;
|
||||
font-weight: bold;
|
||||
}
|
||||
.negation-char {
|
||||
}
|
||||
.sh-escaped-newline {
|
||||
/* sh-escaped-newline */
|
||||
color: #95e454;
|
||||
}
|
||||
.sh-quoted-exec {
|
||||
/* sh-quoted-exec */
|
||||
color: #fa8072;
|
||||
}
|
||||
.string {
|
||||
/* font-lock-string-face */
|
||||
color: #95e454;
|
||||
}
|
||||
.variable-name {
|
||||
/* font-lock-variable-name-face */
|
||||
color: #cae682;
|
||||
}
|
||||
|
||||
a {
|
||||
color: inherit;
|
||||
background-color: inherit;
|
||||
font: inherit;
|
||||
text-decoration: inherit;
|
||||
}
|
||||
a:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
-->
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<pre>
|
||||
<span class="comment-delimiter"> #</span><span class="comment">!/bin/</span><span class="keyword">bash</span><span class="comment">
|
||||
</span>
|
||||
<span class="keyword"> if</span> <span class="negation-char">!</span> [[ $<span class="variable-name">#</span> -ge 1 ]]; <span class="keyword">then</span>
|
||||
<span class="builtin">echo</span> <span class="string">'Usage: scan-to-mailpile ...FILES'</span> >&2
|
||||
|
||||
<span class="keyword">exit</span> 1
|
||||
<span class="keyword"> fi</span>
|
||||
|
||||
<span class="keyword"> if</span> <span class="negation-char">!</span> <span class="variable-name">type_output</span>=<span class="string">"$(</span><span class="sh-quoted-exec">type</span><span class="string"> readlink mktemp pdftotext tesseract mail mimetype basename cat 2>&1)"</span>; <span class="keyword">then</span>
|
||||
<span class="builtin">printf</span> <span class="string">'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n'</span> <span class="sh-escaped-newline">\</span>
|
||||
<span class="string">"$type_output"</span> >&2
|
||||
<span class="keyword">exit</span> 1
|
||||
<span class="keyword"> fi</span>
|
||||
|
||||
<span class="variable-name"> tmpdir</span>=$(<span class="sh-quoted-exec">mktemp</span> -d) || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
|
||||
<span class="builtin"> printf</span> -v trap <span class="string">'rm -vr %q'</span> <span class="string">"$tmpdir"</span>
|
||||
<span class="keyword"> trap</span> <span class="string">"$trap"</span> EXIT
|
||||
|
||||
<span class="builtin"> printf</span> <span class="string">'Changing directory: '</span>
|
||||
<span class="builtin"> pushd</span> <span class="string">"$tmpdir"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
|
||||
<span class="builtin"> declare</span> -a <span class="variable-name">attachment_args</span>=()
|
||||
|
||||
{
|
||||
<span class="keyword">for</span> file<span class="keyword"> in</span> <span class="string">"$@"</span>; <span class="keyword">do</span>
|
||||
<span class="variable-name">file</span>=<span class="string">"$(</span><span class="sh-quoted-exec">readlink</span><span class="string"> -f "$file")"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
|
||||
<span class="comment-delimiter"># </span><span class="comment">Note: </span><span class="comment"><span class="flyspell-duplicate">pdftotext</span></span><span class="comment"> will not work for scanned documents, so those should just be
|
||||
</span> <span class="comment-delimiter"># </span><span class="comment">saved as image files before feeding them to this script.
|
||||
</span> <span class="comment-delimiter">##</span><span class="comment">
|
||||
</span> <span class="comment-delimiter"># </span><span class="comment">It will however work fine for other types of </span><span class="comment"><span class="flyspell-incorrect">PDFs</span></span><span class="comment">.
|
||||
</span> <span class="keyword">if</span> [[ <span class="string">"$file"</span> == *.pdf ]]; <span class="keyword">then</span>
|
||||
pdftotext <span class="string">"$file"</span> /dev/fd/1 || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
<span class="keyword">else</span>
|
||||
tesseract <span class="string">"$file"</span> stdout || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
<span class="keyword">fi</span>
|
||||
|
||||
<span class="variable-name">mime</span>=<span class="string">"$(</span><span class="sh-quoted-exec">mimetype</span><span class="string"> -b "$file")"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
|
||||
|
||||
<span class="variable-name">attachment_args</span>+=(--content-type=<span class="string">"$mime"</span> --attach=<span class="string">"$file"</span>)
|
||||
<span class="keyword">done</span>
|
||||
} > ./outfile.txt
|
||||
|
||||
cat ./outfile.txt
|
||||
|
||||
<span class="variable-name"> file1</span>=<span class="string">"$(</span><span class="sh-quoted-exec">basename</span><span class="string"> "$1")"</span>
|
||||
|
||||
<span class="builtin"> read</span> -i <span class="string">"${file1%.*}"</span> -rep <span class="string">'What should the subject of the email be? '</span> subject
|
||||
|
||||
mail --subject=<span class="string">"$subject"</span> <span class="sh-escaped-newline">\</span>
|
||||
<span class="string">"${attachment_args[@]}"</span> <span class="sh-escaped-newline">\</span>
|
||||
--content-type=<span class="string">"text/plain"</span> <span class="sh-escaped-newline">\</span>
|
||||
--content-filename=<span class="string">"content.</span><span class="string"><span class="flyspell-duplicate">txt</span></span><span class="string">"</span> <span class="sh-escaped-newline">\</span>
|
||||
user@example.com < ./outfile.txt
|
||||
|
||||
<span class="builtin">popd</span>
|
||||
</pre>
|
||||
</body>
|
||||
</html>
|