Add "How To Use Your Email Client For Physical Mail" post

master
Hugo Thunnissen 4 years ago
parent faf474273e
commit 1adf37a91d

@ -39,7 +39,14 @@
</div>
<h1>Blog</h1>
<div><a href="posts/simple-static-blog/index.html"><h2 style="margin-bottom: 0.1em;"> Creating a Simple Static Blog </h2></a>
<div><a href="posts/use-your-mail-client-for-physical-mail/index.html"><h2 style="margin-bottom: 0.1em;"> How To Use Your Email Client For Physical Mail </h2></a>
<i style="font-size: 0.8em;">Mon 17 Feb 2020 11:55:42 AM CET</i><p style="margin-top: 0.5em;">Whether it&#39;s to re-read a conversation, find a plane ticket I ordered or check
when a meeting was planned, I often find myself looking up old emails. It&#39;s
usually easy to do so because email clients are designed for the task: Many of
them support full-text search and some even complement that with advanced
tagging and categorization systems. To be honest I have become completely ... <a href="posts/use-your-mail-client-for-physical-mail/index.html">Continue reading</a></p>
</div>
<hr><div><a href="posts/simple-static-blog/index.html"><h2 style="margin-bottom: 0.1em;"> Creating a Simple Static Blog </h2></a>
<i style="font-size: 0.8em;">Sat 08 Feb 2020 12:14:16 PM CET</i><p style="margin-top: 0.5em;">I love personal websites. It&#39;s amazing that people can share content with the
entire world just by writing some text and throwing it behind a web server. I
wanted to know what that is like, so I set out to create a personal website of

@ -5,12 +5,18 @@
<link>https://hugot.nl/blog.html</link>
<description>Hugo's personal blog</description>
<language>en-us</language>
<pubDate>Sat 08 Feb 2020 07:32:46 PM CET</pubDate>
<lastBuildDate>Sat 08 Feb 2020 07:32:46 PM CET</lastBuildDate>
<pubDate>Mon 17 Feb 2020 11:55:42 AM CET</pubDate>
<lastBuildDate>Mon 17 Feb 2020 11:55:42 AM CET</lastBuildDate>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<generator>Hugo's Custom Bash Script</generator>
<managingEditor>social@hugot.nl</managingEditor>
<webMaster>infra@hugot.nl</webMaster>
<item><title> How To Use Your Email Client For Physical Mail </title><link>https://hugot.nl/posts/use-your-mail-client-for-physical-mail/index.html</link><description>Whether it&#39;s to re-read a conversation, find a plane ticket I ordered or check
when a meeting was planned, I often find myself looking up old emails. It&#39;s
usually easy to do so because email clients are designed for the task: Many of
them support full-text search and some even complement that with advanced
tagging and categorization systems. To be honest I have become completely</description><pubDate>Mon 17 Feb 2020 11:55:42 AM CET</pubDate><guid isPermaLink="false"> How To Use Your Email Client For Physical Mail MjYyNjk3NDk5NCA0MDkzCg==</guid>
</item>
<item><title> Creating a Simple Static Blog </title><link>https://hugot.nl/posts/simple-static-blog/index.html</link><description>I love personal websites. It&#39;s amazing that people can share content with the
entire world just by writing some text and throwing it behind a web server. I
wanted to know what that is like, so I set out to create a personal website of

@ -1,2 +1,3 @@
posts/use-your-mail-client-for-physical-mail/index.html
posts/simple-static-blog/index.html
posts/introduction/index.html

@ -0,0 +1,128 @@
<!DOCTYPE HTML>
<html>
<head>
<title>Use Your Email Client For Physical Mail</title>
<meta charset="UTF-8">
</head>
<style type="text/css">
html {
font-family: Helvetica, Arial, sans-serif;
color: #5b4636;
background-color: #f4ecd8;
}
body {
padding: 1em;
margin: auto;
}
@media only all and (pointer: coarse), (pointer: none) {
body {
font-size: 5.5vmin;
}
}
@media only all and (pointer: fine) {
body {
font-size: calc(16px + 0.6vmin);
min-width: 500px;
max-width: 50em;
}
}
aside {
width: 30%;
min-width: 10em;
background-color: rgba(0,0,0, 0.1);
float: right;
padding: 1em;
margin: 1em;
}
</style>
<body>
<a href="../../blog.html">Home</a>
<article>
<h1>How To Use Your Email Client For Physical Mail</h1>
<p>
Whether it's to re-read a conversation, find a plane ticket I ordered or check
when a meeting was planned, I often find myself looking up old emails. It's
usually easy to do so because email clients are designed for the task: Many of
them support full-text search and some even complement that with advanced
tagging and categorization systems. To be honest I have become completely
dependent on those features for my day-to-day operation. Having full-text
search and some sort of categorization for mail can be a huge time
saver. Wouldn't it be nice if we had all of that functionality to deal with
physical mail as well? I thought it would, so I set out to find a way to
achieve just that. Turns out it's pretty simple!
</p>
<p>
The main objective here is to transform our physical mail into an email
that can be received, indexed and read by our email client of choice. Now,
one way to do that would be to type the contents of our mail into an email
by hand, but <i>ain't nobody got time for that!</i> The (more appealing)
alternative is to use a document scanner. I have a single-purpose scanner
unit from Canon that I hook up to my laptop for exactly this.
</p>
<p>
It isn't as simple as just emailing a scanned document to ourselves
though: email clients are smart, but they can't understand a word of text
in our PDF or JPEG of a physical document. They need content to be in
plain text form in order to provide us with some of their best features
like full-text search. We'll have to somehow transform our scanned
documents into plain text that we can include in our email. To do this, we
can use tesseract. Tesseract is an optical character recognition (OCR)
engine, meaning that it can recognize text in images and extract it for
us. Installing it should be easy on Debian-derived distros like
Ubuntu. My laptop is running Debian unstable, so I just ran <code>apt
install tesseract-ocr</code> and started using it. Using it is as easy as
opening up a terminal and typing <code>tesseract FILE.jpg
OUTPUT</code>. That command will save all the text that tesseract is able
to recognize in the image FILE.jpg to a file called OUTPUT.txt.
</p>
<aside>
<i>
Side note: I am Dutch, so most of my physical mail is in Dutch. To
make tesseract better understand my mail I installed the
tesseract-ocr-nld package using <code>apt install
tesseract-ocr-nld</code>. You can check what other language packs are
available by using <code>apt search tesseract-ocr</code>.
</i>
</aside>
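<p>
As a rough sketch of that step, the command can be wrapped in a small bash
function. The name <code>ocr_to_text</code> and the default language
<code>eng</code> are made up for illustration; <code>-l</code> is tesseract's
language option, and <code>nld</code> only works once the Dutch language pack
is installed.
</p>

```shell
#!/bin/bash
# Hypothetical helper: run tesseract on one image and print the path of
# the text file it wrote. tesseract appends ".txt" to the output base.
ocr_to_text() {
    local image="$1"
    local lang="${2:-eng}"   # assumed default; pass "nld" for Dutch
    local base="${image%.*}" # strip the extension: letter.jpg -> letter

    tesseract "$image" "$base" -l "$lang" || return
    printf '%s.txt\n' "$base"
}
```

<p>
Calling <code>ocr_to_text letter.jpg nld</code> should leave the recognized
text in letter.txt and print that path. Stripping the extension this way is
naive and assumes the filename actually has one.
</p>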
<p>
All we have to do from there is copy-paste the contents of that file into
an email and send it to ourselves! Depending on the formatting of the
input document, the output may not always be pleasant to read. We can
account for this by including the original document as an attachment to
the email. That way we get the best of both worlds: we can use the search
functionality of our email client to find the document, and then read it
in its original form by opening the attachment.
</p>
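<p>
Done from a terminal, that could look roughly like the sketch below. It
assumes the <code>mail</code> command is the GNU Mailutils one, which provides
the <code>--attach</code> and <code>--content-type</code> options; the
function name <code>build_mail_args</code> and the array
<code>mail_args</code> are hypothetical.
</p>

```shell
#!/bin/bash
# Hypothetical helper: fill the global array mail_args with options for
# GNU Mailutils "mail": subject, one attachment with its MIME type, a
# plain-text body type, and the recipient.
build_mail_args() {
    local subject="$1" scan="$2" mime="$3" recipient="$4"

    mail_args=(
        --subject="$subject"
        --content-type="$mime" --attach="$scan"
        --content-type="text/plain"
        "$recipient"
    )
}
```

<p>
After <code>build_mail_args 'Tax letter' letter.jpg image/jpeg
user@example.com</code>, the message would be sent with <code>mail
"${mail_args[@]}" &lt; letter.txt</code>, so the OCR text becomes the
searchable body and the scan rides along as an attachment. Keeping the options
in an array avoids quoting trouble with filenames that contain spaces.
</p>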
<p>
This is all easy enough, but I'm lazy. I didn't feel like opening up my
email client and doing manual copy-pasting, so I decided to automate the
process a little further. I have postfix set up on my system to relay to my
mail server, so I can simply use the <code>mail</code> command to send emails without a
GUI mail client. I combined that with tesseract in a little bash
script. The script iterates through all of its arguments and interprets
them as filenames of scanned documents. It calls tesseract to extract text
from them, concatenates the results, attaches the files to an email and
sends it to my personal email address. Now all I have to do is run the
script with filenames of some documents and my job is done. If anyone is
interested in an actual program that does the same thing and doesn't
require you to set up postfix, let me know! I might consider authoring one
if it's useful to more people than just myself. The script I'm currently
using can be found <a href="scan-to-mailpile.bash.html">here (pretty)</a>
and <a href="scan-to-mailpile.bash">here (raw)</a>, but I don't recommend
using it if you don't fully understand its contents: it's not a polished
user experience 🤓.
</p>
</article>
</body>
</html>

@ -0,0 +1,57 @@
#!/bin/bash
if ! [[ $# -ge 1 ]]; then
echo 'Usage: scan-to-mailpile ...FILES' >&2
exit 1
fi
if ! type_output="$(type readlink mktemp pdftotext tesseract mail mimetype basename cat 2>&1)"; then
printf 'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n' \
"$type_output" >&2
exit 1
fi
tmpdir=$(mktemp -d) || exit $?
printf -v trap 'rm -vr %q' "$tmpdir"
trap "$trap" EXIT
printf 'Changing directory: '
pushd "$tmpdir" || exit $?
declare -a attachment_args=()
{
for file in "$@"; do
file="$(readlink -f "$file")" || exit $?
# Note: pdftotext will not work for scanned documents, so those should just be
# saved as image files before feeding them to this script.
##
# It will however work fine for other types of PDFs.
if [[ "$file" == *.pdf ]]; then
pdftotext "$file" /dev/fd/1 || exit $?
else
tesseract "$file" stdout || exit $?
fi
mime="$(mimetype -b "$file")" || exit $?
attachment_args+=(--content-type="$mime" --attach="$file")
done
} > ./outfile.txt
cat ./outfile.txt
file1="$(basename "$1")"
read -i "${file1%.*}" -rep 'What should the subject of the email be? ' subject
mail --subject="$subject" \
"${attachment_args[@]}" \
--content-type="text/plain" \
--content-filename="content.txt" \
user@example.com < ./outfile.txt
popd

@ -0,0 +1,129 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!-- Created by htmlize-1.56 in css mode. -->
<html>
<head>
<title>scan-to-mailpile.bash</title>
<style type="text/css">
<!--
body {
color: #f6f3e8;
background-color: #242424;
}
.builtin {
/* font-lock-builtin-face */
color: #e5786d;
}
.comment {
/* font-lock-comment-face */
color: #99968b;
}
.comment-delimiter {
/* font-lock-comment-delimiter-face */
color: #99968b;
}
.flyspell-duplicate {
/* flyspell-duplicate */
text-decoration: underline;
}
.flyspell-incorrect {
/* flyspell-incorrect */
text-decoration: underline;
}
.keyword {
/* font-lock-keyword-face */
color: #8ac6f2;
font-weight: bold;
}
.negation-char {
}
.sh-escaped-newline {
/* sh-escaped-newline */
color: #95e454;
}
.sh-quoted-exec {
/* sh-quoted-exec */
color: #fa8072;
}
.string {
/* font-lock-string-face */
color: #95e454;
}
.variable-name {
/* font-lock-variable-name-face */
color: #cae682;
}
a {
color: inherit;
background-color: inherit;
font: inherit;
text-decoration: inherit;
}
a:hover {
text-decoration: underline;
}
-->
</style>
</head>
<body>
<pre>
<span class="comment-delimiter"> #</span><span class="comment">!/bin/</span><span class="keyword">bash</span><span class="comment">
</span>
<span class="keyword"> if</span> <span class="negation-char">!</span> [[ $<span class="variable-name">#</span> -ge 1 ]]; <span class="keyword">then</span>
<span class="builtin">echo</span> <span class="string">'Usage: scan-to-mailpile ...FILES'</span> &gt;&amp;2
<span class="keyword">exit</span> 1
<span class="keyword"> fi</span>
<span class="keyword"> if</span> <span class="negation-char">!</span> <span class="variable-name">type_output</span>=<span class="string">"$(</span><span class="sh-quoted-exec">type</span><span class="string"> readlink mktemp pdftotext tesseract mail mimetype basename cat 2&gt;&amp;1)"</span>; <span class="keyword">then</span>
<span class="builtin">printf</span> <span class="string">'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n'</span> <span class="sh-escaped-newline">\</span>
<span class="string">"$type_output"</span> &gt;&amp;2
<span class="keyword">exit</span> 1
<span class="keyword"> fi</span>
<span class="variable-name"> tmpdir</span>=$(<span class="sh-quoted-exec">mktemp</span> -d) || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="builtin"> printf</span> -v trap <span class="string">'rm -vr %q'</span> <span class="string">"$tmpdir"</span>
<span class="keyword"> trap</span> <span class="string">"$trap"</span> EXIT
<span class="builtin"> printf</span> <span class="string">'Changing directory: '</span>
<span class="builtin"> pushd</span> <span class="string">"$tmpdir"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="builtin"> declare</span> -a <span class="variable-name">attachment_args</span>=()
{
<span class="keyword">for</span> file<span class="keyword"> in</span> <span class="string">"$@"</span>; <span class="keyword">do</span>
<span class="variable-name">file</span>=<span class="string">"$(</span><span class="sh-quoted-exec">readlink</span><span class="string"> -f "$file")"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="comment-delimiter"># </span><span class="comment">Note: </span><span class="comment"><span class="flyspell-duplicate">pdftotext</span></span><span class="comment"> will not work for scanned documents, so those should just be
</span> <span class="comment-delimiter"># </span><span class="comment">saved as image files before feeding them to this script.
</span> <span class="comment-delimiter">##</span><span class="comment">
</span> <span class="comment-delimiter"># </span><span class="comment">It will however work fine for other types of </span><span class="comment"><span class="flyspell-incorrect">PDFs</span></span><span class="comment">.
</span> <span class="keyword">if</span> [[ <span class="string">"$file"</span> == *.pdf ]]; <span class="keyword">then</span>
pdftotext <span class="string">"$file"</span> /dev/fd/1 || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="keyword">else</span>
tesseract <span class="string">"$file"</span> stdout || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="keyword">fi</span>
<span class="variable-name">mime</span>=<span class="string">"$(</span><span class="sh-quoted-exec">mimetype</span><span class="string"> -b "$file")"</span> || <span class="keyword">exit</span> $<span class="variable-name">?</span>
<span class="variable-name">attachment_args</span>+=(--content-type=<span class="string">"$mime"</span> --attach=<span class="string">"$file"</span>)
<span class="keyword">done</span>
} &gt; ./outfile.txt
cat ./outfile.txt
<span class="variable-name"> file1</span>=<span class="string">"$(</span><span class="sh-quoted-exec">basename</span><span class="string"> "$1")"</span>
<span class="builtin"> read</span> -i <span class="string">"${file1%.*}"</span> -rep <span class="string">'What should the subject of the email be? '</span> subject
mail --subject=<span class="string">"$subject"</span> <span class="sh-escaped-newline">\</span>
<span class="string">"${attachment_args[@]}"</span> <span class="sh-escaped-newline">\</span>
--content-type=<span class="string">"text/plain"</span> <span class="sh-escaped-newline">\</span>
--content-filename=<span class="string">"content.</span><span class="string"><span class="flyspell-duplicate">txt</span></span><span class="string">"</span> <span class="sh-escaped-newline">\</span>
user@example.com &lt; ./outfile.txt
<span class="builtin">popd</span>
</pre>
</body>
</html>