You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
145 lines
6.5 KiB
HTML
145 lines
6.5 KiB
HTML
<!DOCTYPE HTML>
|
|
<html>
|
|
<head>
|
|
<title>Use Your Email Client For Physical Mail</title>
|
|
<meta charset="UTF-8">
|
|
</head>
|
|
|
|
<style type="text/css">
|
|
a:visited {
|
|
color: #c2e;
|
|
}
|
|
|
|
a {
|
|
color: #0095dd;
|
|
}
|
|
|
|
html {
|
|
font-family: Helvetica, Arial, sans-serif;
|
|
color: #5b4636;
|
|
background-color: #f4ecd8;
|
|
}
|
|
|
|
body {
|
|
padding: 1em;
|
|
margin: auto;
|
|
}
|
|
|
|
aside {
|
|
width: 30%;
|
|
min-width: 10em;
|
|
background-color: rgba(0,0,0, 0.1);
|
|
float: right;
|
|
padding: 1em;
|
|
margin: 1em;
|
|
}
|
|
|
|
|
|
@media only all and (pointer: coarse), (pointer: none) {
|
|
body {
|
|
font-size: 5.5vmin;
|
|
}
|
|
|
|
aside {
|
|
width: inherit;
|
|
}
|
|
}
|
|
|
|
@media only all and (pointer: fine) {
|
|
body {
|
|
font-size: calc(16px + 0.6vmin);
|
|
min-width: 500px;
|
|
max-width: 50em;
|
|
}
|
|
}
|
|
|
|
</style>
|
|
|
|
<body>
|
|
<a href="../../blog.html">Home</a>
|
|
<article>
|
|
<h1>How To Use Your Email Client For Physical Mail</h1>
|
|
<p>
|
|
Whether it's to re-read a conversation, find a plane ticket I ordered or
|
|
check when a meeting was planned, I often find myself looking up old
|
|
emails. It's usually easy to do so because email clients are designed for
|
|
the task: Many of them support full-text search and some even complement
|
|
that with neat tagging and categorization systems. To be honest I have
|
|
become completely dependent on those features for my day to day
|
|
life. Having full-text search and some sort of categorization for email
|
|
can be a huge time saver. When it comes to physical mail however, I still
|
|
have to browse through stacks of paper to (hopefully) find what I'm
|
|
looking for. I figured that it'd be nice to use my fancy email client to
|
|
deal with physical mail as well, so I found a way to do just that. Turns
|
|
out it's pretty simple!
|
|
</p>
|
|
|
|
<p>
|
|
The main objective here is to transform our physical mail into an email
|
|
that can be received, indexed and read by our email client of choice. Now,
|
|
one way to do that would be to type the contents of our mail into an email
|
|
by hand, but <i>ain't nobody got time for that!</i>. The (more appealing)
|
|
alternative is to use a document scanner. I have a single purpose scanner
|
|
unit from Canon that I hook up to my laptop for just this purpose.
|
|
</p>
|
|
|
|
<p>
|
|
It isn't as simple as just emailing a scanned document to ourselves
|
|
though: email clients are smart, but they can't understand a word of text
|
|
in our PDF or JPEG of a physical document. They need content to be in
|
|
plain text form in order to provide us with some of their best features
|
|
like full-text search. We'll have to somehow transform our scanned
|
|
documents into plain text that we can include in our email. To do this, we
|
|
can use tesseract. Tesseract is an optical character recognition (OCR)
|
|
engine, meaning that it can recognize text in images and extract it for
|
|
us. Installing it should be easy on Debian derivative distros like
|
|
Ubuntu. My laptop is running Debian unstable so I just ran <code>apt
|
|
install tesseract</code> and started using it. Using it is as easy as
|
|
upening up a terminal and typing <code>tesseract FILE.jpg
|
|
OUTPUT</code>. That command will save all the text that tesseract is able
|
|
to recognize in the image FILE.jpg to a file called OUTPUT.txt.
|
|
</p>
|
|
|
|
<aside>
|
|
<i>
|
|
Side note: I am Dutch, so most of my physical mail is in Dutch. To
|
|
make tesseract better understand my mail I installed the
|
|
tesseract-ocr-nld package using <code>apt install
|
|
tesseract-ocr-nld</code>. You can check what other language packs are
|
|
available by using <code>apt search tesseract-ocr</code>.
|
|
</i>
|
|
</aside>
|
|
|
|
<p>
|
|
All we have to do from there is copy-paste the contents of that file into
|
|
an email and send it to ourselves! Depending on the formatting of the
|
|
input document, the output may not always be pleasant to read. We can
|
|
account for this by including the original document as an attachment to
|
|
the email. That way we get the best of both worlds: we can use the search
|
|
functionality of our email client to find the document, and then read it
|
|
in its original form by opening the attachment.
|
|
</p>
|
|
|
|
<p>
|
|
This is all easy enough, but I'm lazy. I didn't feel like opening up my
|
|
email client and doing manual copy-pasting, so I decided to automate the
|
|
process a little further. I have postfix setup on my system to relay to my
|
|
mail server, so I can simply use the <code>mail</code> command to send emails without a
|
|
GUI mail client. I combined that with tesseract in a little bash
|
|
script. The script iterates through all of its arguments and interprets
|
|
them as filenames of scanned documents. It calls tesseract to extract text
|
|
from them, concatenates the results, attaches the files to an email and
|
|
sends it to my personal email address. Now all I have to do is run the
|
|
script with filenames of some documents and my job is done. If anyone is
|
|
interested in an actual program that does the same thing and doesn't
|
|
require you to setup postfix, let me know! I might consider authoring one
|
|
if it's useful to more people than just myself. The script I'm currently
|
|
using can be found <a href="scan-to-mailpile.bash.html">here (pretty)</a>
|
|
and <a href="scan-to-mailpile.bash">here (raw)</a>, but I don't recommend
|
|
using it if you don't fully understand its contents, it's not a polished
|
|
user experience 🤓.
|
|
</p>
|
|
</article>
|
|
</body>
|
|
</html>
|