diff --git a/blog.html b/blog.html
index 9cae946..490e518 100644
--- a/blog.html
+++ b/blog.html
@@ -39,7 +39,14 @@

Blog

-

Creating a Simple Static Blog

+

How To Use Your Email Client For Physical Mail

+Mon 17 Feb 2020 11:55:42 AM CET

Whether it's to re-read a conversation, find a plane ticket I ordered or check
+when a meeting was planned, I often find myself looking up old emails. It's
+usually easy to do so because email clients are designed for the task: Many of
+them support full-text search and some even complement that with advanced
+tagging and categorization systems. To be honest I have become completely ... Continue reading

+
+

Creating a Simple Static Blog

Sat 08 Feb 2020 12:14:16 PM CET

I love personal websites. It's amazing that people can share content with the entire world just by writing some text and throwing it behind a web server. I wanted to know what that is like, so I set out to create a personal website of
diff --git a/feed.xml b/feed.xml
index e1cfdac..8f42f30 100644
--- a/feed.xml
+++ b/feed.xml
@@ -5,12 +5,18 @@ https://hugot.nl/blog.html Hugo's personal blog en-us
- Sat 08 Feb 2020 07:32:46 PM CET
- Sat 08 Feb 2020 07:32:46 PM CET
+ Mon 17 Feb 2020 11:55:42 AM CET
+ Mon 17 Feb 2020 11:55:42 AM CET
 http://blogs.law.harvard.edu/tech/rss Hugo's Custom Bash Script social@hugot.nl infra@hugot.nl
+ How To Use Your Email Client For Physical Mail https://hugot.nl/posts/use-your-mail-client-for-physical-mail/index.htmlWhether it's to re-read a conversation, find a plane ticket I ordered or check
+when a meeting was planned, I often find myself looking up old emails. It's
+usually easy to do so because email clients are designed for the task: Many of
+them support full-text search and some even complement that with advanced
+tagging and categorization systems. To be honest I have become completelyMon 17 Feb 2020 11:55:42 AM CET How To Use Your Email Client For Physical Mail MjYyNjk3NDk5NCA0MDkzCg==
+ Creating a Simple Static Blog https://hugot.nl/posts/simple-static-blog/index.htmlI love personal websites. It's amazing that people can share content with the entire world just by writing some text and throwing it behind a web server. 
I wanted to know what that is like, so I set out to create a personal website of
diff --git a/posts.txt b/posts.txt
index 4958a3e..90a4ee4 100644
--- a/posts.txt
+++ b/posts.txt
@@ -1,2 +1,3 @@
+posts/use-your-mail-client-for-physical-mail/index.html
 posts/simple-static-blog/index.html
 posts/introduction/index.html
diff --git a/posts/use-your-mail-client-for-physical-mail/index.html b/posts/use-your-mail-client-for-physical-mail/index.html
new file mode 100644
index 0000000..9f1e9a2
--- /dev/null
+++ b/posts/use-your-mail-client-for-physical-mail/index.html
@@ -0,0 +1,128 @@
+
+
+
+ Use Your Email Client For Physical Mail
+
+
+
+
+
+
+ Home
+

+

How To Use Your Email Client For Physical Mail

+

+ Whether it's to re-read a conversation, find a plane ticket I ordered or check
+ when a meeting was planned, I often find myself looking up old emails. It's
+ usually easy to do so because email clients are designed for the task: Many of
+ them support full-text search and some even complement that with advanced
+ tagging and categorization systems. To be honest I have become completely
+ dependent on those features in my day-to-day work. Having full-text
+ search and some sort of categorization for mail can be a huge time
+ saver. Wouldn't it be nice if we had all of that functionality to deal with
+ physical mail as well? I thought it would, so I set out to find a way to
+ achieve just that. Turns out it's pretty simple!

+ +

+ The main objective here is to transform our physical mail into an email
+ that can be received, indexed and read by our email client of choice. Now,
+ one way to do that would be to type the contents of our mail into an email
+ by hand, but ain't nobody got time for that! The (more appealing)
+ alternative is to use a document scanner. I have a single-purpose scanner
+ from Canon that I hook up to my laptop for exactly this job.

+ +

+ It isn't as simple as just emailing a scanned document to ourselves
+ though: email clients are smart, but they can't understand a word of the text
+ in a PDF or JPEG scan of a physical document. They need content to be in
+ plain text form in order to provide some of their best features,
+ like full-text search. We'll have to somehow transform our scanned
+ documents into plain text that we can include in our email. To do this, we
+ can use tesseract. Tesseract is an optical character recognition (OCR)
+ engine, meaning that it can recognize text in images and extract it for
+ us. Installing it should be easy on Debian and derivatives like
+ Ubuntu. My laptop is running Debian unstable, so I just ran apt
+ install tesseract-ocr and started using it. Using it is as easy as
+ opening up a terminal and typing tesseract FILE.jpg
+ OUTPUT. That command will save all the text that tesseract is able
+ to recognize in the image FILE.jpg to a file called OUTPUT.txt.
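To make the command above a little more forgiving, here is a minimal bash sketch that wraps it, assuming tesseract is installed and on `PATH`. The helper name `ocr_scan` is hypothetical, not part of tesseract. Tesseract appends `.txt` to the output base name on its own, and passing the special output name `stdout` makes it print the recognized text instead of writing a file (the script further down uses that form too).

```bash
#!/bin/bash
# Sketch: OCR a single scanned image with tesseract.
# ocr_scan is a hypothetical helper, not part of tesseract itself.
# Usage: ocr_scan IMAGE [OUTPUT_BASE]
#   With OUTPUT_BASE, tesseract writes the text to OUTPUT_BASE.txt.
#   Without it, "stdout" is passed and the text is printed instead.
ocr_scan() {
    local image=$1
    local base=${2:-stdout}

    # Fail early with a clear message if tesseract is not available.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo 'ocr_scan: tesseract not found in PATH' >&2
        return 127
    fi

    # Refuse to run on a file we cannot read.
    if [ ! -r "$image" ]; then
        echo "ocr_scan: cannot read $image" >&2
        return 1
    fi

    tesseract "$image" "$base"
}
```

For example, `ocr_scan FILE.jpg OUTPUT` should produce `OUTPUT.txt`, and `ocr_scan FILE.jpg | less` lets you check the recognized text before sending anything.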

+
+
+

+ All we have to do from there is copy-paste the contents of that file into
+ an email and send it to ourselves! Depending on the formatting of the
+ input document, the output may not always be pleasant to read. We can
+ account for this by including the original document as an attachment to
+ the email. That way we get the best of both worlds: we can use the search
+ functionality of our email client to find the document, and then read it
+ in its original form by opening the attachment.
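The copy-paste step can also be sketched from a terminal: the OCR text becomes the message body and the original scan rides along as an attachment. This assumes the `mail` command from GNU Mailutils (which provides `--subject` and `--attach`) and a working local mail transfer agent; `mail_scan` and the address are hypothetical, and this is not the script linked below.

```bash
#!/bin/bash
# Sketch: mail OCR text as the body, with the original scan attached.
# Assumes GNU Mailutils `mail` (which has --subject and --attach) and a
# working local MTA. mail_scan is a hypothetical helper.
# Usage: mail_scan TEXT_FILE SCAN_FILE SUBJECT RECIPIENT
mail_scan() {
    local text=$1 scan=$2 subject=$3 recipient=$4
    local f

    # Both the OCR output and the original scan must be readable.
    for f in "$text" "$scan"; do
        if [ ! -r "$f" ]; then
            echo "mail_scan: cannot read $f" >&2
            return 1
        fi
    done

    # The message body is read from standard input (the OCR text);
    # --attach adds the original scan so it can be viewed as-is.
    mail --subject="$subject" --attach="$scan" "$recipient" < "$text"
}
```

With that, `mail_scan OUTPUT.txt FILE.jpg 'Scanned letter' you@example.com` would send the recognized text as the body, with `FILE.jpg` attached for reading in its original form.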

+ +

+ This is all easy enough, but I'm lazy. I didn't feel like opening up my
+ email client and doing manual copy-pasting, so I decided to automate the
+ process a little further. I have Postfix set up on my system to relay to my
+ mail server, so I can simply use the mail command to send emails without a
+ GUI mail client. I combined that with tesseract in a little bash
+ script. The script iterates through all of its arguments and interprets
+ them as filenames of scanned documents. It calls tesseract (or pdftotext,
+ for PDFs that already contain text) to extract text
+ from them, concatenates the results, attaches the files to an email and
+ sends it to my personal email address. Now all I have to do is run the
+ script with filenames of some documents and my job is done. If anyone is
+ interested in an actual program that does the same thing and doesn't
+ require you to set up Postfix, let me know! I might consider writing one
+ if it's useful to more people than just myself. The script I'm currently
+ using can be found here (pretty)
+ and here (raw), but I don't recommend
+ using it if you don't fully understand its contents; it's not a polished
+ user experience 🤓.

+
+
+
diff --git a/posts/use-your-mail-client-for-physical-mail/publish_date.txt b/posts/use-your-mail-client-for-physical-mail/publish_date.txt
new file mode 100644
index 0000000..63e6d1e
--- /dev/null
+++ b/posts/use-your-mail-client-for-physical-mail/publish_date.txt
@@ -0,0 +1 @@
+Mon 17 Feb 2020 11:55:42 AM CET
diff --git a/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash b/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash
new file mode 100644
index 0000000..20dbb36
--- /dev/null
+++ b/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash
@@ -0,0 +1,57 @@
+#!/bin/bash
+
+if ! [[ $# -ge 1 ]]; then
+    echo 'Usage: scan-to-mailpile ...FILES' >&2
+
+    exit 1
+fi
+
+if ! type_output="$(type readlink mktemp pdftotext tesseract mail mimetype basename cat 2>&1)"; then
+    printf 'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n' \
+           "$type_output" >&2
+    exit 1
+fi
+
+tmpdir=$(mktemp -d) || exit $?
+
+printf -v trap 'rm -vr %q' "$tmpdir"
+trap "$trap" EXIT
+
+printf 'Changing directory: '
+pushd "$tmpdir" || exit $?
+
+declare -a attachment_args=()
+
+{
+    for file in "$@"; do
+        file="$(cd "$OLDPWD" && readlink -f "$file")" || exit $?  # resolve against the caller's directory (pushd set OLDPWD)
+
+        # Note: pdftotext will not work for scanned documents, so those should just be
+        # saved as image files before feeding them to this script.
+        ##
+        # It will however work fine for other types of PDFs.
+        if [[ "$file" == *.pdf ]]; then
+            pdftotext "$file" - || exit $?
+        else
+            tesseract "$file" stdout || exit $?
+        fi
+
+        mime="$(mimetype -b "$file")" || exit $?
+
+        attachment_args+=(--content-type="$mime" --attach="$file")
+    done
+} > ./outfile.txt
+
+cat ./outfile.txt
+
+file1="$(basename "$1")"
+
+read -i "${file1%.*}" -rep 'What should the subject of the email be? ' subject
+
+mail --subject="$subject" \
+     "${attachment_args[@]}" \
+     --content-type="text/plain" \
+     --content-filename="content.txt" \
+     user@example.com < ./outfile.txt
+
+popd
diff --git a/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash.html b/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash.html
new file mode 100644
index 0000000..f2d1045
--- /dev/null
+++ b/posts/use-your-mail-client-for-physical-mail/scan-to-mailpile.bash.html
@@ -0,0 +1,129 @@
+
+
+
+
+ scan-to-mailpile.bash
+
+
+
+ #!/bin/bash
+ 
+ if ! [[ $# -ge 1 ]]; then
+     echo 'Usage: scan-to-mailpile ...FILES' >&2
+ 
+     exit 1
+ fi
+ 
+ if ! type_output="$(type readlink mktemp pdftotext tesseract mail mimetype basename cat 2>&1)"; then
+     printf 'scan-to-mailpile: Some required commands are missing, lookup results:\n%s\n' \
+            "$type_output" >&2
+     exit 1
+ fi
+ 
+ tmpdir=$(mktemp -d) || exit $?
+ 
+ printf -v trap 'rm -vr %q' "$tmpdir"
+ trap "$trap" EXIT
+ 
+ printf 'Changing directory: '
+ pushd "$tmpdir" || exit $?
+ 
+ declare -a attachment_args=()
+ 
+ {
+     for file in "$@"; do
+         file="$(cd "$OLDPWD" && readlink -f "$file")" || exit $?  # resolve against the caller's directory (pushd set OLDPWD)
+ 
+         # Note: pdftotext will not work for scanned documents, so those should just be
+         # saved as image files before feeding them to this script.
+         ##
+         # It will however work fine for other types of PDFs.
+         if [[ "$file" == *.pdf ]]; then
+             pdftotext "$file" - || exit $?
+         else
+             tesseract "$file" stdout  || exit $?
+         fi
+ 
+         mime="$(mimetype -b "$file")" || exit $?
+ 
+         attachment_args+=(--content-type="$mime" --attach="$file")
+     done
+ } > ./outfile.txt
+ 
+ cat ./outfile.txt
+ 
+ file1="$(basename "$1")"
+ 
+ read -i "${file1%.*}" -rep 'What should the subject of the email be? ' subject
+ 
+ mail --subject="$subject" \
+      "${attachment_args[@]}" \
+      --content-type="text/plain" \
+      --content-filename="content.txt" \
+      user@example.com < ./outfile.txt
+
+popd
+
+ +