I just ran into a problem with a client and thought I would share it. The task was to find out why the storage of an EC2 instance had become bloated and was running out of space.

I had already deleted their temp files and other files that were not needed, but there was still a lot of wasted space. Here is a quick way to see which files are using up your disk:

NOTE: In order to find all files on the system I will use sudo. If you do not want to include system files in the search, simply remove it (find will then skip the directories your user cannot read and report "Permission denied" for them).

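Finding all files over 25 MB on the system

This will find all files over 25 MB on your disk, tell you where they are, and print a grand total at the end. With GNU find and du, passing the names NUL-separated (-print0 and --files0-from=-) keeps paths containing spaces intact and lets a single du call compute one total:

$ sudo find / -size +25M -print0 | du -sch --files0-from=-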


81M     /var/log/journal/cvbcvbcvb/system@3191ceb6dc084d3aaee91fab0a8d0693-0000000000dbb966-0005c3699397ec10.journal
81M     /var/log/journal/cvbcvbcvb/system@3191ceb6dc084d3aaee91fab0a8d0693-0000000000eebbf2-0005c7b345a63945.journal
81M     /var/log/journal/cvbcvbcvb/system@ef587e7a7d6f4b4ba611ba066d639fa1-0000000000000001-0005d2ed11b55c07.journal
<....>
27M     /usr/lib/x86_64-linux-gnu/libicudata.so.65.1
26M     /usr/lib/x86_64-linux-gnu/libicudata.so.60.2
36M     /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m-pic.a
37M     /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.a
2.1G    /swapfile
8.6G    total

I have abbreviated the output, but as you can see I found quite a lot of journal files that could go. We can also see that all files over 25 MB on our drive take up around 8.6 GB in total. I got rid of the journals but was not yet satisfied. I knew that the system also serves PDF files from S3 and caches them for further use.
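The listing comes back in whatever order find walks the disk, so a sorted view can make the biggest offenders easier to spot. The following is only a sketch, assuming GNU find, xargs, du and sort; the function name largest_files and the throwaway demo directory are made up for illustration.

```shell
# largest_files DIR SIZE: list regular files under DIR that are larger than
# SIZE (e.g. 25M), smallest first, without crossing into other filesystems.
# Hypothetical helper name, not part of any tool.
largest_files() {
    find "$1" -xdev -type f -size +"$2" -print0 2>/dev/null \
        | xargs -0 -r du -h \
        | sort -h
}

# Demo on a temporary directory instead of /
big_demo=$(mktemp -d)
dd if=/dev/zero of="$big_demo/big file.bin" bs=1024 count=2048 2>/dev/null
dd if=/dev/zero of="$big_demo/small.bin" bs=1024 count=4 2>/dev/null
largest_files "$big_demo" 1M
```

Only "big file.bin" should be listed here. On a real server you would point it at / from a root shell; -xdev keeps find away from /proc and other mounted filesystems.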

Finding all PDF documents on the system

These files will never be over 25 MB, so they were not included in our previous search. And since a great many files on the system fall into the typical PDF size range, filtering by size is not practical; we search by name instead:

$ sudo find / -iname '*.pdf' -print0 | du -sch --files0-from=-

<....>

184K    /usr/share/texmf/doc/context/third/simpleslides/styles/HorizontalStripes-blue.pdf
184K    /usr/share/texmf/doc/context/third/simpleslides/styles/NarrowStripes-red.pdf
184K    /usr/share/texmf/doc/context/third/simpleslides/styles/Framed-square.pdf
184K    /usr/share/texmf/doc/context/third/simpleslides/styles/NarrowStripes-green.pdf
<....>
4.0K    /usr/share/texlive/texmf-dist/tex/latex/pdfscreen/overlay0.pdf
8.0K    /usr/share/texlive/texmf-dist/tex/latex/pdfscreen/button.pdf
4.0K    /usr/share/texlive/texmf-dist/tex/latex/pdfscreen/overlay8.pdf
68K     /usr/share/texlive/texmf-dist/tex/latex/pdfscreen/left.pdf
4.0K    /usr/share/texlive/texmf-dist/tex/latex/pdfscreen/but.pdf
<....>
28K     /usr/share/texlive/texmf-dist/tex/latex/beamer/beamericononline.pdf
4.0K    /usr/share/texlive/texmf-dist/tex/latex/beamer/beamericonarticle.pdf
4.0K    /usr/share/texlive/texmf-dist/tex/latex/beamer/beamericonbook.20.pdf
4.0K    /usr/share/texlive/texmf-dist/tex/latex/beamer/beamericonarticle.20.pdf
24K     /usr/share/texlive/texmf-dist/tex/latex/beamer/beamericononline.20.pdf
4.7G    total
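With thousands of matches, a per-directory total can show more quickly where the PDFs pile up than a file-by-file listing. This is a rough sketch, assuming GNU find (for -printf) and awk; pdf_usage_by_dir is a hypothetical name. Note that it sums the apparent file size in KiB, so the numbers can differ slightly from what du reports.

```shell
# pdf_usage_by_dir DIR: total size in KiB of *.pdf files per directory,
# largest directory first. Hypothetical helper; needs GNU find's -printf.
pdf_usage_by_dir() {
    find "$1" -xdev -type f -iname '*.pdf' -printf '%s\t%h\n' 2>/dev/null \
        | awk -F '\t' '{ bytes[$2] += $1 }
                       END { for (d in bytes) printf "%d\t%s\n", bytes[d] / 1024, d }' \
        | sort -rn
}

# Demo on a temporary directory instead of /
pdf_demo=$(mktemp -d)
mkdir -p "$pdf_demo/cache" "$pdf_demo/docs"
dd if=/dev/zero of="$pdf_demo/cache/a.pdf" bs=1024 count=300 2>/dev/null
dd if=/dev/zero of="$pdf_demo/cache/b.PDF" bs=1024 count=200 2>/dev/null
dd if=/dev/zero of="$pdf_demo/docs/c.pdf" bs=1024 count=100 2>/dev/null
pdf_usage_by_dir "$pdf_demo"
```

In the demo the cache directory comes first with 500 KiB, followed by docs with 100 KiB.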

To my surprise, I found that not only were there a lot of unneeded PDF files, but texlive had also been installed with its complete documentation, so I investigated further:

$ sudo find /usr/share/doc/texlive-doc/ -iname '*.pdf' -print0 | du -sch --files0-from=-

<....>

64K     /usr/share/doc/texlive-doc/xelatex/sexam/exam_with_wexam_ar-DZ.pdf
60K     /usr/share/doc/texlive-doc/xelatex/sexam/bac_template-DZ.pdf
584K    /usr/share/doc/texlive-doc/xelatex/sexam/sexam_wexam_doc_ar.pdf
172K    /usr/share/doc/texlive-doc/xelatex/persian-bib/Persian-bib-userguide.pdf
36K     /usr/share/doc/texlive-doc/xelatex/xecyr/rubibtex-ex-x.pdf
16K     /usr/share/doc/texlive-doc/xelatex/xecyr/listings-utf8-ex.pdf
32K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex7-ru-x.pdf
20K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-doc-ru.pdf
44K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex5-ru-x.pdf
12K     /usr/share/doc/texlive-doc/xelatex/xecyr/pict2e-ex.pdf
8.0K    /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex4-ru-x.pdf
52K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex1-ru-x.pdf
16K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex6-ru-x.pdf
44K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex2-ru-x.pdf
24K     /usr/share/doc/texlive-doc/xelatex/xecyr/rumakeindex-ex-x.pdf
20K     /usr/share/doc/texlive-doc/xelatex/xecyr/xecyr-ex3-ru-x.pdf
232K    /usr/share/doc/texlive-doc/xelatex/xepersian/xepersian-doc.pdf
<....>
12K     /usr/share/doc/texlive-doc/xelatex/bidi-atbegshi/test-RTL.pdf
12K     /usr/share/doc/texlive-doc/xelatex/bidi-atbegshi/test-LTR.pdf
24K     /usr/share/doc/texlive-doc/xelatex/bidi-atbegshi/bidi-atbegshi-doc.pdf
112K    /usr/share/doc/texlive-doc/xelatex/mynsfc/mynsfc.pdf
116K    /usr/share/doc/texlive-doc/xelatex/mynsfc/my-nsfc-proposal.pdf
56K     /usr/share/doc/texlive-doc/xelatex/quran/quran-test2.pdf
32K     /usr/share/doc/texlive-doc/xelatex/quran/quran-test1.pdf
20K     /usr/share/doc/texlive-doc/xelatex/quran/uthmanitext.pdf
36K     /usr/share/doc/texlive-doc/xelatex/quran/quran-test.pdf
272K    /usr/share/doc/texlive-doc/xelatex/quran/quran-doc.pdf
20K     /usr/share/doc/texlive-doc/xelatex/quran/defaulttext.pdf
72K     /usr/share/doc/texlive-doc/xelatex/xeindex/xeindex.pdf
24K     /usr/share/doc/texlive-doc/xelatex/facture/exemple.pdf
24K     /usr/share/doc/texlive-doc/xelatex/facture/exemplesansremise.pdf
20K     /usr/share/doc/texlive-doc/xelatex/facture/exemplesansTVA.pdf
96K     /usr/share/doc/texlive-doc/xelatex/facture/facture.pdf
56K     /usr/share/doc/texlive-doc/xelatex/ptext/ptext.pdf
100K    /usr/share/doc/texlive-doc/xelatex/bidishadowtext/bidishadowtext-doc.pdf
20K     /usr/share/doc/texlive-doc/xelatex/bidishadowtext/bidishadowtext-demo.pdf
614M    total

A whopping 614M of texlive documentation. Since no one is ever going to look at it on the server, I could get rid of that too. NOTE: You should not do it this way. The correct way is to not install texlive with its documentation in the first place, but if it's too late, it's too late. Please be advised, though, that a lot of texlive files can be included by other files as well, so be careful about what you delete.

$ sudo find /usr/share/doc/texlive-doc/ -iname '*.pdf' -exec rm {} \;
<....>

And that is how this game works.
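Before running a deletion like the one above, a dry run that only prints the matches can save you from removing the wrong files. Here is a minimal sketch, assuming a find that supports -delete (GNU and BSD find both do); purge_pdfs and its --delete switch are hypothetical names for this example.

```shell
# purge_pdfs DIR [--delete]: print the PDF files under DIR and remove them
# only when --delete is passed. Hypothetical helper name.
purge_pdfs() {
    if [ "${2:-}" = "--delete" ]; then
        find "$1" -type f -iname '*.pdf' -print -delete
    else
        find "$1" -type f -iname '*.pdf' -print
    fi
}

# Demo on a temporary directory
purge_demo=$(mktemp -d)
touch "$purge_demo/guide.pdf" "$purge_demo/README"
purge_pdfs "$purge_demo"            # dry run: lists guide.pdf, removes nothing
purge_pdfs "$purge_demo" --delete   # removes guide.pdf, leaves README alone
ls "$purge_demo"
```

The -type f test makes sure that only regular files are touched, never a directory that happens to end in .pdf.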

For now I will just post some more examples without output:

Find all log files on the server and their size

$ sudo find / -iname '*.log' -print0 | du -sch --files0-from=-
<....>

Find all zip files on the server and their size

$ sudo find / -iname '*.zip' -print0 | du -sch --files0-from=-
<....>

Find all rar files on the server and their size

$ sudo find / -iname '*.rar' -print0 | du -sch --files0-from=-
<....>

Find all iso files on the server and their size

$ sudo find / -iname '*.iso' -print0 | du -sch --files0-from=-
<....>

You get the idea ;)
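If you check several extensions regularly, the repeated commands can be folded into a small loop. This is just a sketch, assuming GNU find and GNU du (for --files0-from); usage_by_ext is a made-up name.

```shell
# usage_by_ext DIR EXT...: print one line per extension with du's grand
# total for all matching files under DIR. Hypothetical helper name.
usage_by_ext() {
    dir=$1
    shift
    for ext in "$@"; do
        # du reads the NUL-separated names from stdin and ends with a
        # single "total" line; keep only its size column.
        total=$(find "$dir" -xdev -type f -iname "*.$ext" -print0 2>/dev/null \
            | du -ch --files0-from=- | tail -n 1 | cut -f1)
        printf '%s\t%s\n' "$ext" "${total:-0}"
    done
}

# Demo on a temporary directory instead of /
ext_demo=$(mktemp -d)
dd if=/dev/zero of="$ext_demo/app.log" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$ext_demo/old.LOG" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$ext_demo/backup.zip" bs=1024 count=50 2>/dev/null
usage_by_ext "$ext_demo" log zip
```

The exact sizes printed depend on how the filesystem allocates blocks, so treat them as approximate.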

PS: Another thing I found is that another piece of software was never rotating its logs and was thus also filling up server space. But that's for another time.

Find is a great tool and you should definitely check out its documentation: https://man7.org/linux/man-pages/man1/find.1.html