• 0 Posts
  • 21 Comments
Joined 5 months ago
Cake day: January 25th, 2024

  • Most open source tools have the same thing in common: they feel like they’re made by engineers. I think that’s because it’s true; most FOSS tools are made by engineers, for engineers, because most projects start with someone needing something, creating it, and then sharing it.

    The chances of a programmer needing something and then making it are a lot higher than an artist needing it and then making it, since the artist would first need the skills to build the software. As someone not from a CS field, I’ve seen how many redundant programs exist for CS-related tasks while barely any exist for other fields, because the overlap between programmers (specifically FOSS programmers) and those fields is small. And the few programmers such a field does have often don’t have high-level software development skills, so most open source tools made by them are “works on my machine” or “works for this specific task”, even though with less than 1% more effort they could have made a generalized tool.






  • If by editing you mean adding text (forms) and signing, then Firefox, xournal++, rnote, etc.

    If you mean changing the PDF content, then LibreOffice Draw for textual PDFs, Inkscape for graphical PDFs.

    I also just open PDFs in a text editor (or via qpdf’s QDF format) and edit certain things. I don’t recommend it, but due to certain recent events I had to change some font data in a PDF and that was the best solution.
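
    If anyone wants to try that route, here’s a minimal sketch of the qpdf round trip (file names are placeholders): expand the PDF into uncompressed QDF form, edit it in any text editor, then let fix-qdf repair the stream lengths and cross-reference table afterwards.

    # expand the PDF into editable QDF form with uncompressed object streams
    qpdf --qdf --object-streams=disable input.pdf editable.pdf
    
    # ... hand-edit editable.pdf in a text editor ...
    
    # repair stream lengths and the xref table after the edits
    fix-qdf editable.pdf > output.pdf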




  • System76 laptops have a fingerprint sensor. They just don’t say so because it’s not supported.

    And since it’s designed to be used as a tap/scan surface, with the power button reserved for hard restarts/shutdowns (it’s stiff to press so it doesn’t get pressed during a fingerprint scan), the hardware not being supported means you end up pressing that power button a lot instead of just scanning a fingerprint.


  • For the OCR, have you tried tesseract? For printed documents it can take an image as input and generate a PDF with selectable text. I don’t OCR much, but it has been useful the few times I’ve tried it.

    You might be able to have a script that feeds the scanner output into tesseract and produces a PDF. It only works on a single image per run, so I had to make a script to run it on a whole PDF by splitting it into pages and stitching it back together, roughly like the sketch below.
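
    Here’s a rough sketch of what such a script could look like, assuming poppler’s pdftoppm/pdfunite are installed alongside tesseract; the file names and resolution are just examples:

    #!/usr/bin/env bash
    # split the scanned PDF into one PNG per page
    pdftoppm -r 300 -png scan.pdf page
    # OCR each page into its own single-page searchable PDF
    for img in page-*.png; do
        tesseract "$img" "${img%.png}" pdf
    done
    # stitch the per-page PDFs back into one document
    pdfunite page-*.pdf scan-ocr.pdf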



  • Someone already talked about the XY problem, so I’ll say this.

    Why a sound notification instead of the notification content? If your notification program (dunst in my case) has pattern matching or can call scripts based on patterns, and the script gets the app name, notification title, contents, etc., then it’s just a matter of calling something from your bash script.

    And any time you wanna add that functionality to something else, you add one more line with a different pattern or one more condition in your script. Comparing text is a lot more reliable than audio.

    Of course, your use case could be completely different, so maybe give some example use cases so people can suggest different ways to solve them instead of just the one you’re thinking of.
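
    As a rough sketch of the dunst route (the rule name, app name and script path are made up, and the exact arguments your dunst version passes may differ, so check dunst(5)): a rule in dunstrc matches the notification and calls a script, and the script does whatever you want with the matched text.

    # in ~/.config/dunst/dunstrc: call a script for notifications that match a pattern
    [alert_on_build]
        appname = "my-ci-app"
        summary = "*build finished*"
        script = ~/.local/bin/build-alert.sh

    #!/usr/bin/env bash
    # ~/.local/bin/build-alert.sh
    # dunst passes appname, summary, body, icon and urgency as positional arguments
    app="$1" summary="$2" body="$3"
    # do whatever the match should trigger, e.g. play a sound only on failures
    if [[ "$body" == *"failed"* ]]; then
        paplay /usr/share/sounds/freedesktop/stereo/bell.oga
    fi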


  • Yeah sure, I’ll compile it on my OS. For any other OS, either I’m not knowledgeable about the tools available, or they’re tools I’m not going to spend money on. If the binary a developer compiles for themselves were enough to solve it, we wouldn’t have this problem at all.

    I specifically hate when programs or libraries are only available in compiled form, and then I get an error message referring to an absolute path with some username I’ve never seen before, and no way to correct it because there’s no code. It turns out that when people ship compiled versions for an OS they don’t use themselves, they never hit those errors and think everything works fine.







  • Hi there, I did say it’s easily doable, but I didn’t have a script because I run things on the image manually before the OCR (like the dark-mode negation I try in this script; when doing it manually it’s just one command since I know myself whether it’s dark mode or not; similar for the threshold as well).

    But here’s one I made for you:

    #!/usr/bin/env bash
    
    # ImageMagick has a cute little command for importing (a region of) the screen into a file
    import -colorspace gray /tmp/screenshot.png
    mogrify -color-threshold "gray(100)-gray(200)" /tmp/screenshot.png
    
    # extra magic to invert if the average pixel is dark:
    # fx:mean is the average intensity of the image as a fraction between 0 and 1
    darkness=$(convert /tmp/screenshot.png -format '%[fx:int(mean*100)]' info:)
    darkness=${darkness%.*}   # drop any decimal part so bash arithmetic is happy
    if (( darkness < 50 )); then
       mogrify -negate /tmp/screenshot.png
    fi
    
    # now run the OCR
    text=$(tesseract /tmp/screenshot.png -)
    echo "$text" | xclip -selection c
    notify-send OCR-Screen "$text"
    

    So the middle part is there to accommodate images in dark mode: it negates the image based on a threshold that you can change. Without that, you can just have import for screen capture and tesseract for running OCR, and optionally pipe it to xclip for the clipboard or notify-send for a notification.

    In my use case, I have a keybind to take a screenshot like this: import png:- | xclip -selection c -t image/png, which gives me a cursor to select part of the screen and copies that to the clipboard. I can save that as an image (through another bash script) or paste it directly into messenger applications. And when I need to do OCR, I just run tesseract in the terminal and copy the text from there.


  • Not for handwritten text, but for printed fonts, getting OCR is as easy as just drawing a box on the screen with current technology. So I don’t think we need AI things for that.

    Personally I use tesseract. I have a simple bash script that, when run, lets me select a rectangle on the screen, saves that image to a temp folder, runs OCR on it, and copies the text to the clipboard. Done.

    Edit: for extra flavor you can also use notify-send to show that text in a notification, so you know what the OCR produced without having to paste it.