10.16.2009

spell.lsp: An AutoLISP accessible spell checker and MTEXT formatting removal tool

Download spell.lsp
License: Public Domain, zlib if you want.

SCOWL (Spell Checker Oriented Word Lists)

There have been some requests on the discussion groups for a LISP-accessible spell checker (well, not really: a few posts from years back, and I mostly wanted to see how slow it would be). I figured it would be a pretty easy task, provided there were some free dictionary files out there, and there are (SCOWL, from Kevin's Word List Page). It was an interesting day project, but writing a function to remove mtext formatting proved more challenging than I initially thought.

I know, there's already the UnFormat function in StripMtext, written by John Uhden and Steve Doman, but it's somewhat unreliable. At the very least, it doesn't remove \\p*; alignment codes. I wrote my own function that uses foreach and vl-string-search instead of wcmatch. I haven't measured which is more efficient, but I suspect wcmatch is costly given how broad its patterns can be. My function hinges on the fact that there are three kinds of codes: codes to be removed entirely, codes that wrap text that should remain, and codes that need to be replaced with something else, such as a newline character.
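To make the three kinds of codes concrete, here's a minimal sketch of the idea. It's an illustration only, not the code in spell.lsp: the names replaceAll, removeArgCode and stripSketch are made up for this example, and the lists of codes are my assumption about which ones matter. It also ignores escaped characters (a literal backslash or brace) and stacked text.

```lisp
(vl-load-com)

;; Hypothetical helper: replace every occurrence of pat (non-empty) in str with rep.
(defun replaceAll (rep pat str / pos)
  (setq pos 0)
  (while (setq pos (vl-string-search pat str pos))
    (setq str (vl-string-subst rep pat str pos)
          pos (+ pos (strlen rep)))   ; skip past what was just inserted
  )
  str
)

;; Hypothetical helper: remove a code that carries an argument,
;; from its prefix up to and including the next semicolon.
(defun removeArgCode (prefix str / pos semi)
  (setq pos 0)
  (while (and (setq pos (vl-string-search prefix str pos))
              (setq semi (vl-string-search ";" str pos)))
    ;; keep what precedes the code and what follows the semicolon
    (setq str (strcat (substr str 1 pos) (substr str (+ semi 2))))
  )
  str
)

(defun stripSketch (str / code)
  ;; 1. codes replaced with something else: \P becomes a newline
  (setq str (replaceAll "\n" "\\P" str))
  ;; 2. codes removed entirely, argument included (assumed list)
  (foreach code '("\\f" "\\F" "\\H" "\\W" "\\Q" "\\T" "\\C" "\\A" "\\p")
    (setq str (removeArgCode code str))
  )
  ;; 3. codes that wrap text that stays: drop the toggles and braces only
  (foreach code '("\\L" "\\l" "\\O" "\\o" "\\K" "\\k" "{" "}")
    (setq str (replaceAll "" code str))
  )
  str
)

(stripSketch "\\P\\Lhello\\l {\\fVerdana;\\W1.7x;there}")
;; returns "\nhello there"
```

The searches are case-sensitive, which is what keeps \\P (new paragraph) and \\p (alignment) apart.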

Anyway, here's a quick guide on the functions included:

;; str2lst converts a string into a list
(str2lst "a,b,cd,e,fgh" ",")
;; returns '("a" "b" "cd" "e" "fgh")

;; lst2str converts a list into a string
(lst2str '("hello" "there" "mister.") " ")
;; returns "hello there mister."

;; loadWords takes a list of dictionary files and loads all the words into a list
(loadWords '("somedictionary.txt" "anotherdictionary.txt"))
;; returns a list of words
;; files should be one word per line, lower-case

;; checkWord takes a word and a dictionary and checks if the word is in the dictionary
;; basically a wrapper for vl-position, returns nil if the word is spelled right
(checkWord "ahasdh" myDictionary)
;; returns T since "ahasdh" is not a word

;; checkText takes text, separates into words, checks them and returns a list of misspelled words
(checkText "This is a test sentence." myDictionary)
;; returns nil, as all of those words are spelled right

;; remMtextFmt removes mtext formatting
(remMtextFmt "\\P\\Lhello\\l {\\fVerdana;\\W1.7x;there}")
;; returns "\nhello there"
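
If you'd rather see the idea than read the file, the simpler helpers could look something like the sketch below. These are my stand-ins for illustration (splitSketch, joinSketch, loadSketch and checkSketch are made-up names), not necessarily how spell.lsp does it. The lower-casing in checkSketch is an assumption based on the dictionary files being lower-case.

```lisp
(vl-load-com)

;; Split str on delim (assumed non-empty) into a list of strings.
(defun splitSketch (str delim / pos lst)
  (while (setq pos (vl-string-search delim str))
    (setq lst (cons (substr str 1 pos) lst)
          str (substr str (+ pos 1 (strlen delim))))
  )
  (reverse (cons str lst))
)

;; Join a list of strings with delim between items; nil for an empty list.
(defun joinSketch (lst delim / str item)
  (setq str (car lst))
  (foreach item (cdr lst)
    (setq str (strcat str delim item))
  )
  str
)

;; Read every line of every file that can be found into one list.
(defun loadSketch (files / file path f line words)
  (foreach file files
    (if (and (setq path (findfile file))
             (setq f (open path "r")))
      (progn
        (while (setq line (read-line f))
          (setq words (cons line words))
        )
        (close f)
      )
    )
  )
  words
)

;; T when the lower-cased word is NOT in the dictionary, nil when it is.
(defun checkSketch (word dict)
  (not (vl-position (strcase word T) dict))
)

(splitSketch "a,b,cd,e,fgh" ",")
;; returns ("a" "b" "cd" "e" "fgh")
(joinSketch '("hello" "there" "mister.") " ")
;; returns "hello there mister."
(checkSketch "ahasdh" '("hello" "there"))
;; returns T
```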

The command "CT" (c:ct) is an implementation of the functions I've created. It allows you to select multiple text and mtext entities and returns the misspelled words.

Anyway, feel free to use this code to add some spell checking to your applications, or just to have a better mtext stripper. The nice thing is that you can easily make a custom dictionary for the common words in your drawings that aren't in SCOWL. Speaking of SCOWL, I used the first four files, as shown in the .lsp file. The higher-numbered the file, the more words you're including; they're sorted by how often the words are used. The first four do pretty well, and you might be able to get away with just the first two. Using four creates a 54,000-item list. You'd think that would present a problem, but checking three paragraphs of mtext only takes a fraction of a second.

You could use ssget to grab all the text in the drawing:

(ssget "_X" '((0 . "TEXT,MTEXT")))
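
From there, something along these lines would walk the selection set and report misspellings per entity. Treat it as a rough sketch: the command name CTALL is made up, it assumes spell.lsp is loaded and that myDictionary already holds the result of loadWords, and it only reads group code 1, so long mtext whose earlier chunks live in group code 3 would be checked only in part.

```lisp
;; Hypothetical command: check every TEXT and MTEXT entity in the drawing.
(defun c:ctall (/ ss i ent str bad)
  (if (setq ss (ssget "_X" '((0 . "TEXT,MTEXT"))))
    (progn
      (setq i 0)
      (repeat (sslength ss)
        (setq ent (entget (ssname ss i))
              str (cdr (assoc 1 ent)))          ; text content (group code 1)
        ;; only mtext carries formatting codes
        (if (= (cdr (assoc 0 ent)) "MTEXT")
          (setq str (remMtextFmt str))
        )
        ;; checkText returns nil when nothing is misspelled
        (if (setq bad (checkText str myDictionary))
          (print bad)
        )
        (setq i (1+ i))
      )
    )
  )
  (princ)
)
```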

Anyway, happy lisping, and comments are welcome.
