EditorProgramming-mini-HOWTO
Madhu M Kurup <madhumkurup AT yahoo DOT com>
0.1 alpha November 2000

ABSTRACT

This HOWTO describes how to go about writing a text editor on the Linux
operating system. Most of the information can also be profitably applied
to other operating systems and other text editors. This is meant for
Engineering students and other interested folks who either have to
create an editor or feel the urge to do so. Does your current editor
give you an itch to code? Then read on.
______________________________________________________________________

Table of Contents

1. Introduction
   1.1 Why this document?
   1.2 Copyright
   1.3 Feedback
   1.4 Credits and Thanks
   1.5 Versions

2. Editors
   2.1 What is An Editor
   2.2 Line, Screen, Graphical
      2.2.1 Line
      2.2.2 Screen
      2.2.3 Graphical
   2.3 What kind of Editor is recommended?
   2.4 What is the functionality to be provided?
      2.4.1 What are the additional cool things I can put in?
      2.4.2 What are the really really cool things I can have fun with?

3. Development of an Editor on Linux
   3.1 GCC
   3.2 GDB
   3.3 Make
   3.4 NCurses
      3.4.1 Uconio.h
   3.5 Gvim / Glimmer / Emacs
   3.6 Xwpe
   3.7 KDevelop

4. Implementation Details
   4.1 Language:
   4.2 Data Structures:
      4.2.1 Characters
      4.2.2 Words
      4.2.3 Statically Allocated Lines of Text
      4.2.4 Dynamically Allocated Lines of Text
      4.2.5 Paragraphs
      4.2.6 Pages
   4.3 Which Data structure should I use?
   4.4 Should I use Uconio.h

5. Open Source
   5.1 Licences
   5.2 Releasing your editor

6. Help
   6.1 Books
   6.2 Lists

7. URLs
______________________________________________________________________

1. Introduction

1.1. Why this document?

This document was written to help new students of the Engineering
colleges in India, and specifically VTU, Karnataka, who have to write an
editor as part of their course work. This text editor is one of the
practical projects of a course titled System Software and is to be
written for the Linux platform.
This document will help you gain some perspective on the different
requirements that even a simple tool like an editor can have. Some of
the different approaches are discussed. Some of the places where
documentation is accessible are also introduced.

1.2. Copyright

The Linux EditorProgramming-mini-HOWTO is copyright (C) 2000 by Madhu M
Kurup. Linux HOWTO documents may be reproduced and distributed in whole
or in part, in any medium physical or electronic, as long as this
copyright notice is retained on all copies. Commercial redistribution is
allowed and encouraged; however, the author would like to be notified of
any such distributions.

All translations, derivative works, or aggregate works incorporating any
Linux HOWTO documents must be covered under this copyright notice. That
is, you may not produce a derivative work from a HOWTO and impose
additional restrictions on its distribution. Exceptions to these rules
may be granted under certain conditions; please contact the Linux HOWTO
coordinator at the address given below.

In short, we wish to promote dissemination of this information through
as many channels as possible. However, we do wish to retain copyright on
the HOWTO documents, and would like to be notified of any plans to
redistribute the HOWTOs.

If you have questions, please contact Tim Bynum, the Linux HOWTO
coordinator, at linux-howto@sunsite.unc.edu via email.

1.3. Feedback

Please send me any corrections, questions, comments, suggestions, or
additional material. I would like to improve this HOWTO! Tell me exactly
what you don't understand, or what could be clearer. You can reach me at
madhumkurup AT yahoo DOT com via email. Please include the version
number of the EditorProgramming-mini-HOWTO when writing; this is version
0.1.

1.4. Credits and Thanks

Greets and thanks to All @ Exocore. Thanks for all the fish and fun,
folks!
Biju Chacko first proposed that someone write this doc, so he is mainly
responsible for any errors in this doc! In addition, the Development
team from IT.COM was a great help - Sharat Chandra, Mithun, Sridar,
Kartic, Sushant, Harish, Sibin - thanks, folks. Greets also go out to
UVCE2000.

1.5. Versions

The latest version of this document should be available at its Online
home. This document was written using Lyx, a WYSIWYM editor and frontend
for LaTeX.

2. Editors

An editor is a program that allows a user to enter, modify, save and
restore text files. Text files are usually defined as files that store
ASCII content. Some examples of editors on the Linux platform include
vi, gvim, joe, emacs and so on. Some more detail is available at
Zdwebopedia.

2.1. What is An Editor

In terms of system software, an editor is a basic element of the
operating system that enhances the usability of the machine. An editor
allows users to manipulate data at a slightly higher level of
abstraction, to deal with elements such as documents without wondering
too much about the details of how data is stored on the filesystem, how
data can be represented on screen, etc.

2.2. Line, Screen, Graphical

Classically, editors can be separated into some well established
categories. They are:

2.2.1. Line

This is the oldest and most archaic form of editor. In this type, only a
single line of text can be edited at a time. Examples abound, but are
dangerous to name for fear of imitation. To be avoided at all costs.

2.2.2. Screen

In this, the most common form, a screen of text is displayed, the user
can edit elements, and the display is dynamically updated on screen. A
user is able to move within the document and the screen will be
correspondingly updated. These are usually rather fast and rather stable
(Ever seen vi dump core? ;)

Examples: vi, emacs, joe

2.2.3. Graphical

In this most advanced form, there is a large (and excessive?) use of
icons, the mouse and pretty colors. The net result is that if you are
not careful, there is more eye candy and less productivity. Most such
editors tend to be WYSIWYG (What You See Is What You Get). Some large,
well known and expensive editors are often preferred by Gooeey (GUI)
users. Some are light and fast. Lyx is also a graphical editor, but
WYSIWYM (What You See Is What You Mean). Plug for my favourite editor -
gvim is != eye candy and it rocks!

Examples: gvim, xemacs, abiword, WordPerfect 2000

2.3. What kind of Editor is recommended?

Various kinds of editors can be written, from barebones text versions
with no color right up to extremely sophisticated ones using graphics
(GTK/Glade). The purpose of writing an editor, for me at least, is to
appreciate what the programming aspects might be. Towards that end, a
simple text editor that is difficult to write will teach you much more
than a sophisticated graphics based editor that is trivial to write. I
believe it was Sharat Chandra from BMS who once proved that it required
only two mouse clicks to write a functioning editor in GTK using Glade.
You can't learn too much from two mouse clicks - no matter how hard you
try. You can take a peek at Glade at the Glade homepage.

2.4. What is the functionality to be provided?

1. Edit

   o Text - allow the poor user to enter text
   o Menus - access functions via menus
   o Cursor Movement - allow easy movement in the document
   o Insert / Delete - toggle functionality while moving
   o Commonly used keys, PageUp/Dn, Home/End - as normally used

2. File Management Ability

   o Reasonably large files - file operations should work without
     running out of memory
   o Save, load, Save as, Open, Close - typical functions that should
     be supported
   o Print - should be easy enough ;)

3. Block commands

   o Mark blocks - Dynamic marking, or static block begin / block end
     marking. Highlight marked blocks.
   o Cut, Paste, Copy - Typical block functions

4. Edit commands

   o Find, Replace, Find and Replace - Well known and obvious functions
   o Spell Checker - Well..... look at the Unix commands for more help

5. Colors

   o Please choose sensible color combinations - some taste please
   o Bright colors are to be used with care - dark backgrounds and
     light foregrounds are usually ok.

6. Help

   o Shortcuts - keyboard and otherwise
   o Online help - F1 for help?
   o Manual - Documentation explaining unusual features etc

2.4.1. What are the additional cool things I can put in?

There are lots of nice features (!=bugs ;) that you could add, including
the ones below. Note that this could be a wishlist!

   o Customization - Color, Fonts, Locales, Distributions?
   o Mouse control - The use of the mouse along with the keyboard for
     input
   o Multiple Windows - The ability to edit multiple windows of text at
     the same time.
   o Man page - Conform to the standards for man pages

2.4.2. What are the really really cool things I can have fun with?

If you really have no better way of spending your time... Think about
these, and if you have more or have developed something cool, drop me a
line.

   o A screensaver - When you get bored of having nothing to type, it
     would help to have cute graphics appear!
   o A game - When you really really get bored of the screensaver, it
     would help to be able to play some game and then get back to
     work...... if that helps, that is!
   o Ability to handle VERY large files - Some form of virtual memory
     must be used here, keeping most of the file on secondary storage
     and only keeping smaller parts of the entire text in actual
     physical memory.

3. Development of an Editor on Linux

3.1. GCC

GCC is the GNU C/C++ compiler that ships by default with nearly every
distribution. This is an excellent, industrial strength compiler with
lots of facilities and frontends for supporting different languages. To
compile C or C++ code, use the respective commands with the name of your
source file:
______________________________________________________________________
gcc .c
g++ .cpp
______________________________________________________________________

For more information on gcc visit the GCC Home Page. Or on your Linux
machine type the following at a command prompt or xterm window or
whatever:
______________________________________________________________________
man gcc
info gcc
______________________________________________________________________

3.2. GDB

GDB is the GNU Debugger that allows you to debug your code using
breakpoints, watches, flags etc. This will also help you identify all
those annoying Segmentation faults (seg-fault, core dumped!) that you
will soon generate. For more information on gdb you can read up the GNU
GDB Manual. Or on your Linux machine type the following at a command
prompt or xterm:
______________________________________________________________________
man gdb
info gdb
______________________________________________________________________

There is also another very pretty and useful frontend for GDB called
DDD, which usually ships with the SuSE CDs. Most IDEs also contain some
form of front end for GDB.

3.3. Make

This is a tool that maintains dependencies and solves the problems of
how to cleanly compile, distribute and reuse software. This software
ships by default with any Linux distribution. For more information on
Make you can read up the GNU Make Manual. Or on your Linux machine type
the following at a command prompt or xterm window or whatever:
______________________________________________________________________
man make
info make
______________________________________________________________________

3.4. NCurses

This is a library of functions that can be used to manipulate text
elements on the screen. This library is large, extensive and very well
documented. This software ships by default with any Linux distribution.
For more information on Ncurses visit the NCurses Official Homepage. On
your local machine type:
______________________________________________________________________
man ncurses
info ncurses
______________________________________________________________________

The man page lists all the routines that ncurses provides. For more help
on any individual function listed there, read the man page named next to
it. For example, the beep routine is listed as:
______________________________________________________________________
beep                curs_beep(3X)
______________________________________________________________________

Here beep is the actual routine and curs_beep is the name of the man
page. For the man page for this function, type:
______________________________________________________________________
man curs_beep
______________________________________________________________________

3.4.1. Uconio.h

UConio.h is a wrapper written to help programmers who are used to and
comfortable with the old conio.h style of programming. This is a project
started by Pablo J Vidal and is available at his website.

3.5. Gvim / Glimmer / Emacs

These are supposed to be (!) text editors but can usually be used
effectively as programming IDEs. All three editors mentioned have syntax
highlighting and intelligent macros, so most people do program using
them directly.

3.6. Xwpe

This is a very effective IDE for text mode programming on the Linux
platform. For all the programmers used to the BorlandC interface, this
is very similar to the classic Turbo/Borland C. It has been placed under
the GPL and is freely available.
This software needs to be downloaded and is not usually installed on a
typical Linux machine. This would be the recommended way to program your
editor. More information and the Xwpe binaries are available at the Xwpe
homepage. Xwpe is a small download (xwpe-1.5.23a is 215K) and certainly
worth the effort.

3.7. KDevelop

This is a fine graphical IDE for programming and supports a number of
high end features including CVS, automatic makefile generation and other
useful playthings. This software is installed by default on the newer
distributions (RH 7.0, SuSE 7.0). KDevelop is a large download - you are
warned. For more information on KDevelop visit the KDevelop homepage.

4. Implementation Details

Here are all the knotty issues of how to actually make the hard
decisions and start work. This section is highly inflammatory, so if you
don't agree with what I say, feel free to go ahead and do whatever you
like. You are perfectly entitled to your ridiculous opinions. ;) Jokes
apart, these are *my* opinions and YMMV (Your Mileage May Vary).

4.1. Language:

A choice exists only if you are sure you can use either C or C++. Most
of the time, however, C is used for writing a text editor. C++ libraries
for Ncurses are available. If you are interested in writing object
oriented programs, remember that it is equally possible to write good
object oriented programs in C. My personal recommendation is that you
use C, as it will expose you to a distinctly elemental and therefore
tougher style of programming. C++ is my favourite language and I must
say that it makes even difficult things easy. The best way, in my
opinion, to learn object oriented programming would be to use C and
object oriented concepts. However, opinions differ. Writing an editor is
both a labour of love and a learning process. Neither love nor learning,
I must note, is enjoyed if it is too easy.

4.2. Data Structures:

Typically, most editors use a linked list in one form or another.
However, there is a large variety of data structures used at the heart
of the linked list, differing in what each node holds. A partial summary
follows:

4.2.1. Characters

Each node in the linked list holds a single character of data.
Representation and operations such as insert and delete are simple to
write. The structure is reasonably fast to display.

Evaluation: Most flexible. Very expensive in terms of memory usage.

4.2.2. Words

Each node in the linked list holds a single word. Since a word is of
undetermined length, each node in the linked list must lead to a
dynamically allocated memory area. Operations are somewhat simple to
execute. Translation of this format to the display is complex and
irritating.

Evaluation: Gross.

4.2.3. Statically Allocated Lines of Text

Each node in the linked list holds a character array of a fixed size.
Each such line maps directly onto a line to be displayed on screen.
Operations are sometimes difficult to code. Translation of this format
for display is fast and easy.

Evaluation: Effective, fast. Slightly expensive in terms of memory.

4.2.4. Dynamically Allocated Lines of Text

Each node in the linked list holds a pointer to dynamically allocated
storage for the content of one line. Each line maps directly onto a line
to be displayed on screen. Operations are usually rather difficult to
code. Translation of this format to the display is fast and easy.
Another major advantage is that if the screen is resized, this structure
can adjust without too many problems.

Evaluation: Effective, fast. Optimum memory, but more difficult to code.

4.2.5. Paragraphs

Each node in the linked list holds a pointer to dynamically allocated
storage for the content of a paragraph. Each paragraph may or may not
map to areas on the screen. Operations are usually rather complex.
Translation of this format to the display is not simple.

Evaluation: Urgh. Need I say more?

4.2.6. Pages

Each node in the linked list holds a pointer to dynamically allocated
storage for the content of one page at a time. Each page may or may not
map to the display. Operations are complex. Translation of this format
to the display is also difficult. The one advantage is that this format
allows for the use of virtual memory and managing very large files.

Evaluation: Slow. Difficult to code and modify.

4.3. Which Data structure should I use?

Depends on how much fun / work / pain you want to have while writing
your editor. You could use any one of the ones listed above and write an
excellent editor. I would however suggest that a line based linked list,
either static or dynamic, is ideal in terms of complexity and
efficiency. If you have a particularly interesting data structure that
is rather different from the ones listed above, send it in to me and
I'll add it to the list above!

4.4. Should I use Uconio.h

One word answer. NO. Conio.h was described best by a senior member of
Linux India who called it an ugly hack. And that is a fine description.
Uconio.h is just a plain wrapper around Ncurses. By the time you have
converted your functions to Uconio, you might as well have learnt
ncurses and got some platform independence in the bargain. Not too many
functions are supported by Uconio and you may not be all that well
versed with Borland's API to the screen in any case. Ncurses is the
recommended solution.

5. Open Source

Considering that nearly all the tools that you are using are either Free
Software or Open Source, you may also want to add to the pool of
software by placing your editor / whatever into the open domain.

5.1. Licences

There are a variety of software licences available. The GPL, or the GNU
General Public Licence, is the most favoured free software licence.
There are also other licences, termed either Open Source or BSD style
licences. If you are interested in the philosophy behind free software
and other such esoteric subjects, please visit the GNU Philosophy Page.
Another site that would be of interest is ESR's website, where some
information on Open Source (as a different stream from Free Software) is
available.

5.2. Releasing your editor

If you want other people to use your software, you will need to release
it in a fashion that is usable without too much trouble. Apart from a
simple format like .tar.gz, consider creating an RPM or .DEB file. Once
you are done with enough testing to consider your editor stable, you may
(at your own risk!) want to post it to the granddaddy of all Linux/Unix
software references. This is the venerable Freshmeat. In addition, you
should maintain your own web page for the software with help,
instructions and other general information (a photograph of you could be
appropriate on the page, not in the software!)

6. Help

If you are looking for help, there are tons of places to turn to, though
not all of them will immediately be accessible. Remember that Unix is
actually really user friendly, except that it chooses its friends with
GREAT care. Please don't mail me asking me how to write your editor,
that's precisely why I put this up! My own editor is here, but it was
written for DOS and is old AND unmaintained code. The source is
available though, if you want to play with it, look at the architecture
and poke fun at it as well.

6.1. Books

These are generic books that are of use in programming on Linux. The
first book is a gentle and skin deep introduction to a large number of
the elements of Linux programming and is highly recommended for newbies
to buy. It has a section on ncurses, but that section is severely
limited.
The rest of the books are mainly solid references that you should have
lying around, but are not the ideal books to start learning anything
from.

   o Beginning Linux Programming, Richard Stones, Neil Matthew, Wrox
     Publishers
   o Unix Network Programming, W Richard Stevens, Prentice Hall (PH)
   o Advanced Programming in the Unix Environment, W Richard Stevens,
     Addison Wesley
   o The Design of the Unix Operating System, Maurice J Bach, PH
   o The C programming language (2nd), Brian W. Kernighan and Dennis
     Ritchie, PH
   o The C++ programming language (3rd), Bjarne Stroustrup, Addison
     Wesley
   o Programming Perl (3rd), Larry Wall, Tom Christiansen and Jon
     Orwant, O'Reilly
   o Programming Pearls, Jon Bentley
   o Any book by Donald E. Knuth
   o Alice in Wonderland, Lewis Carroll

If your favourite book is missing from here, send me a note telling me
why it should be there and I'll see if it can be fit in somewhere as
well.

6.2. Lists

Linux India has three lists which are meant for three different
purposes. These are:

1. Linux India Help or LIH, meant for configuration and help
2. Linux India General or LIG, meant for non technical and other issues
   about Linux
3. Linux India Programmers or LIP, meant for programming problems

Except for very extenuating reasons, LIP should be the only list on
which you ask programming questions. For more information, please go to
the main Lists page. And oh yes, before asking a question please read
the archives. And before you post you might want to read up on some
etiquette at Biju's etiquette help.

7. URLs

   o A Well written introduction to Ncurses with C
   o ESR's article in LJ on Ncurses
   o The Linux Programmer's Bouncepoint
   o Linux Documentation Site
   o RPM Search
   o GNU Main Page
   o The Linux Kernel Mailing List - Kernel Traffic
   o Programmer of the Month Contest

If you have been to a URL that you felt was particularly helpful and
useful, perhaps you could share that information. Mail the URL to me and
I shall put it up here as well.
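
As a closing illustration of the "Dynamically Allocated Lines of Text"
structure from section 4.2.4, here is a minimal sketch in C. It is not
taken from any particular editor. The names Line, line_new,
line_insert_after, line_insert_char, line_delete_char and lines_free are
all invented for this example, and a real editor would still need line
splitting and joining, file loading and the ncurses display code on top
of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One line of text; the buffer is sized to fit its contents.
 * All names in this sketch are hypothetical. */
typedef struct Line {
    char *text;          /* NUL-terminated contents, heap allocated */
    size_t len;          /* length of text, excluding the NUL */
    struct Line *prev;   /* previous line, or NULL at the top */
    struct Line *next;   /* next line, or NULL at the bottom */
} Line;

/* Create a detached line holding a copy of s; NULL if out of memory. */
Line *line_new(const char *s)
{
    Line *ln;

    assert(s != NULL);
    ln = malloc(sizeof *ln);
    if (ln == NULL)
        return NULL;
    ln->len = strlen(s);
    ln->text = malloc(ln->len + 1);
    if (ln->text == NULL) {
        free(ln);
        return NULL;
    }
    memcpy(ln->text, s, ln->len + 1);
    ln->prev = ln->next = NULL;
    return ln;
}

/* Link a new line holding s directly after pos; NULL on failure. */
Line *line_insert_after(Line *pos, const char *s)
{
    Line *ln;

    assert(pos != NULL);
    ln = line_new(s);
    if (ln == NULL)
        return NULL;
    ln->prev = pos;
    ln->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = ln;
    pos->next = ln;
    return ln;
}

/* Insert c at column col (0..len). Returns 0 on success, -1 on failure. */
int line_insert_char(Line *ln, size_t col, char c)
{
    char *grown;

    assert(ln != NULL);
    if (col > ln->len)
        return -1;
    grown = realloc(ln->text, ln->len + 2);  /* one more char, plus NUL */
    if (grown == NULL)
        return -1;
    ln->text = grown;
    /* Shift the tail, including the NUL, one place to the right. */
    memmove(ln->text + col + 1, ln->text + col, ln->len - col + 1);
    ln->text[col] = c;
    ln->len++;
    return 0;
}

/* Delete the character at column col. Returns 0, or -1 if out of range. */
int line_delete_char(Line *ln, size_t col)
{
    assert(ln != NULL);
    if (col >= ln->len)
        return -1;
    /* Shift the tail, including the NUL, one place to the left. */
    memmove(ln->text + col, ln->text + col + 1, ln->len - col);
    ln->len--;
    return 0;
}

/* Free every line from head onwards. */
void lines_free(Line *head)
{
    while (head != NULL) {
        Line *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

An editor built on this sketch might start an empty document with
line_new(""), call line_insert_char for each key typed and
line_insert_after when the user presses Enter. Because each node owns a
buffer of exactly the size it needs, memory use stays close to the size
of the text, which is the trade-off section 4.2.4 describes: less memory
than fixed-size arrays, at the price of more allocation code.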